====== HGSimpleCorpusNetwork ======

HGSimpleCorpusNetwork is a Python tool for batch frequency analysis of corrupted corpus data. Given a set of search terms and a set of text files, the script generates an adjacency matrix, a GEXF file, and a GraphML file linking the search terms to the texts. To cope with OCR-corrupted data, the search algorithm supports [[https://en.wikipedia.org/wiki/Levenshtein_distance|Levenshtein distance]] and [[http://collaboration.cmc.ec.gc.ca/science/rpn/biblio/ddj/Website/articles/DDJ/1988/8807/8807c/8807c.htm|gestalt pattern matching]], so that similar (i.e. distorted) tokens are recognised as well. HGSimpleCorpusNetwork is **freely available under the MIT License** on [[https://github.com/heidelgram/HGSimpleCorpusNetwork|GitHub]].
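
To illustrate the idea of fuzzy term matching described above, here is a minimal sketch using only the Python standard library. It is not taken from HGSimpleCorpusNetwork itself: the function names (''levenshtein'', ''is_match'', ''adjacency_matrix''), the tokenisation, and the thresholds ''max_distance'' and ''min_ratio'' are hypothetical choices made for this example. Python's ''difflib.SequenceMatcher'' is used as a stand-in for gestalt pattern matching, to which it is closely related.

```python
import re
from difflib import SequenceMatcher


def levenshtein(a: str, b: str) -> int:
    """Return the edit distance between a and b (two-row dynamic programme)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + (char_a != char_b)  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def is_match(term: str, token: str, max_distance: int = 1, min_ratio: float = 0.8) -> bool:
    """Treat token as a hit if either similarity measure accepts it.

    Both thresholds are illustrative assumptions, not the tool's defaults.
    """
    term, token = term.lower(), token.lower()
    if levenshtein(term, token) <= max_distance:
        return True
    return SequenceMatcher(None, term, token).ratio() >= min_ratio


def adjacency_matrix(terms, texts, **thresholds):
    """Count fuzzy hits of every term in every text.

    texts maps a (hypothetical) file name to its content; the result maps
    term -> file name -> number of matching tokens.
    """
    matrix = {}
    for term in terms:
        row = {}
        for name, body in texts.items():
            tokens = re.findall(r"\w+", body.lower())
            row[name] = sum(is_match(term, tok, **thresholds) for tok in tokens)
        matrix[term] = row
    return matrix


if __name__ == "__main__":
    sample = {
        "a.txt": "The c0rpus is a corpus.",
        "b.txt": "Nothing here.",
    }
    print(adjacency_matrix(["corpus"], sample))
```

In this sketch, the OCR-style distortion ''c0rpus'' counts as a hit for ''corpus'', while an unrelated word such as ''campus'' is rejected by both measures. The resulting nested dictionary can be read as a term-by-text adjacency matrix; exporting it to GEXF or GraphML would require a graph library and is left out here.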