FAQ - Frequently Asked Questions

The Corpus and its Structure

What is the motivation for compiling this corpus?
English grammar books have never been investigated diachronically from the beginnings of the genre in the late 16th century to the first modern grammars of the late 19th century, which are the precursors of the extensive grammars of the early 20th century, e.g. Jespersen (1909-1949). Existing studies usually concentrate on one century or single grammarians. The compilation of a corpus of English grammars is thus much desired and long overdue. The corpus will contain all principal works of the genre, and mirror the various subgenres, forms and audiences of the grammar books in rough proportion.
XML markup is TEI-based with project-specific additions. Apart from structural markup, annotation includes references to other grammars and grammarians, linguistic terminology, and evaluative and normative expressions that show the author's stance towards language use (and misuse).
The corpus will serve as the basis for corpus and network analyses with the aim of revealing the mechanisms behind grammar writing and the motivations for it, the connections between grammarians, and the currents and breaks in the genre.
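A network analysis of connections between grammarians could start from pairs of (citing grammar, referenced grammarian) taken from the reference annotation. The following is a minimal Python sketch, not the project's own tooling; the input format, the function name `build_reference_network`, and the placeholder identifiers are hypothetical. It builds a directed graph and counts how many distinct grammars refer to each grammarian (the in-degree).

```python
from collections import Counter, defaultdict


def build_reference_network(pairs):
    """Build a directed graph from (citing grammar, referenced grammarian) pairs.

    Returns the adjacency map and, for each grammarian, the number of
    distinct grammars that refer to them (in-degree).
    """
    graph = defaultdict(set)
    for citing, cited in pairs:
        graph[citing].add(cited)  # a set ignores repeated references
    in_degree = Counter(cited for targets in graph.values() for cited in targets)
    return dict(graph), in_degree


# Hypothetical placeholder pairs, not real corpus data
pairs = [
    ("grammar_A", "grammarian_X"),
    ("grammar_B", "grammarian_X"),
    ("grammar_B", "grammarian_Y"),
    ("grammar_B", "grammarian_X"),
]
graph, in_degree = build_reference_network(pairs)
print(in_degree.most_common())  # [('grammarian_X', 2), ('grammarian_Y', 1)]
```

Counting distinct citing grammars rather than raw mentions keeps a single grammar that names the same authority many times from dominating the ranking.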
How were the corpus texts chosen?
There are two main criteria for text selection: (a) the popularity and distribution of the grammar books and (b) their variety in function, audience, and text type. Popularity and distribution are determined by:
  • bibliographic listings of grammar books (e.g. in Michael 1987, Görlach 1998)
  • numbers of editions
  • book catalogues, advertisements, etc.
  • contemporaries' comments, e.g. in literary genres, private letters
  • curricula of schools, colleges, etc.
Is the corpus representative?
The complete corpus will be representative. It will contain all major works of the genre, take the increased number of grammars in the 18th and 19th centuries into account, and mirror the different subgenres, forms and audiences of grammars proportionally.
What are the size and structure of the corpus?
The corpus of English grammars will contain approximately 100 grammars (ca. 10 million words). Because the output of grammars varies from century to century, so does the number of selected texts: from the 16th century, all five preserved English grammars were chosen. The 17th-century part of the corpus contains 18 grammars. Given the increasing number of works in the following two centuries, we selected 30 grammars from the 18th century and 50 from the 19th century.
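The per-century figures can be tallied with a few lines of Python. This sketch uses only the numbers stated above; it shows that "approximately 100" corresponds to 103 texts and gives a rough average length per grammar, assuming the approximate 10-million-word estimate holds. The shares are by number of texts, not by word count.

```python
# Selected grammars per century, as stated in the corpus description
selection = {"16th": 5, "17th": 18, "18th": 30, "19th": 50}

total = sum(selection.values())
print(f"total: {total} grammars")  # total: 103 grammars
for century, n in selection.items():
    print(f"{century} century: {n} grammars ({n / total:.1%} of the texts)")

# Rough average length, assuming the approximate 10-million-word target
avg_words = 10_000_000 / total
print(f"average: about {avg_words:,.0f} words per grammar")
```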

The corpus contains both scholarly grammars and teaching grammars. The prevalent form of grammars is textbooks, followed by treatises and catechisms.

How is the corpus annotated?
See our annotation page.
How is the corpus implemented technically? Which technologies are used?
The grammar books are scanned, and optical character recognition (OCR) is applied using ABBYY FineReader. The data are text files with XML markup, following a custom RELAX NG schema based on TEI standards. Images are supplied in PNG format. The website is based on DokuWiki.
The web-based tools are written in PHP 5 and directly access the XML files.
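Because the corpus files are plain XML, they can also be processed with standard XML tools outside the website. The following minimal Python sketch uses only the standard library. It assumes that the files declare the standard TEI namespace (`http://www.tei-c.org/ns/1.0`) and keep the running text in `<text>/<body>`; the project-specific schema may differ, and the function name `body_text` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: files use the standard TEI namespace; the project schema may differ
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def body_text(xml_source: str) -> str:
    """Return the whitespace-normalised text of <text>/<body>, or "" if absent."""
    root = ET.fromstring(xml_source)
    body = root.find(".//tei:text/tei:body", TEI_NS)
    if body is None:
        return ""
    # Join text nodes with spaces so adjacent elements do not run together;
    # words split by inline markup may be counted as two tokens
    return " ".join(" ".join(body.itertext()).split())


sample = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader/>
  <text><body><p>A placeholder sentence for testing.</p></body></text>
</TEI>"""
text = body_text(sample)
print(len(text.split()))  # 5
```

The standard library does not validate against a RELAX NG schema; the third-party `lxml` package provides `lxml.etree.RelaxNG` for that purpose.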
Is there existing research based on this corpus?
See list of publications. If you work with our corpus, please inform us about your research.
Who is responsible for the corpus?
See our team page.
How will the corpus be developed in the future?
The plan is to compile a representative and balanced XML-annotated corpus of British grammar books of the 16th to 19th centuries. The corpus will contain approximately 10 million words. Possible extensions of the corpus include American grammars of the respective period and bilingual grammars (English and Latin) from the 16th and 17th centuries.
Are there similar projects?
In his extensive bibliography of 19th-century grammars, Görlach (1998) suggests several research questions.
Anderwald (2014, 2016) investigates the influence of prescriptive grammars on 19th-century English, focussing, for instance, on the BE-perfect and on the progressive.

  • Anderwald, Lieselotte. 2016. “The progressive as a symbol of national superiority in nineteenth-century British grammars”. Language and Communication 48: 66-78.
  • Anderwald, Lieselotte. 2014. “The decline of the BE-perfect, linguistic relativity, and grammar writing in the 19th century”. In Hundt, Marianne (ed.). Late Modern English Syntax. Cambridge: Cambridge University Press, 13-37.
  • Görlach, Manfred. 1998. An Annotated Bibliography of Nineteenth-Century Grammars of English. Amsterdam Studies in the Theory and History of Linguistic Science, Series V, 26. Amsterdam: Benjamins.


To what extent can I use the corpus data? Are there any copyright restrictions?
In general, the data can be used freely under a Creative Commons licence as long as proper credit is given (see citation remarks). In some cases, copyright restrictions may apply. Please refer to the individual files for details.
How to cite the corpus

Busse, Beatrix, Kirsten Gather, and Ingo Kleiber (eds.). 2015-. HeidelGram. A Corpus of English Grammar Books between 1550 and 1900. Heidelberg: University of Heidelberg, English Department.

@BOOK{,
 editor={Busse, Beatrix and Gather, Kirsten and Kleiber, Ingo},
 title={HeidelGram. A Corpus of English Grammar Books between 1550 and 1900},
 publisher={University of Heidelberg, English Department},
 address={Heidelberg},
 year={2015-}
}

How to access the corpus data
You can either view and download single grammar book files on this website or download the whole corpus in archived form.
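For corpus-wide processing of the downloaded archive, a sketch like the following iterates over all XML files without unpacking them first. It assumes a ZIP archive with one XML file per grammar; the archive layout and the helper name `iter_corpus` are hypothetical. The word count here covers all text in each file, including any header material.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def iter_corpus(archive):
    """Yield (member name, root element) for each .xml file in a ZIP archive.

    `archive` may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(archive) as zf:
        for name in sorted(zf.namelist()):
            if name.lower().endswith(".xml"):
                with zf.open(name) as fh:
                    yield name, ET.parse(fh).getroot()


# Demonstration with a small in-memory archive (placeholder content)
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("grammar_a.xml", "<TEI><text><body><p>one two</p></body></text></TEI>")
    zf.writestr("README.txt", "not XML")
buf.seek(0)

# Counts all text in each file, including any header material
word_counts = {
    name: len(" ".join(root.itertext()).split()) for name, root in iter_corpus(buf)
}
print(word_counts)  # {'grammar_a.xml': 2}
```

Reading members directly from the archive avoids extracting about 100 files to disk and skips non-XML members such as readme files.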
Can I access the XML schema file?
Yes, please look at our schema documentation.
Which software can be used to analyse the corpus data?

Feedback and Contribution

How can I give feedback or report a problem?
See our contact page.