Faculty of Language, Literature and Humanities - Corpus Linguistics and Morphology

Design of the Falko corpus

Falko comprises several sub-corpora and is continually being expanded.

  • The summary corpus contains text summaries written by advanced learners of German. The methods of data collection and the structure of the corpus are explained here: Falko Kernbeschreibung Englisch.pdf. In addition, there are:
    • An extended corpus with text summaries written by Danish learners of German from Copenhagen
    • A baseline corpus with texts written by German native speakers
    • The collection of original texts
  • The essay corpus contains essays written by advanced learners. Information about the data collection is also available. Currently, we have essays from Adana, Berlin (summer courses at the HU language center), Copenhagen, Mombasa, Nairobi, Nyeri, Tashkent, Stellenbosch, and Turin. The essay corpus is still being expanded. If you would like to help us collect data, we would be happy to hear from you!
  • There is also a baseline corpus of native speaker data for the essay corpus, collected at high schools in Berlin.
  • The longitudinal corpus contains data collected over several semesters from learners with different proficiency levels at Georgetown University, Washington. The methods and conditions of the elicitation are described here (in German). Additionally, there is a comparative corpus of book reviews written by native speakers (Falko Baseline comparison). A metadata spreadsheet is available for the longitudinal corpus.
  • ANNIS search: Demo video on YouTube