Faculty of Language, Literature and Humanities - Corpus Linguistics and Morphology

Architecture & Annotation of the Falko project

The sub-corpora are annotated on several levels. Unlike most other current learner corpora, Falko has a flexible architecture (multi-layer stand-off annotation): new annotation levels can be added at any time, and each level can be edited independently of the others. In addition to the automatically annotated levels for part of speech (POS) and lemma (all sub-corpora), we use specific annotation levels for labelling learner errors:

  • Target hypothesis (currently annotated in the summary and longitudinal corpora): every error is annotated relative to a target hypothesis, a corrected version of the learner text that serves as the basis for the error annotation.
  • Learner errors are annotated relative to the target hypothesis. In addition to the automatic POS annotation, there is a manually corrected POS annotation. In the summary and longitudinal corpora, syntactic fields (documentation available in German) and errors are annotated. Annotation of further error levels is in progress.
  • The essay corpus is annotated with both a minimal and a maximal target hypothesis, each automatically annotated for part of speech and lemma. Differences between the target hypotheses and the learner texts are annotated on a separate tier. Together, these layers make the data a good foundation for further error annotation.

For a more detailed description, refer to the Falko handbook.
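The stand-off idea (annotation layers that point into the token data instead of being embedded in it) and the difference tier between a learner text and its target hypothesis can be illustrated with a small Python sketch. This is not Falko's actual data format or tooling: the names `Span`, `StandoffDocument` and `diff_layer`, as well as the difference labels `CHA`, `DEL` and `INS`, are hypothetical. The token alignment uses Python's standard `difflib.SequenceMatcher` as a stand-in for however the corpus actually computes its differences, and the example sentence and its POS tags are made-up illustration data.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher


@dataclass
class Span:
    """A stand-off annotation: it refers to learner tokens by index."""
    start: int  # index of the first covered token
    end: int    # one past the last covered token (start == end: zero-width point)
    value: str


@dataclass
class StandoffDocument:
    """Primary token data kept apart from independently editable layers."""
    tokens: list[str]
    layers: dict[str, list[Span]] = field(default_factory=dict)

    def add_layer(self, name: str, spans: list[Span]) -> None:
        # Adding or replacing a layer never modifies the tokens or other layers.
        for s in spans:
            if not 0 <= s.start <= s.end <= len(self.tokens):
                raise ValueError(f"{s} lies outside the token range")
        self.layers[name] = list(spans)

    def remove_layer(self, name: str) -> None:
        # Removing one layer leaves all other layers untouched.
        self.layers.pop(name, None)


# Hypothetical difference labels (assumption, not necessarily Falko's inventory).
LABELS = {"replace": "CHA", "delete": "DEL", "insert": "INS"}


def diff_layer(learner: list[str], target: list[str]) -> list[Span]:
    """Label each stretch where the target hypothesis differs from the learner text.

    Spans index the learner tokens; an insertion becomes a zero-width span
    at the learner position where the target adds material.
    """
    matcher = SequenceMatcher(a=learner, b=target, autojunk=False)
    spans = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag != "equal":  # identical stretches receive no annotation
            spans.append(Span(i1, i2, LABELS[tag]))
    return spans


if __name__ == "__main__":
    learner = "Ich habe gestern ins Kino gegangen .".split()
    target = "Ich bin gestern ins Kino gegangen .".split()  # target hypothesis
    doc = StandoffDocument(learner)
    doc.add_layer("pos", [Span(i, i + 1, t) for i, t in enumerate(
        ["PPER", "VAFIN", "ADV", "APPRART", "NN", "VVPP", "$."])])
    doc.add_layer("diff", diff_layer(learner, target))
    print(doc.layers["diff"])  # [Span(start=1, end=2, value='CHA')]
```

Because every layer only stores indices into the shared token list, a new layer (for example a manually corrected POS tier or a second, maximal target hypothesis) can be added or deleted without touching the learner text or any existing annotation.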