Faculty of Language, Literature and Humanities - Corpus Linguistics and Morphology


The Kansas Developmental Learner corpus - a freely available longitudinal learner corpus of beginning to intermediate learners of German as a foreign language

This project set out to collect, annotate, and analyze a longitudinal corpus of writing by university students learning German as a foreign language at low proficiency levels. It contributes to Second Language Acquisition (SLA) research by enabling analyses of longitudinal language development in both groups and individuals, whereas most SLA studies have been either cross-sectional or small-scale longitudinal (tracking one or two participants over time). The project is also novel because the corpus data were collected: 1) from beginning learners of German (whereas most studies focus on intermediate to advanced proficiency levels); 2) from learners with a homogeneous native language background (overwhelmingly American English); 3) from learners with the same type and amount of exposure to the target language (mostly restricted to the classroom and instructional materials); 4) at dense time intervals (every 3-5 weeks); 5) in response to real-life classroom instruction tasks; and 6) together with multiple types of task and learner metadata.


Access to the corpus
The corpus data are freely available: they can be downloaded and can also be queried via ANNIS. For further information on the corpora of the Falko family, please consult the Falko web page. To query KanDeL in ANNIS, please go directly to the ANNIS search interface.


Corpus description

KanDeL comprises developmental data collected from US students who were enrolled in a basic German language program over four consecutive 16-week semesters at the University of Kansas (KU) and who agreed to participate in this research. This instructional program fulfills the foreign language requirement for certain majors at KU, a large public US university. The writing samples are rough drafts of essays that the students wrote in response to curricular tasks every three to five weeks during each semester. The genres are personal narratives and personal accounts, with argumentative tasks added at later time points. All learner texts were tokenized, lemmatized, and automatically annotated for parts of speech. They were then manually annotated for target hypotheses by multiple annotators (see Falko-Handbuch). Finally, the target hypothesis layer of the corpus was also automatically lemmatized and annotated for parts of speech.
For further information, please also refer to Vyatkina (2016) (reference below).


Project duration
Corpus collection: January 2008 – December 2011

Corpus annotation: January 2012 – August 2013

Corpus release: 2015

Corpus analysis: Ongoing


Principal investigator
Nina Vyatkina (University of Kansas)


Co-investigators (Humboldt-Universität zu Berlin)

Hagen Hirschmann

Felix Golcher

Marc Reznicek


Research assistants / annotators

Michael Dehaven

Michael Gruenbaum

Emily Hackmann

Melanie Piltingsrud


Partner universities
University of Kansas, Department of Germanic Languages and Literatures
Humboldt-Universität zu Berlin, Korpuslinguistik und Morphologie


Publications
Vyatkina, N. (2016). KANDEL: A developmental corpus of learner German. International Journal of Learner Corpus Research, 2(1), 102-120.

Vyatkina, N., Hirschmann, H., & Golcher, F. (2015). Syntactic modification at early stages of L2 German writing development: A longitudinal learner corpus study. Journal of Second Language Writing, 29, 28-50.

Vyatkina, N. (2013). Specific syntactic complexity: Developmental profiling of individuals based on an annotated learner corpus. Modern Language Journal, 97(s1), 11-30.

Vyatkina, N. (2013). Analyzing part-of-speech variability in a longitudinal learner corpus and a pedagogic corpus. In S. Granger, G. Gilquin, & F. Meunier (Eds.), Twenty years of learner corpus research: Looking back, moving ahead. Corpora and Language in Use - Proceedings 1 (pp. 479-491). Louvain-la-Neuve, Belgium: Presses universitaires de Louvain.

Vyatkina, N. (2012). The development of second language writing complexity in groups and individuals: A longitudinal learner corpus study. Modern Language Journal, 96(4), 572-594.


Conference presentations
Vyatkina, N., Hirschmann, H., & Golcher, F. (2016, July). Instructed second language acquisition and longitudinal learner corpus research: The case of lexical and syntactic complexity. Teaching and Language Corpora (TaLC), Giessen, Germany.

Vyatkina, N., Hirschmann, H., & Golcher, F. (2014, November). Syntactic modification at early stages of L2 German writing development: A longitudinal learner corpus study. Colloquium for Language and Literacy Development across the Life Span (LANSPAN), University of Groningen, the Netherlands.

Vyatkina, N., Hirschmann, H., & Golcher, F. (2014, October). The acquisition of modifiers in German as a foreign language: A longitudinal corpus study. Colloquium for Corpus Linguistics, Humboldt-Universität zu Berlin.

Vyatkina, N., & Reznicek, M. (2013, March). L2 complexity as syntactic modification in a developmental L2 German corpus. American Association for Applied Linguistics (AAAL), Dallas, TX.

Vyatkina, N. (2012, June). Digital resources for L2 research and teaching: An annotated longitudinal corpus of learner German. Computer Assisted Language Instruction Consortium (CALICO), Notre Dame, IN.

Vyatkina, N. (2011, September). Analyzing part-of-speech variability in a longitudinal learner corpus and a pedagogic corpus. Learner Corpus Research (LCR) conference, Université Catholique de Louvain, Louvain-la-Neuve, Belgium.


Funded by:

Institute for Digital Research in the Humanities, University of Kansas

General Research Fund, University of Kansas

Language Learning Small Grants Research Program

Fulbright Scholar Program

The German-American Fulbright Commission