Faculty of Language, Literature and Humanities - Corpus Linguistics and Morphology

Dr. Anna Shadrova

Postdoc (corpus linguistics, quantitative methodology & epistemology of quantitative text analysis, SLA & interlanguage, variation, lexicosyntax)

Contact Information

Dorotheenstraße 24
room 3.339
10117 Berlin - Mitte
Tel.: 030 2093 9774
anna [dot] shadrova [ät] hu-berlin [dot] de
mailing address:
c/o Institut für deutsche Sprache und Linguistik
Sprach- und Literaturwissenschaftliche Fakultät
Humboldt-Universität zu Berlin
Unter den Linden 6
D-10099 Berlin

Research Interests

Corpus linguistics, esp. methodology, data modeling, NLP and quantitative methods in small and medium-sized corpora (SMISC); SLA, native-like selection, linguistic variation, register

Current Projects

Contrastive corpus methodology and language modeling and analysis

Workshop at the 43rd annual meeting of the German Linguistic Society in Freiburg, 24-26 February 2021. With Martin Klotz and Anke Lüdeling. Details and presentations: https://www.linguistik.hu-berlin.de/de/institut/professuren/korpuslinguistik/events/kurz-ag-msc

Leibniz project on linguistic developments in German Federal Constitutional Court decisions

I work as a corpus and computational linguist and researcher in Prof. Dr. Christoph Möllers' Leibniz project at the Humboldt University Faculty of Law, analyzing linguistic developments in German Federal Constitutional Court decisions based on a longitudinal corpus reaching back to the beginning of the GFCC in 1951.

Current work includes the modeling of complex corpus data in a graph-based corpus architecture (text as graph); the development of an epistemologically well-rooted employment of topic modeling in text-based research; an analysis of thematic distributions by types of proceeding in the jurisdiction of the Court; canonization and citation practice of the Court.

Other relevant topics include NLP, data modeling, quantitative linguistics, stylometry, pattern recognition, network analysis, information retrieval in formalized language, formalization as a linguistic property, and linguistic formalization of formalized language on syntactic and semantic levels.


My dissertation "Measuring coselectional constraint in learner corpora: A graph-based approach" has recently been published: http://edoc.hu-berlin.de/18452/22356

It investigates the structural development of coselectional constraint (~collocation, idiomaticity, the idiom principle) in the use of verb-argument structures in learners at different stages of acquisition. The study is based on essays written by L1-Chinese and L1-Belarusian/Russian learners of German collected by Netzwerk Kobalt-DaF.

The research question was whether it is possible to measure the nativelikeness of coselectional constraint in small to medium-sized corpora and whether there is a process of restructuring with an increase in coselectional constraint with increasing target language ability; and whether there is an intermittent decrease of coselectional constraint at intermediate stages, i.e. a u-shaped learning development.

I analyzed the data in a graph-based approach making use of Louvain modularity (Blondel et al. 2008). An increase in modularity is observable in both learner groups, but a u-shaped development was only found in Belarusian learners. This is discussed from typological, cultural, and cognitive perspectives.
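As a rough illustration of the quantity involved (not the code used in the thesis), the sketch below computes Newman modularity for a fixed partition of a small undirected graph. This is the score that the Louvain method tries to maximize; the search over partitions itself is not implemented here. The node labels, the toy edge list, and the function name `modularity` are invented for this example and do not come from the Kobalt-DaF data.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Newman modularity Q of an undirected, unweighted graph for a fixed partition.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community_of: dict mapping every node to a community label.
    """
    m = len(edges)
    degree = defaultdict(int)    # node -> degree
    internal = defaultdict(int)  # community -> number of edges inside it
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if community_of[u] == community_of[v]:
            internal[community_of[u]] += 1

    degree_sum = defaultdict(int)  # community -> summed degree of its nodes
    for node, d in degree.items():
        degree_sum[community_of[node]] += d

    # Q = sum over communities of (share of internal edges - expected share)
    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )


# Hypothetical toy graph: two tightly connected groups joined by one edge.
toy_edges = [
    ("verb_a", "noun_a"), ("noun_a", "noun_b"), ("verb_a", "noun_b"),
    ("verb_b", "noun_c"), ("noun_c", "noun_d"), ("verb_b", "noun_d"),
    ("noun_b", "verb_b"),
]
split = {
    "verb_a": 0, "noun_a": 0, "noun_b": 0,
    "verb_b": 1, "noun_c": 1, "noun_d": 1,
}
merged = {node: 0 for node in split}

q_split = modularity(toy_edges, split)    # two communities
q_merged = modularity(toy_edges, merged)  # everything in one community
print(q_split, q_merged)
```

In this toy case the two-community partition scores higher than the single-community one, which is the sense in which a more clustered graph yields higher modularity.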

The thesis further discusses the lack of theoretical embedding of coselection in usage-based linguistics, the low explanatory power of the much-presumed "phraseological continuum", and the inadequacy of statistical measures of lexical association for the evaluation of coselectional constraint in corpora from a linguistic and a more mathematical perspective. It also makes suggestions for the incorporation of graph-based methods in lexical and lexicosyntactic research.

Supervised by Prof. Dr. Anke Lüdeling and Prof. Dr. Amir Zeldes (Georgetown University, Washington, D.C.), defended summa cum laude (10.7.20). Graciously funded through a BMBF scholarship granted by the Hans Böckler Foundation (2014-2018) and a Research Track scholarship granted by the Humboldt Graduate School (2013).

Keywords: Corpus linguistics, second language acquisition, formalization of usage-based linguistics, methodology of quantitative linguistics in small and medium-sized corpora, graph-based corpus methods, validation


Blondel, Vincent D; Guillaume, Jean-Loup; Lambiotte, Renaud; Lefebvre, Etienne (9 October 2008). "Fast unfolding of communities in large networks". Journal of Statistical Mechanics: Theory and Experiment. 2008 (10): P10008. arXiv:0803.0476.


RUEG - Research Unit Emerging Grammars

I am associated with the Research Unit Emerging Grammars, which investigates emerging grammars in situations of internal and external language contact in monolingual speakers and in the majority and heritage language of bilingual speakers. My research interests in this context lie in the development of quantitative methods for small to medium-sized corpora, such as the employment of graph metrics and network analysis in core-linguistic research, Bayesian vs. complex frequentist statistics (mixed-effects modeling in particular), and the application of machine learning techniques for the advancement of knowledge and information retrieval through introspection, as well as their optimization for smaller data.


DALeKo - Dokumentation und Analyse von Lernersprache

In this project run by the Arbeitskreis Fremdsprachendidaktik (working group on foreign language teaching) of English, Romance and Slavic Studies at the Humboldt University, a corpus of student-written essays in four school-taught languages (English, French, Russian and Spanish) was compiled. At this point, Russian texts elicited in school and university contexts are available with POS and lemma annotations through the Annis³ search engine and interface. Due to legal restrictions, the data is only available after registration. Please contact Prof. Dr. Anka Bergmann for further information.

INDUS Research Group on Individualized Language Learning and Approaches from Language Technology


Learn more here.


Talks & Publications


[Paper] Anna Shadrova (in prep.): Topic models do not model topics: Epistemological remarks and steps towards best practices.

[Paper] Anna Shadrova (in prep.): More than the sum of their parts: Meaning and content cannot be reliably quantified from counting words alone.

[Peer-reviewed paper] Wendel, Luisa, Anna Shadrova & Alexander Tischbirek (submitted): From Modeled Topics to Areas of Law: A Comparative Analysis of Types of Proceedings in the German Federal Constitutional Court.

[Peer-reviewed paper] Lüdeling, Anke, Hagen Hirschmann, Anna Shadrova & Shujun Wan (2021): Tiefe Analyse von Lernerkorpora. In H. Lobin, A. Witt & A. Wöllstein (Eds.), Deutsch in Europa (pp. 235-284). Berlin, Boston: De Gruyter. https://doi.org/10.1515/9783110731514-013

[Peer-reviewed proceedings paper] Shadrova, A. (in press): It may be in the structure, not the combinations: Graph metrics as an alternative to statistical measures in corpus-linguistic research. DhD Graph Proceedings 2019/2020.

[Peer-reviewed paper] Ighreiz, A., C. Möllers, L. Rolfes, A. Shadrova & A. Tischbirek (in press): Karlsruher Kanones: Selbst- und Fremdkanonisierung der Rechtsprechung des Bundesverfassungsgerichts. Archiv des öffentlichen Rechts.

[Dissertation] Shadrova, Anna (2020): Measuring coselectional constraint in learner corpora: A graph-based approach. Univ.-Dissertation: Humboldt-Universität zu Berlin. http://dx.doi.org/10.18452/21606.

[Peer-reviewed paper] Lüdeling, Anke; Hirschmann, Hagen & Shadrova, Anna (2017): Linguistic Models, Acquisition Theories, and Learner Corpora: Morphological Productivity in SLA Research Exemplified by Complex Verbs in German. Language Learning, Special Issue on Language learning research at the intersection of experimental, corpus-based and computational methods: Evidence and interpretation, 67 (S1), 96-129.

[Chapter] Thomas, E. M., Cantone, K. F., Davies, S., & Shadrova, A. (2014). Cross-linguistic influence and patterns of acquisition: The emergence of gender and word order in German-Welsh bilinguals. In: E. M. Thomas and I. Mennen (Eds.): Advances in the Study of Bilingualism, p. 41-62. Clevedon: Multilingual Matters.

[Master's thesis] Shadrova, A. (2013): Mehr Chunks! – Entwicklungsperspektiven für die Konstruktionsgrammatik unter Einbeziehung von Phraseologie, Psycholinguistik und L2-Erwerbsforschung. Published online on the HU edoc server (here).



[Corpus, scripts and data from analysis] Shadrova, Anna (2020): Extended Kobalt-DaF corpus, scripts for pre-processing and analysis, extracted lexicosyntactic graphs (JSON), and R-plots from PhD thesis and beyond: https://doi.org/10.5281/zenodo.3584091

[Corpus] Möllers, Christoph, Anna Shadrova & Luisa Wendel (2021): BVerfGE-Korpus 1.0. Mit freundlicher Unterstützung des Mohr-Siebeck-Verlags. https://doi.org/10.5281/zenodo.4551408

[Data from analysis] Ighreiz, Ali, Christoph Möllers, Louis Rolfes, Anna Shadrova & Alexander Tischbirek (2021): Karlsruher Kanones? Netzwerke, Tabellen und Analyseplots. https://doi.org/10.5281/zenodo.4464810

Talks & Workshops

[Public defense] Shadrova, Anna (2020): Interlanguage-Effekte in L1 und L2: Eine graphbasierte lexikosyntaktische Betrachtung anhand geschriebener Korpusdaten aus Falko und RUEG, HU Berlin, 10.07.2020.

[Talk] Shadrova, Anna (2020): No free lunch: Ob und wie Topic Modeling und andere probabilistische Informationsextraktionsverfahren zum Erkenntnisgewinn genutzt werden können. Korpuslinguistisches Kolloquium, HU Berlin, 08.07.2020.

[Conference talk] Shadrova, Anna (2020): Graph metrics as an alternative to statistical measures in linguistic research. Graph Technologies in the Digital Humanities 2020, Vienna, 21.02.2020.

[Talk] Shadrova, Anna (2020): Korpuslinguistische Modellierung juristischer Fragen in einem Korpus von BVerfG-Entscheidungen. Korpuslinguistisches Kolloquium, HU Berlin, 22.01.2020.

[Talk] Shadrova, Anna (2019): Individuelle Varianz und Textlängeneffekte: Wie geht Sampling in Lernerkorpora? Korpuslinguistisches Kolloquium, HU Berlin, 05.06.2019.

[Talk] Lüdeling, Anke & Anna Shadrova (2019): Forschungsfragen, Modelle, Auswertung. Möglichkeiten und Grenzen der korpusgestützten Textanalyse. Workshop "Methoden quantitativer Textanalyse", Berlin, 21.11.2019.

[Talk] Tischbirek, Alexander & Anna Shadrova (2019): Karlsruher Kanones? Selbst- und Fremdkanonisierung der Rechtsprechung des BVerfG. Workshop "Methoden quantitativer Textanalyse", Berlin, 21.11.2019.

[Conference talk] Shadrova, Anna (2019): U-shaped learning of verb argument coselection in learners of German. Learner Corpus Research 2019, Warsaw, 13.09.2019.

[Talk] Shadrova, Anna (2018): Lernerkorpora: Mehrebenenannotation und Zielhypothesen als Such- und Analysewerkzeug. Workshop "Von Lernerdaten zu Lernerkorpora", Schloss Rauischholzhausen, 12.07.2018.

[Talk] Shadrova, Anna (2017): Korpuslinguistische Kollokationsanalyse als Trendscout-Analyse zum Förderprogramm „Industrielle Gemeinschaftsforschung – IGF“. Talk at the IGF working meeting at the BMWi, 04.10.17.

[Talk] Shadrova, Anna (2017): Lexikalische Assoziationsmaße und Idiomatizität: Eine Problemskizze anhand von Lernerdaten aus dem Kobalt-Korpus. Korpuslinguistisches Kolloquium, HU Berlin, 24.05.2017.

[Conference talk] Shadrova, A. (2015): Learners know their German: Statistical similarities of surface features in German L1 and L2 essays. International Symposium on Bilingualism 10, 24.05.2015.

[Talk] Shadrova, Anna & Anke Lüdeling (2015): Individuelle Differenzen in Lernerdaten. INDUS-Netzwerktreffen, Universität Duisburg-Essen.

[Talk] Shadrova, Anna (2014): "Kobalt-E: Erste Ergebnisse". Netzwerk Kobalt-DaF, working meeting in Tübingen, 04.11.14 (slides).


[Workshop] Krause, Thomas & Anna Shadrova (2016) Korpus III: Einführung in die Annis-API mit Python. Linguistischer Methodenworkshop 2016, Institut für deutsche Sprache und Linguistik. Humboldt-Universität zu Berlin, 23.02.2016.

[Workshop] Shadrova, Anna & Thomas Krause (2016) Korpus II: Frequenzanalyse, Dependenzen, Metadatensuche mit Annis. Linguistischer Methodenworkshop 2016, Institut für deutsche Sprache und Linguistik. Humboldt-Universität zu Berlin, 23.02.2016.




Teaching

winter 17/18

Models of Grammatical Description
Erasmus students and students from similar programs

(Seminar: Modelle grammatischer Beschreibung)

Methods in Linguistics
Erasmus students and students from similar programs

(Übung: Methoden der Linguistik)

Learner German and Hood German
B.A. German Studies/Germanic Linguistics

(Seminar: Lernerdeutsch und Kiezdeutsch)

summer 17

Grammar of German
B.A. German Studies/Germanic Linguistics/Historical Linguistics

(Übung Deutsche Grammatik)

summer 16

Intro to Natural Language Processing with Python
B.A. German Studies/Germanic Linguistics/Historical Linguistics; M.A. Linguistics

winter 15/16

Intro to Linguistics
B.A. German Studies/Germanic Linguistics/Historical Linguistics

winter 14/15

Grammatical and Textual Regularities of Internet Language
B.A. German Studies/Germanic Linguistics

(Grammatische und textbezogene Regularitäten der Internetsprache, Modul "Text und Diskurs I")

summer 14

Grammar of German
B.A. German Studies/Germanic Linguistics/Historical Linguistics

(Übung Deutsche Grammatik)

winter 13/14

Grammar of German
B.A. German Studies/Germanic Linguistics/Historical Linguistics

(Übung Deutsche Grammatik)