Sprach- und literaturwissenschaftliche Fakultät - Korpuslinguistik und Morphologie

DGfS workshop “Contrastive Corpus Methodology for Language Modeling and Analysis”

The workshop took place as part of the 43. Jahrestagung der Deutschen Gesellschaft für Sprachwissenschaft (43rd Annual Meeting of the German Linguistic Society) in Freiburg, February 24 to 26, 2021.

Find all information on the conference here:

Due to the SARS-CoV-2 pandemic, the conference was held as an online event.

Workshop Description

As we try to understand and empirically investigate language, a wide range of methods are at our disposal and many decisions are to be made. One of the first decisions in the process is the amount of data we include in our corpora or the depth of annotation. We often sacrifice more data for deeper, manually obtained linguistic annotation and prefer a richer and more explicit description of language over shallow but larger data sets. This is sometimes perceived as not so much the outcome of a conscious decision in favor of analytical depth, but as a compromise we are forced to make due to restricted temporal, human, and financial resources.

However, research indicates that when dealing with language, gathering more data does not necessarily result in more insight. In fact, statistical analysis might not always yield better results just because it yields different ones for bigger data. In addition, when we attempt to combine established methods of statistical analysis with complex and adequate linguistic models in corpora, we still encounter limitations of sample size, and thus we can easily find ourselves lost in front of the tool cabinet. In the same way that theoretical linguistics has been in an ongoing and productive debate around the virtues of varying syntactic models, such as, for instance, constraint vs. phrase structure grammars (and combinations thereof), corpus linguistics requires an informed discussion about the virtues and limitations of different models for each linguistic phenomenon. With adequate and meaningful models, fewer data may yield more satisfactory results than larger datasets that only provide shallow linguistic annotation.

How can we understand the limitations of the tools that we have at our disposal and develop new models, methods, measures, or frameworks that fit the linguistic needs of our analyses? And what is the influence of the theoretical model in analysis, and how does it affect our results? This workshop encourages discussions of methods dealing with small and mid-sized corpora as a resource for linguistic analysis rooted in in-depth theoretical modelling. It addresses linguists working empirically on all linguistic levels, corpus and computational linguists, as well as statisticians.


Martin Klotz
Anna Shadrova
Anke Lüdeling

contact: dgfs2021.ccmlma(at)lists.hu-berlin.de

Invited Speaker

We are happy to welcome Wander Lowie from the University of Groningen as our invited speaker. Prof. Lowie has contributed a wide range of research to the areas of L2 acquisition, variability, dynamic systems and usage-based dynamics, and quantitative modeling in linguistic domains as diverse as phonology, lexical acquisition, language assessment, and conceptual representations.

Speakers and Titles

Opening presentation and frame

Anna Shadrova, Martin Klotz, Anke Lüdeling: Linguistic Modeling and Analysis


Felix Bildhauer, Elisabeth Pankratz, Roland Schäfer: Corpora, Inference, and Models of Register

Natalia Levshina: A comparison of frequentist and Bayesian models of language variation – the problems of priors and sample size

Wander Lowie, Keynote Speaker (Groningen): The group and the individual: complementary dimensions of language development

Giuseppe Samo: Machine Learning and syntactic theory focus on German and German varieties

Christof Schöch, Julia Dudar, Cora Rok, Keli Du: Deviation of Proportions as the Basis for a Keyness


Contributions to the workshop may cover, but are not limited to, the following topics:

  • new and/or comparative methods for data analysis within and beyond statistical frameworks
  • effects of different data sizes and data partitions for linguistic analyses
  • a contrastive perspective of modelling decisions and results
  • the influence of linguistic models in data modelling decisions and means of analysis
  • recent trends in linguistic data analysis


Authors should submit one-page abstracts (including references) in a 12-point font (e.g. Times New Roman) to


References should be formatted according to the APA guidelines. Talks will be given 30- or 60-minute slots including discussion, depending on the program. Please specify your preferred length in your submission. The workshop language is English for both abstracts and talks. According to DGfS regulations, speakers can only present a paper in one workshop.

Important dates

  • submission of abstracts: 15.09.2020
  • notification of acceptance: 01.10.2020
  • workshop: 24.–26.02.2021