Sprach- und literaturwissenschaftliche Fakultät - Korpuslinguistik und Morphologie

Crosslingual Language Varieties

The overarching goal of this project is to investigate the characteristics of language production that is influenced by the existence of another linguistic system.

In what ways do languages compete within the individual? Why can adequately trained algorithms tell translated texts apart from monolingual ones? How can we tell right away that a text was produced by a language learner (and not by a functionally illiterate person or a child, for instance), and sometimes even identify the learner's native language?

What unites Crosslingual Language Varieties (CLVs) is the attempt to produce utterances in what is commonly seen as one language when two or more linguistic systems are at the disposal of the speaker or writer. This happens, for example, with language learners, translators, and heritage speakers.

We are investigating the differences and commonalities of different CLVs, using both computational methods on large and small corpora and psycholinguistic methods. While Hebrew, German and English will be our target systems, a multitude of languages will take the place of the “competitor”.


Approaching different CLVs uniformly, but with different methods, we are not only trying to identify their characteristics, but also to figure out what cognitive factors may drive the use of patterns that turn out to be typical of one or all CLVs. To achieve this, we need to assess our data in a way that takes into account their respective technical differences and also ensures comparability across all dimensions, especially with regard to proficiency levels. The methodological challenges this raises are part of our endeavor to answer the fundamental questions of which patterns appear where, and why.

Research Questions

  • Which characteristics are common to the various CLVs and distinguish them from native language use?
  • What properties distinguish CLVs from each other?
  • Are such differences specific to a language pair or "universal"?
  • What are the cognitive and computational processes that support CLVs?
  • Which circumstances lead to similarities and differences between individuals and language settings?

Resources we use

Learner corpora:

Translation corpora: