Humboldt-Universität zu Berlin - Sprach- und literaturwissenschaftliche Fakultät - Korpuslinguistik und Morphologie

Short working group (Kurz-AG): Encoding language and linguistic information in historical corpora

39th Annual Meeting of the German Linguistic Society (Deutsche Gesellschaft für Sprachwissenschaft, DGfS) 2017

Conference dates: 8–10 March 2017

Conference venue: Saarbrücken, Germany

Conference Homepage: http://dgfs2017.uni-saarland.de/

AG 4 Short Session Homepage: http://dgfs2017.uni-saarland.de/wordpress/en/sessions/ag-4/


Working group: Encoding language and linguistic information in historical corpora

Historical corpora have been established as an empirical digital basis for various types of linguistic studies. These corpora are based on texts (sometimes on images) and often require special encodings of information, e.g. transcription and normalization. From the perspective of corpus linguistics as a method, annotating a (historical) corpus is always a matter of interpretation, of either its structure or its content, and such interpretations need not be universally agreed upon. Additionally, annotations have to strike a balance between a diplomatic representation of historical texts and their linguistic analysis. This requires linguistic modelling of annotations in order to develop (i) annotation guidelines, both standardized and customized, (ii) annotation concepts, such as spans, trees or graphs, (iii) methods for assigning annotations, and (iv) corpus architectures.

This working group would like to ask which annotation methods have proven successful in balancing the diplomatic representation of historical texts and their linguistic analysis in historical corpus-linguistic studies. We would also like to learn from cases where common linguistic annotations are not sufficient for a structured exploration of historical corpus data, and from new approaches that address these requirements.


Invited Speaker: Prof. Dr. Mathilde Hennig (Justus-Liebig-Universität Gießen)


This workshop aims to bring together linguists who are interested in or work with historical corpora, as well as corpus linguists and computational linguists.


Program AG 4

Universität des Saarlandes, building B 3.1, room 0.12
Thursday, March 9th, 2017

11:15 – 12:15

Mathilde Hennig
Basic categories in multi-layered grammatical annotation (Abstract)


12:15 – 12:45

Svetlana Petrova
Particle verb constructions in historical German and what corpus studies reveal about them (Abstract)


12:45 – 13:45

Lunch break

13:45 – 14:15

Lisa Dücker, Stefan Hartmann & Renata Szczepaniak
Annotating a multiregional diachronic corpus of Early New High German handwritten texts (Abstract)

Friday, March 10th, 2017

11:30 – 12:00

Maarten Janssen
TEITOK: Combining language and linguistic information without compromise (Abstract)

12:00 – 12:30

Zarah Weiß & Gohar Schnelle
Annotation of an Early New High German Corpus: The LangBank Pipeline (Abstract)

12:30 – 13:00

Cătălina Mărănduc, Cenel-Augusto Perez, Ludmila Malahov & Alexandru Colesnicov
A diachronic corpus for Romanian (RoDia) (Abstract)



13:00 – 13:30

Katrin Goldschmidt
Development and annotation of a newspaper corpus as part of a doctoral thesis on text structure and cohesion in news items from the 17th and 18th centuries (Abstract)

13:30 – 14:00

Nicoletta Puddu
Encoding sociolinguistic variables in a corpus of Medieval Sardinian texts (Abstract)



We are looking forward to seeing you at the DGfS 2017!


Kerstin Eckart and Carolin Odebrecht

Kerstin Eckart
Universität Stuttgart
Institut für Maschinelle Sprachverarbeitung

Pfaffenwaldring 5b, D-70569 Stuttgart

Carolin Odebrecht
Humboldt-Universität zu Berlin
Korpuslinguistik und Morphologie

Unter den Linden 6, D-10099 Berlin