Sprach- und literaturwissenschaftliche Fakultät - Korpuslinguistik und Morphologie

Corpus Linguistics

General Information

Teacher: Anke Lüdeling, anke.luedeling@rz.hu-berlin.de

Level: Basic

Prerequisites: Basic background in linguistics; no background in computational linguistics or corpus linguistics is required

Credits: 3

Requirements for credits:

  • Regular attendance.
  • Mini-projects (done in groups), which will be presented orally (10 minutes). In addition, a short written summary (5 pages) is required.

Background reading: McEnery, Tony & Wilson, Andrew (2001) Corpus Linguistics. Edinburgh University Press, Edinburgh (2nd edition)

Course Plan

Date Content References & Slides
Aug 15, 2006 different kinds of linguistic data (introspection, psycholinguistic and neurolinguistic experiments, field data, etc.): where does corpus data fit in? for which research questions can corpus data be used?
brief overview of the history of corpus linguistics
McEnery, Tony & Wilson, Andrew (2001) Corpus Linguistics. Edinburgh University Press, Edinburgh (2nd edition)

Kepser, Stephan & Reis, Marga (eds) (2005) Linguistic Evidence. Empirical, Theoretical, and Computational Perspectives. Mouton de Gruyter, Berlin
Aug 17, 2006 corpus design
pre-processing 1: tokenizing
Aug 19, 2006 pre-processing 2: pos-tagging

Garside, Roger; Leech, Geoffrey & McEnery, Tony (eds) (1997) Corpus Annotation: Linguistic Information from Computer Text Corpora. Addison Wesley Longman, New York

Leech, Geoffrey (1993) Corpus Annotation Schemes. In: Literary and Linguistic Computing 8(4), 275-281
Aug 22, 2006 pre-processing 3: syntactic annotation, phonetic/phonological annotation, sense tagging

Aug 24, 2006 case study 1: corpora in language teaching, learner corpora
Manning, Christopher & Schütze, Hinrich (1999) Foundations of Statistical Natural Language Processing. MIT Press, Cambridge MA

Baroni, Marco (to appear) Distributions in text. In Anke Lüdeling and Merja Kytö (eds.) Corpus linguistics: An international handbook, Mouton de Gruyter, Berlin. Available online at: http://sslmit.unibo.it/~baroni/research.html

Granger, Sylviane (2002) A bird's-eye view of learner corpus research. In: Granger, Sylviane; Hung, Joseph; Petch-Tyson, Stephanie (eds, 2002) Computer Learner Corpora, Second Language Acquisition and Foreign Language Teaching. John Benjamins, Amsterdam, 3-33

Nesselhauf, Nadja (2004) Learner corpora and their potential for language teaching. In: Sinclair, John (ed, 2004) How to Use Corpora in Language Teaching. John Benjamins, Amsterdam, 125-152

Aug 26, 2006 case study 1, continued: quantitative and qualitative evaluation of learner corpora (contrastive interlanguage analysis, error tagging)
slides Aug 24 & Aug 26
references learner corpora
Aug 29, 2006 case study 2: corpora from the Web/Web as corpus

Lüdeling, Anke; Evert, Stefan & Baroni, Marco (to appear 2006) Using Web data for linguistic purposes. In Marianne Hundt, Caroline Biewer and Nadja Nesselhauf (eds.), Corpus linguistics and the Web. Amsterdam: Rodopi.

Baroni, Marco & Bernardini, Silvia (2004) BootCaT: Bootstrapping corpora and terms from the web. Proceedings of LREC 2004, Lisbon: ELDA, 1313-1316. Available online at: http://sslmit.unibo.it/~baroni/research.html

Aug 31, 2006 presentation of mini-projects
final discussion

Roman Sigg & Vanessa Shokeir: Building a corpus of Swabian
Monika Schulz: Tagging problems in non-standard data
Nicola Brocca & Ellen Rupprecht: Tagging problems in non-standard data: well and that

Further References

  • Carstensen, Kai-Uwe et al. (2004) Computerlinguistik und Sprachtechnologie. Eine Einführung. 2. überarbeitete und erweiterte Auflage. Elsevier/Spektrum Akademischer Verlag, München
  • Jurafsky, Daniel S. & Martin, James H. (2000) Speech and language processing: an introduction to natural language processing, computational linguistics, and speech recognition. Prentice Hall, Upper Saddle River, NJ
  • Lemnitzer, Lothar & Zinsmeister, Heike (2006) Korpuslinguistik. Eine Einführung. narr studienbücher. Gunter Narr Verlag, Tübingen
  • Mitkov, Ruslan (ed) (2003) The Oxford Handbook of Computational Linguistics. Oxford University Press, Oxford