Faculty of Language, Literature and Humanities - German in Multilingual Contexts

Corpus "Deutsch in Namibia"

Street in Namibia with multilingual signs: Luisen Apotheke/Pharmacy/Apteek, Otto Mühr & Co.

Photo: Heike Wiese

Multilingual pharmacy sign in Windhoek.

The corpus "German in Namibia" („Deutsch in Namibia“, DNam) was created between 2016 and 2021 in the DFG project „NamDeutsch: Die Dynamik des Deutschen im mehrsprachigen Kontext Namibias“ ("NamDeutsch: The Dynamics of German in Namibia's Multilingual Context", WI 2155/9-1 and SI 750/4-1). The project was directed by Heike Wiese and Horst Simon in cooperation with Marianne Zappen-Thomson and carried out at the University of Potsdam (until 2019), HU Berlin (from 2019), FU Berlin and UNAM Windhoek.

The corpus documents language use in formal and informal situations and language attitudes within the German minority community in Namibia. The data are available as audio data with aligned and annotated transcriptions, supplemented by metadata on the speakers (biographical data, information on language competence and language use).

More details on the DNam-Corpus.

In addition to the main corpus, there is a supplementary corpus, DNam-Wenker, which contains "Wenker" data on Namibian German: renderings of the 40 classic "Wenker sentences" in Namibian German, collected via an online questionnaire and supplemented by a personal questionnaire on the speakers' biographical, social and sociolinguistic data.

More details on DNam-Wenker.

Main corpus: DNam

Funding

DFG project „NamDeutsch: Die Dynamik des Deutschen im mehrsprachigen Kontext Namibias“ ("NamDeutsch: The Dynamics of German in Namibia's Multilingual Context"), WI 2155/9-1 and SI 750/4-1.

Access to the corpus

The corpus is freely accessible online via the Datenbank für Gesprochenes Deutsch (Database for Spoken German – DGD).

A short tutorial on how to use the DGD is available, with thanks to Dr. Thomas Schmidt.

Corpus size and sub-corpora

Total size: 226 recordings, 18:39 hours, 110 speakers

Elicitation set-up | Tokens | Duration (hh:mm:ss) | Speakers | Recordings
Free conversations | 115,004 | 9:15:00 | 65 | 21
Speech situations | 51,509 | 4:41:30 | 103 | 198
Semi-structured interviews | 57,879 | 4:42:15 | 15 | 7
Total | 224,392 | 18:38:45 | 110 | 226
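The totals row of the corpus size table can be checked by summing the three set-ups. The Python sketch below does this for tokens, duration and recordings, using figures copied from the table; the helper names are illustrative only. The Speakers column is left out on purpose: its set-up figures add up to 183 rather than 110, presumably because many speakers took part in more than one set-up.

```python
from datetime import timedelta

# Figures copied from the corpus size table: (tokens, duration, recordings).
SUBCORPORA = {
    "Free conversations": (115_004, "9:15:00", 21),
    "Speech situations": (51_509, "4:41:30", 198),
    "Semi-structured interviews": (57_879, "4:42:15", 7),
}


def parse_duration(hms: str) -> timedelta:
    """Convert an 'h:mm:ss' string into a timedelta."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def totals(subcorpora: dict) -> tuple[int, timedelta, int]:
    """Sum tokens, durations and recordings over all set-ups."""
    tokens = sum(t for t, _, _ in subcorpora.values())
    duration = sum((parse_duration(d) for _, d, _ in subcorpora.values()), timedelta())
    recordings = sum(r for _, _, r in subcorpora.values())
    return tokens, duration, recordings


tokens, duration, recordings = totals(SUBCORPORA)
print(tokens, duration, recordings)  # 224392 18:38:45 226
```

The sums match the reported totals of 224,392 tokens, 18:38:45 hours and 226 recordings.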

The recordings transcribed for the corpus are part of a larger collection of data. The selection criteria were:

  • Balanced sample of set-ups (similar weighting of the three set-ups) and of speakers (farmers and urban dwellers; pupils from private and government schools; speakers from different areas of Namibia); preference for speakers born in Namibia; a broad spectrum of educational levels, occupations and age groups.
  • Free conversations were excluded if they had long and frequent pauses, many metalinguistic comments or few participants, and/or consisted only of talk about the given topics.

Data collection

Period of collection: 2017

Recording locations: German-speaking schools, farms, private homes, public spaces in (the vicinity of) Windhoek, Witvlei, Omaruru, Swakopmund and Otjiwarongo

[Interactive map of the recording locations (Windhoek, Swakopmund, Witvlei, Omaruru, Otjiwarongo, Otavi) with, for each town, a chart of the speakers' ages (x-axis: age in years; y-axis: number of speakers). The speaker metadata can be viewed in the DGD publication of the corpus.]

The data were collected in three different set-ups: free conversations, speech situations and semi-structured interviews.

Free conversations

In general:
  • 2-5 persons talking in an informal context, without the researchers present.
  • Topics of classic sociolinguistic interviews (e.g. children's games) and/or conversation about aspects of everyday life.
For recordings of pupils:
  • One speaker acted as discussion leader.
  • A short music video ("Sundowner" by EES, a German-Namibian pop musician) served as the initial conversation stimulus; further stimuli included, e.g., musical taste and parents' "stories", experiences with Germans from Germany ("Jerries"), children's games and activities (hunting, dolls, "ketti"/stone-throwing, driving a car), experiences of dangerous situations, and differences between boarding-school residents ("Heimern") and day pupils ("Städtern").
  • The corpus data cover the conversation after the music video was shown, since by then the participants had become accustomed to the recording situation.

For recordings of adults:

  • No specific topics were suggested; instead, participants were asked to talk to each other about current topics (this often led to conversations about everyday life on the farm).
Speech situations

Photo: Anika Kroll-Tjingaete

Stimulus for the LangSit survey method: series of pictures of a car accident.

  • Elicited, naturalistic data in formal and informal registers
  • Recordings in small groups
  • In a simulated conversation, speakers report on a traffic accident to two different interlocutors: a German teacher from their school, played by the recording supervisor (formal setting), and a close person who is present, such as a friend or family member (informal setting).
  • See the LangSit website for detailed information on the survey method.

Semi-structured interviews

  • Carried out with 2-3 speakers
  • Prompt questions on language biography, language attitudes and perceptual-dialectological aspects

Speakers

Photo: Yannic Bracke

Recording situation with researchers (Christian Zimmer, Heike Wiese).

The metadata of the speakers include:

  • Biographical information: gender, year of birth, occupation (for pupils: school), place of birth (country, town), and where the speaker grew up (country, region, place name).
  • Sociolinguistic information: languages of the mother and father, and the language(s) the parents use with each other.

Group | Total | Male | Female | Age (years)
Children (not pupils) | 3 | 3 | 0 | 6, 14, 17
Pupils | 81 | 43 | 38 | 14-18, mean 16 (no age stated for 7)
Adults | 26 | 13 | 13 | 26-75, mean 48 (no age stated for 1)
Total | 110 | 59 | 51 | 6-75, mean 24
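The overall mean age of 24 is consistent with the group figures in the speaker table. The Python sketch below recomputes it as a size-weighted mean, assuming the total is taken over the 102 speakers with a stated age. Because the pupils' and adults' means are already rounded in the table, the result (about 23.7) is only an approximation.

```python
from statistics import mean

# (number of speakers with a stated age, mean age) per group, from the table.
# Children: mean of the three listed ages; pupils and adults: the rounded
# means reported in the table.
GROUPS = {
    "Children (not pupils)": (3, mean([6, 14, 17])),
    "Pupils": (81 - 7, 16),
    "Adults": (26 - 1, 48),
}


def pooled_mean(groups: dict) -> float:
    """Mean age over all groups, weighting each group mean by its size."""
    total = sum(n for n, _ in groups.values())
    return sum(n * m for n, m in groups.values()) / total


print(round(pooled_mean(GROUPS), 1))  # 23.7
```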

Annotation, transcription, anonymisation/sigla

Annotation levels

The data are available as audio files with annotation and in the form of transcripts. The transcripts have six annotation levels:

  • Transcription level (trans): original transcription level (literary transcription)
  • Tokenised transcription level (trans_tok): division of the transcription into individual tokens
  • Normalised level (norm): transcription according to standard orthography; non-standard utterances are not modified (e.g. with regard to case or grammatical gender).
  • Part-of-speech tagging (pos): based on STTS 2.0, supplemented by three corpus-specific tags (ATM: audible breathing; META: paraverbal utterances transcribed in double parentheses, e.g. "((laughs))"; SOART: the contraction "son" ("so ein") and its inflected forms)
  • Lemma level (lemma): word lemma
  • Annotation of contact-language tokens (FW): information on the donor language, the degree of integration and whether the word has an entry in the online version of the "Duden" dictionary (2020). This annotation level is not yet available in the DGD but will be made available in future releases. The annotation guidelines for contact-language tokens can be found here.
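One way to picture the six annotation levels listed above is as a simple data structure. The Python sketch below is a hypothetical layout, not the DGD or EXMARaLDA format: it assumes that trans holds the untokenised text of a whole segment, while trans_tok, norm, pos, lemma and FW are aligned token by token. The example values are invented for illustration and are not taken from the corpus; in particular, how DNam normalises and lemmatises forms such as "son" may differ.

```python
from dataclasses import dataclass, field
from typing import Optional

# Corpus-specific additions to STTS 2.0, as described in the list above.
CORPUS_SPECIFIC_TAGS = {
    "ATM": "audible breathing",
    "META": "paraverbal utterance in double parentheses, e.g. ((laughs))",
    "SOART": "contraction 'son' ('so ein') and its inflected forms",
}


@dataclass
class Token:
    """One token with its token-aligned levels (hypothetical layout)."""
    trans_tok: str               # tokenised transcription
    norm: str                    # standard-orthography form
    pos: str                     # STTS 2.0 tag or a corpus-specific tag
    lemma: str                   # lemma
    fw: Optional[str] = None     # contact-language annotation (not yet in the DGD)

    def has_corpus_specific_tag(self) -> bool:
        return self.pos in CORPUS_SPECIFIC_TAGS


@dataclass
class Segment:
    """A transcription segment: the untokenised trans level plus its tokens."""
    speaker: str
    trans: str
    tokens: list[Token] = field(default_factory=list)


# Invented example values for illustration only, not corpus data.
segment = Segment(
    speaker="NAM 006 W 1",
    trans="hab son ding",
    tokens=[
        Token("hab", "habe", "VAFIN", "haben"),
        Token("son", "son", "SOART", "son"),
        Token("ding", "Ding", "NN", "Ding"),
    ],
)

print([t.pos for t in segment.tokens if t.has_corpus_specific_tag()])  # ['SOART']
```

Keeping FW optional mirrors the note that this level is not yet available in the DGD.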

The following figure illustrates the transcription levels in an EXMARaLDA transcript:


Screenshot of a transcript with all transcription levels.

Transcription


Photo: Heike Wiese

One of the recording locations: A farm

The orthographic transcription was done with the score editor of EXMARaLDA (Schmidt 2016); the guidelines are a slight modification of the cGAT conventions (Schmidt et al. 2015). The guidelines, including their deviations from the cGAT conventions, can be found here. The transcription largely follows standard orthography but also captures typical phenomena of spoken language (e.g. elisions, contractions, truncated words, pauses) as well as paraverbal and non-verbal information. The first version of each transcript was checked by another team member; discrepancies were discussed and resolved with the original transcriber. A final check was carried out by a German-speaking Namibian.

 

Anonymisation and sigla

  • Anonymisation of personal names, specific location information (e.g. farms) and all statements that could allow individuals to be identified
  • Masking of the corresponding passages in the audio files
  • Anonymisation through four types of sigla in the corpus, some of which contain meta-linguistic information

    • Sigla for speakers, composed of four parts (example: NAM 006 W 1):
      Speaker | ID-No. | Gender | Age group
      NAM | 006 | W | 1
      • ID-No.: 001 - 2xx, one number per speaker
      • Gender: M = male, W = female
      • Age group: 1 = under 21, 2 = 21-40, 3 = 41-60, 4 = over 60

    • Sigla for the researchers (e.g. RES1-RES4)
    • Sigla for individual tokens that have been anonymised: initial letter of the anonymised expression + three-digit number, e.g. N001
    • Sigla for anonymised expressions consisting of several tokens: Phrase „anonymisierte_Äußerung“ ("anonymised_expression") + three-digit number, e.g. anonymisierte_Äußerung001.
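The sigla scheme above is regular enough to decode mechanically. The Python sketch below parses speaker sigla into ID number, gender and age group and recognises the researcher and anonymisation sigla. It assumes that the parts of a speaker siglum may be written with spaces, underscores or no separator, and that the initial letter of an anonymised token is a capital, as in the example N001; both are assumptions, and the function names are hypothetical.

```python
import re
from typing import NamedTuple

GENDERS = {"M": "male", "W": "female"}
AGE_GROUPS = {"1": "under 21", "2": "21-40", "3": "41-60", "4": "over 60"}

# Assumption: parts may be separated by spaces, underscores or nothing,
# e.g. "NAM 006 W 1", "NAM_006_W_1" or "NAM006W1".
SPEAKER_RE = re.compile(r"NAM[ _]?(\d{3})[ _]?([MW])[ _]?([1-4])")
RESEARCHER_RE = re.compile(r"RES\d+")                   # e.g. RES1-RES4
TOKEN_RE = re.compile(r"[A-ZÄÖÜ]\d{3}")                  # e.g. N001
PHRASE_RE = re.compile(r"anonymisierte_Äußerung\d{3}")  # multi-token expressions


class Speaker(NamedTuple):
    number: str
    gender: str
    age_group: str


def parse_speaker(siglum: str) -> Speaker:
    """Decode a speaker siglum into ID number, gender and age group."""
    match = SPEAKER_RE.fullmatch(siglum)
    if match is None:
        raise ValueError(f"not a speaker siglum: {siglum!r}")
    number, gender, age = match.groups()
    return Speaker(number, GENDERS[gender], AGE_GROUPS[age])


def is_researcher(siglum: str) -> bool:
    """True for researcher sigla such as RES1."""
    return RESEARCHER_RE.fullmatch(siglum) is not None


def is_anonymised(token: str) -> bool:
    """True if a token is a siglum for an anonymised token or expression."""
    return bool(TOKEN_RE.fullmatch(token) or PHRASE_RE.fullmatch(token))


print(parse_speaker("NAM 006 W 1"))
# Speaker(number='006', gender='female', age_group='under 21')
print(is_anonymised("N001"), is_anonymised("anonymisierte_Äußerung001"))  # True True
```

Using `fullmatch` rather than `match` keeps ordinary words that merely start like a siglum from being misclassified.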

Project participants


Photo: Heike Wiese

Multilingual advertisement for a Christmas market in Namibia.

PIs: Heike Wiese, Horst J. Simon

Cooperation partners: Marianne Zappen-Thomson, Thomas Schmidt, Hans Boas

Project collaborators: Christian Zimmer, Janosch Leugner, Yannic Bracke, Britta Stuhl, Laura Perlitz

Student assistants: Jones Anam, Christian Anders, Alexandra Fosså, Semra Kizilkaya, Carina Schüffler, Claudia Czarniak, Philipp Klaußner, Jula Kostka, Anika Kroll-Tjingaete, Johanna Pott, Britta Stuhl

Citation

Zimmer, Christian; Wiese, Heike; Simon, Horst J.; Zappen-Thomson, Marianne; Leugner, Janosch; Bracke, Yannic; Stuhl, Britta; Perlitz, Laura, & Schmidt, Thomas: DNam-Korpus zum Deutschen in Namibia.

Literature

Wiese, Heike; Simon, Horst J.; Zimmer, Christian & Schumann, Kathleen (2017). German in Namibia: A vital speech community and its multilingual dynamics. In Péter Maitz & Craig A. Volker (eds.), Language Contact in the German Colonies, pp. 221-245.

Zimmer, Christian; Wiese, Heike; Simon, Horst J.; Zappen-Thomson, Marianne; Leugner, Janosch; Bracke, Yannic; Stuhl, Britta, & Schmidt, Thomas (2020). Das Korpus Deutsch in Namibia (DNam): Eine Ressource für die Kontakt-, Variations- und Soziolinguistik. Deutsche Sprache 3: 210-232.

Supplementary corpus to DNam: DNam-Wenker

In 2013/14, "Wenker" data on Namibian German were collected via an online platform.

The survey was aimed at Namibian speakers of all ages and served to collect broad data on specific areas of lexicon and grammar. Because the "Wenker sentences" are a classic tool of German dialectology, these data are broadly comparable with other, even much older, studies of dialectal forms in German. To reach as many speakers as possible, we developed an online questionnaire with the 40 original "Wenker sentences", preceded by an introductory text on Namibian German, the research project and the "Wenker sentences", and followed by a personal questionnaire on biographical, social and sociolinguistic data.

The exact wording of the task in the online survey can be found here, and the 40 Wenker sentences used in the survey here.

Through extensive media work and dissemination in the German-speaking community via radio, newspapers, church congregations and schools, more than 200 participants were recruited, corresponding to approximately one percent of the speaker community. For their committed support in disseminating information on the "Wenker" survey, we would like to thank the Delta School and the German Higher Private School (Deutsche Höhere Privatschule, DHPS) Windhoek, Wilfried Hähner of "Hitradio Namibia", the "Allgemeine Zeitung Windhoek", and Bishop Hertel, then bishop of the Evangelical Lutheran Church in Namibia.

The results of the Wenker-Namdeutsch survey are freely available as an Excel spreadsheet under the CC BY 3.0 licence.

Wenker-Namdeutsch by Heike Wiese is licensed under a Creative Commons Attribution 3.0 Germany License.

Documentation

Not all survey participants completed the questionnaire in full; in the corresponding records, empty fields are marked with "NA". The information provided by the respondents was transferred without modification, except in three areas that were changed for data-protection reasons:

  1. Information on occupation was removed
  2. The year of birth is given as a ten-year period.
  3. For place of residence at the time of the survey, all Namibian towns were replaced by the country name "Namibia". All records with a place of residence outside Namibia were removed from the table; records without information on place of residence were retained.
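The data-protection changes listed above determine what a user of the table will find: "NA" for unanswered items, a ten-year period instead of a birth year, and "Namibia" or no value for place of residence. The Python sketch below loads a CSV export of the spreadsheet with the standard csv module, turning "NA" into None and reporting how complete a column is. The column names (id, birth_period, residence, sentence_01) and the period format are hypothetical, and the sample rows are invented; the published table's actual layout may differ. If the Excel file is read with pandas instead, `pandas.read_excel` already treats the string "NA" as missing by default.

```python
import csv
import io

NA = "NA"  # marker for fields left empty by respondents


def read_records(text: str) -> list[dict]:
    """Parse a CSV export of the spreadsheet, turning 'NA' fields into None."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {key: (None if value == NA else value) for key, value in row.items()}
        for row in reader
    ]


def completion_rate(records: list[dict], column: str) -> float:
    """Share of records that have a non-missing value in `column`."""
    if not records:
        return 0.0
    return sum(r[column] is not None for r in records) / len(records)


# Hypothetical column names and invented values, not actual survey data.
SAMPLE = (
    "id,birth_period,residence,sentence_01\n"
    "1,1960-1969,Namibia,answer text\n"
    "2,NA,NA,answer text\n"
)

records = read_records(SAMPLE)
print(records[1]["residence"], completion_rate(records, "birth_period"))  # None 0.5
```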

Collaboration / Support

Heike Wiese, Hans C. Boas, Horst J. Simon, Marianne Zappen-Thomson, Laura Perlitz, Oliver Bunk

Citation

Wiese, Heike (2014): DNam-Wenker. Ein Korpus mit 'Wenker'-Sätzen zum Namibiadeutschen.