Linguistic dynamics in the Greater Tunis Area: a corpus-based approach

Principal investigator: Univ.-Prof. Dr. Stephan Procházka
Research affiliate: Dr. Karlheinz Mörth (ÖAW)
Project team members: Mag. Ines Dallaji, Ines Gabsi BA, Ines Ben Brahim BA, Mag. Omar Saim
Duration: 1 August 2013 to 31 July 2016
Funding: FWF
Granted sum: EUR 247,177



Most publications on the dialect of the Tunisian capital focus on sociolinguistic, phonological, and morphological issues. In-depth studies of its syntax are scarce, and no up-to-date dictionary based on authentic spoken data is available. There are also very few relevant studies of the linguistic dynamics caused by recent demographic changes in the metropolitan area of Tunis. Today, the variety of Arabic spoken by most inhabitants of the city has become a koiné that has not only spread to the vicinity of the city but is also widely used throughout Tunisia.

The project is carried out in co-operation with the Institute for Corpus Linguistics and Text Technology at the Austrian Academy of Sciences and focuses on contemporary language. We will therefore strive to gather data from field recordings made with young speakers who have grown up in the city of Tunis but whose parents, for the most part, came to the capital from other regions. As part of the project, we will create two digital language resources: (1) a corpus of unmonitored speech that will contain both conversations and narratives and (2) a dictionary based on this corpus and on previously published resources.

To date, no digital corpora of Arabic dialects have been made available that provide both linguistic transcriptions and translations. Besides serving as the primary source for the planned dictionary, the corpus will be used to investigate a number of selected topics in the morphology and syntax of contemporary Tunis Arabic. The dictionary will contain all the lexicographic data of the corpus and will also incorporate two additional sources: data elicited in complementary interviews with young Tunisians and lexicographical material compiled from various published sources. Because the printed sources also contain material from the middle of the 20th century and earlier, the dictionary will be diachronic in nature, which will enable us to analyze the linguistic dynamics in the realm of the lexicon as well.

The project has been designed as an attempt to combine dialectological approaches with up-to-date text-technological methodologies. The tools being developed and tested in the project will be beneficial for similar research questions in the field of Arabic studies and beyond. A particular feature of the project is the dictionary/corpus interface, which will allow researchers to navigate from the corpus to the dictionary and vice versa. The project will be conducted in the spirit of open source and open access. Therefore, both the corpus and the lexicographical data of the project will be made available to the scientific community through a publicly accessible web interface, which will enable other scholars to carry out further analyses and to add their own material.

Project homepage: