Spisz Dialect Corpus – possibilities of the new research tool

Helena Grochola-Szczepanek, Michał Woźniak, Rafał L. Górski (Instytut Języka Polskiego PAN)

In our talk, we will present a corpus of the spoken language of the Polish part of Spisz. The Spisz Dialect Corpus was collected and developed between 2015 and 2019. It contains about 2 million running word forms and is publicly available at: https://www.spisz.ijp.pan.pl

Developing a corpus based on a non-standard language variety involved a number of difficulties. These resulted mainly from the diversity of the dialect system, the high variability of the contemporary speech of rural residents, the lack of a consistent writing system (since the dialect exists in spoken form), and the use of IT tools designed for processing the standard language.

In our talk, we will briefly review the project's assumptions, the stages of work, the problems of processing a non-standard language variety for the needs of the corpus, and the solutions adopted. We will focus on the possibilities of using the corpus in research on rural language and culture.