The value of the Janes corpus for Slovenian language standardization

The main objective of this article is to assess the value of the Janes corpus for research in the field of language standardization. Unlike the existing reference corpora of written Slovenian, the newly available Janes corpus of user-generated content mostly consists of texts that have not been modi...

Bibliographic Details
Main Authors: Špela Arhar Holdt, Kaja Dobrovoljc
Format: Article
Language: English
Published: Znanstvena založba Filozofske fakultete Univerze v Ljubljani (Ljubljana University Press, Faculty of Arts) 2016-09-01
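Series: Slovenščina 2.0: Empirične, aplikativne in interdisciplinarne raziskave
Subjects: Janes corpus; Kres corpus; language standardisation; intuitiveness of language rules; nonagreeing premodifier
Online Access: http://slovenscina2.0.trojina.si/arhiv/2016/2/Slo2.0_2016_2_02.pdf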
id doaj-7270489a503148d3a0f6b4a69d9880d6
record_format Article
spelling doaj-7270489a503148d3a0f6b4a69d9880d6
2021-04-02T05:39:18Z
eng
Znanstvena založba Filozofske fakultete Univerze v Ljubljani (Ljubljana University Press, Faculty of Arts)
Slovenščina 2.0: Empirične, aplikativne in interdisciplinarne raziskave
2335-2736
2335-2736
2016-09-01
42137
http://dx.doi.org/10.4312/slo2.0.2016.2
The value of the Janes corpus for Slovenian language standardization
Špela Arhar Holdt 0
Kaja Dobrovoljc 1
University of Ljubljana, Faculty of Arts
Trojina, Institute for Applied Slovene Studies
The main objective of this article is to assess the value of the Janes corpus for research in the field of language standardization. Unlike the existing reference corpora of written Slovenian, the newly available Janes corpus of user-generated content mostly consists of texts that have not been modified by a proofreading expert; it therefore offers a more realistic insight into the trends of language use, as well as the intuitiveness of existing language rules, within a wider language community. We illustrate this methodological potential in a case study of nominal phrases with nonagreeing premodifiers, such as solo petje and RTV prispevek, by comparing their usage in Janes and the reference Kres corpus. The results reveal that this type of phrase is used more often in Janes, where it also covers a longer list of candidates than in Kres; that both corpora include a large number of phrases with variant spelling as either one or two words, irrespective of the premodifier in question; and, somewhat surprisingly, that Janes displays more consistent language use, suggesting that prescriptive regulation actually increases the level of inconsistency in language use. The article, a revised and enhanced extension of a prior conference paper, concludes with a discussion of possible future approaches to this linguistic issue and advocates the inclusion of Janes in Slovenian language standardization methodology.
http://slovenscina2.0.trojina.si/arhiv/2016/2/Slo2.0_2016_2_02.pdf
Janes corpus
Kres corpus
language standardisation
intuitiveness of language rules
nonagreeing premodifier
collection DOAJ
language English
format Article
sources DOAJ
author Špela Arhar Holdt
Kaja Dobrovoljc
spellingShingle Špela Arhar Holdt
Kaja Dobrovoljc
The value of the Janes corpus for Slovenian language standardization
Slovenščina 2.0: Empirične, aplikativne in interdisciplinarne raziskave
Janes corpus
Kres corpus
language standardisation
intuitiveness of language rules
nonagreeing premodifier
author_facet Špela Arhar Holdt
Kaja Dobrovoljc
author_sort Špela Arhar Holdt
title The value of the Janes corpus for Slovenian language standardization
title_short The value of the Janes corpus for Slovenian language standardization
title_full The value of the Janes corpus for Slovenian language standardization
title_fullStr The value of the Janes corpus for Slovenian language standardization
title_full_unstemmed The value of the Janes corpus for Slovenian language standardization
title_sort value of the janes corpus for slovenian language standardization
publisher Znanstvena založba Filozofske fakultete Univerze v Ljubljani (Ljubljana University Press, Faculty of Arts)
series Slovenščina 2.0: Empirične, aplikativne in interdisciplinarne raziskave
issn 2335-2736
2335-2736
publishDate 2016-09-01
description The main objective of this article is to assess the value of the Janes corpus for research in the field of language standardization. Unlike the existing reference corpora of written Slovenian, the newly available Janes corpus of user-generated content mostly consists of texts that have not been modified by a proofreading expert; it therefore offers a more realistic insight into the trends of language use, as well as the intuitiveness of existing language rules, within a wider language community. We illustrate this methodological potential in a case study of nominal phrases with nonagreeing premodifiers, such as solo petje and RTV prispevek, by comparing their usage in Janes and the reference Kres corpus. The results reveal that this type of phrase is used more often in Janes, where it also covers a longer list of candidates than in Kres; that both corpora include a large number of phrases with variant spelling as either one or two words, irrespective of the premodifier in question; and, somewhat surprisingly, that Janes displays more consistent language use, suggesting that prescriptive regulation actually increases the level of inconsistency in language use. The article, a revised and enhanced extension of a prior conference paper, concludes with a discussion of possible future approaches to this linguistic issue and advocates the inclusion of Janes in Slovenian language standardization methodology.
topic Janes corpus
Kres corpus
language standardisation
intuitiveness of language rules
nonagreeing premodifier
url http://slovenscina2.0.trojina.si/arhiv/2016/2/Slo2.0_2016_2_02.pdf
work_keys_str_mv AT spelaarharholdt thevalueofthejanescorpusforslovenianlanguagestandardization
AT kajadobrovoljc thevalueofthejanescorpusforslovenianlanguagestandardization
AT spelaarharholdt valueofthejanescorpusforslovenianlanguagestandardization
AT kajadobrovoljc valueofthejanescorpusforslovenianlanguagestandardization
_version_ 1724172309483225088