The value of the Janes corpus for Slovenian language standardization

The main objective of this article is to assess the value of the Janes corpus for research in the field of language standardization. Unlike the existing reference corpora of written Slovenian, the newly available Janes corpus of user-generated content mostly consists of texts that have not been modi...

Bibliographic Details
Main Authors:	Špela Arhar Holdt, Kaja Dobrovoljc
Format:	Article
Language:	English
Published:	Znanstvena založba Filozofske fakultete Univerze v Ljubljani (Ljubljana University Press, Faculty of Arts) 2016-09-01
Series:	Slovenščina 2.0: Empirične, aplikativne in interdisciplinarne raziskave
Subjects:	Janes corpus, Kres corpus, language standardisation, intuitiveness of language rules, nonagreeing premodifier
Online Access:	http://slovenscina2.0.trojina.si/arhiv/2016/2/Slo2.0_2016_2_02.pdf

id	doaj-7270489a503148d3a0f6b4a69d9880d6
record_format	Article
spelling	doaj-7270489a503148d3a0f6b4a69d9880d6 | 2021-04-02T05:39:18Z | eng | Znanstvena založba Filozofske fakultete Univerze v Ljubljani (Ljubljana University Press, Faculty of Arts) | Slovenščina 2.0: Empirične, aplikativne in interdisciplinarne raziskave | 2335-2736 | 2335-2736 | 2016-09-01 | 42137 | http://dx.doi.org/10.4312/slo2.0.2016.2 | The value of the Janes corpus for Slovenian language standardization | Špela Arhar Holdt 0 | Kaja Dobrovoljc 1 | University of Ljubljana, Faculty of Arts | Trojina, Institute for Applied Slovene Studies | The main objective of this article is to assess the value of the Janes corpus for research in the field of language standardization. Unlike the existing reference corpora of written Slovenian, the newly available Janes corpus of user-generated content mostly consists of texts that have not been modified by a proofreading expert; it therefore offers a more realistic insight into the trends of language use, as well as the intuitiveness of existing language rules, within a wider language community. We illustrate this methodological potential in a case study of nominal phrases with nonagreeing premodifiers, such as solo petje and RTV prispevek, by comparing their usage in Janes and the reference Kres corpus. The results reveal that this type of phrase is used more often in Janes and includes a longer list of candidates than in Kres; that both corpora include a large number of phrases with variant spelling as either one or two words, irrespective of the premodifier in question; and, somewhat surprisingly, that Janes displays more consistent language use, suggesting that prescriptive regulation actually increases the level of inconsistency in language use. The article, a revised and enhanced extension of a prior conference paper, concludes with a discussion of possible future approaches to this linguistic issue and advocates the inclusion of Janes in Slovenian language standardisation methodology. | http://slovenscina2.0.trojina.si/arhiv/2016/2/Slo2.0_2016_2_02.pdf | Janes corpus | Kres corpus | language standardisation | intuitiveness of language rules | nonagreeing premodifier
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Špela Arhar Holdt, Kaja Dobrovoljc
spellingShingle	Špela Arhar Holdt | Kaja Dobrovoljc | The value of the Janes corpus for Slovenian language standardization | Slovenščina 2.0: Empirične, aplikativne in interdisciplinarne raziskave | Janes corpus | Kres corpus | language standardisation | intuitiveness of language rules | nonagreeing premodifier
author_facet	Špela Arhar Holdt, Kaja Dobrovoljc
author_sort	Špela Arhar Holdt
title	The value of the Janes corpus for Slovenian language standardization
title_short	The value of the Janes corpus for Slovenian language standardization
title_full	The value of the Janes corpus for Slovenian language standardization
title_fullStr	The value of the Janes corpus for Slovenian language standardization
title_full_unstemmed	The value of the Janes corpus for Slovenian language standardization
title_sort	value of the janes corpus for slovenian language standardization
publisher	Znanstvena založba Filozofske fakultete Univerze v Ljubljani (Ljubljana University Press, Faculty of Arts)
series	Slovenščina 2.0: Empirične, aplikativne in interdisciplinarne raziskave
issn	2335-2736 2335-2736
publishDate	2016-09-01
description	The main objective of this article is to assess the value of the Janes corpus for research in the field of language standardization. Unlike the existing reference corpora of written Slovenian, the newly available Janes corpus of user-generated content mostly consists of texts that have not been modified by a proofreading expert; it therefore offers a more realistic insight into the trends of language use, as well as the intuitiveness of existing language rules, within a wider language community. We illustrate this methodological potential in a case study of nominal phrases with nonagreeing premodifiers, such as solo petje and RTV prispevek, by comparing their usage in Janes and the reference Kres corpus. The results reveal that this type of phrase is used more often in Janes and includes a longer list of candidates than in Kres; that both corpora include a large number of phrases with variant spelling as either one or two words, irrespective of the premodifier in question; and, somewhat surprisingly, that Janes displays more consistent language use, suggesting that prescriptive regulation actually increases the level of inconsistency in language use. The article, a revised and enhanced extension of a prior conference paper, concludes with a discussion of possible future approaches to this linguistic issue and advocates the inclusion of Janes in Slovenian language standardisation methodology.
topic	Janes corpus, Kres corpus, language standardisation, intuitiveness of language rules, nonagreeing premodifier
url	http://slovenscina2.0.trojina.si/arhiv/2016/2/Slo2.0_2016_2_02.pdf
work_keys_str_mv	AT spelaarharholdt thevalueofthejanescorpusforslovenianlanguagestandardization AT kajadobrovoljc thevalueofthejanescorpusforslovenianlanguagestandardization AT spelaarharholdt valueofthejanescorpusforslovenianlanguagestandardization AT kajadobrovoljc valueofthejanescorpusforslovenianlanguagestandardization
_version_	1724172309483225088

Similar Items