The value of the Janes corpus for Slovenian language standardization
The main objective of this article is to assess the value of the Janes corpus for research in the field of language standardization. Unlike the existing reference corpora of written Slovenian, the newly available Janes corpus of user-generated content mostly consists of texts that have not been modi...
Main Authors: | Špela Arhar Holdt, Kaja Dobrovoljc |
---|---|
Format: | Article |
Language: | English |
Published: | Znanstvena založba Filozofske fakultete Univerze v Ljubljani (Ljubljana University Press, Faculty of Arts), 2016-09-01 |
Series: | Slovenščina 2.0: Empirične, aplikativne in interdisciplinarne raziskave |
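Subjects: | Janes corpus; Kres corpus; language standardisation; intuitiveness of language rules; nonagreeing premodifier |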
Online Access: | http://slovenscina2.0.trojina.si/arhiv/2016/2/Slo2.0_2016_2_02.pdf |
id |
doaj-7270489a503148d3a0f6b4a69d9880d6 |
---|---|
record_format |
Article |
spelling |
doaj-7270489a503148d3a0f6b4a69d9880d6; 2021-04-02T05:39:18Z; eng; Znanstvena založba Filozofske fakultete Univerze v Ljubljani (Ljubljana University Press, Faculty of Arts); Slovenščina 2.0: Empirične, aplikativne in interdisciplinarne raziskave; 2335-2736; 2335-2736; 2016-09-01; 42137; http://dx.doi.org/10.4312/slo2.0.2016.2; The value of the Janes corpus for Slovenian language standardization; Špela Arhar Holdt (University of Ljubljana, Faculty of Arts); Kaja Dobrovoljc (Trojina, Institute for Applied Slovene Studies); The main objective of this article is to assess the value of the Janes corpus for research in the field of language standardization. Unlike the existing reference corpora of written Slovenian, the newly available Janes corpus of user-generated content mostly consists of texts that have not been modified by a proofreading expert; it therefore offers a more realistic insight into the trends of language use, as well as the intuitiveness of existing language rules, within a wider language community. We illustrate this methodological potential in a case study of nominal phrases with nonagreeing premodifiers, such as solo petje and RTV prispevek, by comparing their usage in Janes and the reference Kres corpus. The results reveal that this type of phrase is used more often in Janes and includes a longer list of candidates than in Kres; that both corpora include a large number of phrases with variant spelling as either one or two words, irrespective of the premodifier in question; and that, somewhat surprisingly, Janes displays a more consistent language use, suggesting that prescriptive regulation actually increases the level of inconsistency in language use. The article, a revised and enhanced extension of a prior conference paper, concludes with a discussion of possible future approaches to this linguistic issue and advocates for the inclusion of Janes in Slovenian language standardisation methodology.; http://slovenscina2.0.trojina.si/arhiv/2016/2/Slo2.0_2016_2_02.pdf; Janes corpus; Kres corpus; language standardisation; intuitiveness of language rules; nonagreeing premodifier |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Špela Arhar Holdt Kaja Dobrovoljc |
title |
The value of the Janes corpus for Slovenian language standardization |
publisher |
Znanstvena založba Filozofske fakultete Univerze v Ljubljani (Ljubljana University Press, Faculty of Arts) |
series |
Slovenščina 2.0: Empirične, aplikativne in interdisciplinarne raziskave |
issn |
2335-2736 |
publishDate |
2016-09-01 |
description |
The main objective of this article is to assess the value of the Janes corpus for research in the field of language standardization. Unlike the existing reference corpora of written Slovenian, the newly available Janes corpus of user-generated content mostly consists of texts that have not been modified by a proofreading expert; it therefore offers a more realistic insight into the trends of language use, as well as the intuitiveness of existing language rules, within a wider language community. We illustrate this methodological potential in a case study of nominal phrases with nonagreeing premodifiers, such as solo petje and RTV prispevek, by comparing their usage in Janes and the reference Kres corpus. The results reveal that this type of phrase is used more often in Janes and includes a longer list of candidates than in Kres; that both corpora include a large number of phrases with variant spelling as either one or two words, irrespective of the premodifier in question; and that, somewhat surprisingly, Janes displays a more consistent language use, suggesting that prescriptive regulation actually increases the level of inconsistency in language use. The article, a revised and enhanced extension of a prior conference paper, concludes with a discussion of possible future approaches to this linguistic issue and advocates for the inclusion of Janes in Slovenian language standardisation methodology. |
topic |
Janes corpus; Kres corpus; language standardisation; intuitiveness of language rules; nonagreeing premodifier |
url |
http://slovenscina2.0.trojina.si/arhiv/2016/2/Slo2.0_2016_2_02.pdf |