A metatextual markup in the national corpus of Tuvan language: the structure and functionality

Creating natural language corpora helps solve a number of philological and purely linguistic problems for many languages of the peoples of the Russian Federation. The National Corpus of the Tuvan Language (http://www.tuvancorpus.ru/) is one such product, jointly developed by faculty and students at two unive...

Bibliographic Details
Main Author: Choduraa M. Mongush
Format: Article
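Language: Russian
Published: Novye Issledovaniâ Tuvy 2016-12-01
Series: Novye Issledovaniâ Tuvy
Subjects: natural language corpora; National Corpus of the Tuvan Language; meta-markup; Tuvan language; Tuvan heroic epic
Online Access: https://nit.tuva.asia/nit/article/view/613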
id doaj-5c713ef9ed9d42829656cc2c417dbde3
record_format Article
spelling doaj-5c713ef9ed9d42829656cc2c417dbde3 2020-11-24T22:12:27Z rus Novye Issledovaniâ Tuvy; Novye Issledovaniâ Tuvy; 2079-8482; 2016-12-01; 04606; A metatextual markup in the national corpus of Tuvan language: the structure and functionality; Choduraa M. Mongush (Tuvan State University, Siberian Federal University); https://nit.tuva.asia/nit/article/view/613
collection DOAJ
language Russian
format Article
sources DOAJ
author Choduraa M. Mongush
spellingShingle Choduraa M. Mongush
A metatextual markup in the national corpus of Tuvan language: the structure and functionality
Novye Issledovaniâ Tuvy
natural language corpora
National Corpus of the Tuvan Language
meta-markup
Tuvan language
Tuvan heroic epic
author_facet Choduraa M. Mongush
author_sort Choduraa M. Mongush
title A metatextual markup in the national corpus of Tuvan language: the structure and functionality
title_short A metatextual markup in the national corpus of Tuvan language: the structure and functionality
title_full A metatextual markup in the national corpus of Tuvan language: the structure and functionality
title_fullStr A metatextual markup in the national corpus of Tuvan language: the structure and functionality
title_full_unstemmed A metatextual markup in the national corpus of Tuvan language: the structure and functionality
title_sort metatextual markup in the national corpus of tuvan language: the structure and functionality
publisher Novye Issledovaniâ Tuvy
series Novye Issledovaniâ Tuvy
issn 2079-8482
publishDate 2016-12-01
description Creating natural language corpora helps solve a number of philological and purely linguistic problems for many languages of the peoples of the Russian Federation. The National Corpus of the Tuvan Language (http://www.tuvancorpus.ru/) is one such product, jointly developed by faculty and students at two universities, in Krasnoyarsk and Kyzyl. The article presents a meta-markup system, which forms the most important part of the search functionality in any corpus. Meta-markup refers to assigning parameters that characterize the text as a whole. Within a corpus, meta-markup makes it possible to search for texts and select them for inclusion in subcorpora by the presence of one or more features. Consequently, the larger the set of such features for each text, the wider the search functionality becomes for various philological and linguistic purposes. The meta-markup system for texts included in the National Corpus of the Tuvan Language may include up to 18 parameters, such as the author’s name and gender, the title and creation date (year) of the text, its functional sphere, topic, subject area, the time and setting of the events described in it, the text’s classification by type of spoken language or literary genre and style, its source, the name of the periodical it appeared in, publisher, publication date, medium, comments, as well as some features of its audience, such as age and education level.
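The subcorpus selection described in the abstract can be illustrated with a minimal sketch. It is not the corpus's actual implementation: the `TextMeta` class, its field names, the `select_subcorpus` helper, and the sample records are all hypothetical, and only a handful of the up to 18 parameters are modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TextMeta:
    """Hypothetical metadata record; field names are illustrative only."""
    title: str
    author: Optional[str] = None
    author_gender: Optional[str] = None
    year: Optional[int] = None
    genre: Optional[str] = None
    medium: Optional[str] = None


def select_subcorpus(texts, **criteria):
    """Return the texts whose metadata equal every given criterion."""
    return [
        t for t in texts
        if all(getattr(t, key) == value for key, value in criteria.items())
    ]


# Placeholder records, not real corpus entries.
corpus = [
    TextMeta("Text A", author="Author 1", author_gender="f", year=1990, genre="epic"),
    TextMeta("Text B", author="Author 2", author_gender="m", year=2005, genre="novel"),
    TextMeta("Text C", author="Author 1", author_gender="f", year=2010, genre="novel"),
]

novels = select_subcorpus(corpus, genre="novel")
novels_by_women = select_subcorpus(corpus, genre="novel", author_gender="f")
```

Each additional parameter recorded per text adds one more dimension along which such a filter can narrow the selection, which is the sense in which a larger feature set widens the search functionality.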
topic natural language corpora
National Corpus of the Tuvan Language
meta-markup
Tuvan language
Tuvan heroic epic
url https://nit.tuva.asia/nit/article/view/613
work_keys_str_mv AT choduraammongush ametatextualmarkupinthenationalcorpusoftuvanlanguagethestructureandfunctionality
AT choduraammongush metatextualmarkupinthenationalcorpusoftuvanlanguagethestructureandfunctionality
_version_ 1725803604613267456