THE EXTRACTION OF LEXICAL AND METRORHYTHMIC FEATURES WHICH ARE CHARACTERISTIC FOR THE GENRE AND THE STYLE AND FOR THEIR COMBINATIONS WITHIN THE PROCESS OF AUTOMATED PROCESSING OF TEXTS IN RUSSIAN

Bibliographic Details
Main Authors: Vladimir B. Barakhnin, Olga Yu. Kozhemyakina, Elena V. Rychkova, Ilya S. Pastushkov, Yuliya S. Borzilova
Format: Article
Language: Russian
Published: The Fund for Promotion of Internet media, IT education, human development «League Internet Media» 2018-12-01
Series: Современные информационные технологии и IT-образование
Subjects:
Online Access: http://sitito.cs.msu.ru/index.php/SITITO/article/view/455
id doaj-984881f85f2f426683a1e3c19d19d112
record_format Article
spelling doaj-984881f85f2f426683a1e3c19d19d112; 2020-12-02T05:58:14Z; rus
Journal: Современные информационные технологии и IT-образование, ISSN 2411-1473, 2018-12-01, vol. 14, no. 4, pp. 888-895
DOI: 10.25559/SITITO.14.201804.888-895
Affiliations:
Vladimir B. Barakhnin, Elena V. Rychkova: Institute of Computational Technologies of the Siberian Branch of the Russian Academy of Sciences; Novosibirsk State University, Novosibirsk, Russia
Olga Yu. Kozhemyakina, Ilya S. Pastushkov, Yuliya S. Borzilova: Institute of Computational Technologies of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
issn 2411-1473
description This paper describes an algorithm for automatically extracting features that are characteristic of genre and style. The work was carried out as part of the development of a software system, created at the Institute of Computational Technologies of SB RAS (ICT SB RAS), for the complex analysis of metrorhythmic and genre-stylistic characteristics of poetic texts in Russian; the paper also presents the structure of that system. The system combines original program modules, written by its developers for single-purpose tasks in the analysis of poetic texts, with open-access software products. Representing poetic features as a vector makes it possible to apply modern classification algorithms and their ensembles, but the approach has disadvantages for the small volumes of text that have to be handled. A verification step therefore lets specialists adjust the operation of the system on the basis of expert knowledge and also makes the classification process transparent. Two Python libraries were used as tools: scikit-learn, with which the classification algorithms and the methods of combining them were implemented, and ELI5, which maps the components of the feature vector to specific features. The extraction of lexical and metrorhythmic features characteristic of genre and style, and of their combinations, improved the automated processing of poetic texts in Russian, as shown on a corpus of poetic texts by A.S. Pushkin and K.N. Batyushkov.
The results can be used to verify the classifier and to compile a list of features characteristic of the genre and style of a poet.
topic Pattern recognition
principal component analysis
automated analysis of poetic texts
algorithm of classification
ensembling
url http://sitito.cs.msu.ru/index.php/SITITO/article/view/455