Analysis of statistical methods for stable combinations determination of keywords identification

The study has solved the task of making comparative analysis and choosing an optimal statistical method to determine stable word combinations while identifying keywords to process English-language and Ukrainian-language Web-resources. The effectiveness of the method directly proportionally depends o...


Bibliographic Details
Main Authors: Vasyl Lytvyn, Victoria Vysotska, Dmytro Uhryn, Mariya Hrendus, Oleh Naum
Format: Article
Language: English
Published: PC Technology Center 2018-03-01
Series: Eastern-European Journal of Enterprise Technologies
Subjects:
NLP
SEO
Online Access: http://journals.uran.ua/eejet/article/view/126009
id doaj-e9c9b1a94c384e63b01e10c7311f7405
record_format Article
spelling doaj-e9c9b1a94c384e63b01e10c7311f7405
Timestamp: 2020-11-24T22:13:21Z
Language code: eng
Publisher: PC Technology Center
Journal: Eastern-European Journal of Enterprise Technologies
ISSN: 1729-3774, 1729-4061
Publication date: 2018-03-01
Volume 2, issue 2 (92), pages 23-37
DOI: 10.15587/1729-4061.2018.126009
Authors and affiliations:
Vasyl Lytvyn, Lviv Polytechnic National University, S. Bandery str., 12, Lviv, Ukraine, 79013
Victoria Vysotska, Lviv Polytechnic National University, S. Bandery str., 12, Lviv, Ukraine, 79013
Dmytro Uhryn, Chernivtsi Faculty of National Technical University «Kharkiv Polytechnic Institute», Holovna str., 203A, Chernivtsi, Ukraine, 58000
Mariya Hrendus, Lviv Polytechnic National University, S. Bandery str., 12, Lviv, Ukraine, 79013
Oleh Naum, Drohobych Ivan Franko State Pedagogical University, I. Franko str., 24, Drohobych, Ukraine, 82100
collection DOAJ
sources DOAJ
publisher PC Technology Center
series Eastern-European Journal of Enterprise Technologies
issn 1729-3774
1729-4061
publishDate 2018-03-01
description The study carries out a comparative analysis of statistical methods and selects an optimal one for determining stable word combinations when identifying keywords in English-language and Ukrainian-language web resources. The effectiveness of the method is directly proportional to the quality of the linguistic analysis of the Ukrainian and English texts, which is based on Web Mining and NLP technology. The methods of linguistic analysis were decomposed to determine their impact on the quality of forming stable word combinations as keywords. A distinctive feature of the method is the adaptation of the morphological and syntactic analyses of lexical units to the peculiarities of Ukrainian-language words and texts. To determine stable word combinations effectively, it is essential to exclude function words (stop or reference words), pronouns, numerals and verbs, because they are not related to the subject and content of a published work. A set of stable word combinations to be used as keywords is determined by qualitative morphological and syntactic analyses of the relevant texts. The identified set is then used to compare texts and to determine the degree of relevance of a text to a specific topic or user request. The study investigated the internal "dynamics" of forming a set of stable word combinations as keywords, depending on the statistical method applied to the texts. The obtained results have been verified. The study reports the results of experimental testing of the proposed content-monitoring method for determining stable word combinations to identify keywords when processing English-language and Ukrainian-language web resources with technical content, based on Web Mining technology. It has been determined that the authors of published works often identify the keywords that are far from being considered. It has also been shown that the quality of the result is influenced by the quality of the linguistic analysis of texts and of the subsequent filtering. Further experimental research requires testing (approbation) of the proposed method for determining keywords on other categories of texts: scientific, humanitarian, belletristic, journalistic, etc.
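The filtering and scoring steps described in the abstract can be illustrated with a minimal sketch. The abstract does not name the statistical measures that were compared, so pointwise mutual information (PMI) is used here purely as an assumption. The `EXCLUDED` word list and the function names `tokenize` and `stable_combinations` are hypothetical and not taken from the article; a real pipeline would rely on morphological and syntactic analysis rather than a fixed word list.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny exclusion list standing in for the
# function words, pronouns and verbs that the abstract says to exclude.
EXCLUDED = {"the", "a", "an", "of", "to", "in", "and", "is", "are",
            "it", "this", "for", "on", "with"}


def tokenize(text):
    """Lowercase the text and keep runs of letters only (numerals are dropped)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def stable_combinations(text, min_count=2):
    """Rank adjacent word pairs by PMI (an assumed measure, see lead-in)."""
    tokens = tokenize(text)
    unigrams = Counter(tokens)
    # Pairs are taken from the original token order, so dropping an excluded
    # word never makes two distant words look adjacent.
    bigrams = Counter(
        (a, b) for a, b in zip(tokens, tokens[1:])
        if a not in EXCLUDED and b not in EXCLUDED
    )
    n = len(tokens)
    scored = {}
    for (a, b), count in bigrams.items():
        if count < min_count:
            continue  # rare pairs give unreliable PMI estimates
        # PMI = log2(P(a, b) / (P(a) * P(b))), with probabilities estimated
        # as relative frequencies over n tokens (a common approximation).
        scored[(a, b)] = math.log2(count * n / (unigrams[a] * unigrams[b]))
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    sample = ("Web mining helps search. Web mining of the text is fast. "
              "The text is short.")
    for pair, score in stable_combinations(sample):
        print(" ".join(pair), round(score, 3))
```

Because the tokenizer matches Unicode letters, the same sketch runs on Ukrainian text once `EXCLUDED` is replaced with a Ukrainian list. Swapping PMI for another association measure only changes the scoring line.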
topic stable word combination
NLP
Information Retrieval
SEO
Web-mining
statistical linguistic analysis
quantitative linguistics
heading
url http://journals.uran.ua/eejet/article/view/126009