Method of determining keywords for English texts based on DKPro Core

Approaches to keyword search in text, which fall into two categories, linguistic and statistical, are considered. Linguistic methods are based on the meaning of words, in particular on ontologies and semantic information about words. Unfortunately, these methods are resource-intensive in the...

Bibliographic Details
Main Authors: Олег Володимирович Бісікало, Олександр Вікторович Яхимович
Format: Article
Language: English
Published: PC Technology Center 2015-01-01
Series: Tehnologìčnij Audit ta Rezervi Virobnictva
Subjects: method, keywords, English, linguistic package, DKPro Core, syntactic analysis
Online Access: http://journals.uran.ua/tarp/article/view/37274
id doaj-22b0cbb6b99c49f3bb81556654e342c7
record_format Article
spelling doaj-22b0cbb6b99c49f3bb81556654e342c7
2020-11-25T01:31:00Z
eng
PC Technology Center
Tehnologìčnij Audit ta Rezervi Virobnictva
2226-3780
2312-8372
2015-01-01
1
2(21)
26
30
10.15587/2312-8372.2015.37274
37274
Method of determining keywords for English texts based on DKPro Core
Олег Володимирович Бісікало [0]
Олександр Вікторович Яхимович [1]
[0] Vinnytsia National Technical University, Khmelnytsky Shosse 95, Vinnitsa, Ukraine, 21000
[1] Vinnytsia National Technical University, Khmelnytsky Shosse 95, Vinnitsa, Ukraine, 21000
Approaches to keyword search in text, which fall into two categories, linguistic and statistical, are considered. Linguistic methods are based on the meaning of words, in particular on ontologies and semantic information about words. Unfortunately, these methods are resource-intensive in the early stages: developing ontologies, for example, is a very time-consuming process. A new method is proposed for determining keywords, based on finding connections between word forms of an English text using the tools of the DKPro Core package. The method, which is illustrated with examples of analysis, is aimed at solving problems of efficient text document processing: indexing, abstracting, clustering and classification. Theoretical and experimental studies show that the developed method finds more of the keywords specified by the author of the text than comparable methods do. In addition, without additional filters, the proposed method reduces the number of stop words among the ten most important (key) words by at least a factor of 5. The results can be used to improve the accuracy of a site's content analysis and to raise the site's position in search results. Unlike existing methods, the proposed method of determining keywords is based on additional information about complex relationships between the members of an English sentence. The popular linguistic package DKPro Core was selected for the functional implementation of the text analyzer. Experimental studies of the theoretical substantiation of the method demonstrated its qualitative advantages over known analogues.
http://journals.uran.ua/tarp/article/view/37274
method
keywords
English
linguistic package
DKPro Core
syntactic analysis
collection DOAJ
language English
format Article
sources DOAJ
author Олег Володимирович Бісікало
Олександр Вікторович Яхимович
title Method of determining keywords for English texts based on DKPro Core
publisher PC Technology Center
series Tehnologìčnij Audit ta Rezervi Virobnictva
issn 2226-3780
2312-8372
publishDate 2015-01-01
description Approaches to keyword search in text, which fall into two categories, linguistic and statistical, are considered. Linguistic methods are based on the meaning of words, in particular on ontologies and semantic information about words. Unfortunately, these methods are resource-intensive in the early stages: developing ontologies, for example, is a very time-consuming process. A new method is proposed for determining keywords, based on finding connections between word forms of an English text using the tools of the DKPro Core package. The method, which is illustrated with examples of analysis, is aimed at solving problems of efficient text document processing: indexing, abstracting, clustering and classification. Theoretical and experimental studies show that the developed method finds more of the keywords specified by the author of the text than comparable methods do. In addition, without additional filters, the proposed method reduces the number of stop words among the ten most important (key) words by at least a factor of 5. The results can be used to improve the accuracy of a site's content analysis and to raise the site's position in search results. Unlike existing methods, the proposed method of determining keywords is based on additional information about complex relationships between the members of an English sentence. The popular linguistic package DKPro Core was selected for the functional implementation of the text analyzer. Experimental studies of the theoretical substantiation of the method demonstrated its qualitative advantages over known analogues.
topic method
keywords
English
linguistic package
DKPro Core
syntactic analysis
url http://journals.uran.ua/tarp/article/view/37274