Method of determining keywords for English texts based on DKPro Core
Approaches to keyword search in text are considered; they fall into two categories, linguistic and statistical. Linguistic methods rely on the meaning of words, in particular on ontologies and semantic information about words. Unfortunately, these methods are resource-intensive in the...
Main Authors: | Олег Володимирович Бісікало, Олександр Вікторович Яхимович |
---|---|
Format: | Article |
Language: | English |
Published: | PC Technology Center 2015-01-01 |
Series: | Tehnologìčnij Audit ta Rezervi Virobnictva |
Subjects: | method, keywords, English, linguistic package, DKPro Core, syntactic analysis |
Online Access: | http://journals.uran.ua/tarp/article/view/37274 |
id |
doaj-22b0cbb6b99c49f3bb81556654e342c7 |
---|---|
record_format |
Article |
spelling |
doaj-22b0cbb6b99c49f3bb81556654e342c7; 2020-11-25T01:31:00Z; eng; PC Technology Center; Tehnologìčnij Audit ta Rezervi Virobnictva; 2226-3780; 2312-8372; 2015-01-01; 1; 2(21); 26; 30; 10.15587/2312-8372.2015.37274; 37274; Method of determining keywords for English texts based on DKPro Core; Олег Володимирович Бісікало (0); Олександр Вікторович Яхимович (1); Vinnytsia National Technical University, Khmelnytsky Shosse 95, Vinnitsa, Ukraine, 21000; Vinnytsia National Technical University, Khmelnytsky Shosse 95, Vinnitsa, Ukraine, 21000; http://journals.uran.ua/tarp/article/view/37274; method; keywords; English; linguistic package; DKPro Core; syntactic analysis |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Олег Володимирович Бісікало Олександр Вікторович Яхимович |
title |
Method of determining keywords for English texts based on DKPro Core |
publisher |
PC Technology Center |
series |
Tehnologìčnij Audit ta Rezervi Virobnictva |
issn |
2226-3780 2312-8372 |
publishDate |
2015-01-01 |
description |
Approaches to keyword search in text are considered; they fall into two categories, linguistic and statistical. Linguistic methods rely on the meaning of words, in particular on ontologies and semantic information about words. Unfortunately, these methods are resource-intensive in their early stages: developing an ontology, for example, is a very time-consuming process.
A new method for determining keywords is proposed, based on finding connections between the word forms of an English text using the tools of the DKPro Core package. The method, illustrated with examples of analysis, is aimed at problems of efficient text document processing: indexing, abstracting, clustering, and classification.
Theoretical and experimental studies show that the developed method finds more of the keywords specified by the text's author than comparable methods do. In addition, without any additional filters, the proposed method reduces the number of stop words among the ten most important (key) words by at least a factor of 5. The results can be used to improve the accuracy of a site's content analysis and to raise the site's position in search results.
Unlike existing methods, the proposed method of determining keywords uses additional information about the complex relationships between the members of an English sentence. The popular linguistic package DKPro Core was selected for the functional implementation of the text analyzer. Experimental studies have confirmed the method's theoretical substantiation and its qualitative advantages over known analogues. |
topic |
method keywords English linguistic package DKPro Core syntactic analysis |
url |
http://journals.uran.ua/tarp/article/view/37274 |
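The description says keywords are determined from connections between word forms, but the record does not give the scoring rule. As a rough illustration only, and not the authors' algorithm, the sketch below ranks word forms by how many dependency edges they take part in. The edge list `EDGES` is written by hand for one sample sentence (it is not DKPro Core output), and the function name `rank_by_connections` is hypothetical.

```python
from collections import Counter

# Hypothetical (head, dependent) edges for the sample sentence
# "The parser finds connections between word forms of the text."
# A real pipeline would obtain these from a dependency parser;
# here they are hand-written assumptions.
EDGES = [
    ("finds", "parser"),
    ("parser", "the"),
    ("finds", "connections"),
    ("connections", "forms"),
    ("forms", "between"),
    ("forms", "word"),
    ("forms", "text"),
    ("text", "of"),
    ("text", "the"),
]


def rank_by_connections(edges, top_n=10):
    """Rank word forms by the number of dependency edges they take part in."""
    degree = Counter()
    for head, dependent in edges:
        degree[head] += 1
        degree[dependent] += 1
    # Sort by descending edge count, then alphabetically for a stable order.
    ranked = sorted(degree.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


if __name__ == "__main__":
    for word, count in rank_by_connections(EDGES):
        print(word, count)
```

Counting edges is the simplest possible connection-based score; the published method may weight relation types or directions differently.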