Dictionary extraction based on statistical data

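Automatic text summarization is a pressing problem when working with large amounts of information. Most algorithms based on statistical data build a summary by measuring the similarity of text units and the importance of each unit. A text unit can be a word, a sentence, or a paragraph; in our case, the unit is a sentence. Similarity is measured by the presence of key-words in the sentences, where key-words are words that indicate the topic of the text. In this work, we describe the automatic extraction of a key-word dictionary in which key-words are N-grams with N from 1 to 5. Two algorithms were implemented: extracting key-words that occur in only one of two different corpora, and extracting key-words with high importance, where the importance of an N-gram indicates how strongly it belongs to the topic of the text. The texts are in Russian and Kazakh. Both algorithms produce meaningful results, and both are useful for constructing a complete key-word dictionary.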
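The abstract names the two dictionary-extraction algorithms without giving their formulas. As an illustration only, here is a minimal Python sketch under stated assumptions: texts are tokenized with a simple regular expression (no lemmatization, although Russian and Kazakh are highly inflected), N-grams with N from 1 to 5 are counted per corpus, corpus-exclusive key-words are N-grams absent from the second corpus, and importance is scored by a smoothed log ratio of relative frequencies. That scoring function, the parameter `alpha`, and the helper `sentence_score` are hypothetical choices for illustration, not the authors' method.

```python
import math
import re
from collections import Counter

# Assumption: a plain regex tokenizer. In Python 3, \w also matches Cyrillic,
# so Russian and Kazakh letters are kept. No lemmatization is performed.
TOKEN_RE = re.compile(r"\w+")


def tokenize(text: str) -> list[str]:
    return TOKEN_RE.findall(text.lower())


def ngrams(tokens: list[str], n_min: int = 1, n_max: int = 5) -> Counter:
    """Count every N-gram with n_min <= N <= n_max (the abstract uses N = 1..5)."""
    counts: Counter = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


def corpus_counts(docs: list[str]) -> Counter:
    """Sum N-gram counts over all documents of one corpus."""
    total: Counter = Counter()
    for doc in docs:
        total.update(ngrams(tokenize(doc)))
    return total


def exclusive_ngrams(target_docs: list[str], other_docs: list[str]) -> dict[str, int]:
    """Algorithm 1 (sketch): N-grams seen in the target corpus but never in the other."""
    target, other = corpus_counts(target_docs), corpus_counts(other_docs)
    return {g: c for g, c in target.items() if g not in other}


def importance(target_docs: list[str], other_docs: list[str],
               alpha: float = 1.0) -> dict[str, float]:
    """Algorithm 2 (sketch). ASSUMED score, not the paper's formula:
    log ratio of additively smoothed relative frequencies (target vs. other)."""
    target, other = corpus_counts(target_docs), corpus_counts(other_docs)
    t_total, o_total = sum(target.values()), sum(other.values())
    vocab = len(set(target) | set(other))
    scores = {}
    for g, c in target.items():
        p_target = (c + alpha) / (t_total + alpha * vocab)
        # Counter returns 0 for a missing key without inserting it.
        p_other = (other[g] + alpha) / (o_total + alpha * vocab)
        scores[g] = math.log(p_target / p_other)
    return scores


def top_keywords(scores: dict[str, float], k: int = 10) -> list[str]:
    """Return the k highest-scoring N-grams."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


def sentence_score(sentence: str, keywords: set[str]) -> int:
    """Hypothetical helper: number of distinct key-words present in a sentence."""
    grams = ngrams(tokenize(sentence))
    return sum(1 for k in keywords if k in grams)


if __name__ == "__main__":
    topic = ["Нейронная сеть обучается на данных.", "Нейронная сеть строит модель."]
    general = ["Погода сегодня хорошая.", "Модель автомобиля новая."]
    # "модель" occurs in both corpora, so it is not corpus-exclusive.
    print(sorted(exclusive_ngrams(topic, general)))
    print(top_keywords(importance(topic, general), k=3))
```

A log-ratio score rewards N-grams that are frequent in the topic corpus and rare in the contrast corpus; measures such as TF-IDF or log-likelihood could be substituted, and the paper should be consulted for the measure actually used.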


Bibliographic Details
Main Authors:	A. Mussina, S. Aubakirov
Format:	Article
Language:	English
Published:	Al-Farabi Kazakh National University, 2018-07-01
Series:	Вестник КазНУ. Серия математика, механика, информатика (KazNU Bulletin. Mathematics, Mechanics, Computer Science Series)
Subjects:	automatic extraction; key-words; n-gram
Online Access:	https://bm.kaznu.kz/index.php/kaznu/article/view/447/358
ISSN:	1563-0277, 2617-4871

