Dissimilarities Detections in Texts Using Symbol n-grams and Word Histograms

Bibliographic Details
Main Authors: Andrejková Gabriela, Almarimi Abdulwahed
Format: Article
Language: English
Published: De Gruyter 2016-11-01
Series: Open Computer Science
Online Access: https://doi.org/10.1515/comp-2016-0014
Affiliation: Institute of Computer Science, P. J. Šafárik University in Košice (both authors)
Volume: 6
Issue: 1
Pages: 168-177
ISSN: 2299-1093
Subjects: n-grams of symbols; histograms of words; bag of words; stylistic measure; text dissimilarity
Source: DOAJ, record doaj-eac608135b7f46b0874452318ce24f6c

Description
Texts (books, novels, papers, short messages) are sequences of sentences, words or symbols. Each author has a unique writing style, which can be characterized by a collection of attributes obtained from texts. Text verification is a case of authorship verification in which we have a text and analyze whether all of its parts were written by the same (unknown or known) author. This paper analyzes and compares the results of two methods developed for text verification: one based on n-grams of symbols and one based on local histograms of words. The results of the symbol n-gram method and the word histogram method in searching for dissimilarities between the parts of each text are analyzed and evaluated. The dissimilarities found indicate whether a text calls for attention, that is, whether its parts may not have been written by the same author. Whether attention is called for depends on parameters selected in the experiments. The results illustrate the usability of both methods for finding dissimilarities between text parts.
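The description names two feature types, symbol n-grams and word histograms, used to find parts of a text that differ from the rest. The paper's exact measures are not given in this record, so the following is only a minimal sketch under assumptions not taken from the paper: the text is cut into `k` consecutive parts of roughly equal word count, each part is compared with the rest of the text using half the L1 distance between relative frequencies (a swapped-in measure in [0, 1]), and a part is flagged when its score exceeds `threshold`. The names `flag_parts`, `split_parts`, `k`, and `threshold` are hypothetical; `threshold` stands in for the experimentally selected parameters mentioned in the description.

```python
import re
from collections import Counter

# Hypothetical sketch: NOT the method from the paper, only an illustration
# of part-vs-rest comparison with symbol n-grams and word histograms.


def char_ngrams(text, n=3):
    """Counts overlapping symbol (character) n-grams of length n."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def word_histogram(text):
    """Counts lower-cased word tokens (a bag-of-words histogram)."""
    return Counter(re.findall(r"\w+", text.lower()))


def dissimilarity(a, b):
    """Half the L1 distance between relative frequencies (assumed measure).

    Returns 0.0 for identical distributions and 1.0 for disjoint ones.
    """
    total_a, total_b = sum(a.values()), sum(b.values())
    if total_a == 0 or total_b == 0:
        return 1.0
    keys = a.keys() | b.keys()  # union of observed features
    return 0.5 * sum(abs(a[k] / total_a - b[k] / total_b) for k in keys)


def split_parts(text, k):
    """Splits text into at most k consecutive parts of roughly equal word count."""
    words = text.split()
    size = max(1, -(-len(words) // k))  # ceiling division
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def flag_parts(text, k=4, threshold=0.5, features=char_ngrams):
    """Scores each part against the rest of the text (hypothetical API).

    Returns (part index, rounded score, flagged) tuples; needs at least two parts.
    """
    parts = split_parts(text, k)
    if len(parts) < 2:
        return []
    results = []
    for i, part in enumerate(parts):
        # Joining the other parts adds a few n-grams that span part boundaries.
        rest = " ".join(p for j, p in enumerate(parts) if j != i)
        score = dissimilarity(features(part), features(rest))
        results.append((i, round(score, 3), score > threshold))
    return results


if __name__ == "__main__":
    # Toy input: the first three parts repeat one sentence, the fourth uses other words.
    sample = " ".join(["the cat sat on the mat"] * 30
                      + ["quick zebras jump over lazy foxes"] * 10)
    for name, feats in (("symbol 3-grams", char_ngrams),
                        ("word histogram", word_histogram)):
        print(name, flag_parts(sample, k=4, threshold=0.5, features=feats))
```

Comparing each part against the rest, rather than against the whole text, keeps a deviating part from diluting its own reference profile. The n-gram length, the number of parts, and the threshold all change which parts get flagged and would need tuning on real texts.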