Dissimilarities Detections in Texts Using Symbol n-grams and Word Histograms
Texts (books, novels, papers, short messages) are sequences of sentences, words or symbols. Each author has a unique writing style, which can be characterized by a collection of attributes obtained from texts. Text verification is a case of authorship verification where we have some text a...
Main Authors: | Andrejková Gabriela, Almarimi Abdulwahed |
---|---|
Format: | Article |
Language: | English |
Published: | De Gruyter 2016-11-01 |
Series: | Open Computer Science |
Subjects: | n-grams of symbols; histograms of words; bag of words; stylistic measure; text dissimilarity |
Online Access: | https://doi.org/10.1515/comp-2016-0014 |
id |
doaj-eac608135b7f46b0874452318ce24f6c |
---|---|
record_format |
Article |
spelling |
doaj-eac608135b7f46b0874452318ce24f6c; 2021-09-06T19:19:42Z; eng; De Gruyter; Open Computer Science; 2299-1093; 2016-11-01; 6; 1; 168; 177; 10.1515/comp-2016-0014; comp-2016-0014; Dissimilarities Detections in Texts Using Symbol n-grams and Word Histograms; Andrejková Gabriela (0); Almarimi Abdulwahed (1); Institute of Computer Science, P. J. Šafárik University in Košice; Institute of Computer Science, P. J. Šafárik University in Košice; Texts (books, novels, papers, short messages) are sequences of sentences, words or symbols. Each author has a unique writing style, which can be characterized by a collection of attributes obtained from texts. Text verification is a case of authorship verification where we have some text and analyze whether all parts of this text were written by the same (unknown or known) author. This paper analyzes and compares the results of two methods developed for text verification, one based on n-grams of symbols and one on local histograms of words. Both methods are evaluated on how well they find dissimilarities among the parts of each text. The dissimilarities found indicate whether a text calls for attention, that is, whether its parts may not have been written by the same author. Whether attention is raised depends on parameters selected in experiments. The results illustrate the usability of the methods for finding dissimilarities in text parts.; https://doi.org/10.1515/comp-2016-0014; n-grams of symbols; histograms of words; bag of words; stylistic measure; text dissimilarity |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Andrejková Gabriela; Almarimi Abdulwahed |
title |
Dissimilarities Detections in Texts Using Symbol n-grams and Word Histograms |
title_sort |
dissimilarities detections in texts using symbol n-grams and word histograms |
publisher |
De Gruyter |
series |
Open Computer Science |
issn |
2299-1093 |
publishDate |
2016-11-01 |
description |
Texts (books, novels, papers, short messages) are sequences of sentences, words or symbols. Each author has a unique writing style, which can be characterized by a collection of attributes obtained from texts. Text verification is a case of authorship verification where we have some text and analyze whether all parts of this text were written by the same (unknown or known) author. This paper analyzes and compares the results of two methods developed for text verification, one based on n-grams of symbols and one on local histograms of words. Both methods are evaluated on how well they find dissimilarities among the parts of each text. The dissimilarities found indicate whether a text calls for attention, that is, whether its parts may not have been written by the same author. Whether attention is raised depends on parameters selected in experiments. The results illustrate the usability of the methods for finding dissimilarities in text parts. |
topic |
n-grams of symbols; histograms of words; bag of words; stylistic measure; text dissimilarity |
url |
https://doi.org/10.1515/comp-2016-0014 |