Dissimilarities Detections in Texts Using Symbol n-grams and Word Histograms

Bibliographic Details
Main Authors: Andrejková Gabriela, Almarimi Abdulwahed
Format: Article
Language: English
Published: De Gruyter 2016-11-01
Series: Open Computer Science
Online Access: https://doi.org/10.1515/comp-2016-0014
Description
Summary: Texts (books, novels, papers, short messages) are sequences of sentences, words, or symbols. Each author has a unique writing style, which can be characterized by a collection of attributes obtained from the author's texts. Text verification is a case of authorship verification in which we are given a text and analyze whether all of its parts were written by the same (known or unknown) author. This paper analyzes and compares the results of two methods developed for text verification, one based on n-grams of symbols and one on local histograms of words. Both methods are used to search for dissimilarities among the parts of each text, and their results are evaluated. The dissimilarities found indicate whether a text calls for attention, that is, whether its parts may not all have been written by the same author. Whether attention is called for depends on parameters selected in the experiments. The results illustrate the usability of both methods for finding dissimilarities among text parts.
ISSN: 2299-1093
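
Illustration: the summary does not say how the symbol n-gram comparison is computed, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes character trigrams, a cosine-based dissimilarity between each text part and the whole text, fixed-length parts, and an arbitrary threshold; all function names and parameter values here are hypothetical.

```python
from collections import Counter
import math


def ngram_profile(text, n=3):
    """Relative frequencies of the symbol (character) n-grams in text."""
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    total = len(grams)
    if total == 0:
        return {}
    return {g: c / total for g, c in Counter(grams).items()}


def cosine_dissimilarity(p, q):
    """1 - cosine similarity of two sparse profiles (an assumed measure)."""
    dot = sum(v * q.get(g, 0.0) for g, v in p.items())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 1.0  # treat an empty profile as maximally dissimilar
    return 1.0 - dot / (norm_p * norm_q)


def split_parts(text, part_len):
    """Cut the text into consecutive parts of part_len symbols."""
    return [text[i:i + part_len] for i in range(0, len(text), part_len)]


def flag_dissimilar_parts(text, part_len=500, n=3, threshold=0.5):
    """Return (index, score) for parts whose profile differs from the
    whole-text profile by more than threshold (hypothetical parameters)."""
    whole = ngram_profile(text, n)
    flagged = []
    for i, part in enumerate(split_parts(text, part_len)):
        score = cosine_dissimilarity(ngram_profile(part, n), whole)
        if score > threshold:
            flagged.append((i, score))
    return flagged
```

A flagged part only signals that the part deserves a closer look; as the summary notes, the outcome depends on the chosen parameters (here `part_len`, `n`, and `threshold`).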