Vocabulary Richness Metric for Extracting Author’s Semantic Mark in English Written Literary Works

This paper opens with a short introduction to the main debates about the stylometric measures used to extract the personal signature that an author leaves on his or her works written in English. These measures are used to identify an author from a limited cardi...

Bibliographic Details
Main Authors: Madalina ZURINI, Alin ZAMFIROIU
Format: Article
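Language: English
Published: Inforec Association 2016-01-01
Series: Informatică economică
Subjects: Stylometry Analysis, Metrics, Author Mark, Lexical Ontology, Time-Trend Analysis, Intrinsic Plagiarism Detection
Online Access: http://revistaie.ase.ro/content/79/04%20-%20Zurini,%20Zamfiroiu.pdf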
id doaj-e13cfc8ad2814d939c7c3a0b4eb29cfd
record_format Article
spelling doaj-e13cfc8ad2814d939c7c3a0b4eb29cfd
2020-11-24T23:30:23Z
eng
Inforec Association
Informatică economică
1453-1305
1842-8088
2016-01-01
20
3
37
45
10.12948/issn14531305/20.3.2016.04
collection DOAJ
language English
format Article
sources DOAJ
author Madalina ZURINI
Alin ZAMFIROIU
title Vocabulary Richness Metric for Extracting Author’s Semantic Mark in English Written Literary Works
publisher Inforec Association
series Informatică economică
issn 1453-1305
1842-8088
publishDate 2016-01-01
description This paper opens with a short introduction to the main debates about the stylometric measures used to extract the personal signature that an author leaves on his or her works written in English. These measures are used to identify an author from a limited set of candidate authors, given either a set of documents or the values of defined indicators that characterize the semantic manner in which the author writes. The paper addresses the semantic level of a work as determined by the tokens the author uses, which are extracted in a preprocessing step. The tokens are defined using a lexical ontology (WordNet, for English words) and are extracted automatically from the words found in the processed works. The main vocabulary richness metrics from the literature are reviewed, and their main steps are combined into a newly proposed metric that joins vocabulary richness with the semantic layer of a work. The concept of author mark is described. The proposed metric is intended to be independent of the main subject of the analyzed work, which yields a general metric that can combine documents on different subjects to describe the vocabulary richness of a specific author across the works he or she has written. The analysis is then extended to the evolution of this metric over time by extracting the trend of the author’s vocabulary richness indicator. Results are presented for one author, using 13 years of values of this indicator. Future work will incorporate this metric into a general description of the author mark in an author’s English-language works.
topic Stylometry Analysis
Metrics
Author Mark
Lexical Ontology
Time-Trend Analysis
Intrinsic Plagiarism Detection
url http://revistaie.ase.ro/content/79/04%20-%20Zurini,%20Zamfiroiu.pdf