Mix Multiple Features to Evaluate the Content and the Linguistic Quality of Text Summaries

In this article, we propose a machine-learning-based method for evaluating the content and the linguistic quality of text summaries. The method combines multiple features to build predictive models that score the content and the linguistic quality of new (unseen) summaries constructed from the same source documents as the summaries used to train and validate the models. To obtain the best model, several single and ensemble learning classifiers are tested. The constructed models achieve good performance in predicting content and linguistic quality scores. To evaluate summarization systems, we compute each system's score as the average of the predicted scores of the summaries it produced, and then measure the correlation between this system score and the manual system score. The resulting correlation shows that the proposed system score outperforms the baseline scores.
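A minimal sketch of the evaluation pipeline described in the abstract, assuming scikit-learn-style regression, placeholder features, and random data; the actual feature set, learners, and corpora used in the article differ:

```python
# Illustrative sketch (not the authors' code): combine per-summary features,
# train a predictive model, score unseen summaries, average per system, and
# correlate system scores with manual system scores.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor   # assumed learner choice
from sklearn.model_selection import cross_val_score
from scipy.stats import pearsonr

# X: one row per training summary, columns are content/linguistic features
# (e.g. ROUGE variants, readability or coherence measures); y: manual scores.
X = np.random.rand(200, 10)          # placeholder feature matrix
y = np.random.rand(200)              # placeholder manual quality scores

model = GradientBoostingRegressor(random_state=0)
print("CV R^2:", cross_val_score(model, X, y, cv=5).mean())
model.fit(X, y)

# Predict scores for unseen summaries built from the same source documents,
# then average the predicted scores per summarization system.
X_unseen = np.random.rand(50, 10)                 # placeholder unseen summaries
system_ids = np.random.randint(0, 5, size=50)     # which system built each summary
predicted = model.predict(X_unseen)
system_scores = {s: predicted[system_ids == s].mean() for s in np.unique(system_ids)}

# Compare system-level predictions with manual system scores via Pearson correlation.
manual_system_scores = np.random.rand(len(system_scores))   # placeholder
r, _ = pearsonr(list(system_scores.values()), manual_system_scores)
print("System-level Pearson r:", r)
```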

Bibliographic Details
Main Authors: Samira Ellouze, Maher Jaoua, Lamia Hadrich Belguith
Format: Article
Language: English
Published: University of Zagreb Faculty of Electrical Engineering and Computing, 2017-01-01
Series: Journal of Computing and Information Technology
Online Access: http://hrcak.srce.hr/file/270305
Affiliation: University of Sfax, Faculty of Economics and Management of Sfax, Sfax, Tunisia
ISSN: 1330-1136, 1846-3908
Volume/Issue/Pages: Vol. 25, No. 2, pp. 149-166
DOI: 10.20532/cit.2017.1003398