Extractive Multi-Document Arabic Text Summarization Using Evolutionary Multi-Objective Optimization With K-Medoid Clustering

Bibliographic Details
Main Authors: Rana Alqaisi, Wasel Ghanem, Aziz Qaroush
Format: Article
Language: English
Published: IEEE 2020-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/9303358/
Description
Summary: The increasing use of the Internet and social networks has produced a large amount of online textual data, leading to information overload and redundancy. It is important to eliminate this redundancy and reduce the time required to read such data. Thus, there is a persistent need for an automatic text summarization system that extracts the relevant and salient information from a collection of documents sharing the same or related topics and presents it in a condensed form that preserves the main topics. This paper proposes an automatic, generic, and extractive Arabic multi-document summarization system. The proposed system employs clustering-based and evolutionary multi-objective optimization methods. The clustering-based method discovers the main topics in the text, while the evolutionary multi-objective optimization method optimizes three objectives based on coverage, diversity/redundancy, and relevancy. The performance of the proposed system is evaluated on the TAC 2011 and DUC 2002 datasets, and the experimental results are compared using the ROUGE evaluation measure. The results show the effectiveness of the proposed system compared to other peer systems. The proposed system outperformed other peer systems on all ROUGE metrics on TAC 2011. It achieved F-measures of 38.9%, 17.7%, 35.4%, and 15.8% for ROUGE-1, ROUGE-2, ROUGE-L, and ROUGE-SU4, respectively. In addition, on the DUC 2002 dataset the proposed system achieved F-measures of 47.1%, 23.7%, 47.1%, and 20.4% for ROUGE-1, ROUGE-2, ROUGE-L, and ROUGE-SU4, respectively.
ISSN: 2169-3536