CAST: A Cross-Article Structure Theory for Multi-Article Summarization

Over the last decade, discourse relations, also referred to as rhetorical or coherence relations, have been used to improve a range of natural language processing applications. Researchers have devised several theories, including rhetorical structure theory and cross-document structure theory, to ex...

Bibliographic Details
Main Authors: Nouf Ibrahim Altmami, Mohamed El Bachir Menai
Format: Article
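Language: English
Published: IEEE 2020-01-01
Series: IEEE Access
Subjects: Cross-document structure theory; discourse relations; multi-article summarization; rhetorical structure theory
Online Access: https://ieeexplore.ieee.org/document/9099835/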
id doaj-d50497faa3954ae7b4c28259f744ee44
record_format Article
spelling doaj-d50497faa3954ae7b4c28259f744ee44
2021-03-30T02:33:50Z
eng
IEEE
IEEE Access, ISSN 2169-3536
2020-01-01, vol. 8, pp. 100194-100211
DOI 10.1109/ACCESS.2020.2997881, IEEE Xplore document 9099835
CAST: A Cross-Article Structure Theory for Multi-Article Summarization
Nouf Ibrahim Altmami (https://orcid.org/0000-0002-8291-6121), Mohamed El Bachir Menai
Department of Computer Science, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia (both authors)
https://ieeexplore.ieee.org/document/9099835/
Cross-document structure theory; discourse relations; multi-article summarization; rhetorical structure theory
collection DOAJ
language English
format Article
sources DOAJ
author Nouf Ibrahim Altmami
Mohamed El Bachir Menai
publisher IEEE
series IEEE Access
issn 2169-3536
publishDate 2020-01-01
description Over the last decade, discourse relations, also referred to as rhetorical or coherence relations, have been used to improve a range of natural language processing applications. Researchers have devised several theories, including rhetorical structure theory and cross-document structure theory, to examine relations between generic text units in single and multiple documents, respectively. In this paper, we propose a cross-article structure theory (CAST) that extends the benefit of discourse relations to applications involving multiple scientific articles. It is based on rhetorical structure theory (RST) and cross-document structure theory (CST). The insight that underpins CAST is to consider both intra-section and cross-section relations. First, these relations are classified based on the structural features of the article (that is, their appearance within each section type); then the relations between text portions across multiple articles are classified. The practicality of the theory is demonstrated by solving the problem of identifying the types of relations that exist between each pair of sentences in related sections of different articles. A CAST bank was created, and the k-nearest neighbors algorithm was used to develop two classifiers based on CAST and CST, respectively. The performance results clearly demonstrate the role of the relations specific to scientific articles in CAST. Other applications of CAST could address redundancy and readability problems, which are major issues for tasks such as the summarization of multiple articles.
topic Cross-document structure theory
discourse relations
multi-article summarization
rhetorical structure theory
url https://ieeexplore.ieee.org/document/9099835/
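
The description above mentions that the k-nearest neighbors algorithm was used to classify the relation between pairs of sentences. The following is only a minimal sketch of that general idea, not the authors' implementation: the two features (word overlap and length ratio), the labels "Equivalence" and "NoRelation", and the tiny training set are all assumptions made for illustration and do not come from the article or from the CAST bank.

```python
import math
from collections import Counter


def features(s1, s2):
    """Toy features for a sentence pair: Jaccard word overlap and length ratio.

    These features are illustrative assumptions, not the ones used in the article.
    """
    w1, w2 = set(s1.lower().split()), set(s2.lower().split())
    union = w1 | w2
    jaccard = len(w1 & w2) / len(union) if union else 0.0
    longer = max(len(w1), len(w2))
    ratio = min(len(w1), len(w2)) / longer if longer else 0.0
    return (jaccard, ratio)


def knn_predict(train, query, k=3):
    """Return the majority label among the k training vectors closest to query.

    train is a list of (feature_vector, label) tuples; distance is Euclidean.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical labelled sentence pairs (invented for this sketch).
TRAIN_PAIRS = [
    ("we propose a new model", "we propose a new model", "Equivalence"),
    ("the results improve accuracy", "the results improve the accuracy", "Equivalence"),
    ("k nearest neighbors was used", "the k nearest neighbors was used", "Equivalence"),
    ("we propose a new model", "data were collected in 2019", "NoRelation"),
    ("the corpus has ten articles", "training took two hours", "NoRelation"),
    ("results are shown in table 2", "we thank the reviewers", "NoRelation"),
]

TRAIN = [(features(a, b), label) for a, b, label in TRAIN_PAIRS]


def classify_pair(s1, s2, k=3):
    """Predict a relation label for one pair of sentences from two articles."""
    return knn_predict(TRAIN, features(s1, s2), k=k)


if __name__ == "__main__":
    print(classify_pair("we propose a novel model", "we propose a new model"))
    print(classify_pair("the method is fast", "participants were paid weekly"))
```

A real classifier of this kind would need an annotated corpus and a richer feature set; the sketch only shows the voting step that k-nearest neighbors performs once each sentence pair has been turned into a feature vector.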