CAST: A Cross-Article Structure Theory for Multi-Article Summarization
Main Authors: | Nouf Ibrahim Altmami, Mohamed El Bachir Menai |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2020-01-01 |
Series: | IEEE Access |
Subjects: | |
Online Access: | https://ieeexplore.ieee.org/document/9099835/ |
id | doaj-d50497faa3954ae7b4c28259f744ee44 |
---|---|
record_format | Article |
timestamp | 2021-03-30T02:33:50Z |
volume | 8 |
pages | 100194–100211 |
doi | 10.1109/ACCESS.2020.2997881 |
orcid | Nouf Ibrahim Altmami: https://orcid.org/0000-0002-8291-6121 |
affiliation | Department of Computer Science, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia (both authors) |
collection | DOAJ |
sources | DOAJ |
author | Nouf Ibrahim Altmami, Mohamed El Bachir Menai |
title | CAST: A Cross-Article Structure Theory for Multi-Article Summarization |
publisher | IEEE |
series | IEEE Access |
issn | 2169-3536 |
publishDate | 2020-01-01 |
description | Over the last decade, discourse relations, also referred to as rhetorical or coherence relations, have been used to improve a range of natural language processing applications. Researchers have devised several theories, including rhetorical structure theory and cross-document structure theory, to examine relations between generic text units in single and multiple documents, respectively. In this paper, we propose a cross-article structure theory (CAST) that extends the benefits of discourse relations to applications involving multiple scientific articles. It builds on rhetorical structure theory (RST) and cross-document structure theory (CST). The key insight underpinning CAST is to consider both intra-section and cross-section relations. First, relations are classified according to the structural features of the article (that is, the section type in which they appear); then, relations between text portions across multiple articles are classified. The practicality of the theory is demonstrated on the task of identifying the types of relations that exist between each pair of sentences in related sections of different articles. A CAST bank was created, and the k-nearest neighbors algorithm was used to develop two classifiers, one based on CAST and one based on CST. The performance results clearly demonstrate the role of the CAST relations that are specific to scientific articles. CAST could also be applied to the redundancy and readability problems, which are major issues in tasks such as multi-article summarization. |
topic | Cross-document structure theory; discourse relations; multi-article summarization; rhetorical structure theory |
url | https://ieeexplore.ieee.org/document/9099835/ |
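The record states only that the k-nearest neighbors algorithm was used to build classifiers that assign a relation type to each sentence pair; it gives neither the features nor the label inventory. The following is a minimal sketch of that general setup, assuming two hypothetical pair features (word overlap and word-set size ratio) and placeholder labels (`Identity`, `Overlap`, `NoRelation`). The functions `pair_features` and `knn_predict` and the `TRAIN` vectors are illustrative inventions, not the authors' method or data from the CAST bank.

```python
import math
from collections import Counter


# Hypothetical features for a sentence pair (NOT the CAST paper's feature set):
# Jaccard overlap of lower-cased word sets, and the ratio of the set sizes.
def pair_features(s1: str, s2: str) -> list[float]:
    w1, w2 = set(s1.lower().split()), set(s2.lower().split())
    overlap = len(w1 & w2) / max(1, len(w1 | w2))
    size_ratio = min(len(w1), len(w2)) / max(1, len(w1), len(w2))
    return [overlap, size_ratio]


def knn_predict(train, query, k=3):
    """Label a feature vector by majority vote of its k nearest training vectors.

    train: list of (feature_vector, label) pairs; distance is Euclidean.
    Among tied labels, the one whose closest member is nearest wins, because
    Counter orders equal counts by first insertion and neighbours are
    inserted nearest-first.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy, hand-made training vectors with placeholder relation labels
# (illustrative assumptions only, not annotated data).
TRAIN = [
    ([1.00, 1.00], "Identity"),
    ([0.95, 0.90], "Identity"),
    ([0.50, 0.80], "Overlap"),
    ([0.45, 0.60], "Overlap"),
    ([0.05, 0.80], "NoRelation"),
    ([0.00, 0.90], "NoRelation"),
    ([0.10, 0.70], "NoRelation"),
]

if __name__ == "__main__":
    same = pair_features("The model uses k-nearest neighbors.",
                         "The model uses k-nearest neighbors.")
    print(knn_predict(TRAIN, same))  # → Identity
```

With real annotated sentence pairs, the feature vectors would be computed from the corpus and `k` tuned on held-out data; `k=3` here is an arbitrary odd choice that avoids two-way ties.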