Automatic discourse structure generation using rhetorical structure theory

This thesis addresses a difficult problem in text processing: creating a system to automatically derive rhetorical structures of text. Although rhetorical structure has proven to be useful in many fields of text processing such as text summarisation and information extraction, systems that autom...

Bibliographic Details
Main Author: LeThanh, Huong
Published: Middlesex University 2004
Subjects:
Online Access: http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.548899
id ndltd-bl.uk-oai-ethos.bl.uk-548899
record_format oai_dc
spelling ndltd-bl.uk-oai-ethos.bl.uk-548899 2015-03-20T04:52:09Z Automatic discourse structure generation using rhetorical structure theory LeThanh, Huong 2004 005.52 Middlesex University http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.548899 http://eprints.mdx.ac.uk/8002/ Electronic Thesis or Dissertation
collection NDLTD
sources NDLTD
topic 005.52
spellingShingle 005.52
LeThanh, Huong
Automatic discourse structure generation using rhetorical structure theory
description This thesis addresses a difficult problem in text processing: creating a system to automatically derive rhetorical structures of text. Although rhetorical structure has proven to be useful in many fields of text processing such as text summarisation and information extraction, systems that automatically generate rhetorical structures with high accuracy are difficult to find. This is because discourse is one of the biggest and yet least well defined areas in linguistics. Researchers have not reached agreement on the best method for analysing the rhetorical structure of text. This thesis focuses on investigating a method to generate the rhetorical structures of text. By exploiting different cohesive devices, it proposes a method to recognise rhetorical relations between spans by checking for the appearance of these devices. These devices include cue phrases, noun-phrase cues, verb-phrase cues, reference words, time references, substitution words, ellipses, and syntactic information. The discourse analyser is divided into two levels: sentence-level and text-level. The former uses syntactic information and cue phrases to segment sentences into elementary discourse units and to generate a rhetorical structure for each sentence. The latter derives rhetorical relations between large spans and then replaces each sentence by its corresponding rhetorical structure to produce the rhetorical structure of the text. The rhetorical structure at the text-level is derived by selecting rhetorical relations to connect adjacent and non-overlapping spans to form a discourse structure that covers the entire text. Constraints of textual organisation and textual adjacency are used in a beam search to reduce the search space in generating such rhetorical structures.
Experiments carried out in this research achieved an F-score of 89.4% for discourse segmentation, 52.4% for the sentence-level discourse analyser and 38.1% for the final output of the system. These results show that the approach performs well in comparison with current research in discourse.
author LeThanh, Huong
author_facet LeThanh, Huong
author_sort LeThanh, Huong
title Automatic discourse structure generation using rhetorical structure theory
title_short Automatic discourse structure generation using rhetorical structure theory
title_full Automatic discourse structure generation using rhetorical structure theory
title_fullStr Automatic discourse structure generation using rhetorical structure theory
title_full_unstemmed Automatic discourse structure generation using rhetorical structure theory
title_sort automatic discourse structure generation using rhetorical structure theory
publisher Middlesex University
publishDate 2004
url http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.548899
work_keys_str_mv AT lethanhhuong automaticdiscoursestructuregenerationusingrhetoricalstructuretheory
_version_ 1716787342689173504
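
Illustrative note on the description above: the abstract describes connecting adjacent, non-overlapping spans with rhetorical relations, guided by cue phrases, inside a beam search. The sketch below is only a toy illustration of that general idea and is not the thesis's implementation. The cue table, the scores, the fallback relation, the length penalty, and the names Node, relate, merge and beam_parse are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical cue-phrase table (cue -> relation, score); not taken from the thesis.
CUES = {
    "because": ("Cause", 0.9),
    "but": ("Contrast", 0.8),
    "then": ("Sequence", 0.7),
}
DEFAULT = ("Elaboration", 0.3)  # assumed fallback when no cue phrase is found


@dataclass(frozen=True)
class Node:
    start: int            # index of the first unit covered by this span
    end: int              # index of the last unit covered (inclusive)
    label: str            # relation name, or "EDU" for an elementary unit
    text: str             # surface text of the span
    children: tuple = ()  # (left, right) for internal nodes, empty for leaves


def relate(left: Node, right: Node) -> Tuple[str, float]:
    """Guess a relation by checking whether the right span opens with a cue phrase."""
    words = right.text.lower().split()
    first = words[0].strip(",.;:") if words else ""
    relation, score = CUES.get(first, DEFAULT)
    # Toy assumption: a cue that opens a long right span is weaker evidence.
    return relation, score - 0.1 * (right.end - right.start)


def merge(left: Node, right: Node) -> Tuple[Node, float]:
    """Join two adjacent spans into one span labelled with the guessed relation."""
    relation, score = relate(left, right)
    text = left.text + " " + right.text
    return Node(left.start, right.end, relation, text, (left, right)), score


def beam_parse(units: List[str], beam_width: int = 3) -> Node:
    """Build a binary tree over the units, keeping only the best hypotheses."""
    if not units:
        raise ValueError("need at least one unit")
    leaves = [Node(i, i, "EDU", u) for i, u in enumerate(units)]
    # A hypothesis is (total score, list of adjacent non-overlapping spans).
    beam = [(0.0, leaves)]
    for _ in range(len(leaves) - 1):
        candidates = []
        for total, spans in beam:
            # Only neighbouring spans may be joined (textual adjacency).
            for i in range(len(spans) - 1):
                node, score = merge(spans[i], spans[i + 1])
                candidates.append((total + score, spans[:i] + [node] + spans[i + 2:]))
        candidates.sort(key=lambda c: c[0], reverse=True)
        beam = candidates[:beam_width]  # prune the search space
    return beam[0][1][0]  # the single span covering the whole text


if __name__ == "__main__":
    tree = beam_parse(["The match was cancelled", "because it rained,", "but the fans stayed."])
    print(tree.label, [child.label for child in tree.children])
```

Each pass joins one pair of neighbouring spans, so a text of n units is fully covered after n - 1 passes. The beam width bounds how many partial structures survive each pass, which is where the reduction of the search space comes from in this sketch.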