Broadcast news processing: Structural Classification, Summarisation and Evaluation


Bibliographic Details
Main Author: Kolluru, BalaKrishna
Published: University of Sheffield 2006
Subjects:
020
Online Access: http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.485892
Description
Summary: This thesis describes the automation and evaluation of structural classification and summarisation of audio documents, specifically broadcast news programmes. News broadcasts are typically 30-minute episodes consisting of several stories describing various events, incidents and current affairs. Some of these news stories are annotated to train the statistical models. Structural classification techniques use speaker-role (e.g. anchor, reporter) information to categorise these stories into broad classes such as reader and interview. A small set of carefully drafted rules assigns a specific speaker-role to each utterance, and these roles are subsequently used to classify the news stories. It is argued in this thesis that selecting the most relevant sub-sentence linguistic components is an efficient information-gathering mechanism for summarisation. Short to intermediate-sized (15 to 50 word) summaries are automatically generated by an iterative decremental refining process that first decomposes a story into sentences and then further divides them into chunks or phrases. The most relevant parts are retained at each iteration until the desired number of words is reached. These chunks are then joined using a set of junction words, chosen by a combination of language model and probabilistic parser scores, to generate a fluent summary. The performance of this approach is measured using a novel bipartite evaluation mechanism. It is shown that the summaries need to be measured for informativeness, and therefore an approach based on a comprehension test is employed to calculate such scores. The evaluation mechanism uses a fluency scale, based on comprehensibility and coherence, to quantify the fluency of summaries. In experiments, human-authored summaries were analysed to quantify the subjectivity using the comprehension test.
Experimental results indicate that the iterative refining approach is considerably more informative than a baseline constructed from the first sentence or the first 50 words of a news story. The results also indicate that the use of junction words improved fluency in the summaries.
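
The iterative decremental refining process described in the summary can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The relevance score (average in-story word frequency), the comma-based chunker, the doubled word budget for the sentence stage, and all function names are assumptions made for illustration. The junction-word step, which the thesis drives with language model and probabilistic parser scores, is not modelled here.

```python
import re
from collections import Counter


def split_sentences(text):
    # Naive sentence splitter: break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def split_chunks(sentence):
    # Hypothetical stand-in for a phrase chunker: split on commas and semicolons.
    return [c.strip() for c in re.split(r"[,;]", sentence) if c.strip()]


def word_count(units):
    return sum(len(u.split()) for u in units)


def relevance(unit, freqs):
    # Assumed score: mean frequency of the unit's words within the story.
    words = re.findall(r"[a-z']+", unit.lower())
    return sum(freqs[w] for w in words) / len(words) if words else 0.0


def refine(units, freqs, max_words):
    # Drop the least relevant unit at each iteration until the word
    # budget is met or only one unit remains; original order is kept.
    units = list(units)
    while len(units) > 1 and word_count(units) > max_words:
        worst = min(range(len(units)), key=lambda i: relevance(units[i], freqs))
        del units[worst]
    return units


def summarise(story, max_words=50):
    freqs = Counter(re.findall(r"[a-z']+", story.lower()))
    # Stage 1: prune whole sentences against a looser budget (assumed 2x).
    sentences = refine(split_sentences(story), freqs, 2 * max_words)
    # Stage 2: split the surviving sentences into chunks and prune again.
    chunks = [c for s in sentences for c in split_chunks(s)]
    return refine(chunks, freqs, max_words)
```

Calling `summarise(story, max_words=15)` returns the retained chunks in story order. A single remaining chunk can still exceed the budget, because the sketch never cuts inside a chunk.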