Summary: | Doctoral === 國立臺灣師範大學 === 資訊工程研究所 === 99 === Speech summarization inevitably faces the problem of incorrect information caused by recognition errors. However, it also presents opportunities that do not exist for text summarization; for example, information cues from prosodic analysis, including speaker emotions, can help determine the importance and structure of spoken documents. In this dissertation, we discuss the problem of speech summarization from three aspects: features, models, and applications. For the feature aspect, we investigate various ways to robustly represent the recognition hypotheses of spoken documents beyond the top-scoring ones, so as to alleviate the negative effects caused by speech recognition errors. For the model aspect, we present an unsupervised Kullback-Leibler (KL) divergence based summarization method that
can accommodate additional information cues to alleviate the problems caused by speech recognition errors. We also investigate three disparate training criteria for training a supervised summarizer in a preference-sensitive manner, so as to overcome the imbalanced-data problem inherent in speech summarization. Building on these methods, we propose a risk-aware summarization framework that naturally combines supervised and unsupervised summarization models, inheriting their individual merits while overcoming their inherent limitations. Various loss functions and modeling paradigms are introduced, providing a principled way to model the redundancy relationships among sentences and the coherence relationships between sentences and the whole document. For the application aspect,
we demonstrate the possibility of integrating summarization techniques into information retrieval tasks. Experimental results on the broadcast news summarization task suggest that our proposed methods can give substantial improvements over conventional summarization methods.
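As a rough sketch of the two model-aspect methods (the exact formulations are not stated in this summary), a KL-divergence summarizer of this kind typically scores each sentence S of a spoken document D by the divergence between their language models,

\[ \mathrm{KL}(P_D \,\|\, P_S) = \sum_{w} P(w \mid D) \, \log \frac{P(w \mid D)}{P(w \mid S)}, \]

selecting the sentences with the smallest divergence, while a risk-aware (minimum Bayes risk) summarizer selects

\[ S^{*} = \arg\min_{S_i} \sum_{S_j} L(S_i, S_j) \, P(S_j \mid D), \]

where L(S_i, S_j) is a loss function between candidate sentences and P(S_j | D) is the posterior probability of a sentence given the document; the specific loss functions and posteriors used here are those developed in the dissertation.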