Summary: | Doctoral === National Taiwan University === Graduate Institute of Computer Science and Information Engineering === 96 === This thesis consists of two parts. In the first part, we propose two new subword-based approaches for Spoken Document Retrieval (SDR): Subword-based Position Specific Posterior Lattices (S-PSPL) and the Subword-based Confusion Network (S-CN). These approaches are motivated by PSPL and CN, respectively, but are based on subword units instead of words.
We introduce S-PSPL first.
In the S-PSPL approach, we encode the posterior probabilities and proximity information of subword units in a word lattice.
A critical issue in S-PSPL is calculating the subword posterior probabilities (SPP) in a word lattice, which cannot be done directly by simple dynamic programming.
We solve the problem with a simple approximation. To verify that this SPP approximation procedure is accurate enough, we introduce the Subword-based Confusion Network (S-CN).
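The summary does not spell out the approximation itself, so the following is only a minimal sketch of the quantity being approximated: a position-specific subword posterior. It assumes a hypothetical toy input, a short list of complete paths with their posteriors, and it enumerates those paths by brute force, which is feasible only for toy examples and is not the thesis's procedure. The function name `subword_position_posteriors` and the character-level split are illustrative assumptions.

```python
from collections import defaultdict


def subword_position_posteriors(paths, split=list):
    """Accumulate path posteriors per (subword position, subword unit).

    paths: list of (word_sequence, posterior) pairs; posteriors sum to 1.
    split: maps a word to its subword units (characters by default).
    """
    table = defaultdict(lambda: defaultdict(float))
    for words, posterior in paths:
        position = 0
        for word in words:
            for unit in split(word):
                # Every path passing through this unit at this position
                # contributes its full posterior mass.
                table[position][unit] += posterior
                position += 1
    return {pos: dict(units) for pos, units in table.items()}


# Hypothetical toy lattice, listed as complete paths (assumption).
toy_paths = [
    (["ab", "c"], 0.5),
    (["a", "bc"], 0.25),
    (["ab", "d"], 0.25),
]
spp = subword_position_posteriors(toy_paths)
```

In this toy case the paths `ab c` and `a bc` differ as word sequences but agree at every subword position, so their mass is pooled at the subword level. That pooling is what a word-level index would miss.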
Since the original goal of the Confusion Network (CN) is to construct a decoding structure that meets the minimum word error rate criterion, S-CN can likewise be used to minimize the subword error rate.
We embed the SPP approximation in the S-CN structure and achieve a significant reduction in subword error rate. This implicitly verifies the feasibility of the SPP approximation. Moreover, although introduced as a decoding structure, S-CN can also be used as an efficient and compact indexing structure. This is the second subword-based approach for SDR proposed in this thesis.
Extensive evaluations are then made on S-PSPL and S-CN to verify their advantages. Further discussion and analysis are also given to compare the two very similar pairs of data structures, PSPL/S-PSPL and CN/S-CN.
In the evaluation and analysis, S-PSPL proves to be very attractive and even better than S-CN, since it requires fewer or roughly equal resources while offering better accuracy under most circumstances.
There are several ways to improve the S-PSPL/S-CN systems. In the thesis we propose an algorithm, Lexicon Adaptation with Reduced Character Errors (LARCE), which adapts the lexicon in the LVCSR system to improve character recognition accuracy. In the evaluation, LARCE gives significant improvements in character accuracy. With the improved subword recognition, S-PSPL and S-CN can be expected to improve accordingly.
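As a rough illustration of how a confusion network serves as a decoding structure, the sketch below picks the highest-posterior entry in each cluster and drops skip symbols. The cluster contents, the `<eps>` skip label, and the function name `decode_confusion_network` are hypothetical; how S-CN actually builds its clusters is not described in this summary.

```python
def decode_confusion_network(clusters, skip="<eps>"):
    """Return the consensus subword sequence of a confusion network.

    clusters: list of dicts mapping a subword unit (or the skip symbol)
              to its posterior within that cluster.
    """
    best = []
    for cluster in clusters:
        # The locally most probable entry minimizes expected errors
        # for this slot, given the cluster posteriors.
        unit = max(cluster, key=cluster.get)
        if unit != skip:
            best.append(unit)
    return best


# Hypothetical toy network with four clusters (assumption).
toy_network = [
    {"a": 0.9, "e": 0.1},
    {"b": 0.6, "<eps>": 0.4},
    {"<eps>": 0.7, "x": 0.3},
    {"c": 0.75, "d": 0.25},
]
consensus = decode_confusion_network(toy_network)
```

The same list of clusters can also be read as an index: each cluster position stores every candidate unit with its posterior, not only the winner.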
In the second part, we present a formulation and a framework for a new type of dialogue system, referred to as \textit{type-II dialogue systems}, which evolve from SDR systems but have a whole new definition and formulation. \textit{Type-II dialogue systems} are proposed to address difficulties that cannot be solved by traditional SDR systems. The new definition and formulation emphasize the interactions between the user and the system, hence the term \textit{dialogue systems}. However, they differ significantly from conventional spoken dialogue systems, which is why we refer to them as \textit{type-II}.
The distinct feature of such dialogue systems is that their task is information access from unstructured knowledge sources, that is, there is no well-organized back-end database offering the information to the user.
Typical example tasks of this type of dialogue systems include information retrieval/browsing and question answering.
The functionalities of each module in such \textit{type-II dialogue systems} are analyzed, presented, and compared with the respective modules in \textit{type-I dialogue systems}.
A series of novel technologies helpful in constructing \textit{type-II dialogue systems} are then proposed in the thesis. In addition to the new SDR technologies already presented in part one, Named Entity Recognition (NER) from text and spoken documents, topic hierarchy construction for spoken documents, and dialogue modelling for information access are discussed here.
For NER, two novel approaches are proposed, one for text documents and one for spoken documents. For text documents, we propose using global information in addition to the local information (internal and external information) widely used in the NER community. For spoken documents, we propose using relevant documents retrieved from the Internet to add new NEs to the recognized lattice, compensating for the defects of the ASR system, since many NEs are Out-of-Vocabulary words (OOVs).
For the topic hierarchy construction, we use HAC+P, a recently proposed approach \cite{ChuangTOIS05}. The NEs extracted from the spoken documents are used to construct balanced tree structures with HAC+P, which serve as a convenient system output for user interaction.
For the dialogue modelling, a Markov Decision Process (MDP) based method is proposed to learn the best path for guiding the user during the retrieval process. In many cases, the user's initial query returns too many results, and the system guides the user through query expansion to specify the user's information need more clearly. In the proposed approach, the system learns to predict the user's information need so that it can recommend the most discriminative and informative terms for query expansion.
There is still a long way to go in the research and development of SDR technologies. It is hoped that the work in this thesis will be helpful to this research topic.
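To make the MDP idea concrete, here is a minimal value-iteration sketch on a hypothetical toy model. The states (`broad`, `narrow`, `done`), the recommended terms, the probabilities, and the rewards are all invented for illustration; the thesis's actual state space, reward design, and learning procedure are not given in this summary.

```python
def value_iteration(transitions, gamma=0.9, tol=1e-9):
    """Solve a small MDP.

    transitions[state][action] = list of (probability, next_state, reward).
    A state with no actions is terminal.
    """
    def q_value(outcomes, values):
        return sum(p * (r + gamma * values[n]) for p, n, r in outcomes)

    values = {state: 0.0 for state in transitions}
    while True:
        delta = 0.0
        for state, actions in transitions.items():
            if not actions:
                continue  # terminal state keeps value 0
            best = max(q_value(o, values) for o in actions.values())
            delta = max(delta, abs(best - values[state]))
            values[state] = best
        if delta < tol:
            break
    policy = {
        state: max(actions, key=lambda a: q_value(actions[a], values))
        for state, actions in transitions.items()
        if actions
    }
    return values, policy


# Hypothetical toy model: each turn costs 1, reaching "done" yields 10.
toy_mdp = {
    "broad": {
        "term_A": [(0.8, "narrow", -1.0), (0.2, "broad", -1.0)],
        "term_B": [(0.5, "narrow", -1.0), (0.5, "broad", -1.0)],
    },
    "narrow": {
        "term_C": [(0.9, "done", 9.0), (0.1, "narrow", -1.0)],
        "term_D": [(0.6, "done", 9.0), (0.4, "narrow", -1.0)],
    },
    "done": {},
}
values, policy = value_iteration(toy_mdp)
```

In this toy model the learned policy recommends the term most likely to narrow the result set at each stage, which mirrors the idea of recommending the most discriminative terms for query expansion.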
|