Unsupervised Spoken Term Detection with Spoken Queries


Bibliographic Details
Main Authors: Chun-an Chan, 詹竣安
Other Authors: Lin-shan Lee
Format: Others
Language: en_US
Published: 2012
Online Access: http://ndltd.ncl.edu.tw/handle/64207135227834246049
id ndltd-TW-100NTU05435075
record_format oai_dc
spelling ndltd-TW-100NTU05435075 2015-10-13T21:50:17Z http://ndltd.ncl.edu.tw/handle/64207135227834246049 Unsupervised Spoken Term Detection with Spoken Queries 以口語查詢之非督導式口語詞彙偵測 Chun-an Chan 詹竣安 Lin-shan Lee 李琳山 2012 學位論文 ; thesis 75 en_US
collection NDLTD
language en_US
format Others
sources NDLTD
description Ph.D. === National Taiwan University === Graduate Institute of Communication Engineering === 100 === Unsupervised spoken term detection (STD) with spoken queries is a new and important topic in multimedia retrieval. Unsupervised approaches need no annotated data and therefore bypass various problems in speech recognition, particularly recognition errors under different acoustic and linguistic conditions. Such approaches even make it possible to search for spoken terms in low-resource languages or languages without a writing system. In this dissertation, we propose several techniques for unsupervised STD with spoken queries. We propose two improved DTW-based approaches that address the speaking rate distortion and computational efficiency issues of the conventional segmental DTW approach. The Slope-Constrained Dynamic Time Warping (SC-DTW) approach is developed to handle speaking rate distortion. The segment-based DTW approach is devised to reduce the computational burden. The concatenation of these two approaches, together with the Weighted Pseudo Similarity of SC-DTW in the Pseudo Relevance Feedback (PRF) framework, yields significant improvements in both detection performance and efficiency. We also propose two model-based approaches for unsupervised STD. We design procedures to construct a set of Acoustic Segment Models (ASMs) that describe the patterns and structures of the target language, so that signal trajectory modeling techniques can be leveraged. Using the ASMs, we propose the Document State Matching (DSM) approach, which matches spoken queries to the ASM states in the documents. The Duration-Constrained Viterbi algorithm is developed for the DSM approach. A Pseudo Likelihood Ratio approach is also proposed to verify the hypotheses in the PRF framework. Experimental results show that the model-based approaches achieve comparable detection performance in much less computation time.
Our migration from DTW-based to model-based approaches opens the possibility of leveraging well-developed model-based speech processing techniques in unsupervised STD. Finally, we tested various configurations for integrating the approaches in our system. With the combined model-based and DTW-based approaches, an absolute Mean Average Precision improvement of 14.2% was achieved using only 23% of CPU time on the Mandarin broadcast news corpus.
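To give a concrete sense of what a slope constraint does in dynamic time warping, the following is a minimal sketch in Python. It is not the dissertation's SC-DTW: the record does not specify that algorithm, so this sketch assumes a classic local step pattern {(1,1), (1,2), (2,1)}, which keeps the warping slope between 1/2 and 2. The function name, the absolute-difference frame distance, the 1-D inputs, and the whole-sequence alignment (rather than detection inside a longer document) are all illustrative assumptions.

```python
import math


def slope_constrained_dtw(query, doc):
    """Minimum accumulated distance between two 1-D sequences.

    Hypothetical illustration only: the allowed steps are (1,1), (1,2)
    and (2,1), so the path slope stays within [1/2, 2]. Frames skipped
    by a (1,2) or (2,1) step contribute no cost in this simplified form.
    Returns math.inf when no path satisfies the constraint.
    """
    n, m = len(query), len(doc)
    # acc[i][j] = best cost of aligning query[:i] with doc[:j]
    acc = [[math.inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best = acc[i - 1][j - 1]                 # diagonal step (1,1)
            if j >= 2:
                best = min(best, acc[i - 1][j - 2])  # step (1,2): doc is slower
            if i >= 2:
                best = min(best, acc[i - 2][j - 1])  # step (2,1): doc is faster
            if best < math.inf:
                acc[i][j] = abs(query[i - 1] - doc[j - 1]) + best
    return acc[n][m]
```

With this step pattern, a document segment spoken at up to half or double the query's rate can still be aligned, while more extreme warps are rejected outright instead of being matched at an artificially low cost.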
author2 Lin-shan Lee
author_facet Lin-shan Lee
Chun-an Chan
詹竣安
author Chun-an Chan
詹竣安
author_sort Chun-an Chan
title Unsupervised Spoken Term Detection with Spoken Queries
publishDate 2012
url http://ndltd.ncl.edu.tw/handle/64207135227834246049