Unsupervised Spoken Term Detection with Spoken Queries
Doctoral === National Taiwan University === Graduate Institute of Communication Engineering === 100 === Unsupervised spoken term detection (STD) with spoken queries is a new and important topic in multimedia retrieval. The unsupervised approaches without the need of annotated data bypass various problems in speech recognition, particularly the recognition errors u...
Main Authors: | Chun-an Chan, 詹竣安 |
---|---|
Other Authors: | Lin-shan Lee |
Format: | Others |
Language: | en_US |
Published: | 2012 |
Online Access: | http://ndltd.ncl.edu.tw/handle/64207135227834246049 |
id |
ndltd-TW-100NTU05435075 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-100NTU05435075 2015-10-13T21:50:17Z http://ndltd.ncl.edu.tw/handle/64207135227834246049 Unsupervised Spoken Term Detection with Spoken Queries 以口語查詢之非督導式口語詞彙偵測 Chun-an Chan 詹竣安 Doctoral National Taiwan University Graduate Institute of Communication Engineering 100 Unsupervised spoken term detection (STD) with spoken queries is a new and important topic in multimedia retrieval. Because unsupervised approaches need no annotated data, they bypass various problems in speech recognition, particularly recognition errors under different acoustic and linguistic conditions. Such approaches even make it possible to search for spoken terms in low-resource languages or languages without a writing system. In this dissertation, we propose several techniques for unsupervised STD with spoken queries. We propose two improved DTW-based approaches that address the speaking-rate distortion and computational efficiency issues of the conventional segmental DTW approach. The Slope-Constrained Dynamic Time Warping (SC-DTW) approach is developed to handle speaking-rate distortion. The segment-based DTW approach is devised to reduce the computational burden. The concatenation of these two approaches and the Weighted Pseudo Similarity of SC-DTW approach in the Pseudo Relevance Feedback (PRF) framework show significant improvements in both detection performance and efficiency. We also propose two model-based approaches for unsupervised STD. We design procedures to construct a set of Acoustic Segment Models (ASMs) that describe the patterns and structures of the target language. In this way, signal trajectory modeling techniques can be leveraged through the ASMs. Using the ASMs, we propose the Document State Matching (DSM) approach to match spoken queries to the ASM states in the documents. The Duration-Constrained Viterbi algorithm is developed for the DSM approach. A Pseudo Likelihood Ratio approach is also proposed to verify the hypotheses in the PRF framework.
Experimental results show that the model-based approaches achieve comparable detection performance with much less computation time. Migrating from DTW-based approaches to model-based approaches opens the possibility of leveraging well-developed model-based speech processing techniques in unsupervised STD. Finally, we tested various configurations for integrating the approaches in our system. With the combined model-based and DTW-based approaches, an absolute Mean Average Precision improvement of 14.2% was achieved using only 23% of the CPU time on the Mandarin broadcast news corpus. Lin-shan Lee 李琳山 2012 學位論文 ; thesis 75 en_US |
collection |
NDLTD |
language |
en_US |
format |
Others |
sources |
NDLTD |
description |
Doctoral === National Taiwan University === Graduate Institute of Communication Engineering === 100 === Unsupervised spoken term detection (STD) with spoken queries is a new and important topic in multimedia retrieval. Because unsupervised approaches need no annotated data, they bypass various problems in speech recognition, particularly recognition errors under different acoustic and linguistic conditions. Such approaches even make it possible to search for spoken terms in low-resource languages or languages without a writing system. In this dissertation, we propose several techniques for unsupervised STD with spoken queries.
We propose two improved DTW-based approaches that address the speaking-rate distortion and computational efficiency issues of the conventional segmental DTW approach. The Slope-Constrained Dynamic Time Warping (SC-DTW) approach is developed to handle speaking-rate distortion. The segment-based DTW approach is devised to reduce the computational burden. The concatenation of these two approaches and the Weighted Pseudo Similarity of SC-DTW approach in the Pseudo Relevance Feedback (PRF) framework show significant improvements in both detection performance and efficiency.
We also propose two model-based approaches for unsupervised STD. We design procedures to construct a set of Acoustic Segment Models (ASMs) that describe the patterns and structures of the target language. In this way, signal trajectory modeling techniques can be leveraged through the ASMs. Using the ASMs, we propose the Document State Matching (DSM) approach to match spoken queries to the ASM states in the documents. The Duration-Constrained Viterbi algorithm is developed for the DSM approach. A Pseudo Likelihood Ratio approach is also proposed to verify the hypotheses in the PRF framework. Experimental results show that the model-based approaches achieve comparable detection performance with much less computation time. Migrating from DTW-based approaches to model-based approaches opens the possibility of leveraging well-developed model-based speech processing techniques in unsupervised STD.
Finally, we tested various configurations for integrating the approaches in our system. With the combined model-based and DTW-based approaches, an absolute Mean Average Precision improvement of 14.2% was achieved using only 23% of the CPU time on the Mandarin broadcast news corpus.
|
author2 |
Lin-shan Lee |
author_facet |
Lin-shan Lee Chun-an Chan 詹竣安 |
author |
Chun-an Chan 詹竣安 |
spellingShingle |
Chun-an Chan 詹竣安 Unsupervised Spoken Term Detection with Spoken Queries |
author_sort |
Chun-an Chan |
title |
Unsupervised Spoken Term Detection with Spoken Queries |
title_short |
Unsupervised Spoken Term Detection with Spoken Queries |
title_full |
Unsupervised Spoken Term Detection with Spoken Queries |
title_fullStr |
Unsupervised Spoken Term Detection with Spoken Queries |
title_full_unstemmed |
Unsupervised Spoken Term Detection with Spoken Queries |
title_sort |
unsupervised spoken term detection with spoken queries |
publishDate |
2012 |
url |
http://ndltd.ncl.edu.tw/handle/64207135227834246049 |
work_keys_str_mv |
AT chunanchan unsupervisedspokentermdetectionwithspokenqueries AT zhānjùnān unsupervisedspokentermdetectionwithspokenqueries AT chunanchan yǐkǒuyǔcháxúnzhīfēidūdǎoshìkǒuyǔcíhuìzhēncè AT zhānjùnān yǐkǒuyǔcháxúnzhīfēidūdǎoshìkǒuyǔcíhuìzhēncè |
_version_ |
1718068958847827968 |
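
The description above mentions constraining the slope of a DTW warping path to limit speaking-rate distortion. The record does not give the dissertation's actual SC-DTW formulation, so the following is only a minimal sketch of a generic local slope constraint, assuming scalar frame features, an absolute-difference frame distance, and the textbook step set (1,1), (1,2), (2,1), which keeps the path slope between 1/2 and 2. The function name `slope_constrained_dtw` and all of these choices are hypothetical and for illustration only.

```python
import math


def slope_constrained_dtw(query, doc):
    """Accumulated distance of the best slope-limited alignment.

    Illustrative assumptions (not taken from the dissertation):
    - frames are scalars and the frame distance is |a - b|;
    - the path must start at (0, 0) and end at (len(query)-1, len(doc)-1);
    - allowed steps are (1,1), (1,2), (2,1), so neither sequence can be
      stretched by more than a factor of two relative to the other;
    - frames skipped by a (1,2) or (2,1) step add no cost.
    Returns math.inf when no allowed path exists.
    """
    n, m = len(query), len(doc)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")

    steps = ((1, 1), (1, 2), (2, 1))
    cost = [[math.inf] * m for _ in range(n)]
    cost[0][0] = abs(query[0] - doc[0])

    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            # Best predecessor reachable through one of the allowed steps.
            best = math.inf
            for di, dj in steps:
                pi, pj = i - di, j - dj
                if pi >= 0 and pj >= 0:
                    best = min(best, cost[pi][pj])
            if best < math.inf:
                cost[i][j] = best + abs(query[i] - doc[j])

    return cost[n - 1][m - 1]
```

In this sketch a three-frame query can align with a document segment of at most five frames. Longer segments yield `math.inf`, which is how the slope limit rejects implausible speaking-rate differences.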