Unsupervised Discovery of Structured Acoustic Tokens and Speech Features with Applications to Spoken Term Detection

Doctoral === National Taiwan University === Graduate Institute of Electrical Engineering === 105 === In the era of big data, huge quantities of raw speech data are easy to obtain, but annotated speech data remain hard to acquire. This increases the importance of unsupervised learning scenarios in which annotated data are not required, a typical application...

Bibliographic Details
Main Authors:	Cheng-Tao Chung, 鍾承道
Other Authors:	Lin-Shan Lee
Format:	Others
Language:	en_US
Published:	2017
Online Access:	http://ndltd.ncl.edu.tw/handle/p9p96r

id	ndltd-TW-105NTU05442082
record_format	oai_dc
spelling	ndltd-TW-105NTU05442082 2019-05-15T23:39:40Z http://ndltd.ncl.edu.tw/handle/p9p96r Unsupervised Discovery of Structured Acoustic Tokens and Speech Features with Applications to Spoken Term Detection 無監督式結構化語音模型和語音特徵及其在語音檢索的運用 Cheng-Tao Chung 鍾承道 Doctoral, National Taiwan University, Graduate Institute of Electrical Engineering, 105 In the era of big data, huge quantities of raw speech data are easy to obtain, but annotated speech data remain hard to acquire. This increases the importance of unsupervised learning scenarios in which annotated data are not required, a typical application of which is Query-by-Example Spoken Term Detection (QbE-STD). Because the dominant paradigm in automatic speech recognition (ASR) is supervised learning, such scenarios remain relatively unexplored. In this thesis, we present the Hierarchical Paradigm and the Multi-granularity Paradigm for unsupervised discovery of structured acoustic tokens directly from speech corpora. The Hierarchical Paradigm attempts to jointly learn two levels of representation that correlate with phonemes and words. The Multi-granularity Paradigm makes no assumptions about which set of tokens to select, and seeks to capture all available information using multiple sets of tokens with different model granularities. Furthermore, unsupervised speech features can be extracted from the multi-granular acoustic tokens with a framework that we call the Multi-granular Acoustic Tokenizing Deep Neural Network (MAT-DNN). We unify the two paradigms in a single theoretical framework and perform query-by-example spoken term detection experiments on the token sets and frame-level features. The theories and principles on acoustic tokens and frame-level features proposed in this thesis are supported by competitive results against strong baselines on standard corpora using well-defined metrics. Lin-Shan Lee 李琳山 2017 學位論文 ; thesis 97 en_US
collection	NDLTD
language	en_US
format	Others
sources	NDLTD
description	Doctoral === National Taiwan University === Graduate Institute of Electrical Engineering === 105 === In the era of big data, huge quantities of raw speech data are easy to obtain, but annotated speech data remain hard to acquire. This increases the importance of unsupervised learning scenarios in which annotated data are not required, a typical application of which is Query-by-Example Spoken Term Detection (QbE-STD). Because the dominant paradigm in automatic speech recognition (ASR) is supervised learning, such scenarios remain relatively unexplored. In this thesis, we present the Hierarchical Paradigm and the Multi-granularity Paradigm for unsupervised discovery of structured acoustic tokens directly from speech corpora. The Hierarchical Paradigm attempts to jointly learn two levels of representation that correlate with phonemes and words. The Multi-granularity Paradigm makes no assumptions about which set of tokens to select, and seeks to capture all available information using multiple sets of tokens with different model granularities. Furthermore, unsupervised speech features can be extracted from the multi-granular acoustic tokens with a framework that we call the Multi-granular Acoustic Tokenizing Deep Neural Network (MAT-DNN). We unify the two paradigms in a single theoretical framework and perform query-by-example spoken term detection experiments on the token sets and frame-level features. The theories and principles on acoustic tokens and frame-level features proposed in this thesis are supported by competitive results against strong baselines on standard corpora using well-defined metrics.
author2	Lin-Shan Lee
author_facet	Lin-Shan Lee Cheng-Tao Chung 鍾承道
author	Cheng-Tao Chung 鍾承道
author_sort	Cheng-Tao Chung
title	Unsupervised Discovery of Structured Acoustic Tokens and Speech Features with Applications to Spoken Term Detection
publishDate	2017
url	http://ndltd.ncl.edu.tw/handle/p9p96r

Similar Items