Efficient Alignment between Long Utterances and Texts

Master's === National Tsing Hua University === Institute of Information Systems and Applications === 100 === This thesis describes our research on the efficient alignment of long utterances to texts. Currently, the most widely used toolkit for speech-to-text alignment is the Hidden Markov Model Toolkit (HTK), but HTK is not designed for long inputs and would ca...

Bibliographic Details
Main Authors:	Peng, Yu-Ya, 彭郁雅
Other Authors:	Jang, Jyh-Shing Roger
Format:	Others
Language:	zh-TW
Published:	2012
Online Access:	http://ndltd.ncl.edu.tw/handle/16620738245555736406

id	ndltd-TW-100NTHU5394031
record_format	oai_dc
spelling	ndltd-TW-100NTHU5394031 2015-10-13T21:27:24Z http://ndltd.ncl.edu.tw/handle/16620738245555736406 Efficient Alignment between Long Utterances and Texts 長音檔與文本之快速對位 Peng, Yu-Ya, 彭郁雅 Master's, National Tsing Hua University, Institute of Information Systems and Applications 100 This thesis describes our research on the efficient alignment of long utterances to texts. Currently, the most widely used toolkit for speech-to-text alignment is the Hidden Markov Model Toolkit (HTK), but HTK is not designed for long inputs and causes stack overflows when handling them. Another known system, SailAlign, which uses large-vocabulary recognition, can handle longer inputs but has lower recognition rates and is more time-consuming. We present a method that does not require large-vocabulary recognition: a dynamic programming (DP) algorithm slices and combines each audio file and its text into sentence-level segments, which are then force-aligned with HTK, reducing HTK's overall workload. We use speech recordings from the TED website as the corpus and the recordings' transcripts as the text. We compare alignment efficiency on this corpus using HTK, SailAlign, and the proposed DP method. For reasons not yet verified, the results suggest poorer accuracy with the DP method, but processing time decreases by over 37%. Jang, Jyh-Shing Roger 張智星 2012 thesis 39 zh-TW
collection	NDLTD
language	zh-TW
format	Others
sources	NDLTD
description	Master's === National Tsing Hua University === Institute of Information Systems and Applications === 100 === This thesis describes our research on the efficient alignment of long utterances to texts. Currently, the most widely used toolkit for speech-to-text alignment is the Hidden Markov Model Toolkit (HTK), but HTK is not designed for long inputs and causes stack overflows when handling them. Another known system, SailAlign, which uses large-vocabulary recognition, can handle longer inputs but has lower recognition rates and is more time-consuming. We present a method that does not require large-vocabulary recognition: a dynamic programming (DP) algorithm slices and combines each audio file and its text into sentence-level segments, which are then force-aligned with HTK, reducing HTK's overall workload. We use speech recordings from the TED website as the corpus and the recordings' transcripts as the text. We compare alignment efficiency on this corpus using HTK, SailAlign, and the proposed DP method. For reasons not yet verified, the results suggest poorer accuracy with the DP method, but processing time decreases by over 37%.
author2	Jang, Jyh-Shing Roger
author_facet	Jang, Jyh-Shing Roger Peng, Yu-Ya, 彭郁雅
author	Peng, Yu-Ya, 彭郁雅
spellingShingle	Peng, Yu-Ya, 彭郁雅 Efficient Alignment between Long Utterances and Texts
author_sort	Peng, Yu-Ya,
title	Efficient Alignment between Long Utterances and Texts
publishDate	2012
url	http://ndltd.ncl.edu.tw/handle/16620738245555736406
