Efficient Alignment between Long Utterances and Texts

Master's === National Tsing Hua University === Institute of Information Systems and Applications === 100 === This thesis describes our research on the efficient alignment of long utterances with texts. Currently, the most widely used toolkit for speech-to-text alignment is the Hidden Markov Model Toolkit (HTK), but HTK is not designed for long inputs and would ca...

Bibliographic Details
Main Authors: Peng, Yu-Ya, 彭郁雅
Other Authors: Jang, Jyh-Shing Roger
Format: Others
Language: zh-TW
Published: 2012
Online Access: http://ndltd.ncl.edu.tw/handle/16620738245555736406
id ndltd-TW-100NTHU5394031
record_format oai_dc
spelling ndltd-TW-100NTHU5394031 2015-10-13T21:27:24Z http://ndltd.ncl.edu.tw/handle/16620738245555736406 Efficient Alignment between Long Utterances and Texts 長音檔與文本之快速對位 Peng, Yu-Ya, 彭郁雅 Master's National Tsing Hua University Institute of Information Systems and Applications 100 This thesis describes our research on the efficient alignment of long utterances with texts. Currently, the most widely used toolkit for speech-to-text alignment is the Hidden Markov Model Toolkit (HTK), but HTK is not designed for long inputs and causes stack overflows when handling them. Another known system, SailAlign, which uses large-vocabulary recognition, can handle longer inputs but has lower recognition rates and is more time-consuming. We present a method that does not require large-vocabulary recognition: a dynamic programming (DP) algorithm slices and combines each audio file and its text into sentence-level segments, which are then force-aligned with HTK, reducing HTK's overall workload. We use speech recordings from the TED website as the corpus and the recordings' transcripts as the text. We compare alignment efficiency on this corpus using HTK, SailAlign, and the proposed DP method. For reasons not yet verified, the results suggest poorer accuracy with the DP method, but processing time is reduced by more than 37%. Jang, Jyh-Shing Roger 張智星 2012 Degree thesis ; thesis 39 zh-TW
collection NDLTD
language zh-TW
format Others
sources NDLTD
description Master's === National Tsing Hua University === Institute of Information Systems and Applications === 100 === This thesis describes our research on the efficient alignment of long utterances with texts. Currently, the most widely used toolkit for speech-to-text alignment is the Hidden Markov Model Toolkit (HTK), but HTK is not designed for long inputs and causes stack overflows when handling them. Another known system, SailAlign, which uses large-vocabulary recognition, can handle longer inputs but has lower recognition rates and is more time-consuming. We present a method that does not require large-vocabulary recognition: a dynamic programming (DP) algorithm slices and combines each audio file and its text into sentence-level segments, which are then force-aligned with HTK, reducing HTK's overall workload. We use speech recordings from the TED website as the corpus and the recordings' transcripts as the text. We compare alignment efficiency on this corpus using HTK, SailAlign, and the proposed DP method. For reasons not yet verified, the results suggest poorer accuracy with the DP method, but processing time is reduced by more than 37%.
author2 Jang, Jyh-Shing Roger
author_facet Jang, Jyh-Shing Roger
Peng, Yu-Ya,
彭郁雅
author Peng, Yu-Ya,
彭郁雅
spellingShingle Peng, Yu-Ya,
彭郁雅
Efficient Alignment between Long Utterances and Texts
author_sort Peng, Yu-Ya,
title Efficient Alignment between Long Utterances and Texts
title_short Efficient Alignment between Long Utterances and Texts
title_full Efficient Alignment between Long Utterances and Texts
title_fullStr Efficient Alignment between Long Utterances and Texts
title_full_unstemmed Efficient Alignment between Long Utterances and Texts
title_sort efficient alignment between long utterances and texts
publishDate 2012
url http://ndltd.ncl.edu.tw/handle/16620738245555736406
work_keys_str_mv AT pengyuya efficientalignmentbetweenlongutterancesandtexts
AT péngyùyǎ efficientalignmentbetweenlongutterancesandtexts
AT pengyuya zhǎngyīndàngyǔwénběnzhīkuàisùduìwèi
AT péngyùyǎ zhǎngyīndàngyǔwénběnzhīkuàisùduìwèi
_version_ 1718062733309509632
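
Illustrative note: the abstract says a dynamic programming step cuts each long recording and its text into sentence-level segments before forced alignment, but this record does not give the algorithm's details. The sketch below is therefore only one plausible reading, not the thesis's method. It assumes that candidate cut points (for example, detected pauses) are already available and that each sentence's expected duration is proportional to its word count. The name segment_by_dp and both inputs are hypothetical.

```python
from typing import List


def segment_by_dp(word_counts: List[int], cut_points: List[float]) -> List[float]:
    """Choose one end time per sentence from sorted candidate cut points.

    Assumptions (not from the thesis): cut_points[-1] is the audio length,
    and a sentence's expected duration is proportional to its word count.
    """
    n, m = len(word_counts), len(cut_points)
    if n == 0 or m < n:
        raise ValueError("need at least one candidate cut point per sentence")

    total_time = cut_points[-1]
    total_words = sum(word_counts)
    expected = [total_time * w / total_words for w in word_counts]

    inf = float("inf")
    # cost[i][j]: lowest total deviation when sentence i ends at cut_points[j]
    cost = [[inf] * m for _ in range(n)]
    back = [[-1] * m for _ in range(n)]

    for j in range(m):
        cost[0][j] = abs(cut_points[j] - expected[0])

    for i in range(1, n):
        for j in range(i, m):
            for k in range(i - 1, j):  # k: cut where sentence i-1 ended
                duration = cut_points[j] - cut_points[k]
                c = cost[i - 1][k] + abs(duration - expected[i])
                if c < cost[i][j]:
                    cost[i][j] = c
                    back[i][j] = k

    # Backtrack: the last sentence must end at the end of the audio.
    ends: List[float] = []
    j = m - 1
    for i in range(n - 1, -1, -1):
        ends.append(cut_points[j])
        j = back[i][j]
    ends.reverse()
    return ends


if __name__ == "__main__":
    # Three sentences of 2, 6, and 2 words; pauses at 2 s, 5 s, 8 s; audio ends at 10 s.
    print(segment_by_dp([2, 6, 2], [2.0, 5.0, 8.0, 10.0]))
```

Under these assumptions, each returned end time marks a short segment that could be passed to a forced aligner on its own, which is the workload reduction the abstract describes.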