Efficient Alignment between Long Utterances and Texts
Master's === National Tsing Hua University === Institute of Information Systems and Applications === 100 === This thesis describes our research on the efficient alignment of long utterances to texts. Currently, the most widely used toolkit for speech-to-text alignment is the Hidden Markov Model Toolkit (HTK), but HTK is not designed for relatively long inputs and would ca...
Main Authors: | Peng, Yu-Ya, 彭郁雅 |
---|---|
Other Authors: | Jang, Jyh-Shing Roger |
Format: | Others |
Language: | zh-TW |
Published: | 2012 |
Online Access: | http://ndltd.ncl.edu.tw/handle/16620738245555736406 |
id |
ndltd-TW-100NTHU5394031 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-100NTHU5394031 2015-10-13T21:27:24Z http://ndltd.ncl.edu.tw/handle/16620738245555736406 Efficient Alignment between Long Utterances and Texts 長音檔與文本之快速對位 Peng, Yu-Ya, 彭郁雅 Master's National Tsing Hua University Institute of Information Systems and Applications 100 This thesis describes our research on the efficient alignment of long utterances to texts. Currently, the most widely used toolkit for speech-to-text alignment is the Hidden Markov Model Toolkit (HTK), but HTK is not designed for relatively long inputs and causes stack overflows when handling them. Another known system, SailAlign, which uses large-vocabulary recognition, can handle longer inputs but has lower recognition rates and is more time-consuming. We present a method that does not require large-vocabulary recognition: a dynamic programming (DP) algorithm slices and combines each audio file and its text into sentence-level segments, which are then force-aligned with HTK, thus reducing HTK’s overall workload. We use speech recordings from the TED website as the corpus and the recordings’ transcripts as the text. We compare alignment efficiency on this corpus using HTK, SailAlign, and the proposed DP method. For reasons not yet verified, the results suggest poorer accuracy with the DP method, but processing time decreases by more than 37%. Jang, Jyh-Shing Roger 張智星 2012 degree thesis ; thesis 39 zh-TW |
collection |
NDLTD |
language |
zh-TW |
format |
Others |
sources |
NDLTD |
description |
Master's === National Tsing Hua University === Institute of Information Systems and Applications === 100 === This thesis describes our research on the efficient alignment of long utterances to texts. Currently, the most widely used toolkit for speech-to-text alignment is the Hidden Markov Model Toolkit (HTK), but HTK is not designed for relatively long inputs and causes stack overflows when handling them. Another known system, SailAlign, which uses large-vocabulary recognition, can handle longer inputs but has lower recognition rates and is more time-consuming. We present a method that does not require large-vocabulary recognition: a dynamic programming (DP) algorithm slices and combines each audio file and its text into sentence-level segments, which are then force-aligned with HTK, thus reducing HTK’s overall workload.
We use speech recordings from the TED website as the corpus and the recordings’ transcripts as the text. We compare alignment efficiency on this corpus using HTK, SailAlign, and the proposed DP method. For reasons not yet verified, the results suggest poorer accuracy with the DP method, but processing time decreases by more than 37%. |
author2 |
Jang, Jyh-Shing Roger |
author_facet |
Jang, Jyh-Shing Roger Peng, Yu-Ya, 彭郁雅 |
author |
Peng, Yu-Ya, 彭郁雅 |
spellingShingle |
Peng, Yu-Ya, 彭郁雅 Efficient Alignment between Long Utterances and Texts |
author_sort |
Peng, Yu-Ya, |
title |
Efficient Alignment between Long Utterances and Texts |
title_short |
Efficient Alignment between Long Utterances and Texts |
title_full |
Efficient Alignment between Long Utterances and Texts |
title_fullStr |
Efficient Alignment between Long Utterances and Texts |
title_full_unstemmed |
Efficient Alignment between Long Utterances and Texts |
title_sort |
efficient alignment between long utterances and texts |
publishDate |
2012 |
url |
http://ndltd.ncl.edu.tw/handle/16620738245555736406 |
work_keys_str_mv |
AT pengyuya efficientalignmentbetweenlongutterancesandtexts AT péngyùyǎ efficientalignmentbetweenlongutterancesandtexts AT pengyuya zhǎngyīndàngyǔwénběnzhīkuàisùduìwèi AT péngyùyǎ zhǎngyīndàngyǔwénběnzhīkuàisùduìwèi |
_version_ |
1718062733309509632 |