Recognizing Packet Loss Speech Using Client-End and Server-End Mitigation Techniques

Master's thesis === National Taipei University of Technology === Graduate Institute of Electrical Engineering === ROC academic year 105 === Due to the rapid evolution of artificial intelligence technology, distributed speech recognition (DSR) has become a convenient, standard human-machine interface on smart mobile devices. Speech features are extracted at the client end and sent through the netwo...


Bibliographic Details
Main Author:	Yi-Jia Huang (黃奕嘉)
Other Authors:	Fu-Rong Jean
Format:	Others
Language:	zh-TW
Published:	2017
Online Access:	http://ndltd.ncl.edu.tw/handle/txgp84

id	ndltd-TW-105TIT05442053
record_format	oai_dc
spelling	ndltd-TW-105TIT05442053 2019-05-15T23:53:23Z http://ndltd.ncl.edu.tw/handle/txgp84 Recognizing Packet Loss Speech Using Client-End and Server-End Mitigation Techniques 使用客戶端與伺服端緩解技術辨識遺失封包語音 (Chinese title) Yi-Jia Huang 黃奕嘉 Master's thesis, National Taipei University of Technology, Graduate Institute of Electrical Engineering, ROC academic year 105 Fu-Rong Jean 簡福榮 2017 degree thesis ; thesis 77 zh-TW
collection	NDLTD
language	zh-TW
format	Others
sources	NDLTD
description	Master's thesis === National Taipei University of Technology === Graduate Institute of Electrical Engineering === ROC academic year 105 === Due to the rapid evolution of artificial intelligence technology, distributed speech recognition (DSR) has become a convenient, standard human-machine interface on smart mobile devices. Speech features are extracted at the client end and sent over the network channel to the server end for recognition. Speech feature packets may be lost over error-prone channels because of unavoidable delays or transmission errors. This thesis aims to find ways of reducing the effect of packet loss on speech recognition. Mel-frequency cepstral coefficients (MFCCs) are used as the speech features. Matrix interleaving at the client end and matrix de-interleaving at the server end are applied to reshape bursty packet loss into a more uniform distribution before recognition, which reduces the damage to recognition. Missing frames are then reconstructed with either the standard ETSI duplication scheme or feature interpolation. In addition to a full-frame-rate decoder, recognition experiments are also conducted with the weighted Viterbi algorithm (WVA) and a model adaptation (MA) method. The experimental results show that matrix interleaving and de-interleaving do reduce the length of runs of consecutive missing frames, and that the higher the interleaving order, the smaller the expected degradation in recognition. In the experiments, feature interpolation performs slightly better than ETSI duplication. Of all the methods, model adaptation achieves the highest recognition rate averaged over all SNR conditions.
author2	Fu-Rong Jean
author	Yi-Jia Huang 黃奕嘉
title	Recognizing Packet Loss Speech Using Client-End and Server-End Mitigation Techniques
publishDate	2017
url	http://ndltd.ncl.edu.tw/handle/txgp84
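The description outlines a client/server mitigation chain: interleave frames before transmission, de-interleave them on receipt, then reconstruct whatever is still missing. The Python sketch below illustrates that chain under stated assumptions; it is not the thesis's implementation. It assumes a square order × order block interleaver (frames written row by row, read column by column), marks a lost frame as None, and uses toy one-dimensional "features" in place of real MFCC vectors. All function names (interleave, deinterleave, longest_gap, repeat_nearest, interpolate_linear) are hypothetical. repeat_nearest only approximates the repetition idea behind ETSI duplication (the first half of a gap copies the last good frame before it, the second half the first good frame after it); the normative scheme is defined in ETSI ES 201 108. interpolate_linear is plain linear interpolation between received neighbours, and the thesis may interpolate differently.

```python
from typing import Iterator, List, Optional, Tuple

# A frame is one feature vector (e.g. MFCCs); None marks a lost frame.
Frame = Optional[List[float]]


def interleave(frames: List[Frame], order: int) -> List[Frame]:
    """Assumed square block interleaver: write each order*order block row by row,
    read it out column by column."""
    n = order * order
    if len(frames) % n:
        raise ValueError("this sketch needs len(frames) to be a multiple of order**2")
    out: List[Frame] = []
    for start in range(0, len(frames), n):
        block = frames[start:start + n]
        for c in range(order):
            for r in range(order):
                out.append(block[r * order + c])
    return out


def deinterleave(frames: List[Frame], order: int) -> List[Frame]:
    """Inverse of interleave(): transposing a square block twice restores it."""
    return interleave(frames, order)


def _gaps(frames: List[Frame]) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) pairs, end exclusive, for each run of lost frames."""
    i = 0
    while i < len(frames):
        if frames[i] is None:
            j = i
            while j < len(frames) and frames[j] is None:
                j += 1
            yield i, j
            i = j
        else:
            i += 1


def longest_gap(frames: List[Frame]) -> int:
    """Length of the longest run of consecutive lost frames."""
    return max((e - s for s, e in _gaps(frames)), default=0)


def repeat_nearest(frames: List[Frame]) -> List[Frame]:
    """Repetition in the spirit of ETSI duplication: first half of a gap copies
    the frame before it, second half copies the frame after it."""
    out = list(frames)
    for s, e in _gaps(frames):
        left = frames[s - 1] if s > 0 else None
        right = frames[e] if e < len(frames) else None
        if left is None and right is None:
            raise ValueError("every frame is missing")
        b = e - s
        for k in range(b):
            use_left = right is None or (left is not None and k <= (b - 1) / 2)
            out[s + k] = list(left if use_left else right)
    return out


def interpolate_linear(frames: List[Frame]) -> List[Frame]:
    """Linear interpolation between the received neighbours of each gap;
    edge gaps fall back to repeating the single available neighbour."""
    out = list(frames)
    for s, e in _gaps(frames):
        if s == 0 and e == len(frames):
            raise ValueError("every frame is missing")
        if s == 0 or e == len(frames):
            src = frames[e] if s == 0 else frames[s - 1]
            for i in range(s, e):
                out[i] = list(src)
            continue
        a, b = frames[s - 1], frames[e]
        span = e - (s - 1)
        for i in range(s, e):
            t = (i - (s - 1)) / span
            out[i] = [xa + t * (xb - xa) for xa, xb in zip(a, b)]
    return out


if __name__ == "__main__":
    order = 4
    frames: List[Frame] = [[float(i)] for i in range(16)]  # toy 1-D "features"
    sent = interleave(frames, order)
    # Hypothetical channel: a burst wipes out transmitted packets 4..7.
    received = [None if 4 <= p < 8 else f for p, f in enumerate(sent)]
    restored = deinterleave(received, order)
    burst_only = [None if 4 <= i < 8 else f for i, f in enumerate(frames)]
    print(longest_gap(burst_only))          # 4 (no interleaving)
    print(longest_gap(restored))            # 1 (lost frames 1, 5, 9, 13)
    print(repeat_nearest(restored)[5])      # [4.0]
    print(interpolate_linear(restored)[5])  # [5.0]
```

With order 4, a burst that wipes out four consecutive packets leaves only isolated single-frame gaps after de-interleaving, whereas the same burst without interleaving removes four consecutive frames. The trade-off, which the sketch does not model, is added latency: the receiver must buffer a whole order × order block before it can de-interleave, so a higher order costs more delay.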

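The description also mentions decoding with the weighted Viterbi algorithm (WVA). One common formulation, assumed here rather than taken from the thesis, raises each frame's emission probability to a reliability weight γ_t in [0, 1], so in the log domain the emission term is multiplied by γ_t: received frames keep γ_t = 1, and reconstructed frames get a smaller weight. The function name weighted_viterbi and the toy two-state model are hypothetical, and the model's probabilities are invented purely for illustration.

```python
import math
from typing import List, Sequence, Tuple


def weighted_viterbi(
    log_init: Sequence[float],
    log_trans: Sequence[Sequence[float]],
    log_emit: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> Tuple[float, List[int]]:
    """Log-domain Viterbi decoding with per-frame reliability weights.

    Assumed formulation: b_j(o_t) is raised to the power weights[t], i.e. its
    log is multiplied by the weight. Emission logs must be finite so that
    0 * log b stays well defined. Returns (best log score, best state path).
    """
    n_states = len(log_init)
    delta = [log_init[j] + weights[0] * log_emit[0][j] for j in range(n_states)]
    backpointers: List[List[int]] = []
    for t in range(1, len(log_emit)):
        new_delta, ptr = [], []
        for j in range(n_states):
            # Best predecessor state for state j at frame t.
            best = max(range(n_states), key=lambda i: delta[i] + log_trans[i][j])
            new_delta.append(delta[best] + log_trans[best][j] + weights[t] * log_emit[t][j])
            ptr.append(best)
        delta = new_delta
        backpointers.append(ptr)
    # Trace back from the best final state.
    last = max(range(n_states), key=lambda j: delta[j])
    path = [last]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    path.reverse()
    return delta[last], path


if __name__ == "__main__":
    log = math.log
    # Toy two-state model; all probabilities are invented for illustration.
    log_init = [log(0.6), log(0.4)]
    log_trans = [[log(0.9), log(0.1)], [log(0.1), log(0.9)]]
    log_emit = [
        [log(0.9), log(0.1)],
        [log(0.01), log(0.99)],  # frame 1: pretend it was reconstructed, not received
        [log(0.95), log(0.05)],
    ]
    print(weighted_viterbi(log_init, log_trans, log_emit, [1.0, 1.0, 1.0])[1])  # [0, 1, 0]
    print(weighted_viterbi(log_init, log_trans, log_emit, [1.0, 0.0, 1.0])[1])  # [0, 0, 0]
```

In the toy run, the middle frame's emission strongly favours state 1. At full weight the decoder follows it; setting that frame's weight to 0, as one might for a reconstructed frame, lets the transition probabilities and neighbouring frames decide. How to weight frames that are only partly reliable is a design choice this sketch leaves open.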