Recognizing Packet Loss Speech Using Client-End and Server-End Mitigation Techniques
Master's === National Taipei University of Technology === Graduate Institute of Electrical Engineering === 105 === Owing to the rapid evolution of artificial intelligence technology, distributed speech recognition has become a convenient, standard human-machine interface on smart mobile devices. Speech features are extracted at the client end and sent through the netwo...
Main Authors: | Yi-Jia Huang, 黃奕嘉 |
---|---|
Other Authors: | Fu-Rong Jean |
Format: | Others |
Language: | zh-TW |
Published: | 2017 |
Online Access: | http://ndltd.ncl.edu.tw/handle/txgp84 |
id |
ndltd-TW-105TIT05442053 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-105TIT05442053 2019-05-15T23:53:23Z http://ndltd.ncl.edu.tw/handle/txgp84 Recognizing Packet Loss Speech Using Client-End and Server-End Mitigation Techniques 使用客戶端與伺服端緩解技術辨識遺失封包語音 Yi-Jia Huang 黃奕嘉 Master's National Taipei University of Technology Graduate Institute of Electrical Engineering 105 Owing to the rapid evolution of artificial intelligence technology, distributed speech recognition has become a convenient, standard human-machine interface on smart mobile devices. Speech features are extracted at the client end and sent through the network channel to the server end for recognition. Speech feature packets may be lost because of unavoidable delay or transmission errors over error-prone channels. This thesis aims to find ways of reducing the effect of packet loss on speech recognition. Mel-frequency cepstral coefficients (MFCC) are used as the speech feature parameters. Matrix interleaving at the client end and matrix de-interleaving at the server end are applied to reshape burst-like packet loss into a more uniform distribution before recognition, which results in less damage. Furthermore, either the standard ETSI duplication or feature interpolation is used to reconstruct missing frames. In addition to the full-frame-rate decoder, speech recognition is also tested with the weighted Viterbi algorithm (WVA) and a model adaptation (MA) method. The experimental results show that matrix interleaving and de-interleaving do reduce the length of runs of consecutive missing frames, and that the higher the interleaving order, the smaller the recognition damage. In the experiments, feature interpolation performs slightly better than ETSI duplication. Among all methods, model adaptation achieves the highest recognition rate averaged over all SNR conditions. Fu-Rong Jean 簡福榮 2017 Degree thesis ; thesis 77 zh-TW |
collection |
NDLTD |
language |
zh-TW |
format |
Others |
sources |
NDLTD |
description |
Master's === National Taipei University of Technology === Graduate Institute of Electrical Engineering === 105 === Owing to the rapid evolution of artificial intelligence technology, distributed speech recognition has become a convenient, standard human-machine interface on smart mobile devices. Speech features are extracted at the client end and sent through the network channel to the server end for recognition. Speech feature packets may be lost because of unavoidable delay or transmission errors over error-prone channels. This thesis aims to find ways of reducing the effect of packet loss on speech recognition. Mel-frequency cepstral coefficients (MFCC) are used as the speech feature parameters. Matrix interleaving at the client end and matrix de-interleaving at the server end are applied to reshape burst-like packet loss into a more uniform distribution before recognition, which results in less damage. Furthermore, either the standard ETSI duplication or feature interpolation is used to reconstruct missing frames. In addition to the full-frame-rate decoder, speech recognition is also tested with the weighted Viterbi algorithm (WVA) and a model adaptation (MA) method. The experimental results show that matrix interleaving and de-interleaving do reduce the length of runs of consecutive missing frames, and that the higher the interleaving order, the smaller the recognition damage. In the experiments, feature interpolation performs slightly better than ETSI duplication. Among all methods, model adaptation achieves the highest recognition rate averaged over all SNR conditions. |
author2 |
Fu-Rong Jean |
author_facet |
Fu-Rong Jean Yi-Jia Huang 黃奕嘉 |
author |
Yi-Jia Huang 黃奕嘉 |
spellingShingle |
Yi-Jia Huang 黃奕嘉 Recognizing Packet Loss Speech Using Client-End and Server-End Mitigation Techniques |
author_sort |
Yi-Jia Huang |
title |
Recognizing Packet Loss Speech Using Client-End and Server-End Mitigation Techniques |
title_short |
Recognizing Packet Loss Speech Using Client-End and Server-End Mitigation Techniques |
title_full |
Recognizing Packet Loss Speech Using Client-End and Server-End Mitigation Techniques |
title_fullStr |
Recognizing Packet Loss Speech Using Client-End and Server-End Mitigation Techniques |
title_full_unstemmed |
Recognizing Packet Loss Speech Using Client-End and Server-End Mitigation Techniques |
title_sort |
recognizing packet loss speech using client-end and server-end mitigation techniques |
publishDate |
2017 |
url |
http://ndltd.ncl.edu.tw/handle/txgp84 |
work_keys_str_mv |
AT yijiahuang recognizingpacketlossspeechusingclientendandserverendmitigationtechniques AT huángyìjiā recognizingpacketlossspeechusingclientendandserverendmitigationtechniques AT yijiahuang shǐyòngkèhùduānyǔcìfúduānhuǎnjiějìshùbiànshíyíshīfēngbāoyǔyīn AT huángyìjiā shǐyòngkèhùduānyǔcìfúduānhuǎnjiějìshùbiànshíyíshīfēngbāoyǔyīn |
_version_ |
1719156330116153344 |
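
The abstract describes matrix interleaving at the client end, de-interleaving at the server end, and reconstruction of missing frames by duplication or interpolation. The following is a minimal sketch of those ideas, not the thesis's implementation. It assumes a square interleaving matrix (so "order" means the side length), one scalar per frame instead of a full MFCC vector, and `None` as the marker for a lost frame. The function names are hypothetical, and `repeat_nearest` is only a nearest-neighbour repetition in the spirit of the ETSI duplication scheme, not a transcription of the standard. The weighted Viterbi algorithm and model adaptation are not covered.

```python
from typing import Iterator, List, Optional, Tuple

Frame = Optional[float]  # assumption: one scalar per frame; None marks a lost frame


def matrix_interleave(frames: List[Frame], order: int) -> List[Frame]:
    """Write each block of order*order frames row-wise and read it column-wise."""
    block = order * order
    if len(frames) % block != 0:
        raise ValueError("frame count must be a multiple of order**2")
    out: List[Frame] = []
    for start in range(0, len(frames), block):
        for col in range(order):
            for row in range(order):
                out.append(frames[start + row * order + col])
    return out


# Transposing a square matrix twice restores it, so the same routine de-interleaves.
matrix_deinterleave = matrix_interleave


def longest_gap(frames: List[Frame]) -> int:
    """Length of the longest run of consecutive lost frames."""
    best = run = 0
    for f in frames:
        run = run + 1 if f is None else 0
        best = max(best, run)
    return best


def _gaps(frames: List[Frame]) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) for each run of lost frames; end is exclusive."""
    i, n = 0, len(frames)
    while i < n:
        if frames[i] is None:
            j = i
            while j < n and frames[j] is None:
                j += 1
            yield i, j
            i = j
        else:
            i += 1


def repeat_nearest(frames: List[Frame]) -> List[Frame]:
    """Fill the first half of each gap from the left neighbour, the rest from the right."""
    out = list(frames)
    for start, end in _gaps(frames):
        left = frames[start - 1] if start > 0 else None
        right = frames[end] if end < len(frames) else None
        if left is None and right is None:
            continue  # nothing was received at all
        mid = start + (end - start + 1) // 2
        for k in range(start, end):
            use_left = right is None or (left is not None and k < mid)
            out[k] = left if use_left else right
    return out


def interpolate(frames: List[Frame]) -> List[Frame]:
    """Fill each gap by linear interpolation between its two received neighbours."""
    out = list(frames)
    for start, end in _gaps(frames):
        left = frames[start - 1] if start > 0 else None
        right = frames[end] if end < len(frames) else None
        if left is None and right is None:
            continue
        if left is None or right is None:
            fill = left if left is not None else right
            for k in range(start, end):
                out[k] = fill  # edge gap: only one neighbour to copy from
            continue
        span = end - start + 1
        for k in range(start, end):
            w = (k - start + 1) / span
            out[k] = (1 - w) * left + w * right
    return out


# Demo: a burst of four lost packets on the channel, with an order-4 interleaver.
frames = [float(i) for i in range(16)]
sent = matrix_interleave(frames, 4)
received = [None if 4 <= p < 8 else f for p, f in enumerate(sent)]
restored = matrix_deinterleave(received, 4)
print(longest_gap(received), longest_gap(restored))  # prints "4 1"
```

In this toy run the burst of four consecutive losses on the channel becomes four isolated single-frame losses after de-interleaving, which is the burst-to-uniform reshaping the abstract refers to. Isolated gaps are also the easiest case for either reconstruction method, since each missing frame has a received neighbour on both sides.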