Recognizing Packet Loss Speech Using Client-End and Server-End Mitigation Techniques

Master's thesis === National Taipei University of Technology === Graduate Institute of Electrical Engineering === 105 === Owing to the rapid evolution of artificial intelligence technology, distributed speech recognition has become a convenient, standard human-machine interface on smart mobile devices. Speech features are extracted at the client end and sent through the netwo...


Bibliographic Details
Main Authors: Yi-Jia Huang, 黃奕嘉
Other Authors: Fu-Rong Jean
Format: Others
Language: zh-TW
Published: 2017
Online Access: http://ndltd.ncl.edu.tw/handle/txgp84
id ndltd-TW-105TIT05442053
record_format oai_dc
spelling ndltd-TW-105TIT05442053 2019-05-15T23:53:23Z http://ndltd.ncl.edu.tw/handle/txgp84 Recognizing Packet Loss Speech Using Client-End and Server-End Mitigation Techniques 使用客戶端與伺服端緩解技術辨識遺失封包語音 Yi-Jia Huang 黃奕嘉 Master's thesis National Taipei University of Technology Graduate Institute of Electrical Engineering 105 Fu-Rong Jean 簡福榮 2017 學位論文 ; thesis 77 zh-TW
collection NDLTD
language zh-TW
format Others
sources NDLTD
description Master's thesis === National Taipei University of Technology === Graduate Institute of Electrical Engineering === 105 === Owing to the rapid evolution of artificial intelligence technology, distributed speech recognition has become a convenient, standard human-machine interface on smart mobile devices. Speech features are extracted at the client end and sent through the network channel to the server end for recognition. Speech feature packets may be lost because of unavoidable delays or transmission errors over error-prone channels. This thesis aims to find ways of reducing the effect of packet loss on speech recognition. We use Mel-frequency cepstral coefficients (MFCC) as the speech feature parameters. Matrix interleaving at the client end and matrix de-interleaving at the server end are applied to reshape burst-like packet loss into a more uniform distribution before recognition, which results in less damage. In addition, either the standard ETSI duplication or feature interpolation is used to reconstruct missing frames. Besides the full-frame-rate decoder, recognition experiments are also carried out with the weighted Viterbi algorithm (WVA) and a model adaptation (MA) method. The experimental results show that matrix interleaving and de-interleaving do reduce the length of runs of consecutive missing frames, and that the higher the interleaving order, the less recognition is degraded. In the experiments, feature interpolation performs slightly better than ETSI duplication. Of all the methods tested, model adaptation achieves the highest recognition rate averaged over all SNR conditions.
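The interleaving and frame-reconstruction steps named in the description can be illustrated with a minimal Python sketch. It is not the thesis implementation: the row-write/column-read block interleaver, the function names, the toy one-dimensional "feature", and the nearest-good-frame repetition (a simplified stand-in for ETSI duplication) are all assumptions made for illustration.

```python
import numpy as np


def interleave_order(order: int) -> np.ndarray:
    """Frame indices in transmission order for one block of order*order frames.

    Assumed textbook block interleaver: frames are written into an
    order x order matrix row by row and read out column by column.
    """
    return np.arange(order * order).reshape(order, order).T.ravel()


def deinterleave_mask(lost_sent, order: int) -> np.ndarray:
    """Map a loss mask over transmission slots back to original frame order."""
    lost_sent = np.asarray(lost_sent, dtype=bool)
    lost_frames = np.empty_like(lost_sent)
    lost_frames[interleave_order(order)] = lost_sent
    return lost_frames


def longest_run(lost) -> int:
    """Length of the longest run of consecutive lost frames."""
    best = run = 0
    for flag in lost:
        run = run + 1 if flag else 0
        best = max(best, run)
    return best


def reconstruct(features, lost, method: str = "interp") -> np.ndarray:
    """Fill lost frames of a (frames x dims) feature matrix.

    "interp": linear interpolation between the surrounding good frames.
    "repeat": copy the nearest good frame (simplified stand-in for
              ETSI duplication; ties go to the earlier frame).
    """
    features = np.asarray(features, dtype=float)
    lost = np.asarray(lost, dtype=bool)
    good = np.flatnonzero(~lost)
    bad = np.flatnonzero(lost)
    out = features.copy()
    if method == "interp":
        for d in range(features.shape[1]):
            out[bad, d] = np.interp(bad, good, features[good, d])
    elif method == "repeat":
        nearest = good[np.abs(good[None, :] - bad[:, None]).argmin(axis=1)]
        out[bad] = features[nearest]
    else:
        raise ValueError(f"unknown method: {method}")
    return out


# Toy example: a burst of four lost transmission slots in a 4 x 4 block.
order = 4
lost_sent = np.zeros(order * order, dtype=bool)
lost_sent[4:8] = True
lost_frames = deinterleave_mask(lost_sent, order)
features = np.arange(order * order, dtype=float)[:, None]  # 1-D ramp "feature"

print("longest run as sent:", longest_run(lost_sent))
print("longest run after de-interleaving:", longest_run(lost_frames))
print("interpolated:", reconstruct(features, lost_frames, "interp")[:, 0])
```

In this toy case the burst that wipes out four consecutive transmission slots ends up as isolated single-frame gaps after de-interleaving, which is the situation where either reconstruction method has good neighbouring frames to work from.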
author2 Fu-Rong Jean
author_facet Fu-Rong Jean
Yi-Jia Huang
黃奕嘉
author Yi-Jia Huang
黃奕嘉
title Recognizing Packet Loss Speech Using Client-End and Server-End Mitigation Techniques