Automatic Recognition of Life Sounds

Bibliographic Details
Main Authors: Wu, Chen Wei, 吳晨瑋
Other Authors: Liu, Yi Wen
Format: Others
Language: zh-TW
Published: 2012
Online Access: http://ndltd.ncl.edu.tw/handle/16103160345738752230
Chinese Title: 生活聲響之自動辨認
Other Authors (Chinese name): 劉奕汶
Physical Description: Thesis, 73 pages
Collection: NDLTD

Description: Master's thesis, National Tsing Hua University, Department of Electrical Engineering, academic year 100 of the ROC calendar.

Daily life is full of many different kinds of sounds. Whether a sound is speech or non-speech, our ears let us recognize it by its characteristic features and understand what is happening around us. With technical advances, sound identification has gradually become a practical technology, especially in speech recognition, and sound recognition has gradually found its way into home safety. Whatever a resident's age or circumstances, emergencies can happen at home, and they are accompanied by non-speech sounds. Past work on sound recognition has mostly focused on speech and on speaker recognition. Being able to classify and recognize any sound that indicates a dangerous situation in the house would help analyze what is happening and increase the sense of security of people who live alone.

In this thesis, we collected 372 audio files in eight classes for our experiments. The files were divided equally into training and test sets, which we used to develop methods for sound recognition under normal and noisy conditions. For feature extraction, the feature vector consists of Mel-scale frequency cepstral coefficients (MFCC) and perceptual features. A Gaussian mixture model (GMM) serves as the front end of the classifier, and an outlier rejection mechanism is added to it. The outlier rejection mechanism is based on a likelihood ratio test (LRT), which compares the test audio files and the non-dataset audio files, respectively, against the dataset. This prevents audio files that belong to none of the dataset classes from being forcibly recognized as one of them by mistake. We use three methods to classify the audio files: the variance-mean method, the frame-vote method, and the selected frame-vote method.

When the test audio files are compared against the dataset, the best of these methods currently reaches a recognition accuracy of 96.24% under normal conditions.
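
The abstract does not spell out how the GMM scores, the frame vote, and the LRT fit together. As a rough illustration only, the Python sketch below shows one common way to combine them: each feature frame votes for the class whose GMM scores it highest, the file takes the majority label, and the file is rejected as an outlier when the mean per-frame log-likelihood ratio between the winning class model and a background model falls below a threshold. The diagonal-covariance GMM representation, the single background model used as the denominator of the ratio, the `threshold` value, the class names, and all function names are assumptions made for this sketch, not the thesis's implementation; in practice the GMMs would be trained (for example with EM) on MFCC and perceptual feature frames.

```python
import math
from collections import Counter

# Hypothetical GMM representation: a list of (weight, means, variances) tuples,
# one per component, with diagonal covariances.


def log_gauss_diag(x, mean, var):
    """Log-density of a diagonal-covariance Gaussian at feature vector x."""
    return -0.5 * sum(
        math.log(2 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def gmm_loglik(x, gmm):
    """Log-likelihood of one frame under a GMM (log-sum-exp over components)."""
    terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def classify_frame_vote(frames, class_gmms, background_gmm, threshold):
    """Frame-vote classification followed by an assumed LRT-style outlier check.

    Each frame votes for the class whose GMM scores it highest; the file takes
    the majority label. The file is rejected (None is returned) when the mean
    per-frame log-likelihood ratio between the winning class model and the
    background model falls below `threshold`.
    """
    votes = Counter(
        max(class_gmms, key=lambda c: gmm_loglik(f, class_gmms[c])) for f in frames
    )
    label = votes.most_common(1)[0][0]
    mean_llr = sum(
        gmm_loglik(f, class_gmms[label]) - gmm_loglik(f, background_gmm)
        for f in frames
    ) / len(frames)
    return label if mean_llr >= threshold else None


if __name__ == "__main__":
    # Toy 1-D features and single-component models, made up for illustration.
    class_gmms = {
        "glass": [(1.0, [0.0], [1.0])],
        "door": [(1.0, [5.0], [1.0])],
    }
    background = [(1.0, [2.5], [25.0])]
    print(classify_frame_vote([[0.1], [-0.2], [0.3]], class_gmms, background, 0.0))  # glass
    print(classify_frame_vote([[20.0], [21.0]], class_gmms, background, 0.0))  # None
```

The second call shows the point of the rejection step: the frames are closest to the "door" model, so a plain vote would label them "door", but the wide background model explains them far better and the file is rejected instead of being forced into a class.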

In addition, we comprehensively evaluate robustness against noise and echoes. For the outlier rejection mechanism, we collected a total of 120 non-dataset audio files for experiments, and the overall error rate can be reduced to 19%. We then gathered a further 100 non-dataset audio files and repeated the experiment, and the overall error rate can be reduced to 23%.
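
The abstract does not say how the noisy test conditions were produced. One common procedure, sketched here under that assumption, is to mix noise into each clean test file at a chosen signal-to-noise ratio, where SNR in dB is 10 log10 of the ratio of mean signal power to mean noise power. The use of white Gaussian noise, the function names, and the toy 440 Hz tone are hypothetical choices for illustration; recorded household noise could be scaled the same way, and echoes would need a separate (for example, room impulse response) simulation that this sketch does not cover.

```python
import math
import random


def add_noise_at_snr(signal, snr_db, seed=0):
    """Return signal plus white Gaussian noise scaled to a target SNR in dB.

    Assumed definition: SNR_dB = 10 * log10(P_signal / P_noise), P = mean power.
    """
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    p_signal = sum(s * s for s in signal) / len(signal)
    p_noise = sum(n * n for n in noise) / len(noise)
    # Choose scale so that p_signal / (scale**2 * p_noise) == 10 ** (snr_db / 10).
    scale = math.sqrt(p_signal / (p_noise * 10 ** (snr_db / 10)))
    return [s + scale * n for s, n in zip(signal, noise)]


def measure_snr_db(clean, noisy):
    """Measured SNR of a noisy signal relative to its clean version, in dB."""
    p_signal = sum(c * c for c in clean) / len(clean)
    p_noise = sum((y - c) ** 2 for c, y in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(p_signal / p_noise)


if __name__ == "__main__":
    # Hypothetical test signal: a 440 Hz tone sampled at 16 kHz for 0.1 s.
    tone = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(1600)]
    noisy = add_noise_at_snr(tone, snr_db=10)
    print(round(measure_snr_db(tone, noisy), 6))  # 10.0
```

Scaling the noise from measured powers, rather than fixing its variance in advance, makes the achieved SNR match the target exactly (up to floating-point rounding) regardless of the random draw.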