Automatic Recognition of Life Sounds
Master's thesis === National Tsing Hua University === Department of Electrical Engineering === 100 === There are many kinds of sounds in daily life. Whether a sound is speech or non-speech, we recognize it by its characteristics through our ears and understand what is happening around us. With technical advances, the identification of...
Main Authors: | Wu, Chen Wei, 吳晨瑋 |
---|---|
Other Authors: | Liu, Yi Wen |
Format: | Others |
Language: | zh-TW |
Published: | 2012 |
Online Access: | http://ndltd.ncl.edu.tw/handle/16103160345738752230 |
id |
ndltd-TW-100NTHU5442108 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-100NTHU5442108 2015-10-13T21:27:24Z http://ndltd.ncl.edu.tw/handle/16103160345738752230 Automatic Recognition of Life Sounds 生活聲響之自動辨認 Wu, Chen Wei 吳晨瑋 Master's thesis National Tsing Hua University Department of Electrical Engineering 100 Liu, Yi Wen 劉奕汶 2012 thesis 73 zh-TW |
collection |
NDLTD |
language |
zh-TW |
format |
Others |
sources |
NDLTD |
description |
Master's thesis === National Tsing Hua University === Department of Electrical Engineering === 100 === There are many kinds of sounds in daily life. Whether a sound is speech or non-speech, we recognize it by its characteristics through our ears and understand what is happening around us. As technology has advanced, sound identification has gradually become practical, especially for speech recognition, and it has begun to find applications in home safety. Regardless of a user's age or status, emergencies can happen at home, and they are accompanied by non-speech sounds. Past work on sound recognition focused mostly on speech and speaker recognition. If sounds that indicate dangerous situations in the house can be classified and recognized, this will help analyze the scenario and increase the sense of security of people living alone.
In this thesis, we collected eight classes of audio files, 372 files in total, for experiments. The files were divided equally into training and testing datasets, which we use to develop methods for sound recognition in normal and noisy situations. For feature extraction, the feature vector consists of Mel-frequency cepstral coefficients (MFCC) and perceptual features. A Gaussian mixture model (GMM) is used as the front end of the classifier, and an outlier rejection mechanism is added to it. The outlier rejection mechanism is based on the likelihood ratio test (LRT), which compares test audio files and non-dataset files, respectively, against the dataset. In this way, non-dataset audio files are not forced into one of the dataset classes by mistake.
We use three methods to classify the audio files: the variance-mean method, the frame-vote method, and the selected frame-vote method. When test audio files are compared against the dataset, the methods reach a recognition accuracy of 96.24% at best in the normal situation. In addition, we make a complete evaluation of robustness against noise and echoes. For the outlier rejection mechanism, we collected a total of 120 non-dataset audio files for an experiment, and the overall error rate can be reduced to 19%. In a second experiment with another 100 non-dataset audio files, the overall error rate can be reduced to 23%.
|
author2 |
Liu, Yi Wen |
author_facet |
Liu, Yi Wen Wu, Chen Wei 吳晨瑋 |
author |
Wu, Chen Wei 吳晨瑋 |
spellingShingle |
Wu, Chen Wei 吳晨瑋 Automatic Recognition of Life Sounds |
author_sort |
Wu, Chen Wei |
title |
Automatic Recognition of Life Sounds |
title_short |
Automatic Recognition of Life Sounds |
title_full |
Automatic Recognition of Life Sounds |
title_fullStr |
Automatic Recognition of Life Sounds |
title_full_unstemmed |
Automatic Recognition of Life Sounds |
title_sort |
automatic recognition of life sounds |
publishDate |
2012 |
url |
http://ndltd.ncl.edu.tw/handle/16103160345738752230 |
work_keys_str_mv |
AT wuchenwei automaticrecognitionoflifesounds AT wúchénwěi automaticrecognitionoflifesounds AT wuchenwei shēnghuóshēngxiǎngzhīzìdòngbiànrèn AT wúchénwěi shēnghuóshēngxiǎngzhīzìdòngbiànrèn |
_version_ |
1718063430580043776 |
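
The abstract names a frame-vote classifier over GMMs and a likelihood-ratio-based outlier rejection step but gives no implementation details. The following is only a minimal sketch of how such a pipeline could look, not the thesis's actual method. Everything in it is an assumption made for illustration: the diagonal-covariance GMMs, the use of a single broad "background" model as the denominator of the likelihood ratio, the zero threshold, the two-dimensional toy features, and the names `CLASS_GMMS`, `BACKGROUND_GMM`, `class_a`, and `class_b`.

```python
import math
from collections import Counter

# Hypothetical toy models: each GMM is a list of (weight, mean, variance)
# components with diagonal covariance. Real models would be trained on
# MFCC and perceptual features; these numbers are made up.
CLASS_GMMS = {
    "class_a": [(1.0, [0.0, 0.0], [1.0, 1.0])],
    "class_b": [(0.5, [5.0, 5.0], [1.0, 1.0]), (0.5, [6.0, 4.0], [1.0, 1.0])],
}
# Assumed broad background model standing in for "non-dataset" sounds.
BACKGROUND_GMM = [(1.0, [2.5, 2.5], [25.0, 25.0])]


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def log_gmm(x, gmm):
    """Log-likelihood of x under a GMM, computed with log-sum-exp."""
    terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def frame_vote(frames, class_gmms):
    """Each frame votes for its most likely class; return the top class."""
    votes = Counter(
        max(class_gmms, key=lambda c: log_gmm(f, class_gmms[c])) for f in frames
    )
    return votes.most_common(1)[0][0]


def passes_lrt(frames, class_gmm, background_gmm, threshold=0.0):
    """Accept if the mean per-frame log-likelihood ratio exceeds threshold."""
    total = sum(
        log_gmm(f, class_gmm) - log_gmm(f, background_gmm) for f in frames
    )
    return total / len(frames) > threshold


def classify(frames, class_gmms, background_gmm, threshold=0.0):
    """Frame-vote label, or None if the file is rejected as an outlier."""
    label = frame_vote(frames, class_gmms)
    if passes_lrt(frames, class_gmms[label], background_gmm, threshold):
        return label
    return None


known = [[0.1, -0.2], [0.3, 0.1], [-0.1, 0.0]]   # frames near class_a
outlier = [[20.0, -15.0], [22.0, -14.0]]         # frames far from both classes
print(classify(known, CLASS_GMMS, BACKGROUND_GMM))
print(classify(outlier, CLASS_GMMS, BACKGROUND_GMM))
```

Without the rejection step, `frame_vote` alone would still assign the outlier frames to whichever class is least unlikely, which is the forced misrecognition the abstract says the rejection mechanism is meant to prevent. In practice the threshold would need to be tuned on held-out data.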