A Voice Separation System Based on Median Filtering and a few Improvements

Master's === National Taiwan University of Science and Technology === Department of Computer Science and Information Engineering === 102 === This thesis studies voice separation performed by subtracting the music spectrum from the mixed spectrum. To extract the music spectrogram from the mixed spectrogram, we adopt two concepts: searching for nearest-neighbor frames and median filtering....

Bibliographic Details
Main Authors: Yu-Min Jiang, 姜育民
Other Authors: Hung-yan Gu
Format: Others
Language: zh-TW
Published: 2014
Online Access: http://ndltd.ncl.edu.tw/handle/q8e9vq
id ndltd-TW-102NTUS5392019
record_format oai_dc
spelling ndltd-TW-102NTUS5392019 2019-05-15T21:13:20Z http://ndltd.ncl.edu.tw/handle/q8e9vq A Voice Separation System Based on Median Filtering and a few Improvements 基於中值濾波及數項改進之語音分離系統 Yu-Min Jiang 姜育民 Master's National Taiwan University of Science and Technology Department of Computer Science and Information Engineering 102 Hung-yan Gu 古鴻炎 2014 thesis 63 zh-TW
collection NDLTD
language zh-TW
format Others
sources NDLTD
description Master's === National Taiwan University of Science and Technology === Department of Computer Science and Information Engineering === 102 === This thesis studies voice separation performed by subtracting the music spectrum from the mixed spectrum. To extract the music spectrogram from the mixed spectrogram, we adopt two concepts: searching for nearest-neighbor frames and median filtering. We propose several methods that improve separation performance and also implement an on-line voice separation system. First, we ran calibration experiments on the number of nearest-neighbor frames to keep and on the mask parameter value. Using the best values raises the average SDR (source-to-distortion ratio) by 0.94 dB. Next, when selecting nearest-neighbor frames, the spectral distance between two frames is calculated on a logarithmic rather than a linear magnitude scale. We also attempted to equalize each spectrum by its average magnitude. According to the experiments, using logarithmic magnitude to calculate the spectral distance raises the average SDR considerably, by 0.97 dB. In addition, a spectral-flatness measure is used to detect drum-sound frames, and the spectrum bins of these frames are reassigned to the music spectrogram. This removes drum interference from the separated voice and raises the average SDR by 0.02 dB. Whether or not the emptied spectrum bins in the drum-sound frames are filled makes no noticeable difference. Moreover, we removed the low-frequency bins of the spectrum to reduce interference from low-frequency music signals, which raises the average SDR by a further 1.01 dB.
Overall, using the logarithmic magnitude spectrum to calculate spectral distance, removing drum sound, and removing low-frequency bins considerably improve the quality of the separated voice, and the average SDR is raised from 2.48 dB to 5.42 dB.
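The processing chain described in the abstract can be illustrated with a minimal NumPy sketch. It is not the thesis implementation: the function names, the default values of `k`, `flat_thresh`, and `low_cut_bins`, the inclusion of each frame among its own neighbors, and the clipping of the music estimate to the mixture are all assumptions made for illustration. The mask parameter mentioned in the abstract is not modeled, because the abstract does not define it.

```python
import numpy as np


def log_spectral_distance(mag, eps=1e-8):
    """Pairwise Euclidean distance between frames, on log-magnitude spectra.

    mag: non-negative magnitude spectrogram, shape (n_bins, n_frames).
    Returns an (n_frames, n_frames) distance matrix.
    """
    log_mag = np.log(mag + eps)
    sq = np.sum(log_mag ** 2, axis=0)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (log_mag.T @ log_mag)
    return np.sqrt(np.maximum(d2, 0.0))  # clamp tiny negative rounding errors


def estimate_music(mag, k=5):
    """Median over the k nearest-neighbor frames of every frame.

    Assumption: a frame counts as one of its own neighbors (distance 0).
    """
    dist = log_spectral_distance(mag)
    nearest = np.argsort(dist, axis=1)[:, :k]      # (n_frames, k)
    music = np.median(mag[:, nearest], axis=2)     # (n_bins, n_frames)
    return np.minimum(music, mag)                  # never exceed the mixture


def spectral_flatness(mag, eps=1e-8):
    """Geometric mean over arithmetic mean per frame (close to 1 if flat)."""
    geo = np.exp(np.mean(np.log(mag + eps), axis=0))
    return geo / (np.mean(mag, axis=0) + eps)


def separate_voice(mag, k=5, flat_thresh=0.5, low_cut_bins=2):
    """Hypothetical end-to-end sketch returning a voice magnitude estimate."""
    music = estimate_music(mag, k)
    voice = mag - music                            # spectral subtraction
    # Frames that look noise-like (assumed drum hits) go entirely to music.
    voice[:, spectral_flatness(mag) > flat_thresh] = 0.0
    # Drop the lowest bins to reduce low-frequency music interference.
    voice[:low_cut_bins, :] = 0.0
    return voice
```

In this sketch a time-domain signal would still have to be rebuilt from the voice magnitude and the mixture phase with an inverse STFT, which is left out. The threshold and bin counts would need calibration on real data, as the thesis does for its own parameters.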
author2 Hung-yan Gu
author_facet Hung-yan Gu
Yu-Min Jiang
姜育民
author Yu-Min Jiang
姜育民
title A Voice Separation System Based on Median Filtering and a few Improvements
publishDate 2014
url http://ndltd.ncl.edu.tw/handle/q8e9vq