A Voice Separation System Based on Median Filtering and a few Improvements
Master's === National Taiwan University of Science and Technology === Department of Computer Science and Information Engineering === 102 === This thesis studies voice separation by subtracting the music spectrum from the mixture spectrum. To extract the music spectrogram from the mixture spectrogram, we adopt two concepts: nearest-neighbor frame search and median filtering....
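The two concepts named in the abstract, nearest-neighbor frame search and median filtering, can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names, the Euclidean distance, the neighbor count `k`, and the simple ratio mask are all assumptions made for illustration, and the abstract's "mask parameter" is not modeled because its definition is not given here. The `use_log` switch mirrors the abstract's change from linear to logarithmic magnitudes when measuring spectral distance.

```python
import math
from statistics import median


def frame_distance(a, b, use_log=True, eps=1e-9):
    """Euclidean distance between two magnitude frames, optionally on log magnitudes."""
    if use_log:
        a = [math.log(x + eps) for x in a]
        b = [math.log(x + eps) for x in b]
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def estimate_music(frames, k=3, use_log=True):
    """For each frame, take the bin-wise median over its k nearest other frames."""
    music = []
    for i, frame in enumerate(frames):
        others = [j for j in range(len(frames)) if j != i]
        nearest = sorted(
            others, key=lambda j: frame_distance(frame, frames[j], use_log)
        )[:k]
        med = [median(frames[j][b] for j in nearest) for b in range(len(frame))]
        # Assumption: the music estimate may not exceed the mixture in any bin.
        music.append([min(m, x) for m, x in zip(med, frame)])
    return music


def voice_mask(frames, music, eps=1e-9):
    """Soft mask in [0, 1]: share of each mixture bin not explained by the music estimate."""
    return [[(x - m) / (x + eps) for x, m in zip(f, g)] for f, g in zip(frames, music)]


# Toy magnitude spectrogram (4 frames x 3 bins): a repeating accompaniment,
# with extra "voice" energy in the middle bin of the third frame.
frames = [[1.0, 2.0, 1.0], [1.0, 2.0, 1.0], [1.0, 5.0, 1.0], [1.0, 2.0, 1.0]]
music = estimate_music(frames, k=2)
mask = voice_mask(frames, music)
print(music[2])               # [1.0, 2.0, 1.0]
print(round(mask[2][1], 3))   # 0.6
```

In this toy case the median over similar frames recovers the repeating accompaniment, so the mask assigns only the excess energy in the third frame to the voice.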
Main Authors: | Yu-Min Jiang 姜育民 |
---|---|
Other Authors: | Hung-yan Gu 古鴻炎 |
Format: | Others |
Language: | zh-TW |
Published: | 2014 |
Online Access: | http://ndltd.ncl.edu.tw/handle/q8e9vq |
id |
ndltd-TW-102NTUS5392019 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-102NTUS5392019 2019-05-15T21:13:20Z http://ndltd.ncl.edu.tw/handle/q8e9vq A Voice Separation System Based on Median Filtering and a few Improvements 基於中值濾波及數項改進之語音分離系統 Yu-Min Jiang 姜育民 Master's National Taiwan University of Science and Technology Department of Computer Science and Information Engineering 102 Hung-yan Gu 古鴻炎 2014 學位論文 ; thesis 63 zh-TW |
collection |
NDLTD |
language |
zh-TW |
format |
Others |
sources |
NDLTD |
description |
Master's === National Taiwan University of Science and Technology === Department of Computer Science and Information Engineering === 102 === This thesis studies voice separation by subtracting the music spectrum from the mixture spectrum. To extract the music spectrogram from the mixture spectrogram, we adopt two concepts: nearest-neighbor frame search and median filtering. We propose several methods that improve separation performance and also implement an on-line voice separation system. First, we ran calibration experiments for the number of nearest-neighbor frames to keep and for the mask parameter value. Using the best values raises the average SDR (source-to-distortion ratio) by 0.94 dB. Next, when selecting nearest-neighbor frames, the spectral distance between two frames is computed on logarithmic rather than linear spectrum magnitudes. We also tried equalizing each spectrum by its average magnitude. According to the experimental results, computing the spectral distance on logarithmic magnitudes raises the average SDR considerably, by 0.97 dB. In addition, a spectral-flatness measure is used to detect drum-sound frames, and the spectrum bins of these frames are reassigned to the music spectrogram. This removes drum interference from the separated voice and raises the average SDR by 0.02 dB. Whether or not the emptied spectrum bins in the drum-sound frames are filled makes no noticeable difference. Moreover, we tried removing the low-frequency bins of the spectrum to reduce interference from low-frequency music signals, which raises the average SDR by a further 1.01 dB.
Overall, computing the spectral distance on logarithmic magnitude spectra, removing drum sound, and removing low-frequency bins considerably improve the quality of the separated voice, and the average SDR is raised from 2.48 dB to 5.42 dB.
|
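The drum-detection step described in the abstract can be sketched as follows. Spectral flatness is the ratio of the geometric mean to the arithmetic mean of the bin powers, so noise-like frames score near 1 and tonal frames near 0. Everything else here is an assumption for illustration: the abstract does not say which spectrogram the measure is applied to (this sketch uses the separated voice frames), the threshold of 0.5 is hypothetical, and the function names are invented.

```python
import math


def spectral_flatness(frame, eps=1e-12):
    """Geometric mean divided by arithmetic mean of the bin powers (1.0 = perfectly flat)."""
    power = [x * x + eps for x in frame]
    log_mean = sum(math.log(p) for p in power) / len(power)
    return math.exp(log_mean) / (sum(power) / len(power))


def reassign_drum_frames(voice, music, threshold=0.5):
    """Move every bin of a flat (drum-like) voice frame into the music estimate.

    The threshold is a hypothetical value, not one reported in the abstract.
    """
    new_voice, new_music = [], []
    for v, m in zip(voice, music):
        if spectral_flatness(v) > threshold:
            new_music.append([a + b for a, b in zip(v, m)])
            new_voice.append([0.0] * len(v))  # emptied bins are left unfilled
        else:
            new_voice.append(list(v))
            new_music.append(list(m))
    return new_voice, new_music


# Toy example: the first voice frame is flat (drum-like), the second is tonal.
voice = [[1.0, 1.0, 1.0, 1.0], [0.0, 10.0, 0.0, 0.0]]
music = [[2.0, 2.0, 2.0, 2.0], [1.0, 1.0, 1.0, 1.0]]
new_voice, new_music = reassign_drum_frames(voice, music)
print(new_voice)
print(new_music)
```

Leaving the emptied bins at zero follows the abstract's observation that filling them makes no noticeable difference.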
author2 |
Hung-yan Gu |
author_facet |
Hung-yan Gu Yu-Min Jiang 姜育民 |
author |
Yu-Min Jiang 姜育民 |
spellingShingle |
Yu-Min Jiang 姜育民 A Voice Separation System Based on Median Filtering and a few Improvements |
author_sort |
Yu-Min Jiang |
title |
A Voice Separation System Based on Median Filtering and a few Improvements |
title_short |
A Voice Separation System Based on Median Filtering and a few Improvements |
title_full |
A Voice Separation System Based on Median Filtering and a few Improvements |
title_fullStr |
A Voice Separation System Based on Median Filtering and a few Improvements |
title_full_unstemmed |
A Voice Separation System Based on Median Filtering and a few Improvements |
title_sort |
voice separation system based on median filtering and a few improvements |
publishDate |
2014 |
url |
http://ndltd.ncl.edu.tw/handle/q8e9vq |
work_keys_str_mv |
AT yuminjiang avoiceseparationsystembasedonmedianfilteringandafewimprovements AT jiāngyùmín avoiceseparationsystembasedonmedianfilteringandafewimprovements AT yuminjiang jīyúzhōngzhílǜbōjíshùxiànggǎijìnzhīyǔyīnfēnlíxìtǒng AT jiāngyùmín jīyúzhōngzhílǜbōjíshùxiànggǎijìnzhīyǔyīnfēnlíxìtǒng AT yuminjiang voiceseparationsystembasedonmedianfilteringandafewimprovements AT jiāngyùmín voiceseparationsystembasedonmedianfilteringandafewimprovements |
_version_ |
1719110873719504896 |
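The SDR figures quoted in the abstract can be cross-checked with a short tally. The sketch assumes the four reported gains are cumulative, which the abstract's wording ("further raised") suggests but does not state for every step; the dictionary labels are paraphrases, not the thesis's own terms.

```python
# Average SDR gains (dB) quoted in the abstract, in the order they are reported.
gains_db = {
    "calibrated neighbor count and mask parameter": 0.94,
    "logarithmic-magnitude spectral distance": 0.97,
    "drum-frame reassignment": 0.02,
    "low-frequency bin removal": 1.01,
}
baseline_db = 2.48  # average SDR before the improvements, per the abstract
final_db = 5.42     # average SDR after the improvements, per the abstract

total_gain = round(sum(gains_db.values()), 2)
overall_change = round(final_db - baseline_db, 2)
print(total_gain)      # 2.94
print(overall_change)  # 2.94
```

Under that assumption the four gains add up to the overall 2.94 dB improvement, which would mean the 2.48 dB baseline precedes the calibration step even though the abstract's closing sentence names only the other three changes.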