Summary: Master's thesis, National Central University, Department of Computer Science and Information Engineering, academic year 107.
Deep neural networks (DNNs) have made rapid progress in audio processing. In the past, most approaches used spectral information obtained via the STFT (Short-Time Fourier Transform), but they usually processed only real-valued features (such as the magnitude spectrum) rather than the full complex values. In recent years, to avoid the information loss caused by ignoring the complex values, deep learning models have been proposed that perform audio source separation end to end in the time domain. However, these models are very large, i.e., they have a very large number of parameters, which makes them difficult to deploy on devices with limited computing resources. Moreover, they generally need a long input context to separate sources well, which implies high latency and makes them less useful for applications that require low latency.
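The information loss mentioned above can be illustrated with a small numpy sketch. It takes one STFT frame, computed here as the `rfft` of a Hann-windowed frame. It then shows that the coefficients are complex, and that keeping only their magnitudes (zero phase) gives a waveform with the same magnitude spectrum but a very different shape. The test signal, frame length, and sample rate are arbitrary assumptions chosen for illustration. This is not the thesis's own pipeline.

```python
import numpy as np

def frame_spectrum(frame: np.ndarray) -> np.ndarray:
    """Complex DFT of one Hann-windowed frame (one column of an STFT)."""
    return np.fft.rfft(frame * np.hanning(len(frame)))

# Illustrative assumptions: 8 kHz sample rate, 256-sample frame, two tones.
sr, n = 8000, 256
t = np.arange(n) / sr
frame = np.sin(2 * np.pi * 440 * t + 0.7) + 0.5 * np.sin(2 * np.pi * 1200 * t)
windowed = frame * np.hanning(n)

spec = frame_spectrum(frame)
print(spec.dtype)  # complex128: each bin carries magnitude and phase

# Inverting the full complex spectrum recovers the windowed frame.
full = np.fft.irfft(spec, n=n)

# Keeping only the real-valued magnitudes (phase set to zero) does not.
mag_only = np.fft.irfft(np.abs(spec).astype(complex), n=n)

err_full = np.max(np.abs(full - windowed))
err_mag = np.max(np.abs(mag_only - windowed))
print(f"max error, full complex spectrum: {err_full:.2e}")
print(f"max error, magnitude only:        {err_mag:.2e}")
```

The two reconstructions share the same magnitude spectrum, so a magnitude-only model cannot distinguish them without extra phase information. Time-domain models sidestep this by working on the waveform directly.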
Building on previous research, this thesis proposes a lightweight end-to-end deep learning model for music source separation that reduces the number of parameters and accelerates computation. It also proposes a novel decoder that further improves separation quality when the input context length is limited. Experimental results show that the proposed method outperforms previous results while using only 10% or less of their parameters.