Summary: | Master's === National Taiwan University of Science and Technology === Department of Electronic Engineering === 105 === With the massive computational power now available, deep learning has been widely studied in recent years. In this study, we propose several systems for audio denoising, identification, clustering, and dimensionality reduction based on deep neural networks (DNNs). First, we use a recurrent neural network (RNN) for audio noise removal based on a priori frequency representations of both clean and noisy speech. Because audio is sequential data, the RNN provides a time-series structure with lags that is better suited to audio modeling than other DNNs. The RNN with a long short-term memory (LSTM) module raises the signal-to-noise ratio on noisy TIMIT. Moreover, we propose the scaled short-time Fourier transform (SSTFT), which provides effective features for avian call identification. The scaled representation enhances the acoustic characteristics of the visual patterns fed into the RNN classifier for identifying avian species, achieving up to a 91% hit rate when more than 360 avian species are involved. On the other hand, we use autoencoders for unsupervised dimensionality reduction, applied to a multiposition room impulse response database comprising horizontal, elevational, and omni-directional subsets, to find similarities among room responses. For direct-sound clustering, the distribution mainly depends on the location of the sound source, whereas for reflection and reverberation clustering, the unsupervised stacked autoencoder (SAE) not only provides dimensionality reduction but also indicates the major and minor virtual sound sources in its learned representation.
|
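The scaled short-time Fourier transform mentioned above is defined in the thesis body; as a rough, hypothetical sketch of the kind of log-scaled spectrogram feature the abstract alludes to, the following computes a windowed STFT and log-compresses the magnitudes before they would be fed to a classifier. The function name `scaled_stft` and all parameter values (frame length, hop size, sample rate) are illustrative assumptions, not the thesis's actual settings.

```python
import numpy as np

def scaled_stft(signal, frame_len=256, hop=128, eps=1e-8):
    """Illustrative log-magnitude STFT feature (NOT the thesis's exact SSTFT).

    Returns an array of shape (n_frames, frame_len // 2 + 1), where each row
    is the log-compressed magnitude spectrum of one Hann-windowed frame.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    # Real FFT of each frame, then log compression to scale the dynamic range.
    spectrum = np.fft.rfft(frames, axis=1)
    return np.log(np.abs(spectrum) + eps)

# Usage: one second of a 1 kHz tone at an assumed 16 kHz sample rate.
sr = 16000
t = np.arange(sr) / sr
features = scaled_stft(np.sin(2 * np.pi * 1000 * t))
print(features.shape)  # (n_frames, frame_len // 2 + 1)
```

In a pipeline like the one the abstract describes, each row of such a feature matrix would serve as one time step of the RNN classifier's input sequence; the log compression is what gives quieter spectral detail more weight in the learned visual pattern.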