Summary: | Master's === National Taiwan University of Science and Technology === Department of Computer Science and Information Engineering === 107 === In this thesis, we study two types of compound classifier structures, a hierarchical structure and a classifier-mixing structure, to improve the accuracy of music genre classification. First, four kinds of spectral features are extracted from an input music signal: mel-frequency spectrum, mel-frequency cepstrum, modulation spectrum, and percussive spectrum. These are treated as basic acoustic features (BAF). Then, three dimension-reduction methods, mean and standard deviation, principal component analysis (PCA), and convolutional neural network (CNN), are used to extract advanced acoustic features (AAF) from the BAF. Next, we use the AAF to train four types of basic classifiers: support vector machine, k-nearest neighbor, Gaussian mixture model, and multilayer perceptron. By combining different AAF and basic classifiers, we pick the best-performing combination of AAF and basic classifier for each kind of BAF. The four best-performing combinations are then used to construct a hierarchical classifier structure. Separately, each kind of BAF is used to train a corresponding CNN-based expert network, and the four expert networks are used to construct a classifier-mixing structure. According to music genre classification experiments on different datasets, both compound classifier structures yield considerable improvement in classification rate. The hierarchical classifier structure achieves a classification accuracy of 87.1%, whereas the classifier-mixing structure achieves a higher accuracy of 88.0%.
|
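
Two steps named in the abstract can be illustrated with a minimal sketch: the mean and standard deviation reduction from frame-level BAF to a fixed-length AAF vector, and the combination of expert outputs in a classifier-mixing structure. The abstract does not say how the expert networks are combined, so the weighted averaging of class probabilities below is an assumption for illustration only and may differ from the thesis. The function names `mean_std_pool`, `mix_experts`, and `predict_genre` are hypothetical.

```python
from statistics import mean, pstdev


def mean_std_pool(frames):
    """Reduce per-frame feature vectors to one fixed-length vector:
    per-dimension means followed by per-dimension (population) std devs."""
    columns = list(zip(*frames))  # transpose: one tuple per feature dimension
    return [mean(c) for c in columns] + [pstdev(c) for c in columns]


def mix_experts(expert_probs, weights=None):
    """Combine class-probability vectors from several experts.

    ASSUMPTION: plain weighted averaging (uniform weights by default);
    the thesis may use a different mixing rule.
    """
    n_experts = len(expert_probs)
    if weights is None:
        weights = [1.0 / n_experts] * n_experts
    n_classes = len(expert_probs[0])
    return [
        sum(w * probs[k] for w, probs in zip(weights, expert_probs))
        for k in range(n_classes)
    ]


def predict_genre(expert_probs, weights=None):
    """Return the index of the class with the highest mixed probability."""
    mixed = mix_experts(expert_probs, weights)
    return max(range(len(mixed)), key=mixed.__getitem__)
```

For example, `mean_std_pool` turns a variable number of spectral frames into a vector of twice the frame dimension, and `predict_genre` takes one probability vector per expert (here, one expert per kind of BAF) and returns the winning genre index.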