Summary: | Master's === National Central University === Department of Communication Engineering === 104 === As the volume of multimedia data grows, quickly finding items of interest in large databases becomes increasingly important. Keyword annotation is the traditional approach, but it requires a large amount of manual labeling effort and becomes infeasible as the size of the data increases. Content-based retrieval is more natural: it extracts features from the music content itself to create a representation that avoids human labeling errors.
This thesis focuses on AAC files, a format widely used by Internet streaming sources. The proposed system maps the modified discrete cosine transform (MDCT) coefficients directly into a 12-dimensional chroma feature. Frames are combined into segments that serve as the input to a deep learning model, which can automatically find more meaningful features of the music data. We also apply a sparse autoencoder to reduce the dimensionality of the song features, which saves significant matching time. The experimental results show that the proposed method reaches a mean reciprocal rank (MRR) of 0.505 and saves over 70% of the matching time compared with conventional approaches.
|
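The abstract does not spell out how MDCT coefficients are folded into chroma bins or how MRR is computed, so the following is only a minimal sketch under stated assumptions, not the thesis's implementation. It assumes one long-window frame of 1024 MDCT coefficients at a 44.1 kHz sample rate, a bin center frequency of (k + 0.5) * fs / (2N), squared coefficients as energy, A4 = 440 Hz as tuning reference, and a 55 Hz to 5 kHz analysis band. The names `mdct_to_chroma` and `mean_reciprocal_rank` and all parameters are hypothetical.

```python
import numpy as np


def mdct_to_chroma(mdct, sample_rate=44100.0, f_ref=440.0,
                   f_min=55.0, f_max=5000.0):
    """Fold one frame of MDCT coefficients into a 12-bin chroma vector.

    Index 0 corresponds to pitch class C, index 9 to A.
    All default parameters are illustrative assumptions.
    """
    mdct = np.asarray(mdct, dtype=float)
    n = mdct.shape[0]

    # Assumed center frequency of MDCT bin k: (k + 0.5) * fs / (2 * n).
    freqs = (np.arange(n) + 0.5) * sample_rate / (2.0 * n)

    # Keep only bins inside the assumed analysis band.
    keep = (freqs >= f_min) & (freqs <= f_max)

    # Convert frequency to the nearest MIDI note (A4 = 69), then to pitch class.
    midi = 69.0 + 12.0 * np.log2(freqs[keep] / f_ref)
    pitch_class = np.round(midi).astype(int) % 12

    # Accumulate squared coefficients (energy) per pitch class.
    energy = mdct[keep] ** 2
    chroma = np.bincount(pitch_class, weights=energy, minlength=12)

    # Normalize to unit sum so frames of different loudness are comparable.
    total = chroma.sum()
    return chroma / total if total > 0 else chroma


def mean_reciprocal_rank(ranks):
    """Mean reciprocal rank over queries.

    ranks: 1-based rank of the correct song for each query,
           or None when the correct song was not retrieved.
    """
    reciprocal = [0.0 if r is None else 1.0 / r for r in ranks]
    return sum(reciprocal) / len(reciprocal) if reciprocal else 0.0


if __name__ == "__main__":
    # Synthetic frame with energy only in bin 20 (about 441 Hz, close to A4).
    frame = np.zeros(1024)
    frame[20] = 3.0
    print(mdct_to_chroma(frame))

    # Correct song ranked 1st, 2nd, and 4th across three queries.
    print(mean_reciprocal_rank([1, 2, 4]))
```

In this sketch, MRR averages 1/rank of the correct song over all queries, so a value of 0.505 would mean that the correct song sits, in a harmonic-mean sense, at roughly the second position of the ranked list.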