Summary: | Master's === National Tsing Hua University === Department of Computer Science === 105 === Sometimes we hear a familiar song on the street but do not know its name or who the singer is. Or we hear a good melody but do not know how to find further information about the song, since we have never heard it before. In these situations, a melody recognition system can help.
A melody recognition system has three main components: (1) an input query, (2) a database of songs, and (3) a method for comparison. Perhaps the most user-friendly form of input is query by singing and humming (QBSH), in which a user sings or hums a melody into a microphone and the input is then transformed into a format suitable for comparison with the database. In the past, Jyh-Shing Jang (formerly with National Tsing Hua University, now with National Taiwan University) and his team designed a system that solves this problem with a high recognition rate. The idea is to convert the input to a pitch vector, convert each song in the database to a note vector, and use the linear scaling method to compare the input with every song in the database. The song with the highest score is then reported to the user. Their system is efficient and achieves a high recognition rate.
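The linear scaling comparison described above can be illustrated with a minimal sketch. It is not the implementation of Jang's system: the scaling factors, the mean-based key shift, the mean absolute distance, and the names `resample` and `linear_scaling_distance` are all assumptions made for illustration. Pitches are assumed to be given in semitones, one value per frame, and a lower distance corresponds to a higher matching score.

```python
def resample(seq, new_len):
    """Stretch or compress seq to new_len points by linear interpolation."""
    if new_len == 1:
        return [float(seq[0])]
    step = (len(seq) - 1) / (new_len - 1)
    out = []
    for i in range(new_len):
        pos = i * step
        lo = min(int(pos), len(seq) - 1)
        hi = min(lo + 1, len(seq) - 1)
        frac = pos - lo
        out.append(seq[lo] * (1 - frac) + seq[hi] * frac)
    return out


def linear_scaling_distance(query, song,
                            factors=(0.5, 0.75, 1.0, 1.25, 1.5, 2.0)):
    """Smallest mean absolute pitch difference over several tempo factors.

    The query is compared with the beginning of the song only, and the
    key difference is removed by aligning the means (both are assumptions).
    """
    best = float("inf")
    for factor in factors:
        n = round(len(query) * factor)
        if n < 2 or n > len(song):
            continue  # scaled query would be too short or longer than the song
        scaled = resample(query, n)
        target = song[:n]
        shift = sum(target) / n - sum(scaled) / n  # transpose to the same key
        dist = sum(abs(s + shift - t) for s, t in zip(scaled, target)) / n
        best = min(best, dist)
    return best


# Hypothetical data: the query is the start of song_a, sung 3 semitones lower.
query = [60, 62, 64, 65, 67]
song_a = [63, 65, 67, 68, 70, 72, 74]
song_b = [60, 60, 60, 60, 60, 60, 60]
print(linear_scaling_distance(query, song_a))
print(linear_scaling_distance(query, song_b))
```

In a full system this distance would be computed against every song in the database, which is the cost the filtering step in the next paragraph is meant to reduce.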
In this thesis, we attempt to improve this performance further. Instead of comparing every song in the database with the input query using linear scaling, we use a suffix tree of the songs in the database as a filter to obtain a few candidate songs that are most likely to match the input query. Only these candidate songs are then matched carefully with the linear scaling method. Experimental results show that this new approach not only speeds up the overall algorithm but, to our surprise, also slightly improves the recognition rate.
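The filtering idea can be sketched as follows. The abstract does not say how the suffix tree is built or queried, so everything here is an assumption: a naive depth-bounded suffix trie stands in for a true suffix tree, songs are indexed by their pitch intervals (so the key does not matter), and candidates are ranked by the longest substring they share with the query. The names `intervals`, `build_suffix_trie`, and `candidates` are hypothetical.

```python
def intervals(notes):
    """Pitch differences between consecutive notes (transposition invariant)."""
    return tuple(b - a for a, b in zip(notes, notes[1:]))


def build_suffix_trie(songs, max_depth=8):
    """Index every suffix of every song's interval sequence, up to max_depth."""
    root = {"ids": set(), "kids": {}}
    for song_id, notes in songs.items():
        seq = intervals(notes)
        for start in range(len(seq)):
            node = root
            for sym in seq[start:start + max_depth]:
                node = node["kids"].setdefault(sym, {"ids": set(), "kids": {}})
                node["ids"].add(song_id)  # this song contains the path so far
    return root


def candidates(root, query_notes, top_k=2):
    """Return the top_k songs sharing the longest interval substring with the query."""
    seq = intervals(query_notes)
    best = {}
    for start in range(len(seq)):
        node, depth = root, 0
        for sym in seq[start:]:
            node = node["kids"].get(sym)
            if node is None:
                break
            depth += 1
            for song_id in node["ids"]:
                if depth > best.get(song_id, 0):
                    best[song_id] = depth
    ranked = sorted(best, key=lambda s: (-best[s], s))
    return ranked[:top_k]


# Hypothetical database of note sequences (MIDI note numbers).
songs = {
    "a": [60, 62, 64, 65, 67, 69],
    "b": [60, 60, 67, 67, 69, 69, 67],
    "c": [64, 62, 60, 62, 64, 64, 64],
}
trie = build_suffix_trie(songs)
# The query is the start of song "a", sung two semitones higher.
print(candidates(trie, [62, 64, 66, 67, 69]))
```

Only the songs returned by `candidates` would then be passed to the more expensive linear scaling comparison. A real hummed query contains pitch errors, so exact substring matching as shown here is a simplification.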
|