Summary: | Doctoral === National Cheng Kung University === Department of Computer Science and Information Engineering === 103 === This dissertation addresses score alignment, the task of aligning an audio recording with its corresponding score. Conventional methods have difficulty with this task because notes written as simultaneous in the score are often played asynchronously in the recording. We address this problem with an alignment system built from two parts: transcription and separation. First, we propose a note-based score alignment method that uses the pitch-by-time feature, also called the piano-roll feature, obtained by converting the audio spectrogram into a piano-roll-like representation. Based on the dynamic time warping algorithm, we propose a pitch-wise alignment algorithm that uses this feature to align each pitch sequence (i.e., each row of the piano roll) individually. Second, to transcribe each musical note precisely, we adopt a musical sound source separation algorithm called score-driven complex matrix factorization (CMF). We propose a CMF method constrained by score information, which separates a musical piece into notes and serves as the separation part of the proposed system. Furthermore, we observe that the transcription and separation parts of the system provide a priori knowledge to each other. This finding leads to the proposed iterative approach, which performs the two analyses alternately to improve the quality of both. We also show how these methods can be applied to single-channel source separation and transcription, and we compare them with current state-of-the-art methods.
|
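As an illustration of the pitch-wise idea mentioned in the summary, the following is a minimal sketch that runs textbook dynamic time warping separately on each row of two piano rolls. It is not the dissertation's algorithm, whose details are not given here; the function names `dtw_path` and `align_pitchwise`, the dictionary layout (pitch number mapped to a list of per-frame activations), and the absolute-difference cost are all assumptions made for this example.

```python
def dtw_path(a, b):
    """Textbook DTW between two 1-D sequences; returns (total_cost, path)."""
    n, m = len(a), len(b)
    inf = float("inf")
    # acc[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])  # assumed local cost
            acc[i][j] = d + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    # Backtrack from the end to recover the warping path.
    i, j = n, m
    path = []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [
            (acc[i - 1][j - 1], i - 1, j - 1),
            (acc[i - 1][j], i - 1, j),
            (acc[i][j - 1], i, j - 1),
        ]
        _, i, j = min(steps, key=lambda s: s[0])
    path.reverse()
    return acc[n][m], path


def align_pitchwise(score_roll, audio_roll):
    """Align each pitch row independently (hypothetical helper).

    Both arguments map a pitch number to a list of per-frame activations.
    """
    return {
        pitch: dtw_path(score_roll[pitch], audio_roll[pitch])
        for pitch in score_roll
        if pitch in audio_roll
    }


# Toy piano rolls: pitch 60 is held longer in the audio than in the score.
score = {60: [0, 1, 1, 0]}
audio = {60: [0, 0, 1, 1, 1, 0]}
result = align_pitchwise(score, audio)
```

Here `result[60]` holds the accumulated cost and the frame-to-frame path for pitch 60. Aligning rows independently lets each pitch follow its own timing, which is the property the summary attributes to the pitch-wise approach; how the dissertation combines or constrains the per-pitch paths is not stated in this record.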