Summary: Master's thesis, National Taiwan University, Graduate Institute of Electrical Engineering, academic year 95.

Media player software today often shows visual effects during music playback, but most of these effects are patterns unrelated to the musical content. This thesis presents a novel method for a media player slideshow that integrates auditory effects with visual cognition. The work is divided into three parts. First, a database of 400 travel and nature photos and 200 film soundtrack clips is constructed. Emotion labels for these items were collected through the web from nearly five hundred users, and these labels serve as the ground truth. Second, the emotion of both kinds of media is detected automatically. Digital photos and music are analyzed with low-level features, and an SVM (Support Vector Machine) classifies the emotion of each item. Finally, a hierarchical strategy for combining the two media is proposed. In the first phase, a complete piece of music is segmented using a beat-tracking algorithm. Music emotion detection then labels each segment, and images with the same emotion become that segment's candidate set. In the second phase, music and photo alignment is formulated as an optimization problem and solved with a greedy algorithm. The features used to coordinate the two media are the spectral centroid and spectral flux of the music and the brightness and contrast of the images. Subjective user evaluations of the results were favorable.
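The second-phase alignment can be illustrated with a minimal Python sketch, assuming NumPy. The abstract names the four features and the greedy strategy but does not define them precisely, so everything below is an assumption. Frames are non-overlapping blocks of 1024 samples. Spectral flux is taken as the summed squared difference of successive magnitude spectra. Brightness and contrast are taken as the mean and standard deviation of Rec. 601 luminance. Features are min-max normalized, spectral centroid is paired with brightness and flux with contrast, and the cost is a squared distance. The helper names (`segment_features`, `image_features`, `greedy_align`) are hypothetical. The sketch also assumes that all segments passed in share one emotion label and that `photo_feats` holds only the candidate photos with that label.

```python
import numpy as np

# Hypothetical helpers illustrating the second (alignment) phase.
# Feature definitions below are assumptions, not taken from the thesis.

LUMA = np.array([0.299, 0.587, 0.114])  # Rec. 601 luminance weights (assumed)


def segment_features(samples, sr, frame_len=1024):
    """Return [mean spectral centroid (Hz), mean spectral flux] of one beat segment."""
    n = len(samples) // frame_len
    if n < 1:
        raise ValueError("segment shorter than one frame")
    frames = np.asarray(samples[: n * frame_len], dtype=float).reshape(n, frame_len)
    mags = np.abs(np.fft.rfft(frames, axis=1))          # magnitude spectrum per frame
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sr)      # bin centre frequencies in Hz
    totals = mags.sum(axis=1)
    # Spectral centroid: magnitude-weighted mean frequency (0 for silent frames).
    centroids = (mags * freqs).sum(axis=1) / np.maximum(totals, 1e-12)
    # Spectral flux (assumed definition): squared change between consecutive spectra.
    flux = np.sum(np.diff(mags, axis=0) ** 2, axis=1) if n > 1 else np.zeros(1)
    return np.array([centroids.mean(), flux.mean()])


def image_features(rgb):
    """Return [brightness, contrast] of an H x W x 3 RGB image scaled to [0, 1]."""
    lum = np.asarray(rgb, dtype=float) @ LUMA
    return np.array([lum.mean(), lum.std()])  # mean and RMS contrast of luminance


def _minmax(x):
    """Scale each feature column to [0, 1] so music and image features are comparable."""
    lo, hi = x.min(axis=0), x.max(axis=0)
    return (x - lo) / np.where(hi > lo, hi - lo, 1.0)


def greedy_align(seg_feats, photo_feats):
    """Assign one photo index to each segment, in playback order, greedily.

    Assumed pairing: centroid <-> brightness, flux <-> contrast.
    photo_feats must already be restricted to photos sharing the segments' emotion.
    """
    seg = _minmax(np.asarray(seg_feats, dtype=float))
    pho = _minmax(np.asarray(photo_feats, dtype=float))
    unused = set(range(len(pho)))
    order = []
    for s in seg:
        # Pick the unused photo whose normalized features are closest to this segment.
        best = min(sorted(unused), key=lambda j: float(np.sum((s - pho[j]) ** 2)))
        order.append(best)
        unused.discard(best)
        if not unused:  # every candidate shown once; allow repeats from here on
            unused = set(range(len(pho)))
    return order
```

In use, each beat segment would be passed through `segment_features` and each candidate photo through `image_features`, and the two resulting lists would then be handed to `greedy_align`. The greedy choice commits to the locally closest photo for each segment in playback order. This keeps the cost linear in the number of segments per candidate, but unlike an exact assignment solver it does not guarantee a globally optimal matching.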