Pop Music Highlighter: Marking the Emotion Keypoints

The goal of music highlight extraction, or thumbnailing, is to extract a short consecutive segment of a piece of music that is somehow representative of the whole piece. In a previous work, we introduced an attention-based convolutional recurrent neural network that uses music emotion classification as a surrogate task for music highlight extraction, assuming that the most emotional part of a song usually corresponds to the highlight. This paper extends our previous work in the following two aspects. First, methodology-wise we experiment with a new architecture that does not need any recurrent layers, making the training process faster. Moreover, we compare a late-fusion variant and an early-fusion variant to study which one better exploits the attention mechanism. Second, we conduct and report an extensive set of experiments comparing the proposed attention-based methods to a heuristic energy-based method, a structural repetition-based method, and three other simple feature-based methods. Due to the lack of public-domain labeled data for highlight extraction, following our previous work we use the RWC-Pop 100-song data set to evaluate how the detected highlights overlap with any chorus sections of the songs. The experiments demonstrate the superior effectiveness of our methods over the competing methods. For reproducibility, we share the code and the pre-trained model at https://github.com/remyhuang/pop-music-highlighter/.


Bibliographic Details
Main Authors: Yu-Siang Huang, Szu-Yu Chou, Yi-Hsuan Yang
Format: Article
Language: English
Published: Ubiquity Press 2018-09-01
Series: Transactions of the International Society for Music Information Retrieval
Subjects: Music thumbnailing; highlight extraction; chorus detection; structure analysis; convolutional neural network; attention mechanism
Online Access: https://transactions.ismir.net/articles/14
id doaj-48c0012bd4aa4c579bcf34b51db15ec4
record_format Article
doi 10.5334/tismir.14
container Transactions of the International Society for Music Information Retrieval, vol. 1, no. 1, pp. 68-78
affiliations Yu-Siang Huang, Szu-Yu Chou: Research Center of IT Innovation, Academia Sinica; Graduate Institute of Networking and Multimedia, National Taiwan University. Yi-Hsuan Yang: Research Center of IT Innovation, Academia Sinica
collection DOAJ
issn 2514-3298
description The goal of music highlight extraction, or thumbnailing, is to extract a short consecutive segment of a piece of music that is somehow representative of the whole piece. In a previous work, we introduced an attention-based convolutional recurrent neural network that uses music emotion classification as a surrogate task for music highlight extraction, assuming that the most emotional part of a song usually corresponds to the highlight. This paper extends our previous work in the following two aspects. First, methodology-wise we experiment with a new architecture that does not need any recurrent layers, making the training process faster. Moreover, we compare a late-fusion variant and an early-fusion variant to study which one better exploits the attention mechanism. Second, we conduct and report an extensive set of experiments comparing the proposed attention-based methods to a heuristic energy-based method, a structural repetition-based method, and three other simple feature-based methods. Due to the lack of public-domain labeled data for highlight extraction, following our previous work we use the RWC-Pop 100-song data set to evaluate how the detected highlights overlap with any chorus sections of the songs. The experiments demonstrate the superior effectiveness of our methods over the competing methods. For reproducibility, we share the code and the pre-trained model at https://github.com/remyhuang/pop-music-highlighter/.
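The selection step implied by the description — the network scores each fixed-length audio chunk with an attention weight, and the highlight is the consecutive window whose scores sum highest — can be sketched in a few lines. This is an illustrative reconstruction, not the authors' released code; the names `pick_highlight`, `chunk_sec`, and `highlight_sec` are hypothetical:

```python
import numpy as np

def pick_highlight(attn, chunk_sec=3, highlight_sec=30):
    """Return the (start, end) time in seconds of the highlight window.

    attn: 1-D array with one attention score per chunk_sec of audio.
    The highlight is the run of consecutive chunks, spanning roughly
    highlight_sec seconds, whose attention scores sum highest.
    """
    win = max(1, highlight_sec // chunk_sec)  # window length in chunks
    if len(attn) <= win:
        return 0, len(attn) * chunk_sec       # song shorter than window
    # Prefix sums let us score every candidate window in O(n).
    c = np.concatenate([[0.0], np.cumsum(attn)])
    scores = c[win:] - c[:-win]               # sum of each window
    start = int(np.argmax(scores))
    return start * chunk_sec, (start + win) * chunk_sec
```

The prefix-sum trick is a minor efficiency choice; a plain sliding-window sum would give the same result.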
topic Music thumbnailing
highlight extraction
chorus detection
structure analysis
convolutional neural network
attention mechanism