Multi-objective optimization for model selection in music classification

With the breakthrough of machine learning techniques, research on music emotion classification has made notable progress by combining various audio features with state-of-the-art machine learning models. Still, it is known that the way to preprocess music samples and to choose which...

Bibliographic Details
Main Author:	Ujihara, Rintaro
Format:	Others
Language:	English
Published:	KTH, Optimeringslära och systemteori 2021
Subjects:	Music emotion recognition; Mel spectrogram; MFCC; CENS; Onset; Tonnetz; HPSS; 1D convolutional neural network; Attention LSTM; 1DCNN BiLSTM; Pareto optimality; Mathematics; Matematik
Online Access:	http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-298370

id	ndltd-UPSALLA1-oai-DiVA.org-kth-298370
record_format	oai_dc
spelling	ndltd-UPSALLA1-oai-DiVA.org-kth-298370 | 2021-07-07T05:24:34Z | Multi-objective optimization for model selection in music classification | eng | Flermålsoptimering för modellval i musikklassificering | Ujihara, Rintaro | KTH, Optimeringslära och systemteori | 2021 | Music emotion recognition | Mel spectrogram | MFCC | CENS | Onset | Tonnetz | HPSS | 1D convolutional neural network | Attention LSTM | 1DCNN BiLSTM | Pareto optimality | Mathematics | Matematik | Student thesis | info:eu-repo/semantics/bachelorThesis | text | http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-298370 | TRITA-SCI-GRU ; 2021:221 | application/pdf | info:eu-repo/semantics/openAccess
collection	NDLTD
language	English
format	Others
sources	NDLTD
topic	Music emotion recognition; Mel spectrogram; MFCC; CENS; Onset; Tonnetz; HPSS; 1D convolutional neural network; Attention LSTM; 1DCNN BiLSTM; Pareto optimality; Mathematics; Matematik
spellingShingle	Music emotion recognition; Mel spectrogram; MFCC; CENS; Onset; Tonnetz; HPSS; 1D convolutional neural network; Attention LSTM; 1DCNN BiLSTM; Pareto optimality; Mathematics; Matematik; Ujihara, Rintaro; Multi-objective optimization for model selection in music classification
description	With the breakthrough of machine learning techniques, research on music emotion classification has made notable progress by combining various audio features with state-of-the-art machine learning models. Still, the best way to preprocess music samples and the choice of classification algorithm are known to depend on the data set and the objective of each project. The collaborating company of this thesis, Ichigoichie AB, is currently developing a system that categorizes music data into positive and negative classes. To improve the accuracy of the existing system, this project aims to identify the best model through experiments with six audio features (Mel spectrogram, MFCC, HPSS, Onset, CENS, Tonnetz) and several machine learning models, including deep neural networks, for the classification task. For each model, hyperparameter tuning is performed, and the models are evaluated according to Pareto optimality with regard to accuracy and execution time. The results show that the most promising model achieved 95% correct classification with an execution time of less than 15 seconds.
author	Ujihara, Rintaro
author_facet	Ujihara, Rintaro
author_sort	Ujihara, Rintaro
title	Multi-objective optimization for model selection in music classification
title_short	Multi-objective optimization for model selection in music classification
title_full	Multi-objective optimization for model selection in music classification
title_fullStr	Multi-objective optimization for model selection in music classification
title_full_unstemmed	Multi-objective optimization for model selection in music classification
title_sort	multi-objective optimization for model selection in music classification
publisher	KTH, Optimeringslära och systemteori
publishDate	2021
url	http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-298370
work_keys_str_mv	AT ujihararintaro multiobjectiveoptimizationformodelselectioninmusicclassification AT ujihararintaro flermalsoptimeringformodellvalimusikklassificering
_version_	1719415791325020160
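
The abstract evaluates models by Pareto optimality with regard to accuracy and execution time. The record does not say how the thesis computes this, so the following is only a minimal sketch of the general idea: a model is kept when no other model is at least as accurate and at least as fast while being strictly better on one of the two. The `Candidate` type, the function names, and all model labels and numbers are hypothetical and are not taken from the thesis.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    """One trained model with its two objectives (hypothetical structure)."""
    name: str
    accuracy: float  # fraction of correct classifications, higher is better
    seconds: float   # execution time, lower is better


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a.accuracy >= b.accuracy and a.seconds <= b.seconds
    strictly_better = a.accuracy > b.accuracy or a.seconds < b.seconds
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Return the candidates that no other candidate dominates, in input order."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Made-up values for illustration only; these are not results from the thesis.
candidates = [
    Candidate("model_a", 0.95, 14.0),
    Candidate("model_b", 0.90, 5.0),
    Candidate("model_c", 0.88, 9.0),   # dominated by model_b
    Candidate("model_d", 0.95, 30.0),  # dominated by model_a
    Candidate("model_e", 0.80, 1.0),
]

front_names = [c.name for c in pareto_front(candidates)]
print(front_names)
```

The pairwise comparison is quadratic in the number of candidates, which is adequate for the handful of models a selection study like this typically compares. Choosing a single model from the resulting front still requires a preference between accuracy and execution time.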

Similar Items