Multi-label feature selection with application to musical instrument recognition

Thesis (PhD)--Stellenbosch University, 2013. === ENGLISH ABSTRACT: An area of data mining and statistics that is currently receiving considerable attention is the field of multi-label learning. Problems in this field are concerned with scenarios where each data case can be associated with a set of l...

Full description

Bibliographic Details
Main Author: Sandrock, Trudie
Other Authors: Steel, S. J.
Format: Others
Language:en_ZA
Published: Stellenbosch : Stellenbosch University 2014
Subjects:
Online Access:http://hdl.handle.net/10019.1/95550
Description
Summary:Thesis (PhD)--Stellenbosch University, 2013. === ENGLISH ABSTRACT: An area of data mining and statistics that is currently receiving considerable attention is the field of multi-label learning. Problems in this field are concerned with scenarios where each data case can be associated with a set of labels instead of only one. In this thesis, we review the field of multi-label learning and discuss the lack of suitable benchmark data available for evaluating multi-label algorithms. We propose a technique for simulating multi-label data, which allows good control over different data characteristics and which could be useful for conducting comparative studies in the multi-label field. We also discuss the explosion in data in recent years, and highlight the need for some form of dimension reduction in order to alleviate some of the challenges presented by working with large datasets. Feature (or variable) selection is one way of achieving dimension reduction, and after a brief discussion of different feature selection techniques, we propose a new technique for feature selection in a multi-label context, based on the concept of independent probes. This technique is empirically evaluated by using simulated multi-label data and it is shown to achieve classification accuracy with a reduced set of features similar to that achieved with a full set of features. The proposed technique for feature selection is then also applied to the field of music information retrieval (MIR), specifically the problem of musical instrument recognition. An overview of the field of MIR is given, with particular emphasis on the instrument recognition problem. The particular goal of (polyphonic) musical instrument recognition is to automatically identify the instruments playing simultaneously in an audio clip, which is not a simple task. We specifically consider the case of duets – in other words, where two instruments are playing simultaneously – and approach the problem as a multi-label classification one. In our empirical study, we illustrate the complexity of musical instrument data and again show that our proposed feature selection technique is effective in identifying relevant features and thereby reducing the complexity of the dataset without negatively impacting on performance. === AFRIKAANSE OPSOMMING: ‘n Area van dataontginning en statistiek wat tans baie aandag ontvang, is die veld van multi-etiket leerteorie. Probleme in hierdie veld beskou scenarios waar elke datageval met ‘n stel etikette geassosieer kan word, instede van slegs een. In hierdie skripsie gee ons ‘n oorsig oor die veld van multi-etiket leerteorie en bespreek die gebrek aan geskikte standaard datastelle beskikbaar vir die evaluering van multi-etiket algoritmes. Ons stel ‘n tegniek vir die simulasie van multi-etiket data voor, wat goeie kontrole oor verskillende data eienskappe bied en wat nuttig kan wees om vergelykende studies in die multi-etiket veld uit te voer. Ons bespreek ook die onlangse ontploffing in data, en beklemtoon die behoefte aan ‘n vorm van dimensie reduksie om sommige van die uitdagings wat deur sulke groot datastelle gestel word die hoof te bied. Veranderlike seleksie is een manier van dimensie reduksie, en na ‘n vlugtige bespreking van verskillende veranderlike seleksie tegnieke, stel ons ‘n nuwe tegniek vir veranderlike seleksie in ‘n multi-etiket konteks voor, gebaseer op die konsep van onafhanklike soek-veranderlikes. Hierdie tegniek word empiries ge-evalueer deur die gebruik van gesimuleerde multi-etiket data en daar word gewys dat dieselfde klassifikasie akkuraatheid behaal kan word met ‘n verminderde stel veranderlikes as met die volle stel veranderlikes. Die voorgestelde tegniek vir veranderlike seleksie word ook toegepas in die veld van musiek dataontginning, spesifiek die probleem van die herkenning van musiekinstrumente. ‘n Oorsig van die musiek dataontginning veld word gegee, met spesifieke klem op die herkenning van musiekinstrumente. Die spesifieke doel van (polifoniese) musiekinstrument-herkenning is om instrumente te identifiseer wat saam in ‘n oudiosnit speel. Ons oorweeg spesifiek die geval van duette – met ander woorde, waar twee instrumente saam speel – en hanteer die probleem as ‘n multi-etiket klassifikasie een. In ons empiriese studie illustreer ons die kompleksiteit van musiekinstrumentdata en wys weereens dat ons voorgestelde veranderlike seleksie tegniek effektief daarin slaag om relevante veranderlikes te identifiseer en sodoende die kompleksiteit van die datastel te verminder sonder ‘n negatiewe impak op klassifikasie akkuraatheid.