New Variants of Nonnegative Matrix Factorization with Application to Speech Coding and Speech Enhancement

In this thesis, new variants of nonnegative matrix factorization (NMF) based on a convolutional data model, β-divergence and sparsification are developed and analyzed. These NMF variants are collectively referred to as β-CNMF. Common sparsification techniques such as L1-norm minimization and elastic net are d...

Bibliographic Details
Main Author:	Jafeth Villasana Tinajero, Pedro
Format:	Others
Language:	English
Published:	KTH, Skolan för elektroteknik och datavetenskap (EECS) 2019
Subjects:	Engineering and Technology Teknik och teknologier
Online Access:	http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-253264

id	ndltd-UPSALLA1-oai-DiVA.org-kth-253264
record_format	oai_dc
spelling	ndltd-UPSALLA1-oai-DiVA.org-kth-253264 | 2019-06-14T04:26:08Z | New Variants of Nonnegative Matrix Factorization with Application to Speech Coding and Speech Enhancement | eng | Jafeth Villasana Tinajero, Pedro | KTH, Skolan för elektroteknik och datavetenskap (EECS) | 2019 | Engineering and Technology | Teknik och teknologier | Student thesis | info:eu-repo/semantics/bachelorThesis | text | http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-253264 | TRITA-EECS-EX ; 2018:659 | application/pdf | info:eu-repo/semantics/openAccess
collection	NDLTD
language	English
format	Others
sources	NDLTD
topic	Engineering and Technology Teknik och teknologier
description	In this thesis, new variants of nonnegative matrix factorization (NMF) based on a convolutional data model, β-divergence and sparsification are developed and analyzed. These NMF variants are collectively referred to as β-CNMF. Common sparsification techniques such as L1-norm minimization and elastic net are discussed and a new regularizer is proposed. It is shown that the new regularizer, unlike the above-mentioned sparsification techniques, has control over the number of active bases in the NMF dictionary. Moreover, the β-CNMF is extended to multichannel signals: it learns a common dictionary by exploiting the correlation between channels through a multichannel coefficient matrix. As a result, an algorithm for source separation based on multichannel β-CNMF is developed. The algorithm is further tested in a multilayer setting, in which the frequency-shifted coefficient matrices serve as input to the next higher layer. Finally, three variants of the algorithm are evaluated in the context of speech enhancement, focusing on the problem of speech extraction from complex auditory scenes. Figures obtained from the SiSEC 2016 data show that the proposed algorithms perform comparably to or better than the state of the art. === This report covers the development and analysis of new variants of nonnegative matrix factorization (NMF) based on a convolutional data model, β-divergence and sparse matrices. These NMF variants are generally called β-CNMFs, where the C stands for "convolutional". Common regularization techniques, such as L1-norm minimization and elastic nets, are then discussed, and a new regularization formulation is proposed. This new formulation, unlike the above-mentioned regularization techniques, turns out to allow control over the number of active basis functions in the NMF dictionary. In addition, the β-CNMF is extended to handle multichannel signals by training on a common dictionary that exploits the cross-correlation between the channels. This enables the development of an algorithm for source separation of multichannel signals. The algorithm is further tested in multiple stages, where the frequency-shifted coefficient matrices from one stage serve as input to the next. Finally, three variants of the algorithm are evaluated for speech enhancement, focusing on the extraction of speech from complex acoustic environments. Measurements from SiSEC 2016 show that the proposed algorithm performs as well as or better than existing algorithms.
author	Jafeth Villasana Tinajero, Pedro
title	New Variants of Nonnegative Matrix Factorization with Application to Speech Coding and Speech Enhancement
publisher	KTH, Skolan för elektroteknik och datavetenskap (EECS)
publishDate	2019
url	http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-253264

Similar Items