A Framework for Speech Recognition using Logistic Regression

Bibliographic Details
Main Author:	Birkenes, Øystein
Format:	Doctoral Thesis
Language:	English
Published:	Norges teknisk-naturvitenskapelige universitet, Fakultet for informasjonsteknologi, matematikk og elektroteknikk, 2007
Subjects:	Automatic speech recognition; Signal processing; Signalbehandling
Series:	Doktoravhandlinger ved NTNU, ISSN 1503-8181 ; 2007:165
Online Access:	http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-1599 ; http://nbn-resolving.de/urn:isbn:978-82-471-3621-8

Description

Although discriminative approaches such as the support vector machine and logistic regression have had great success in many pattern recognition applications, they have achieved only limited success in speech recognition. Two difficulties are often encountered: 1) speech signals typically have variable lengths, and 2) speech recognition is a sequence labeling problem, where each spoken utterance corresponds to a sequence of words or phones. In this thesis, we present a framework for automatic speech recognition using logistic regression. We address the problem of variable-length speech signals by including in the logistic regression framework a mapping that transforms each speech signal into a fixed-dimensional vector. The mapping is defined either explicitly, with a set of hidden Markov models (HMMs), for use in penalized logistic regression (PLR), or implicitly, through a sequence kernel, for use with kernel logistic regression (KLR). Unlike previous work that has combined HMMs with a discriminative classification approach, we jointly optimize the logistic regression parameters and the HMM parameters using a penalized likelihood criterion. Experiments show that joint optimization improves the recognition accuracy significantly. The sequence kernel we present is motivated by the dynamic time warping (DTW) distance between two feature vector sequences. Instead of considering only the optimal alignment path, we sum the contributions from all alignment paths. Preliminary experiments with the sequence kernel show promising results.

A two-step approach is used to handle the sequence labeling problem. In the first step, a set of HMMs generates an N-best list of sentence hypotheses for a spoken utterance. In the second step, these sentence hypotheses are rescored using logistic regression on the segments in the N-best list. A garbage class is introduced into the logistic regression framework to obtain reliable probability estimates for the segments in the N-best lists. We present results on both a connected digit recognition task and a continuous phone recognition task.
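
The description outlines the sequence kernel only briefly. To illustrate the idea of summing over all DTW alignment paths instead of keeping only the best one, here is a minimal Python sketch. It rests on assumptions not stated in this record: a Gaussian local similarity with a hypothetical width parameter `gamma`, the usual three DTW step directions, and each path contributing the product of the local similarities along it (the construction commonly called a global alignment kernel). The function names are hypothetical, and the kernel defined in the thesis may differ in its local similarity, path weighting, and normalization.

```python
import math
from typing import List, Sequence

# A frame is a fixed-length feature vector (e.g. cepstral coefficients); a
# sequence is a list of frames whose length varies from utterance to utterance.
Frame = Sequence[float]


def local_similarity(x: Frame, y: Frame, gamma: float = 1.0) -> float:
    """Gaussian similarity between two frames (an assumed choice)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def all_paths_kernel(X: Sequence[Frame], Y: Sequence[Frame], gamma: float = 1.0) -> float:
    """Sum over every DTW-style alignment path (steps: next frame of X, next
    frame of Y, or both) of the product of local similarities along the path."""
    n, m = len(X), len(Y)
    # D[i][j] = total contribution of all paths aligning X[:i] with Y[:j].
    D: List[List[float]] = [[0.0] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 1.0  # start of every path; other boundary cells stay 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            k = local_similarity(X[i - 1], Y[j - 1], gamma)
            # DTW keeps only the best of the three predecessors; here all
            # three are summed, so every alignment path contributes.
            D[i][j] = k * (D[i - 1][j] + D[i][j - 1] + D[i - 1][j - 1])
    return D[n][m]


def normalized_kernel(X: Sequence[Frame], Y: Sequence[Frame], gamma: float = 1.0) -> float:
    """Cosine-style normalization so that every sequence has self-similarity 1."""
    kxy = all_paths_kernel(X, Y, gamma)
    return kxy / math.sqrt(all_paths_kernel(X, X, gamma) * all_paths_kernel(Y, Y, gamma))


if __name__ == "__main__":
    # Made-up toy sequences: 3 frames and 2 frames of 2-dimensional features.
    X = [[0.0, 1.0], [0.5, 0.5], [1.0, 0.0]]
    Y = [[0.0, 1.0], [1.0, 0.0]]
    print(normalized_kernel(X, Y, gamma=0.5))
```

The recursion mirrors the DTW dynamic program with the minimum replaced by a sum, so it needs only O(nm) operations even though the number of alignment paths grows exponentially with sequence length. For utterances hundreds of frames long the raw values can overflow or underflow, so a practical version would usually run the recursion in log space (log-sum-exp) before passing normalized kernel values to a kernel logistic regression classifier.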
