Noise Robust Pitch Tracking by Subband Autocorrelation Classification

Speech pitch tracking is one of the elementary tasks of the Computational Auditory Scene Analysis (CASA). While a human can easily listen to the voiced pitch in highly noisy recordings, the performance of automatic speech pitch tracking degrades in unknown noisy audio conditions. Traditional pitch t...

Bibliographic Details
Main Author: Lee, Byung Suk
Language: English
Published: 2012
Subjects: Electrical engineering; Computer science
Online Access: https://doi.org/10.7916/D8SJ1SPJ
id ndltd-columbia.edu-oai-academiccommons.columbia.edu-10.7916-D8SJ1SPJ
record_format oai_dc
spelling ndltd-columbia.edu-oai-academiccommons.columbia.edu-10.7916-D8SJ1SPJ 2019-05-09T15:13:54Z Noise Robust Pitch Tracking by Subband Autocorrelation Classification Lee, Byung Suk 2012 Theses Electrical engineering Computer science English https://doi.org/10.7916/D8SJ1SPJ
collection NDLTD
language English
sources NDLTD
topic Electrical engineering
Computer science
spellingShingle Electrical engineering
Computer science
Lee, Byung Suk
Noise Robust Pitch Tracking by Subband Autocorrelation Classification
description Speech pitch tracking is one of the elementary tasks of Computational Auditory Scene Analysis (CASA). While a human can easily hear the voiced pitch in highly noisy recordings, the performance of automatic speech pitch tracking degrades in unknown noisy audio conditions. Traditional pitch trackers use either autocorrelation or the Fourier transform to calculate periodicity, which works well for clean recordings. For noisy recordings, however, the accuracy of these pitch trackers generally degrades. For example, the information in parts of the frequency spectrum may be lost due to analog radio band transmission and/or contain additive noise of various kinds. Instead of explicitly using the most obvious features of autocorrelation, we propose a trained classifier-based approach, which we call Subband Autocorrelation Classification (SAcC). A multi-layer perceptron (MLP) classifier is trained on the principal components of the autocorrelations of subbands from an auditory filterbank. The output of the MLP classifier is temporally smoothed to produce the pitch track by finding the Viterbi path of a Hidden Markov Model (HMM). Training on various types of noisy speech recordings leads to a great increase in performance over state-of-the-art algorithms, according to both the traditional Gross Pitch Error (GPE) measure and a proposed novel Pitch Tracking Error (PTE), which more fully reflects the accuracy of both pitch estimation/extraction and voicing detection in a single measure. To verify the generalization and specificity of SAcC, we test SAcC on a real-world problem that has a large-scale noisy speech corpus. The data is from the DARPA Robust Automatic Transcription of Speech (RATS) program. The experiments on the performance evaluation of SAcC pitch tracking confirm the generalization power of SAcC across various unknown noise conditions and distinct speech corpora. We also report that using SAcC output adds a significant improvement to a Speaker Identification (SID) system for RATS, suggesting the potential contribution of SAcC pitch tracking to higher-level tasks.
author Lee, Byung Suk
author_facet Lee, Byung Suk
author_sort Lee, Byung Suk
title Noise Robust Pitch Tracking by Subband Autocorrelation Classification
title_short Noise Robust Pitch Tracking by Subband Autocorrelation Classification
title_full Noise Robust Pitch Tracking by Subband Autocorrelation Classification
title_fullStr Noise Robust Pitch Tracking by Subband Autocorrelation Classification
title_full_unstemmed Noise Robust Pitch Tracking by Subband Autocorrelation Classification
title_sort noise robust pitch tracking by subband autocorrelation classification
publishDate 2012
url https://doi.org/10.7916/D8SJ1SPJ
work_keys_str_mv AT leebyungsuk noiserobustpitchtrackingbysubbandautocorrelationclassification
_version_ 1719045602915909632
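
Illustrative note on the abstract: the description says the MLP output is temporally smoothed by finding the Viterbi path of an HMM. The sketch below shows that general idea only and is not the thesis implementation. Everything in it is an assumption made for illustration: the function name viterbi_path, the three states (0 = unvoiced, 1 and 2 = two pitch bins), the "sticky" transition matrix, the uniform prior, the made-up posterior values, and the choice to use the per-frame posteriors directly as observation scores.

```python
import math


def viterbi_path(posteriors, transition, prior):
    """Return the most likely state sequence for a simple HMM.

    posteriors: list of frames, each a list of per-state probabilities
                (stand-ins for classifier outputs; hypothetical values).
    transition: transition[i][j] = P(state j at frame t | state i at t-1).
    prior:      prior[i] = P(state i at the first frame).
    """
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    n_states = len(prior)
    # score[j] = best log-probability of any path ending in state j
    score = [log(prior[j]) + log(posteriors[0][j]) for j in range(n_states)]
    backptr = []

    for frame in posteriors[1:]:
        new_score, pointers = [], []
        for j in range(n_states):
            # Best predecessor state for landing in state j at this frame
            best_i = max(range(n_states),
                         key=lambda i: score[i] + log(transition[i][j]))
            new_score.append(score[best_i] + log(transition[best_i][j])
                             + log(frame[j]))
            pointers.append(best_i)
        score = new_score
        backptr.append(pointers)

    # Trace back from the best final state
    state = max(range(n_states), key=lambda j: score[j])
    path = [state]
    for pointers in reversed(backptr):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Hypothetical setup: state 0 = unvoiced, states 1 and 2 = pitch bins.
prior = [1 / 3, 1 / 3, 1 / 3]
transition = [
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
]
# Made-up posteriors with a one-frame glitch toward state 2 at frame 2.
posteriors = [
    [0.1, 0.8, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.4, 0.5],
    [0.1, 0.8, 0.1],
    [0.1, 0.8, 0.1],
]

framewise = [max(range(3), key=lambda j: f[j]) for f in posteriors]
smoothed = viterbi_path(posteriors, transition, prior)
print(framewise)  # [1, 1, 2, 1, 1]
print(smoothed)   # [1, 1, 1, 1, 1]
```

In this toy case the frame-by-frame maximum jumps to state 2 for a single frame, while the Viterbi path stays in state 1 because the transition cost of leaving and re-entering a state outweighs the small posterior advantage at that frame.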