Noise Robust Pitch Tracking by Subband Autocorrelation Classification
Main Author: | Lee, Byung Suk |
---|---|
Language: | English |
Published: | 2012 |
Subjects: | Electrical engineering; Computer science |
Online Access: | https://doi.org/10.7916/D8SJ1SPJ |
collection |
NDLTD |
topic |
Electrical engineering; Computer science |
description |
Speech pitch tracking is one of the elementary tasks of Computational Auditory Scene Analysis (CASA). While humans can easily hear the voiced pitch in highly noisy recordings, the performance of automatic speech pitch tracking degrades under unknown noise conditions. Traditional pitch trackers use either autocorrelation or the Fourier transform to calculate periodicity, which works well for clean recordings; for noisy recordings, however, their accuracy generally degrades. For example, information in parts of the frequency spectrum may be lost in analog radio-band transmission, or the signal may contain various kinds of additive noise. Instead of explicitly using the most obvious features of the autocorrelation, we propose a trained classifier-based approach, which we call Subband Autocorrelation Classification (SAcC). A multi-layer perceptron (MLP) classifier is trained on the principal components of the autocorrelations of subbands from an auditory filterbank. The MLP output is temporally smoothed into a pitch track by finding the Viterbi path of a Hidden Markov Model (HMM). Training on various types of noisy speech recordings yields a large improvement over state-of-the-art algorithms, according to both the traditional Gross Pitch Error (GPE) measure and a newly proposed Pitch Tracking Error (PTE) measure, which more fully reflects the accuracy of both pitch estimation (extraction) and voicing detection in a single measure. To verify the generalization and specificity of SAcC, we test it on a real-world problem with a large-scale noisy speech corpus from the DARPA Robust Automatic Transcription of Speech (RATS) program. These experiments confirm that SAcC generalizes across various unknown noise conditions and distinct speech corpora. We also report that using the SAcC output significantly improves a Speaker Identification (SID) system for RATS, suggesting that SAcC pitch tracking can contribute to higher-level tasks. |
author |
Lee, Byung Suk |
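The abstract describes two concrete steps: temporally smoothing the per-frame MLP output by finding the Viterbi path of an HMM, and scoring pitch tracks with Gross Pitch Error (GPE). The Python sketch below illustrates both under stated assumptions; it is not the thesis's implementation. Assumptions: state 0 means "unvoiced" and the remaining states are pitch bins; the MLP posteriors are used directly as emission scores; the transition matrix from the hypothetical `make_transitions` helper simply favors staying in the same state; GPE uses the conventional 20% relative-deviation threshold, with 0 Hz marking unvoiced frames. The names `viterbi_smooth`, `make_transitions`, and `gross_pitch_error` are hypothetical, and the record does not define PTE, so it is not sketched here.

```python
import math
from typing import List, Sequence

EPS = 1e-12  # floor applied before taking logs, to avoid log(0)


def make_transitions(n_states: int, stay: float = 0.9) -> List[List[float]]:
    """Hypothetical transition matrix: stay in the same state with prob `stay`,
    spreading the remaining mass uniformly over the other states."""
    off = (1.0 - stay) / (n_states - 1)
    return [[stay if i == j else off for j in range(n_states)]
            for i in range(n_states)]


def viterbi_smooth(posteriors: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]],
                   priors: Sequence[float]) -> List[int]:
    """Most likely state sequence, using per-frame classifier posteriors
    directly as emission scores (a simplifying assumption)."""
    if not posteriors:
        return []
    n_states = len(priors)
    log_t = [[math.log(max(p, EPS)) for p in row] for row in transitions]

    def log_emit(t: int, k: int) -> float:
        return math.log(max(posteriors[t][k], EPS))

    # delta[k]: best log score of any path that ends in state k at the current frame
    delta = [math.log(max(priors[k], EPS)) + log_emit(0, k) for k in range(n_states)]
    backptr: List[List[int]] = []  # backptr[t-1][j]: best predecessor of state j at frame t
    for t in range(1, len(posteriors)):
        ptrs, new_delta = [], []
        for j in range(n_states):
            best_i = max(range(n_states), key=lambda i: delta[i] + log_t[i][j])
            ptrs.append(best_i)
            new_delta.append(delta[best_i] + log_t[best_i][j] + log_emit(t, j))
        backptr.append(ptrs)
        delta = new_delta

    # Trace back from the best-scoring final state
    state = max(range(n_states), key=lambda k: delta[k])
    path = [state]
    for ptrs in reversed(backptr):
        state = ptrs[state]
        path.append(state)
    path.reverse()
    return path


def gross_pitch_error(ref_f0: Sequence[float], est_f0: Sequence[float],
                      tol: float = 0.2) -> float:
    """Fraction of frames voiced in both tracks (f0 > 0) whose estimate
    deviates from the reference by more than `tol` (relative)."""
    both = [(r, e) for r, e in zip(ref_f0, est_f0) if r > 0 and e > 0]
    if not both:
        return 0.0
    return sum(1 for r, e in both if abs(e - r) > tol * r) / len(both)


if __name__ == "__main__":
    # State 0 = unvoiced; states 1-3 = hypothetical pitch bins.
    posteriors = [
        [0.1, 0.7, 0.1, 0.1],
        [0.1, 0.7, 0.1, 0.1],
        [0.1, 0.3, 0.1, 0.5],  # isolated frame where bin 3 wins frame-by-frame
        [0.1, 0.7, 0.1, 0.1],
    ]
    print(viterbi_smooth(posteriors, make_transitions(4), [0.25] * 4))  # → [1, 1, 1, 1]
    print(gross_pitch_error([0, 100, 100, 200, 200], [120, 105, 150, 0, 210]))  # → 0.3333333333333333
```

In this toy example the isolated frame is absorbed because leaving and re-entering bin 1 costs two low-probability transitions, which is the kind of temporal smoothing the abstract attributes to the HMM. A practical tracker would learn or tune the transition structure (for example, favoring small pitch changes between adjacent frames), which this sketch does not attempt.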