High-Pitch Formant Estimation by Exploiting Temporal Change of Pitch

This paper considers the problem of obtaining an accurate spectral representation of speech formant structure when the voicing source exhibits a high fundamental frequency. Our work is inspired by auditory perception and physiological studies implicating the use of pitch dynamics in speech by humans...

Bibliographic Details
Main Authors: Wang, Tianyu Tom (Contributor), Quatieri, Thomas F. (Contributor)
Other Authors: Harvard University- (Contributor), Lincoln Laboratory (Contributor)
Format: Article
Language: English
Published: Institute of Electrical and Electronics Engineers, 2010-04-06T21:01:56Z.
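Subjects: temporal change of pitch; spectrotemporal analysis; linear prediction; high-pitch effects; formant estimation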
Online Access: Get fulltext (http://hdl.handle.net/1721.1/53522)
LEADER 02496 am a22003013u 4500
001 53522
042 |a dc 
100 1 0 |a Wang, Tianyu Tom  |e author 
100 1 0 |a Harvard University-  |e contributor 
100 1 0 |a Lincoln Laboratory  |e contributor 
100 1 0 |a Quatieri, Thomas F.  |e contributor 
100 1 0 |a Wang, Tianyu Tom  |e contributor 
700 1 0 |a Quatieri, Thomas F.  |e author 
245 0 0 |a High-Pitch Formant Estimation by Exploiting Temporal Change of Pitch 
260 |b Institute of Electrical and Electronics Engineers,   |c 2010-04-06T21:01:56Z. 
856 |z Get fulltext  |u http://hdl.handle.net/1721.1/53522 
520 |a This paper considers the problem of obtaining an accurate spectral representation of speech formant structure when the voicing source exhibits a high fundamental frequency. Our work is inspired by auditory perception and physiological studies implicating the use of pitch dynamics in speech by humans. We develop and assess signal processing schemes aimed at exploiting temporal change of pitch to address the high-pitch formant frequency estimation problem. Specifically, we propose a 2-D analysis framework using 2-D transformations of the time-frequency space. In one approach, we project changing spectral harmonics over time to a 1-D function of frequency. In a second approach, we draw upon previous work of Quatieri and Ezzat, with similarities to the auditory modeling efforts of Chi, where localized 2-D Fourier transforms of the time-frequency space provide improved source-filter separation when pitch is changing. Our methods show quantitative improvements for synthesized vowels with stationary formant structure in comparison to traditional and homomorphic linear prediction. We also demonstrate the feasibility of applying our methods on stationary vowel regions of natural speech spoken by high-pitch females of the TIMIT corpus. Finally, we show improvements afforded by the proposed analysis framework in formant tracking on examples of stationary and time-varying formant structure. 
520 |a United States. Dept. of Defense (Air Force Contract FA8721 05 C 0002) 
546 |a en_US 
690 |a temporal change of pitch 
690 |a spectrotemporal analysis 
690 |a linear prediction 
690 |a high-pitch effects 
690 |a formant estimation 
655 7 |a Article 
773 |t IEEE Transactions on Audio, Speech, and Language Processing,