Determining articulator configuration in voiced stop consonants by matching time-domain patterns in pitch periods
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2005. === This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. === Includes bibliographical references (p. 99-104).
Main Author: | Kondacs, Attila, 1972- |
---|---|
Other Authors: | Gerald J. Sussman. |
Format: | Others |
Language: | en_US |
Published: | Massachusetts Institute of Technology, 2005 |
Subjects: | Electrical Engineering and Computer Science. |
Online Access: | http://hdl.handle.net/1721.1/27867 |
id | ndltd-MIT-oai-dspace.mit.edu-1721.1-27867 |
record_format | oai_dc |
spelling |
ndltd-MIT-oai-dspace.mit.edu-1721.1-27867 2019-05-02T15:56:13Z Determining articulator configuration in voiced stop consonants by matching time-domain patterns in pitch periods Kondacs, Attila, 1972- Gerald J. Sussman. Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science. Electrical Engineering and Computer Science. Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2005. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Includes bibliographical references (p. 99-104). In this thesis I am concerned with linking the observed speech signal to the configuration of the articulators. Because the articulators can move rapidly, the speech signal can be highly non-stationary, and the typical linear analysis techniques that assume quasi-stationarity may lack the time-frequency resolution needed to determine the place of articulation. I argue that the traditional low- and high-level primitives of speech processing, frequency and phonemes, are inadequate and should be replaced by a representation with three layers: 1. short pitch-period resonances and other spatio-temporal patterns; 2. articulator configuration trajectories; 3. syllables. The patterns indicate articulator configuration trajectories (how the tongue, jaws, etc. are moving), which are interpreted as syllables and words. My patterns are an alternative to frequency: I use short time-domain features of the sound waveform, extracted from each vowel pitch-period pattern, to identify the positions of the articulators with high reliability. These features are important because detailed measurements within a single pitch period make it possible to track rapid articulator movements. No linear signal-processing approach can achieve the combination of sensitivity to short-term changes and measurement accuracy that these nonlinear techniques provide. The measurements I use are neurophysiologically plausible: the auditory system could be using similar methods. I have demonstrated this approach by constructing a robust technique for categorizing the English voiced stops as the consonants B, D, or G based on the vocalic portions of their releases. The classification recognizes 93.5%, 81.8%, and 86.1% of the b, d, and g to ae transitions, with false-positive rates of 2.9%, 8.7%, and 2.6%, respectively. by Attila Kondacs. Ph.D. 2005-09-26T15:53:43Z 2005-09-26T15:53:43Z 2005 2005 Thesis http://hdl.handle.net/1721.1/27867 60662988 en_US M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. http://dspace.mit.edu/handle/1721.1/7582 104 p. 3085753 bytes 3113870 bytes application/pdf application/pdf application/pdf Massachusetts Institute of Technology |
collection | NDLTD |
language | en_US |
format | Others |
sources | NDLTD |
topic | Electrical Engineering and Computer Science. |
spellingShingle | Electrical Engineering and Computer Science. Kondacs, Attila, 1972- Determining articulator configuration in voiced stop consonants by matching time-domain patterns in pitch periods |
description |
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2005. === This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. === Includes bibliographical references (p. 99-104). === In this thesis I am concerned with linking the observed speech signal to the configuration of the articulators. Because the articulators can move rapidly, the speech signal can be highly non-stationary, and the typical linear analysis techniques that assume quasi-stationarity may lack the time-frequency resolution needed to determine the place of articulation. I argue that the traditional low- and high-level primitives of speech processing, frequency and phonemes, are inadequate and should be replaced by a representation with three layers: 1. short pitch-period resonances and other spatio-temporal patterns; 2. articulator configuration trajectories; 3. syllables. The patterns indicate articulator configuration trajectories (how the tongue, jaws, etc. are moving), which are interpreted as syllables and words. My patterns are an alternative to frequency: I use short time-domain features of the sound waveform, extracted from each vowel pitch-period pattern, to identify the positions of the articulators with high reliability. These features are important because detailed measurements within a single pitch period make it possible to track rapid articulator movements. No linear signal-processing approach can achieve the combination of sensitivity to short-term changes and measurement accuracy that these nonlinear techniques provide. The measurements I use are neurophysiologically plausible: the auditory system could be using similar methods. I have demonstrated this approach by constructing a robust technique for categorizing the English voiced stops as the consonants B, D, or G based on the vocalic portions of their releases. The classification recognizes 93.5%, 81.8%, and 86.1% of the b, d, and g to ae transitions, with false-positive rates of 2.9%, 8.7%, and 2.6%, respectively. === by Attila Kondacs. === Ph.D. |
author2 | Gerald J. Sussman. |
author_facet | Gerald J. Sussman. Kondacs, Attila, 1972- |
author | Kondacs, Attila, 1972- |
author_sort | Kondacs, Attila, 1972- |
title | Determining articulator configuration in voiced stop consonants by matching time-domain patterns in pitch periods |
title_short | Determining articulator configuration in voiced stop consonants by matching time-domain patterns in pitch periods |
title_full | Determining articulator configuration in voiced stop consonants by matching time-domain patterns in pitch periods |
title_fullStr | Determining articulator configuration in voiced stop consonants by matching time-domain patterns in pitch periods |
title_full_unstemmed | Determining articulator configuration in voiced stop consonants by matching time-domain patterns in pitch periods |
title_sort | determining articulator configuration in voiced stop consonants by matching time-domain patterns in pitch periods |
publisher | Massachusetts Institute of Technology |
publishDate | 2005 |
url | http://hdl.handle.net/1721.1/27867 |
work_keys_str_mv | AT kondacsattila1972 determiningarticulatorconfigurationinvoicedstopconsonantsbymatchingtimedomainpatternsinpitchperiods |
_version_ | 1719031270196903936 |