Determining articulator configuration in voiced stop consonants by matching time-domain patterns in pitch periods

Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2005. === This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. === Includes bibliographical references (p. 99-104).


Bibliographic Details
Main Author: Kondacs, Attila, 1972-
Other Authors: Gerald J. Sussman.
Format: Others
Language: en_US
Published: Massachusetts Institute of Technology 2005
Subjects:
Online Access: http://hdl.handle.net/1721.1/27867
id ndltd-MIT-oai-dspace.mit.edu-1721.1-27867
record_format oai_dc
spelling ndltd-MIT-oai-dspace.mit.edu-1721.1-27867 2019-05-02T15:56:13Z
Determining articulator configuration in voiced stop consonants by matching time-domain patterns in pitch periods
Kondacs, Attila, 1972-
Gerald J. Sussman.
Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science.
Electrical Engineering and Computer Science.
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2005.
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Includes bibliographical references (p. 99-104).
In this thesis I am concerned with linking the observed speech signal to the configuration of the articulators. Due to the potentially rapid motion of the articulators, the speech signal can be highly non-stationary. The typical linear analysis techniques that assume quasi-stationarity may not have sufficient time-frequency resolution to determine the place of articulation. I argue that the traditional low- and high-level primitives of speech processing, frequency and phonemes, are inadequate and should be replaced by a representation with three layers: 1. short pitch period resonances and other spatio-temporal patterns; 2. articulator configuration trajectories; 3. syllables. The patterns indicate articulator configuration trajectories (how the tongue, jaws, etc. are moving), which are interpreted as syllables and words. My patterns are an alternative to frequency. I use short time-domain features of the sound waveform, which can be extracted from each vowel pitch period pattern, to identify the positions of the articulators with high reliability. These features are important because by capitalizing on detailed measurements within a single pitch period, the rapid articulator movements can be tracked.
No linear signal processing approach can achieve the combination of sensitivity to short-term changes and measurement accuracy resulting from these nonlinear techniques. The measurements I use are neurophysiologically plausible: the auditory system could be using similar methods. I have demonstrated this approach by constructing a robust technique for categorizing the English voiced stops as the consonants B, D, or G based on the vocalic portions of their releases. The classification recognizes 93.5%, 81.8%, and 86.1% of the b, d, and g to ae transitions, with false positive rates of 2.9%, 8.7%, and 2.6%, respectively.
by Attila Kondacs.
Ph.D.
2005-09-26T15:53:43Z 2005-09-26T15:53:43Z 2005 2005
Thesis
http://hdl.handle.net/1721.1/27867
60662988
en_US
M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. http://dspace.mit.edu/handle/1721.1/7582
104 p.
3085753 bytes 3113870 bytes
application/pdf application/pdf application/pdf
Massachusetts Institute of Technology
collection NDLTD
language en_US
format Others
sources NDLTD
topic Electrical Engineering and Computer Science.
spellingShingle Electrical Engineering and Computer Science.
Kondacs, Attila, 1972-
Determining articulator configuration in voiced stop consonants by matching time-domain patterns in pitch periods
description Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2005. === This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. === Includes bibliographical references (p. 99-104). === In this thesis I am concerned with linking the observed speech signal to the configuration of the articulators. Due to the potentially rapid motion of the articulators, the speech signal can be highly non-stationary. The typical linear analysis techniques that assume quasi-stationarity may not have sufficient time-frequency resolution to determine the place of articulation. I argue that the traditional low- and high-level primitives of speech processing, frequency and phonemes, are inadequate and should be replaced by a representation with three layers: 1. short pitch period resonances and other spatio-temporal patterns; 2. articulator configuration trajectories; 3. syllables. The patterns indicate articulator configuration trajectories (how the tongue, jaws, etc. are moving), which are interpreted as syllables and words. My patterns are an alternative to frequency. I use short time-domain features of the sound waveform, which can be extracted from each vowel pitch period pattern, to identify the positions of the articulators with high reliability. These features are important because by capitalizing on detailed measurements within a single pitch period, the rapid articulator movements can be tracked. No linear signal processing approach can achieve the combination of sensitivity to short-term changes and measurement accuracy resulting from these nonlinear techniques. The measurements I use are neurophysiologically plausible: the auditory system could be using similar methods.
I have demonstrated this approach by constructing a robust technique for categorizing the English voiced stops as the consonants B, D, or G based on the vocalic portions of their releases. The classification recognizes 93.5%, 81.8%, and 86.1% of the b, d, and g to ae transitions, with false positive rates of 2.9%, 8.7%, and 2.6%, respectively. === by Attila Kondacs. === Ph.D.
author2 Gerald J. Sussman.
author_facet Gerald J. Sussman.
Kondacs, Attila, 1972-
author Kondacs, Attila, 1972-
author_sort Kondacs, Attila, 1972-
title Determining articulator configuration in voiced stop consonants by matching time-domain patterns in pitch periods
title_short Determining articulator configuration in voiced stop consonants by matching time-domain patterns in pitch periods
title_full Determining articulator configuration in voiced stop consonants by matching time-domain patterns in pitch periods
title_fullStr Determining articulator configuration in voiced stop consonants by matching time-domain patterns in pitch periods
title_full_unstemmed Determining articulator configuration in voiced stop consonants by matching time-domain patterns in pitch periods
title_sort determining articulator configuration in voiced stop consonants by matching time-domain patterns in pitch periods
publisher Massachusetts Institute of Technology
publishDate 2005
url http://hdl.handle.net/1721.1/27867
work_keys_str_mv AT kondacsattila1972 determiningarticulatorconfigurationinvoicedstopconsonantsbymatchingtimedomainpatternsinpitchperiods
_version_ 1719031270196903936
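
The abstract above describes classifying voiced stops by matching time-domain patterns extracted from individual pitch periods of the vocalic release. As a rough, generic illustration of that idea only (not the thesis's actual algorithm), the sketch below estimates a pitch period by autocorrelation, normalizes one period as a time-domain pattern, and nearest-neighbor-matches it against labeled templates. All function names and the toy signals are invented for this example.

```python
import math

def estimate_period(signal, min_lag=20, max_lag=200):
    """Estimate the pitch period (in samples) as the lag that
    maximizes the raw autocorrelation of the waveform."""
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag):
        score = sum(signal[i] * signal[i + lag]
                    for i in range(len(signal) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

def extract_pattern(signal, period):
    """Take one pitch period from the waveform and normalize it to
    unit energy so that comparisons are amplitude-invariant."""
    chunk = signal[:period]
    norm = math.sqrt(sum(x * x for x in chunk)) or 1.0
    return [x / norm for x in chunk]

def classify(pattern, templates):
    """Nearest-neighbor match of a time-domain pattern against
    labeled templates, by Euclidean distance over one pitch period."""
    def dist(a, b):
        n = min(len(a), len(b))
        return math.sqrt(sum((a[i] - b[i]) ** 2 for i in range(n)))
    return min(templates, key=lambda name: dist(pattern, templates[name]))

# Toy demonstration: two synthetic "articulator configurations" that
# differ in harmonic content, with a 100 Hz pitch at 8 kHz (80 samples).
period = 80
def wave(second_harmonic, n=800):
    return [math.sin(2 * math.pi * i / period)
            + second_harmonic * math.sin(4 * math.pi * i / period)
            for i in range(n)]

templates = {
    "b-like": extract_pattern(wave(0.1), period),
    "d-like": extract_pattern(wave(0.7), period),
}
unknown = wave(0.65)
p = estimate_period(unknown)
label = classify(extract_pattern(unknown, p), templates)
print(p, label)  # prints: 80 d-like
```

Because each decision uses measurements inside a single pitch period, this style of matching can in principle follow articulator movement from one period to the next, which is the motivation the abstract gives for preferring time-domain patterns over quasi-stationary spectral analysis.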