Determining articulator configuration in voiced stop consonants by matching time-domain patterns in pitch periods
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2005. === This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. === Includes bibliographical references (p. 99-104).
Main Author: | Kondacs, Attila, 1972- |
---|---|
Other Authors: | Gerald J. Sussman. |
Format: | Others |
Language: | en_US |
Published: | Massachusetts Institute of Technology, 2005 |
Subjects: | Electrical Engineering and Computer Science. |
Online Access: | http://hdl.handle.net/1721.1/27867 |
id | ndltd-MIT-oai-dspace.mit.edu-1721.1-27867 |
record_format | oai_dc |
spelling |
ndltd-MIT-oai-dspace.mit.edu-1721.1-27867 2019-05-02T15:56:13Z Determining articulator configuration in voiced stop consonants by matching time-domain patterns in pitch periods Kondacs, Attila, 1972- Gerald J. Sussman. Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science. Electrical Engineering and Computer Science. Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2005. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Includes bibliographical references (p. 99-104). In this thesis I am concerned with linking the observed speech signal to the configuration of the articulators. Because the articulators can move rapidly, the speech signal can be highly non-stationary, and the typical linear analysis techniques that assume quasi-stationarity may lack the time-frequency resolution needed to determine the place of articulation. I argue that the traditional low- and high-level primitives of speech processing, frequency and phonemes, are inadequate and should be replaced by a representation with three layers: 1. short pitch-period resonances and other spatio-temporal patterns; 2. articulator configuration trajectories; 3. syllables. The patterns indicate articulator configuration trajectories (how the tongue, jaws, etc. are moving), which are interpreted as syllables and words. My patterns are an alternative to frequency: I use short time-domain features of the sound waveform, extracted from each vowel pitch-period pattern, to identify the positions of the articulators with high reliability. These features are important because detailed measurements within a single pitch period make it possible to track rapid articulator movements. No linear signal-processing approach can achieve the combination of sensitivity to short-term changes and measurement accuracy that these nonlinear techniques provide. The measurements I use are neurophysiologically plausible: the auditory system could be using similar methods. I have demonstrated this approach by constructing a robust technique for categorizing the English voiced stops as the consonants B, D, or G based on the vocalic portions of their releases. The classification recognizes 93.5%, 81.8%, and 86.1% of the b, d, and g to ae transitions, with false-positive rates of 2.9%, 8.7%, and 2.6%, respectively. by Attila Kondacs. Ph.D. 2005-09-26T15:53:43Z 2005-09-26T15:53:43Z 2005 2005 Thesis http://hdl.handle.net/1721.1/27867 60662988 en_US M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. http://dspace.mit.edu/handle/1721.1/7582 104 p. 3085753 bytes 3113870 bytes application/pdf application/pdf application/pdf Massachusetts Institute of Technology |
collection | NDLTD |
language | en_US |
format | Others |
sources | NDLTD |
topic | Electrical Engineering and Computer Science. |
spellingShingle | Electrical Engineering and Computer Science. Kondacs, Attila, 1972- Determining articulator configuration in voiced stop consonants by matching time-domain patterns in pitch periods |
description |
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2005. === This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. === Includes bibliographical references (p. 99-104). === In this thesis I am concerned with linking the observed speech signal to the configuration of the articulators. Because the articulators can move rapidly, the speech signal can be highly non-stationary, and the typical linear analysis techniques that assume quasi-stationarity may lack the time-frequency resolution needed to determine the place of articulation. I argue that the traditional low- and high-level primitives of speech processing, frequency and phonemes, are inadequate and should be replaced by a representation with three layers: 1. short pitch-period resonances and other spatio-temporal patterns; 2. articulator configuration trajectories; 3. syllables. The patterns indicate articulator configuration trajectories (how the tongue, jaws, etc. are moving), which are interpreted as syllables and words. My patterns are an alternative to frequency: I use short time-domain features of the sound waveform, extracted from each vowel pitch-period pattern, to identify the positions of the articulators with high reliability. These features are important because detailed measurements within a single pitch period make it possible to track rapid articulator movements. No linear signal-processing approach can achieve the combination of sensitivity to short-term changes and measurement accuracy that these nonlinear techniques provide. The measurements I use are neurophysiologically plausible: the auditory system could be using similar methods. I have demonstrated this approach by constructing a robust technique for categorizing the English voiced stops as the consonants B, D, or G based on the vocalic portions of their releases. The classification recognizes 93.5%, 81.8%, and 86.1% of the b, d, and g to ae transitions, with false-positive rates of 2.9%, 8.7%, and 2.6%, respectively. === by Attila Kondacs. === Ph.D. |
author2 | Gerald J. Sussman. |
author_facet | Gerald J. Sussman. Kondacs, Attila, 1972- |
author | Kondacs, Attila, 1972- |
author_sort | Kondacs, Attila, 1972- |
title | Determining articulator configuration in voiced stop consonants by matching time-domain patterns in pitch periods |
title_short | Determining articulator configuration in voiced stop consonants by matching time-domain patterns in pitch periods |
title_full | Determining articulator configuration in voiced stop consonants by matching time-domain patterns in pitch periods |
title_fullStr | Determining articulator configuration in voiced stop consonants by matching time-domain patterns in pitch periods |
title_full_unstemmed | Determining articulator configuration in voiced stop consonants by matching time-domain patterns in pitch periods |
title_sort | determining articulator configuration in voiced stop consonants by matching time-domain patterns in pitch periods |
publisher | Massachusetts Institute of Technology |
publishDate | 2005 |
url | http://hdl.handle.net/1721.1/27867 |
work_keys_str_mv | AT kondacsattila1972 determiningarticulatorconfigurationinvoicedstopconsonantsbymatchingtimedomainpatternsinpitchperiods |
_version_ | 1719031270196903936 |