Fine-grained statistical structure of speech.
In spite of its acoustic diversity, the speech signal presents statistical regularities that can be exploited by biological or artificial systems for efficient coding. Independent Component Analysis (ICA) revealed that on small time scales (∼ 10 ms), the overall structure of speech is well captured...
Main Author: | |
---|---|
Format: | Article |
Language: | English |
Published: |
Public Library of Science (PLoS)
2020-01-01
|
Series: | PLoS ONE |
Online Access: | https://doi.org/10.1371/journal.pone.0230233 |
id |
doaj-323192d9fc6f46a2beebe77bbbdca382 |
---|---|
record_format |
Article |
spelling |
doaj-323192d9fc6f46a2beebe77bbbdca3822021-03-03T21:36:11ZengPublic Library of Science (PLoS)PLoS ONE1932-62032020-01-01153e023023310.1371/journal.pone.0230233Fine-grained statistical structure of speech.François DelocheIn spite of its acoustic diversity, the speech signal presents statistical regularities that can be exploited by biological or artificial systems for efficient coding. Independent Component Analysis (ICA) revealed that on small time scales (∼ 10 ms), the overall structure of speech is well captured by a time-frequency representation whose frequency selectivity follows the same power law in the high frequency range 1-8 kHz as cochlear frequency selectivity in mammals. Variations in the power-law exponent, i.e. different time-frequency trade-offs, have been shown to provide additional adaptation to phonetic categories. Here, we adopt a parametric approach to investigate the variations of the exponent at a finer level of speech. The estimation procedure is based on a measure that reflects the sparsity of decompositions in a set of Gabor dictionaries whose atoms are Gaussian-modulated sinusoids. We examine the variations of the exponent associated with the best decomposition, first at the level of phonemes, then at an intra-phonemic level. We show that this analysis offers a rich interpretation of the fine-grained statistical structure of speech, and that the exponent values can be related to key acoustic properties. Two main results are: i) for plosives, the exponent is lowered by the release bursts, concealing higher values during the opening phases; ii) for vowels, the exponent is bound to formant bandwidths and decreases with the degree of acoustic radiation at the lips. This work further suggests that an efficient coding strategy is to reduce frequency selectivity with sound intensity level, congruent with the nonlinear behavior of cochlear filtering.https://doi.org/10.1371/journal.pone.0230233 |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
François Deloche |
spellingShingle |
François Deloche Fine-grained statistical structure of speech. PLoS ONE |
author_facet |
François Deloche |
author_sort |
François Deloche |
title |
Fine-grained statistical structure of speech. |
title_short |
Fine-grained statistical structure of speech. |
title_full |
Fine-grained statistical structure of speech. |
title_fullStr |
Fine-grained statistical structure of speech. |
title_full_unstemmed |
Fine-grained statistical structure of speech. |
title_sort |
fine-grained statistical structure of speech. |
publisher |
Public Library of Science (PLoS) |
series |
PLoS ONE |
issn |
1932-6203 |
publishDate |
2020-01-01 |
description |
In spite of its acoustic diversity, the speech signal presents statistical regularities that can be exploited by biological or artificial systems for efficient coding. Independent Component Analysis (ICA) revealed that on small time scales (∼ 10 ms), the overall structure of speech is well captured by a time-frequency representation whose frequency selectivity follows the same power law in the high frequency range 1-8 kHz as cochlear frequency selectivity in mammals. Variations in the power-law exponent, i.e. different time-frequency trade-offs, have been shown to provide additional adaptation to phonetic categories. Here, we adopt a parametric approach to investigate the variations of the exponent at a finer level of speech. The estimation procedure is based on a measure that reflects the sparsity of decompositions in a set of Gabor dictionaries whose atoms are Gaussian-modulated sinusoids. We examine the variations of the exponent associated with the best decomposition, first at the level of phonemes, then at an intra-phonemic level. We show that this analysis offers a rich interpretation of the fine-grained statistical structure of speech, and that the exponent values can be related to key acoustic properties. Two main results are: i) for plosives, the exponent is lowered by the release bursts, concealing higher values during the opening phases; ii) for vowels, the exponent is bound to formant bandwidths and decreases with the degree of acoustic radiation at the lips. This work further suggests that an efficient coding strategy is to reduce frequency selectivity with sound intensity level, congruent with the nonlinear behavior of cochlear filtering. |
url |
https://doi.org/10.1371/journal.pone.0230233 |
work_keys_str_mv |
AT francoisdeloche finegrainedstatisticalstructureofspeech |
_version_ |
1714816113620025344 |