Fine-grained statistical structure of speech.

In spite of its acoustic diversity, the speech signal presents statistical regularities that can be exploited by biological or artificial systems for efficient coding. Independent Component Analysis (ICA) revealed that on small time scales (∼ 10 ms), the overall structure of speech is well captured...

Full description

Bibliographic Details
Main Author:	François Deloche
Format:	Article
Language:	English
Published:	Public Library of Science (PLoS) 2020-01-01
Series:	PLoS ONE
Online Access:	https://doi.org/10.1371/journal.pone.0230233

id	doaj-323192d9fc6f46a2beebe77bbbdca382
record_format	Article
spelling	doaj-323192d9fc6f46a2beebe77bbbdca3822021-03-03T21:36:11ZengPublic Library of Science (PLoS)PLoS ONE1932-62032020-01-01153e023023310.1371/journal.pone.0230233Fine-grained statistical structure of speech.François DelocheIn spite of its acoustic diversity, the speech signal presents statistical regularities that can be exploited by biological or artificial systems for efficient coding. Independent Component Analysis (ICA) revealed that on small time scales (∼ 10 ms), the overall structure of speech is well captured by a time-frequency representation whose frequency selectivity follows the same power law in the high frequency range 1-8 kHz as cochlear frequency selectivity in mammals. Variations in the power-law exponent, i.e. different time-frequency trade-offs, have been shown to provide additional adaptation to phonetic categories. Here, we adopt a parametric approach to investigate the variations of the exponent at a finer level of speech. The estimation procedure is based on a measure that reflects the sparsity of decompositions in a set of Gabor dictionaries whose atoms are Gaussian-modulated sinusoids. We examine the variations of the exponent associated with the best decomposition, first at the level of phonemes, then at an intra-phonemic level. We show that this analysis offers a rich interpretation of the fine-grained statistical structure of speech, and that the exponent values can be related to key acoustic properties. Two main results are: i) for plosives, the exponent is lowered by the release bursts, concealing higher values during the opening phases; ii) for vowels, the exponent is bound to formant bandwidths and decreases with the degree of acoustic radiation at the lips. This work further suggests that an efficient coding strategy is to reduce frequency selectivity with sound intensity level, congruent with the nonlinear behavior of cochlear filtering.https://doi.org/10.1371/journal.pone.0230233
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	François Deloche
spellingShingle	François Deloche Fine-grained statistical structure of speech. PLoS ONE
author_facet	François Deloche
author_sort	François Deloche
title	Fine-grained statistical structure of speech.
title_short	Fine-grained statistical structure of speech.
title_full	Fine-grained statistical structure of speech.
title_fullStr	Fine-grained statistical structure of speech.
title_full_unstemmed	Fine-grained statistical structure of speech.
title_sort	fine-grained statistical structure of speech.
publisher	Public Library of Science (PLoS)
series	PLoS ONE
issn	1932-6203
publishDate	2020-01-01
description	In spite of its acoustic diversity, the speech signal presents statistical regularities that can be exploited by biological or artificial systems for efficient coding. Independent Component Analysis (ICA) revealed that on small time scales (∼ 10 ms), the overall structure of speech is well captured by a time-frequency representation whose frequency selectivity follows the same power law in the high frequency range 1-8 kHz as cochlear frequency selectivity in mammals. Variations in the power-law exponent, i.e. different time-frequency trade-offs, have been shown to provide additional adaptation to phonetic categories. Here, we adopt a parametric approach to investigate the variations of the exponent at a finer level of speech. The estimation procedure is based on a measure that reflects the sparsity of decompositions in a set of Gabor dictionaries whose atoms are Gaussian-modulated sinusoids. We examine the variations of the exponent associated with the best decomposition, first at the level of phonemes, then at an intra-phonemic level. We show that this analysis offers a rich interpretation of the fine-grained statistical structure of speech, and that the exponent values can be related to key acoustic properties. Two main results are: i) for plosives, the exponent is lowered by the release bursts, concealing higher values during the opening phases; ii) for vowels, the exponent is bound to formant bandwidths and decreases with the degree of acoustic radiation at the lips. This work further suggests that an efficient coding strategy is to reduce frequency selectivity with sound intensity level, congruent with the nonlinear behavior of cochlear filtering.
url	https://doi.org/10.1371/journal.pone.0230233
work_keys_str_mv	AT francoisdeloche finegrainedstatisticalstructureofspeech
_version_	1714816113620025344

Fine-grained statistical structure of speech.

Similar Items