Computational neuroscience of speech recognition

The physical variability of speech combined with its perceptual constancy makes speech recognition a challenging task. The human auditory brain, however, performs speech recognition effortlessly. This thesis aims to understand the precise computational mechanisms that allow the auditory brain t...

Full description

Bibliographic Details
Main Author: Higgins, Irina
Other Authors: Stringer, Simon ; Schnupp, Jan
Published: University of Oxford 2015
Subjects:
Online Access:https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.714056
id ndltd-bl.uk-oai-ethos.bl.uk-714056
record_format oai_dc
spelling ndltd-bl.uk-oai-ethos.bl.uk-714056 2018-09-05T03:34:53Z
Computational neuroscience of speech recognition
Higgins, Irina
Stringer, Simon ; Schnupp, Jan
2015
The physical variability of speech combined with its perceptual constancy makes speech recognition a challenging task. The human auditory brain, however, performs speech recognition effortlessly. This thesis aims to understand the precise computational mechanisms that allow the auditory brain to do so. In particular, we look for the minimal subset of sub-cortical auditory brain areas that allow the primary auditory cortex to learn 'good representations' of speech-like auditory objects through spike-timing-dependent plasticity (STDP) learning mechanisms as described by Bi & Poo (1998). A 'good representation' is defined as one that is informative of the stimulus class regardless of the variability in the raw input, while being less redundant and more compressed than the representations within the auditory nerve, which provides the firing inputs to the rest of the auditory brain hierarchy (Barlow 1961). Neurophysiological studies have provided insights into the architecture and response properties of different areas within the auditory brain hierarchy. We use these insights to guide the development of an unsupervised spiking neural network grounded in the neurophysiology of the auditory brain and equipped with STDP learning (Bi & Poo 1998). The model was exposed to simple controlled speech-like stimuli (artificially synthesised phonemes and naturally spoken words) to investigate how stable representations that are invariant to within- and between-speaker differences can emerge in the output area of the model, which is roughly equivalent to the primary auditory cortex.
The first part of the thesis investigated the minimal taxonomy necessary for such representations to emerge through the interactions of the spiking dynamics of the network neurons, their ability to learn through STDP, and the statistics of the auditory input stimuli. It was found that sub-cortical pre-processing within the ventral cochlear nucleus and inferior colliculus was necessary to remove jitter inherent to the auditory nerve spike rasters, which would otherwise disrupt STDP learning in the primary auditory cortex. The second half of the thesis investigated the nature of the neural encoding used within the primary auditory cortex stage of the model to represent the learnt auditory object categories. It was found that single-cell binary encoding (DeWeese & Zador 2003) was sufficient to represent two synthesised vowel classes; however, more complex population encoding using precisely timed spikes within polychronous chains (Izhikevich 2006) represented the more complex naturally spoken words in a speaker-invariant manner.
612.8
University of Oxford
https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.714056
https://ora.ox.ac.uk/objects/uuid:daa8d096-6534-4174-b63e-cc4161291c90
Electronic Thesis or Dissertation
collection NDLTD
sources NDLTD
topic 612.8
spellingShingle 612.8
Higgins, Irina
Computational neuroscience of speech recognition
description The physical variability of speech combined with its perceptual constancy makes speech recognition a challenging task. The human auditory brain, however, performs speech recognition effortlessly. This thesis aims to understand the precise computational mechanisms that allow the auditory brain to do so. In particular, we look for the minimal subset of sub-cortical auditory brain areas that allow the primary auditory cortex to learn 'good representations' of speech-like auditory objects through spike-timing-dependent plasticity (STDP) learning mechanisms as described by Bi & Poo (1998). A 'good representation' is defined as one that is informative of the stimulus class regardless of the variability in the raw input, while being less redundant and more compressed than the representations within the auditory nerve, which provides the firing inputs to the rest of the auditory brain hierarchy (Barlow 1961). Neurophysiological studies have provided insights into the architecture and response properties of different areas within the auditory brain hierarchy. We use these insights to guide the development of an unsupervised spiking neural network grounded in the neurophysiology of the auditory brain and equipped with STDP learning (Bi & Poo 1998). The model was exposed to simple controlled speech-like stimuli (artificially synthesised phonemes and naturally spoken words) to investigate how stable representations that are invariant to within- and between-speaker differences can emerge in the output area of the model, which is roughly equivalent to the primary auditory cortex.
The first part of the thesis investigated the minimal taxonomy necessary for such representations to emerge through the interactions of the spiking dynamics of the network neurons, their ability to learn through STDP, and the statistics of the auditory input stimuli. It was found that sub-cortical pre-processing within the ventral cochlear nucleus and inferior colliculus was necessary to remove jitter inherent to the auditory nerve spike rasters, which would otherwise disrupt STDP learning in the primary auditory cortex. The second half of the thesis investigated the nature of the neural encoding used within the primary auditory cortex stage of the model to represent the learnt auditory object categories. It was found that single-cell binary encoding (DeWeese & Zador 2003) was sufficient to represent two synthesised vowel classes; however, more complex population encoding using precisely timed spikes within polychronous chains (Izhikevich 2006) represented the more complex naturally spoken words in a speaker-invariant manner.
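The STDP rule referenced throughout the abstract (Bi & Poo 1998) pairs pre- and postsynaptic spike times: a presynaptic spike shortly before a postsynaptic one strengthens the synapse, the reverse ordering weakens it. A minimal sketch of that exponential learning window is given below; the amplitude and time-constant values are illustrative placeholders, not parameters taken from the thesis model.

```python
import numpy as np

def stdp_dw(dt, a_plus=0.01, a_minus=0.012, tau_plus=20.0, tau_minus=20.0):
    """Weight change for one spike pair, with dt = t_post - t_pre in ms.

    Exponential STDP window in the style of Bi & Poo (1998):
    potentiation when the presynaptic spike precedes the postsynaptic
    spike (dt > 0), depression for the opposite ordering (dt < 0).
    Parameter values are illustrative only.
    """
    if dt > 0:
        # pre before post -> long-term potentiation, decaying with |dt|
        return a_plus * np.exp(-dt / tau_plus)
    elif dt < 0:
        # post before pre -> long-term depression, decaying with |dt|
        return -a_minus * np.exp(dt / tau_minus)
    return 0.0
```

Because the window decays exponentially with the spike-time difference, millisecond-scale jitter in the input rasters scrambles the sign and size of the updates, which is consistent with the abstract's finding that sub-cortical jitter removal is needed before STDP can learn stable cortical representations.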
author2 Stringer, Simon ; Schnupp, Jan
author_facet Stringer, Simon ; Schnupp, Jan
Higgins, Irina
author Higgins, Irina
author_sort Higgins, Irina
title Computational neuroscience of speech recognition
title_short Computational neuroscience of speech recognition
title_full Computational neuroscience of speech recognition
title_fullStr Computational neuroscience of speech recognition
title_full_unstemmed Computational neuroscience of speech recognition
title_sort computational neuroscience of speech recognition
publisher University of Oxford
publishDate 2015
url https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.714056
work_keys_str_mv AT higginsirina computationalneuroscienceofspeechrecognition
_version_ 1718730568830550016