Summary: | Human listeners possess good speaker recognition abilities and are capable of discriminating and identifying speakers from a range of spoken utterances. However, voice recognition can be enhanced when a listener is capable of understanding the speech produced by a talker. A well-established demonstration of this is known as the “Language-Familiarity” Effect (LFE) for voice recognition. This effect manifests as an impairment in voice recognition in foreign-language speech conditions, as contrasted with recognition of talkers who are speaking in a listener’s mother tongue, and has been repeatedly demonstrated across a range of different tasks and languages. The LFE has previously been conceptualized as an analogue to the even better-known “Other-Race” Effect (ORE) for face recognition, where own-race faces are better remembered than other-race faces. An influential theoretical model of the ORE posits that faces are represented in a multidimensional “face-space”, whose dimensions are shaped by perceptual experience and code for features which are diagnostic for face individuation (Valentine, 1991). Over the course of an individual’s perceptual experience, these dimensions might become attuned for own-race face recognition; as a consequence, the dimensions will be sub-optimal for other-race recognition, leading to the illusion of increased similarity among different other-race faces, relative to own-race faces, which has been termed the “they-all-look-alike” effect. The idea of a complementary “voice-space” has already been posited in the auditory domain, and might serve as a useful model for the LFE. Speakers might be individuated on the basis of diagnostic dimensions that code for important voice-acoustical attributes.
However, these dimensions might also be shaped according to linguistic experience, and voice individuation (and recognition) might be optimised when listeners can take advantage of both general voice acoustics and stored representations of their native language to tell speakers apart. The face-space hypothesis represents a plausible model for the ORE, and evidence for it has accrued through computational modelling and neuroimaging work. By contrast, its voice-space counterpart currently serves only as a descriptive model for the LFE. In this thesis, I combine behavioural testing and neuroimaging studies using functional Magnetic Resonance Imaging (fMRI) to probe the nature of the representations of native and foreign speakers. Chapter 1 provides a general overview of voice processing with an emphasis on voice recognition. Subsequently, I provide a review of relevant literature pertaining to the LFE, and introduce a brief comparison to the ORE for faces in the context of the Valentine (1991) similarity model, ending with a description of the aims of the thesis. In Chapter 2, I present the results of a behavioural experiment where native English- and Mandarin-speaking listeners rated the perceived dissimilarity of all pairwise combinations of a series of English- and Mandarin-speaking voices. Crucially, the LFE does not appear to be dependent on full comprehension of the linguistic message, as young infants can better tell apart speakers in their native language than in a foreign language before their speech comprehension abilities are fully mature. This suggests that exposure to the sound-structure characteristic of infants’ nascent mother tongue might be sufficient to enhance native language speaker discrimination, in the absence of full comprehension. Therefore, to examine a counterpart in adults, speech stimuli were subjected to time-reversal, a process which precludes lexical and semantic access but which leaves intact certain phonemic properties of the original speech signal.
Both the English and Mandarin listeners rated pairs of native-language voices as sounding more dissimilar than foreign voices, suggesting that the language-specific sound-structure elements remaining in the reversed speech enabled an enhanced individuation of native voices. Next, in Chapter 3, I aimed to probe the neural basis of this enhanced individuation in an fMRI experiment which was intended to capture dissimilarities among paired cerebral responses to unintelligible native and foreign speakers. Here, I did not find a direct correlate of the behavioural effect, but did find that local patterns of response estimates in the bilateral superior temporal cortex (STC) appear to “discriminate” the different language categories in both English and Mandarin listeners. Specifically, when the pairwise dissimilarity in brain responses to different speakers was computed, relatively high dissimilarity was observed for pairs consisting of a response to an English speaker and a Mandarin speaker, whereas relatively low dissimilarity was observed for pairs consisting of two English or two Mandarin speakers. In Chapter 4, I report what is, to my knowledge, the first explicit examination of the neural basis for the LFE in intelligible speech. A monolingual sample of English speakers participated in an fMRI experiment where they listened to the voices of English and Mandarin speakers. Importantly, speech stimuli in both language conditions were matched in inter-speaker acoustical variability. Combined response patterns from bilateral voice-sensitive temporal lobe regions enabled a learning algorithm to decode the identities of the voices that elicited the responses, but, crucially, only in the native speech (English) condition. Interestingly, native-language speaker decoding was also achieved from a left-hemisphere voice-sensitive region alone, but not from a right-hemisphere region.
This putative leftward bias might reflect a higher discriminability of native-language talkers in the brain, via an enhanced ability to individuate voices on the basis of indexical variation around stored speech-sound representations. Finally, in Chapter 5, I conclude with a general discussion of the foregoing results, their implications for an analogous conception of the LFE and ORE, and some strands of thought for future investigation.
|