Summary: | The work described in this thesis aims to develop automatic speech recognition (ASR) efficiently for languages in which no such technology currently exists. The focus is on minority languages such as Welsh. An overview of the challenges of ASR development in lesser-spoken languages is presented. The specification of a 2000-speaker database for Welsh is described, with special reference to a novel lexicon-searching process for Celtic minority languages. The collection of this database is also detailed, along with an analysis of the pitfalls facing those who collect minority-language speech resources for ASR, and ways to overcome them. ASR experiments are carried out on a small subset of the Welsh database (no more than 350 male speakers, each uttering one isolated digit). This simulates a worst-case scenario in which a language has only limited funds for database collection. It is found that for Welsh, as well as for English and German, no more than 100-125 training speakers are required to reach a point beyond which the improvement in recognition performance is logarithmic. To reduce the number of training speakers needed to reach this point, a model combination method similar to that of Yoshizawa et al. (Eurospeech 2001) is investigated. Model combination is achieved by creating a phonetic map between Welsh and German, in a manner similar to that proposed by Dalsgaard et al. (Eurospeech 1991). Use of the model combination method lowers the point at which logarithmic improvement begins from 100-125 to 75-100 training speakers.
|