Effect of pronunciations on OOV queries in spoken term detection

The spoken term detection (STD) task aims to return relevant segments from a spoken archive that contain the query terms whether or not they are in the system vocabulary. This paper focuses on pronunciation modeling for out-of-vocabulary (OOV) terms which frequently occur in STD queries. The STD sys...

Bibliographic Details
Main Authors: Can, Dogan (Author), Cooper, Erica (Contributor), Sethy, Abhinav (Author), White, Chris (Author), Ramabhadran, Bhuvana (Author), Saraclar, Murat (Author)
Format: Article
Language: English
Published: Institute of Electrical and Electronics Engineers, 2010-10-07T15:33:00Z.
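Subjects: Weighted Finite State Transducers; Spoken Term Detection; Speech Recognition; Speech Indexing and Retrieval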
Online Access: Get fulltext
LEADER 01992 am a22003013u 4500
001 58936
042 |a dc 
100 1 0 |a Can, Dogan  |e author 
100 1 0 |a Cooper, Erica  |e contributor 
700 1 0 |a Cooper, Erica  |e author 
700 1 0 |a Sethy, Abhinav  |e author 
700 1 0 |a White, Chris  |e author 
700 1 0 |a Ramabhadran, Bhuvana  |e author 
700 1 0 |a Saraclar, Murat  |e author 
245 0 0 |a Effect of pronunciations on OOV queries in spoken term detection 
260 |b Institute of Electrical and Electronics Engineers,   |c 2010-10-07T15:33:00Z. 
856 |z Get fulltext  |u http://hdl.handle.net/1721.1/58936 
520 |a The spoken term detection (STD) task aims to return relevant segments from a spoken archive that contain the query terms whether or not they are in the system vocabulary. This paper focuses on pronunciation modeling for out-of-vocabulary (OOV) terms which frequently occur in STD queries. The STD system described in this paper indexes word-level and sub-word-level lattices or confusion networks produced by an LVCSR system using weighted finite state transducers (WFST). We investigate the inclusion of n-best pronunciation variants for OOV terms (obtained from letter-to-sound rules) into the search and present the results obtained by indexing confusion networks as well as lattices. The following observations are worth mentioning: phone indexes generated from sub-words represent OOVs well, and too many variants for the OOV terms degrade performance if pronunciations are not weighted. 
520 |a Bogazici University Research Fund 
520 |a Scientific and Technical Research Council of Turkey (TUBITAK) (BIDEB) 
546 |a en_US 
690 |a Weighted Finite State Transducers 
690 |a Spoken Term Detection 
690 |a Speech Recognition 
690 |a Speech Indexing and Retrieval 
655 7 |a Article 
773 |t Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2009