A Study in Speaker Dependent Medium Vocabulary Word Recognition: Application to Human/Computer Interface

Human interfaces to computers continue to be an active area of research. The keyboard is considered the basic interface for editing control as well as text input. Problems of correct typing and typing speed have urged research for alternative means for keyboard replacement, or at least "resizin...


Bibliographic Details
Main Author:	Abdallah, Moatassem Mahmoud
Other Authors:	Electrical and Computer Engineering
Format:	Others
Language:	en_US
Published:	Virginia Tech 2017
Subjects:	Modular Neural Network; Speech Processing; Human/Computer interface
Online Access:	http://hdl.handle.net/10919/77997 http://scholar.lib.vt.edu/theses/available/etd-02032000-08530023/

id	ndltd-VTETD-oai-vtechworks.lib.vt.edu-10919-77997
record_format	oai_dc
spelling	ndltd-VTETD-oai-vtechworks.lib.vt.edu-10919-77997 2020-09-29T05:30:44Z A Study in Speaker Dependent Medium Vocabulary Word Recognition: Application to Human/Computer Interface; Abdallah, Moatassem Mahmoud; Electrical and Computer Engineering; VanLandingham, Hugh F.; Abbott, A. Lynn; Roach, John W.; Moose, Richard L.; Riad, Sedki Mohamed; Modular Neural Network; Speech Processing; Human/Computer interface; Ph. D.; 2017-06-09T18:30:46Z 2017-06-09T18:30:46Z 2000-01-27 2000-02-03 2006-10-12 2000-02-05; Dissertation; Text; etd-02032000-08530023; http://hdl.handle.net/10919/77997 http://scholar.lib.vt.edu/theses/available/etd-02032000-08530023/; en_US; In Copyright http://rightsstatements.org/vocab/InC/1.0/; application/pdf application/pdf; Virginia Tech
collection	NDLTD
language	en_US
format	Others
sources	NDLTD
topic	Modular Neural Network; Speech Processing; Human/Computer interface
description	Human interfaces to computers continue to be an active area of research. The keyboard is considered the basic interface for editing control as well as text input. Problems of correct typing and typing speed have urged research for alternative means for keyboard replacement, or at least "resizing" its monopoly. Pointing devices (e.g. a mouse) have been developed, and supporting software with icons is now widely used. Two other means are being developed and operationally tested, namely, the pen for handwriting text, commands and drawings, and spoken language interface, which is the subject of this thesis. Human/computer interface is an interactive man-machine communication facility that enjoys the following advantages.
- High input speed: some experiments reveal that the rate of information input by speech is three times faster than keyboard input and eight times faster than inputting characters by hand.
- No training needed: because the generation of speech is a very natural human action, it requires no special training.
- Parallel processing with other information: production of speech works quite well in conjunction with gestures of hands and feet for visual perception of information.
- Simple and economical input sensor: microphones are inexpensive and are readily available.
- Coping with handicaps: these interfaces can be used in unusual circumstances of darkness, blindness, or other visual handicap.
This dissertation presents a design of a Human Computer Interface (HCI) system that can be trained to work with an individual speaker. A new approach is introduced to extract key voice features, called Median Linear Predictive Coding (MLPC). MLPC reduces the HCI calculation time and gives an improved recognition rate. This design eliminated the typical Multi-layer Perceptron (MLP) problems of complexity growth with vocabulary size, the large training times required and the need for complete re-training whenever the vocabulary is extended. A novel modular neural network architecture, called a Pyramidal Modular Neural Network (PMNN), is introduced for recursive speech identification. In addition, many other system algorithms/components, such as speech endpoint detection, automatic noise thresholding, etc., must be tailored correctly in order to achieve high recognition accuracy.
Ph. D.
author2	Electrical and Computer Engineering
author_facet	Electrical and Computer Engineering Abdallah, Moatassem Mahmoud
author	Abdallah, Moatassem Mahmoud
author_sort	Abdallah, Moatassem Mahmoud
title	A Study in Speaker Dependent Medium Vocabulary Word Recognition: Application to Human/Computer Interface
title_sort	study in speaker dependent medium vocabulary word recognition: application to human/computer interface
publisher	Virginia Tech
publishDate	2017
url	http://hdl.handle.net/10919/77997 http://scholar.lib.vt.edu/theses/available/etd-02032000-08530023/

Similar Items