Summary: | Doctoral === National Taiwan University === Graduate Institute of Electrical Engineering === 89 === “Spoken dialogue” usually refers to communication among people through spoken natural language. The eventual goal of spoken dialogue systems is, of course, to make computers capable of communicating with users in such a human-like way. However, natural language is the major tool human beings use to produce, express and deliver knowledge about everything in the world, and it is extremely difficult for computers to analyze. There is thus no doubt that spoken dialogue systems with natural language capabilities imitating human behavior are very difficult to achieve. To reduce the complexity of analysis, the process of spoken dialogue in such systems is conventionally separated into five parts: speech recognition, language understanding (or speech understanding, combining these two parts), dialogue management, sentence generation and text-to-speech synthesis (or voice response generation, combining these two parts).
Though substantial progress has been made in speech recognition technologies in recent years, recognition errors remain inevitable, which naturally degrades the accuracy and reliability of spoken dialogue systems. Furthermore, because such systems necessarily include many components, such as language modeling, parsing, semantic processing and dialogue control, recognition errors make performance and error analysis even more difficult.
On the other hand, as spoken dialogue technologies advance, people also expect more functionality from these systems, such as the capability of switching among multiple topics within one dialogue and giving users more freedom and initiative. However, this not only requires much more sophisticated and challenging dialogue control; the increased number of topics, domains and modalities may also make system design very difficult with conventional dialogue modeling schemes and system architectures. Portable dialogue modeling schemes and extensible system architectures capable of handling sophisticated dialogue behavior, including conversation across multiple topics and domains with multiple modalities, are therefore highly desirable.
In this dissertation, some new technologies are developed for dealing with the problems mentioned above. The main results and contributions are listed below.
1. A robust and flexible speech understanding approach
This approach is in fact a tag-graph search algorithm. With this approach, the knowledge from acoustic recognition, language model, and grammar rules can be successfully integrated, and the understanding errors can be reduced.
2. A dialogue modeling scheme capable of handling multiple topics and domains
This is a plan-based dialogue modeling scheme based on an expert system model. By modularizing the dialogue manager, the domain-independent control functions are separated from the domain-dependent data and can therefore be reused across topics and domains. Detailed approaches to issues such as initiative taking, automatic popping of suspended topics, and knowledge consistency are also discussed.
3. A distributed agent architecture with high extensibility
This architecture uses the concepts of distributed systems and intelligent agents to handle the issue of extensibility for multi-domain spoken dialogue systems. In this architecture, a spoken dialogue system is partitioned into a user interface agent, multiple dialogue agents, and the dialogue state and history shared by all dialogue agents. The dialogue agents can cooperate with one another to achieve the user’s multiple goals across different domains. Knowledge consistency is maintained through the common dialogue state and history.
4. Design and analysis for spoken dialogue systems based on quantitative simulations
A series of design and analysis methodologies for spoken dialogue systems based on quantitative simulations is proposed. With this approach, the analysis and improvement of a spoken dialogue system can be performed before a prototype is completed. All factors in the dialogue, such as the system’s prompt strategy, the user’s response pattern, recognition and understanding errors, and the system’s update strategy, can be controlled individually and precisely in the simulation. In this way, spoken dialogue systems can be designed and analyzed with an engineering approach rather than by relying on human experience.
5. Error analysis for word-graph based speech understanding systems
This analysis approach is based on the “reference path” obtained from a target-given graph search. With this analysis, the different types and sources of errors and the right directions for improvement can be identified more precisely, making error analysis and system improvement much more efficient.
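The idea behind contribution 4, quantitative simulation with individually controlled factors, can be illustrated with a minimal sketch. It is not the dissertation's simulator: the slot-filling task, the names `SimConfig`, `simulate_dialogue` and `evaluate`, and the assumption that yes/no confirmations are always understood correctly are all hypothetical choices made only for illustration.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class SimConfig:
    """Hypothetical simulation factors, each adjustable on its own."""
    n_slots: int = 3           # slots the system must fill (prompt strategy: one per turn)
    error_rate: float = 0.2    # chance that a slot value is misunderstood
    confirm: bool = True       # update strategy: explicitly confirm every value
    max_turns: int = 50        # stop issuing new prompts once this many turns are used


def simulate_dialogue(cfg: SimConfig, rng: random.Random) -> Tuple[int, bool]:
    """Run one simulated dialogue and return (system turns used, task success)."""
    # None = slot not filled, True/False = filled with a correct/incorrect value.
    filled: List[Optional[bool]] = [None] * cfg.n_slots
    turns = 0
    slot = 0
    while slot < cfg.n_slots and turns < cfg.max_turns:
        turns += 1                                    # system prompts for the slot
        understood_ok = rng.random() >= cfg.error_rate
        if cfg.confirm:
            turns += 1                                # system asks for confirmation
            if not understood_ok:
                continue                              # user rejects; prompt again
        filled[slot] = understood_ok                  # accept the understood value
        slot += 1
    return turns, all(f is True for f in filled)


def evaluate(cfg: SimConfig, n_dialogues: int = 2000, seed: int = 0) -> Tuple[float, float]:
    """Return (mean turns per dialogue, task success rate) over many dialogues."""
    rng = random.Random(seed)
    results = [simulate_dialogue(cfg, rng) for _ in range(n_dialogues)]
    mean_turns = sum(t for t, _ in results) / n_dialogues
    success_rate = sum(ok for _, ok in results) / n_dialogues
    return mean_turns, success_rate


if __name__ == "__main__":
    for confirm in (False, True):
        turns, success = evaluate(SimConfig(error_rate=0.2, confirm=confirm))
        print(f"confirm={confirm}: mean turns={turns:.2f}, success rate={success:.2f}")
```

Changing only `error_rate` or only `confirm` shows how one factor affects dialogue length and task success while the others stay fixed, which is the kind of controlled comparison the abstract describes as possible before any prototype exists.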
|