An information theoretic approach to natural language processing.

A new method of natural language processing, based on the theory of information is described. Parsing of a sentence is accomplished not in a sequential manner, but in a fashion that begins by searching for the main verb of the sentence, then for the object, subject and perhaps for a prepositional ph...

Full description

Bibliographic Details
Main Author: Grubbs, Elmer Andrew.
Other Authors: Schooley, Larry C.
Language:en
Published: The University of Arizona. 1994
Online Access:http://hdl.handle.net/10150/186886
Description
Summary:A new method of natural language processing, based on the theory of information is described. Parsing of a sentence is accomplished not in a sequential manner, but in a fashion that begins by searching for the main verb of the sentence, then for the object, subject and perhaps for a prepositional phrase. As each new part of speech is located, the uncertainty of the sentence's meaning is reduced. When the uncertainty reaches zero, the parsing is complete, and the machine performs the task assigned by the input sentence. The process is modeled by a Markov Chain, which can often be used for the internal representation of the sentence. All of this work is done for communication with an intelligent task oriented machine, but the theoretical basis for extending this to other, more complicated domains is also described. A description of a methodology for extending the theory, so that it can be used for the implementation of a machine that learns is also described in this paper. By using belief networks, the machine constructs additions to its basic Markov Chain in order to handle new verbs and objects, which were not included in the original programming. Once implemented, the system will then treat the new word as if it had originally been programmed into the machine. Finally, several prototypes are described which have been written to validate the theory presented. The information theoretic system contained herein is compared to other techniques of natural language processing, and shown to have significant advantages.