Data Mining for Induction of Adjacency Grammars and Application to Terrain Pattern Recognition

Bibliographic Details
Main Author: Leighty, Brian David
Format: Others
Published: NSUWorks 2009
Online Access: http://nsuworks.nova.edu/gscis_etd/212
               http://nsuworks.nova.edu/cgi/viewcontent.cgi?article=1211&context=gscis_etd
Description
Summary: Syntactic pattern recognition draws an analogy between the syntax of languages and the structure of spatial patterns. Recognition is achieved by parsing a given pattern to determine whether it is syntactically correct with respect to a defined grammar. Generating pattern grammars can be cumbersome when many objects are involved, which has led to the problem of spatial grammar inference. Current approaches, which use genetic algorithms and inductive techniques, have demonstrated limitations. Alternative approaches are needed that produce accurate grammars while remaining computationally efficient in light of the NP-hardness of the problem. Co-location rule mining techniques from the field of Knowledge Discovery and Data Mining address the complexity issue using neighborhood restrictions and pruning strategies based on monotonic measures of interest. The goal of this research was to develop and evaluate an inductive method for inferring an adjacency grammar that uses co-location rule mining techniques to gain efficiency while providing accurate and concise production sets. The method incrementally discovers, without supervision, adjacency patterns in spatial samples, relabels them via a production rule, and repeats the procedure with the newly labeled regions. The resulting rules are used to form an adjacency grammar. Grammars were generated and evaluated within the context of a syntactic pattern recognition system that identifies landform patterns in terrain elevation datasets. The proposed method was tested using k-fold cross-validation. Two variations were also tested using unsupervised and supervised training, both without rule pruning. Comparison of these variations with the proposed method demonstrated the effectiveness of rule pruning and rule discovery.
Results showed that the proposed method of rule inference produced rulesets with recall, precision, and accuracy values of 82.6%, 97.7%, and 92.8%, respectively, similar to those obtained with supervised training. These rulesets were also the smallest, had the lowest average number of rules fired in parsing, and had the shortest average parse time. Rule pruning roughly halved rule inference time (104.4 s vs. 208.9 s). The neighborhood restriction used in adjacency calculations demonstrated linear complexity in the number of regions.
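
The discover, relabel, and repeat loop described in the summary can be illustrated with a minimal Python sketch. This is not the thesis's algorithm: the function name `induce_adjacency_rules`, the region and edge representation, the nonterminal names `N1`, `N2`, ..., and the plain occurrence-count threshold `min_support` (standing in for the monotonic measures of interest) are all assumptions made for illustration.

```python
from collections import Counter
from itertools import count


def induce_adjacency_rules(labels, edges, min_support=2):
    """Hypothetical sketch: repeatedly relabel the most frequent adjacent label pair.

    labels: dict mapping region id -> label
    edges:  iterable of (region_a, region_b) adjacency pairs
    Returns a list of (new_label, (label_a, label_b)) production rules.
    """
    labels = dict(labels)
    edges = {frozenset(e) for e in edges}
    rules = []
    fresh = count(1)  # generator of new nonterminal indices (assumed naming)

    while True:
        # Count how often each unordered label pair occurs on an adjacency edge.
        pair_counts = Counter(tuple(sorted(labels[r] for r in e)) for e in edges)
        # Pruning stand-in: keep only pairs that meet the support threshold.
        frequent = [(c, p) for p, c in pair_counts.items() if c >= min_support]
        if not frequent:
            break
        # Most frequent pair wins; ties are broken by label order.
        _, best = min(frequent, key=lambda cp: (-cp[0], cp[1]))
        new_label = f"N{next(fresh)}"
        rules.append((new_label, best))

        # Greedily merge non-overlapping adjacent regions that match the pair.
        used, merged_into = set(), {}
        for e in sorted(edges, key=sorted):
            a, b = sorted(e)
            if a in used or b in used:
                continue
            if tuple(sorted((labels[a], labels[b]))) != best:
                continue
            used.update((a, b))
            labels[a] = new_label  # region a now represents the merged region
            del labels[b]
            merged_into[b] = a

        # Rewire adjacency onto the merged regions and drop self-loops.
        rewired = set()
        for e in edges:
            pair = frozenset(merged_into.get(r, r) for r in e)
            if len(pair) == 2:
                rewired.add(pair)
        edges = rewired

    return rules
```

For example, calling `induce_adjacency_rules({1: "a", 2: "b", 3: "a", 4: "b"}, [(1, 2), (3, 4), (2, 3)])` yields a single rule that rewrites the adjacent pair `("a", "b")` as `N1`. Lowering `min_support` to 1 lets a second pass combine the two newly labeled regions.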