Summary: | The ability to learn from experience is a key aspect of intelligence, and incorporating this ability into a computer is a formidable problem. Genetic algorithms coupled with learning classifier systems are powerful tools for tackling this task. While genetic algorithms can be shown to be near-optimal for the search task they perform, no similar proof exists for classifier systems. My research investigated two aspects of classifier systems: classifier selection and credit assignment. Explicit world models, lookahead, and incremental planning were incorporated into the classifier system framework in order to exploit more of the information available to the system, and a more sophisticated approach to credit assignment was attempted. The investigation involved constructing four different classifier systems and testing each of them in three separate virtual worlds. Wilson's Animat research was carefully reconstructed and used as the control in a scientific experiment testing the efficacy of the strategies embodied in three experimental systems. All three experimental classifier systems contained explicit world models and lookahead. One was an extension of Wilson's Animat; the other two employed an entirely new credit assignment scheme inspired by Watkins's Q-learning technique. This technique enabled the incorporation of an incremental planner, similar to Sutton's Dyna-Q research, into one of the classifier systems, distinguishing it from the other Q-learning-based classifier system. The research shows that the use of explicit world models and lookahead significantly decreases the time required to discover paths to well-rewarded goals. It also shows that incremental planning can further increase learning speed. While the experimental classifier systems were quick at discovery, they did not necessarily exploit these discoveries.
Because of this, the performance of these systems varied across virtual worlds, and the experimental systems were outperformed by the control in two of the three virtual worlds investigated. Nevertheless, the value of explicit world models and lookahead within classifier systems is established. Likewise, the adaptation of Q-learning and incremental planning to classifier systems has been successfully demonstrated.
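For readers unfamiliar with the credit assignment scheme mentioned above, the following is a minimal sketch of the standard Watkins Q-learning update that inspired it. The state and action names, reward value, and the parameters ALPHA and GAMMA are illustrative assumptions, not values taken from this research; the thesis adapts the update to classifier strengths rather than a tabular Q function.

```python
ALPHA = 0.1   # learning rate (assumed value for illustration)
GAMMA = 0.9   # discount factor (assumed value for illustration)

def q_update(q, state, action, reward, next_state, actions):
    """One Watkins-style Q-learning step:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + ALPHA * (reward + GAMMA * best_next - old)

# Usage: a reward of 1.0 is received on moving from 's0' to 's1'.
q = {}
q_update(q, 's0', 'east', 1.0, 's1', ['east', 'west'])
print(q[('s0', 'east')])  # 0.1 after one update: 0 + 0.1 * (1.0 + 0.9*0 - 0)
```

A Dyna-style incremental planner, as used in one of the experimental systems, would additionally replay such updates against a learned world model between real-world steps, which is the source of the learning speed-up reported above.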
|