A Self-Organizing Decision Tree Approach to Policy Sharing of Multi-agent Systems


Bibliographic Details
Main Authors: Yu-Jen Chen, 陳昱仁
Other Authors: Kao-Shing Hwang
Format: Others
Language: en_US
Published: 2009
Online Access: http://ndltd.ncl.edu.tw/handle/09344554392677792073
Description
Summary: Ph.D. === National Chung Cheng University === Institute of Electrical Engineering === 97 === In a cooperative social environment, a reinforcement learning agent not only learns to achieve its goal by trial and error, but also improves its learning efficiency through instantaneously shared information. The purpose of this thesis is to investigate how multiple agents share information, and what information is shared, in a real environment. When applying reinforcement learning to a real environment, state partitioning is an important open problem because it significantly affects learning performance. Q-learning is fundamentally a table-lookup method defined over a finite, discrete state space: the learner incrementally estimates the Q-value of a state from the rewards received from the environment and the previous Q-value estimates. Unfortunately, robots learn and act in a continuous perceptual space, where observed perceptions must be transformed into, or coarsely treated as, temporal-spatial states. There is still no unified way to combine discrete actions with continuous observations or states that is optimal in computation time, memory usage, and so on. How to accommodate continuous states with a finite, discrete set of actions has therefore become an important and intriguing issue in this research area. In this thesis, we propose an adaptive state partitioning method that discretizes the state space effectively using decision trees. Instead of an exhaustive search guided by a predefined impurity measure, the proposed method splits the state space according to the temporal-difference errors generated during reinforcement learning. Building on this approach, we also introduce an algorithm that extends the action policy from a discrete set to a real-valued domain: the method generates a real-valued action by selecting one action from the discrete set and disturbing it randomly, but slightly, by an associated bias. From the viewpoint of exploration and exploitation, the method searches for a better action around a paradigm action in the solution space, varying it within the biased region. Pursuing the applicability of the proposed methods to multi-agent systems, we define a policy-sharing mechanism that lets agents share the policies of local regions in which another agent has accumulated better experience.
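
The abstract gives no implementation details, but its two core algorithmic ideas (splitting a decision tree over the state space when the temporal-difference error stays large, and producing a real-valued action by slightly perturbing a selected discrete "paradigm" action) can be sketched roughly as follows. This is a minimal Python sketch under assumed names and parameters (TreeNode, split_threshold, window, and action_bias are illustrative, not taken from the thesis), not the thesis's actual algorithm.

# Minimal sketch, assuming illustrative names; not the thesis's exact method.
import random

class TreeNode:
    # A leaf stores Q-values for the discrete actions; an internal node
    # splits one state dimension at a threshold.
    def __init__(self, n_actions):
        self.q = [0.0] * n_actions      # one Q-value per discrete action
        self.td_errors = []             # recent TD errors observed in this region
        self.dim = None                 # split dimension (internal nodes only)
        self.threshold = None           # split threshold (internal nodes only)
        self.left = self.right = None

    def leaf_for(self, state):
        node = self
        while node.dim is not None:
            node = node.left if state[node.dim] < node.threshold else node.right
        return node

def q_update(root, state, action, reward, next_state,
             alpha=0.1, gamma=0.95, split_threshold=1.0, window=20):
    # One Q-learning step on the leaf covering `state`; the leaf is split
    # (instead of running an impurity-based search) when its TD error stays
    # large, i.e. the region is too coarse to represent the value function.
    leaf = root.leaf_for(state)
    next_leaf = root.leaf_for(next_state)
    td = reward + gamma * max(next_leaf.q) - leaf.q[action]
    leaf.q[action] += alpha * td
    leaf.td_errors.append(td)
    recent = leaf.td_errors[-window:]
    if len(recent) == window and abs(sum(recent) / window) > split_threshold:
        leaf.dim = max(range(len(state)), key=lambda d: abs(state[d]))  # crude pick
        leaf.threshold = state[leaf.dim]
        leaf.left, leaf.right = TreeNode(len(leaf.q)), TreeNode(len(leaf.q))
        leaf.left.q, leaf.right.q = list(leaf.q), list(leaf.q)

def real_valued_action(paradigm_actions, q_values, action_bias=0.1):
    # Select the best discrete (paradigm) action, then disturb it slightly
    # within a biased region to explore nearby real-valued actions.
    best = max(range(len(q_values)), key=lambda a: q_values[a])
    return paradigm_actions[best] + random.uniform(-action_bias, action_bias)

In the same spirit, the policy-sharing mechanism mentioned at the end of the abstract could be read as agents copying the Q-values of leaves (local regions) in which a peer has accumulated better experience; the precise sharing rule is defined in the thesis itself.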