Summary: | Master's === National Chung Cheng University === Graduate Institute of Electrical Engineering === 95 === In reinforcement learning, it is hard to define a state space or a proper reward function that makes a robot act as expected. In this paper, we instead demonstrate the expected behavior to the robot. An RL-based decision tree approach, which decides where to split according to long-term evaluations rather than a top-down greedy strategy, then finds the relationship between input and output from the demonstration data.
We use this method to teach a robot to solve a target-seeking problem. To improve performance on this problem, we add Q-learning on top of the state space built by the RL-based decision tree. The experimental results show that Q-learning improves performance quickly.
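The abstract does not state the update rule or the state encoding, so the following is only a minimal sketch of the idea. It assumes the standard tabular Q-learning update and treats each leaf of the decision tree as one discrete state. The function `tree_state`, its thresholds, the `ACTIONS` list, and the values of `alpha`, `gamma`, and `epsilon` are all hypothetical and are not taken from the thesis.

```python
import random
from collections import defaultdict

# Hypothetical action set for a target-seeking robot (not from the thesis).
ACTIONS = ["left", "forward", "right"]


def tree_state(obs):
    """Hypothetical stand-in for a learned decision tree.

    Maps an observation (target angle in degrees, target distance)
    to a leaf id, which is used as the discrete state.
    """
    angle, distance = obs
    if angle < -10:
        return 0
    if angle > 10:
        return 1
    return 2 if distance > 1.0 else 3


def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Apply one standard tabular Q-learning update and return the new value."""
    best_next = max(Q[s_next][b] for b in ACTIONS)
    Q[s][a] += alpha * (r + gamma * best_next - Q[s][a])
    return Q[s][a]


def choose_action(Q, s, epsilon=0.1, rng=random):
    """Epsilon-greedy action selection over the Q-values of state s."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[s][a])


# Q-table indexed by leaf id, with every action value initialised to zero.
Q = defaultdict(lambda: {a: 0.0 for a in ACTIONS})
```

In this sketch the tree only supplies the state partition, and Q-learning refines the action values within that partition. One assumed way to carry over demonstrated behavior would be to initialise the Q-table from it instead of from zeros.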
For demonstration, we build a mobile robot powered by an embedded board. With an omni-directional vision system, the robot can detect a ball within range in any direction. With this embedded computing capability and efficient machine vision system, the robot can inherit the learned behavior from a simulator that has learned the empirical behavior, and then continue to learn with Q-learning to improve its performance on the target-seeking problem.
|