Summary: | Classical Q-learning requires heavy computation to calculate the Q-values of all possible actions in a given state and large storage to hold them, so it converges slowly. This paper proposes a new methodology for determining optimized trajectories for multiple robots in a cluttered environment. It hybridizes an improved classical Q-learning, based on four fundamental principles, with an improved particle swarm optimization (IPSO) that uses modified parameters and a differentially perturbed velocity (DV) algorithm to improve convergence. The algorithms minimize the path length and arrival time of all robots at their respective destinations and reduce each robot's turning angle to lower its energy consumption. In the proposed scheme, the improved Q-learning stores only the Q-value of the best action in each state, which saves storage space; the stored Q-value is used to decide the pbest and gbest of the IPSO in each iteration, and the IPSO velocity is adjusted by the vector differential operator inherited from differential evolution (DE). The algorithm is validated in simulation and on a Khepera-II robot.
|
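
The summary says the IPSO velocity is adjusted by a vector differential operator borrowed from DE, but it gives no update rule. The following is a minimal Python sketch of one common formulation of a differentially perturbed velocity update, not necessarily the one used in the paper. The function names (`dv_velocity_update`, `greedy_move`), the parameter values (`w`, `beta`, `c2`, `cr`), and the greedy acceptance step are all assumptions made for illustration.

```python
import random


def dv_velocity_update(positions, velocities, i, gbest,
                       w=0.7, beta=0.8, c2=1.5, cr=0.9, rng=random):
    """Return a new velocity for particle i (hypothetical helper).

    Assumed rule: the cognitive (pbest) term of standard PSO is replaced
    by a scaled difference of two other randomly chosen particles, applied
    per dimension with crossover probability cr.
    """
    n = len(positions)
    # Pick two distinct particles, both different from i.
    j, k = rng.sample([p for p in range(n) if p != i], 2)
    new_v = []
    for d in range(len(positions[i])):
        if rng.random() <= cr:
            delta = positions[k][d] - positions[j][d]  # DE-style difference
            social = c2 * rng.random() * (gbest[d] - positions[i][d])
            new_v.append(w * velocities[i][d] + beta * delta + social)
        else:
            new_v.append(velocities[i][d])  # dimension left unperturbed
    return new_v


def greedy_move(position, velocity, fitness):
    """Accept the trial position only if it lowers the fitness (assumed)."""
    trial = [x + v for x, v in zip(position, velocity)]
    return trial if fitness(trial) < fitness(position) else position
```

In a path-planning setting, `fitness` would combine path length, arrival time, and turning angle; how the stored Q-values feed into pbest and gbest is specific to the paper and is not modeled here.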