Summary: | Robotic grasp detection is a fundamental problem in robotic manipulation. Conventional grasp methods, which rely on vision information alone, can cause damage in force-sensitive tasks. In this paper, we propose a tactile–visual method that uses a reproducible sensor to realize fine-grained, haptic grasping. Although several tactile-based methods exist, they require expensive custom sensors coupled with their specific datasets. To overcome these limitations, we introduce a low-cost, reproducible tactile fingertip and build a general tactile–visual fusion grasp dataset containing 5,110 grasping trials. We further propose a hierarchical encoder–decoder neural network that predicts grasp points and grasp force in an end-to-end manner. We compare our method with state-of-the-art methods on the benchmark under both vision-based and tactile–visual fusion schemes, and it outperforms them in most scenarios. Furthermore, we compare our fusion method with the vision-only method in physical experiments, and the results indicate that our end-to-end method gives the robot a more fine-grained grasping ability, reducing force redundancy by 41%. Our project is available at https://sites.google.com/view/tvgd