Summary: Object recognition, which mainly comprises object detection and semantic segmentation, is one of the critical challenges for intelligent vehicles. Cameras and Lidar are the sensors most commonly used for this task, yet each suffers from inherent drawbacks: cameras degrade under poor illumination, while Lidar point clouds are sparse and lack texture. Fusing camera and Lidar data is therefore a natural way to overcome the limitations of each single sensing modality. Driven by advances in deep learning, multi-sensor fusion methods now employ deep networks as their fusion strategy and have achieved impressive results on large-scale objects such as vehicles and buses. However, most existing fusion strategies discard fine-grained detail through the down-sampling operations of deep networks, which degrades detection performance on small-scale objects such as pedestrians and cyclists. In this paper, we propose Enet-CRF-Lidar, a real-time multi-sensor (Lidar and color camera) fusion strategy for multi-scale object recognition at the semantic level. First, a multi-module Enet is designed to handle both large-scale and small-scale objects. Then, a CRF-RNN module is integrated with the multi-module Enet to reintroduce the low-level details of the input data, yielding a significant improvement in small-scale object recognition. Experimental results show that the proposed Enet-CRF-Lidar network provides reliable detection performance on multi-scale objects and adapts well to complex scenarios.
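As a rough illustration of the pipeline described above (not the authors' implementation), the following is a minimal PyTorch sketch of the assumed data flow: early camera/Lidar fusion by channel concatenation, an ENet-style encoder-decoder, and a simplified CRF-RNN-style mean-field refinement. All module names, layer sizes, and the 4-channel input layout are illustrative assumptions; a real CRF-RNN additionally applies bilateral, image-dependent filtering, which is omitted here for brevity.

import torch
import torch.nn as nn
import torch.nn.functional as F

class EnetLikeSegNet(nn.Module):
    """Stand-in for the multi-module Enet backbone (hypothetical layers)."""
    def __init__(self, in_ch=4, num_classes=3):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, 16, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(16, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 16, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(16, num_classes, 4, stride=2, padding=1),
        )

    def forward(self, x):
        # Encoder down-samples (losing fine detail), decoder restores resolution.
        return self.decoder(self.encoder(x))

class CrfRnnLike(nn.Module):
    """Toy mean-field refinement: a few iterations of spatially smoothed
    message passing plus a learned 1x1 compatibility transform. A real
    CRF-RNN also uses bilateral (image-guided) filtering; `guidance` is
    accepted but unused in this simplified version."""
    def __init__(self, num_classes=3, iters=5):
        super().__init__()
        self.iters = iters
        self.compat = nn.Conv2d(num_classes, num_classes, 1, bias=False)

    def forward(self, unary, guidance=None):
        q = unary
        for _ in range(self.iters):
            probs = F.softmax(q, dim=1)
            smoothed = F.avg_pool2d(probs, 3, stride=1, padding=1)  # spatial message passing
            q = unary - self.compat(smoothed)  # combine messages with unary potentials
        return q

class EnetCrfLidar(nn.Module):
    def __init__(self, num_classes=3):
        super().__init__()
        self.backbone = EnetLikeSegNet(in_ch=4, num_classes=num_classes)
        self.refine = CrfRnnLike(num_classes=num_classes)

    def forward(self, rgb, lidar_depth):
        x = torch.cat([rgb, lidar_depth], dim=1)   # early camera/Lidar fusion
        unary = self.backbone(x)
        return self.refine(unary, guidance=rgb)    # reintroduce low-level detail

if __name__ == "__main__":
    model = EnetCrfLidar(num_classes=3)
    rgb = torch.randn(1, 3, 128, 256)       # color image
    depth = torch.randn(1, 1, 128, 256)     # Lidar projected to the image plane
    out = model(rgb, depth)
    print(out.shape)  # torch.Size([1, 3, 128, 256]): per-pixel class scores

In this sketch, the refinement stage operates at full resolution on the backbone's per-pixel scores, which is one plausible way the CRF-RNN stage could recover detail lost to the encoder's down-sampling.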