Summary: | Depth estimation is a crucial and fundamental problem in the computer vision field. Conventional methods re-construct scenes using feature points extracted from multiple images; however, these approaches require multiple images and thus are not easily implemented in various real-time applications. Moreover, the special equipment required by hardware-based approaches using 3D sensors is expensive. Therefore, software-based methods for estimating depth from a single image using machine learning or deep learning are emerging as new alternatives. In this paper, we propose an algorithm that generates a depth map in real time using a single image and an optimized lightweight efficient neural network (L-ENet) algorithm instead of physical equipment, such as an infrared sensor or multi-view camera. Because depth values have a continuous nature and can produce locally ambiguous results, pixel-wise prediction with ordinal depth range classification was applied in this study. In addition, in our method various convolution techniques are applied to extract a dense feature map, and the number of parameters is greatly reduced by reducing the network layer. By using the proposed L-ENet algorithm, an accurate depth map can be generated from a single image quickly and, in a comparison with the ground truth, we can produce depth values closer to those of the ground truth with small errors. Experiments confirmed that the proposed L-ENet can achieve a significantly improved estimation performance over the state-of-the-art algorithms in depth estimation based on a single image.
|