Summary: | This study proposes a method for quickly building the large image dataset required for object detection. One way to obtain such a dataset is to collect sample images by green-screen capture, but this requires the color of the photographed object to differ from the green background. To remove this limitation, this study captured the object images in front of an LCD monitor whose background color can be changed, and then applied image processing techniques to obtain the sample images and their masks for image synthesis. A robotic arm moved the camera to different locations so that images were captured from various view directions, increasing the diversity of the sample images. A Faster R-CNN model and a YOLOv4 model were used as the deep learning models. After training with 10 000 synthetic images created by the proposed method, the models achieved an average mAP of 0.76 (Faster R-CNN) and 0.97 (YOLOv4), higher than the average mAP of 0.66 (Faster R-CNN) and 0.93 (YOLOv4) obtained when both were trained with 800 real images. Although more synthetic images were required to achieve better accuracy, preparing the 10 000 annotated synthetic images took only about 2 h, about 69% less than the time required to capture and label 800 real images. Another advantage of the method is that it generates instance segmentation annotations, which can be easily converted to other annotation types.
|
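The summary does not spell out the image processing used to obtain the masks, so the following is only a minimal sketch under stated assumptions: the same object is captured twice from a fixed camera pose against two different monitor colors, pixels that change between the two captures are treated as background, and the masked sample is then pasted onto a new background to produce a synthetic image with its bounding box. The function names, the `threshold` value, and the two-capture differencing rule are all hypothetical illustrations, not the authors' implementation.

```python
import numpy as np


def mask_from_two_backgrounds(img_a, img_b, threshold=30):
    """Return a boolean foreground mask (assumed rule, not from the paper).

    img_a, img_b: HxWx3 uint8 captures of the same scene with different
    background colors. Pixels whose color barely changes are taken as object.
    """
    diff = np.abs(img_a.astype(np.int16) - img_b.astype(np.int16)).max(axis=2)
    return diff < threshold


def bbox_from_mask(mask):
    """Convert an instance mask to an (x_min, y_min, x_max, y_max) box."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())


def paste(background, sample, mask, top, left):
    """Paste the masked sample onto a copy of background at (top, left).

    Assumes the sample fits entirely inside the background.
    Returns the synthetic image and the object's box in its coordinates.
    """
    out = background.copy()
    h, w = mask.shape
    region = out[top:top + h, left:left + w]  # view into out
    region[mask] = sample[mask]
    x0, y0, x1, y1 = bbox_from_mask(mask)
    return out, (left + x0, top + y0, left + x1, top + y1)


def make_capture(bg_color, obj_color=(200, 50, 50)):
    """Toy 8x8 capture: a 3x2 object on a uniform background (demo only)."""
    img = np.empty((8, 8, 3), dtype=np.uint8)
    img[:] = bg_color
    img[2:5, 3:5] = obj_color
    return img


capture_green = make_capture((0, 255, 0))
capture_blue = make_capture((0, 0, 255))
mask = mask_from_two_backgrounds(capture_green, capture_blue)
canvas = np.zeros((20, 20, 3), dtype=np.uint8)
synthetic, box = paste(canvas, capture_green, mask, top=5, left=10)
```

The bounding box is derived from the mask here to illustrate how an instance segmentation annotation can be converted to another annotation type; a real pipeline would also need to handle reflections, shadows, and object regions that happen to match one of the background colors.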