Summary: | Training data is the bottleneck for training Convolutional Neural Networks (CNNs). A larger dataset generally yields better accuracy but also requires longer training time. This work shows that fine-tuning neural networks on synthetically rendered images increases the mean average precision. The method was applied to two datasets, each containing five distinct objects. The first dataset consisted of random objects with different geometric shapes. The second dataset contained objects used to assemble IKEA furniture. The best-performing neural network, trained on 5400 images, achieved a mean average precision of 0.81 on a test set sampled from a video sequence. The impact of dataset size, batch size, number of training epochs, and network architecture was analyzed. Using synthetic images to train CNNs is a promising approach for object detection when large amounts of annotated image data are hard to come by.
|