Estimates of Maize Plant Density from UAV RGB Images Using Faster-RCNN Detection Model: Impact of the Spatial Resolution

Bibliographic Details
Main Authors: K. Velumani, R. Lopez-Lozano, S. Madec, W. Guo, J. Gillet, A. Comar, F. Baret
Format: Article
Language: English
Published: American Association for the Advancement of Science 2021-01-01
Series: Plant Phenomics
Online Access: http://dx.doi.org/10.34133/2021/9824843
Description
Summary: Early-stage plant density is an essential trait that determines the fate of a genotype under given environmental conditions and management practices. RGB images taken from UAVs may replace traditional visual counting in the field, with improved throughput, accuracy, and access to plant localization. However, high-resolution images are required to detect the small plants present at the early stages. This study explores the impact of image ground sampling distance (GSD) on the performance of maize plant detection at the three-to-five-leaf stage using the Faster-RCNN object detection algorithm. Data collected at high resolution (GSD≈0.3 cm) over six contrasting sites were used for model training. Two additional sites, with images acquired at both high and low (GSD≈0.6 cm) resolution, were used to evaluate model performance. Results show that Faster-RCNN achieved very good plant detection and counting performance (rRMSE=0.08) when native high-resolution images were used for both training and validation. Similarly, good performance was observed (rRMSE=0.11) when the model was trained on synthetic low-resolution images, obtained by downsampling the native high-resolution training images, and applied to the synthetic low-resolution validation images. Conversely, performance was poor when the model was trained at one spatial resolution and applied to another. Training on a mix of high- and low-resolution images yielded very good performance on the native high-resolution (rRMSE=0.06) and synthetic low-resolution (rRMSE=0.10) images. However, performance remained very poor on the native low-resolution images (rRMSE=0.48), mainly because of the poor quality of those images. Finally, an advanced super-resolution method based on a GAN (generative adversarial network), which introduces additional textural information derived from the native high-resolution images, was applied to the native low-resolution validation images. Results show a significant improvement (rRMSE=0.22) compared to the bicubic upsampling approach, although performance remains far below that achieved on the native high-resolution images.
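The rRMSE values quoted in the summary compare detected plant counts with reference counts. The summary does not define the metric, so the sketch below assumes the common definition: the root mean square error divided by the mean of the reference counts. The function name `rrmse` and the per-plot counts are hypothetical and are not taken from the article.

```python
import math


def rrmse(reference, predicted):
    """Relative RMSE: RMSE divided by the mean of the reference values.

    Assumed definition; the catalog summary does not spell out the formula.
    """
    if len(reference) != len(predicted) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_ref = sum(reference) / len(reference)
    if mean_ref == 0:
        raise ValueError("mean of reference values must be non-zero")
    # Mean squared difference between predicted and reference counts
    mse = sum((p - r) ** 2 for r, p in zip(reference, predicted)) / len(reference)
    return math.sqrt(mse) / mean_ref


# Hypothetical per-plot plant counts (illustration only, not data from the study)
reference_counts = [100, 120, 80, 100]
detected_counts = [90, 130, 85, 95]
print(round(rrmse(reference_counts, detected_counts), 3))  # prints 0.079
```

Under this definition, an rRMSE of 0.08 means the typical counting error is about 8% of the average plant count per plot.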
ISSN: 2643-6515