Comparison of Machine Learning Regression Algorithms for Cotton Leaf Area Index Retrieval Using Sentinel-2 Spectral Bands

Bibliographic Details
Main Authors: Huihui Mao, Jihua Meng, Fujiang Ji, Qiankun Zhang, Huiting Fang
Format: Article
Language: English
Published: MDPI AG 2019-04-01
Series: Applied Sciences
Online Access: https://www.mdpi.com/2076-3417/9/7/1459
Description
Summary: Leaf area index (LAI) is a crucial crop biophysical parameter that has been widely used in a variety of fields. Five state-of-the-art machine learning regression algorithms (MLRAs), namely artificial neural network (ANN), support vector regression (SVR), Gaussian process regression (GPR), random forest (RF) and gradient boosting regression tree (GBRT), were used to retrieve cotton LAI from Sentinel-2 spectral bands. The performance of the five models is compared to support better application of MLRAs in remote sensing, since it remains difficult to select an MLRA for crop LAI retrieval and to decide on the optimal training sample size and number of spectral bands for each MLRA. The evaluation covered model accuracy, computational efficiency, sensitivity to training sample size and sensitivity to spectral bands. The five MLRAs were compared in an agricultural area of Northwest China over three cotton seasons, with corresponding field campaigns for modeling and validation. Results show that the GBRT model outperforms the other models in average model accuracy ($\overline{R^2}$ = 0.854, $\overline{RMSE}$ = 0.674 and $\overline{MAE}$ = 0.456). 
SVR achieves the best computational efficiency: it is fast to train and to validate, and it has great potential to deliver near-real-time operational products for crop management. As for sensitivity to training sample size, GBRT is the most robust model and provides the best accuracy averaged over the variations in training sample size ($\overline{R^2}$ = 0.884, $\overline{RMSE}$ = 0.615 and $\overline{MAE}$ = 0.452). 
Spectral band sensitivity analysis with distance correlation (dCor), combined with the backward elimination approach, indicates that SVR, GPR and RF are relatively robust to the choice of spectral bands, while ANN achieves the best accuracy averaged over the reductions in spectral bands ($\overline{R^2}$ = 0.881, $\overline{RMSE}$ = 0.625 and $\overline{MAE}$ = 0.480). Overall, the evaluation indicates that GBRT is an appealing alternative for cotton LAI retrieval, except for its computational efficiency. Despite their differing performance, all models showed considerable potential for cotton LAI retrieval, which could provide timely and accurate crop parameter information for field management and agricultural production decisions.
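
The accuracy scores and the dCor statistic named in the summary can be illustrated with a minimal, dependency-free sketch. This is not the authors' code. The function names are hypothetical, R² is assumed here to be the coefficient of determination (the record does not say which definition was used), and dCor is the standard sample distance correlation for two one-dimensional variables, such as one spectral band against measured LAI. The backward elimination procedure itself is not described in this record and is not reproduced.

```python
import math


def r2_score(y_true, y_pred):
    """Coefficient of determination (assumed definition of R^2)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean square error."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mae(y_true, y_pred):
    """Mean absolute error."""
    n = len(y_true)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n


def _centered_distances(values):
    """Pairwise absolute distances, double-centred by row, column and grand mean."""
    n = len(values)
    dist = [[abs(a - b) for b in values] for a in values]
    row_means = [sum(row) / n for row in dist]  # matrix is symmetric
    grand_mean = sum(row_means) / n
    return [
        [dist[i][j] - row_means[i] - row_means[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def distance_correlation(x, y):
    """Sample distance correlation (dCor) between two 1-D sequences."""
    n = len(x)
    a = _centered_distances(x)
    b = _centered_distances(y)
    dcov2 = sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n ** 2
    dvar_x = sum(v * v for row in a for v in row) / n ** 2
    dvar_y = sum(v * v for row in b for v in row) / n ** 2
    denom = math.sqrt(dvar_x * dvar_y)
    if denom == 0.0:
        return 0.0  # one variable is constant, so dCor is defined as 0
    return math.sqrt(max(dcov2, 0.0) / denom)
```

The barred values in the summary are averages, so in practice each score would be computed once per run or configuration and then averaged. A higher dCor between a band and LAI suggests a stronger (possibly nonlinear) dependence, which is one plausible way to rank bands before eliminating them.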
ISSN: 2076-3417