Long-Term Hindcasts of Wheat Yield in Fields Using Remotely Sensed Phenology, Climate Data and Machine Learning

Satellite remote sensing offers a cost-effective means of generating long-term hindcasts of yield that can be used to understand how yield varies in time and space. This study investigated the use of remotely sensed phenology, climate data and machine learning for estimating yield at a resolution su...

Bibliographic Details
Main Authors: Fiona H. Evans, Jianxiu Shen
Format: Article
Language: English
Published: MDPI AG 2021-06-01
Series: Remote Sensing
Subjects: Landsat; NDVI; crop phenology; yield estimation; long-term; hindcasts
Online Access: https://www.mdpi.com/2072-4292/13/13/2435
id doaj-7ebeb810dc154e2895fc93507473af2d
record_format Article
spelling doaj-7ebeb810dc154e2895fc93507473af2d | 2021-07-15T15:44:03Z | eng | MDPI AG | Remote Sensing | 2072-4292 | 2021-06-01 | 13 | 2435 | 2435 | 10.3390/rs13132435 | Long-Term Hindcasts of Wheat Yield in Fields Using Remotely Sensed Phenology, Climate Data and Machine Learning | Fiona H. Evans (0) | Jianxiu Shen (1) | (0) Centre for Crop and Food Innovation, Food Futures Institute, Murdoch University, 90 South Street, Murdoch, WA 6150, Australia | (1) Centre for Crop and Food Innovation, Food Futures Institute, Murdoch University, 90 South Street, Murdoch, WA 6150, Australia | Satellite remote sensing offers a cost-effective means of generating long-term hindcasts of yield that can be used to understand how yield varies in time and space. This study investigated the use of remotely sensed phenology, climate data and machine learning for estimating yield at a resolution suitable for optimising crop management in fields. We used spatially weighted growth curve estimation to identify the timing of phenological events from sequences of Landsat NDVI and derive phenological and seasonal climate metrics. Using data from a 17,000 ha study area, we investigated the relationships between the metrics and yield over 17 years from 2003 to 2019. We compared six statistical and machine learning models for estimating yield: multiple linear regression, mixed effects models, generalised additive models, random forests, support vector regression using radial basis functions and deep learning neural networks. We used a 50-50 train-test split on paddock-years, where 50% of paddock-year combinations were randomly selected and used to train each model and the remaining 50% of paddock-years were used to assess the model accuracy. Using only phenological metrics, accuracy was highest using a linear mixed model with a random effect that allowed the relationship between integrated NDVI and yield to vary by year (R² = 0.67, MAE = 0.25 t ha⁻¹, RMSE = 0.33 t ha⁻¹, NRMSE = 0.25). We quantified the improvements in accuracy when seasonal climate metrics were also used as predictors. We identified two optimal models using the combined phenological and seasonal climate metrics: support vector regression and deep learning models (R² = 0.68, MAE = 0.25 t ha⁻¹, RMSE = 0.32 t ha⁻¹, NRMSE = 0.25). While the linear mixed model using only phenological metrics performed similarly to the nonlinear models that also use seasonal climate metrics, the nonlinear models can be more easily generalised to estimate yield in years for which training data are unavailable. We conclude that long-term hindcasts of wheat yield in fields, at 30 m spatial resolution, can be produced using remotely sensed phenology from Landsat NDVI, climate data and machine learning. | https://www.mdpi.com/2072-4292/13/13/2435 | Landsat | NDVI | crop phenology | yield estimation | long-term | hindcasts
collection DOAJ
language English
format Article
sources DOAJ
author Fiona H. Evans
Jianxiu Shen
spellingShingle Fiona H. Evans
Jianxiu Shen
Long-Term Hindcasts of Wheat Yield in Fields Using Remotely Sensed Phenology, Climate Data and Machine Learning
Remote Sensing
Landsat
NDVI
crop phenology
yield estimation
long-term
hindcasts
author_facet Fiona H. Evans
Jianxiu Shen
author_sort Fiona H. Evans
title Long-Term Hindcasts of Wheat Yield in Fields Using Remotely Sensed Phenology, Climate Data and Machine Learning
title_sort long-term hindcasts of wheat yield in fields using remotely sensed phenology, climate data and machine learning
publisher MDPI AG
series Remote Sensing
issn 2072-4292
publishDate 2021-06-01
description Satellite remote sensing offers a cost-effective means of generating long-term hindcasts of yield that can be used to understand how yield varies in time and space. This study investigated the use of remotely sensed phenology, climate data and machine learning for estimating yield at a resolution suitable for optimising crop management in fields. We used spatially weighted growth curve estimation to identify the timing of phenological events from sequences of Landsat NDVI and derive phenological and seasonal climate metrics. Using data from a 17,000 ha study area, we investigated the relationships between the metrics and yield over 17 years from 2003 to 2019. We compared six statistical and machine learning models for estimating yield: multiple linear regression, mixed effects models, generalised additive models, random forests, support vector regression using radial basis functions and deep learning neural networks. We used a 50-50 train-test split on paddock-years, where 50% of paddock-year combinations were randomly selected and used to train each model and the remaining 50% of paddock-years were used to assess the model accuracy. Using only phenological metrics, accuracy was highest using a linear mixed model with a random effect that allowed the relationship between integrated NDVI and yield to vary by year (R² = 0.67, MAE = 0.25 t ha⁻¹, RMSE = 0.33 t ha⁻¹, NRMSE = 0.25). We quantified the improvements in accuracy when seasonal climate metrics were also used as predictors. We identified two optimal models using the combined phenological and seasonal climate metrics: support vector regression and deep learning models (R² = 0.68, MAE = 0.25 t ha⁻¹, RMSE = 0.32 t ha⁻¹, NRMSE = 0.25). While the linear mixed model using only phenological metrics performed similarly to the nonlinear models that also use seasonal climate metrics, the nonlinear models can be more easily generalised to estimate yield in years for which training data are unavailable. We conclude that long-term hindcasts of wheat yield in fields, at 30 m spatial resolution, can be produced using remotely sensed phenology from Landsat NDVI, climate data and machine learning.
topic Landsat
NDVI
crop phenology
yield estimation
long-term
hindcasts
url https://www.mdpi.com/2072-4292/13/13/2435
work_keys_str_mv AT fionahevans longtermhindcastsofwheatyieldinfieldsusingremotelysensedphenologyclimatedataandmachinelearning
AT jianxiushen longtermhindcastsofwheatyieldinfieldsusingremotelysensedphenologyclimatedataandmachinelearning
_version_ 1721298649567199232
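
The description above reports a random 50-50 split of paddock-years and four accuracy measures (R², MAE, RMSE, NRMSE). The following is a minimal Python sketch of that evaluation step, not the authors' code. The function names `split_paddock_years` and `accuracy_metrics` are hypothetical, and two definitions are assumptions because the record does not state them: R² is computed as the coefficient of determination, and NRMSE as RMSE divided by the mean observed yield.

```python
import math
import random


def split_paddock_years(keys, seed=0):
    """Randomly assign half of the (paddock, year) keys to training and the rest to testing."""
    shuffled = list(keys)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def accuracy_metrics(observed, predicted):
    """Return R2, MAE, RMSE and NRMSE for paired observed and predicted yields (t/ha)."""
    n = len(observed)
    mean_obs = sum(observed) / n
    errors = [p - o for o, p in zip(observed, predicted)]
    ss_res = sum(e * e for e in errors)                  # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)  # total sum of squares
    rmse = math.sqrt(ss_res / n)
    return {
        "r2": 1.0 - ss_res / ss_tot,   # assumption: coefficient of determination
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": rmse,
        "nrmse": rmse / mean_obs,      # assumption: normalised by mean observed yield
    }
```

In use, the keys would be the (paddock, year) pairs of the data set. Each model would be fitted on the first half returned by `split_paddock_years`, and `accuracy_metrics` would be applied to its predictions for the second half.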