Application of Machine Learning Algorithms in Plant Breeding: Predicting Yield From Hyperspectral Reflectance in Soybean

Recent substantial advances in high-throughput field phenotyping have provided plant breeders with affordable and efficient tools for evaluating a large number of genotypes for important agronomic traits at early growth stages. Nevertheless, the implementation of large datasets generated by high-thr...

Bibliographic Details
Main Authors: Mohsen Yoosefzadeh-Najafabadi, Hugh J. Earl, Dan Tulpan, John Sulik, Milad Eskandari
Format: Article
Language: English
Published: Frontiers Media S.A. 2021-01-01
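Series: Frontiers in Plant Science
Subjects: artificial intelligence; data-driven model; ensemble methods; high-throughput phenotyping; random forest; recursive feature elimination
Online Access: https://www.frontiersin.org/articles/10.3389/fpls.2020.624273/full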
id doaj-1f96a0112055426daf55f08411a89dd6
record_format Article
spelling doaj-1f96a0112055426daf55f08411a89dd6
2021-01-12T05:22:25Z
eng
Frontiers Media S.A.
Frontiers in Plant Science
1664-462X
2021-01-01
11
10.3389/fpls.2020.624273
624273
Application of Machine Learning Algorithms in Plant Breeding: Predicting Yield From Hyperspectral Reflectance in Soybean
Mohsen Yoosefzadeh-Najafabadi, Department of Plant Agriculture, University of Guelph, Guelph, ON, Canada
Hugh J. Earl, Department of Plant Agriculture, University of Guelph, Guelph, ON, Canada
Dan Tulpan, Department of Animal Biosciences, University of Guelph, Guelph, ON, Canada
John Sulik, Department of Plant Agriculture, University of Guelph, Guelph, ON, Canada
Milad Eskandari, Department of Plant Agriculture, University of Guelph, Guelph, ON, Canada
collection DOAJ
language English
format Article
sources DOAJ
author Mohsen Yoosefzadeh-Najafabadi
Hugh J. Earl
Dan Tulpan
John Sulik
Milad Eskandari
title Application of Machine Learning Algorithms in Plant Breeding: Predicting Yield From Hyperspectral Reflectance in Soybean
publisher Frontiers Media S.A.
series Frontiers in Plant Science
issn 1664-462X
publishDate 2021-01-01
description Recent advances in high-throughput field phenotyping have given plant breeders affordable and efficient tools for evaluating large numbers of genotypes for important agronomic traits at early growth stages. Nevertheless, using the large datasets generated by high-throughput phenotyping tools such as hyperspectral reflectance in cultivar development programs remains challenging because it requires extensive knowledge of computational and statistical analysis. In this study, the robustness of three common machine learning (ML) algorithms, multilayer perceptron (MLP), support vector machine (SVM), and random forest (RF), was evaluated for predicting soybean (Glycine max) seed yield from hyperspectral reflectance. To this end, hyperspectral reflectance data covering the full spectrum from 395 to 1005 nm were collected at the R4 and R5 growth stages on 250 soybean genotypes grown in four environments. Recursive feature elimination (RFE) was used to reduce the dimensionality of the hyperspectral reflectance data and to select the variables with the largest importance values. The results indicated that R5 is the more informative stage for measuring hyperspectral reflectance to predict seed yield. The 395 nm reflectance band was also identified as a high-ranking band for predicting soybean seed yield. Using either the full or the selected set of variables as input, the ML algorithms were evaluated both individually and in combination through the ensemble–stacking (E–S) method to predict soybean yield. The RF algorithm had the highest performance among the individual algorithms tested, with a yield classification accuracy of 84%. With RF selected as the metaClassifier for the E–S method, prediction accuracy increased to 0.93 using all variables and 0.87 using the selected variables, showing the success of E–S as an ensemble technique. This study demonstrated that soybean breeders could implement the E–S algorithm, using either the full or the selected reflectance spectra, to select high-yielding soybean genotypes from among a large number of genotypes at early growth stages.
topic artificial intelligence
data-driven model
ensemble methods
high-throughput phenotyping
random forest
recursive feature elimination
url https://www.frontiersin.org/articles/10.3389/fpls.2020.624273/full