Crowdsourcing image analysis for plant phenomics to generate ground truth data for machine learning.

The accuracy of machine learning tasks critically depends on high quality ground truth data. Therefore, in many cases, producing good ground truth data typically involves trained professionals; however, this can be costly in time, effort, and money. Here we explore the use of crowdsourcing to genera...

Bibliographic Details
Main Authors: Naihui Zhou, Zachary D Siegel, Scott Zarecor, Nigel Lee, Darwin A Campbell, Carson M Andorf, Dan Nettleton, Carolyn J Lawrence-Dill, Baskar Ganapathysubramanian, Jonathan W Kelly, Iddo Friedberg
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2018-07-01
Series: PLoS Computational Biology
Online Access: http://europepmc.org/articles/PMC6085066?pdf=render
id doaj-ad87ca982fb54245825798da5a52f6ce
record_format Article
spelling doaj-ad87ca982fb54245825798da5a52f6ce | 2020-11-25T02:04:03Z | eng | Public Library of Science (PLoS) | PLoS Computational Biology | 1553-734X | 1553-7358 | 2018-07-01 | 14 | 7 | e1006337 | 10.1371/journal.pcbi.1006337 | Crowdsourcing image analysis for plant phenomics to generate ground truth data for machine learning. | Naihui Zhou, Zachary D Siegel, Scott Zarecor, Nigel Lee, Darwin A Campbell, Carson M Andorf, Dan Nettleton, Carolyn J Lawrence-Dill, Baskar Ganapathysubramanian, Jonathan W Kelly, Iddo Friedberg | The accuracy of machine learning tasks critically depends on high quality ground truth data. Therefore, in many cases, producing good ground truth data typically involves trained professionals; however, this can be costly in time, effort, and money. Here we explore the use of crowdsourcing to generate a large volume of training data of good quality. We explore an image analysis task involving the segmentation of corn tassels from images taken in a field setting. We investigate the accuracy, speed and other quality metrics when this task is performed by students for academic credit, Amazon MTurk workers, and Master Amazon MTurk workers. We conclude that the Amazon MTurk and Master MTurk workers perform significantly better than the for-credit students, but with no significant difference between the two MTurk worker types. Furthermore, the quality of the segmentation produced by Amazon MTurk workers rivals that of an expert worker. We provide best practices to assess the quality of ground truth data, and to compare data quality produced by different sources. We conclude that properly managed crowdsourcing can be used to establish large volumes of viable ground truth data at a low cost and high quality, especially in the context of high throughput plant phenotyping. We also provide several metrics for assessing the quality of the generated datasets. | http://europepmc.org/articles/PMC6085066?pdf=render
collection DOAJ
language English
format Article
sources DOAJ
author Naihui Zhou
Zachary D Siegel
Scott Zarecor
Nigel Lee
Darwin A Campbell
Carson M Andorf
Dan Nettleton
Carolyn J Lawrence-Dill
Baskar Ganapathysubramanian
Jonathan W Kelly
Iddo Friedberg
author_sort Naihui Zhou
title Crowdsourcing image analysis for plant phenomics to generate ground truth data for machine learning.
publisher Public Library of Science (PLoS)
series PLoS Computational Biology
issn 1553-734X
1553-7358
publishDate 2018-07-01
description The accuracy of machine learning tasks critically depends on high quality ground truth data. Therefore, in many cases, producing good ground truth data typically involves trained professionals; however, this can be costly in time, effort, and money. Here we explore the use of crowdsourcing to generate a large volume of training data of good quality. We explore an image analysis task involving the segmentation of corn tassels from images taken in a field setting. We investigate the accuracy, speed and other quality metrics when this task is performed by students for academic credit, Amazon MTurk workers, and Master Amazon MTurk workers. We conclude that the Amazon MTurk and Master MTurk workers perform significantly better than the for-credit students, but with no significant difference between the two MTurk worker types. Furthermore, the quality of the segmentation produced by Amazon MTurk workers rivals that of an expert worker. We provide best practices to assess the quality of ground truth data, and to compare data quality produced by different sources. We conclude that properly managed crowdsourcing can be used to establish large volumes of viable ground truth data at a low cost and high quality, especially in the context of high throughput plant phenotyping. We also provide several metrics for assessing the quality of the generated datasets.
url http://europepmc.org/articles/PMC6085066?pdf=render