From GenBank to GBIF: Phylogeny-Based Predictive Niche Modeling Tests Accuracy of Taxonomic Identifications in Large Occurrence Data Repositories.

Accuracy of taxonomic identifications is crucial to data quality in online repositories of species occurrence data, such as the Global Biodiversity Information Facility (GBIF), which have accumulated several hundred million records over the past 15 years. These data serve as basis for large scale an...

Full description

Bibliographic Details
Main Authors:	B Eugene Smith, Mark K Johnston, Robert Lücking
Format:	Article
Language:	English
Published:	Public Library of Science (PLoS) 2016-01-01
Series:	PLoS ONE
Online Access:	http://europepmc.org/articles/PMC4788202?pdf=render

id	doaj-f7f69d6a074d4704ae4d49ba9351babd
record_format	Article
spelling	doaj-f7f69d6a074d4704ae4d49ba9351babd2020-11-24T21:37:03ZengPublic Library of Science (PLoS)PLoS ONE1932-62032016-01-01113e015123210.1371/journal.pone.0151232From GenBank to GBIF: Phylogeny-Based Predictive Niche Modeling Tests Accuracy of Taxonomic Identifications in Large Occurrence Data Repositories.B Eugene SmithMark K JohnstonRobert LückingAccuracy of taxonomic identifications is crucial to data quality in online repositories of species occurrence data, such as the Global Biodiversity Information Facility (GBIF), which have accumulated several hundred million records over the past 15 years. These data serve as basis for large scale analyses of macroecological and biogeographic patterns and to document environmental changes over time. However, taxonomic identifications are often unreliable, especially for non-vascular plants and fungi including lichens, which may lack critical revisions of voucher specimens. Due to the scale of the problem, restudy of millions of collections is unrealistic and other strategies are needed. Here we propose to use verified, georeferenced occurrence data of a given species to apply predictive niche modeling that can then be used to evaluate unverified occurrences of that species. Selecting the charismatic lichen fungus, Usnea longissima, as a case study, we used georeferenced occurrence records based on sequenced specimens to model its predicted niche. Our results suggest that the target species is largely restricted to a narrow range of boreal and temperate forest in the Northern Hemisphere and that occurrence records in GBIF from tropical regions and the Southern Hemisphere do not represent this taxon, a prediction tested by comparison with taxonomic revisions of Usnea for these regions. As a novel approach, we employed Principal Component Analysis on the environmental grid data used for predictive modeling to visualize potential ecogeographical barriers for the target species; we found that tropical regions conform a strong barrier, explaining why potential niches in the Southern Hemisphere were not colonized by Usnea longissima and instead by morphologically similar species. This approach is an example of how data from two of the most important biodiversity repositories, GenBank and GBIF, can be effectively combined to remotely address the problem of inaccuracy of taxonomic identifications in occurrence data repositories and to provide a filtering mechanism which can considerably reduce the number of voucher specimens that need critical revision, in this case from 4,672 to about 100.http://europepmc.org/articles/PMC4788202?pdf=render
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	B Eugene Smith Mark K Johnston Robert Lücking
spellingShingle	B Eugene Smith Mark K Johnston Robert Lücking From GenBank to GBIF: Phylogeny-Based Predictive Niche Modeling Tests Accuracy of Taxonomic Identifications in Large Occurrence Data Repositories. PLoS ONE
author_facet	B Eugene Smith Mark K Johnston Robert Lücking
author_sort	B Eugene Smith
title	From GenBank to GBIF: Phylogeny-Based Predictive Niche Modeling Tests Accuracy of Taxonomic Identifications in Large Occurrence Data Repositories.
title_short	From GenBank to GBIF: Phylogeny-Based Predictive Niche Modeling Tests Accuracy of Taxonomic Identifications in Large Occurrence Data Repositories.
title_full	From GenBank to GBIF: Phylogeny-Based Predictive Niche Modeling Tests Accuracy of Taxonomic Identifications in Large Occurrence Data Repositories.
title_fullStr	From GenBank to GBIF: Phylogeny-Based Predictive Niche Modeling Tests Accuracy of Taxonomic Identifications in Large Occurrence Data Repositories.
title_full_unstemmed	From GenBank to GBIF: Phylogeny-Based Predictive Niche Modeling Tests Accuracy of Taxonomic Identifications in Large Occurrence Data Repositories.
title_sort	from genbank to gbif: phylogeny-based predictive niche modeling tests accuracy of taxonomic identifications in large occurrence data repositories.
publisher	Public Library of Science (PLoS)
series	PLoS ONE
issn	1932-6203
publishDate	2016-01-01
description	Accuracy of taxonomic identifications is crucial to data quality in online repositories of species occurrence data, such as the Global Biodiversity Information Facility (GBIF), which have accumulated several hundred million records over the past 15 years. These data serve as basis for large scale analyses of macroecological and biogeographic patterns and to document environmental changes over time. However, taxonomic identifications are often unreliable, especially for non-vascular plants and fungi including lichens, which may lack critical revisions of voucher specimens. Due to the scale of the problem, restudy of millions of collections is unrealistic and other strategies are needed. Here we propose to use verified, georeferenced occurrence data of a given species to apply predictive niche modeling that can then be used to evaluate unverified occurrences of that species. Selecting the charismatic lichen fungus, Usnea longissima, as a case study, we used georeferenced occurrence records based on sequenced specimens to model its predicted niche. Our results suggest that the target species is largely restricted to a narrow range of boreal and temperate forest in the Northern Hemisphere and that occurrence records in GBIF from tropical regions and the Southern Hemisphere do not represent this taxon, a prediction tested by comparison with taxonomic revisions of Usnea for these regions. As a novel approach, we employed Principal Component Analysis on the environmental grid data used for predictive modeling to visualize potential ecogeographical barriers for the target species; we found that tropical regions conform a strong barrier, explaining why potential niches in the Southern Hemisphere were not colonized by Usnea longissima and instead by morphologically similar species. This approach is an example of how data from two of the most important biodiversity repositories, GenBank and GBIF, can be effectively combined to remotely address the problem of inaccuracy of taxonomic identifications in occurrence data repositories and to provide a filtering mechanism which can considerably reduce the number of voucher specimens that need critical revision, in this case from 4,672 to about 100.
url	http://europepmc.org/articles/PMC4788202?pdf=render
work_keys_str_mv	AT beugenesmith fromgenbanktogbifphylogenybasedpredictivenichemodelingtestsaccuracyoftaxonomicidentificationsinlargeoccurrencedatarepositories AT markkjohnston fromgenbanktogbifphylogenybasedpredictivenichemodelingtestsaccuracyoftaxonomicidentificationsinlargeoccurrencedatarepositories AT robertlucking fromgenbanktogbifphylogenybasedpredictivenichemodelingtestsaccuracyoftaxonomicidentificationsinlargeoccurrencedatarepositories
_version_	1725938534310739968

From GenBank to GBIF: Phylogeny-Based Predictive Niche Modeling Tests Accuracy of Taxonomic Identifications in Large Occurrence Data Repositories.

Similar Items