The importance of recognizing and reporting sequence database contamination for proteomics

Advances in genome sequencing have made proteomic experiments more successful than ever. However, not all entries in a sequence database are of equal quality. Genome sequences are contaminated more frequently than is admitted. Contamination impacts homology-based proteomic, proteogenomic, and metapr...


Bibliographic Details
Main Authors: Olivier Pible, Erica M. Hartmann, Gilles Imbert, Jean Armengaud
Format: Article
Language: English
Published: Elsevier 2014-06-01
Series: EuPA Open Proteomics
Subjects: Database, Proteomics, Metaproteomics, Contamination, Blast analysis, Curation
Online Access: http://www.sciencedirect.com/science/article/pii/S2212968514000269
id doaj-acd2171caf0542aaaf9f7ab5c7210ede
record_format Article
spelling doaj-acd2171caf0542aaaf9f7ab5c7210ede; 2020-11-24T22:35:23Z; eng; Elsevier; EuPA Open Proteomics; 2212-9685; 2014-06-01; 3C; 246; 249; 10.1016/j.euprot.2014.04.001; The importance of recognizing and reporting sequence database contamination for proteomics; Olivier Pible; Erica M. Hartmann; Gilles Imbert; Jean Armengaud; Advances in genome sequencing have made proteomic experiments more successful than ever. However, not all entries in a sequence database are of equal quality. Genome sequences are contaminated more frequently than is admitted. Contamination impacts homology-based proteomic, proteogenomic, and metaproteomic results. We highlight two examples in the National Center for Biotechnology Information non-redundant database (NCBInr) that are likely contaminated: the bacterium Enterococcus gallinarum EGD-AAK12 and the insect Ceratitis capitata. We hope to incite users of this and other databases to critically evaluate submitted sequences and to contribute to the overall quality of the database by signaling potential errors when possible.; http://www.sciencedirect.com/science/article/pii/S2212968514000269; Database; Proteomics; Metaproteomics; Contamination; Blast analysis; Curation
collection DOAJ
language English
format Article
sources DOAJ
author Olivier Pible
Erica M. Hartmann
Gilles Imbert
Jean Armengaud
title The importance of recognizing and reporting sequence database contamination for proteomics
publisher Elsevier
series EuPA Open Proteomics
issn 2212-9685
publishDate 2014-06-01
description Advances in genome sequencing have made proteomic experiments more successful than ever. However, not all entries in a sequence database are of equal quality. Genome sequences are contaminated more frequently than is admitted. Contamination impacts homology-based proteomic, proteogenomic, and metaproteomic results. We highlight two examples in the National Center for Biotechnology Information non-redundant database (NCBInr) that are likely contaminated: the bacterium Enterococcus gallinarum EGD-AAK12 and the insect Ceratitis capitata. We hope to incite users of this and other databases to critically evaluate submitted sequences and to contribute to the overall quality of the database by signaling potential errors when possible.
topic Database
Proteomics
Metaproteomics
Contamination
Blast analysis
Curation
url http://www.sciencedirect.com/science/article/pii/S2212968514000269