The influence of transcript assembly on the proteogenomics discovery of microproteins.

Proteogenomics methods have identified many non-annotated protein-coding genes in the human genome. Many of the newly discovered protein-coding genes encode peptides and small proteins, referred to collectively as microproteins. Microproteins are produced through ribosome translation of small open r...

Bibliographic Details
Main Authors: Jiao Ma, Alan Saghatelian, Maxim Nikolaievich Shokhirev
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2018-01-01
Series: PLoS ONE
Online Access: https://doi.org/10.1371/journal.pone.0194518
id doaj-574a8f97be5244d89b8f68c5a60d04cf
record_format Article
spelling doaj-574a8f97be5244d89b8f68c5a60d04cf
2021-03-03T21:25:14Z
eng
Public Library of Science (PLoS)
PLoS ONE
1932-6203
2018-01-01
13
3
e0194518
10.1371/journal.pone.0194518
The influence of transcript assembly on the proteogenomics discovery of microproteins.
Jiao Ma
Alan Saghatelian
Maxim Nikolaievich Shokhirev
https://doi.org/10.1371/journal.pone.0194518
collection DOAJ
language English
format Article
sources DOAJ
author Jiao Ma
Alan Saghatelian
Maxim Nikolaievich Shokhirev
title The influence of transcript assembly on the proteogenomics discovery of microproteins.
publisher Public Library of Science (PLoS)
series PLoS ONE
issn 1932-6203
publishDate 2018-01-01
description Proteogenomics methods have identified many non-annotated protein-coding genes in the human genome. Many of these newly discovered genes encode peptides and small proteins, referred to collectively as microproteins. Microproteins are produced through ribosomal translation of small open reading frames (smORFs). The discovery of many smORFs reveals a blind spot in traditional gene-finding algorithms. Biological studies have found roles for microproteins in cell biology and physiology, and the possibility that additional bioactive microproteins exist drives interest in detecting and discovering these molecules. A key step in any proteogenomics workflow is the assembly of RNA-Seq data into likely mRNA transcripts, which are then used to create a searchable protein database. Here we demonstrate that specific features of the assembled transcriptome affect microprotein detection by shotgun proteomics. By tailoring transcript assembly for downstream mass spectrometry searching, we show that we can detect more than double the number of high-quality microprotein candidates, and we introduce a novel open-source mRNA assembler for proteogenomics (MAPS) that incorporates all of these features. By integrating our specialized assembler, MAPS, and a popular generalized assembler into our proteogenomics pipeline, we detect 45 novel human microproteins from a high-quality proteogenomics dataset of a human cell line. We then characterize the features of the novel microproteins, identifying two classes of microproteins. Our work highlights the importance of specialized transcriptome assembly upstream of proteomics validation when searching for short and potentially rare and poorly conserved proteins.
url https://doi.org/10.1371/journal.pone.0194518