A systems approach to computational protein identification

Proteomics is the science of understanding the dynamic protein content of an organism's cells (its proteome), which is one of the largest current challenges in biology. Computational proteomics is an active research area that involves in-silico methods for the analysis of high-throughput protei...


Bibliographic Details
Main Author: Ramakrishnan, Smriti Rajan
Format: Others
Language: English
Published: 2010
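Subjects: Computational biology; Bioinformatics; Integrative statistical data analysis; Computational proteomics; Systems biology; Database indexing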
Online Access: http://hdl.handle.net/2152/ETD-UT-2010-05-1036
id ndltd-UTEXAS-oai-repositories.lib.utexas.edu-2152-ETD-UT-2010-05-1036
record_format oai_dc
spelling ndltd-UTEXAS-oai-repositories.lib.utexas.edu-2152-ETD-UT-2010-05-1036
2015-09-20T16:55:09Z
A systems approach to computational protein identification
Ramakrishnan, Smriti Rajan
Computational biology
Bioinformatics
Integrative statistical data analysis
Computational proteomics
Systems biology
Database indexing
Proteomics is the science of understanding the dynamic protein content of an organism's cells (its proteome), which is one of the largest current challenges in biology. Computational proteomics is an active research area that involves in-silico methods for the analysis of high-throughput protein identification data. Current methods are based on a technology called tandem mass spectrometry (MS/MS) and suffer from low coverage and accuracy, reliably identifying only 20-40% of the proteome. This dissertation addresses recall, precision, speed and scalability of computational proteomics experiments. This research goes beyond the traditional paradigm of analyzing MS/MS experiments in isolation, instead learning priors of protein presence from the joint analysis of various systems biology data sources. This integrative 'systems' approach to protein identification is very effective, as demonstrated by two new methods. The first, MSNet, introduces a social model for protein identification and leverages functional dependencies from genome-scale, probabilistic, gene functional networks. The second, MSPresso, learns a gene expression prior from a joint analysis of mRNA and proteomics experiments on similar samples. These two sources of prior information result in more accurate estimates of protein presence, and increase protein recall by as much as 30% in complex samples, while also increasing precision. A comprehensive suite of benchmarking datasets is introduced for evaluation in yeast. Methods to assess statistical significance in the absence of ground truth are also introduced and employed whenever applicable.
This dissertation also describes a database indexing solution to improve speed and scalability of protein identification experiments. The method, MSFound, customizes a metric-space database index and its associated approximate k-nearest-neighbor search algorithm with a semi-metric distance designed to match noisy spectra. MSFound achieves an order of magnitude speedup over traditional spectra database searches while maintaining scalability.
text
2010-10-21T19:59:40Z
2010-10-21T19:59:48Z
2010-10-21T19:59:40Z
2010-10-21T19:59:48Z
2010-05
2010-10-21
May 2010
2010-10-21T19:59:48Z
thesis
application/pdf
http://hdl.handle.net/2152/ETD-UT-2010-05-1036
eng
collection NDLTD
language English
format Others
sources NDLTD
topic Computational biology
Bioinformatics
Integrative statistical data analysis
Computational proteomics
Systems biology
Database indexing
spellingShingle Computational biology
Bioinformatics
Integrative statistical data analysis
Computational proteomics
Systems biology
Database indexing
Ramakrishnan, Smriti Rajan
A systems approach to computational protein identification
description Proteomics is the science of understanding the dynamic protein content of an organism's cells (its proteome), which is one of the largest current challenges in biology. Computational proteomics is an active research area that involves in-silico methods for the analysis of high-throughput protein identification data. Current methods are based on a technology called tandem mass spectrometry (MS/MS) and suffer from low coverage and accuracy, reliably identifying only 20-40% of the proteome. This dissertation addresses recall, precision, speed and scalability of computational proteomics experiments. This research goes beyond the traditional paradigm of analyzing MS/MS experiments in isolation, instead learning priors of protein presence from the joint analysis of various systems biology data sources. This integrative 'systems' approach to protein identification is very effective, as demonstrated by two new methods. The first, MSNet, introduces a social model for protein identification and leverages functional dependencies from genome-scale, probabilistic, gene functional networks. The second, MSPresso, learns a gene expression prior from a joint analysis of mRNA and proteomics experiments on similar samples. These two sources of prior information result in more accurate estimates of protein presence, and increase protein recall by as much as 30% in complex samples, while also increasing precision. A comprehensive suite of benchmarking datasets is introduced for evaluation in yeast. Methods to assess statistical significance in the absence of ground truth are also introduced and employed whenever applicable. This dissertation also describes a database indexing solution to improve speed and scalability of protein identification experiments. The method, MSFound, customizes a metric-space database index and its associated approximate k-nearest-neighbor search algorithm with a semi-metric distance designed to match noisy spectra. MSFound achieves an order of magnitude speedup over traditional spectra database searches while maintaining scalability.
author Ramakrishnan, Smriti Rajan
author_facet Ramakrishnan, Smriti Rajan
author_sort Ramakrishnan, Smriti Rajan
title A systems approach to computational protein identification
title_short A systems approach to computational protein identification
title_full A systems approach to computational protein identification
title_fullStr A systems approach to computational protein identification
title_full_unstemmed A systems approach to computational protein identification
title_sort systems approach to computational protein identification
publishDate 2010
url http://hdl.handle.net/2152/ETD-UT-2010-05-1036