A systems approach to computational protein identification

Proteomics is the science of understanding the dynamic protein content of an organism's cells (its proteome), which is one of the largest current challenges in biology. Computational proteomics is an active research area that involves in-silico methods for the analysis of high-throughput protei...

Bibliographic Details
Main Author:	Ramakrishnan, Smriti Rajan
Format:	Others
Language:	English
Published:	2010
Subjects:	Computational biology Bioinformatics Integrative statistical data analysis Computational proteomics Systems biology Database indexing
Online Access:	http://hdl.handle.net/2152/ETD-UT-2010-05-1036

id	ndltd-UTEXAS-oai-repositories.lib.utexas.edu-2152-ETD-UT-2010-05-1036
record_format	oai_dc
spelling	ndltd-UTEXAS-oai-repositories.lib.utexas.edu-2152-ETD-UT-2010-05-1036 2015-09-20T16:55:09Z A systems approach to computational protein identification Ramakrishnan, Smriti Rajan Computational biology Bioinformatics Integrative statistical data analysis Computational proteomics Systems biology Database indexing Proteomics is the science of understanding the dynamic protein content of an organism's cells (its proteome), which is one of the largest current challenges in biology. Computational proteomics is an active research area that involves in-silico methods for the analysis of high-throughput protein identification data. Current methods are based on a technology called tandem mass spectrometry (MS/MS) and suffer from low coverage and accuracy, reliably identifying only 20-40% of the proteome. This dissertation addresses recall, precision, speed and scalability of computational proteomics experiments. This research goes beyond the traditional paradigm of analyzing MS/MS experiments in isolation, instead learning priors of protein presence from the joint analysis of various systems biology data sources. This integrative 'systems' approach to protein identification is very effective, as demonstrated by two new methods. The first, MSNet, introduces a social model for protein identification and leverages functional dependencies from genome-scale, probabilistic, gene functional networks. The second, MSPresso, learns a gene expression prior from a joint analysis of mRNA and proteomics experiments on similar samples. These two sources of prior information result in more accurate estimates of protein presence, and increase protein recall by as much as 30% in complex samples, while also increasing precision. A comprehensive suite of benchmarking datasets is introduced for evaluation in yeast. Methods to assess statistical significance in the absence of ground truth are also introduced and employed whenever applicable. This dissertation also describes a database indexing solution to improve speed and scalability of protein identification experiments. The method, MSFound, customizes a metric-space database index and its associated approximate k-nearest-neighbor search algorithm with a semi-metric distance designed to match noisy spectra. MSFound achieves an order of magnitude speedup over traditional spectra database searches while maintaining scalability. text 2010-10-21T19:59:40Z 2010-10-21T19:59:48Z 2010-10-21T19:59:40Z 2010-10-21T19:59:48Z 2010-05 2010-10-21 May 2010 2010-10-21T19:59:48Z thesis application/pdf http://hdl.handle.net/2152/ETD-UT-2010-05-1036 eng
collection	NDLTD
language	English
format	Others
sources	NDLTD
topic	Computational biology Bioinformatics Integrative statistical data analysis Computational proteomics Systems biology Database indexing
spellingShingle	Computational biology Bioinformatics Integrative statistical data analysis Computational proteomics Systems biology Database indexing Ramakrishnan, Smriti Rajan A systems approach to computational protein identification
description	Proteomics is the science of understanding the dynamic protein content of an organism's cells (its proteome), which is one of the largest current challenges in biology. Computational proteomics is an active research area that involves in-silico methods for the analysis of high-throughput protein identification data. Current methods are based on a technology called tandem mass spectrometry (MS/MS) and suffer from low coverage and accuracy, reliably identifying only 20-40% of the proteome. This dissertation addresses recall, precision, speed and scalability of computational proteomics experiments. This research goes beyond the traditional paradigm of analyzing MS/MS experiments in isolation, instead learning priors of protein presence from the joint analysis of various systems biology data sources. This integrative 'systems' approach to protein identification is very effective, as demonstrated by two new methods. The first, MSNet, introduces a social model for protein identification and leverages functional dependencies from genome-scale, probabilistic, gene functional networks. The second, MSPresso, learns a gene expression prior from a joint analysis of mRNA and proteomics experiments on similar samples. These two sources of prior information result in more accurate estimates of protein presence, and increase protein recall by as much as 30% in complex samples, while also increasing precision. A comprehensive suite of benchmarking datasets is introduced for evaluation in yeast. Methods to assess statistical significance in the absence of ground truth are also introduced and employed whenever applicable. This dissertation also describes a database indexing solution to improve speed and scalability of protein identification experiments. The method, MSFound, customizes a metric-space database index and its associated approximate k-nearest-neighbor search algorithm with a semi-metric distance designed to match noisy spectra. MSFound achieves an order of magnitude speedup over traditional spectra database searches while maintaining scalability.
author	Ramakrishnan, Smriti Rajan
author_facet	Ramakrishnan, Smriti Rajan
author_sort	Ramakrishnan, Smriti Rajan
title	A systems approach to computational protein identification
title_short	A systems approach to computational protein identification
title_full	A systems approach to computational protein identification
title_fullStr	A systems approach to computational protein identification
title_full_unstemmed	A systems approach to computational protein identification
title_sort	systems approach to computational protein identification
publishDate	2010
url	http://hdl.handle.net/2152/ETD-UT-2010-05-1036
work_keys_str_mv	AT ramakrishnansmritirajan asystemsapproachtocomputationalproteinidentification AT ramakrishnansmritirajan systemsapproachtocomputationalproteinidentification
_version_	1716820955553071104

Similar Items