A systems approach to computational protein identification
Proteomics is the science of understanding the dynamic protein content of an organism's cells (its proteome), which is one of the largest current challenges in biology. Computational proteomics is an active research area that involves in-silico methods for the analysis of high-throughput protei...
Main Author: | Ramakrishnan, Smriti Rajan |
---|---|
Format: | Others |
Language: | English |
Published: | 2010 |
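Subjects: | Computational biology; Bioinformatics; Integrative statistical data analysis; Computational proteomics; Systems biology; Database indexing |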
Online Access: | http://hdl.handle.net/2152/ETD-UT-2010-05-1036 |
id |
ndltd-UTEXAS-oai-repositories.lib.utexas.edu-2152-ETD-UT-2010-05-1036 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-UTEXAS-oai-repositories.lib.utexas.edu-2152-ETD-UT-2010-05-1036 2015-09-20T16:55:09Z A systems approach to computational protein identification Ramakrishnan, Smriti Rajan Computational biology Bioinformatics Integrative statistical data analysis Computational proteomics Systems biology Database indexing Proteomics is the science of understanding the dynamic protein content of an organism's cells (its proteome), which is one of the largest current challenges in biology. Computational proteomics is an active research area that involves in-silico methods for the analysis of high-throughput protein identification data. Current methods are based on a technology called tandem mass spectrometry (MS/MS) and suffer from low coverage and accuracy, reliably identifying only 20-40% of the proteome. This dissertation addresses recall, precision, speed and scalability of computational proteomics experiments. This research goes beyond the traditional paradigm of analyzing MS/MS experiments in isolation, instead learning priors of protein presence from the joint analysis of various systems biology data sources. This integrative 'systems' approach to protein identification is very effective, as demonstrated by two new methods. The first, MSNet, introduces a social model for protein identification and leverages functional dependencies from genome-scale, probabilistic, gene functional networks. The second, MSPresso, learns a gene expression prior from a joint analysis of mRNA and proteomics experiments on similar samples. These two sources of prior information result in more accurate estimates of protein presence, and increase protein recall by as much as 30% in complex samples, while also increasing precision. A comprehensive suite of benchmarking datasets is introduced for evaluation in yeast. Methods to assess statistical significance in the absence of ground truth are also introduced and employed whenever applicable.
This dissertation also describes a database indexing solution to improve speed and scalability of protein identification experiments. The method, MSFound, customizes a metric-space database index and its associated approximate k-nearest-neighbor search algorithm with a semi-metric distance designed to match noisy spectra. MSFound achieves an order of magnitude speedup over traditional spectra database searches while maintaining scalability. text 2010-10-21T19:59:40Z 2010-10-21T19:59:48Z 2010-10-21T19:59:40Z 2010-10-21T19:59:48Z 2010-05 2010-10-21 May 2010 2010-10-21T19:59:48Z thesis application/pdf http://hdl.handle.net/2152/ETD-UT-2010-05-1036 eng |
collection |
NDLTD |
language |
English |
format |
Others |
sources |
NDLTD |
topic |
Computational biology Bioinformatics Integrative statistical data analysis Computational proteomics Systems biology Database indexing |
spellingShingle |
Computational biology Bioinformatics Integrative statistical data analysis Computational proteomics Systems biology Database indexing Ramakrishnan, Smriti Rajan A systems approach to computational protein identification |
description |
Proteomics is the science of understanding the dynamic protein content of an organism's cells (its proteome), which is one of the largest current challenges in biology. Computational proteomics is an active research area that involves in-silico methods for the analysis of high-throughput protein identification data. Current methods are based on a technology called tandem mass spectrometry (MS/MS) and suffer from low coverage and accuracy, reliably identifying only 20-40% of the proteome. This dissertation addresses recall, precision, speed and scalability of computational proteomics experiments.
This research goes beyond the traditional paradigm of analyzing MS/MS experiments in isolation, instead learning priors of protein presence from the joint analysis of various systems biology data sources. This integrative 'systems' approach to protein identification is very effective, as demonstrated by two new methods. The first, MSNet, introduces a social model for protein identification and leverages functional dependencies from genome-scale, probabilistic, gene functional networks. The second, MSPresso, learns a gene expression prior from a joint analysis of mRNA and proteomics experiments on similar samples.
These two sources of prior information result in more accurate estimates of protein presence, and increase protein recall by as much as 30% in complex samples, while also increasing precision. A comprehensive suite of benchmarking datasets is introduced for evaluation in yeast. Methods to assess statistical significance in the absence of ground truth are also introduced and employed whenever applicable.
This dissertation also describes a database indexing solution to improve speed and scalability of protein identification experiments. The method, MSFound, customizes a metric-space database index and its associated approximate k-nearest-neighbor search algorithm with a semi-metric distance designed to match noisy spectra. MSFound achieves an order of magnitude speedup over traditional spectra database searches while maintaining scalability. |
author |
Ramakrishnan, Smriti Rajan |
author_facet |
Ramakrishnan, Smriti Rajan |
author_sort |
Ramakrishnan, Smriti Rajan |
title |
A systems approach to computational protein identification |
title_short |
A systems approach to computational protein identification |
title_full |
A systems approach to computational protein identification |
title_fullStr |
A systems approach to computational protein identification |
title_full_unstemmed |
A systems approach to computational protein identification |
title_sort |
systems approach to computational protein identification |
publishDate |
2010 |
url |
http://hdl.handle.net/2152/ETD-UT-2010-05-1036 |
work_keys_str_mv |
AT ramakrishnansmritirajan asystemsapproachtocomputationalproteinidentification AT ramakrishnansmritirajan systemsapproachtocomputationalproteinidentification |
_version_ |
1716820955553071104 |