Distributed Approach for Peptide Identification

A crucial step in protein identification is peptide identification. The Peptide Spectrum Match (PSM) data set is enormous, so processing it on a single machine is time-consuming. PSMs are scored by a cross-correlation, a statistical score, or a probability that the match between the...

Bibliographic Details
Main Author: Vedanbhatla, Naga V K Abhinav
Format: Others
Published: TopSCHOLAR® 2015
Subjects: C-Ranker; machine learning; Analytical Chemistry; Computer Engineering; Computer Sciences
Online Access: http://digitalcommons.wku.edu/theses/1546
http://digitalcommons.wku.edu/cgi/viewcontent.cgi?article=2550&context=theses
id ndltd-WKU-oai-digitalcommons.wku.edu-theses-2550
record_format oai_dc
spelling ndltd-WKU-oai-digitalcommons.wku.edu-theses-2550 2015-12-12T04:56:35Z 2015-10-01T07:00:00Z text application/pdf Masters Theses & Specialist Projects
collection NDLTD
format Others
sources NDLTD
topic C-Ranker
machine learning
Analytical Chemistry
Computer Engineering
Computer Sciences
description A crucial step in protein identification is peptide identification. The Peptide Spectrum Match (PSM) data set is enormous, so processing it on a single machine is time-consuming. PSMs are scored by a cross-correlation, a statistical score, or a probability that the match between the experimental and theoretical spectra is correct and unique. This scoring takes a long time to execute, so there is demand for better performance when handling large peptide data sets. An appropriate distributed framework is expected to reduce the processing time. The framework designed here uses a peptide-processing algorithm named C-Ranker, which takes peptide data as input and identifies the correct PSMs. The framework has two steps: execute the C-Ranker algorithm on servers specified by the user, then compare the correct PSMs generated by the distributed approach with those from the normal execution of C-Ranker. The objective of the framework is to process large peptide data sets using a distributed approach. Because the solution calls for parallel execution, the framework was implemented in Java. The results show that distributed C-Ranker executes in less time than the conventional centralized C-Ranker application: overall execution time is reduced by around 66.67%. Average memory usage is also lower when the distributed system runs C-Ranker on multiple servers. A significant benefit that may be overlooked is that distributed C-Ranker can solve extraordinarily large problems without the expense of a powerful computer or a supercomputer. The approach is also compared with an Apache Hadoop framework for peptide identification with respect to cost, execution time, and flexibility.
author Vedanbhatla, Naga V K Abhinav
title Distributed Approach for Peptide Identification
publisher TopSCHOLAR®
publishDate 2015
url http://digitalcommons.wku.edu/theses/1546
http://digitalcommons.wku.edu/cgi/viewcontent.cgi?article=2550&context=theses
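
The two steps named in the description (run the ranking on several workers, then compare the result with a normal single-machine run) can be illustrated with a minimal Java sketch. Everything in it is an assumption made for illustration and is not taken from the thesis: the class and method names are hypothetical, `rankChunk` is a simple score-threshold filter standing in for C-Ranker, and a local thread pool stands in for the user-specified servers. The timings passed to `reductionPercent` are made-up numbers that only show the arithmetic behind a reduction of about 66.67%, which corresponds to the distributed run taking one third of the centralized time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class DistributedRankerSketch {

    // Hypothetical PSM record: an identifier plus a match score.
    static final class Psm {
        final String id;
        final double score;
        Psm(String id, double score) { this.id = id; this.score = score; }
    }

    // Placeholder for the ranking step (NOT C-Ranker): keep PSMs at or above a threshold.
    static List<Psm> rankChunk(List<Psm> chunk, double threshold) {
        List<Psm> accepted = new ArrayList<>();
        for (Psm p : chunk) {
            if (p.score >= threshold) accepted.add(p);
        }
        return accepted;
    }

    // Split the input into at most `parts` contiguous chunks of near-equal size.
    static <T> List<List<T>> partition(List<T> items, int parts) {
        List<List<T>> chunks = new ArrayList<>();
        int size = (items.size() + parts - 1) / parts; // ceiling division
        for (int start = 0; start < items.size(); start += size) {
            chunks.add(items.subList(start, Math.min(start + size, items.size())));
        }
        return chunks;
    }

    // Step 1: rank each chunk on its own worker thread, then merge in input order.
    static List<Psm> rankDistributed(List<Psm> psms, int workers, double threshold) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Callable<List<Psm>>> tasks = new ArrayList<>();
            for (List<Psm> chunk : partition(psms, workers)) {
                tasks.add(() -> rankChunk(chunk, threshold));
            }
            List<Psm> merged = new ArrayList<>();
            // invokeAll returns futures in the same order as the submitted tasks.
            for (Future<List<Psm>> f : pool.invokeAll(tasks)) {
                merged.addAll(f.get());
            }
            return merged;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("worker failed", e);
        } finally {
            pool.shutdown();
        }
    }

    // Percentage reduction in execution time relative to the centralized run.
    static double reductionPercent(double centralizedTime, double distributedTime) {
        return 100.0 * (centralizedTime - distributedTime) / centralizedTime;
    }

    public static void main(String[] args) {
        List<Psm> psms = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            psms.add(new Psm("psm" + i, i / 10.0));
        }
        // Step 2: compare the distributed result with the single-machine result.
        // Both lists hold the same Psm objects, so List.equals compares them directly.
        List<Psm> centralized = rankChunk(psms, 0.5);
        List<Psm> distributed = rankDistributed(psms, 3, 0.5);
        System.out.println("results match: " + centralized.equals(distributed));
        // Illustrative timings only (90 s vs 30 s), not measurements from the thesis.
        System.out.printf("reduction: %.2f%%%n", reductionPercent(90.0, 30.0));
    }
}
```

In a real deployment each chunk would be sent to a remote server rather than a local thread, so network transfer and result collection would add cost that this sketch does not model.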