Prediction and classification of aminoacyl tRNA synthetases using PROSITE domains

<p>Abstract</p> <p>Background</p> <p>Aminoacyl tRNA synthetases (aaRSs) catalyse the first step of protein synthesis in all organisms. They are responsible for the precise attachment of amino acids to their cognate transfer RNAs. There are twenty different types of aaRS...


Bibliographic Details
Main Authors: Panwar Bharat, Raghava Gajendra PS
Format: Article
Language: English
Published: BMC 2010-09-01
Series: BMC Genomics
Online Access: http://www.biomedcentral.com/1471-2164/11/507
id doaj-d85cad636ed24b128dd3bb1279f6fb64
record_format Article
spelling doaj-d85cad636ed24b128dd3bb1279f6fb64; 2020-11-24T22:02:58Z; eng; BMC; BMC Genomics; 1471-2164; 2010-09-01; 11; 1; 507; 10.1186/1471-2164-11-507
collection DOAJ
sources DOAJ
issn 1471-2164
description <p>Abstract</p> <p>Background</p> <p>Aminoacyl tRNA synthetases (aaRSs) catalyse the first step of protein synthesis in all organisms. They are responsible for the precise attachment of amino acids to their cognate transfer RNAs. There are twenty different types of aaRS, one for each amino acid. These aaRSs have been divided into two classes, each comprising ten enzymes. It is important to predict and classify aaRSs in order to understand protein synthesis.</p> <p>Results</p> <p>In this study, all models were developed on a non-redundant dataset containing 117 aaRSs and an equal number of non-aaRSs, in which no two sequences have more than 30% similarity. First, we applied the similarity search technique BLAST and achieved a maximum accuracy of 67.52%. We observed that 62% of tRNA synthetases contain one or more of the following four PROSITE domains: PS50862, PS00178, PS50860 and PS50861. An SVM-based model was developed to discriminate between aaRSs and non-aaRSs; using selected dipeptide composition, it achieved a maximum MCC of 0.68 with an accuracy of 83.73%. We then developed a hybrid approach, in which the SVM model uses selected dipeptide composition together with information on the four PROSITE domains, and achieved a maximum MCC of 0.72 with an accuracy of 85.49%. We further developed an SVM-based model for classifying the aaRSs into class-1 and class-2 using selected dipeptide composition and achieved an MCC of 0.79. We also observed that two domains (PS00178, PS50889) were preferred in class-1 and three domains (PS50862, PS50860, PS50861) in class-2. A hybrid method was developed using these domains as descriptors, along with selected dipeptide composition, and achieved an MCC of 0.87 with a sensitivity of 94.55% and an accuracy of 93.19%. All models were evaluated using a five-fold cross-validation technique.</p> <p>Conclusions</p> <p>We have analyzed protein sequences of aaRSs (class-1 and class-2) and non-aaRSs and identified interesting patterns. The high accuracy achieved by our SVM models using selected dipeptide composition demonstrates that certain types of dipeptide are preferred in aaRSs. We were able to identify PROSITE domains that are preferred in aaRSs and their classes, providing interesting insights into tRNA synthetases. The method developed in this study will be useful for researchers studying aaRS enzymes and tRNA biology. The web server based on this study is available at <url>http://www.imtech.res.in/raghava/icaars/</url>.</p>
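The abstract names three quantities without defining them: dipeptide composition, a hybrid descriptor that adds PROSITE domain information, and the Matthews correlation coefficient (MCC). The sketch below illustrates one plausible reading of each and is not the authors' implementation. It assumes composition is expressed as a percentage over all 400 dipeptides (the study uses a selected subset, which is not listed here), that domain information is encoded as one binary flag per domain, and that the domain hits are supplied by an external PROSITE scan. The function names `dipeptide_composition`, `hybrid_features` and `mcc` are hypothetical.

```python
from itertools import product
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered pairs of the 20 standard residues.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]

# The four domains the abstract lists for aaRS vs. non-aaRS prediction.
DOMAINS = ("PS50862", "PS00178", "PS50860", "PS50861")


def dipeptide_composition(sequence):
    """Return the percentage of each dipeptide among all overlapping pairs.

    Assumption: pairs containing non-standard residues are skipped.
    """
    seq = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [100.0 * counts[d] / total for d in DIPEPTIDES]


def hybrid_features(sequence, domains_found):
    """Append one 0/1 flag per PROSITE domain to the composition vector.

    Assumption: `domains_found` is a collection of PROSITE accessions
    reported for this sequence by a separate domain scan.
    """
    flags = [1.0 if d in domains_found else 0.0 for d in DOMAINS]
    return dipeptide_composition(sequence) + flags


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when a row or column is empty
    return (tp * tn - fp * fn) / denom
```

A vector produced by `hybrid_features` could then be passed to any SVM implementation and scored with `mcc` on each of the five cross-validation folds.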