Protein structural similarity search by Ramachandran codes

Abstract. Background: Protein structural data has increased exponentially, such that fast and accurate tools are needed for structure similarity searches. To improve the search speed, several methods have been designed to reduce three-dimensional p...

Bibliographic Details
Main Authors:	Chang Chih-Hung, Huang Po-Jung, Lo Wei-Cheng, Lyu Ping-Chiang
Format:	Article
Language:	English
Published:	BMC 2007-08-01
Series:	BMC Bioinformatics
Online Access:	http://www.biomedcentral.com/1471-2105/8/307

id	doaj-1a9e3203bb6a40d1aa758674187c3785
record_format	Article
spelling	doaj-1a9e3203bb6a40d1aa758674187c3785 2020-11-25T00:37:40Z eng BMC BMC Bioinformatics 1471-2105 2007-08-01 8 1 307 10.1186/1471-2105-8-307 Protein structural similarity search by Ramachandran codes Chang Chih-Hung, Huang Po-Jung, Lo Wei-Cheng, Lyu Ping-Chiang http://www.biomedcentral.com/1471-2105/8/307
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Chang Chih-Hung, Huang Po-Jung, Lo Wei-Cheng, Lyu Ping-Chiang
title	Protein structural similarity search by Ramachandran codes
publisher	BMC
series	BMC Bioinformatics
issn	1471-2105
publishDate	2007-08-01
description	Abstract
Background: Protein structural data has grown exponentially, so fast and accurate tools are needed for structure similarity searches. To improve search speed, several methods have been designed to reduce three-dimensional protein structures to one-dimensional text strings that are then analyzed by traditional sequence alignment methods; however, accuracy is usually sacrificed, and the speed still cannot match that of sequence similarity search tools. Here, we aimed to improve the linear encoding methodology and develop efficient search tools that can rapidly retrieve structural homologs from large protein databases.
Results: We propose a new linear encoding method, SARST (Structural similarity search Aided by Ramachandran Sequential Transformation). SARST transforms protein structures into text strings through a Ramachandran map organized by nearest-neighbor clustering and uses a regenerative approach to produce substitution matrices. Classical sequence similarity search methods can then be applied to the structural similarity search. Its accuracy is similar to that of Combinatorial Extension (CE), and it runs over 243,000 times faster, searching 34,000 proteins in 0.34 sec with a 3.2-GHz CPU. SARST provides statistically meaningful expectation values to assess the retrieved information. It has been implemented as a web service and a stand-alone Java program that can run on many different platforms.
Conclusion: As a database search method, SARST can rapidly distinguish high from low similarities and efficiently retrieve homologous structures. It demonstrates that the easily accessible linear encoding methodology has the potential to serve as a foundation for efficient protein structural similarity search tools. These search tools should be applicable to automated, high-throughput functional annotation or prediction for the ever-increasing number of published protein structures in this post-genomic era.
url	http://www.biomedcentral.com/1471-2105/8/307
_version_	1725300033138458624
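
The abstract describes reducing a three-dimensional structure to a one-dimensional text string by way of a Ramachandran map. The sketch below illustrates only that general idea and is not the SARST implementation: it swaps the nearest-neighbor clustering named in the abstract for a plain uniform grid, and the cell width, alphabet, function names, and sample angles are all hypothetical choices made for illustration.

```python
import string

BIN_DEG = 90                      # hypothetical cell width; not taken from the paper
BINS_PER_AXIS = 360 // BIN_DEG    # 4 bins per axis, 16 cells in total
ALPHABET = string.ascii_uppercase[:BINS_PER_AXIS ** 2]  # 'A'..'P', one letter per cell


def cell_index(angle_deg):
    """Map an angle in degrees to a bin index in [0, BINS_PER_AXIS)."""
    # Shift [-180, 180) to [0, 360); the modulo also wraps +180 onto -180.
    return int(((angle_deg + 180.0) % 360.0) // BIN_DEG)


def encode_torsions(phi_psi_pairs):
    """Turn a sequence of (phi, psi) pairs into a one-dimensional text string."""
    letters = []
    for phi, psi in phi_psi_pairs:
        row = cell_index(psi)     # psi selects the grid row
        col = cell_index(phi)     # phi selects the grid column
        letters.append(ALPHABET[row * BINS_PER_AXIS + col])
    return "".join(letters)


# Illustrative angles only: three helix-like residues, then two strand-like ones.
backbone = [(-60, -45), (-62, -41), (-58, -47), (-120, 130), (-125, 135)]
encoded = encode_torsions(backbone)
print(encoded)
```

Residues with similar backbone conformations receive the same letter, so strings produced this way could in principle be compared with ordinary sequence alignment tools. A uniform grid ignores how unevenly real proteins populate the map, which is presumably one reason the authors organize the map by clustering instead.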

Similar Items