Protein Loop Prediction by Fragment Assembly

Bibliographic Details
Main Author: Liu, Zhifeng
Format: Others
Language: en
Published: 2007
Online Access: http://hdl.handle.net/10012/2655
Description
Summary: If the primary sequence of a protein is known, what is its three-dimensional structure? This is one of the most challenging problems in molecular biology, and it has many applications in proteomics. The question has been researched extensively over the last three decades. Techniques such as the protein folding approach have shown promise in predicting the core regions of proteins, the α-helices and β-strands. However, loops, which contain no regular secondary structure elements, remain the most difficult regions to predict. The protein loop prediction problem is to predict the spatial structure of a loop given the primary sequence of a protein and the spatial structures of all its other regions. There are two major approaches to loop prediction: ab initio folding and database searching. Because loops are hypervariable, loop prediction accuracy remains unsatisfactory. The key contribution of this thesis is a novel fragment assembly algorithm that uses branch-and-cut to tackle the loop prediction problem. We present various pruning rules that reduce the search space and speed up the search for good loop candidates. The algorithm retains the advantages of the database-search approach and ensures that the predicted loops are physically reasonable. It also benefits from the ab initio folding approach, since it enumerates all possible loops in a discrete approximation of the conformation space. We implemented the proposed algorithm as a protein loop prediction tool named LoopLocker. A test set from CASP6, the worldwide protein structure prediction competition, was used to evaluate the performance of LoopLocker. Experimental results showed that LoopLocker can predict loops of 4, 8, 11-12 and 13-15 residues with average RMSD errors of 0.452, 1.410, 1.741 and 1.895 Å, respectively. In the PDB, more than 90% of loops have fewer than 15 residues, so the tested lengths cover the range where most loops fall. These results indicate that our fragment assembly algorithm is successful in tackling the loop prediction problem.
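The summary describes the method only at a high level. As a rough illustration of its two core ideas, enumerating loops in a discrete conformation space while pruning the search, and scoring candidates by RMSD, here is a minimal Python sketch under strong simplifying assumptions. Each residue is reduced to a single C-alpha point. A "fragment" is a hypothetical list of fixed displacement vectors in the global frame, whereas real fragments are placed in local backbone frames. The search is a plain depth-first enumeration with two illustrative pruning rules: a reachability bound and a steric clash check. This is not the thesis's branch-and-cut algorithm or its pruning rules, and the names `assemble`, `rmsd`, `CA_CA`, `tol` and `min_sep` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]
CA_CA = 3.8  # approximate C-alpha to C-alpha distance in angstroms (assumed step length)


def add(a: Vec, b: Vec) -> Vec:
    """Component-wise vector sum."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def rmsd(pred: Sequence[Vec], ref: Sequence[Vec]) -> float:
    """RMSD between matched C-alpha coordinates, with no superposition."""
    if not pred or len(pred) != len(ref):
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = sum(math.dist(p, r) ** 2 for p, r in zip(pred, ref))
    return math.sqrt(total / len(pred))


def assemble(start: Vec, end: Vec, n_res: int, fragments: List[List[Vec]],
             tol: float = 1.0, min_sep: float = 3.0) -> List[List[Vec]]:
    """Enumerate n_res-residue loops from `start` whose last residue lands
    within `tol` of `end`, built by chaining hypothetical fragments
    (lists of displacement vectors in a fixed global frame)."""
    results: List[List[Vec]] = []

    def search(path: List[Vec]) -> None:
        placed = len(path) - 1  # residues placed after the start anchor
        if placed == n_res:
            if math.dist(path[-1], end) <= tol:
                results.append(path[1:])
            return
        for frag in fragments:
            if not frag or placed + len(frag) > n_res:
                continue  # skip empty fragments and ones that overshoot the loop length
            new = list(path)
            clash = False
            for step in frag:
                nxt = add(new[-1], step)
                # Steric rule: nxt may not come within min_sep of any
                # non-adjacent point already in the chain.
                if any(math.dist(nxt, p) < min_sep for p in new[:-1]):
                    clash = True
                    break
                new.append(nxt)
            if clash:
                continue
            remaining = n_res - (len(new) - 1)
            # Reachability rule: if no step is longer than CA_CA, the chain end
            # cannot move more than remaining * CA_CA closer to `end`.
            if math.dist(new[-1], end) > remaining * CA_CA + tol:
                continue
            search(new)

    search([start])
    return results
```

For example, with six axis-aligned 3.8 Å single-step fragments, `assemble((0.0, 0.0, 0.0), (7.6, 0.0, 0.0), 2, frags)` returns only the straight two-residue loop, because every other branch is cut by the reachability bound or the clash check. The RMSD is computed directly in the frame of the fixed anchors without superposition. This is an assumption that fits the problem setting, where the rest of the structure is already given.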