A multi-level nearest-neighbour algorithm for predicting protein secondary structure



Bibliographic Details
Main Author:	Lazar, Iustin
Format:	Others
Published:	1998
Online Access:	http://spectrum.library.concordia.ca/507/1/MQ39987.pdf
Citation:	Lazar, Iustin (1998) A multi-level nearest-neighbour algorithm for predicting protein secondary structure. Master's thesis, Concordia University.

Description
Summary:	A thesis on machine learning and the prediction of protein secondary structure. We develop a variation of the nearest-neighbour algorithm that adopts a multi-level strategy together with a variable window size. The algorithm is applied to predicting the secondary structure of a protein from its primary structure: given a sequence of amino acids, it outputs a sequence of secondary-structure states (helix, sheet, or coil). We also develop a new training set that is orthogonal and covers the known classes of proteins. Overall accuracy is 65.0%, with 68.7% for helices, 66.3% for sheets, and 61.4% for coils. This compares well with existing methods: the best result reported for a single nearest-neighbour classifier is 65.1%, by Salzberg and Cost (1992). Our accuracy for sheets is better than that of known methods, but our accuracy for coils is much lower.
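The summary describes assigning one secondary-structure state to each residue based on a window of neighbouring amino acids. The Python sketch below illustrates only the basic single-level idea, under assumptions not taken from the thesis: a fixed window of 2*half+1 residues, a simple mismatch-count distance, majority voting over the k nearest stored windows, three-state labels written as H (helix), E (sheet), and C (coil), and per-class accuracy taken as the fraction of residues of each true class that are predicted correctly. It is not the thesis's multi-level, variable-window algorithm, whose details are not given here; all function names, the padding symbol, and the parameters are hypothetical.

```python
from collections import Counter

PAD = "-"  # hypothetical padding symbol for positions beyond either end of the sequence


def windows(seq: str, half: int) -> list[str]:
    """Return, for each residue, the window of 2*half+1 residues centred on it."""
    padded = PAD * half + seq + PAD * half
    return [padded[i:i + 2 * half + 1] for i in range(len(seq))]


def mismatch(a: str, b: str) -> int:
    """Assumed distance between two equal-length windows: number of differing positions."""
    return sum(x != y for x, y in zip(a, b))


def build_memory(train: list[tuple[str, str]], half: int) -> list[tuple[str, str]]:
    """Store every (window, structure label) pair from the training proteins."""
    memory = []
    for seq, ss in train:
        memory.extend(zip(windows(seq, half), ss))
    return memory


def predict(seq: str, memory: list[tuple[str, str]], half: int, k: int = 3) -> str:
    """Label each residue by majority vote over its k nearest stored windows (brute force)."""
    out = []
    for w in windows(seq, half):
        nearest = sorted(memory, key=lambda m: mismatch(w, m[0]))[:k]
        votes = Counter(label for _, label in nearest)
        out.append(votes.most_common(1)[0][0])
    return "".join(out)


def per_class_accuracy(pred: str, true: str) -> dict[str, float]:
    """Overall accuracy plus, per class, the fraction of true residues of that class predicted correctly."""
    scores = {"overall": sum(p == t for p, t in zip(pred, true)) / len(true)}
    for c in "HEC":
        idx = [i for i, t in enumerate(true) if t == c]
        if idx:  # skip classes absent from the true labels
            scores[c] = sum(pred[i] == c for i in idx) / len(idx)
    return scores
```

For example, `build_memory([("AAAA", "HHHH"), ("CCCC", "EEEE")], 1)` stores eight labelled windows, after which `predict("AAAA", memory, 1)` returns `"HHHH"`. The brute-force search compares every query window with every stored window, which is simple but slow for large training sets; the thesis's definition of per-class accuracy may also differ from the one assumed here.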

