Summary: | Doctoral === National Taiwan University === Graduate Institute of Computer Science and Information Engineering === 96 === Protein-DNA interactions play a key role in many genetic activities of living organisms, such as transcription, recombination, and DNA replication and repair. Finding binding pairs of proteins and DNA helps us understand the regulatory pathways of a cell, which is an important task of the post-genomic era. Experimental approaches for finding such pairs are usually expensive and time-consuming. We propose a computational approach called “3D-regulogs” that infers protein-DNA binding partners on a large scale by using the concept of the regulog together with the crystal structures of protein-DNA complexes as templates. The method also provides the binding model and the interacting amino acids and DNA bases of the predicted partners.
3D-regulogs uses a scoring method that combines the evolutionary conservation of DNA-contact residues with the preferences between interacting residues and nucleotides to evaluate protein-DNA binding partners. Applying this scoring method, we achieve high precision and recall for 66 families of DNA-binding domains, with a false positive rate below 5% on 250 non-DNA-binding proteins. We also obtain high accuracy in predicting the binding free energy of hotspot mutation sets. In tests of regulog mapping on multi-specific families, our method performs well at identifying proteins with distinct DNA-binding specificities.
To further enhance the interaction term of the scoring function, we propose a novel knowledge-based scoring matrix. Scores from this matrix correlate highly with the binding affinities of several test sets, including complexes extracted from PRONIT, the alanine-scanning set, and the base mutation set of zinc finger proteins. We also use the scoring method to scan promoter regions of the yeast HO gene and obtain potential transcription factor binding sites.
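As an illustration of how a knowledge-based residue-nucleotide matrix might be built and used to scan a promoter sequence, the sketch below derives log-odds scores from observed contact pairs and slides a window along a DNA string. It is a minimal sketch under stated assumptions, not the scoring function of this work: the log-odds form, the pseudocount, the one-residue-per-base contact model, and the names `build_log_odds_matrix` and `scan_sequence` are all hypothetical choices made for the example.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
BASES = "ACGT"


def build_log_odds_matrix(contacts, pseudocount=1.0):
    """Build a residue-by-base log-odds matrix (hypothetical formulation).

    contacts: iterable of (residue, base) pairs, e.g. taken from
    template protein-DNA complexes. Assumes one-letter residue codes
    and uppercase A/C/G/T bases.
    """
    pair_counts = Counter(contacts)
    n_cells = len(AMINO_ACIDS) * len(BASES)
    total = sum(pair_counts.values()) + pseudocount * n_cells

    # Marginal (smoothed) totals for each residue and each base.
    res_total = {a: sum(pair_counts[(a, b)] + pseudocount for b in BASES)
                 for a in AMINO_ACIDS}
    base_total = {b: sum(pair_counts[(a, b)] + pseudocount for a in AMINO_ACIDS)
                  for b in BASES}

    matrix = {}
    for a in AMINO_ACIDS:
        for b in BASES:
            observed = (pair_counts[(a, b)] + pseudocount) / total
            expected = (res_total[a] / total) * (base_total[b] / total)
            # Positive score: the pair is seen more often than chance.
            matrix[(a, b)] = math.log(observed / expected)
    return matrix


def scan_sequence(matrix, contact_residues, sequence):
    """Score every window of a DNA sequence against a contact pattern.

    contact_residues: one residue per site position (assumed model:
    residue i contacts base i of the window).
    Returns a list of (start_index, score) tuples.
    """
    width = len(contact_residues)
    scores = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(matrix[(r, b)] for r, b in zip(contact_residues, window))
        scores.append((start, score))
    return scores


# Toy contact data, invented purely for the example.
toy_contacts = [("R", "G")] * 10 + [("K", "G")] * 2 + [("R", "A")]
toy_matrix = build_log_odds_matrix(toy_contacts)
toy_scores = scan_sequence(toy_matrix, ["R", "R"], "ATGGC")
best_start, best_score = max(toy_scores, key=lambda s: s[1])
```

With the toy data, arginine-guanine contacts dominate, so the highest-scoring window is the one covering the two consecutive guanines. A real application would replace the toy pairs with contacts extracted from crystal structures and would map each residue to the bases it actually contacts in the template.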
|