Summary: | Master's === National Taiwan University === Graduate Institute of Engineering Science and Ocean Engineering === 99 === This study proposes MIXScore, a novel scoring function that improves the prediction of protein-ligand binding affinities. Predicting binding affinity is an important problem in structure-based drug discovery and design. Scoring functions are typically classified into three groups: force-field, knowledge-based, and empirical.
Traditional validation methods such as 5-fold cross validation and leave-one-out (LOO) cross validation do not suffer from the over-fitting problem, but their assessments may be too optimistic because complexes from the same protein family may appear in the training set and the testing set at the same time. Kramer and Gedeck therefore proposed a method called leave-cluster-out (LCO) cross validation and recommended it as a way to avoid this overoptimistic bias.
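The idea behind LCO can be illustrated with a minimal sketch. It assumes each complex already carries a cluster (protein family) label; the function name `leave_cluster_out_splits` and the labels below are hypothetical and are not taken from PDBbind or from Kramer and Gedeck's implementation. Each fold holds out one whole cluster, so no family appears in both the training and the testing set.

```python
from collections import defaultdict


def leave_cluster_out_splits(cluster_labels):
    """Yield (cluster, train_idx, test_idx), holding out one whole cluster per fold."""
    members = defaultdict(list)
    for idx, label in enumerate(cluster_labels):
        members[label].append(idx)
    for label in sorted(members):
        test_idx = members[label]
        held_out = set(test_idx)
        # Every complex outside the held-out cluster goes to the training set.
        train_idx = [i for i in range(len(cluster_labels)) if i not in held_out]
        yield label, train_idx, test_idx


# Hypothetical family labels for six complexes (illustrative only).
clusters = ["kinase", "protease", "kinase", "other", "protease", "kinase"]
folds = list(leave_cluster_out_splits(clusters))

for label, train_idx, test_idx in folds:
    print(label, "train:", train_idx, "test:", test_idx)
```

In contrast, a random 5-fold split of the same six complexes could place two kinase complexes on opposite sides of the split, which is the source of the optimistic bias described above.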
We combine hybridized-orbital atom-type pair descriptors and X-CSCORE descriptors, which come from the knowledge-based and empirical fields, into a single feature vector of 210 descriptors in total. Random forest regression is applied to build the prediction model. The performance of MIXScore is evaluated using PDBbind07 and PDBbind09 as benchmarks and is compared with several existing scoring functions. PDBbind07 is used for the independent test and PDBbind09 is used for LCO cross validation.
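As a rough sketch of this pipeline, the two descriptor blocks can be concatenated and passed to a random forest regressor. The example assumes scikit-learn and NumPy are available; the data are synthetic, and the 150/60 split between the two descriptor blocks, the number of trees, and the train/test sizes are placeholders chosen for illustration, since only the total of 210 descriptors is stated here.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

n_complexes = 60
n_pair = 150  # placeholder count of atom-type pair descriptors (assumption)
n_emp = 60    # placeholder count of X-CSCORE descriptors (assumption)

# Synthetic stand-ins for the two descriptor blocks and the binding affinities.
pair_desc = rng.random((n_complexes, n_pair))
emp_desc = rng.random((n_complexes, n_emp))
y = rng.normal(6.0, 2.0, size=n_complexes)

# Concatenate both blocks into one 210-dimensional feature vector per complex.
X = np.hstack([pair_desc, emp_desc])

# Fit on the first 45 complexes and predict the remaining 15.
model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(X[:45], y[:45])
pred = model.predict(X[45:])

# Per-descriptor importances as estimated by the forest.
importances = model.feature_importances_
print(X.shape, pred.shape, importances.shape)
```

The `feature_importances_` attribute is one common way to rank descriptors in a random forest; whether MIXScore uses this particular measure is not stated in the abstract.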
The independent test shows that MIXScore is better than RF-Score published in 2010 (RMSE = 1.98 kcal/mol and R2 = 0.691). In LCO cross validation, although the similarities between the training and testing sets are excluded, MIXScore still provides stable predictive ability and outperforms both RF-Score and the method proposed by Kramer and Gedeck. These results show that MIXScore is a competitive scoring function. MIXScore may also have good external predictability, as the modified R2 (Rm2) is 0.530 in the independent test, which exceeds the threshold of 0.5.
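The three metrics quoted above can be computed as in the following sketch. It makes two assumptions that are not confirmed by the abstract: R2 is taken as the squared Pearson correlation between observed and predicted affinities, and Rm2 follows the commonly used definition Rm2 = R2 * (1 - sqrt(|R2 - R02|)), where R02 comes from a regression of observed on predicted values forced through the origin. The function names and the sample values are hypothetical.

```python
import math


def rmse(obs, pred):
    """Root-mean-square error between observed and predicted values."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def pearson_r2(obs, pred):
    """Squared Pearson correlation coefficient."""
    mo, mp = sum(obs) / len(obs), sum(pred) / len(pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_p = sum((p - mp) ** 2 for p in pred)
    return cov * cov / (var_o * var_p)


def modified_r2(obs, pred):
    """Rm2 = R2 * (1 - sqrt(|R2 - R02|)), with R02 from a zero-intercept fit."""
    r2 = pearson_r2(obs, pred)
    # Slope of the regression of observed on predicted through the origin.
    k = sum(o * p for o, p in zip(obs, pred)) / sum(p * p for p in pred)
    mo = sum(obs) / len(obs)
    ss_res = sum((o - k * p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mo) ** 2 for o in obs)
    r02 = 1.0 - ss_res / ss_tot
    return r2 * (1.0 - math.sqrt(abs(r2 - r02)))


# Hypothetical observed and predicted affinities (illustrative only).
observed = [1.0, 2.0, 3.0]
predicted = [2.0, 3.0, 4.0]
print(rmse(observed, predicted), pearson_r2(observed, predicted),
      modified_r2(observed, predicted))
```

In this toy case the predictions are perfectly correlated with the observations but shifted by a constant, so R2 stays at 1 while Rm2 drops below 1. This is why Rm2 is used as a stricter check on external predictability.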
This study not only improves the prediction of binding affinities but also finds that the homogeneity of proteins in the PDBbind dataset causes an overoptimistic bias. The strongest outlier in PDBbind09 and the importance of each X-CSCORE descriptor are reported as well.
|