Summary: | <p>Abstract</p> <p>Background</p> <p>In this study we investigated the predictability of three thermodynamic quantities related to complex formation. As a model system we chose the host-guest complexes of <it>β</it>-cyclodextrin (<it>β</it>-CD) with different guest molecules. A training dataset comprising 176 <it>β</it>-CD guest molecules with experimentally determined thermodynamic quantities was taken from the literature. We compared the performance of three statistical regression methods – principal component regression (PCR), partial least squares regression (PLSR), and support vector machine regression combined with forward feature selection (SVMR/FFS) – with respect to their ability to generate predictive quantitative structure-property relationship (QSPR) models for Δ<it>G°</it>, Δ<it>H°</it> and Δ<it>S°</it> on the basis of computed molecular descriptors.</p> <p>Results</p> <p>We found that SVMR/FFS marginally outperforms PLSR and PCR in the prediction of Δ<it>G°</it>, with PLSR performing slightly better than PCR. PLSR and PCR proved to be more stable than SVMR/FFS in a nested cross-validation protocol. Whereas Δ<it>G°</it> can be predicted in good agreement with experimental values, none of the methods led to comparably good predictive models for Δ<it>H°</it>. With the methods used in this study, Δ<it>S°</it> appears almost unpredictable. To understand why the quantities differ in how easily they can be predicted, we performed a detailed analysis. It shows that free energies are less sensitive than enthalpies or entropies to small structural variations of the guest molecules. This property, together with the lower sensitivity of Δ<it>G°</it> to experimental conditions, is a possible explanation for its greater predictability.</p> <p>Conclusion</p> <p>This study shows that the ease of predicting Δ<it>G°</it> cannot be explained by the predictability of either Δ<it>H°</it> or Δ<it>S°</it>. 
Our analysis suggests that the poor predictability of <it>T</it>Δ<it>S°</it> and, to a lesser extent, Δ<it>H°</it> stems mainly from a stronger dependence of these quantities on the structural details of the complex and only to a lesser extent from experimental error.</p>
|