Summary: | The objective of this study was to find a computational procedure for normalizing solubility data determined at various temperatures (e.g., 10–50 °C) to values at a “reference” temperature (e.g., 25 °C). A simple procedure was devised to predict enthalpies of solution, ΔHsol, from which the temperature dependence of intrinsic (uncharged form) solubility, log S0, could be calculated. As dependent variables, values of ΔHsol at 25 °C were subjected to multiple linear regression (MLR) analysis, using melting points (mp) and Abraham solvation descriptors as predictors. The enthalpy data were also subjected to random forest regression (RFR) and recursive partition tree (RPT) analyses. A total of 626 molecules were examined, drawing on 2040 published solubility values measured at various temperatures, along with 77 direct calorimetric measurements. All three prediction methods (RFR, RPT, MLR) indicated that the estimated standard deviations in the enthalpy data are 11–15 kJ mol⁻¹, which is concordant with the 10 kJ mol⁻¹ propagation error estimated from solubility measurements (assuming 0.05 log S errors), and consistent with the 7 kJ mol⁻¹ average reproducibility in enthalpy values from interlaboratory replicates. According to the MLR model, higher values of mp, H-bond acidity, polarizability/dipolarity, and dispersion forces relate to more positive (endothermic) enthalpy values. However, molecules that are large and have high H-bond basicity are likely to possess negative (exothermic) enthalpies of solution. With log S0 values normalized to 25 °C, the interlaboratory average standard deviations in solubility measurement were shown to be reduced to 0.06–0.17 log unit, with higher errors for the least-soluble druglike molecules. Such improvements in data mining are expected to contribute to more reliable in silico prediction models of solubility for use in drug discovery.
|
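The normalization step described in the summary can be illustrated with a minimal sketch. It assumes the integrated van't Hoff relation with ΔHsol treated as constant over the temperature interval; the summary does not spell out the exact equation used, so this is an illustration rather than the authors' procedure. The function name `normalize_log_s0` and its parameters are hypothetical.

```python
import math

# Gas constant in kJ mol^-1 K^-1, so that enthalpies can be given in kJ mol^-1.
R_KJ = 8.314462618e-3


def normalize_log_s0(log_s0, temp_c, dh_sol_kj, ref_c=25.0):
    """Shift log S0 measured at temp_c (deg C) to the reference temperature.

    Assumption: integrated van't Hoff relation with a temperature-independent
    enthalpy of solution dh_sol_kj (kJ mol^-1):
        log S0(T_ref) = log S0(T) + dH / (ln(10) * R) * (1/T - 1/T_ref)
    """
    t_kelvin = temp_c + 273.15
    t_ref_kelvin = ref_c + 273.15
    slope = dh_sol_kj / (math.log(10) * R_KJ)
    return log_s0 + slope * (1.0 / t_kelvin - 1.0 / t_ref_kelvin)


if __name__ == "__main__":
    # Illustrative values only: an endothermic compound measured at 37 deg C.
    print(round(normalize_log_s0(-3.0, 37.0, 30.0), 2))
```

For an endothermic compound (positive ΔHsol), a value measured above 25 °C is shifted downward and a value measured below 25 °C is shifted upward, which is the direction the normalization should take.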