Summary: | Molecular dynamics (MD) simulations enable the prediction of structure and function of biomacromolecules. For example, the activity of small drug molecules can be ranked by molecular docking thereby reducing the number of candidates. To date, several popular empirical force fields are available for MD simulations, such as CHARMM, AMBER, OPLS, and GROMOS. These force fields are commonly used but limited to relatively low accuracy in their predictions as all these force fields compute the atom-atom electrostatic interaction energy using point charges. It is well known that the non-bonded interaction energy, in particular the electrostatic energy, plays a crucial role in determing the structure of proteins, including their docking regions. Therefore, the electrostatic energy is one of the most important parts to be correctly predicted in any force field. As a result, improvements in a force field are critical to obtain more reliable protein simulations. In our work, a novel force field based on Quantum Chemical Topology (QCT) has been actively developed in order to increase the accuracy of the simulations. Currently this force field is referred to as QCTFF. QCTFF calculates the atom-atom electrostatic energy based on the atomic multipole moments instead of point charges. The smallest natural protein, crambin, is used as a pilot system to compare the multipole moment electrostatic interaction energy with the point charge energy. The atom-atom electrostatic interaction energy calculated based on atomic multipole moments is remarkably different from the values calculated by point charges. There are in total fifteen possible atom-atom interactions, including HH, HC, HO, HN, HS, CC, CO, CN, CS, OO, ON, OS, NN, NS, SS, typically occurring in proteins. In order to study the convergence of the multipole expansion behind QCTFF, the minimum internuclear distance in the convergent region for each of the fifteen atom-atom interactions are calculated. The levels of theory used are HF/6-31G(d,p), B3LYP/aug-cc-pVTZ and MP2/aug-cc-pVTZ. This study determines which multipole moment rank is 14 required to obtain the electrostatic energy between two atoms, at a given distance, to a given energy accuracy. In addition, we define an atom type for each element C, H, O, N and S. The element in question is taken as the central atom and a “horizon sphere” centred at its nucleus is systematically enlarged to find out at which radius the atomic multipole moments are no further influenced by atoms entering the growing sphere. All 20 natural amino acids (AAs), from which any protein can be built, are studied in the construction of QCTFF. The minimum energy geometries of each amino acid are optimised and computed at HF/6-31G(d,p), B3LYP/apc-1 and MP2/cc-pVDZ levels of theory. The number of minima of the 20 AAs increased with the size and the flexibility of the side-chain. The multipole moments of the atoms in the five-atom fragment [-NH-Cα-C(=O)]- varied according to the same trends in all three levels of theory. The properties, including bond length and bond angles, of the common seven-atom fragment [-NH-CαH(Cβ)-C(=O)-] (in all but proline and glycine) are investigated to probe the transferability between different AAs. A machine learning method called Kriging is applied to carry out the prediction of the atomic multipole moments in QCTFF. To build the Kriging models, the minima of serine are used to produce a sufficient number of geometries following the distortion procedure of both the normal mode and the redundant internal coordinate approaches. Ultimately, a thousand geometries are used in the training set to build Kriging models and the prediction reliability of the models is tested by the remaining thousand geometries in the test set. The same three levels of theory as before are applied to calculate the wave functions of the distorted geometries using GAUSSIAN03. Moreover, a large number of serine geometries are also extracted from proteins in the Protein Data Bank (PDB). The absolute average error of the total energy predicted by the Kriging models built from these geometries is slightly worse as the range of the geometries is much wider than the geometries produced by the two distortion approaches. Finally, a biologically relevant reaction is studied via the well-known peptide bond hydrolysis enzyme HIV-1 protease. Two types of reaction systems are studied: a small system and a large system. The object of focus is the hydrogen atom in the catalytic water molecule, for which the coordinates change most during the reaction process in both reaction systems. For the small reaction system, the intrinsic reaction coordinate (IRC) method is used to generate 295 different geometries. The Kriging models built from 180 of the 295 geometries can correctly predict both atomic multipole moments and the total interaction energies for the remaining geometries in the test set. For the large reaction system, instead of using the IRC method the motion of the hydrogen atom in question is 15 scanned during the reaction. Finally, 265 geometries are obtained and 180 of these geometries are used to build the Kriging models. The absolute errors of the multipole moments in the test set are all very small.
|