In Pursuit of Local Correlation for Reduced-Scaling Electronic Structure Methods in Molecules and Periodic Solids



Bibliographic Details
Main Author: Clement, Marjory Carolena
Other Authors: Chemistry
Format: Others
Published: Virginia Tech 2021
Online Access: http://hdl.handle.net/10919/104588
Description
Summary: Over the course of the last century, electronic structure theory (or, alternatively, computational quantum chemistry) has grown from a fledgling field into a "full partner with experiment" [Goddard, Science 1985, 227 (4689), 917–923]. Instances of theory matching experiment to very high accuracy abound; one excellent example is the high-accuracy ab initio thermochemical data laid out in the 2004 work of Tajti and co-workers [Tajti et al., J. Chem. Phys. 2004, 121, 11599], and another is the set of heats of formation and molecular structures computed by Feller and co-workers in 2008 [Feller et al., J. Chem. Phys. 2008, 129, 204105]. But as the authors of both studies point out, this very high accuracy comes at a very high cost. In fact, electronic structure theory no longer suffers from an accuracy problem (as it did in its early days) but from a cost problem; more precisely, it suffers from an accuracy-to-cost ratio problem. We can compute electronic energies to nearly any precision we like, as long as we are willing to pay the associated cost. And just what are these high computational costs? For the purposes of this work, we are primarily concerned with how the computational cost of a given method scales with system size; for notational purposes, we will often introduce a parameter, $N$, that is proportional to the system size. Hartree-Fock, a one-body wavefunction-based method, formally scales as $N^4$, and post-Hartree-Fock methods fare even worse. The coupled cluster singles, doubles, and perturbative triples method [CCSD(T)], frequently referred to as the "gold standard" of quantum chemistry, scales as $N^7$, making it inapplicable to many systems of real-world import.
If highly accurate correlated wavefunction methods are to be applied to larger systems of interest, it is crucial that we reduce their computational scaling. One very successful means of doing so relies on the fact that electron correlation is fundamentally a local phenomenon, and the recognition of this fact has led to the development of numerous local implementations of conventional many-body methods. One such method, DLPNO-CCSD(T), was successfully used to calculate the energy of the protein crambin [Riplinger et al., J. Chem. Phys. 2013, 139, 134101]. In the following work, we discuss how the local nature of electron correlation can be exploited in both the occupied orbitals and the unoccupied (or virtual) orbitals. For the former, we highlight some of the historical developments in orbital localization before applying orbital localization robustly to infinite periodic crystalline systems [Clement et al., 2021, submitted to J. Chem. Theory Comput.]. For the latter, we discuss a number of ways in which the virtual space can be compressed before presenting our pioneering work on iteratively optimized pair natural orbitals ("iPNOs") [Clement et al., J. Chem. Theory Comput. 2018, 14 (9), 4581–4589]. With iPNOs, we recovered significant accuracy relative to traditional PNOs (which remain unchanged throughout a correlated calculation) at a comparable truncation level, indicating that the iPNOs are a better representation of the coupled cluster doubles amplitudes. For example, when studying the percent errors in the absolute correlation energies of a representative sample of weakly bound dimers chosen from the S66 test suite [Řezáč et al., J. Chem. Theory Comput. 2011, 7 (8), 2427–2438], we found that our iPNO-CCSD scheme outperformed the standard PNO-CCSD scheme at every truncation threshold ($t_{\mathrm{PNO}}$) studied. Both PNO-based methods were compared with canonical CCSD; on average, the iPNO-CCSD errors were 1.9 times smaller than the PNO-CCSD errors at $t_{\mathrm{PNO}} = 10^{-7}$ and more than an order of magnitude smaller for $t_{\mathrm{PNO}} < 10^{-10}$ [Clement et al., J. Chem. Theory Comput. 2018, 14 (9), 4581–4589]. When our improved PNOs are combined with the PNO-incompleteness correction proposed by Neese and co-workers [Neese et al., J. Chem. Phys. 2009, 130, 114108; Neese et al., J. Chem. Phys. 2009, 131, 064103], the improvement is larger still: for a truncation threshold of $t_{\mathrm{PNO}} = 10^{-6}$, the mean absolute error in binding energy over all 66 dimers of the S66 test set was 3 times smaller with incompleteness-corrected iPNO-CCSD than with incompleteness-corrected PNO-CCSD [Clement et al., J. Chem. Theory Comput. 2018, 14 (9), 4581–4589]. In the latter half of this work, we present our implementation of a limited-memory Broyden-Fletcher-Goldfarb-Shanno (BFGS)-based Pipek-Mezey Wannier function (PMWF) solver [Clement et al., 2021, submitted to J. Chem. Theory Comput.]. Although orbital localization in the linear combination of atomic orbitals (LCAO) representation of periodic crystalline solids is not new [Marzari et al., Rev. Mod. Phys. 2012, 84 (4), 1419–1475; Jónsson et al., J. Chem. Theory Comput. 2017, 13 (2), 460–474], to our knowledge, this is the first such implementation to be based on a BFGS solver.
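For readers unfamiliar with Pipek-Mezey localization, the following is a minimal molecular (finite, real-orbital) analogue of a BFGS-driven Pipek-Mezey solver; it is a sketch under stated assumptions, not the periodic PMWF implementation described above. It maximizes the PM functional P = sum over atoms A and orbitals i of (Q^A_ii)^2, built from Mulliken-type atomic population matrices, over orthogonal rotations U = exp(K) with K antisymmetric, by handing -P to SciPy's L-BFGS-B minimizer. SciPy's `maxcor` option sets how many correction pairs L-BFGS stores, i.e., the length of its history. All function names, the toy orthonormal AO basis, and the random starting orbitals are hypothetical, and gradients are left to SciPy's finite differences for brevity rather than derived analytically.

```python
import numpy as np
from scipy.linalg import expm
from scipy.optimize import minimize


def atomic_population_matrices(C, S, ao_atom):
    """Mulliken-type population matrices Q^A (nocc x nocc), one per atom A."""
    SC = S @ C
    mats = []
    for atom in np.unique(ao_atom):
        rows = ao_atom == atom
        QA = C[rows].T @ SC[rows]          # sum over AOs mu centered on atom A
        mats.append(0.5 * (QA + QA.T))     # symmetrize
    return np.array(mats)


def rotation(x, n):
    """Orthogonal U = exp(K - K^T), with the strict upper triangle of K given by x."""
    K = np.zeros((n, n))
    K[np.triu_indices(n, k=1)] = x
    return expm(K - K.T)


def pm_functional(U, Q):
    """Pipek-Mezey functional: sum_A sum_i [(U^T Q^A U)_ii]^2."""
    d = np.einsum("pi,apq,qi->ai", U, Q, U)
    return float(np.sum(d**2))


def localize_pm(C, S, ao_atom, history=5):
    """Maximize the PM functional with L-BFGS by minimizing its negative."""
    n = C.shape[1]
    Q = atomic_population_matrices(C, S, ao_atom)

    def neg_pm(x):
        return -pm_functional(rotation(x, n), Q)

    x0 = np.zeros(n * (n - 1) // 2)        # K = 0: start from the input orbitals
    res = minimize(neg_pm, x0, method="L-BFGS-B", options={"maxcor": history})
    return C @ rotation(res.x, n), -res.fun


# Toy usage (hypothetical system): 3 "atoms" with 2 orthonormal AOs each.
rng = np.random.default_rng(0)
nao, nocc = 6, 3
ao_atom = np.array([0, 0, 1, 1, 2, 2])
S = np.eye(nao)
C, _ = np.linalg.qr(rng.standard_normal((nao, nocc)))
P_start = pm_functional(np.eye(nocc), atomic_population_matrices(C, S, ao_atom))
C_loc, P_final = localize_pm(C, S, ao_atom)
print(f"PM functional: {P_start:.4f} -> {P_final:.4f}")
```

Because the orbitals are only ever multiplied by an orthogonal U, they stay orthonormal throughout, and each orbital's atomic populations still sum to one; only their distribution across atoms changes.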
In addition, we are pleased to report that our novel BFGS-based solver is extremely robust with respect to the initial guess and the size of the history employed: the final results and the time to solution, as measured by the number of iterations required, are essentially independent of these initial choices. Furthermore, our BFGS-based solver converges much more quickly and consistently than either a steepest ascent (SA) or a nonlinear conjugate gradient (CG) based solver, as demonstrated for a number of 1-, 2-, and 3-dimensional systems. Armed with our real, localized Wannier functions, we are now in a position to apply local implementations of correlated many-body methods to periodic crystalline solids; a likely first step toward this goal will be the study of PNOs, both conventional and iteratively optimized, in this context.
=== Doctor of Philosophy ===
Increasingly, the study of chemistry is moving from the traditional wet lab to the realm of computers. The physical laws that govern the behavior of chemical systems, along with the corresponding mathematical expressions, have long been known. Rapid growth in computational technology has made solving these equations, at least approximately, relatively easy for a large number of molecular and solid systems. That the equations must be solved approximately is an unfortunate fact of life, stemming from their mathematical structure, and much effort has been poured into developing ever better approximations, each trying to balance an acceptable loss of accuracy against a realistic level of computational cost and complexity. But though there has been much progress in developing approximate computational chemistry methods, there is still great work to be done.
Many chemical systems of real-world import (particularly biomolecules and potential pharmaceuticals) are simply too large to be treated with any method that consistently delivers acceptable accuracy. As an example of the difficulties of applying accurate computational methods to systems of interest, consider the seminal 2013 work of Riplinger and co-workers [Riplinger et al., J. Chem. Phys. 2013, 139, 134101], who present the results of a calculation performed on the protein crambin. The method used was DLPNO-CCSD(T), an approximation to the "gold standard" computational method CCSD(T). The acronym DLPNO-CCSD(T) stands for "domain-based local pair natural orbital coupled cluster with singles, doubles, and perturbative triples." In essence, this method exploits the fact that electron correlation (the way electrons adjust their motion to avoid one another) is predominantly a short-range phenomenon in order to represent the system in a mathematically more compact way. This focus on the locality of electron correlation is a crucial piece in the effort to bring down computational cost. When talking about computational cost, we will often discuss how the cost scales with the approximate system size $N$. For CCSD(T), the cost scales as $N^{7}$. To see what this means, consider two chemical systems A and B. If system B is twice as large as system A, then the same calculation will take roughly $2^{7} = 128$ times as long on system B as on system A. The DLPNO-CCSD(T) method, on the other hand, scales linearly with system size once the system is sufficiently large (we say that it is "asymptotically linearly scaling"), so for our example systems A and B, the calculation on system B should take only about twice as long as the calculation on system A.
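The arithmetic in the worked example above generalizes directly: under a leading-order model in which cost grows as N^p, enlarging the system by a factor r multiplies the cost by r^p. The short sketch below tabulates this for the exponents quoted in this summary. It deliberately ignores prefactors and the system size at which linear scaling actually sets in, both of which matter a great deal in practice; the dictionary and function names are illustrative only.

```python
# Leading-order cost model: cost ~ prefactor * N**p (prefactors ignored).
SCALING_EXPONENTS = {
    "Hartree-Fock (formal)": 4,
    "CCSD(T) (canonical)": 7,
    "DLPNO-CCSD(T) (asymptotic)": 1,
}


def cost_ratio(size_ratio: float, exponent: int) -> float:
    """Factor by which the cost grows when the system size grows by size_ratio."""
    return size_ratio**exponent


for method, p in SCALING_EXPONENTS.items():
    print(f"{method:<28} doubling N multiplies cost by {cost_ratio(2, p):g}")
```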
But despite the favorable scaling afforded by the DLPNO-CCSD(T) method, the time to solution is still prohibitive. For crambin, a relatively small protein with 644 atoms, the calculation took a little over 30 days. Clearly, such timescales are unworkable for biochemical research, where the focus is often on the interactions between multiple proteins or other large biomolecules and where many more data points are required. In the work that follows, we discuss in more detail the origin of the high costs associated with highly accurate computational methods, as well as some of the approximation techniques that have already been employed, with an emphasis on local correlation techniques. We then build on this foundation to discuss our own work and how we have extended such approximation techniques in an attempt to further increase the attainable accuracy-to-cost ratio. In particular, we discuss how iteratively optimizing the pair natural orbitals (the PNOs of the DLPNO-CCSD(T) method) can provide a representation of the system that is both more accurate and more compact than that given by static PNOs [Clement et al., J. Chem. Theory Comput. 2018, 14 (9), 4581–4589]. Additionally, we turn our attention to infinite periodic crystalline systems, a class of materials less commonly studied in computational chemistry, and discuss how the local correlation techniques already applied with great success to molecular systems can potentially be applied in this domain as well [Clement et al., 2021, submitted to J. Chem. Theory Comput.].
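To make the PNO idea concrete, here is a minimal NumPy sketch of the usual construct-and-truncate step for a single occupied pair ij: a pair density is formed from the pair's doubles amplitudes, diagonalized, and only the natural orbitals whose occupation numbers reach the PNO truncation threshold are kept. The helper name `pair_natural_orbitals`, the density convention (T_tilde = 2T - T^T, normalized by 1 + delta_ij), and the random toy amplitudes are assumptions for illustration; prefactor conventions differ between published implementations. Conventional PNOs are typically built once from approximate (e.g., MP2-level) amplitudes; an iterative scheme would repeat this step as the amplitudes are updated, which this sketch does not attempt and which is not the thesis implementation.

```python
import numpy as np


def pair_natural_orbitals(T, i_equals_j, tpno):
    """Hypothetical helper: truncated PNOs for one occupied pair ij.

    T    : (nvir, nvir) doubles amplitudes T^{ij}_{ab} for the pair
    tpno : occupation-number threshold; PNOs below it are discarded
    Returns (occupations, coefficients), with coefficients expressed in the
    original virtual basis and columns ordered by decreasing occupation.
    """
    T_tilde = 2.0 * T - T.T                  # assumed spin-adapted combination
    D = (T_tilde.T @ T + T_tilde @ T.T) / (1.0 + float(i_equals_j))
    D = 0.5 * (D + D.T)                      # remove round-off asymmetry
    occ, vecs = np.linalg.eigh(D)            # eigenvalues in ascending order
    order = np.argsort(occ)[::-1]            # reorder to descending occupation
    occ, vecs = occ[order], vecs[:, order]
    keep = occ >= tpno
    return occ[keep], vecs[:, keep]


# Toy usage with random amplitudes (illustrative only, not real data).
rng = np.random.default_rng(1)
nvir = 10
T = 0.01 * rng.standard_normal((nvir, nvir))
for tpno in (1e-8, 1e-6, 1e-4, 1e-2):
    occ, pnos = pair_natural_orbitals(T, i_equals_j=False, tpno=tpno)
    print(f"tpno = {tpno:.0e}: kept {occ.size} of {nvir} PNOs")
```

Because the occupation numbers rank the PNOs by their weight in the pair density, raising the threshold discards the least important virtual directions first; this ordering is what makes the compression effective and also what produces the truncation error that incompleteness corrections aim to recover.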