Summary: | Over the course of the last century, electronic structure theory (or, alternatively,
computational quantum chemistry) has grown from being a fledgling field to
being a ``full partner with experiment'' [Goddard \textit{Science} \textbf{1985},
\textit{227} (4689), 917--923]. Instances of theory matching experiment
to very high accuracy abound; one excellent example is the
high-accuracy \textit{ab initio} thermochemical data laid out in the 2004
work of Tajti and co-workers [Tajti et al. \textit{J. Chem. Phys.} \textbf{2004},
\textit{121}, 11599], and another is the heats of formation and molecular
structures computed by Feller and co-workers in 2008 [Feller et al.
\textit{J. Chem. Phys.} \textbf{2008}, \textit{129}, 204105]. But as the authors
of both studies point out, this very high accuracy
comes at a very high cost. In fact, at this point in time, electronic structure
theory suffers not from an accuracy problem (as it did in its early days)
but from a cost problem; or, perhaps more precisely, it suffers from an
accuracy-to-cost ratio problem. We can
compute electronic energies to nearly any precision we like, \textit{as long
as we are willing to pay the associated cost}.
And just what are these high computational costs? For the purposes of this
work, we are primarily concerned with the way in which the computational
cost of a given method scales with the system size; for notational purposes,
we will often introduce a parameter, $N$, that is proportional to the system size.
In the case of Hartree-Fock, a one-body wavefunction-based method,
the scaling is formally $N^4$, and post-Hartree-Fock methods fare even
worse. The coupled cluster singles, doubles, and perturbative
triples method [CCSD(T)], which is frequently referred to as
the ``gold standard'' of quantum chemistry, scales as $N^7$, making
it inapplicable to many systems of real-world import.
If highly accurate correlated wavefunction methods are to be applied
to larger systems of interest, it is crucial that we reduce their
computational scaling. One very successful means of doing this relies
on the fact that electron correlation is fundamentally a local
phenomenon, and the recognition of this fact has led to the development
of numerous local implementations of conventional many-body methods.
One such method, the DLPNO-CCSD(T) method, was successfully used
to calculate the energy of the protein crambin [Riplinger, et al.
\textit{J. Chem. Phys.} \textbf{2013}, \textit{139}, 134101].
In the following work, we discuss how the local nature of electron
correlation can be exploited, both in terms of the occupied orbitals
and the unoccupied (or virtual) orbitals. In the case of the former,
we highlight some of the historical developments in orbital localization
before applying orbital localization robustly to infinite periodic
crystalline systems [Clement, et al. \textbf{2021}, \textit{Submitted to
J. Chem. Theory Comput.}].
In the case of the latter, we discuss
a number of different ways in which the virtual space can be
compressed before presenting our pioneering work in the area
of iteratively-optimized pair natural orbitals (``iPNOs'') [Clement, et al.
\textit{J. Chem. Theory Comput.} \textbf{2018}, \textit{14} (9), 4581--4589].
Concerning the iPNOs, we were able to recover significant accuracy
with respect to traditional PNOs (which are unchanged throughout the course
of a correlated calculation) at a comparable truncation level, indicating
that our iteratively-optimized PNOs are, in fact, a better representation
of the coupled cluster doubles amplitudes. For example, when studying
the percent errors in the absolute correlation energies of a representative
sample of weakly bound dimers chosen from the S66 test suite
[\v{R}ez\'{a}\v{c}, et al. \textit{J. Chem. Theory Comput.}
\textbf{2011}, \textit{7} (8), 2427--2438], we found that our iPNO-CCSD
scheme outperformed the standard PNO-CCSD scheme at every truncation
threshold ($tpno$) studied.
Both PNO-based methods were compared to the canonical CCSD
method, with the iPNO-CCSD errors being, on average, 1.9 times
smaller than the PNO-CCSD errors at $tpno = 10^{-7}$ and more than
an order of magnitude smaller for $tpno < 10^{-10}$ [Clement, et al.
\textit{J. Chem. Theory Comput.} \textbf{2018}, \textit{14} (9), 4581--4589].
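To make the role of the truncation threshold concrete, the following is a minimal sketch of how conventional PNOs for a single electron pair can be built and truncated. It is not the implementation used in this work. It assumes real closed-shell doubles amplitudes for one pair, stored as a square matrix over virtual orbitals, and one common convention for the pair density (normalization factors differ between papers). The names `truncated_pnos`, `T_ij`, `t_pno`, and `diagonal_pair` are hypothetical and introduced only for illustration.

```python
import numpy as np


def truncated_pnos(T_ij, t_pno, diagonal_pair=False):
    """Return (occupations, PNO coefficients) for one pair, truncated at t_pno.

    T_ij          : (n_virt, n_virt) real doubles amplitudes for pair ij
    t_pno         : keep PNOs whose occupation number is >= t_pno
    diagonal_pair : True when i == j (adds the 1/(1 + delta_ij) factor)
    """
    # Assumed closed-shell convention: T~ = 2 T - T^T.
    T_tilde = 2.0 * T_ij - T_ij.T
    # Pair density D = (T~^T T + T~ T^T) / (1 + delta_ij); symmetric by construction.
    D = (T_tilde.T @ T_ij + T_tilde @ T_ij.T) / (2.0 if diagonal_pair else 1.0)
    # Eigenvectors of D are the PNOs; eigenvalues are their occupation numbers.
    occ, vecs = np.linalg.eigh(D)          # eigh returns ascending eigenvalues
    keep = occ >= t_pno
    # Reverse so the most strongly occupied PNO comes first.
    return occ[keep][::-1], vecs[:, keep][:, ::-1]


# Toy amplitudes for a diagonal pair (illustrative numbers, not real data).
T = np.diag([0.1, 0.01, 0.0001])
for threshold in (1e-3, 1e-7, 1e-9):
    occ, pnos = truncated_pnos(T, threshold, diagonal_pair=True)
    print(f"t_pno = {threshold:g}: kept {pnos.shape[1]} of {T.shape[0]} PNOs")
```

Tightening the threshold retains more PNOs per pair, which is the accuracy-versus-cost trade-off that the comparison above probes. In the iteratively-optimized scheme the orbitals themselves are updated during the correlated calculation, which this static sketch does not attempt to show.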
When our improved PNOs are combined with the PNO-incompleteness
correction proposed by Neese and co-workers
[Neese, et al. \textit{J. Chem. Phys.} \textbf{2009},
\textit{130}, 114108; Neese, et al. \textit{J. Chem. Phys.} \textbf{2009},
\textit{131}, 064103],
the results are striking. For a truncation threshold of
$tpno = 10^{-6}$, the mean absolute error in binding energy for
all 66 dimers from the S66 test set was 3 times smaller when
the incompleteness-corrected iPNO-CCSD method was used
relative to the incompleteness-corrected PNO-CCSD method
[Clement, et al.
\textit{J. Chem. Theory Comput.} \textbf{2018}, \textit{14} (9), 4581--4589].
In the latter half of this work, we present our implementation
of a limited-memory Broyden-Fletcher-Goldfarb-Shanno (BFGS) based Pipek-Mezey
Wannier function (PMWF) solver [Clement, et al. \textbf{2021}, \textit{Submitted
to J. Chem. Theory Comput.}].
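The idea of driving a Pipek-Mezey-type localization with a limited-memory BFGS optimizer can be illustrated with a small molecular toy problem. This is a sketch under stated assumptions, not the periodic solver described in this work: it assumes real orbitals in an orthonormal basis, uses squared coefficients as atomic charges, parametrizes the orbital rotation as the exponential of an antisymmetric matrix, and relies on SciPy's `L-BFGS-B` routine with finite-difference gradients. The function names and the `history` argument (mapped to SciPy's `maxcor` option) are hypothetical choices made for illustration.

```python
import numpy as np
from scipy.linalg import expm
from scipy.optimize import minimize


def pm_functional(C, atom_of_basis, n_atoms):
    """Pipek-Mezey-style functional: sum over orbitals and atoms of squared charges."""
    # Charge of each orbital on each atom, assuming an orthonormal basis.
    weights = C ** 2                              # shape (n_basis, n_orb)
    Q = np.zeros((n_atoms, C.shape[1]))
    for mu, atom in enumerate(atom_of_basis):
        Q[atom] += weights[mu]
    return float(np.sum(Q ** 2))


def rotation(params, n_orb):
    """Orthogonal matrix exp(K - K^T), with params filling the strict upper triangle of K."""
    K = np.zeros((n_orb, n_orb))
    K[np.triu_indices(n_orb, k=1)] = params
    return expm(K - K.T)


def localize(C0, atom_of_basis, n_atoms, x0, history=5):
    """Maximize the functional over orbital rotations with L-BFGS-B."""
    n_orb = C0.shape[1]

    def objective(x):
        # SciPy minimizes, so the functional is negated.
        return -pm_functional(C0 @ rotation(x, n_orb), atom_of_basis, n_atoms)

    result = minimize(objective, x0, method="L-BFGS-B", options={"maxcor": history})
    return C0 @ rotation(result.x, n_orb), -result.fun


# Two atoms, one basis function each; start from fully delocalized orbitals.
C0 = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2.0)
C_loc, value = localize(C0, atom_of_basis=[0, 1], n_atoms=2, x0=[0.1])
print("functional before:", pm_functional(C0, [0, 1], 2))
print("functional after: ", value)
```

The starting rotation `x0` is deliberately nonzero because the fully delocalized orbitals are a stationary point of the functional, so a gradient-based optimizer started exactly there would not move.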
Although orbital localization in the context of the linear combination
of atomic orbitals (LCAO) representation of periodic crystalline solids
is not new [Marzari, et al. \textit{Rev. Mod. Phys.}
\textbf{2012}, \textit{84} (4), 1419--1475;
J\'{o}nsson, et al. \textit{J. Chem. Theory Comput.}
\textbf{2017}, \textit{13} (2), 460--474], to our knowledge, this
is the first implementation to be based on a BFGS solver. In addition,
our BFGS-based solver is extremely
robust with respect to the initial guess and the size of the history
employed: the final results and the time to solution, as measured
by the number of iterations required, are essentially independent
of these initial choices. Furthermore, our BFGS-based solver
converges much more quickly and consistently than either a
steepest ascent (SA) or a non-linear conjugate gradient (CG) based solver,
as demonstrated for a number of 1-, 2-, and 3-dimensional
systems. Armed with our real, localized Wannier functions, we are
now in a position to pursue the application of local implementations
of correlated many-body methods to the arena of periodic crystalline
solids; a first step toward this goal will, most likely, be
the study of PNOs, both conventional and iteratively-optimized,
in this context.

=== Doctor of Philosophy ===

Increasingly, the study of chemistry is moving from the traditional
wet lab to the realm of computers. The physical laws that govern the
behavior of chemical systems, along with the corresponding
mathematical expressions, have long been known. Rapid growth
in computational technology has made solving these equations, at least
in an approximate manner, relatively easy for a large number of
molecular and solid systems. That the equations must be solved
approximately is an unfortunate fact of life, stemming from
the mathematical structure of the equations themselves, and much
effort has been poured into developing better and better approximations,
each trying to balance an acceptable level of accuracy loss with
a realistic level of computational cost and complexity.
But though there has been much progress in developing approximate
computational chemistry methods, there is still great work to be
done. \textit{Many} chemical systems of real-world import (particularly
biomolecules and potential pharmaceuticals) are simply too large to
be treated with any methods that consistently deliver acceptable accuracy.
As an example of the difficulties that come with trying to apply accurate
computational methods to systems of interest, consider the seminal 2013 work
of Riplinger and co-workers [Riplinger, et al. \textit{J. Chem. Phys.}
\textbf{2013}, \textit{139}, 134101]. In this paper, they present the
results of a calculation performed on the protein crambin.
The method used was DLPNO-CCSD(T), an approximation to the
``gold standard'' computational method CCSD(T). The acronym
DLPNO-CCSD(T) stands for ``domain-based local pair natural orbital
coupled cluster with singles, doubles, and perturbative triples.''
In essence, this method exploits the fact that electron-electron
interactions (``electron correlation'') are a short-range
phenomenon in order to represent
the system in a mathematically more compact way. This focus
on the locality of electron correlation is a crucial piece in
the effort to bring down computational cost.
When talking about computational cost, we will often talk about
how the cost scales with the approximate system size $N$.
In the case of CCSD(T), the cost scales as $N^{7}$. To see what
this means, consider two chemical systems \textit{A} and \textit{B}.
If system \textit{B} is twice as large as system \textit{A},
then the same calculation will take $2^{7} = 128$
times longer on system \textit{B} than on system \textit{A}.
The DLPNO-CCSD(T) method, on the other hand,
scales linearly with the system size, provided the system is sufficiently
large (we say that it is ``asymptotically linearly scaling''),
and so, for our example systems \textit{A} and \textit{B},
the calculation run on system \textit{B} should only take
twice as long as the calculation run on system \textit{A}.
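The arithmetic above can be checked with a short Python sketch. The function name `cost_ratio` is hypothetical and introduced only for illustration, and the model assumes an idealized power law with no prefactors or crossover effects, so it indicates trends rather than actual run times.

```python
def cost_ratio(size_ratio: float, exponent: float) -> float:
    """Factor by which the cost grows when the system size grows by `size_ratio`,
    assuming the cost is proportional to N**exponent (idealized power law)."""
    return size_ratio ** exponent


# Formal scaling exponents quoted in the text; the labels are illustrative.
exponents = {
    "Hartree-Fock": 4,
    "CCSD(T)": 7,
    "DLPNO-CCSD(T) (asymptotic)": 1,
}

for method, p in exponents.items():
    # System B is twice as large as system A.
    print(f"{method}: doubling N multiplies the cost by {cost_ratio(2, p):g}")
```

Changing the first argument of `cost_ratio` shows how quickly the gap between the methods widens as the systems being compared grow further apart in size.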
But despite the favorable
scaling afforded by the DLPNO-CCSD(T) method, the time to solution
is still prohibitive.
In the case of crambin, a relatively small protein with 644 atoms,
the calculation took a little over 30 days. Clearly, such timescales
are unworkable for the field of biochemical research, where
the focus is often on the interactions between multiple proteins
or other large biomolecules and where many more data points are required.
In the work that follows, we discuss in more detail the genesis
of the high costs that are associated with highly accurate computational
methods, as well as some of the approximation techniques that have
already been employed, with an emphasis on local correlation
techniques. We then build on this foundation to discuss
our own work and how we have extended such approximation techniques
in an attempt to further increase the achievable accuracy-to-cost ratio.
In particular, we discuss how iteratively-optimized pair natural orbitals
(the PNOs of the DLPNO-CCSD(T) method) can provide a more accurate
but also more compact mathematical representation of the system
relative to static PNOs
[Clement, et al. \textit{J. Chem. Theory Comput.} \textbf{2018},
\textit{14} (9), 4581--4589].
Additionally, we turn our attention to the problem of periodic infinite
crystalline systems, a class of materials less commonly studied in
the field of computational chemistry, and discuss how the local correlation
techniques that have already been applied with great success to molecular
systems can potentially be applied in this domain as well
[Clement, et al. \textbf{2021}, \textit{Submitted to J. Chem. Theory Comput.}].
|