Summary: | Technological progress depends on the continued search for new materials. The stability of a crystal structure is closely connected to its formation energy, so calculating the formation energies of theoretical crystal structures makes it possible to find new stable materials. However, the number of possible structures is so large that traditional methods relying on quantum mechanics, such as Density Functional Theory (DFT), require too much computational time to be viable for such a search. Machine learning has been proposed as an alternative to these calculations. Machine learning is an umbrella term for algorithms that use information gained from one set of data to predict properties of new, similar data. Feature vector representations (descriptors) are used to present the data to the learning algorithm in a suitable form. Thus far, no combination of machine learning method and feature vector representation has been established as general and accurate enough to be of practical use for accelerating the phase diagram calculations necessary for predicting material stability. It is important that the method predicts all types of structures equally well, regardless of stability, composition, or geometrical structure. In this thesis, the performance of different feature vector representations was compared. The machine learning method used was primarily Kernel Ridge Regression, implemented in Python. Training and validation were performed on two different datasets and on subsets of these. The representation that consistently yielded the lowest cross-validated error was one based on the Voronoi tessellation of the structure, by Ward et al. [Phys. Rev. B 96, 024104 (2017)]. The second best was an experimental representation called SLATM, presented by Huang and von Lilienfeld [arXiv:1707.04146], which is partially based on the Radial Distribution Function.
The Voronoi representation achieved a mean absolute error (MAE) of 0.16 eV/atom at a training set size of 3534 for one of the datasets, and 0.28 eV/atom at a training set size of 10086 for the other. The effect of separating linear and non-linear energy contributions was evaluated using the sinusoidal and Coulomb representations. Separating these contributions reduced the error for small training set sizes, but the effect diminished as the training set size increased. The results of this thesis indicate that further work is still required before machine learning can be used effectively in the search for new materials.
|
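
The evaluation procedure named in the summary (Kernel Ridge Regression scored by a cross-validated MAE) can be illustrated with a minimal sketch. It is not the thesis code. The Gaussian kernel, the hyperparameters `sigma` and `lam`, the fold count, and the synthetic feature vectors and target values are all assumptions chosen for illustration. In the thesis the feature vectors would come from a representation such as the Voronoi or SLATM descriptors, and the targets would be formation energies in eV/atom.

```python
import numpy as np


def gaussian_kernel(A, B, sigma):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))


def krr_fit(X, y, sigma, lam):
    """Solve (K + lam * I) alpha = y for the regression weights alpha."""
    K = gaussian_kernel(X, X, sigma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)


def krr_predict(X_train, alpha, X_new, sigma):
    """Predict targets for X_new from the fitted weights."""
    return gaussian_kernel(X_new, X_train, sigma) @ alpha


def cross_validated_mae(X, y, sigma, lam, n_folds=5, seed=0):
    """Mean absolute error averaged over the held-out folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), n_folds)
    errors = []
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        alpha = krr_fit(X[train_idx], y[train_idx], sigma, lam)
        pred = krr_predict(X[train_idx], alpha, X[test_idx], sigma)
        errors.append(np.mean(np.abs(pred - y[test_idx])))
    return float(np.mean(errors))


# Synthetic stand-in data (assumption): 200 "structures" with 3 features each
# and a smooth non-linear target playing the role of the formation energy.
rng = np.random.default_rng(42)
X = rng.uniform(-1.0, 1.0, size=(200, 3))
y = np.sin(X[:, 0]) + X[:, 1] ** 2 - 0.5 * X[:, 2]

mae = cross_validated_mae(X, y, sigma=1.0, lam=1e-3)
# Reference point: MAE of always predicting the mean target value.
baseline = float(np.mean(np.abs(y - y.mean())))
print(f"cross-validated MAE: {mae:.4f} (mean-predictor baseline: {baseline:.4f})")
```

Repeating the same loop for each candidate representation, with identical folds, gives errors that can be compared directly. In practice `sigma` and `lam` would also be tuned, typically by an inner cross-validation.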