Advances in Bayesian inference and stable optimization for large-scale machine learning problems

A core task in machine learning, and the topic of this thesis, is developing faster and more accurate methods of posterior inference in probabilistic models. The thesis has two components. The first explores using deterministic methods to improve the efficiency of Markov Chain Monte Carlo (MCMC) alg...

Full description

Bibliographic Details
Main Author: Fagan, Francois Johannes
Language:English
Published: 2019
Subjects:
Online Access:https://doi.org/10.7916/D8RF7BZP
Description
Summary:A core task in machine learning, and the topic of this thesis, is developing faster and more accurate methods of posterior inference in probabilistic models. The thesis has two components. The first explores using deterministic methods to improve the efficiency of Markov Chain Monte Carlo (MCMC) algorithms. We propose new MCMC algorithms that can use deterministic methods as a “prior” to bias MCMC proposals to be in areas of high posterior density, leading to highly efficient sampling. In Chapter 2 we develop such methods for continuous distributions, and in Chapter 3 for binary distributions. The resulting methods consistently outperform existing state-of-the-art sampling techniques, sometimes by several orders of magnitude. Chapter 4 uses similar ideas as in Chapters 2 and 3, but in the context of modeling the performance of left-handed players in one-on-one interactive sports. The second part of this thesis explores the use of stable stochastic gradient descent (SGD) methods for computing a maximum a posteriori (MAP) estimate in large-scale machine learning problems. In Chapter 5 we propose two such methods for softmax regression. The first is an implementation of Implicit SGD (ISGD), a stable but difficult to implement SGD method, and the second is a new SGD method specifically designed for optimizing a double-sum formulation of the softmax. Both methods comprehensively outperform the previous state-of-the-art on seven real world datasets. Inspired by the success of ISGD on the softmax, we investigate its application to neural networks in Chapter 6. In this chapter we present a novel layer-wise approximation of ISGD that has efficiently computable updates. Experiments show that the resulting method is more robust to high learning rates and generally outperforms standard backpropagation on a variety of tasks.