Optimization for Probabilistic Machine Learning

We have access to great variety of datasets more than any time in the history. Everyday, more data is collected from various natural resources and digital platforms. Great advances in the area of machine learning research in the past few decades have relied strongly on availability of these datasets...

Full description

Bibliographic Details
Main Author: Fazelnia, Ghazal
Language:English
Published: 2019
Subjects:
Online Access:https://doi.org/10.7916/d8-jm7k-2k98
Description
Summary:We have access to great variety of datasets more than any time in the history. Everyday, more data is collected from various natural resources and digital platforms. Great advances in the area of machine learning research in the past few decades have relied strongly on availability of these datasets. However, analyzing them imposes significant challenges that are mainly due to two factors. First, the datasets have complex structures with hidden interdependencies. Second, most of the valuable datasets are high dimensional and are largely scaled. The main goal of a machine learning framework is to design a model that is a valid representative of the observations and develop a learning algorithm to make inference about unobserved or latent data based on the observations. Discovering hidden patterns and inferring latent characteristics in such datasets is one of the greatest challenges in the area of machine learning research. In this dissertation, I will investigate some of the challenges in modeling and algorithm design, and present my research results on how to overcome these obstacles. Analyzing data generally involves two main stages. The first stage is designing a model that is flexible enough to capture complex variation and latent structures in data and is robust enough to generalize well to the unseen data. Designing an expressive and interpretable model is one of crucial objectives in this stage. The second stage involves training learning algorithm on the observed data and measuring the accuracy of model and learning algorithm. This stage usually involves an optimization problem whose objective is to tune the model to the training data and learn the model parameters. Finding global optimal or sufficiently good local optimal solution is one of the main challenges in this step. Probabilistic models are one of the best known models for capturing data generating process and quantifying uncertainties in data using random variables and probability distributions. They are powerful models that are shown to be adaptive and robust and can scale well to large datasets. However, most probabilistic models have a complex structure. Training them could become challenging commonly due to the presence of intractable integrals in the calculation. To remedy this, they require approximate inference strategies that often results in non-convex optimization problems. The optimization part ensures that the model is the best representative of data or data generating process. The non-convexity of an optimization problem take away the general guarantee on finding a global optimal solution. It will be shown later in this dissertation that inference for a significant number of probabilistic models require solving a non-convex optimization problem. One of the well-known methods for approximate inference in probabilistic modeling is variational inference. In the Bayesian setting, the target is to learn the true posterior distribution for model parameters given the observations and prior distributions. The main challenge involves marginalization of all the other variables in the model except for the variable of interest. This high-dimensional integral is generally computationally hard, and for many models there is no known polynomial time algorithm for calculating them exactly. Variational inference deals with finding an approximate posterior distribution for Bayesian models where finding the true posterior distribution is analytically or numerically impossible. It assumes a family of distribution for the estimation, and finds the closest member of that family to the true posterior distribution using a distance measure. For many models though, this technique requires solving a non-convex optimization problem that has no general guarantee on reaching a global optimal solution. This dissertation presents a convex relaxation technique for dealing with hardness of the optimization involved in the inference. The proposed convex relaxation technique is based on semidefinite optimization that has a general applicability to polynomial optimization problem. I will present theoretical foundations and in-depth details of this relaxation in this work. Linear dynamical systems represent the functionality of many real-world physical systems. They can describe the dynamics of a linear time-varying observation which is controlled by a controller unit with quadratic cost function objectives. Designing distributed and decentralized controllers is the goal of many of these systems, which computationally, results in a non-convex optimization problem. In this dissertation, I will further investigate the issues arising in this area and develop a convex relaxation framework to deal with the optimization challenges. Setting the correct number of model parameters is an important aspect for a good probabilistic model. If there are only a few parameters, model may lack capturing all the essential relations and components in the observations while too many parameters may cause significant complications in learning or overfit to the observations. Non-parametric models are suitable techniques to deal with this issue. They allow the model to learn the appropriate number of parameters to describe the data and make predictions. In this dissertation, I will present my work on designing Bayesian non-parametric models as powerful tools for learning representations of data. Moreover, I will describe the algorithm that we derived to efficiently train the model on the observations and learn the number of model parameters. Later in this dissertation, I will present my works on designing probabilistic models in combination with deep learning methods for representing sequential data. Sequential datasets comprise a significant portion of resources in the area of machine learning research. Designing models to capture dependencies in sequential datasets are of great interest and have a wide variety of applications in engineering, medicine and statistics. Recent advances in deep learning research has shown exceptional promises in this area. However, they lack interpretability in their general form. To remedy this, I will present my work on mixing probabilistic models with neural network models that results in better performance and expressiveness of the results.