New Optimization Methods for Modern Machine Learning

Modern machine learning systems pose several new statistical, scalability, privacy and ethical challenges. With the advent of massive datasets and increasingly complex tasks, scalability has especially become a critical issue in these systems. In this thesis, we focus on fundamental challenges relat...

Full description

Bibliographic Details
Main Author:	Reddi, Sashank Jakkam
Format:	Others
Published:	Research Showcase @ CMU 2017
Subjects:	Machine Learning Optimization Large-scale Distributed optimization Communication-efficient Finite-sum
Online Access:	http://repository.cmu.edu/dissertations/1116 http://repository.cmu.edu/cgi/viewcontent.cgi?article=2155&context=dissertations

id	ndltd-cmu.edu-oai-repository.cmu.edu-dissertations-2155
record_format	oai_dc
spelling	ndltd-cmu.edu-oai-repository.cmu.edu-dissertations-21552018-01-26T03:22:01Z New Optimization Methods for Modern Machine Learning Reddi, Sashank Jakkam Modern machine learning systems pose several new statistical, scalability, privacy and ethical challenges. With the advent of massive datasets and increasingly complex tasks, scalability has especially become a critical issue in these systems. In this thesis, we focus on fundamental challenges related to scalability, such as computational and communication efficiency, in modern machine learning applications. The underlying central message of this thesis is that classical statistical thinking leads to highly effective optimization methods for modern big data applications. The first part of the thesis investigates optimization methods for solving large-scale nonconvex Empirical Risk Minimization (ERM) problems. Such problems have surged into prominence, notably through deep learning, and have led to exciting progress. However, our understanding of optimization methods suitable for these problems is still very limited. We develop and analyze a new line of optimization methods for nonconvex ERM problems, based on the principle of variance reduction. We show that our methods exhibit fast convergence to stationary points and improve the state-of-the-art in several nonconvex ERM settings, including nonsmooth and constrained ERM. Using similar principles, we also develop novel optimization methods that provably converge to second-order stationary points. Finally, we show that the key principles behind our methods can be generalized to overcome challenges in other important problems such as Bayesian inference. The second part of the thesis studies two critical aspects of modern distributed machine learning systems — asynchronicity and communication efficiency of optimization methods. We study various asynchronous stochastic algorithms with fast convergence for convex ERM problems and show that these methods achieve near-linear speedups in sparse settings common to machine learning. Another key factor governing the overall performance of a distributed system is its communication efficiency. Traditional optimization algorithms used in machine learning are often ill-suited for distributed environments with high communication cost. To address this issue, we dis- cuss two different paradigms to achieve communication efficiency of algorithms in distributed environments and explore new algorithms with better communication complexity. 2017-07-01T07:00:00Z text application/pdf http://repository.cmu.edu/dissertations/1116 http://repository.cmu.edu/cgi/viewcontent.cgi?article=2155&context=dissertations Dissertations Research Showcase @ CMU Machine Learning Optimization Large-scale Distributed optimization Communication-efficient Finite-sum
collection	NDLTD
format	Others
sources	NDLTD
topic	Machine Learning Optimization Large-scale Distributed optimization Communication-efficient Finite-sum
spellingShingle	Machine Learning Optimization Large-scale Distributed optimization Communication-efficient Finite-sum Reddi, Sashank Jakkam New Optimization Methods for Modern Machine Learning
description	Modern machine learning systems pose several new statistical, scalability, privacy and ethical challenges. With the advent of massive datasets and increasingly complex tasks, scalability has especially become a critical issue in these systems. In this thesis, we focus on fundamental challenges related to scalability, such as computational and communication efficiency, in modern machine learning applications. The underlying central message of this thesis is that classical statistical thinking leads to highly effective optimization methods for modern big data applications. The first part of the thesis investigates optimization methods for solving large-scale nonconvex Empirical Risk Minimization (ERM) problems. Such problems have surged into prominence, notably through deep learning, and have led to exciting progress. However, our understanding of optimization methods suitable for these problems is still very limited. We develop and analyze a new line of optimization methods for nonconvex ERM problems, based on the principle of variance reduction. We show that our methods exhibit fast convergence to stationary points and improve the state-of-the-art in several nonconvex ERM settings, including nonsmooth and constrained ERM. Using similar principles, we also develop novel optimization methods that provably converge to second-order stationary points. Finally, we show that the key principles behind our methods can be generalized to overcome challenges in other important problems such as Bayesian inference. The second part of the thesis studies two critical aspects of modern distributed machine learning systems — asynchronicity and communication efficiency of optimization methods. We study various asynchronous stochastic algorithms with fast convergence for convex ERM problems and show that these methods achieve near-linear speedups in sparse settings common to machine learning. Another key factor governing the overall performance of a distributed system is its communication efficiency. Traditional optimization algorithms used in machine learning are often ill-suited for distributed environments with high communication cost. To address this issue, we dis- cuss two different paradigms to achieve communication efficiency of algorithms in distributed environments and explore new algorithms with better communication complexity.
author	Reddi, Sashank Jakkam
author_facet	Reddi, Sashank Jakkam
author_sort	Reddi, Sashank Jakkam
title	New Optimization Methods for Modern Machine Learning
title_short	New Optimization Methods for Modern Machine Learning
title_full	New Optimization Methods for Modern Machine Learning
title_fullStr	New Optimization Methods for Modern Machine Learning
title_full_unstemmed	New Optimization Methods for Modern Machine Learning
title_sort	new optimization methods for modern machine learning
publisher	Research Showcase @ CMU
publishDate	2017
url	http://repository.cmu.edu/dissertations/1116 http://repository.cmu.edu/cgi/viewcontent.cgi?article=2155&context=dissertations
work_keys_str_mv	AT reddisashankjakkam newoptimizationmethodsformodernmachinelearning
_version_	1718611853219725312

New Optimization Methods for Modern Machine Learning

Similar Items