Newton Methods For Machine Learning
Main Authors: | Chien-Chih Wang, 王建智 |
---|---|
Other Authors: | Chih-Jen Lin, 林智仁 |
Format: | Others |
Language: | zh-TW |
Published: | 2016 |
Online Access: | http://ndltd.ncl.edu.tw/handle/54980842511743765214 |
id |
ndltd-TW-105NTU05392016 |
record_format |
oai_dc |
spelling |
ndltd-TW-105NTU05392016 2017-10-29T04:35:35Z http://ndltd.ncl.edu.tw/handle/54980842511743765214 Newton Methods For Machine Learning 牛頓法於機器學習之應用 Chien-Chih Wang 王建智 PhD 國立臺灣大學 (National Taiwan University) 資訊工程學研究所 (Graduate Institute of Computer Science and Information Engineering) 105 Chih-Jen Lin 林智仁 2016 學位論文 ; thesis 105 zh-TW |
collection |
NDLTD |
language |
zh-TW |
format |
Others |
sources |
NDLTD |
description |
PhD === 國立臺灣大學 (National Taiwan University) === 資訊工程學研究所 (Graduate Institute of Computer Science and Information Engineering) === 105 === Newton methods can be applied in many supervised learning approaches. However, for large-scale data, using the whole Hessian matrix can be time-consuming. In this thesis we aim to make Newton methods practically viable for various large-scale scenarios. The first part of this thesis is about subsampled Newton methods. Recently, subsampled Newton methods have been proposed to reduce the computational time by using only a subset of the data to calculate an approximation of the Hessian matrix. Unfortunately, we find that in some situations they run more slowly than the standard Newton method because cheaper but less accurate search directions are used. In this thesis, we propose novel techniques to improve the existing subsampled Hessian Newton method. The main idea is to solve a two-dimensional sub-problem at each iteration to adjust the search direction so that it better minimizes the second-order approximation of the function value. We prove the theoretical convergence of the proposed method. Experiments on logistic regression, linear SVM, maximum entropy, and deep networks indicate that our techniques significantly reduce the running time of the subsampled Hessian Newton method. The resulting algorithm becomes a compelling alternative to the standard Newton method for large-scale data classification.
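To make the two-dimensional sub-problem above concrete, one possible form is sketched below. It assumes, purely for illustration, that the adjusted step is a combination of two candidate directions d and \bar{d} (for example, the direction computed with the subsampled Hessian and a second direction such as the previous iteration's step); the exact choice of directions and the precise formulation follow the thesis, not this sketch. Here f is the objective, w_k the current iterate, and H_k the Hessian at w_k.

```latex
% Minimal sketch of a two-dimensional direction-adjustment sub-problem.
% d and \bar{d} are two candidate directions (their exact choice is an
% illustrative assumption); H_k is the Hessian of f at the iterate w_k.
\min_{\beta_1,\beta_2}\;
  \nabla f(w_k)^\top (\beta_1 d + \beta_2 \bar{d})
  \;+\; \tfrac{1}{2}\,(\beta_1 d + \beta_2 \bar{d})^\top H_k\,(\beta_1 d + \beta_2 \bar{d})

% Setting the gradient with respect to (\beta_1, \beta_2) to zero gives a
% 2x2 linear system, so the adjustment needs only two Hessian-vector products:
\begin{pmatrix} d^\top H_k d & d^\top H_k \bar{d} \\
                \bar{d}^\top H_k d & \bar{d}^\top H_k \bar{d} \end{pmatrix}
\begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix}
= -\begin{pmatrix} \nabla f(w_k)^\top d \\ \nabla f(w_k)^\top \bar{d} \end{pmatrix}
```

Because this tiny system requires only two Hessian-vector products per iteration, such an adjustment is cheap relative to computing the search direction itself, which is consistent with the claim that the technique reduces running time rather than adding to it.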
Deep learning involves a difficult non-convex optimization problem because of the large number of weights between any two adjacent layers of a deep structure. In large-scale scenarios, distributed training is needed, but the calculation of the function, gradient, and Hessian is expensive. In particular, the communication and synchronization costs become the bottleneck. In this thesis, we propose a novel distributed Newton method for deep learning. First, to reduce the communication cost, we consider storing the Jacobian matrix in a distributed environment and then propose a diagonalization method such that an approximate Newton direction can be obtained without communication between machines (a small illustrative sketch follows this field). Second, to reduce the synchronization cost, we terminate the process of finding an approximate Newton direction even if some nodes have not finished their tasks. Third, we consider a subsampled Hessian to reduce the running time as well as the communication cost. Implementation issues in distributed environments are thoroughly investigated. Experiments demonstrate that the proposed method is effective for the distributed training of deep neural networks.
|
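As an illustration of the diagonalization idea referenced above, the following is a minimal, single-process numpy sketch in which a Gauss-Newton-style matrix is approximated by its diagonal blocks, so that each variable partition ("node") computes its piece of the direction from local data only. The matrices J and B, the damping term, and the partitioning are illustrative assumptions, not the thesis's actual implementation.

```python
import numpy as np

# Sketch: approximate the Gauss-Newton matrix G = J^T B J by its diagonal
# blocks, so each partition of variables solves for its own piece of the
# Newton-like direction locally. All names and sizes are illustrative.
rng = np.random.default_rng(0)

n_data, n_vars = 200, 30
J = rng.standard_normal((n_data, n_vars))   # Jacobian of model outputs w.r.t. variables
B = np.eye(n_data)                          # positive semi-definite loss-curvature term
grad = rng.standard_normal(n_vars)          # gradient of the objective
damping = 1.0                               # Levenberg-Marquardt style damping

# Variables split across 3 "nodes"; each node only needs its own columns of J.
partitions = np.array_split(np.arange(n_vars), 3)

direction = np.zeros(n_vars)
for idx in partitions:
    J_p = J[:, idx]                                          # columns owned by this node
    G_pp = J_p.T @ B @ J_p + damping * np.eye(len(idx))      # diagonal block of G
    direction[idx] = np.linalg.solve(G_pp, -grad[idx])       # local solve, no communication

# 'direction' approximates the damped Gauss-Newton step obtained when the
# off-diagonal blocks of G are dropped.
```

In a genuinely distributed run, each node would hold only its own columns of the Jacobian, so these per-block solves involve no inter-machine communication; the synchronization-reducing early termination and the subsampled Hessian described in the abstract would then be layered on top of this basic structure.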
author2 |
Chih-Jen Lin |
author_facet |
Chih-Jen Lin Chien-Chih Wang 王建智 |
author |
Chien-Chih Wang 王建智 |
spellingShingle |
Chien-Chih Wang 王建智 Newton Methods For Machine Learning |
author_sort |
Chien-Chih Wang |
title |
Newton Methods For Machine Learning |
title_short |
Newton Methods For Machine Learning |
title_full |
Newton Methods For Machine Learning |
title_fullStr |
Newton Methods For Machine Learning |
title_full_unstemmed |
Newton Methods For Machine Learning |
title_sort |
newton methods for machine learning |
publishDate |
2016 |
url |
http://ndltd.ncl.edu.tw/handle/54980842511743765214 |
work_keys_str_mv |
AT chienchihwang newtonmethodsformachinelearning AT wángjiànzhì newtonmethodsformachinelearning AT chienchihwang niúdùnfǎyújīqìxuéxízhīyīngyòng AT wángjiànzhì niúdùnfǎyújīqìxuéxízhīyīngyòng |
_version_ |
1718558653888331776 |