Newton Methods For Machine Learning


Bibliographic Details
Main Authors: Chien-Chih Wang, 王建智
Other Authors: Chih-Jen Lin
Format: Others
Language: zh-TW
Published: 2016
Online Access: http://ndltd.ncl.edu.tw/handle/54980842511743765214
id ndltd-TW-105NTU05392016
record_format oai_dc
spelling ndltd-TW-105NTU05392016 2017-10-29T04:35:35Z http://ndltd.ncl.edu.tw/handle/54980842511743765214 Newton Methods For Machine Learning 牛頓法於機器學習之應用 Chien-Chih Wang 王建智 Doctorate National Taiwan University Graduate Institute of Computer Science and Information Engineering 105 Newton methods can be applied in many supervised learning approaches. However, for large-scale data, using the whole Hessian matrix can be time consuming. In this thesis we aim to make Newton methods practically viable for various large-scale scenarios. The first part of this thesis is about subsampled Newton methods. Recently, subsampled Newton methods have been proposed to reduce the computational time by using only a subset of the data to calculate an approximation of the Hessian matrix. Unfortunately, we find that in some situations the running speed is worse than that of the standard Newton method because cheaper but less accurate search directions are used. In this thesis, we propose some novel techniques to improve the existing subsampled Hessian Newton method. The main idea is to solve a two-dimensional sub-problem per iteration to adjust the search direction so that it better minimizes the second-order approximation of the function value. We prove the theoretical convergence of the proposed method. Experiments on logistic regression, linear SVM, maximum entropy, and deep networks indicate that our techniques significantly reduce the running time of the subsampled Hessian Newton method. The resulting algorithm becomes a compelling alternative to the standard Newton method for large-scale data classification. Deep learning involves a difficult non-convex optimization problem because of the large number of weights between any two adjacent layers of a deep structure. In large-scale scenarios, distributed training is needed, but the calculation of the function, gradient, and Hessian is expensive. In particular, the communication and synchronization cost becomes the bottleneck. In this thesis, we propose a novel distributed Newton method for deep learning. First, to reduce the communication cost, we consider storing the Jacobian matrix in a distributed environment and then propose a diagonalization method such that an approximate Newton direction can be obtained without communication between machines. Second, to reduce the synchronization cost, we terminate the process of finding an approximate Newton direction even if some nodes have not finished their tasks. Third, we consider a subsampled Hessian to reduce the running time as well as the communication cost. Details of some implementation issues in distributed environments are thoroughly investigated. Experiments demonstrate that the proposed method is effective for the distributed training of deep neural networks. Chih-Jen Lin 林智仁 2016 Degree dissertation ; thesis 105 zh-TW
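The "two-dimensional sub-problem per iteration" mentioned in the abstract can be made concrete with a minimal sketch. Assuming the two directions being combined are the direction returned by the CG solve on the subsampled Hessian and, say, the previous iteration's direction (the abstract does not specify them), the adjustment reduces to a 2x2 linear system built from two Hessian-vector products with the full Hessian. The names grad, hess_vec, d_new, d_prev, and the small ridge term are illustrative, not the author's notation.

```python
import numpy as np

def combine_directions(grad, hess_vec, d_new, d_prev, ridge=1e-8):
    """Adjust the search direction by solving a 2x2 sub-problem (a sketch).

    Find coefficients (b1, b2) minimising the second-order model
        m(b) = g^T p + 0.5 * p^T H p,  with  p = b1*d_new + b2*d_prev,
    where hess_vec(v) returns the product of the full Hessian H with v.
    """
    Hd1 = hess_vec(d_new)   # H @ d_new
    Hd2 = hess_vec(d_prev)  # H @ d_prev
    # Normal equations of the 2-D model:  A [b1, b2]^T = -[g^T d_new, g^T d_prev]^T
    A = np.array([[d_new @ Hd1, d_new @ Hd2],
                  [d_prev @ Hd1, d_prev @ Hd2]])
    rhs = -np.array([grad @ d_new, grad @ d_prev])
    b1, b2 = np.linalg.solve(A + ridge * np.eye(2), rhs)  # ridge guards against a singular A
    return b1 * d_new + b2 * d_prev
```

In this sketch only two full Hessian-vector products and a few dot products are added on top of the subsampled CG solve, which is why such an adjustment can pay for itself when the cheap direction alone is inaccurate.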
collection NDLTD
language zh-TW
format Others
sources NDLTD
description Doctorate === National Taiwan University === Graduate Institute of Computer Science and Information Engineering === 105 === Newton methods can be applied in many supervised learning approaches. However, for large-scale data, using the whole Hessian matrix can be time consuming. In this thesis we aim to make Newton methods practically viable for various large-scale scenarios. The first part of this thesis is about subsampled Newton methods. Recently, subsampled Newton methods have been proposed to reduce the computational time by using only a subset of the data to calculate an approximation of the Hessian matrix. Unfortunately, we find that in some situations the running speed is worse than that of the standard Newton method because cheaper but less accurate search directions are used. In this thesis, we propose some novel techniques to improve the existing subsampled Hessian Newton method. The main idea is to solve a two-dimensional sub-problem per iteration to adjust the search direction so that it better minimizes the second-order approximation of the function value. We prove the theoretical convergence of the proposed method. Experiments on logistic regression, linear SVM, maximum entropy, and deep networks indicate that our techniques significantly reduce the running time of the subsampled Hessian Newton method. The resulting algorithm becomes a compelling alternative to the standard Newton method for large-scale data classification. Deep learning involves a difficult non-convex optimization problem because of the large number of weights between any two adjacent layers of a deep structure. In large-scale scenarios, distributed training is needed, but the calculation of the function, gradient, and Hessian is expensive. In particular, the communication and synchronization cost becomes the bottleneck. In this thesis, we propose a novel distributed Newton method for deep learning. First, to reduce the communication cost, we consider storing the Jacobian matrix in a distributed environment and then propose a diagonalization method such that an approximate Newton direction can be obtained without communication between machines. Second, to reduce the synchronization cost, we terminate the process of finding an approximate Newton direction even if some nodes have not finished their tasks. Third, we consider a subsampled Hessian to reduce the running time as well as the communication cost. Details of some implementation issues in distributed environments are thoroughly investigated. Experiments demonstrate that the proposed method is effective for the distributed training of deep neural networks.
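A minimal sketch of the communication-free direction computation described in the second part of the abstract, under the assumption that the "diagonalization method" amounts to keeping only each node's diagonal block of the (subsampled) Gauss-Newton matrix so that every node can run conjugate gradient on its own partition. The names grads, gauss_newton_blocks, damping, cg_iters, and tol are hypothetical, introduced only for illustration.

```python
import numpy as np

def blockwise_newton_directions(grads, gauss_newton_blocks, damping=1e-2,
                                cg_iters=20, tol=1e-8):
    """Communication-free approximate Newton directions (a sketch).

    Variables are assumed to be partitioned across K nodes: grads[k] is node
    k's gradient slice and gauss_newton_blocks[k](v) returns the product of
    node k's diagonal block of the (subsampled) Gauss-Newton matrix with v.
    Dropping the off-diagonal blocks lets each node solve
        (G_kk + damping * I) d_k = -g_k
    by conjugate gradient on its own data, with no communication.
    """
    directions = []
    for g_k, Gv in zip(grads, gauss_newton_blocks):
        d = np.zeros_like(g_k)
        r = -g_k                 # residual b - A d with d = 0 and b = -g_k
        p = r.copy()
        rs = r @ r
        if np.sqrt(rs) < tol:    # zero gradient slice: nothing to do
            directions.append(d)
            continue
        for _ in range(cg_iters):
            Ap = Gv(p) + damping * p
            alpha = rs / (p @ Ap)
            d = d + alpha * p
            r = r - alpha * Ap
            rs_new = r @ r
            if np.sqrt(rs_new) < tol:
                break
            p = r + (rs_new / rs) * p
            rs = rs_new
        directions.append(d)
    return directions
```

Because each block solve is independent, a slow node can also be cut off after a fixed number of CG steps, which is in the spirit of the abstract's remark about terminating the direction computation before all nodes finish.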
author2 Chih-Jen Lin
author_facet Chih-Jen Lin
Chien-Chih Wang
王建智
author Chien-Chih Wang
王建智
spellingShingle Chien-Chih Wang
王建智
Newton Methods For Machine Learning
author_sort Chien-Chih Wang
title Newton Methods For Machine Learning
title_short Newton Methods For Machine Learning
title_full Newton Methods For Machine Learning
title_fullStr Newton Methods For Machine Learning
title_full_unstemmed Newton Methods For Machine Learning
title_sort newton methods for machine learning
publishDate 2016
url http://ndltd.ncl.edu.tw/handle/54980842511743765214
work_keys_str_mv AT chienchihwang newtonmethodsformachinelearning
AT wángjiànzhì newtonmethodsformachinelearning
AT chienchihwang niúdùnfǎyújīqìxuéxízhīyīngyòng
AT wángjiànzhì niúdùnfǎyújīqìxuéxízhīyīngyòng
_version_ 1718558653888331776