Newton Methods For Machine Learning
Main Authors: | Chien-Chih Wang, 王建智 |
---|---|
Other Authors: | Chih-Jen Lin, 林智仁 |
Format: | Others |
Language: | zh-TW |
Published: | 2016 |
Online Access: | http://ndltd.ncl.edu.tw/handle/54980842511743765214 |
id |
ndltd-TW-105NTU05392016 |
record_format |
oai_dc |
spelling |
ndltd-TW-105NTU05392016 2017-10-29T04:35:35Z http://ndltd.ncl.edu.tw/handle/54980842511743765214 Newton Methods For Machine Learning 牛頓法於機器學習之應用 Chien-Chih Wang 王建智 PhD 國立臺灣大學 (National Taiwan University) 資訊工程學研究所 (Graduate Institute of Computer Science and Information Engineering) 105 Chih-Jen Lin 林智仁 2016 學位論文 ; thesis 105 zh-TW |
collection |
NDLTD |
language |
zh-TW |
format |
Others |
sources |
NDLTD |
description |
PhD === 國立臺灣大學 (National Taiwan University) === 資訊工程學研究所 (Graduate Institute of Computer Science and Information Engineering) === 105 === Newton methods can be applied in many supervised learning approaches. However, for large-scale data, using the whole Hessian matrix can be time-consuming. In this thesis we aim to make Newton methods practically viable for various large-scale scenarios. The first part of this thesis is about subsampled Newton methods. Recently, subsampled Newton methods have been proposed to reduce the computational time by using only a subset of the data to calculate an approximation of the Hessian matrix. Unfortunately, we find that in some situations they run more slowly than the standard Newton method because cheaper but less accurate search directions are used. In this thesis, we propose novel techniques to improve the existing subsampled Hessian Newton method. The main idea is to solve a two-dimensional sub-problem at each iteration to adjust the search direction so that it better minimizes the second-order approximation of the function value. We prove the theoretical convergence of the proposed method. Experiments on logistic regression, linear SVM, maximum entropy, and deep networks indicate that our techniques significantly reduce the running time of the subsampled Hessian Newton method. The resulting algorithm becomes a compelling alternative to the standard Newton method for large-scale data classification.
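To make the two-dimensional sub-problem above concrete, one possible form is sketched below. It assumes, purely for illustration, that the adjusted step is a combination of two candidate directions d and \bar{d} (for example, the direction computed with the subsampled Hessian and a second direction such as the previous iteration's step); the exact choice of directions and the precise formulation follow the thesis, not this sketch. Here f is the objective, w_k the current iterate, and H_k the Hessian at w_k.

```latex
% Minimal sketch of a two-dimensional direction-adjustment sub-problem.
% d and \bar{d} are two candidate directions (their exact choice is an
% illustrative assumption); H_k is the Hessian of f at the iterate w_k.
\min_{\beta_1,\beta_2}\;
  \nabla f(w_k)^\top (\beta_1 d + \beta_2 \bar{d})
  \;+\; \tfrac{1}{2}\,(\beta_1 d + \beta_2 \bar{d})^\top H_k\,(\beta_1 d + \beta_2 \bar{d})

% Setting the gradient with respect to (\beta_1, \beta_2) to zero gives a
% 2x2 linear system, so the adjustment needs only two Hessian-vector products:
\begin{pmatrix} d^\top H_k d & d^\top H_k \bar{d} \\
                \bar{d}^\top H_k d & \bar{d}^\top H_k \bar{d} \end{pmatrix}
\begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix}
= -\begin{pmatrix} \nabla f(w_k)^\top d \\ \nabla f(w_k)^\top \bar{d} \end{pmatrix}
```

Because this tiny system requires only two Hessian-vector products per iteration, such an adjustment is cheap relative to computing the search direction itself, which is consistent with the claim that the technique reduces running time rather than adding to it.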
Deep learning involves a difficult non-convex optimization problem because of the large number of weights between any two adjacent layers of a deep structure. In large-scale scenarios, distributed training is needed, but the calculation of the function, gradient, and Hessian is expensive. In particular, the communication and synchronization costs become the bottleneck. In this thesis, we propose a novel distributed Newton method for deep learning. First, to reduce the communication cost, we consider storing the Jacobian matrix in a distributed environment and then propose a diagonalization method such that an approximate Newton direction can be obtained without communication between machines (a small illustrative sketch follows this field). Second, to reduce the synchronization cost, we terminate the process of finding an approximate Newton direction even if some nodes have not finished their tasks. Third, we consider a subsampled Hessian to reduce the running time as well as the communication cost. Implementation issues in distributed environments are thoroughly investigated. Experiments demonstrate that the proposed method is effective for the distributed training of deep neural networks.
|
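As an illustration of the diagonalization idea referenced above, the following is a minimal, single-process numpy sketch in which a Gauss-Newton-style matrix is approximated by its diagonal blocks, so that each variable partition ("node") computes its piece of the direction from local data only. The matrices J and B, the damping term, and the partitioning are illustrative assumptions, not the thesis's actual implementation.

```python
import numpy as np

# Sketch: approximate the Gauss-Newton matrix G = J^T B J by its diagonal
# blocks, so each partition of variables solves for its own piece of the
# Newton-like direction locally. All names and sizes are illustrative.
rng = np.random.default_rng(0)

n_data, n_vars = 200, 30
J = rng.standard_normal((n_data, n_vars))   # Jacobian of model outputs w.r.t. variables
B = np.eye(n_data)                          # positive semi-definite loss-curvature term
grad = rng.standard_normal(n_vars)          # gradient of the objective
damping = 1.0                               # Levenberg-Marquardt style damping

# Variables split across 3 "nodes"; each node only needs its own columns of J.
partitions = np.array_split(np.arange(n_vars), 3)

direction = np.zeros(n_vars)
for idx in partitions:
    J_p = J[:, idx]                                          # columns owned by this node
    G_pp = J_p.T @ B @ J_p + damping * np.eye(len(idx))      # diagonal block of G
    direction[idx] = np.linalg.solve(G_pp, -grad[idx])       # local solve, no communication

# 'direction' approximates the damped Gauss-Newton step obtained when the
# off-diagonal blocks of G are dropped.
```

In a genuinely distributed run, each node would hold only its own columns of the Jacobian, so these per-block solves involve no inter-machine communication; the synchronization-reducing early termination and the subsampled Hessian described in the abstract would then be layered on top of this basic structure.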
author2 |
Chih-Jen Lin |
author_facet |
Chih-Jen Lin Chien-Chih Wang 王建智 |
author |
Chien-Chih Wang 王建智 |
spellingShingle |
Chien-Chih Wang 王建智 Newton Methods For Machine Learning |
author_sort |
Chien-Chih Wang |
title |
Newton Methods For Machine Learning |
title_short |
Newton Methods For Machine Learning |
title_full |
Newton Methods For Machine Learning |
title_fullStr |
Newton Methods For Machine Learning |
title_full_unstemmed |
Newton Methods For Machine Learning |
title_sort |
newton methods for machine learning |
publishDate |
2016 |
url |
http://ndltd.ncl.edu.tw/handle/54980842511743765214 |
work_keys_str_mv |
AT chienchihwang newtonmethodsformachinelearning AT wángjiànzhì newtonmethodsformachinelearning AT chienchihwang niúdùnfǎyújīqìxuéxízhīyīngyòng AT wángjiànzhì niúdùnfǎyújīqìxuéxízhīyīngyòng |
_version_ |
1718558653888331776 |