Summary: | Master's === National Chiao Tung University === Master's Program in Mathematical Modeling and Scientific Computing, Department of Applied Mathematics === 106 === In this thesis, we survey variants of stochastic gradient methods, including the Stochastic Newton method (SN), the Stochastic Variance Reduced Gradient method (SVRG), and Adam. Improvements to stochastic gradient methods fall into two main categories: noise reduction and second-order information. We discuss the advantages and disadvantages of these methods from several aspects.
Conventionally, machine learning models are trained with batch approaches because they directly optimize the empirical risk. However, the computational cost of batch approaches depends on the size of the entire training dataset, so huge training datasets lead to large-scale problems. Therefore, researchers turn to stochastic approaches.
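In standard notation (the symbols below are generic and not necessarily those used in the thesis), the empirical risk over n training samples and its batch gradient are

R_n(w) = \frac{1}{n}\sum_{i=1}^{n} \ell_i(w), \qquad \nabla R_n(w) = \frac{1}{n}\sum_{i=1}^{n} \nabla \ell_i(w),

so a single batch update w \leftarrow w - \eta \nabla R_n(w) requires n per-sample gradient evaluations, which is what makes each iteration expensive for large n.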
Compared to batch approaches, stochastic approaches update the model based on random samples drawn from the training dataset. Under this rule, the per-iteration computational cost of stochastic approaches is clearly cheaper than that of batch approaches. On the other hand, we cannot guarantee that every stochastic update makes progress. Thus, variant algorithms have been proposed to address these shortcomings; a sketch of one such variant appears below.
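As a concrete illustration of the noise-reduction category mentioned above, the following is a minimal sketch of SVRG, not the implementation used in the thesis; the per-sample gradient oracle grad_i, the step size, and the inner-loop length m are assumptions for illustration.

import numpy as np

def svrg(grad_i, w0, n, step, n_epochs=10, m=None):
    # grad_i(w, i): gradient of the i-th sample's loss at w (assumed oracle).
    # n: number of training samples; m: inner-loop length (commonly 2n).
    m = m or 2 * n
    w_snapshot = np.asarray(w0, dtype=float).copy()
    for _ in range(n_epochs):
        # Full gradient at the snapshot point (one pass over the data).
        full_grad = np.mean([grad_i(w_snapshot, i) for i in range(n)], axis=0)
        w = w_snapshot.copy()
        for _ in range(m):
            i = np.random.randint(n)
            # Variance-reduced stochastic gradient: unbiased estimator of the
            # full gradient whose variance shrinks as w approaches the optimum.
            g = grad_i(w, i) - grad_i(w_snapshot, i) + full_grad
            w = w - step * g
        w_snapshot = w  # use the last inner iterate as the new snapshot
    return w_snapshot

Compared with plain SGD, each outer pass pays one full-gradient computation in exchange for lower-variance inner updates, which is the trade-off that defines the noise-reduction family.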
In the experiments, we apply these stochastic optimization methods to binary classification problems based on the reduced support vector machine (RSVM), whose objective enjoys the desirable properties of strong convexity and smoothness.
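For context, one standard SVM-style objective with both properties (not necessarily the exact RSVM formulation used in the thesis) is the L2-regularized squared hinge loss

\min_{w}\; \frac{\lambda}{2}\lVert w\rVert_2^2 + \frac{1}{n}\sum_{i=1}^{n} \max\bigl(0,\, 1 - y_i w^{\top} x_i\bigr)^2,

where the regularization term provides strong convexity and the squared hinge term has a Lipschitz-continuous gradient, the setting under which the surveyed methods have convergence guarantees.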
|