Relatively Smooth Convex Optimization by First-Order Methods, and Applications

Bibliographic Details
Main Authors: Nesterov, Yurii (Author), Lu, Haihao (Author), Freund, Robert Michael (Author)
Other Authors: Massachusetts Institute of Technology. Department of Mathematics (Contributor), Sloan School of Management (Contributor)
Format: Article
Language: English
Published: Society for Industrial & Applied Mathematics (SIAM), 2019-03-11.
Subjects:
Online Access: Get fulltext (http://hdl.handle.net/1721.1/120867)
LEADER 02015 am a22002053u 4500
001 120867
042 |a dc 
100 1 0 |a Nesterov, Yurii  |e author 
100 1 0 |a Massachusetts Institute of Technology. Department of Mathematics  |e contributor 
100 1 0 |a Sloan School of Management  |e contributor 
100 1 0 |a Lu, Haihao  |e contributor 
100 1 0 |a Freund, Robert Michael  |e contributor 
700 1 0 |a Lu, Haihao  |e author 
700 1 0 |a Freund, Robert Michael  |e author 
245 0 0 |a Relatively Smooth Convex Optimization by First-Order Methods, and Applications 
260 |b Society for Industrial & Applied Mathematics (SIAM),   |c 2019-03-11T18:29:55Z. 
856 |z Get fulltext  |u http://hdl.handle.net/1721.1/120867 
520 |a The usual approach to developing and analyzing first-order methods for smooth convex optimization assumes that the gradient of the objective function is uniformly smooth with some Lipschitz constant L. However, in many settings the differentiable convex function f(·) is not uniformly smooth, for example, in D-optimal design, where f(x) := -ln det(HXH^T) and X := Diag(x), or even in the univariate setting with f(x) := -ln(x) + x^2. In this paper we develop notions of "relative smoothness" and relative strong convexity that are determined relative to a user-specified "reference function" h(·) (which should be computationally tractable for algorithms), and we show that many differentiable convex functions are relatively smooth with respect to a correspondingly fairly simple reference function h(·). We extend two standard algorithms, the primal gradient scheme and the dual averaging scheme, to our new setting, with associated computational guarantees. We apply our new approach to develop a new first-order method for the D-optimal design problem, with associated computational complexity analysis. Some of our results have a certain overlap with the recent work [H. H. Bauschke, J. Bolte, and M. Teboulle, Math. Oper. Res., 42 (2017), pp. 330-348].
655 7 |a Article 
773 |t SIAM Journal on Optimization
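
Illustrative sketch (an addition, not part of the catalog record): the abstract's univariate example f(x) = -ln(x) + x^2 has no uniformly Lipschitz gradient on x > 0, since f''(x) = 2 + 1/x^2 is unbounded, yet f is 2-smooth relative to the reference function h(x) = x^2/2 - ln(x), because 2h - f = -ln(x) is convex. The minimal Python sketch below runs the primal gradient scheme on this example; the choice of h, the constant L = 2, and the closed-form step are assumptions made here for illustration, not details taken from the paper.

import math

# f(x) = -ln(x) + x^2 on x > 0: f''(x) = 2 + 1/x^2 is unbounded, so f has no
# globally Lipschitz gradient. Relative to h(x) = x^2/2 - ln(x) (assumed here),
# f is 2-smooth, since 2*h - f = -ln(x) is convex.

def grad_f(x):
    return 2.0 * x - 1.0 / x

def grad_h(x):
    return x - 1.0 / x

L = 2.0  # relative smoothness constant (any L making L*h - f convex works)

def primal_gradient_step(x):
    # Bregman step: x_next = argmin_y { <grad_f(x), y> + L * D_h(y, x) },
    # i.e. grad_h(x_next) = grad_h(x) - grad_f(x) / L. For this h, the
    # condition x_next - 1/x_next = c is a quadratic in x_next; take the
    # positive root to stay in the domain x > 0.
    c = grad_h(x) - grad_f(x) / L
    return (c + math.sqrt(c * c + 4.0)) / 2.0

x = 5.0  # arbitrary starting point in the domain
for _ in range(50):
    x = primal_gradient_step(x)

print(x, 1.0 / math.sqrt(2.0))  # iterates approach the minimizer x* = 1/sqrt(2)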