Near-optimal no-regret algorithms for zero-sum games

We propose a new no-regret learning algorithm. When used against an adversary, our algorithm achieves average regret that scales as O(1/√T) with the number T of rounds. This regret bound is optimal but not rare, as there are a multitude of learning algorithms with this regret guarantee. However, wh...
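The baseline O(1/√T) guarantee mentioned in the abstract can be illustrated with a minimal sketch. The code below is not the paper's algorithm; it runs the classic multiplicative-weights (Hedge) method for both players of a zero-sum game and measures their average regret empirically. The payoff matrix, horizon, and learning rate are illustrative choices, and the paper's contribution is precisely a different algorithm whose self-play regret improves on this baseline to O(ln T/T).

```python
import math

def hedge_play(payoff, T=2000, eta=None):
    """Run multiplicative weights (Hedge) for both players of the
    zero-sum game `payoff` (row player's gains) for T rounds and
    return each player's average regret.  Plain Hedge guarantees
    only O(sqrt(ln n / T)) average regret, the baseline rate."""
    n, m = len(payoff), len(payoff[0])
    eta = eta or math.sqrt(math.log(max(n, m)) / T)  # standard tuning
    w_row = [1.0] * n          # row player's strategy weights
    w_col = [1.0] * m          # column player's strategy weights
    row_cum = [0.0] * n        # cumulative payoff of each pure row strategy
    col_cum = [0.0] * m        # cumulative payoff of each pure column strategy
    row_gain = col_gain = 0.0  # realized cumulative expected payoffs
    for _ in range(T):
        sr, sc = sum(w_row), sum(w_col)
        p = [w / sr for w in w_row]   # row player's mixed strategy
        q = [w / sc for w in w_col]   # column player's mixed strategy
        # Expected payoff of each pure strategy against the opponent's mix;
        # the column player's payoff is the negative of the row player's.
        u_row = [sum(payoff[i][j] * q[j] for j in range(m)) for i in range(n)]
        u_col = [-sum(p[i] * payoff[i][j] for i in range(n)) for j in range(m)]
        row_gain += sum(p[i] * u_row[i] for i in range(n))
        col_gain += sum(q[j] * u_col[j] for j in range(m))
        for i in range(n):
            row_cum[i] += u_row[i]
            w_row[i] *= math.exp(eta * u_row[i])  # multiplicative update
        for j in range(m):
            col_cum[j] += u_col[j]
            w_col[j] *= math.exp(eta * u_col[j])
    # Average regret: best fixed strategy in hindsight minus realized payoff.
    return (max(row_cum) - row_gain) / T, (max(col_cum) - col_gain) / T

# A small illustrative 2x2 zero-sum game with value 0.
r_regret, c_regret = hedge_play([[1.0, -1.0], [-0.5, 0.5]])
```

Both players observe only their own strategies' payoffs against the opponent's play, matching the uncoupled flavor of the dynamics the abstract describes, though the paper's strongly-uncoupled setting imposes further restrictions (limited private storage, no knowledge of the opponent's number of strategies).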

Full description

Bibliographic Details
Main Authors: Daskalakis, Constantinos (Contributor), Deckelbaum, Alan T. (Contributor), Kim, Anthony (Author)
Other Authors: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science (Contributor), Massachusetts Institute of Technology. Department of Mathematics (Contributor)
Format: Article
Language: English
Published: Society for Industrial and Applied Mathematics, 2012-09-21T15:32:49Z.
Subjects:
Online Access: Get fulltext
LEADER 02370 am a22002293u 4500
001 73097
042 |a dc 
100 1 0 |a Daskalakis, Constantinos  |e author 
100 1 0 |a Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science  |e contributor 
100 1 0 |a Massachusetts Institute of Technology. Department of Mathematics  |e contributor 
100 1 0 |a Daskalakis, Constantinos  |e contributor 
100 1 0 |a Deckelbaum, Alan T.  |e contributor 
700 1 0 |a Deckelbaum, Alan T.  |e author 
700 1 0 |a Kim, Anthony  |e author 
245 0 0 |a Near-optimal no-regret algorithms for zero-sum games 
260 |b Society for Industrial and Applied Mathematics,   |c 2012-09-21T15:32:49Z. 
856 |z Get fulltext  |u http://hdl.handle.net/1721.1/73097 
520 |a We propose a new no-regret learning algorithm. When used against an adversary, our algorithm achieves average regret that scales as O(1/√T) with the number T of rounds. This regret bound is optimal but not rare, as there are a multitude of learning algorithms with this regret guarantee. However, when our algorithm is used by both players of a zero-sum game, their average regret scales as O(ln T/T), guaranteeing a near-linear rate of convergence to the value of the game. This represents an almost-quadratic improvement on the rate of convergence to the value of a game known to be achieved by any no-regret learning algorithm, and is essentially optimal as we show a lower bound of Ω(1/T). Moreover, the dynamics produced by our algorithm in the game setting are strongly-uncoupled in that each player is oblivious to the payoff matrix of the game and the number of strategies of the other player, has limited private storage, and is not allowed funny bit arithmetic that can trivialize the problem; instead he only observes the performance of his strategies against the actions of the other player and can use private storage to remember past played strategies and observed payoffs, or cumulative information thereof. Here, too, our rate of convergence is nearly-optimal and represents an almost-quadratic improvement over the best previously known strongly-uncoupled dynamics. 
520 |a National Science Foundation (U.S.) (CAREER Award CCF-0953960) 
546 |a en_US 
655 7 |a Article 
773 |t Proceedings of the Twenty-Second Annual ACM-SIAM Symposium on Discrete Algorithms (SODA '11)