Online Optimization and Learning in Games

Prof. Panayotis Mertikopoulos


The goal of online learning is the study of decision-making in changing environments. The changes in the learner's environment could be either exogenous (i.e., independent of the learner's decisions, such as the weather affecting the time of travel) or endogenous (that is, depending on the learner's decisions, as in a game of poker), but the goal is always the same: to make better, more informed decisions over time. As a result, this field has found a wide range of applications, from traffic planning and online advertisement placement, to web ranking and deep network training.

This short course is intended to provide an introduction to online optimization and learning from a multi-agent, game-theoretic viewpoint. The course will consist of two main parts, depending on the type of games and problems considered – finite, or continuous. In particular, we will cover the basics of no-regret learning in finite games (focusing on the multiplicative/exponential weights algorithm and its properties), and the basics of online optimization in continuous games (online gradient descent, follow the regularizer leader, etc.). We will also discuss the impact of the information available to the players, as well as a range of concrete applications to machine learning and operations research, with a specific focus on traffic routing and generative AI models.



1. Tor Lattimore and Csaba Szepesvári, Bandit algorithms, Cambridge University Press, Cambridge, UK, 2020.

2. S. Shalev-Shwartz, Online learning and online convex optimization, Foundations and Trends in Machine Learning 4 (2011), no. 2, 107–194.

3. Lecture notes (To be added)