Computer Science and Game Theory

Sequential Learning in a strategical environment


Author: Etienne Boursier

In sequential learning (or repeated games), data is acquired and processed on the fly, and an algorithm (or strategy) learns to behave as well as if it had known the state of nature, e.g., the reward distributions, in hindsight. In many real-life scenarios, learning agents are not alone: they interact with, or interfere with, many others. As a consequence, their decisions affect the other agents and, by extension, the reward-generating process. We aim at studying how sequential learning algorithms behave in strategic environments, when facing and interfering with each other. This thesis thus considers different problems in which interactions between learning agents arise, and provides computationally efficient algorithms with good performance (small regret) guarantees.

When agents are cooperative, the difficulty of the problem comes from its decentralized aspect, as each agent takes decisions based solely on its own observations. In this case, we propose algorithms that not only coordinate the agents to avoid negative interference, but also leverage these interferences to transfer information between agents, thus reaching performance similar to centralized algorithms. With competing agents, we propose algorithms with both satisfying performance and strategic guarantees (e.g., epsilon-Nash equilibria).

This thesis mainly focuses on the problem of multiplayer bandits, which combines different interactions between learning agents within a formalized online learning framework. For both the cooperative and the competing cases, we propose algorithms whose performance is comparable to the centralized case.

Other sequential learning instances are also considered. For decentralized queuing systems, we propose a strategy that reaches centralized performance. For online auctions, we suggest balancing short- and long-term rewards through a utility/privacy trade-off; it is formalized as an optimization problem that is equivalent to a Sinkhorn divergence and benefits from recent advances in Optimal Transport. We also study social learning with reviews, when the quality of the product varies over time.
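
The multiplayer bandit setting at the core of the thesis can be made concrete with a small simulation of the standard collision model: several players simultaneously pull arms of the same bandit, and a player earns the arm's reward only when no other player pulled that arm. The sketch below is purely illustrative and not taken from the thesis; the Bernoulli rewards, the sizes `n_arms`, `n_players`, `horizon`, and the uniform-random baseline are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

n_arms, n_players, horizon = 5, 3, 1000      # hypothetical sizes
means = rng.uniform(0.2, 0.9, size=n_arms)   # unknown Bernoulli means

def play_round(choices):
    """One round of the usual collision model: a player receives the
    arm's reward only if no other player pulled the same arm,
    otherwise the reward is zero."""
    rewards = np.zeros(n_players)
    for p, arm in enumerate(choices):
        collided = np.sum(choices == arm) > 1
        if not collided:
            rewards[p] = rng.binomial(1, means[arm])
    return rewards

# Naive uniform-random baseline, just to exercise the model.
total = 0.0
for t in range(horizon):
    choices = rng.integers(0, n_arms, size=n_players)
    total += play_round(choices).sum()

# The centralized optimum assigns the n_players best arms to the players.
best = np.sort(means)[-n_players:].sum()
print(f"average reward per round: {total / horizon:.2f} "
      f"(centralized optimum ≈ {best:.2f})")
```

Even this naive baseline exposes the two difficulties mentioned above: collisions destroy reward, and each player only sees its own feedback. The algorithms of the thesis coordinate the players to avoid collisions and even exploit collisions as an implicit communication channel.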
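For the online auction part, the abstract only states that the utility/privacy trade-off is equivalent to a Sinkhorn divergence. As a reminder of the computational tool involved (and not of the thesis's actual formulation), here is a minimal sketch of Sinkhorn's fixed-point iterations for entropic optimal transport between two discrete distributions, together with one common (non-entropy-corrected) variant of the debiased divergence; the cost matrices, regularization `eps`, and iteration count are illustrative assumptions.

```python
import numpy as np

def entropic_ot(a, b, C, eps=0.1, n_iter=500):
    """Sinkhorn iterations: scale the Gibbs kernel K = exp(-C/eps)
    until the coupling P = diag(u) K diag(v) has marginals a and b,
    then return the transport cost <P, C>."""
    K = np.exp(-C / eps)
    u, v = np.ones_like(a), np.ones_like(b)
    for _ in range(n_iter):
        u = a / (K @ v)
        v = b / (K.T @ u)
    P = u[:, None] * K * v[None, :]
    return float(np.sum(P * C))

def sinkhorn_divergence(a, b, C_ab, C_aa, C_bb, eps=0.1):
    """Debiased cost: S(a, b) = OT(a, b) - (OT(a, a) + OT(b, b)) / 2."""
    return entropic_ot(a, b, C_ab, eps) - 0.5 * (
        entropic_ot(a, a, C_aa, eps) + entropic_ot(b, b, C_bb, eps))

# Toy example: two uniform distributions on points of the real line.
x, y = np.linspace(0, 1, 4), np.linspace(0, 1, 5)
a, b = np.full(4, 1 / 4), np.full(5, 1 / 5)
print(sinkhorn_divergence(a, b,
                          (x[:, None] - y[None, :]) ** 2,
                          (x[:, None] - x[None, :]) ** 2,
                          (y[:, None] - y[None, :]) ** 2))
```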