Flash Talks

List of Speakers, Titles and Abstracts

 A group of n agents with numerical preferences for each other and additive utility over their neighbors are to be assigned to the n seats of a dining table. An arrangement is stable if no two agents strictly prefer each other’s seats. We study two natural topologies (circular and panel tables, i.e., cycle and paths) and show that deciding whether a stable arrangement exists is NP-hard, but only if the number of agent classes is unbounded and sufficiently many numerical values are allowed in the preference matrix. We moreover propose a complete characterization of the existence of stable arrangements based on the number of distinct values and the number of agent classes.

We investigate notions of group fairness in bipartite matching markets involving agents and jobs, where agents are grouped based on sensitive attributes. Employing a geometric approach, we characterize how many agents can be matched in each group, showing that the set of feasible matchings forms a (discrete) polymatroid. We focus on strong fairness notions and we explore the Price of Fairness (PoF), i.e., the loss in optimality when imposing such fairness constraints. We show that the PoF is bounded independently of the number of agents and jobs, but may be linear in the number of groups. Finally, we provide improved bounds with additional structural properties, or with stochastic graphs.

Social Influence and Communication have impact on politics, policy implementation, and, in general, consumption. We propose a three-stage game between a platform and its users to explore the dynamics of information between them. In the first stage, the platform privately persuades users based on their types. Stage two involves users communicating the platform's messages among themselves. In the last stage, users use the messages they hold after the communication stage and their types to determine their actions. We found that under truthful revelation of messages, the equilibrium actions depend on the direct and indirect effect of messages and the user's type. We characterise the communication networks for which truthful communication between users occurs. Finally, we characterise platform-optimal message structure and examine comparative statics with respect to the platform utility function.

We study a simple model of algorithmic collusion in which Q-learning algorithms repeatedly play a prisoner's dilemma and allow players to choose different exploration policies. We characterize behavior of such algorithms with asymmetric policies for extreme values and prove that any Nash equilibrium features some cooperative behavior. We further investigate the dynamics for general profiles of exploration policy by running extensive numerical simulations which indicate symmetry of equilibria, and give insight for their distribution.

We consider a given site equipped with a Charging Station (CS), where the Electric Vehicle (EV) users parking duration follow an exponential distribution, and are clustered into a finite set of classes depending on their expected parking duration. The number of EVs of each class at the CS therefore follows a Markov chain, that the CS operator can control through parking pricing, with knowledge only on the total number of EVs parked. The problem is formulated as an infinite horizon and discounted Partially Observable Markov Decision Process, with reward the CS operator’s profit.

The analysis and design of crowdsourcing platforms received great attention in the information systems domain. Ideally, one would like to study the effort that contestants exercise in equilibrium and analyze different designs. Contest theory is the main theoretical approach to analyze such competitions. The analysis of equilibria is notoriously hard. Almost all results assume complete information, and even then, analytical results are challenging. Recent trends in equilibrium learning allow computing equilibria of Bayesian games with high precision. We explore these techniques in contests and provide equilibrium predictions for games with and without an analytical solution to address managerial research questions.

Dynamic pricing is used extensively in the airlines and railways sectors, but very little in car hire, despite the close proximity of these sectors. Our objective here is to define, analyze and test an algorithmic decision approach for dynamic pricing for a car rental company, in an evolving and competitive context. To do that, machine learning methods will be used, in particular reinforcement learning, to develop a decision approach that dynamically learns to select the best pricing policy according to the market's prices and demand evolution over time.

The Colonel Blotto game is a resource allocation game where players decide where to focus their forces between different battlefields. We extend the standard Blotto game to a dynamic stochastic setting, in a time-continuous, two-player, zero-sum game. Using the dynamic programming principle, we explicitly characterize some Nash equilibrium strategies as well as the value of the game through a Hamilton-Jacobi-Bellman equation admitting a smooth solution. We formulate the game generally enough to allow for various rewards, as well as various drivers of randomness. We also present an application to cyber security using graph theory.

We focus on the design and price of supplemental training database. The designer versions the database and determines the associated tariff to screen data buyers with private database different in predictive power across multiple states. The predictive powers of the private database in different states determine the horizontal preference of the buyer while the overall predictive power determines the vertical preference. We investigate the joint design of the multi-dimensional screening and information, to implement revenue-maximizing selling policy. The optimal selling mechanism is shaped by interaction between horizontal and vertical preference and interaction between external and internal horizontal preference.

We study game dynamics in the population protocol model: n agents each maintain a current local strategy and interact in pairs uniformly at random. Upon each interaction, the agents play a two-person game and can subsequently update their strategies according to a fixed local algorithm. We ask how the distribution over agent strategies evolves over a sequence of interactions, and we introduce a new distributional equilibrium concept to quantify the quality of such distributions. We design a family of simple local algorithms for a class of repeated prisoner’s dilemma games, and we prove their convergence to approximate distributional equilibria. 

In this work we seek to decompose a game into simpler components where the day-to-day behavior of FTRL learning schemes is well understood. Focusing initially for concreteness on Exponential Weights schemes in continuous time on finite games, we introduce via a geometric construction the class of incompressible games, i. complementary (in a precise sense) to potential games and ii. exhibiting Poincaré recurrence under EW. Significantly broadening the spectrum of our investigation, we then discuss the convergence/recurrence properties of continuous-time FTRL and discrete-time mirror descent, both in its vanilla and optimistic variants, on the class of generalised harmonic games (of which incompressible games are a special case).

This paper reviews Unobservable Markov Decision Processes (UMDPs) with the long-run average objective, emphasizing their relationship with decision problems and decidability. We introduce a novel theoretical class of UMDPs that leverages the property of weak ergodicity in products of stochastic matrices. We then demonstrate that approximating the value within this class is decidable. Finally, we establish that determining the exact value remains undecidable.

Mean field control (MFC) problems have been introduced to study social optima in very large populations of strategic agents. The main idea is to consider an infinite population and to simplify the analysis by using a mean field approximation. These problems can also be viewed as optimal control problems for McKean-Vlasov dynamics. They have found applications in a wide range of fields, from economics and finance to social sciences and engineering. Usually, the goal for the agents is to minimize a total cost which consists in the integral of a running cost plus a terminal cost. In this work, we consider MFC problems in which there is no terminal cost but, instead, the terminal distribution is prescribed as in optimal transport problem. By analogy with MFC, we call such problems mean field optimal transport problems (or MFOT for short) since they can be viewed as a generalization of classical optimal transport problems when mean field interactions occur in the dynamics or the running cost function. We propose three numerical methods based on neural networks. The first one is based on directly learning an optimal control. The second one amounts to solve a forward-backward PDE system characterizing the solution. The third one relies on a primal-dual approach. We illustrate these methods with numerical experiments conducted on two families of examples. 

We study how pricing affects the division of surplus among buyers in auctions for multiple units. Our equity objective may be important, e.g., for competition concerns in downstream markets, complementing the long-standing debate on revenue and efficiency. We study a canonical model of auctions for multiple indivisible units with unit demand buyers and valuations with a private and a common component and consider all pricing rules that are a mixture (i.e., a convex combination) of pay-as-bid and uniform pricing. We propose the winners' empirical variance (WEV), the expected empirical variance of surplus among the winners, as a metric for surplus equity. We show that, for a range of private-common value proportions, a strictly interior mix of pay-as-bid and uniform pricing minimizes WEV. From an equity perspective, auctions with a higher private value component benefit from more price discrimination, whereas only auctions with a sufficiently high common value justify a more uniform pricing rule. We provide a criterion under which strictly mixed pricing dominates uniform pricing, a partial ranking of different mixed pricing formats, and bounds on the WEV-minimizing pricing under the assumption of log-concave signal distributions. In numerical experiments, we further illustrate the WEV-minimal pricing as a function of the private-common-value mix.

Given a compact convex strategy set $S$,  the mean-field stochastic evolutionary process $X_N$ on $S^N$ for an $N$-player population game $\mathfrak{g}_N$ is introduced. The existence and uniqueness of the invariant measure $\pi_N$ for $X_N$ is established for continuum strategy potential games. The empirical version $\xi_N$ induced by $X_N$ upon linear interpolation is shown to be deterministically approximated over every finite time horizon $T>0$ under the weak topology by a probability-measure valued ODE which belongs to the class of McKean-Vlasov diffusions. Finally, using the above results, the McKean-Vlasov diffusion is shown to be close in the 'medium run' to some probability measure $\mu^\circ$ as $N$ approaches infinity.

A key contributor to the success of society is our ability to meaningfully cooperate. Game-theoretic reasoning shows, however, that we are often caught in a dilemma between prioritising our own welfare over other’s. In this work, we study such dilemmas through the 'selfishness level', a standard game-theoretic metric which quantifies the extent to which a game's payoffs incentivise selfish behaviours. Using this framework, we derive the conditions under which dilemmas can be resolved and, additionally, produce a first-step towards introducing this metric to Markov games. Finally, we present experimental results indicating the positive effects of selfishness level guided mechanisms.

Consider Markov Decision Processes (MDPs) with reachability and stochastic shortest path objectives. Value Iteration (VI) iteratively applies local updates called Bellman updates. Many VI-based variants require exponentially many Bellman updates even for Markov Chains (MCs). For MCs, we present a preprocessing, i.e., a polynomial-time, linear-space, and discrete algorithm, after which only sub-exponentially many Bellman updates are required to compute the value. Our new approach leads to a practical algorithm for MDPs. Experimental results show that our approach provides a considerable improvement over existing VI methods on several benchmark examples from the literature.

We introduce the novel Coarse Payoff-Assessment Learning (CPAL) model that describes reinforcement learning in agents who tend to simplify the choice problems they face by focusing on the aggregate consequences of choosing over different clusters of alternatives (pre-arranged into exogenously defined similarity classes) instead of assessing each alternative individually. This technique is especially relevant in choice problems with uncertainty where the overall set of alternatives is too large to be considered extensively. We consider a smooth version of such a learning model and apply it to families of decision problems that differ in the set of available similarity classes. We first note that the steady-states of the learning dynamics correspond to a smooth (quantal) refinement of the Valuation Equilibrium. We present examples of choice problems that reveal the possibility of multiple equilibria and verify the local asymptotic stability of pure equilibria. In contrast, we also identify conditions under which a unique, fully mixed equilibrium emerges - characterized by identical valuations across all similarity classes involved in the mixing, although the agent almost surely selects the alternative with the highest valuation. In particular, when the trivial decision problems involving a single similarity class lead to sufficiently high payoffs, we show that the latter arises, and we prove that the unique steady-state (that involves uniform randomization) is globally asymptotically stable in the learning dynamics.

In a continuous-time moral hazard problem, an agent chooses to start shirking either if the multistage project is completed or if the project is unlikely to be completed before the final date. A principal wants to convince the agent to incur effort for as long as possible and can design the flow of information about the progress of the project. If the project is sufficiently promising ex ante, then the principal commits to providing only the good news that the project is accomplished. Otherwise, the principal commits to providing not only good news but also bad news that a project milestone has not been reached by an interim date.

How well do learning algorithms perform in strategic environments? We consider a Q-learning algorithm that plays against a sequence of myopic players. We identify the limit outcomes, and distinguish according to the information available to the short--run players. If the algorithm is transparent, a Folk-Theorem-like result emerges. However, if the algorithm is opaque and the short-run players observe its actions with little noise, the limit outcome is unique and corresponds to the Stackelberg payoff, state-by-state.

In a market where each period introduces a new buyer with independently and identically distributed WTP and firms use Q-learning for pricing, we identify two novel mechanisms that lead to collusive outcomes. Under asymmetric information structure, a firm with an informational advantage employs a Bait-and-Restrained-Exploit strategy, surrendering profits on some signals by setting higher prices, while exploiting limited profits on the remaining signals by setting much lower prices. Under a symmetric information structure, competition on some signals facilitates convergence to supra-competitive prices on the remaining signals. Extensive data usage by AIs in competitive markets can diminish collusion and industry profits.

I study the revenue guarantee across all Bayes coarse correlated equilibria, starting with the analysis in a first-price auction and common value setting. I first show that it suffices to only examine identical-play equilibria, and based on it, I give characteristic functions for the revenue guarantee assuming a continuous value distribution. The asymptotic revenue guarantee of a continuous distribution is equal to that under Bayes correlated equilibria in the literature. Moreover, I characterize the revenue guarantee for a binary value distribution, which gives the minimum revenue guarantee level among all value distributions. Finally, I show that all the results will still apply under a standard auction and symmetric prior environment.