ROADEF 2026>
Reinforcement learning for online stochastic matching
Antoine Lunven  1@  , Ana Bušić  1@  , Jean-Michel Fourneau  1, 2@  
1 : Apprentissage, graphes et optimisation distribuée
Département d'informatique - ENS-PSL, Centre Inria de Paris
2 : Données et algorithmes pour une ville intelligente et durable - DAVID
Université de Versailles Saint-Quentin-en-Yvelines

We consider online stochastic matching model, with items arriving over time according to a stochastic process. A matching model is defined as a set of classes and a set a compatibilities among these classes of items, which are represented by a connected compatibility graph. Items enter the system one by one and their class is drawn according to a probability distribution. We consider First Come First Matched Policy, i.e. uppon arrival an item is matched with the oldes compatible item in the system, and stored in a queue if there is no compatible items. We propose to use Reinforcement Learning (RL) algorithms to investigate the impact of graph topology on the system performance (e.g. what edges to add to the graph in order to minimize the queue size) for the FCFM policy, for which there is a known closed form steady-state distribution. 


Chargement... Chargement...