ROADEF 2026>
Reinforcement Learning-Enhanced Operator Selection in Adaptive Large Neighborhood Search: Application to CVRP and VRPTW
Asma Cherrered  1@  , Ammar Oulamara  2@  , Wahiba Ramdane Cherif-Khettaf  3@  
1 : Laboratoire Lorrain de Recherche en Informatique et ses Applications
Lorraine University -- LORIA
2 : Laboratoire Lorrain de Recherche en Informatique et ses Applications
Lorraine University -- LORIA
3 : Laboratoire Lorrain de Recherche en Informatique et ses Applications
Lorraine University -- LORIA

 

In the Vehicle Routing Problem with Time Windows (VRPTW) a fleet of capacitated vehicles must serve customers within individual time intervals while minimizing routing cost. It extends the Capacitated Vehicle Routing Problem (CVRP) by adding time-window constraints. Both problems are NP-hard, and exact methods quickly become impractical for medium and large
instances.

Large Neighborhood Search (LNS) and Adaptive Large Neighborhood Search (ALNS) are widely used for VRP variants. LNS alternates between a destroy step, which removes part of the current solution, and a repair step, which reinserts removed customers. ALNS enriches this framework with a set of destroy and repair operators whose selection probabilities are adapted online according to their observed performance [5, 3]. In most implementations, operator weights are updated via a manually designed scoring and roulette wheel selection based on aggregate historical performance.

A recent meta-study [6] shows that the classical adaptive layer often brings only limited average improvements (0.14%). We address this issue by introducing a data-driven operator selection mechanism for ALNS. Several ALNS-RL and LNS-RL variants have been proposed for routing problems, where Reinforcement Learning is used to guide operator choice or parameter settings.

In this work, we propose a unified data-driven ALNS framework applied to CVRP and VRPTW. The RL agent observes structured information about the solution and search process and learns to control (i) the choice of destroy operator, (ii) the destruction degree, and (iii) a temperature parameter impacting acceptance decisions. The approach is first implemented and evaluated on standard CVRP benchmarks, and is currently being extended to VRPTW.

 


Chargement... Chargement...