ROADEF 2026 - Sciencesconf.org

sciencesconf.org:roadef2026:687710

Outpatient surgery centers face significant operational challenges due to fluctuations in patient arrivals, heterogeneous processing durations, and variable resource availability. Traditional optimization approaches struggle to generalize under uncertainty, while machine learning methods often lack robustness when deployment conditions differ from training scenarios.

This work addresses bi-objective outpatient surgery scheduling by proposing a perturbation-based Q-learning variant (QLP) that explicitly incorporates operational uncertainties during the training phase. We model the problem, a three-stage Hybrid Flow Shop (preoperative, intraoperative, and postoperative stages), as a Markov Decision Process and jointly minimize makespan and cumulative waiting time.
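The abstract does not specify the MDP's transition model. The following is therefore only a minimal Python sketch of how the two objectives could be computed for one candidate schedule. It makes several assumptions:

- each stage has identical parallel resources;
- patients enter the preoperative stage in the chosen sequence and are served first-come-first-served at later stages;
- waiting time counts every queueing interval from arrival onward.

The function names and the weighted-sum `scalarize` helper (with a hypothetical weight `w`) are illustrative and are not the authors' formulation.

```python
import heapq


def simulate_hfs(sequence, arrivals, durations, machines):
    """Evaluate one patient sequence in a three-stage hybrid flow shop (sketch).

    sequence  : order in which patients are released to the preoperative stage
    arrivals  : arrivals[p] = arrival time of patient p
    durations : durations[p][s] = processing time of patient p at stage s
                (s = 0 preoperative, 1 intraoperative, 2 postoperative)
    machines  : machines[s] = number of identical parallel resources at stage s
    Returns (makespan, cumulative_waiting_time).
    """
    ready = {p: float(arrivals[p]) for p in sequence}
    order = list(sequence)
    total_wait = 0.0
    for s, m in enumerate(machines):
        free = [0.0] * m  # release times of the m resources (already a valid heap)
        completion = {}
        for p in order:
            release = heapq.heappop(free)  # earliest available resource
            start = max(ready[p], release)
            total_wait += start - ready[p]  # time spent queuing before stage s
            completion[p] = start + durations[p][s]
            heapq.heappush(free, completion[p])
        ready = completion
        # Assumption: later stages serve patients first-come-first-served
        # (sorted() is stable, so ties keep their previous order).
        order = sorted(order, key=lambda p: ready[p])
    return max(ready.values()), total_wait


def scalarize(makespan, waiting, w=0.5):
    """Hypothetical weighted-sum scalarization of the two objectives."""
    return w * makespan + (1.0 - w) * waiting
```

For example, two patients with identical stage durations of 1, 2 and 1 and one resource per stage yield a makespan of 6 and a cumulative waiting time of 2. Adding a second intraoperative room reduces these to 5 and 1.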

Our key contributions are: (i) a formulation of the scheduling problem within a reinforcement learning framework, (ii) a robust learning mechanism based on controlled stochastic perturbations of arrival times, processing durations, and resource availability, and (iii) an empirical demonstration that QLP outperforms standard Q-learning.
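One way to read the perturbation mechanism is as a tabular Q-learning loop in which every training episode is played on a freshly perturbed copy of the nominal instance. The sketch below rests on several hypothetical choices:

- a noise model with bounded arrival shifts, multiplicative duration noise, and occasional loss of one resource per stage;
- a state made of the patients already sequenced, with the next patient as the action;
- a terminal reward equal to minus a scalar cost returned by a caller-supplied `evaluate(sequence, instance)`.

None of these names, encodings or parameter values come from the abstract.

```python
import random
from collections import defaultdict


def perturb(instance, level, rng, arrival_jitter=10.0):
    """Return one perturbed copy of a nominal instance (hypothetical noise model).

    instance = (arrivals, durations, machines); level in [0, 1].
    - arrivals shift by up to +/- level * arrival_jitter time units (clipped at 0)
    - each duration is scaled by a factor drawn from [1 - level, 1 + level]
    - each stage loses one resource with probability `level` (never below 1)
    """
    arrivals, durations, machines = instance
    new_arr = [max(0.0, a + rng.uniform(-level, level) * arrival_jitter) for a in arrivals]
    new_dur = [[d * (1.0 + rng.uniform(-level, level)) for d in row] for row in durations]
    new_mach = [m - 1 if m > 1 and rng.random() < level else m for m in machines]
    return new_arr, new_dur, new_mach


def train_qlp(instance, evaluate, level=0.2, episodes=2000,
              alpha=0.1, gamma=1.0, epsilon=0.2, seed=0):
    """Tabular Q-learning where each episode uses a newly perturbed instance.

    State: tuple of patients already sequenced (hypothetical encoding).
    Action: next patient. Reward: -evaluate(sequence, sample) at the end, else 0.
    """
    rng = random.Random(seed)
    n = len(instance[0])
    q = defaultdict(float)  # q[(state, action)], zero-initialized
    for _ in range(episodes):
        sample = perturb(instance, level, rng)  # perturbation drawn once per episode
        state = ()
        while len(state) < n:
            actions = [p for p in range(n) if p not in state]
            if rng.random() < epsilon:  # epsilon-greedy exploration
                a = rng.choice(actions)
            else:
                a = max(actions, key=lambda p: q[(state, p)])
            nxt = state + (a,)
            if len(nxt) == n:
                target = -evaluate(list(nxt), sample)  # terminal reward only
            else:
                target = gamma * max(q[(nxt, p)] for p in range(n) if p not in nxt)
            q[(state, a)] += alpha * (target - q[(state, a)])
            state = nxt
    return q


def greedy_sequence(q, n):
    """Read the learned policy as a full patient sequence."""
    state = ()
    while len(state) < n:
        remaining = [p for p in range(n) if p not in state]
        state += (max(remaining, key=lambda p: q[(state, p)]),)
    return list(state)
```

Setting `level=0.0` trains on the nominal instance only, which plays the role of the standard Q-learning baseline. `evaluate` can be any function that returns a scalar cost, for instance a weighted sum of makespan and cumulative waiting time.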

Computational experiments show that QLP maintains minimal performance variance across all perturbation levels, whereas standard Q-learning degrades significantly at medium-to-high perturbation levels. These results confirm that perturbation-based training enables the agent to learn generalizable, robust scheduling policies capable of handling real-world operational uncertainties in healthcare environments.
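A robustness claim of this kind can be made measurable. One approach is to score a policy on many instances sampled at each perturbation level, then summarize each level by three numbers: the mean cost, its standard deviation, and the relative degradation against the lowest level.

The sketch assumes a hypothetical `sample_cost(level, rng)` callable that perturbs the instance and returns the cost of the policy's schedule. It does not reproduce the authors' experimental protocol or results.

```python
import random
import statistics


def robustness_profile(sample_cost, levels, n_samples=100, seed=0):
    """Summarize a policy's cost distribution at each perturbation level (sketch).

    sample_cost(level, rng) -> cost of the policy's schedule on one instance
    perturbed at `level` (hypothetical callable supplied by the caller).
    Returns {level: (mean, population_std, relative_degradation)}, where the
    degradation is measured against levels[0] and assumes a positive mean cost.
    """
    rng = random.Random(seed)
    summary = {}
    for level in levels:
        costs = [sample_cost(level, rng) for _ in range(n_samples)]
        summary[level] = (statistics.mean(costs), statistics.pstdev(costs))
    base = summary[levels[0]][0]
    return {lvl: (m, s, (m - base) / base) for lvl, (m, s) in summary.items()}
```

Under this reading, a robust policy keeps both the standard deviation and the relative degradation small as the level increases.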

Type	:	Abstract
Topics	:	[INVITED] Artificial Intelligence and Health
Keywords	:	Q-learning ; Scheduling ; Reinforcement Learning ; Robustness
