ROADEF 2026
Evaluating the Potential of LLMs in Solving Combinatorial Optimization Problems
Rym Taleb 1, Fouad Maliki 1, Nadjib Brahimi 2, Sadeque Hamdan 3,4
1 : École Supérieure en Sciences Appliquées de Tlemcen (ESSA-Tlemcen)
2 : Rennes School of Business
Ministère de l'Enseignement supérieur et Recherche
3 : Bangor Business School
4 : Bangor University

Beyond recent technical progress, a key reason to study LLMs in optimization is their promise of making better decision-making accessible to non-experts. Many small businesses have neither in-house OR expertise nor the budget to hire specialists, so they often rely on simple rules or intuition. With LLMs available through easy chat interfaces, a manager can describe a problem in plain language and ask for help with tasks such as scheduling, routing, or choosing facility locations. This raises a practical question: can an LLM act as a low-cost "optimization assistant" for organizations that would otherwise not use OR tools?

At the same time, LLM outputs can be unreliable. Unlike standard solvers and well-tested heuristics, LLMs do not guarantee good solutions and may produce confident but wrong answers. This makes it important to test what free, publicly available LLMs can realistically deliver on known optimization problems. In this paper, we evaluate several LLMs on the p-median facility location problem under different prompt designs, including an iterative interaction strategy. Our goal is to understand when LLMs can support non-expert decision-makers, and when they still fall short compared with established optimization methods.

Recent advances in large language models have opened new perspectives for combinatorial optimization, complementing traditional exact and metaheuristic methods. Several recent studies have highlighted the potential of LLMs to generate or adapt heuristics, including a structured review of LLM usage in metaheuristics [1] and an analysis of the integration of Generative AI in Operations Research [2]. More targeted contributions also demonstrate the ability of LLMs to discover new operators for real-world problems, such as VRPAGENT for vehicle routing problems [3].

In this context, our motivation is to provide a simple and transparent evaluation of what publicly accessible LLMs (free versions) can realistically produce when faced with optimization problems. Our contribution focuses on an initial case within facility location problems, namely the p-median problem [4], where we compare several LLMs under varying levels of prompt detail and through an iterative interaction strategy designed to progressively improve solution quality. The p-median problem is a well-known NP-hard discrete optimization problem that consists of selecting p facilities from a set of candidates so as to minimize the total distance between each demand point and its assigned facility.
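The quality criterion used throughout can be sketched as follows. This is a minimal illustration, not the paper's evaluation code; it assumes candidate facility sites coincide with the demand points, Euclidean distances, and demand weights of 1 (benchmark instances may use distance matrices instead):

```python
import math

def pmedian_objective(points, demands, facilities, p):
    """Evaluate a candidate p-median solution: check feasibility and
    return the total demand-weighted distance from each demand point
    to its nearest open facility."""
    # Feasibility: exactly p distinct facilities chosen among the candidates.
    if len(set(facilities)) != p:
        raise ValueError(f"infeasible: expected {p} distinct facilities")
    total = 0.0
    for pt, w in zip(points, demands):
        # Each demand point is assigned to its closest open facility.
        d = min(math.dist(pt, points[f]) for f in facilities)
        total += w * d
    return total

# Toy instance: 5 demand points on a line, unit demands, p = 2.
pts = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0)]
dem = [1, 1, 1, 1, 1]
print(pmedian_objective(pts, dem, facilities=[1, 3], p=2))  # → 3.0
```

Opening facilities at points 1 and 3 serves each cluster from its middle, which is exactly the trade-off the LLMs are asked to reason about.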

Although our experimentation concentrates on the p-median problem, the proposed methodology is easily extendable to other combinatorial optimization problems. The data come from well-known benchmark instances, allowing direct comparison of solution quality in terms of objective value, feasibility, and robustness. Our approach relies on three modes of LLM usage: autonomous mode, guided assistant mode, and optimization expert mode. In Phase 1 (autonomous mode), the LLM receives only the textual problem description and benchmark data, with no imposed algorithm, solution structure, or code. For this study, we evaluate ChatGPT, DeepSeek, Gemini, Grok, and Claude, all in their publicly accessible free versions. The evaluation criteria for Phase 1 include solution quality, constraint feasibility, reasoning coherence (assessed from the clarity, consistency, and justification of the solution method provided by the LLM), stability across repeated submissions, and the model's ability to correct or improve its solution over successive iterations. The process incorporates an iterative loop: when a solution is incorrect or infeasible, the model is prompted to verify or recalculate it, and its behavior throughout these iterations is analyzed.

