Large-scale network design (facility opening, capacity allocation, and flow routing) is naturally modeled as a mixed-integer optimization problem, but it quickly becomes intractable under heterogeneous costs, asymmetric distances, tight capacities, and realistic side constraints. Classical metaheuristics (GA, SA, VNS) improve solutions locally yet often stagnate or overfit to specific instance structures, while purely learning-based controllers also risk overfitting and lack interpretability. A hyper-heuristic is a high-level strategy that selects or composes low-level heuristics during search. We propose CK-HH, a two-layer Compound Knowledge Hyper-Heuristic that fuses encoded analytical, empirical, and contextual knowledge with an adaptive bandit/RL controller that selects among low-level heuristics (e.g., swaps, greedy insertion, SA/VNS, and fix-and-optimize). The controller's state combines solution and runtime features, and its reward balances improvement, computation time, and diversity. This hybrid provides interpretable, knowledge-driven guidance early in the search and data-driven intensification near convergence, improving scalability and stability on large instances.
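To make the controller idea concrete, here is a minimal sketch, not the CK-HH implementation. It assumes a UCB1 bandit as the controller. Encoded knowledge enters as a prior mean reward per heuristic, weighted as pseudo-observations. The reward is a weighted sum of relative improvement, a time penalty, and a diversity bonus. The names `KnowledgeUCB` and `reward`, the weights `w_imp`/`w_time`/`w_div`, the prior values, and the toy uncapacitated facility-location instance with only two low-level heuristics (`flip`, `swap`) are all hypothetical. The sketch also ignores state features; a contextual bandit or RL policy would condition on them.

```python
import math
import random
import time


class KnowledgeUCB:
    """Hypothetical UCB1 selector over low-level heuristics.

    Encoded knowledge is injected as a prior mean reward per heuristic,
    counted as `prior_weight` pseudo-observations (an assumption).
    """

    def __init__(self, names, prior=None, prior_weight=1.0, c=0.5):
        if prior_weight <= 0:
            raise ValueError("prior_weight must be positive")
        prior = prior or {}
        self.names = list(names)
        self.c = c  # exploration strength
        self.n = {h: prior_weight for h in self.names}
        self.total = {h: prior_weight * prior.get(h, 0.0) for h in self.names}

    def select(self):
        log_t = math.log(1.0 + sum(self.n.values()))

        def score(h):  # mean reward plus UCB exploration bonus
            return self.total[h] / self.n[h] + self.c * math.sqrt(log_t / self.n[h])

        return max(self.names, key=score)

    def update(self, h, r):
        self.n[h] += 1
        self.total[h] += r


def reward(old_cost, new_cost, seconds, diversity, w_imp=1.0, w_time=0.1, w_div=0.2):
    """Hypothetical reward: relative improvement - time penalty + diversity bonus."""
    improvement = max(0.0, (old_cost - new_cost) / max(abs(old_cost), 1e-9))
    return w_imp * improvement - w_time * seconds + w_div * diversity


def cost(open_, fixed, assign):
    """Toy uncapacitated facility location: each customer uses its cheapest open facility."""
    if not any(open_):
        return math.inf
    opening = sum(f for f, o in zip(fixed, open_) if o)
    serving = sum(min(row[i] for i, o in enumerate(open_) if o) for row in assign)
    return opening + serving


def flip(open_, rng):  # open or close one random facility
    s = list(open_)
    i = rng.randrange(len(s))
    s[i] = 1 - s[i]
    return s


def swap(open_, rng):  # close one open facility and open one closed facility
    s = list(open_)
    opened = [i for i, o in enumerate(s) if o]
    closed = [i for i, o in enumerate(s) if not o]
    if opened and closed:
        i, j = rng.choice(opened), rng.choice(closed)
        s[i], s[j] = 0, 1
    return s


def run(iters=200, seed=0):
    rng = random.Random(seed)
    m, k = 6, 20  # facilities, customers
    fixed = [rng.uniform(5, 15) for _ in range(m)]
    assign = [[rng.uniform(1, 10) for _ in range(m)] for _ in range(k)]
    heuristics = {"flip": flip, "swap": swap}
    sel = KnowledgeUCB(heuristics, prior={"flip": 0.1, "swap": 0.05})
    sol = [1] * m
    start_cost = best = cost(sol, fixed, assign)
    for _ in range(iters):
        h = sel.select()
        t0 = time.perf_counter()
        cand = heuristics[h](sol, rng)
        c = cost(cand, fixed, assign)
        dt = time.perf_counter() - t0
        div = sum(a != b for a, b in zip(sol, cand)) / m  # normalized Hamming distance
        sel.update(h, reward(best, c, dt, div))
        if c < best:  # accept only improving moves (plain hill climbing)
            sol, best = cand, c
    return start_cost, best, sel
```

Calling `run()` returns the starting cost, the best cost found, and the selector, whose `n` counts show how often each heuristic was chosen. Treating the prior as pseudo-observations lets knowledge dominate the first selections and lets observed rewards take over as counts grow. This loosely mirrors the "interpretable guidance early, data-driven intensification later" behavior described above.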

