This work introduces MADRL-GA, a hybrid approach that integrates Multi-Agent Deep Reinforcement Learning with a Genetic Algorithm to improve traffic coordination at unsignalized intersections. The method addresses key limitations of MADRL, including reward misalignment, hyperparameter sensitivity, and instability during multi-agent learning. The GA optimizes the reward-function coefficients offline by evaluating candidate vectors across multiple performance metrics such as throughput, waiting time, and collision avoidance. The optimized reward parameters are then refined during training through small perturbations to maintain local adaptability. The resulting adaptive reward formulation stabilizes the learning process and enhances policy convergence. Preliminary experiments in a two-intersection simulation environment show that MADRL-GA achieves smoother and faster reward progression, outperforming training based solely on perturbation-based reward shaping. These initial findings highlight the potential of combining evolutionary optimization with reinforcement learning for complex traffic management problems.
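The offline GA stage described above — evolving candidate reward-coefficient vectors against multiple performance metrics, then refining the winner with small perturbations during training — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the fitness function `evaluate` is a synthetic surrogate standing in for the traffic simulator (which would report throughput, waiting time, and collisions), and `ga_optimize` and `perturb` are hypothetical helper names.

```python
import random

random.seed(0)  # deterministic for illustration

def evaluate(weights):
    """Surrogate fitness for a reward-coefficient vector.

    In MADRL-GA this would run the two-intersection simulation and
    combine throughput, waiting time, and collision metrics; here we
    use a smooth synthetic stand-in with a known optimum.
    """
    w_throughput, w_wait, w_collision = weights
    throughput = 100 * w_throughput - 5 * w_throughput ** 2
    waiting_penalty = 50 * (1 - w_wait) ** 2
    collision_penalty = 10 * (1 - w_collision)
    return throughput - waiting_penalty - collision_penalty

def ga_optimize(pop_size=20, generations=30, mutation=0.1):
    """Offline GA over reward-coefficient vectors in [0, 1]^3."""
    pop = [[random.random() for _ in range(3)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=evaluate, reverse=True)
        elite = pop[: pop_size // 2]          # truncation selection
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = random.sample(elite, 2)
            cut = random.randrange(1, 3)      # one-point crossover
            child = a[:cut] + b[cut:]
            child = [min(1.0, max(0.0, g + random.gauss(0, mutation)))
                     for g in child]          # Gaussian mutation, clipped
            children.append(child)
        pop = elite + children
    return max(pop, key=evaluate)

def perturb(weights, scale=0.02):
    """Online refinement: small perturbations applied during training."""
    return [min(1.0, max(0.0, w + random.gauss(0, scale))) for w in weights]

best = ga_optimize()
refined = perturb(best)
```

The split mirrors the abstract's two stages: `ga_optimize` performs the coarse offline search over coefficient vectors, while `perturb` supplies the small local adjustments used to keep the reward formulation adaptive during training.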

