Reinforcement-Learning-Based Hybrid Truck–Drone Delivery Optimization
Youyao Gao, Tongchang Liu, Huan Jin

This paper studies large-scale last-mile delivery using a heterogeneous fleet of trucks, onboard drones in a hybrid truck–drone mode, and independent drones. Orders are first screened by a feasibility check; feasible orders are then assigned to one of the three modes by a delivery mode selection policy and routed using mode-specific planning algorithms. The delivery mode selection policy is trained with Proximal Policy Optimization (PPO), warm-started by behaviour cloning from heuristic decisions. For route planning, we use a five-step procedure for the hybrid mode and simple depot round trips for independent drones. Experiments on Solomon VRPTW benchmarks and extended instances (100/200/400 customers; R/C/RC distributions) show lower total cost than representative heuristic baselines and metaheuristics, with practical runtime. Sensitivity analysis over fleet sizes further indicates competitive performance across a range of truck and drone configurations, especially for medium and large fleets.
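
As a rough illustration of the screen-then-assign flow described above, the following Python sketch filters orders with a feasibility check and then assigns each feasible order to one of the three delivery modes. It is a minimal sketch under stated assumptions, not the method of the paper: the `Order` fields, the numeric limits, and the rule inside `select_mode` are all hypothetical, and the hand-written rule merely stands in for the trained PPO policy. Routing is omitted, apart from the depot round-trip length used for independent drones.

```python
from dataclasses import dataclass
from enum import Enum
from math import hypot


class Mode(Enum):
    TRUCK = "truck"
    HYBRID = "hybrid truck-drone"
    INDEPENDENT_DRONE = "independent drone"


@dataclass(frozen=True)
class Order:
    x: float        # customer coordinates, with the depot at the origin
    y: float
    weight: float   # parcel weight
    due: float      # latest acceptable arrival time


# Hypothetical parameters chosen for illustration; none come from the paper.
DRONE_PAYLOAD = 5.0   # maximum parcel weight a drone can carry
DRONE_RANGE = 40.0    # maximum depot round-trip distance for a drone
DRONE_SPEED = 2.0
TRUCK_SPEED = 1.0


def distance(order):
    """Straight-line distance from the depot to the customer."""
    return hypot(order.x, order.y)


def drone_ok(order):
    """True if a depot round trip by an independent drone can serve the order."""
    return (
        order.weight <= DRONE_PAYLOAD
        and 2.0 * distance(order) <= DRONE_RANGE
        and distance(order) / DRONE_SPEED <= order.due
    )


def truck_ok(order):
    """True if a truck driving straight from the depot arrives by the due time."""
    return distance(order) / TRUCK_SPEED <= order.due


def is_feasible(order):
    """Stage 1: keep the order if at least one vehicle type can serve it."""
    return truck_ok(order) or drone_ok(order)


def select_mode(order):
    """Stage 2: placeholder rule standing in for the learned selection policy."""
    if drone_ok(order):
        return Mode.INDEPENDENT_DRONE
    if order.weight <= DRONE_PAYLOAD:
        return Mode.HYBRID   # light parcel, but too far for a depot round trip
    return Mode.TRUCK


def dispatch(orders):
    """Screen orders, then group the feasible ones by selected mode."""
    plan = {mode: [] for mode in Mode}
    rejected = []
    for order in orders:
        if is_feasible(order):
            plan[select_mode(order)].append(order)
        else:
            rejected.append(order)
    return plan, rejected
```

Calling `dispatch` on a list of `Order` objects returns a dictionary of orders per mode together with the list of rejected orders. In a learned setup, `select_mode` would be replaced by a call to the policy network, and heuristic decisions such as those of this rule could serve as behaviour-cloning targets for the warm start.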