MMK: A Hybrid Scheduling Framework for Fine-Grained GPU Sharing for Deep Learning Applications
Zhuolong Jiang, Zinuo Cai, Yawen Li, Zizong Wang, Ruhui Ma, Haibing Guan, Buyya Rajkumar

With the rapid growth of deep learning applications and their increasing demand for computational resources, GPU acceleration has become critical for supporting large-scale, compute-intensive deep learning tasks. To improve GPU utilization, GPU sharing technologies such as Multi-Instance GPU (MIG) and Multi-Process Service (MPS) have been widely employed. However, MIG suffers from inefficient resource allocation, while MPS introduces performance interference. Although integrating MIG with MPS further enhances GPU sharing capability, this approach still relies on complex configuration and coarse-grained scheduling, making it ineffective at handling the varying resource demands of online and offline jobs in dynamic workloads. To address these challenges, we propose MMK, a multi-level GPU sharing system that integrates MIG, MPS, and kernel-level scheduling for online and offline jobs. First, we design a hybrid scheduler that efficiently configures MIG and MPS resources for dynamic job demands. Second, we develop a kernel scheduler that implements fine-grained scheduling strategies to dynamically optimize kernel execution, thereby improving system throughput and reducing resource contention. Extensive experiments demonstrate that, compared with the state-of-the-art framework, MMK reduces the average job completion time and makespan by 28% and 32%, respectively, and increases system throughput by 35%.