Location-Aware Scheduling for Multitasking MCM-GPUs

DOI: 10.1145/3827618 ISSN: 1084-4309

Tiejian Zhang, Guangda Zhang, Yandong He, Chen Zhang, Jiping Gao, Hengzhu Liu, Xia Zhao

To meet the growing computational demands of high-performance computing (HPC) and machine learning (ML) applications under manufacturing constraints, Graphics Processing Units (GPUs) have transitioned from single-chip designs to multi-chip module (MCM) architectures. At the same time, GPUs are increasingly deployed in cloud environments to accelerate a wide range of applications for multiple users. Spatial multitasking, which runs multiple applications concurrently on a single GPU by assigning them to different sets of streaming multiprocessors (SMs), offers an efficient way to share GPU resources. However, effectively supporting multitasking in the emerging MCM-GPU architecture remains an open problem. This paper presents the key observation that the placement of co-executing applications in MCM-GPUs plays a critical role in system performance. Specifically, for certain multiprogram workloads, co-executing applications achieve better performance when located on the same GPU chip to maximize memory bandwidth utilization. Conversely, other workloads benefit from being distributed across different GPU chips to mitigate memory contention.

To exploit this observation, we propose a Location-Aware Scheduler (LA-Scheduler) that identifies application characteristics and makes placement decisions accordingly. The LA-Scheduler performs lightweight workload classification using a k-means-based clustering method and applies multitasking scheduling rules at runtime to determine the placement of applications within the MCM-GPU architecture. Evaluation results show that the proposed LA-Scheduler improves system throughput (STP) by an average of 38.67% compared to single-task operation. Compared with the conventional intra-chip and inter-chip scheduling policies, it improves STP by an average of 13.51% and 7.14%, respectively.
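The classify-then-place idea can be illustrated with a minimal sketch. It is not the LA-Scheduler itself: the abstract does not state which features are clustered or what the scheduling rules are, so the two profile features (`bandwidth_util`, `l2_miss_rate`), the sample values, the class labels, and the rule in `choose_placement` are all assumptions made for illustration.

```python
# Hypothetical sketch: k-means classification of application profiles,
# followed by an assumed placement rule. Feature names, sample values, and
# the rule itself are illustrative assumptions, not taken from the paper.

def kmeans(points, k, iters=50):
    """Plain Lloyd's k-means (k >= 2) over equal-length float tuples."""
    ordered = sorted(points)
    # Deterministic initialization: evenly spaced points in sorted order.
    centroids = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        updated = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
        if updated == centroids:  # assignments are stable, so stop
            break
        centroids = updated
    return centroids

def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda c: sum((a - b) ** 2 for a, b in zip(point, centroids[c])),
    )

def classify(profile, centroids):
    """Label a profile by its cluster; the cluster whose centroid has the
    highest bandwidth utilization is treated as memory-intensive (assumption)."""
    memory_cluster = max(range(len(centroids)), key=lambda c: centroids[c][0])
    return "memory" if nearest(profile, centroids) == memory_cluster else "compute"

def choose_placement(class_a, class_b):
    """Assumed rule: spread two memory-intensive applications across chips to
    limit contention; otherwise co-locate them on the same chips."""
    if class_a == "memory" and class_b == "memory":
        return "inter-chip"
    return "intra-chip"

# Made-up offline profiles: (bandwidth_util, l2_miss_rate).
profiles = [
    (0.10, 0.05), (0.15, 0.08), (0.20, 0.10),
    (0.80, 0.60), (0.85, 0.70), (0.90, 0.65),
]
centroids = kmeans(profiles, k=2)
app_a = classify((0.82, 0.62), centroids)
app_b = classify((0.12, 0.06), centroids)
print(app_a, app_b, choose_placement(app_a, app_b))
```

Clustering is done once over profiled applications, so the runtime cost reduces to a nearest-centroid lookup plus a table-like rule, which is one plausible reading of "lightweight" classification.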
