Unleashing HPC Application Performance through Software Deployment: A Joint Model of Software Parallelism and Co-location
Yuxin Ren, Li Zhou, Chumin Sun, Rui Fan, Jie Sun, Ning Jia, Xinwei Hu
Software deployment is a critical software engineering practice, particularly for high performance computing (HPC) software. Deployment determines execution performance because it maps software components to the CPUs of a server. An inappropriate mapping decreases software parallelism and increases resource contention among components co-located on the same CPUs. However, calculating the mapping that maximizes software performance is challenging, primarily because no joint performance model accounts for both software parallelism and co-location. Consequently, existing industry practice relies on experienced engineers to tune the mapping manually during deployment, which costs man-months of engineering effort and still yields suboptimal software performance.
This paper proposes a holistic approach to mapping multiple CPUs among multiple software components to achieve better applicability and performance. We develop a performance model that predicts the performance impact of different CPU mapping configurations, along with a search algorithm that identifies the best mapping scheme. Our performance model jointly considers software parallelism and co-location, decomposes performance estimation into a regularized execution term and an interference coefficient to improve accuracy, and integrates expert knowledge to reduce model complexity. Our search algorithm employs a nested iterative packing algorithm to explore all possible mapping schemes, thereby uncovering the optimal solution. Evaluation on a multi-module HPC application shows 17% better performance than its default CPU mapping. Our solution has been deployed in a commercial HPC cluster with more than 50K CPU cores, delivering a 26.5% performance improvement and saving many man-months of performance-tuning effort.
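To make the decomposition concrete, the following toy sketch multiplies an isolated ("regularized") run time by interference coefficients for co-located components and then searches over mappings. It is an illustration under stated assumptions, not the paper's model or algorithm: the Amdahl-style run-time formula, the pairwise multiplicative interference table, the even core split, the component names, and all numbers are hypothetical, and plain exhaustive enumeration stands in for the nested iterative packing algorithm, whose details the abstract does not give.

```python
from itertools import product

# Hypothetical components: name -> (work units, parallel fraction).
COMPONENTS = {
    "solver": (100.0, 0.95),
    "mesh": (40.0, 0.80),
    "io": (10.0, 0.20),
}

# Hypothetical pairwise slowdown factors when two components share a CPU.
INTERFERENCE = {
    frozenset(("solver", "mesh")): 1.30,
    frozenset(("solver", "io")): 1.05,
    frozenset(("mesh", "io")): 1.10,
}


def regularized_time(work, parallel_fraction, cores):
    """Isolated run time on `cores` cores (Amdahl-style assumption)."""
    return work * ((1.0 - parallel_fraction) + parallel_fraction / cores)


def predict(mapping, cores_per_cpu):
    """Predicted makespan for a mapping {component name: CPU index}.

    Assumption: components on the same CPU split its cores evenly, and
    each one is slowed by the product of its pairwise coefficients.
    """
    worst = 0.0
    for name, cpu in mapping.items():
        peers = [n for n, c in mapping.items() if c == cpu]
        share = cores_per_cpu / len(peers)
        t = regularized_time(*COMPONENTS[name], share)
        for peer in peers:
            if peer != name:
                t *= INTERFERENCE[frozenset((name, peer))]
        worst = max(worst, t)
    return worst


def best_mapping(n_cpus, cores_per_cpu):
    """Exhaustively enumerate all mappings and return the fastest one."""
    names = list(COMPONENTS)
    candidates = (
        dict(zip(names, combo))
        for combo in product(range(n_cpus), repeat=len(names))
    )
    return min(candidates, key=lambda m: predict(m, cores_per_cpu))


if __name__ == "__main__":
    mapping = best_mapping(n_cpus=2, cores_per_cpu=8)
    print(mapping, predict(mapping, cores_per_cpu=8))
```

With these made-up inputs and two 8-core CPUs, the sketch isolates the highly parallel solver on one CPU and co-locates the other two components, which shows how parallelism and co-location pull the mapping decision in different directions. Exhaustive enumeration grows exponentially with the number of components, so it is only workable at toy scale.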