MQACO
: A Multi‐Step Reinforcement Learning Scheduling Algorithm for Heterogeneous Computing Systems
Jian He, Bo Gao, Zongfu Xie, Fei Gao, Yizhe Zhang ABSTRACT
To address the limitations of existing scheduling algorithms in heterogeneous computing systems, which fail to fully exploit structural scheduling information such as task dependencies and consequently suffer from low computational efficiency and severe resource wastage, this paper proposes an improved ant colony optimization algorithm based on multi‐step Q‐learning (MQACO). In the proposed algorithm, reward signals are constructed from the computation and communication costs of the directed acyclic graph (DAG). An eligibility trace mechanism is employed to integrate multi‐step reward information, enabling iterative optimization and dynamic updating of the Q‐matrix. The updated Q‐matrix is then used as the initial pheromone distribution for the subsequent ant colony optimization (ACO), which enhances exploration of the solution space while accelerating population convergence. Extensive experiments are conducted on randomly generated DAGs and large‐scale real‐world FFT application graphs with input sizes ranging from 16 to 128 points. The experimental results show that MQACO achieves better performance in terms of makespan, speedup, efficiency, and scheduling length ratio (SLR). Compared with the widely used HEFT heuristic, MQACO reduces the average makespan by 8.01% and the average SLR by 8.01%, while compared with existing meta‐heuristic algorithms, MQACO reduces SLR by an average of 5.32% and improves efficiency by 6.09%.