DOI: 10.1145/3828552 ISSN: 1544-3566
Reinforcement Learning on Data-Dependence Graphs for Custom Instruction Identification
Eslam Hussein, Bernd Waschneck, Christian Mayr
Custom instruction (CI) identification is a key technique for tailoring application-specific instruction-set processors (ASIPs) to demanding embedded workloads, yet classical two-stage flows that enumerate candidate instructions and then select among them do not scale as data-dependence graph (DDG) sizes and I/O interface bounds grow. This paper proposes RL-GNN-CI, a reinforcement-learning framework that operates directly on compiler-generated DDGs to identify profitable, convex CIs under joint I/O and area constraints. A graph neural network encoder, shared across all candidate instructions, conditions node and graph embeddings on the evolving CI state, while a policy head incrementally grows and commits CI candidates on each graph. We instantiate this framework with proximal policy optimization (PPO) and deep Q-learning (DQN), training on synthetic DDGs that match real-workload statistics. A controlled ablation study over action masking, reward shaping, and gain normalization identifies strict masking with either a simplified or absolute-gain reward as the most effective configurations. On DDGs extracted from MiBench, MediaBench, and JM H.264 kernels under 240 benchmark–constraint scenarios and a common latency-and-area model, the best RL-GNN-CI variants exceed an integer-linear-programming baseline by 5–7% and a tabular SARSA baseline by 18–20% in speedup, while using about 18% less CI area on average.
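To make the convexity and I/O constraints mentioned above concrete, the following is a minimal Python sketch of how a candidate CI (a set of DDG nodes) might be checked for feasibility, and how a strict action mask could be derived from that check. It is not the paper's implementation. The function names (`reachable`, `invert`, `is_convex`, `io_counts`, `action_mask`), the adjacency-dict graph representation, and the rule that a sink node counts as an output are all assumptions made for illustration. The area constraint is omitted, and masking every node whose addition would immediately violate a constraint is only one possible reading of "strict masking".

```python
from collections import deque

def reachable(adj, sources):
    """Nodes reachable from `sources` by following at least one edge of `adj`."""
    seen, queue = set(), deque(sources)
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen

def invert(succ):
    """Build the predecessor map from a successor map."""
    pred = {u: [] for u in succ}
    for u, vs in succ.items():
        for v in vs:
            pred.setdefault(v, []).append(u)
    return pred

def is_convex(succ, pred, cand):
    """Convex: no path between two candidate nodes leaves the candidate set."""
    below = reachable(succ, cand) - cand   # outside nodes fed by the candidate
    above = reachable(pred, cand) - cand   # outside nodes feeding the candidate
    return not (below & above)

def io_counts(succ, pred, cand):
    """Count distinct external producers and candidate nodes with external uses.

    Assumption: a candidate node without successors is treated as live-out.
    """
    inputs = {p for n in cand for p in pred.get(n, ()) if p not in cand}
    outputs = {n for n in cand
               if not succ.get(n) or any(s not in cand for s in succ[n])}
    return len(inputs), len(outputs)

def action_mask(succ, pred, cand, max_in, max_out):
    """Nodes that can be added while keeping the candidate convex and within I/O bounds."""
    allowed = set()
    for n in succ:
        if n in cand:
            continue
        grown = cand | {n}
        n_in, n_out = io_counts(succ, pred, grown)
        if n_in <= max_in and n_out <= max_out and is_convex(succ, pred, grown):
            allowed.add(n)
    return allowed

# Hypothetical diamond-shaped DDG: a feeds b and c, which both feed d.
succ = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
pred = invert(succ)
print(is_convex(succ, pred, {"a", "d"}))   # False: paths a->b->d and a->c->d leave the set
print(action_mask(succ, pred, {"b"}, max_in=2, max_out=1))
```

In this toy graph, the candidate `{a, d}` is rejected because both paths from `a` to `d` pass through nodes outside the set. A mask this strict also blocks growth through temporarily infeasible states, which is a design choice of the sketch rather than something the abstract specifies.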