DOI: 10.1145/3821523 ISSN: 1084-4309

X-OCTANE: eXplainable On-Chip Telemetry-based Anomaly Notification Engine for Silicon Lifecycle Management

Eduardo Ortega, Hsiao-Ping Ni, Arjun Hati, Jonti Talukdar, Woohyun Paik, Fei Su, Rita Chattopadhyay, Krishnendu Chakrabarty

Silicon Lifecycle Management (SLM) is essential for ensuring the reliability, security, and performance of modern computing platforms. Conventional SLM approaches rely on off-chip or software-only analytics, which lack on-die interpretability. We present X-OCTANE, an explainable on-chip telemetry-based anomaly notification engine. X-OCTANE is a hybrid explainability framework that integrates Causal-Preserving Mutual Information (CP-MI) with SHAP-based feature attribution for interpretable feature ranking and selection. The proposed feature rank-and-selection procedure retains competitive safety-oriented anomaly detection and bolsters security-specific in-field monitoring of compute platforms. For the 7 nm ASAP technology, X-OCTANE achieves competitive anomaly detection performance with less than 0.50% CPU die area and 1.0% non-idle power overhead. In addition, X-OCTANE improves anomaly-driven diagnosis accuracy by over 7% on average. Experimental validation using the PAMPAR benchmark suite on two separate experimental platforms demonstrates enhanced anomaly detection and diagnosis through the proposed explainable feature rank-and-selection method. X-OCTANE establishes a scalable and transparent foundation for on-chip SLM analytics, unifying detection fidelity with causal interpretability.
