Adaptive orthogonal role‐aware transformer for robust instruction alignment

DOI: 10.4218/etrij.2025-0449 ISSN: 1225-6463

Yeo‐Chan Yoon, Sookyun Kim, Gyeongmin Kim

Summary

Large language models are vulnerable to prompt injection because transformer attention treats tokens too uniformly across system and user roles. We propose the adaptive orthogonal role‐aware transformer (AORT), a lightweight retrofit that improves role‐sensitive decoding. AORT preserves the base token embeddings and adds, at each transformer layer, a learned orthonormal re‐encoding and a role‐gated additive attention bias that together keep instruction and user representations separated during context mixing. We evaluate AORT on direct and indirect prompt‐injection benchmarks, including BIPIA‐text. AORT achieves the lowest attack success rate (ASR) in most settings and resists all eight evaluated attacks on LLaMA 3.1‐8B. On BIPIA‐text, it reduces the ASR of LLaMA 3.1‐8B from 13.6% to 2.1%. These findings indicate that an explicit role‐priority structure in the forward pass can substantially improve prompt‐injection robustness with minimal architectural overhead.
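
The summary names two per‐layer components: a learned orthonormal re‐encoding of the hidden states and a role‐gated additive bias on the attention scores. The sketch below illustrates how such a layer could combine them, written under our own assumptions rather than the authors' implementation. The modified Gram‐Schmidt orthonormalization, the two‐role bias table `role_bias`, the sigmoid gate `gate_logit`, and all function names are hypothetical. Query/key/value projections, multiple heads, causal masking, and training are omitted.

```python
import math

# Hypothetical sketch of an AORT-style attention step (not the paper's code).

def orthonormalize(w):
    """Modified Gram-Schmidt: map the rows of a full-rank square matrix w
    to an orthonormal basis (assumed parametrization of the re-encoding)."""
    basis = []
    for row in w:
        v = list(row)
        for b in basis:
            proj = sum(x * y for x, y in zip(v, b))
            v = [x - proj * y for x, y in zip(v, b)]
        norm = math.sqrt(sum(x * x for x in v))
        basis.append([x / norm for x in v])
    return basis

def matvec(m, x):
    return [sum(a * b for a, b in zip(row, x)) for row in m]

def softmax(scores):
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def role_aware_attention(hidden, roles, w_rot, role_bias, gate_logit):
    """hidden:     list of token vectors (length-d lists)
    roles:      role id per token (assumed: 0 = system/instruction, 1 = user)
    w_rot:      d x d learnable matrix, orthonormalized before use
    role_bias:  2 x 2 table, role_bias[query_role][key_role]
    gate_logit: scalar; sigmoid(gate_logit) scales the role bias
    Returns (outputs, attention_weights)."""
    rot = orthonormalize(w_rot)
    # Re-encode every token; an orthonormal map preserves vector norms.
    h = [matvec(rot, x) for x in hidden]
    d = len(h[0])
    gate = sigmoid(gate_logit)
    outputs, weights = [], []
    for i, q in enumerate(h):
        scores = []
        for j, k in enumerate(h):
            content = sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
            # Additive, gated bias that depends only on the two tokens' roles.
            scores.append(content + gate * role_bias[roles[i]][roles[j]])
        attn = softmax(scores)
        weights.append(attn)
        outputs.append([sum(attn[j] * h[j][c] for j in range(len(h)))
                        for c in range(d)])
    return outputs, weights

# Usage with toy values (illustrative only, not from the paper).
hidden = [[1.0, 0.0, 0.5], [0.2, 1.0, 0.0], [0.0, 0.3, 1.0]]
roles = [0, 1, 1]  # one system token followed by two user tokens
w_rot = [[1.0, 0.2, 0.0], [0.1, 1.0, 0.3], [0.0, 0.4, 1.0]]
role_bias = [[0.0, -2.0], [1.5, 0.0]]
out, attn = role_aware_attention(hidden, roles, w_rot, role_bias, gate_logit=2.0)
```

In this toy configuration, the positive entry `role_bias[1][0]` pulls user‐token queries toward the system token, while the negative entry `role_bias[0][1]` discourages the system token from attending to user tokens. As `gate_logit` becomes very negative, the layer falls back to plain, role‐agnostic attention over the rotated states.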
