Embedding-Dependent Performance of Variational Quantum Reinforcement Learning for Intrusion Detection Under Dimensionality Constraints
Raid Anis Kerkatou, Hacene Belhadef, Aicha Eutamene, Svetlana Petrova Stefanova

Network intrusion detection systems (IDS) operate in high-dimensional feature spaces under evolving attack patterns and asymmetric misclassification costs, where false negatives represent a critical security risk. Reinforcement learning (RL) offers a natural mechanism for encoding domain-specific misclassification costs directly into the learning signal through reward shaping, enabling cost-sensitive policy optimization in adaptive streaming environments. However, the integration of variational quantum models into RL-based IDS remains insufficiently explored. This work investigates a variational quantum reinforcement learning (VQRL) framework for intrusion detection, in which parameterized quantum circuits are employed to model the policy function. We adopt an RL formulation primarily as a principled cost-sensitive optimization approach rather than to exploit sequential state dependencies, and we employ Instantaneous Quantum Polynomial (IQP) embedding as a quantum feature encoding strategy. The study analyzes how embedding expressivity interacts with varying levels of dimensionality reduction via principal component analysis (PCA) on the CICIDS2017 dataset. Experiments demonstrate that VQRL-IQP achieves high recall and reduces false negative rates in moderately high-dimensional feature spaces compared to a classical RL baseline. This improvement is accompanied by an increase in false positive rates, reflecting a trade-off shaped jointly by the reward structure and the structural properties of IQP encoding. Statistical validation across five independent runs confirms the consistency of these trends. Importantly, no general quantum advantage in accuracy or computational efficiency is claimed; rather, the results indicate that VQRL-IQP offers a distinct error trade-off that is operationally valuable in security-critical scenarios where minimizing missed attacks is the primary objective.
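
The idea of encoding asymmetric misclassification costs through reward shaping can be illustrated with a minimal sketch. The reward magnitudes, the label names, and the helper functions `reward` and `episode_return` below are hypothetical and chosen only for illustration; the abstract does not specify the reward values used in the experiments.

```python
# Illustrative asymmetric reward table for a binary IDS agent.
# All numeric values are assumptions, not taken from the paper.
# Keys are (true_label, action) pairs.
REWARDS = {
    ("attack", "attack"): 1.0,   # true positive: attack correctly flagged
    ("benign", "benign"): 1.0,   # true negative: benign traffic passed
    ("benign", "attack"): -1.0,  # false positive: false alarm
    ("attack", "benign"): -5.0,  # false negative: missed attack, penalized most
}


def reward(true_label: str, action: str) -> float:
    """Return the shaped reward for one classification decision."""
    return REWARDS[(true_label, action)]


def episode_return(labels, actions) -> float:
    """Sum the shaped rewards over a sequence of independent decisions."""
    return sum(reward(y, a) for y, a in zip(labels, actions))
```

With a table of this shape, a policy that maximizes expected reward is pushed toward flagging borderline samples as attacks, which is consistent with the reported pattern of lower false negative rates at the price of more false positives.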