MM-NIDS: A Novel Multimodal Ensemble Fusion Network Intrusion Detection System Using Numeric, Text, Graph, and Quantum Representations
Samar AboulEla, Rasha Kashef

The proliferation of digital infrastructures and the Internet of Things (IoT) has led to a rapid increase in interconnected devices, exposing modern systems to increasingly sophisticated cyber threats. Intrusion detection in such environments remains a major challenge due to limited device resources, evolving attack vectors, and diverse traffic patterns. Traditional systems often fall short in scalability and adaptability when facing these threats. This paper introduces MM-NIDS, a novel multimodal fusion framework for NetFlow-based intrusion detection. The framework combines four complementary NetFlow-derived data representations (numerical, textual, graph-based, and quantum-inspired), each modeled using transformer-based architectures, including FT-Transformer and ELECTRA-Small. Feature embeddings are constructed using robust engineering techniques, and predictions from the four base models are integrated through five post hoc fusion strategies: averaging-based fusion, weighted averaging, confidence-based fusion, and two meta-fusion methods based on a Multi-Layer Perceptron (MLP) and Extreme Gradient Boosting (XGBoost). Extensive cross-dataset evaluations on four public NetFlow-based benchmarks confirm the system’s robustness, with the text-based model (M2) consistently achieving the highest individual performance. Fusion approaches provided modest, dataset-dependent improvements in detection balance, especially for underrepresented attacks. A detectability hypothesis was proposed and validated, showing that NetFlow features are particularly effective for volumetric and scan-based attacks but less so for stealthy, payload-driven threats. These findings highlight the potential of MM-NIDS for deployment in critical infrastructure, industrial IoT, and smart environments, and suggest that future work should incorporate deeper semantic or payload-level features to further improve the detection of evasive threats.
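
To make the post hoc fusion step concrete, the following is a minimal sketch of the three non-learned strategies named above (averaging, weighted averaging, and confidence-based fusion), applied to class-probability outputs of four base models. It is an illustration under stated assumptions, not the paper's implementation: the function names, the array layout `(n_models, n_samples, n_classes)`, the example weights, and the specific confidence rule (per sample, keep the prediction of the model with the highest maximum class probability) are all hypothetical choices. The two meta-fusion methods (MLP and XGBoost) would instead train a second-stage classifier on the base models' outputs and are not shown here.

```python
import numpy as np


def average_fusion(probs):
    """Unweighted mean of class probabilities across models.

    probs: array of shape (n_models, n_samples, n_classes).
    """
    return probs.mean(axis=0)


def weighted_average_fusion(probs, weights):
    """Weighted mean across models; weights are normalized to sum to 1.

    The weights are an assumption here (e.g., they could come from
    validation scores); the abstract does not specify how they are set.
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    # Contract the model axis: (n_models,) x (n_models, n_samples, n_classes)
    return np.tensordot(w, probs, axes=1)


def confidence_fusion(probs):
    """Per sample, keep the output of the most confident model.

    Assumed rule: confidence = maximum class probability of a model.
    """
    confidence = probs.max(axis=2)      # (n_models, n_samples)
    best_model = confidence.argmax(axis=0)  # (n_samples,)
    sample_idx = np.arange(probs.shape[1])
    return probs[best_model, sample_idx]


# Toy probabilities for 4 base models, 2 flows, 2 classes (benign, attack).
# These numbers are made up for illustration only.
probs = np.array([
    [[0.60, 0.40], [0.30, 0.70]],   # numeric model
    [[0.90, 0.10], [0.20, 0.80]],   # text model
    [[0.50, 0.50], [0.60, 0.40]],   # graph model
    [[0.40, 0.60], [0.45, 0.55]],   # quantum-inspired model
])

avg = average_fusion(probs)
wavg = weighted_average_fusion(probs, weights=[1, 2, 1, 0])
conf = confidence_fusion(probs)

print("average:   ", avg.argmax(axis=1))
print("weighted:  ", wavg.argmax(axis=1))
print("confidence:", conf.argmax(axis=1))
```

All three functions return an array of shape `(n_samples, n_classes)`, so the final label is the `argmax` over the class axis regardless of which strategy is used.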