Toward secure AI: detection and mitigation of backdoor attacks

DOI: 10.1108/ijicc-01-2026-0022 ISSN: 1756-378X

Adil Ahmad, Anwar Shah, Muhammad Irfan Saeed, Muhammad Khalid, Qamar Uz Zaman

Purpose

This survey aims to provide a comprehensive and structured understanding of backdoor attacks across different domains. It seeks to establish a unified taxonomy of backdoor attacks, analyze domain-specific vulnerabilities and compare evaluation metrics used for detection and mitigation. Additionally, the survey critically reviews existing and emerging defense strategies, highlighting their strengths and limitations. By identifying key challenges and open research issues, this work intends to guide future research directions and support the development of more robust and resilient learning models against backdoor threats.

Design/methodology/approach

This survey adopts a systematic literature review methodology to analyze backdoor attacks and defense mechanisms across multiple domains. Relevant studies were collected from major academic databases, including IEEE Xplore, ACM Digital Library and Google Scholar. Based on the selected literature, a multi-dimensional taxonomy was developed to categorize backdoor attacks by attack vectors, trigger mechanisms and targeted components. Domain-specific vulnerabilities and mitigation techniques were examined, followed by a cross-domain comparison of evaluation metrics and defense strategies. Finally, key challenges and open research directions were synthesized from the analyzed studies.

Findings

The survey reveals that backdoor attacks pose significant and domain-specific threats across natural language processing (NLP), computer vision (CV), graph neural networks (GNNs) and generative AI. A unified taxonomy highlights common attack patterns while exposing domain-dependent triggers and vulnerabilities. The analysis shows that subtle input or structural manipulations can activate backdoors while remaining difficult to detect. Although shared evaluation metrics exist, domain-specific measures remain fragmented, limiting cross-domain comparison. Existing defense strategies are only partially effective and incur high computational cost and performance trade-offs. Overall, the findings underscore the need for unified evaluation metrics and more adaptive, cost-efficient and cross-domain defense mechanisms.

Research limitations/implications

This survey is limited by its reliance on existing literature, which may not fully capture emerging or unpublished backdoor attack techniques. The analysis does not include empirical validation or benchmarking of defense methods, and the effectiveness of reported strategies depends on domain-specific assumptions and datasets. Despite these limitations, the findings highlight critical vulnerabilities, evaluation gaps and research challenges across domains. The proposed taxonomy and comparative insights can inform the design of more robust detection and mitigation strategies and guide future empirical, cross-domain and standardized research efforts.

Practical implications

The findings of this survey offer practical guidance for researchers, practitioners and system designers developing secure AI systems. The proposed taxonomy supports structured threat modeling and risk assessment of backdoor attacks across application domains. Insights into domain-specific vulnerabilities and evaluation metrics can assist practitioners in selecting appropriate detection and mitigation strategies. The comparative review of defense mechanisms highlights trade-offs between robustness, performance and computational cost, enabling informed deployment decisions. Overall, this work supports the development of more resilient AI models and encourages the adoption of standardized evaluation practices in real-world security-critical applications.

Social implications

Backdoor attacks on artificial intelligence systems pose serious risks to public trust, safety and fairness, particularly in socially sensitive applications such as healthcare, autonomous systems, surveillance and decision support. This survey highlights how undetected backdoors can lead to manipulated outcomes, privacy breaches and biased or harmful decisions. By improving awareness of backdoor threats and defense limitations across domains, this work contributes to the development of safer and more transparent AI systems. Strengthening model robustness against backdoor attacks supports responsible AI deployment and helps protect users and society from malicious exploitation.

Originality/value

This survey offers original value by providing a unified, cross-domain perspective on backdoor attacks and defenses spanning NLP, CV, GNNs and generative AI. Unlike existing reviews that focus on single domains or specific attack types, this work introduces a multi-dimensional taxonomy and systematically compares domain-specific vulnerabilities, evaluation metrics and defense strategies. The study also identifies critical gaps in current evaluation practices and defense scalability. These contributions provide a comprehensive reference for researchers and practitioners and establish a foundation for developing more adaptive and standardized backdoor mitigation frameworks.
