Toward secure AI: detection and mitigation of backdoor attacks
Adil Ahmad, Anwar Shah, Muhammad Irfan Saeed, Muhammad Khalid, Qamar Uz Zaman
Purpose
This survey aims to provide a comprehensive and structured understanding of backdoor attacks across different domains. It seeks to establish a unified taxonomy of backdoor attacks, analyze domain-specific vulnerabilities and compare evaluation metrics used for detection and mitigation. Additionally, the survey critically reviews existing and emerging defense strategies, highlighting their strengths and limitations. By identifying key challenges and open research issues, this work intends to guide future research directions and support the development of more robust and resilient learning models against backdoor threats.
Design/methodology/approach
This survey adopts a systematic literature review methodology to analyze backdoor attacks and defense mechanisms across multiple domains. Relevant studies were collected from major academic databases, including IEEE Xplore, ACM Digital Library and Google Scholar. Based on the selected literature, a multi-dimensional taxonomy was developed to categorize backdoor attacks by attack vectors, trigger mechanisms and targeted components. Domain-specific vulnerabilities and mitigation techniques were examined, followed by a cross-domain comparison of evaluation metrics and defense strategies. Finally, key challenges and open research directions were synthesized from the analyzed studies.
Findings
The survey reveals that backdoor attacks pose significant and domain-specific threats across natural language processing (NLP), computer vision (CV), graph neural networks (GNNs) and generative AI. A unified taxonomy highlights common attack patterns while exposing domain-dependent triggers and vulnerabilities. The analysis shows that subtle input or structural manipulations can reliably activate backdoors while remaining difficult to detect. Although some evaluation metrics are shared across domains, domain-specific measures remain fragmented, which limits cross-domain comparison. Existing defense strategies are only partially effective and often incur high computational cost or come at the expense of model performance. Overall, the findings underscore the need for unified evaluation metrics and for more adaptive, cost-efficient and cross-domain defense mechanisms.
Research limitations/implications
This survey is limited by its reliance on existing literature, which may not fully capture emerging or unpublished backdoor attack techniques. The analysis does not include empirical validation or benchmarking of defense methods, and the reported effectiveness of each strategy depends on domain-specific assumptions and datasets. Despite these limitations, the findings have important implications: they highlight critical vulnerabilities, evaluation gaps and research challenges across domains. The proposed taxonomy and comparative insights can inform the design of more robust detection and mitigation strategies and guide future empirical, cross-domain and standardized research.
Practical implications
The findings of this survey offer practical guidance for researchers, practitioners and system designers developing secure AI systems. The proposed taxonomy supports structured threat modeling and risk assessment of backdoor attacks across application domains. Insights into domain-specific vulnerabilities and evaluation metrics can assist practitioners in selecting appropriate detection and mitigation strategies. The comparative review of defense mechanisms highlights trade-offs between robustness, performance and computational cost, enabling informed deployment decisions. Overall, this work supports the development of more resilient AI models and encourages the adoption of standardized evaluation practices in real-world security-critical applications.
Social implications
Backdoor attacks on artificial intelligence systems pose serious risks to public trust, safety and fairness, particularly in socially sensitive applications such as healthcare, autonomous systems, surveillance and decision support. This survey highlights how undetected backdoors can lead to manipulated outcomes, privacy breaches and biased or harmful decisions. By improving awareness of backdoor threats and defense limitations across domains, this work contributes to the development of safer and more transparent AI systems. Strengthening model robustness against backdoor attacks supports responsible AI deployment and helps protect users and society from malicious exploitation.
Originality/value
The value of this survey lies in its unified, cross-domain perspective on backdoor attacks and defenses spanning NLP, CV, GNNs and generative AI. Unlike existing reviews that focus on single domains or specific attack types, this work introduces a multi-dimensional taxonomy and systematically compares domain-specific vulnerabilities, evaluation metrics and defense strategies. The study also identifies critical gaps in current evaluation practices and in the scalability of defenses. These contributions provide a comprehensive reference for researchers and practitioners and establish a foundation for developing more adaptive and standardized backdoor mitigation frameworks.