DOI: 10.3390/technologies14070391 ISSN: 2227-7080

Explainable Artificial Intelligence for Skin Lesion Classification: A Comprehensive Review of Methods and Challenges

Jennifer Whewell, Rebecca Peters, Janusz Kulon

The rapid advancement of machine learning and artificial intelligence (AI) has created new opportunities to enhance diagnostic accuracy in dermatology, particularly within primary care settings. Computer-aided diagnosis (CAD) systems have demonstrated potential to support General Practitioners (GPs) by enabling earlier and more consistent identification of skin diseases. This review critically examines the literature on explainable artificial intelligence (XAI) for skin disease classification, with a specific focus on the evolution of explainability frameworks and the methodological implications of dataset selection. A comprehensive review of studies published between 2020 and 2025 was conducted across multiple academic databases, encompassing research on skin lesion detection, classification, and monitoring. The analysis reveals that deep learning architectures, particularly those leveraging transfer learning with models such as EfficientNet, ResNet, and Xception, frequently report high classification accuracies, often exceeding 90%, when evaluated on single benchmark datasets. However, studies employing multiple datasets consistently demonstrate more stable and generalisable performance, albeit with modest reductions in reported accuracy, highlighting a critical trade-off between performance optimisation and real-world robustness. The review further identifies a clear temporal progression in the adoption of XAI techniques. Early studies relied on a broader range of post hoc explainability methods, while later work increasingly consolidated around Grad-CAM, SHAP, and related attribution techniques, followed by gradual diversification into more specialised frameworks such as Testing with Concept Activation Vectors (TCAV) and prototype-based networks.
Despite these advances, the lack of clinically grounded explanations, limited integration of ethical considerations, and reliance on non-clinical imagery continue to constrain clinical applicability, which we explore using a GRADE-style narrative. Notably, evidence suggests that CAD systems can improve GP diagnostic accuracy for conditions such as melanoma and seborrhoeic keratosis; however, sustained clinical adoption remains contingent on transparent, reliable, and context-aware explainability mechanisms.
