DOI: 10.1093/bjd/ljag086.257 ISSN: 0007-0963

AI09 Diagnostic accuracy of artificial intelligence in dermatology across skin colour, ethnic background and Fitzpatrick skin types: a systematic review

Noor Elabd, Yousef Binamer

Abstract

The use of artificial intelligence (AI) systems has increased in recent years, and they are now used in clinical practice in many specialties, especially dermatology. However, concerns remain regarding their diagnostic accuracy across people of different ethnic backgrounds, with potential implications for equity and patient safety. This systematic review, conducted in accordance with PRISMA guidelines, evaluated the diagnostic accuracy of AI-based dermatology tools compared with clinician diagnosis across diverse skin tones. PubMed, MEDLINE and multidisciplinary bibliographical databases were searched for English-language studies published between 2020 and 2025. Publications evaluating AI-based dermatological diagnosis against clinician diagnosis in different ethnic backgrounds and skin tones were included. Reviews, protocols, ethical analyses and nondiagnostic studies were excluded. Diagnostic accuracy across different ethnicities and Fitzpatrick skin types was assessed and compared. Fifteen studies met the inclusion criteria. Across these studies, AI diagnostic performance varied by skin tone, with several studies showing reduced accuracy in certain ethnic groups and in people with darker skin tones. These differences were commonly attributed to training datasets that under-represented how dermatological conditions manifest across different skin tones. These limitations made the AI less likely to recognize the image and more likely to miss a diagnosis. Multiple studies reported improved diagnostic accuracy when a more diverse set of training images was included. Despite these improvements, AI tools did not achieve perfect diagnostic accuracy in any population, and missed diagnoses occurred across all skin types. Studies reporting high overall accuracy frequently lacked stratification by skin tone, limiting the generalizability of their findings.
AI-based dermatology diagnostic tools show lower accuracy for darker skin tones, a difference commonly attributed to under-represented training data. While including diverse datasets improves outcomes, current AI systems cannot replace clinician judgement. Clinician oversight remains essential to reduce diagnostic errors and prevent missed diagnoses.
