DOI: 10.1093/ced/llad255 ISSN: 0307-6938

ChatGPT-3.5 and ChatGPT-4 dermatological knowledge level based on the Specialty Certificate Examination in Dermatology

Miłosz Lewandowski, Paweł Łukowicz, Dariusz Świetlik, Wioletta Barańska-Rybak

Abstract

Background

The global use of artificial intelligence (AI) has the potential to revolutionize the healthcare industry. Although AI is becoming increasingly popular, evidence on its use in dermatology remains limited.

Objectives

To determine the capacity of ChatGPT-3.5 and ChatGPT-4 to support dermatology knowledge and clinical decision-making in medical practice.

Methods

Three Specialty Certificate Examination in Dermatology tests, in English and Polish, consisting of 120 single-best-answer, multiple-choice questions each, were used to assess the performance of ChatGPT-3.5 and ChatGPT-4.

Results

ChatGPT-4 exceeded the 60% pass rate in every test performed, with a minimum of 80% and 70% correct answers for the English and Polish versions, respectively. ChatGPT-4 performed significantly better than ChatGPT-3.5 on each exam (P < 0.01), regardless of language. Furthermore, ChatGPT-4 answered clinical picture-type questions with an average accuracy of 93.0% and 84.2% for questions in English and Polish, respectively. The differences between the tests in Polish and English were not significant; however, ChatGPT-3.5 and ChatGPT-4 each scored higher overall in English than in Polish, by an average of 8 percentage points per test. Incorrect ChatGPT answers were strongly correlated with a lower difficulty index, denoting higher-difficulty questions, in most of the tests (P < 0.05).

Conclusions

The dermatology knowledge level of ChatGPT was high, and ChatGPT-4 performed significantly better than ChatGPT-3.5. Although the use of ChatGPT will not replace a doctor’s final decision, physicians should support the development of AI in dermatology to raise the standards of medical care.
