DOI: 10.1093/jsxmed/qdae001.115 ISSN: 1743-6095

(121) Utilizing ChatGPT for Urology Trainee Simulation of Peyronie’s Disease Counseling

S Reddy, S Smani, S Honig, B Harnisch, K Rotker



Peyronie’s disease is a medical issue with significant psychological overtones. However, trainee exposure to initial counseling and management can be limited, making both the disease state and its psychological effects difficult to master. Institutions have incorporated simulation-based education to augment resident skills and knowledge in topics such as transgender health and SICU patient management. The gold standard for assessing trainees in simulated clinical interactions has been the objective structured clinical examination (OSCE). However, novel AI-based technologies like ChatGPT offer a promising alternative for resident education and assessment. ChatGPT, developed by OpenAI, uses natural language processing to provide human-like responses to open-ended questions posed by users. Preliminary reports have already described its use in training emergency medicine residents to deliver bad news.


To simulate a realistic clinical scenario utilizing ChatGPT to improve resident and student confidence in Peyronie’s disease evaluation and treatment, and to assess ChatGPT’s ability to deliver actionable feedback to trainees on both knowledge base and patient interaction compared to evaluation by a sexual medicine specialist.


ChatGPT Plus was used to role-play the patient in an initial consultation for Peyronie’s disease. A detailed input was composed to establish the rules of the simulation (Figure 1). ChatGPT was prompted to conduct a final assessment of trainee performance using standardized scales for accuracy, comprehensiveness, and empathy (A, C, E), and to draw on the 2015 AUA Peyronie’s Disease Diagnosis and Treatment Guidelines when formulating its assessment and feedback. Three trainees with different experience levels performed the simulation. At completion, a fellowship-trained sexual medicine specialist separately graded each trainee’s performance using the same scales. Cohen’s kappa was calculated to assess inter-rater reliability.
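The setup described above amounts to seeding the chat with a system-style instruction that defines the role-play, the grading rubric, and the guideline anchor. The sketch below is purely illustrative: the authors used the ChatGPT Plus web interface, and the prompt wording and helper names here are assumptions, not the actual input shown in Figure 1.

```python
# Hypothetical sketch of a simulation prompt scaffold like the one described
# in the methods. All wording below is illustrative, not the authors' input.

SYSTEM_PROMPT = """\
You are role-playing a patient presenting for an initial consultation for
Peyronie's disease. Stay in character until the trainee ends the encounter.
Afterward, grade the trainee on 1-10 scales for accuracy, comprehensiveness,
and empathy, and base your feedback on the 2015 AUA Peyronie's Disease
Diagnosis and Treatment Guidelines."""

def build_messages(trainee_opening: str) -> list[dict]:
    """Assemble the chat history that seeds the simulated consultation."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": trainee_opening},
    ]

messages = build_messages("Good morning, what brings you in today?")
```

In an API-based replication, this message list would be passed to a chat-completion endpoint; in the study itself the equivalent text was entered directly into the ChatGPT Plus interface.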


Three urology trainees performed the simulation: a 3rd-year medical student, a PGY2 resident, and a PGY4 resident. Trainee performance scores are shown in Table 1. Mean ChatGPT ratings were 8.7, 8.0, and 9.0 (A, C, E), while the andrologist’s ratings were 7.0, 4.7, and 6.3 (A, C, E). Progressive improvement was demonstrated as trainee experience increased. ChatGPT generally rated trainees at higher competency levels than the andrologist did. Cohen’s kappa was calculated to be -0.125, -0.125, and 0.0 (A, C, E), indicating an overall low level of agreement between ChatGPT’s and the andrologist’s assessments of the trainees.
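For two raters, Cohen’s kappa is the observed agreement corrected for the agreement expected by chance. The sketch below shows the computation on made-up rating vectors; it does not use the study’s Table 1 data.

```python
def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: chance-corrected agreement between two raters."""
    assert len(rater_a) == len(rater_b) and len(rater_a) > 0
    n = len(rater_a)
    categories = set(rater_a) | set(rater_b)
    # Observed agreement: fraction of items both raters scored identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement if the two raters scored independently.
    p_e = sum(
        (list(rater_a).count(c) / n) * (list(rater_b).count(c) / n)
        for c in categories
    )
    return 1.0 if p_e == 1 else (p_o - p_e) / (1 - p_e)

# Hypothetical ratings: agreement exactly at chance level yields kappa = 0.
print(cohens_kappa([1, 1, 2, 2], [1, 2, 1, 2]))  # → 0.0
```

Values near 0 (or negative, as observed here) indicate agreement no better than chance, which is why the reported kappas of -0.125, -0.125, and 0.0 are interpreted as low inter-rater reliability.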


Our findings suggest that ChatGPT has potential as a tool for educating urology residents and medical students on Peyronie’s disease counseling. However, there was disagreement in trainee ratings between ChatGPT and the sexual medicine specialist. ChatGPT provided trainees with constructive, actionable feedback referencing current guidelines (Figure 2). When given defined parameters for a clinical scenario, ChatGPT can design a realistic standardized patient simulation. However, ChatGPT does not yet appear to match the competency and accuracy expected of a sexual medicine specialist. Adjusting the initial input would allow competency to be tested across alternative clinical vignettes. Further collection of real-world data is needed to better assess ChatGPT’s rating of learner performance, ideally in conjunction with national/international guidelines.


One or more authors act as a consultant, employee, or shareholder of industry for: Fellow, Posterity Health, Coloplast, Hims, Genentech.
