(165) Assessing Artificial Intelligence Quality in the Evaluation and Treatment of Erectile Dysfunction
M Pathuri, O Marciano, P Barrtero Guimaraes, E Kocjancic, O Raheem
- Reproductive Medicine
- Endocrinology, Diabetes and Metabolism
- Psychiatry and Mental health
Since November 2022, Artificial Intelligence (AI) chatbots have grown in popularity, including for questions about urologic conditions. However, their accuracy and quality have not been evaluated systematically. In this study, we sought to assess the accuracy and quality of AI chatbots in the management of Erectile Dysfunction (ED).
We aimed to evaluate the accuracy and quality of publicly available AI language models in answering common clinical questions about ED, compared with board-certified urologists.
Two publicly available AI language models, ChatGPT and Google Bard, were asked 15 standard questions related to ED, covering topics such as its causes, risk factors, and treatment options. Two board-certified urologists were given the same questions on a standard survey. A third, blinded board-certified urologist served as the grader, using the AUA guidelines for ED as the reference and rating the accuracy, robustness, and bias of each response on a Likert scale. The Likert grades for the urologist and AI responses were then aggregated for comparison.
Overall, AI responses were significantly more accurate (p<0.01), more robust (p<0.01), and less biased (p<0.01) than the urologists’ responses. Google Bard had the highest scores across all three measures, followed by ChatGPT. The urologists’ scores were approximately 38% lower than those of the AI responses.
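The abstract does not state how the 38% figure was computed. One plausible reading, offered here only as an illustrative sketch, is a relative difference between mean Likert grades. The grades below are hypothetical placeholders, not study data, and the helper names `mean_score` and `percent_lower` are assumptions introduced for this example.

```python
from statistics import mean

# Hypothetical 1-5 Likert grades (placeholder values, NOT study data),
# one grade per question for each type of responder.
grades = {
    "chatbot": [5, 4, 5, 4, 5],
    "urologist": [3, 3, 2, 3, 3],
}


def mean_score(scores):
    """Average Likert grade across all graded questions."""
    return mean(scores)


def percent_lower(reference, other):
    """How much lower `other` is than `reference`, as a percentage of `reference`."""
    return (reference - other) / reference * 100


ai_mean = mean_score(grades["chatbot"])
uro_mean = mean_score(grades["urologist"])

print(f"AI mean: {ai_mean:.2f}, urologist mean: {uro_mean:.2f}")
print(f"Urologist mean is {percent_lower(ai_mean, uro_mean):.1f}% lower")
```

Because Likert grades are ordinal, a formal comparison would normally rely on a nonparametric test rather than on means alone; the abstract does not name the test behind the reported p-values.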
This study suggests that AI responses were superior to those of urologists in accuracy, robustness, and bias when addressing the management of ED. Although chatbots have a promising role in urologic conditions including ED, their widespread clinical use and adoption warrant further evaluation in the context of clinical decision making and enhancing patient care.