DOI: 10.1093/jsxmed/qdae001.284 ISSN: 1743-6095

(298) AUA vs AI: An Inquiry into Testosterone Guidelines

Y Katlowitz, J Khurgin, N Leelani
  • Urology
  • Reproductive Medicine
  • Endocrinology
  • Endocrinology, Diabetes and Metabolism
  • Psychiatry and Mental health



ChatGPT is an artificial intelligence tool with an accessible user interface that allows for conversational input. It uses deep learning, a subset of machine learning based on artificial neural networks, to “speak” to the user. Patients and clinicians alike have been querying ChatGPT with medical questions, and recent publications have shown that the validity of its responses varies. Limited information is available on ChatGPT’s capacity to accurately discuss testosterone deficiency and its management. It is imperative for the medical community to know which information sources are accurate and which are not. This project is designed to assess ChatGPT’s clinical competency against the American Urological Association (“AUA”) guidelines on the evaluation and management of testosterone deficiency.


To assess ChatGPT’s competency in answering general and clinical questions about testosterone deficiency and replacement, measured against the AUA guidelines.


The AUA guidelines for the evaluation and management of testosterone deficiency include 31 main points. Each point was posed to ChatGPT as a question, accompanied by requests to validate the information, provide further information, or recommend clinical next steps. The answers were then sorted into 3 categories relative to the AUA guidelines: accurate and complete (AC), accurate but incomplete (AI), and incorrect or misleading (IM).


Of the 31 guideline-based questions queried, 22/31 (71%) were AC, 4/31 (13%) were AI, and 5/31 (16%) were IM. The highest number of AI answers pertained to treatment-related queries. The highest number of IM answers pertained to counseling, particularly on commonly contested topics such as the relationship between testosterone and prostate cancer or cardiovascular health.
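The proportions above can be reproduced with a short tally. The following is only an illustrative sketch in Python: the per-question labels are not given in the abstract, so the `labels` list is reconstructed from the reported totals, and the function name `category_shares` is a hypothetical helper, not part of the study's method.

```python
from collections import Counter

# Assumption: the abstract reports only category totals (22 AC, 4 AI, 5 IM),
# so this list is rebuilt from those totals, not from per-question data.
labels = ["AC"] * 22 + ["AI"] * 4 + ["IM"] * 5


def category_shares(labels):
    """Return {category: (count, percent rounded to the nearest integer)}."""
    total = len(labels)
    counts = Counter(labels)
    return {cat: (n, round(100 * n / total)) for cat, n in counts.items()}


shares = category_shares(labels)
for cat in ("AC", "AI", "IM"):
    n, pct = shares[cat]
    print(f"{cat}: {n}/{len(labels)} ({pct}%)")
```

Rounding each share to the nearest integer gives 71%, 13%, and 16%, which matches the reported figures and sums to 100%.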


When queried about the evaluation and management of testosterone deficiency, ChatGPT offered complete and accurate answers in 71% of cases. Its accuracy rose as high as 100% on firmly established or more binary topics, such as diagnosis requirements or recommended adjunct testing. As such, patients and providers may consider using ChatGPT as an ancillary resource for uncomplicated or firmly established testosterone-related queries. However, it performed considerably worse on more contested topics, such as testosterone’s relationship to prostate cancer, and on elements of medicine that require a more personalized human touch, such as counseling. While ChatGPT may be helpful for basic or binary matters, interpersonal and patient-specific matters are best handled by experts in the field.


