(023) Can Artificial Intelligence Generate a Superior Introduction to a Systematic Review?
W Hellstrom, D Singh, G Brinkley, L Trost, M Ziegelmann, G Brock, M Khera, J Mulhall
- Reproductive Medicine
- Endocrinology, Diabetes and Metabolism
- Psychiatry and Mental health
Large language models (LLMs) such as ChatGPT, an iteration of artificial intelligence (AI), have recently gained popularity as a potential educational resource for patients, students, and researchers. LLMs use deep learning algorithms to predict the next word in a sequence based on a data library of 300 billion words. Recent studies have shown that while LLMs provide useful information, they are limited by their tendency to produce false and nonsensical information. Prior studies have focused on the utility of ChatGPT as an educational resource for patients and learners; however, no study has investigated the use of ChatGPT to write a scientific review. AI-generated systematic reviews would be valuable to the dynamic field of urology, especially in terms of research and innovation.
Considering the limitations of ChatGPT in performing data analysis, the aim of this project was to assess the utility of AI in constructing introductions to systematic reviews on urologic issues and to compare these AI-generated introductions with published introductions written by humans.
ChatGPT (4.0), available at www.chat.openai.com, was used to construct three introductions to three different systematic reviews on the following topics: testosterone therapy and cardiovascular events, safety and efficacy of noninvasive therapies for Peyronie’s disease, and erectile dysfunction after radical prostatectomy. Introductions from published systematic reviews on each of these topics were also obtained. Five experts in these topics, blinded to which introductions were AI generated, reviewed and assessed the introductions for the following parameters: completeness, engagement, accuracy, logic, and clarity of objective. Introductions were also assessed for baseline literary differences.
There were no significant differences in word count, character count, number of sentences, or sentence length between AI-generated introductions and those written by humans. There was a statistically significant difference in word length, with AI-generated introductions having a shorter average word length. AI scored significantly higher on average in the categories of completeness and engagement; no significant difference between the two groups was observed in the other categories. These findings demonstrate that AI can synthesize urologic literature (sexual medicine/men’s health) and generate introductions to a systematic review that are at minimum noninferior, and possibly superior with regard to engagement and completeness. Limitations of this study include the small sample size (5) and the exclusive focus on AI-generated introductions rather than other sections of a systematic review.
Our study reveals that ChatGPT, as an LLM, can produce a noninferior, and possibly superior, introduction to a systematic review on topics in urologic sexual medicine and men’s health. As more facets of AI, such as data integration and analysis, are developed, AI could be used to generate entire systematic reviews noninferior to those written by humans. The ethical principles of using AI in the scientific literature should be examined and discussed before more advanced LLMs are developed.