DOI: 10.4103/ijhas.ijhas_64_25 ISSN: 2278-4292

Assessing the response consistency in Chat Generative Pretrained Transformer: The case of “What is Ayurveda?”

Janmejaya Samal

INTRODUCTION:

Chat Generative Pretrained Transformer (ChatGPT) is a language-based chatbot launched in November 2022. This study was carried out to assess how consistent ChatGPT's responses are when a single question is asked 10 times consecutively within a defined time frame.

MATERIALS AND METHODS:

The study had two steps: first, the same single question, “What is Ayurveda?”, was asked 10 times; second, a conversation was held to request bibliographic citations. The responses obtained were analyzed statistically, and the key concepts in the responses were free listed. Statistical analysis was conducted using Jamovi (version 2.3.28).
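The abstract does not say how words, sentences, or concept mentions were counted before the data went into Jamovi. The Python sketch below shows one possible approach, under loudly stated assumptions: words are whitespace-separated tokens, a sentence ends in ".", "!" or "?", and a concept counts as present in a response if its name appears anywhere in the text (case-insensitive). The function names are hypothetical and do not describe the author's actual procedure.

```python
import re
from collections import Counter

# ASSUMPTION: hypothetical helpers; the study does not state its counting rules.


def count_words(text: str) -> int:
    """Count whitespace-delimited tokens as words."""
    return len(text.split())


def count_sentences(text: str) -> int:
    """Count sentences by splitting on runs of '.', '!' or '?' followed by
    whitespace or the end of the text.

    Naive: abbreviations such as 'e.g.' are over-counted.
    """
    parts = re.split(r"[.!?]+(?:\s+|$)", text.strip())
    return len([p for p in parts if p.strip()])


def concept_frequencies(
    responses: list[str], concepts: list[str]
) -> dict[str, tuple[int, float]]:
    """For each concept, count how many responses mention it at least once
    (case-insensitive substring match).

    Returns {concept: (n, percent of responses)}, mirroring the
    "(n = 10, 100%)" style used when reporting the free list.
    """
    if not responses:
        raise ValueError("need at least one response")
    counts: Counter[str] = Counter()
    for text in responses:
        lowered = text.lower()
        for concept in concepts:
            if concept.lower() in lowered:
                counts[concept] += 1  # at most once per response
    total = len(responses)
    return {c: (counts[c], 100.0 * counts[c] / total) for c in concepts}
```

Substring matching is crude: it misses synonyms and paraphrases (for example, a response that describes the five elements without naming the Pancha Mahabhutas), so a human-coded free list is likely to be more reliable than this automated tally.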

RESULTS:

The median word count was 334 (standard deviation [SD] = 32.87, interquartile range [IQR] = 313–354); the median number of sentences was 18 (SD = 2.36, IQR = 17.0–20.3), of headings 1.50 (SD = 0.82, IQR = 1–2), and of subheadings 8 (SD = 2.08, IQR = 7–9). In the linear regression between the number of words and the other particulars, the number of sentences showed a statistically significant association (P = 0.003, 95% confidence interval). The key concepts were free listed under three headings: core principles, treatment practices, and modern relevance. The four most frequent concepts in ChatGPT's responses were Doshas (n = 10, 100%), Panchakarma (n = 7, 70%), Pancha Mahabhutas (n = 7, 70%), and herbal medicine (n = 6, 60%). In the follow-up conversation, ChatGPT provided 21 different bibliographic citations across three rounds of questions; however, none of these citations was found to exist.

CONCLUSIONS:

ChatGPT can be useful in many ways in education and research. However, its output should be taken with a pinch of salt; as ChatGPT itself states, “ChatGPT can make mistakes. Check important info.”
