DOI: 10.1093/europace/euad369 ISSN: 1099-5129

Accuracy and comprehensibility of chat-based artificial intelligence for patient information on atrial fibrillation and cardiac implantable electronic devices

Henrike A K Hillmann, Eleonora Angelini, Nizar Karfoul, Sebastian Feickert, Johanna Mueller-Leisse, David Duncker


Background and aims

Natural language processing chatbots (NLPCs) can be used to gather information on medical topics. However, these tools carry a potential risk of misinformation. This study aims to evaluate different aspects of the responses given by different NLPCs to questions about atrial fibrillation (AF) and cardiac implantable electronic devices (CIEDs).


Methods

Questions were entered into three different NLPC interfaces. Responses were evaluated with regard to appropriateness, comprehensibility, the appearance of confabulation, the absence of relevant content, and recommendations given for clinically relevant decisions. Moreover, readability was assessed by calculating the word count and the Flesch Reading Ease score.
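For reference, the Flesch Reading Ease score mentioned above is a standard readability index computed from average sentence length and average syllables per word:

\[
\text{FRE} = 206.835 \;-\; 1.015\,\frac{\text{total words}}{\text{total sentences}} \;-\; 84.6\,\frac{\text{total syllables}}{\text{total words}}
\]

Higher scores indicate text that is easier to read; scores of 60-70 correspond roughly to plain English readable by most adults.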


Results

Of the responses given by Google Bard (GB), Bing Chat (BC), and ChatGPT Plus (CGP), 52%, 60%, and 84% on AF and 16%, 72%, and 88% on CIEDs, respectively, were rated appropriate. Assessment of comprehensibility showed that 96%, 88%, and 92% of responses on AF and 92%, 88%, and 100% on CIEDs were comprehensible for GB, BC, and CGP, respectively. Readability varied between the different NLPCs. Relevant aspects were missing in 52% (GB), 60% (BC), and 24% (CGP) of responses on AF and in 92% (GB), 88% (BC), and 52% (CGP) on CIEDs.


Conclusion

Responses generated by NLPCs are mostly easy to understand, with readability varying between the different NLPCs. The appropriateness of responses is limited and varies between NLPCs, and important aspects are often omitted. Thus, chatbots should be used with caution when gathering medical information about cardiac arrhythmias and devices.
