Artificial Intelligence as a Language Barrier Application in a Simulated Health Care Setting
Nicholas Hampers, Rita Thieme, Louis Hampers
Objective:
We evaluated the accuracy of an artificial intelligence program (ChatGPT 4.0) as a medical translation modality in a simulated pediatric urgent care setting.
Methods:
Two entirely separate instances of ChatGPT 4.0 were used. The first served as a simulated patient (SP). The SP generated complaints and symptoms while processing and generating text only in Spanish. A human provider (blinded to diagnosis) conducted a clinical “visit” with the SP. The provider typed questions and instructions in English only. A second instance of ChatGPT 4.0 served as the artificial medical interpreter (AMI). The AMI translated the provider’s questions/instructions from English to Spanish and the SP’s responses/concerns from Spanish to English in real time. Post-visit transcripts were then reviewed for errors by a certified human medical interpreter.
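The message flow of a single visit turn can be sketched as follows. This is a minimal illustration of the routing described above, not the study's implementation: the function names and the placeholder stand-ins for the two ChatGPT 4.0 instances are ours, and a real system would call a language model at each placeholder.

```python
# Hypothetical stand-ins for the two ChatGPT 4.0 instances (names are
# illustrative, not from the study).
def ami_translate(text, source, target):
    # Placeholder: the real AMI instance would be prompted to translate
    # between English and Spanish. Here we only tag the direction.
    return f"[{source}->{target}] {text}"

def sp_respond(question_es):
    # Placeholder: the real SP instance processes and generates text
    # only in Spanish, producing complaints and symptoms.
    return "Me duele el estómago."

def run_turn(provider_text_en):
    """One visit turn: provider (English) -> AMI -> SP (Spanish) -> AMI -> provider."""
    question_es = ami_translate(provider_text_en, source="en", target="es")
    response_es = sp_respond(question_es)
    return ami_translate(response_es, source="es", target="en")
```

The key design point reflected here is the strict separation: the provider never sees Spanish and the SP never sees English, so every word the provider reads has passed through the AMI and is auditable in the post-visit transcript.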
Results:
We conducted 10 simulated visits with 3597 words translated by the AMI (1331 English and 2266 Spanish). There were 23 errors (raw accuracy rate of 99.4%). Errors were categorized as: 9 omissions, 2 additions, 11 substitutions, and 1 editorialization. Three errors were judged to have potential clinical consequences, although these were minor ambiguities, readily resolved by the provider during the visit. The AMI also made repeated errors of grammatical gender (masculine/feminine) and second-person formality (“usted”/“tú”). None of these were judged to have potential clinical consequences.
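The reported figures are internally consistent, as a quick check confirms: the four error categories sum to 23, and 23 errors over 3597 translated words gives the stated raw accuracy of 99.4%.

```python
# Totals reported in the Results section.
words = 1331 + 2266          # English + Spanish words translated
errors = 9 + 2 + 11 + 1      # omissions + additions + substitutions + editorialization

accuracy_pct = round((1 - errors / words) * 100, 1)
print(words, errors, accuracy_pct)  # 3597 23 99.4
```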
Conclusions:
The AMI accurately and safely translated the written content of simulated urgent care visits. It may serve as the basis for an expedient, cost-effective medical interpretation modality. Further work should seek to couple this translation accuracy with speech recognition and speech generation technology in trials with actual patients.