Multimodal AI for automated procedural documentation in interventional cardiology: an early-stage innovation report
Rugved Parmar, Nischal Sharma, Daoud Eldawud, Selin Unal, Hamza Hamza, Adam Budzikowski
Background
Documentation during cardiac catheterisation traditionally requires nurse scribes who manually record timestamps, devices, medications and operator comments in real-time, demanding specialised personnel and introducing potential for human error during high-acuity moments.
Objective
To develop and test an artificial intelligence (AI)-powered system that processes catheterisation video and audio to generate structured procedural documentation mirroring standard cath lab nursing flowsheets.
Methods
A web application was built using Google Gemini 2.0 Flash, a multimodal large language model, with persona-based prompting to replicate nurse scribe documentation patterns. Feasibility testing used two publicly available cardiac catheterisation teaching videos (7 and 8 min; ~15 min total) processed retrospectively in batch mode. Reference standards were created independently by two authors, with discrepancies resolved by consensus discussion. Generated documentation was evaluated against these standards for completeness, accuracy, structural fidelity and temporal precision.
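The structured output described above can be pictured with a minimal sketch. Everything here is an assumption made for illustration: the `FlowsheetEvent` fields, the category names, the prompt wording and the helper functions are hypothetical and are not taken from the authors' application, and the call to the model itself is deliberately left out.

```python
import json
from dataclasses import dataclass

# Hypothetical event categories, chosen to match the event types named in the report.
ALLOWED_CATEGORIES = {"procedural_step", "device", "medication"}


@dataclass
class FlowsheetEvent:
    timestamp: str    # video-relative time, "MM:SS" (assumed format)
    category: str     # one of ALLOWED_CATEGORIES
    description: str  # free-text entry, as a scribe would write it


def build_persona_prompt(role: str = "cath lab nurse scribe") -> str:
    """Compose an illustrative persona-based instruction (wording is assumed)."""
    return (
        f"You are an experienced {role}. Watch the recording and log each event "
        "as a JSON list of objects with keys 'timestamp' (MM:SS), 'category' "
        f"(one of {sorted(ALLOWED_CATEGORIES)}) and 'description'. "
        "Record only what is seen or heard; do not infer findings."
    )


def to_seconds(timestamp: str) -> int:
    """Convert 'MM:SS' to seconds so events can be ordered chronologically."""
    minutes, seconds = timestamp.split(":")
    return int(minutes) * 60 + int(seconds)


def parse_events(raw_json: str) -> list[FlowsheetEvent]:
    """Validate model output and return events sorted by time."""
    events = []
    for item in json.loads(raw_json):
        if item["category"] not in ALLOWED_CATEGORIES:
            raise ValueError(f"unknown category: {item['category']}")
        events.append(
            FlowsheetEvent(item["timestamp"], item["category"], item["description"])
        )
    return sorted(events, key=lambda e: to_seconds(e.timestamp))
```

Validating the category and sorting by timestamp gives a log that can be compared entry by entry against a reference standard, which is the kind of check that structural fidelity and temporal precision imply.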
Results
The system produced structured time-stamped logs capturing procedural steps, device identification and medication administration. Time to first token was 15–30 s. Across 25 discrete documented events (procedural steps, device uses and medication administrations) identified in ~15 min of source video, one constituted hallucinated content (a fabricated finding of ‘severe calcification in proximal left anterior descending (LAD)’), yielding a hallucination rate of 4% (1/25). Vital signs were not consistently captured, as haemodynamic monitors were not visible.
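The reported rate is a simple proportion, 1/25 = 0.04. The sketch below reproduces that arithmetic and, as an addition not found in the report, computes a Wilson score interval to show how wide the uncertainty is with only 25 events. The function names are hypothetical, and the interval is an illustrative calculation rather than a result from the study.

```python
import math


def hallucination_rate(hallucinated: int, total: int) -> float:
    """Proportion of documented events that were hallucinated."""
    if total <= 0:
        raise ValueError("total must be positive")
    return hallucinated / total


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


rate = hallucination_rate(1, 25)   # the counts reported above
low, high = wilson_interval(1, 25)
print(f"rate = {rate:.0%}, approx. 95% interval = {low:.1%} to {high:.1%}")
```

With so few events the interval is wide, so the 4% figure is best read as a preliminary estimate.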
Conclusions
In this small feasibility test, multimodal AI generated structured procedural documentation from catheterisation recordings, with one hallucinated entry among 25 documented events. The observed hallucination underscores the need for human verification. Future work should target real-time processing and prospective validation with recordings of actual procedures.