DOI: 10.1136/bmjinnov-2026-001586 ISSN: 2055-8074

Multimodal AI for automated procedural documentation in interventional cardiology: an early-stage innovation report

Rugved Parmar, Nischal Sharma, Daoud Eldawud, Selin Unal, Hamza Hamza, Adam Budzikowski

Background

Documentation during cardiac catheterisation traditionally requires nurse scribes who manually record timestamps, devices, medications and operator comments in real time, demanding specialised personnel and introducing potential for human error during high-acuity moments.

Objective

To develop and test an artificial intelligence (AI)-powered system that processes catheterisation video and audio to generate structured procedural documentation mirroring standard cath lab nursing flowsheets.

Methods

A web application was built using Google Gemini V.2.0 Flash, a multimodal large language model, with persona-based prompting to replicate nurse scribe documentation patterns. Feasibility testing used two publicly available cardiac catheterisation teaching videos (7 and 8 min; ~15 min total) processed retrospectively in batch mode. Reference standards were independently created by two authors, with discrepancies resolved by consensus discussion, and documentation was evaluated for completeness, accuracy, structural fidelity and temporal precision.

Results

The system produced structured time-stamped logs capturing procedural steps, device identification and medication administration. Time to first token was 15–30 s. Across 25 discrete documented events (procedural steps, device uses and medication administrations) identified in ~15 min of source video, one constituted hallucinated content (a fabricated finding of ‘severe calcification in proximal left anterior descending (LAD)’), yielding a hallucination rate of 4% (1/25). Vital signs were not consistently captured, as haemodynamic monitors were not visible.
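
The reported hallucination rate is a simple proportion: hallucinated events divided by all documented events. A minimal sketch of that arithmetic follows; the function name `hallucination_rate` is hypothetical, chosen for illustration, and is not taken from the authors' application.

```python
def hallucination_rate(hallucinated: int, total: int) -> float:
    """Return hallucinated events as a percentage of all documented events.

    Hypothetical helper for illustration; not part of the reported system.
    """
    if total <= 0:
        raise ValueError("total must be a positive number of events")
    if not 0 <= hallucinated <= total:
        raise ValueError("hallucinated must lie between 0 and total")
    return 100 * hallucinated / total


# Counts as reported in the Results: 1 hallucinated event out of 25.
rate = hallucination_rate(1, 25)
print(f"Hallucination rate: {rate:.0f}% (1/25)")
```

With only 25 events from two teaching videos, a single additional hallucinated event would double the rate, so the figure is best read as a feasibility estimate.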

Conclusions

Multimodal AI can generate structured procedural documentation from catheterisation recordings with reasonable accuracy. The observed hallucination underscores the need for human verification. Future work should target real-time processing and prospective validation with actual procedural recordings.
