A Large Language Model Pipeline for Stroke Staging Using Electronic Medical Records
Xiaochao Luo, Yanmei Liu, Yu Ma, Chuwei Li, Minghong Yao, Hunong Xiang, Xuan Qin, Jiali Liu, Xiaoping Zhan, Chen Zhang, Songtao Zang, Ke Deng, Ling Li, Xin Sun
ABSTRACT
Objective
Identifying stroke patients at different disease stages is a prerequisite for clinical research using electronic medical records (EMRs), yet an artificial intelligence‐based model that can be applied directly is still lacking. We therefore developed a large language model (LLM) pipeline, the stroke staging model (StrokeSM), for use in retrospective clinical research.
Methods
StrokeSM was developed using a Chinese national stroke database comprising EMRs from 33,637 patients. A total of 2000 patients were randomly selected from the Tianjin regional stroke database for external validation. StrokeSM comprised three phases: stroke hospitalization identification based on BERT and a bidirectional cross‐attention network to fuse present illness history and discharge diagnosis, symptom–time extraction based on chief complaint through a UIE‐base LLM, and stroke staging classification according to the predefined rules.
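The third phase, rule-based staging from an extracted symptom–time pair, can be illustrated with a minimal sketch. The abstract does not state the actual rules, so everything here is an assumption made for illustration: the `SymptomTime` structure, the unit-to-day conversion table, the `classify_stage` helper, and in particular the 14-day threshold and the "non-acute" label in the usage lines are hypothetical placeholders, not StrokeSM's predefined rules.

```python
from dataclasses import dataclass

# Hypothetical unit-to-day conversion; the abstract does not specify one.
DAYS_PER_UNIT = {"hour": 1 / 24, "day": 1, "week": 7, "month": 30, "year": 365}


@dataclass
class SymptomTime:
    """A symptom-time pair as it might come out of the extraction phase (assumed shape)."""
    symptom: str
    value: float
    unit: str


def onset_days(pair: SymptomTime) -> float:
    """Convert the extracted duration to days since symptom onset."""
    return pair.value * DAYS_PER_UNIT[pair.unit]


def classify_stage(days: float, rules: list[tuple[float, str]], default: str) -> str:
    """Return the label of the first rule whose upper bound (in days) covers `days`.

    `rules` is an ordered list of (upper_bound_days, label) pairs supplied by the
    caller; `default` is returned when no bound covers `days`.
    """
    if days < 0:
        raise ValueError("time since onset cannot be negative")
    for upper_bound_days, label in rules:
        if days <= upper_bound_days:
            return label
    return default


# Placeholder rule set: the 14-day cut-off is illustrative only.
example_rules = [(14, "acute")]
pair = SymptomTime(symptom="limb weakness", value=3, unit="day")
print(classify_stage(onset_days(pair), example_rules, default="non-acute"))
```

Passing the rules in as data keeps the staging criteria separate from the code, so the thresholds a study actually uses can be substituted without changing the function.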
Results
On the test set, StrokeSM achieved accuracy, F1 score, precision, and recall of 0.90, 0.91, 0.91, and 0.90, respectively. For the acute phase, the F1 score, precision, and recall were 0.91, 0.89, and 0.93, respectively. On the external validation set, StrokeSM achieved accuracy, F1 score, precision, and recall of 0.92, 0.93, 0.93, and 0.92, respectively. Performance was strongest in the acute phase, with F1 score, precision, and recall of 0.97, 0.98, and 0.96, respectively.
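As a consistency check on the acute‐phase figures, the F1 score can be recomputed from the reported precision and recall using its standard definition as their harmonic mean. The sketch below assumes that definition applies to the per‐class metrics; the function name `f1_score` is chosen here for illustration. The overall metrics are not rechecked, because the abstract does not say how they were averaged across stages.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Acute-phase precision and recall as reported in the abstract.
acute_test = f1_score(0.89, 0.93)
acute_external = f1_score(0.98, 0.96)

print(round(acute_test, 2), round(acute_external, 2))  # prints: 0.91 0.97
```

Both recomputed values match the reported acute‐phase F1 scores of 0.91 (test set) and 0.97 (external validation set).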
Conclusions
StrokeSM achieved state‐of‐the‐art performance, providing an accurate method for classifying stroke populations by disease stage in EMRs, especially in the acute phase. StrokeSM demonstrates that LLM‐based methods can identify disease stage phenotypes in EMRs automatically and accurately, laying the foundation for drawing reliable conclusions in clinical research.