Evaluation Frameworks for Clinical Foundation Models in Specific Tasks of Unstructured Medical Text Analysis: A Scoping Review
Laura Johana González Zazueta, Betsaida Lariza López Covarrubias, Christian Xavier Navarro Cota, Mabel Vázquez Briseño, Juan Iván Nieto Hipólito, Gerardo Salvador Romo Cárdenas, Gener J. Avilés Rodríguez
Background/Objectives: Clinical foundation models (CFMs) are increasingly being applied to analyze unstructured clinical text, supporting tasks such as clinical reasoning, clinical documentation generation, and information extraction. However, their evaluation remains limited and lacks standardization, which represents a significant challenge for their safe integration into healthcare systems. This scoping review aimed to examine how performance, clinical utility, safety, and generalization are assessed in current evaluation frameworks and methodologies for CFMs applied to electronic health record text and to identify existing gaps and challenges in their evaluation. Methods: This scoping review was reported in accordance with PRISMA-ScR guidance, and the study selection process was summarized using a PRISMA flow diagram. A total of 448 records were identified in the initial search, of which 16 met the eligibility criteria and were included in the final synthesis. Results: The findings were organized into three categories: methodological frameworks, performance and clinical utility, and ethical and safety considerations. The results show progress in structured methods, evaluation schemes, and simulated environments. Nevertheless, inconsistencies persist in operational robustness, particularly regarding clinical reasoning and hallucinations. The literature also highlights important ethical concerns, including potential biases and regulatory gaps. Conclusions: Although CFMs show strong potential to transform clinical text analysis, no comprehensive framework currently guides their evaluation and safe use in real clinical settings. These findings highlight the need for a structured conceptual framework that integrates methodological, technical, clinical, and ethical criteria to support responsible implementation in clinical environments.