DOI: 10.1182/blood-2023-181644 ISSN: 0006-4971

An Honeur Framework for Generating Computer Understandable Cohort Definitions from Clinical Trial Protocols in Multiple Myeloma through Generative AI for Comparing with Real-World Data

Flavio Camarrone, Michel van Speybroeck, Linda Little, Hugh Scott, Raymond Powles
  • Cell Biology
  • Hematology
  • Immunology
  • Biochemistry


Study protocols define the population(s) of interest through a set of inclusion and exclusion criteria -often in the form of a textual description. In haematology, these criteria can become very complex and are difficult to translate into consistent and robust computer interpretable cohort definitions. Addressing this interoperability challenge has the potential to vastly accelerate the evaluation of populations of interest from a trial to a real world setting or to replicate studies.

Generative AI (GenAI) and its potential to extract valuable information from textual data such as medical records or research papers has recently received a lot of attention. Building on the clinical data harmonisation that already took place across the Haematology Outcomes Network in Europe (HONEUR), the application of the latest evolutions in GenAI technology for the generation of computer interpretable cohort definitions from descriptions in study protocols has been assessed and potential limitations have been identified.


HONEUR is a federated data network of 21 registries and hospitals in 9 different countries, comprising 45,000 multiple myeloma patients, where the data stays local and all participating data partners have full governance ( The main aim of the network is to accelerate evidence generation to expand the knowledge and understanding of patients with hematological malignancies. The pivotal step in enabling the federated analysis is the clinical data harmonization. This is based on structural and semantic harmonization to the OMOP data model ( For studies involving subpopulations of multiple myeloma patients, a simplified tabular structure is generated from the OMOP data. This structure contains an extensive list of baseline and line of treatment specific variables.

For the conversion of the inclusion/exclusion criteria into a standard format that could be executed by the HONEUR tools, ChatGPT ( version 4.0 was used. The following steps were hereby followed as a set of ‘prompts’: Define the role and system prompts: explaining in plain language what the intent is and what steps will be followed.Provide information on the data structure that contains the harmonized data: the data dictionary from the tabular structure mentioned above with the variable names, the description of these and the allowed values of the variables.Providing the target structure of the output fileProvide the inclusion / exclusion criteria as “training” promptsInstruct ChatGPT on how to behave with unknown scenarios.Repeat Step 5 until the outcome file correctly represents the cohort description as provided in Step 4

The result of the above steps is a correctly formatted file that is evaluated by Nuffield Cancer Centre London through upload in the local HONEUR real-world database and evaluated to assess the validity of the cohort definition(s)


In total 6 patients met all criteria in Nuffield Cancer Centre London (see Table 1). The system was further evaluated on more complex inclusion of cytogenetic and subsequent treatment inclusion criteria.

[Table 1]


In our setting, we've found that one-shot or few-shot learning is adequate to correctly translate complex scenarios of inclusion / exclusion rules into executable cohort definitions through the use of GenAI. The upfront semantic clinical data harmonization is however a prerequisite as the tools are not yet adequately trained to detect subtle but sometimes critical nuances in the clinical criteria.

The application of GenAI is promising for conversion of textual data to structured and computer interpretable information. This offers the possibility to accelerate evidence generation in a robust and reproducible way. We showed this system can work in identifying 6 patients in one center and is therefore extrapolatable to all 21 participating centers.

More from our Archive