DOI: 10.1182/blood-2023-172955 ISSN: 0006-4971

A Day 14 Endpoint for Acute GVHD Clinical Trials

Nikolaos Spyrou, Yu Akahoshi, Steven Kowalyk, George Morales, Rahnuma Beheshti, Paibel Aguayo-Hiraldo, MHD Monzr Al Malki, Francis Ayuketang Ayuk, Peter Bader, Janna Baez, Alexandra Capellini, Hannah Choe, Zachariah Defilipp, Matthias Eder, Gilbert Eng, Aaron Etra, Sigrun Gleich, Stephan Grupp, Elizabeth Hexner, Matthias Hoepting, William J. Hogan, Stelios Kasikis, Nikolaos Katsivelos, Carrie L. Kitko, Sabrina Kraus, Deukwoo Kwon, Pietro Merli, Joseph Portelli, Muna Qayed, Ran Reshef, Tal Schechter-Finkelstein, Ingrid Vasova, Matthias Woelfl, Kitsada Wudhikarn, Rachel Young, Ernst Holler, Yi-Bin Chen, Ryotaro Nakamura, John Levine, James L.M. Ferrara
  • Cell Biology
  • Hematology
  • Immunology
  • Biochemistry

The overall response rate (ORR) 28 days (D28) after treatment has been adopted as the primary endpoint for clinical trials of acute graft-versus-host disease (GVHD). ORR combines complete response (CR) and partial response (PR) because CR and PR have very similar non-relapse mortality (NRM). D28 ORR has two disadvantages: first, physicians usually decide to modify immunosuppression much earlier than D28; and second, NRM does not always correlate well with ORR at D28, particularly for patients who present with intermediate-risk or low-risk disease.

MAGIC serum biomarkers at day 14 (D14) of GVHD treatment have been shown to correlate better with NRM than clinical symptom response (Srinagesh et al, Blood Advances, 2019). We therefore hypothesized that a combination of clinical symptom severity and biomarkers at D14 of treatment would predict NRM at least as well as, and possibly better than, clinical response at D28. We studied 1144 patients from 23 MAGIC sites who were treated with systemic steroids at a dose of at least 0.25 mg/kg/day (prednisone equivalents) and had clinical data and serum samples available in the MAGIC database and biorepository. We divided the patients into a training set (n=764) and a validation set (n=380). We used 12-month NRM from the time of acute GVHD onset as the primary outcome of interest.

We first trained a recursive partitioning algorithm in the training set to create a response model (MAGIC D14 clinical response), based on overall GVHD grade at both onset (D0) and D14, that predicted 12-month NRM, and then applied it to the validation set. Patients who progressed to grade III/IV GVHD, continued to have grade III/IV GVHD, or improved only to grade II at D14 from grade III/IV at D0 had high NRM and were classified as non-responders. Patients with grade I/II GVHD at D0 who had grade 0-II GVHD at D14, and patients who improved to grade 0/I at D14 from grade III/IV at D0, had low NRM and were classified as responders. Although counterintuitive, responders included patients whose GVHD remained at grade II because the 12-month NRM of this group was only 20%. The new clinical model predicted NRM as well as the D28 standard response model in the validation set. When we integrated the D14 MAP biomarker score (low: ≤0.290 vs high: >0.290) with the new D14 clinical response (D14 MAGIC integrated response), we observed three response groups with strikingly different 12-month NRM (8%, 35%, 76%, p<0.001), in contrast to the D28 standard response (Figure 1A, 1B). D14 complete response required MAGIC clinical response with low MAP; D14 partial response required either MAGIC clinical response with high MAP or MAGIC clinical non-response with low MAP; and D14 non-response required MAGIC clinical non-response with high MAP. The D14 response groups also displayed significant differences in 12-month OS (82%, 58%, 14%, respectively, p<0.001).
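The response definitions above can be written as a small rule set. The following Python sketch is an illustration only, not the authors' code: the function and constant names are hypothetical, grades are assumed to be encoded as integers (0 for no GVHD, 1 to 4 for grades I to IV), and the onset grade is assumed to be at least I.

```python
MAP_THRESHOLD = 0.290  # low MAP: <= 0.290, high MAP: > 0.290 (cutoff from the abstract)


def d14_clinical_responder(grade_d0: int, grade_d14: int) -> bool:
    """Return True for a D14 clinical responder (hypothetical helper).

    Assumes integer grades: 0 = no GVHD, 1-4 = grades I-IV.
    """
    if grade_d0 not in range(1, 5):
        raise ValueError("onset grade must be 1-4")
    if grade_d14 not in range(0, 5):
        raise ValueError("D14 grade must be 0-4")
    # Progressed to, or remained at, grade III/IV at D14: non-responder.
    if grade_d14 >= 3:
        return False
    # Improved only to grade II from grade III/IV at onset: non-responder.
    if grade_d0 >= 3 and grade_d14 == 2:
        return False
    # Grade I/II at onset with grade 0-II at D14, or
    # grade III/IV at onset with grade 0/I at D14: responder.
    return True


def d14_integrated_response(grade_d0: int, grade_d14: int, map_score: float) -> str:
    """Combine clinical response and MAP score into CR, PR, or NR."""
    responder = d14_clinical_responder(grade_d0, grade_d14)
    low_map = map_score <= MAP_THRESHOLD
    if responder and low_map:
        return "CR"
    if not responder and not low_map:
        return "NR"
    # Clinical response with high MAP, or clinical non-response with low MAP.
    return "PR"
```

For example, under these assumptions a patient with grade II GVHD at onset and at D14 and a MAP of 0.10 would be classified as CR, whereas a patient who improved from grade III to grade II with a MAP of 0.50 would be classified as NR.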

We then compared the D14 integrated response with the D28 standard response for prediction of 12-month NRM. The D14 integrated response model was superior to the D28 standard response model in time-dependent AUC (0.78 vs 0.71, p=0.03), sensitivity (0.71 vs 0.54), positive predictive value (0.52 vs 0.48), and negative predictive value (0.92 vs 0.89), with minimal loss in specificity (0.85 vs 0.87). In decision curve analysis with 12-month NRM as the outcome, the D14 integrated response displayed higher net benefit (correct identification of patients who will experience NRM by 12 months) over the whole range of threshold probabilities/preferences for changing immunosuppression. We conclude, first, that the definition of response that uses onset and D14 grades is as accurate as the D28 ORR model; and second, that a model using clinical grades and biomarkers on D14 predicts long-term NRM more accurately than the standard D28 definition and may be useful as a clinical trial endpoint.
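For readers unfamiliar with decision curve analysis, net benefit at a threshold probability is conventionally computed as the true-positive rate minus the false-positive rate weighted by the odds of the threshold. The Python sketch below shows that standard formula under simplifying assumptions: it uses a binary outcome without censoring (the abstract's analysis is time-dependent, so the authors' calculation may differ), and the function names are hypothetical.

```python
def net_benefit(true_pos: int, false_pos: int, n: int, threshold: float) -> float:
    """Net benefit of a prediction rule at one threshold probability.

    Standard decision-curve formula for a binary outcome:
        NB = TP/n - (FP/n) * (pt / (1 - pt))
    Censoring is ignored here (a simplifying assumption).
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be strictly between 0 and 1")
    odds = threshold / (1.0 - threshold)
    return true_pos / n - (false_pos / n) * odds


def net_benefit_treat_all(events: int, n: int, threshold: float) -> float:
    """Reference strategy: classify every patient as high risk."""
    return net_benefit(events, n - events, n, threshold)
```

A decision curve plots `net_benefit` for each model across a range of thresholds, alongside the treat-all reference and the treat-none line at zero; a model with the higher curve over the thresholds of interest offers the greater net benefit.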
