DOI: 10.1182/blood-2023-181628 ISSN: 0006-4971

A Novel Machine-Learning Model to Predict Early Relapse in Mantle Cell Lymphoma (MCL)

Alba Cabirta Touzón, Adrian Mosquera Orgueira, Victor Navarro Garces, Pau Abrisqueta, Rodrigo A Salcedo Pereda, Carlos Aliste Santos, Marta Canelo Vilaseca, Marina Gomez Rosa, Alberto Lopez Garcia, Tomas Garcia, Fatima De la Cruz Vicente, Juan-Manuel Sancho, Eduardo Rios Herranz, Raul Cordoba, Angel Serna, Gloria Iacoboni, Moraima Jiménez, Cecilia Carpio, Cristina Garcia, Laura Gallur, Josep Castellvi, Francesc Bosch, Ana Marin Niebla
  • Cell Biology
  • Hematology
  • Immunology
  • Biochemistry


MCL is currently an incurable disease. Although survival has greatly improved with the advent of more effective treatments, early relapse or refractoriness after first-line treatment (1L) is independently associated with a dismal outcome (Visco et al, BJH 2019). Early identification of these higher-risk patients would allow their management to be optimized and, possibly, their outcome to be improved. The purpose of this study was to develop a novel prognostic index that could predict, from the time of diagnosis, the risk of early relapse.


Patients diagnosed with MCL at 6 Spanish centers between January 2000 and December 2021 who received ≥1 line of treatment were retrospectively analyzed (training cohort). An external cohort from an additional Spanish hospital was used to validate the score (test cohort). Patients were classified by progression of disease (POD) as early-POD (E-POD: refractory to 1L or relapsing ≤24 m from diagnosis) or late-POD (L-POD: relapse after 1L beyond 24 m from diagnosis, or no relapse at data cut-off). To be evaluable for POD within 24 m (POD24), patients needed a follow-up (FU) of ≥24 m from diagnosis, except those who died from MCL before 24 m, who were included as E-POD.
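The classification rule above can be written out as a short function. This is a minimal sketch, not the authors' code: the `Patient` fields and the function name are hypothetical, and only the 24-month thresholds and the three E-POD criteria come from the text.

```python
from dataclasses import dataclass
from typing import Optional

POD24_WINDOW_MONTHS = 24.0  # threshold stated in the abstract


@dataclass
class Patient:
    # Hypothetical record layout, for illustration only.
    followup_months: float                      # diagnosis to last contact or death
    refractory_to_1l: bool = False              # refractory to first-line treatment
    months_to_relapse: Optional[float] = None   # None if no relapse at data cut-off
    died_of_mcl: bool = False


def classify_pod24(p: Patient) -> Optional[str]:
    """Return "E-POD", "L-POD", or None when the patient is not evaluable."""
    # Refractory to 1L, or relapse within 24 m from diagnosis.
    if p.refractory_to_1l:
        return "E-POD"
    if p.months_to_relapse is not None and p.months_to_relapse <= POD24_WINDOW_MONTHS:
        return "E-POD"
    # Death from MCL before 24 m is counted as E-POD.
    if p.died_of_mcl and p.followup_months < POD24_WINDOW_MONTHS:
        return "E-POD"
    # Everyone else needs at least 24 m of follow-up to be evaluable.
    if p.followup_months < POD24_WINDOW_MONTHS:
        return None
    # Relapse beyond 24 m, or no relapse at data cut-off.
    return "L-POD"
```

For example, `classify_pod24(Patient(followup_months=60, months_to_relapse=30))` falls into the L-POD group, while a patient with 10 m of follow-up and no event is returned as `None` (not evaluable).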

The LASSO method with the minimum lambda was used to identify the most relevant variables affecting POD24. Missing values in the training cohort were imputed so as not to lose patients. To enhance the predictive performance, a multimodal approach was adopted. First, a random forest survival (RFS) model was constructed using the variables selected by LASSO to predict overall survival (OS). Subsequently, a gradient-boosting (GB) model was developed to predict POD24, incorporating both the LASSO-selected variables and the risk predictions derived from the RFS model.
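The abstract does not state which software was used for the LASSO step. By the usual convention (for instance `lambda.min` in R's glmnet), the "minimum lambda" is the penalty value with the lowest mean cross-validated error. The sketch below illustrates that selection rule only; the function name, the tie-breaking choice and the error values are made up for illustration.

```python
from statistics import mean


def select_lambda_min(cv_errors):
    """Pick the penalty with the lowest mean cross-validated error.

    cv_errors maps each candidate lambda to its list of per-fold errors.
    Ties are broken in favour of the larger lambda (the sparser model),
    which is a design choice of this sketch.
    """
    mean_error = {lam: mean(errs) for lam, errs in cv_errors.items()}
    return min(mean_error, key=lambda lam: (mean_error[lam], -lam))


# Illustrative per-fold errors (not study data).
cv_errors = {
    0.001: [0.62, 0.60, 0.64],
    0.01: [0.55, 0.57, 0.56],
    0.1: [0.58, 0.59, 0.60],
    1.0: [0.69, 0.69, 0.69],
}
best_lambda = select_lambda_min(cv_errors)
```

Variables whose LASSO coefficients remain non-zero at the selected penalty are the ones carried forward to the later models.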


The training cohort included 231 patients, 99 (43%) E-POD and 132 (57%) L-POD, and the test cohort 38 patients, 12 (32%) E-POD and 26 (68%) L-POD. Table 1 shows the baseline characteristics of both cohorts. Median FU was 97.12 m (86.93-113.45) for the training cohort and 120.44 m (103.69-144.82) for the test cohort.

Among the variables associated with the risk of E-POD in the univariate analysis (age >65 years, no previous transplant, advanced stage, high-risk MIPI, Ki67 ≥30% and blastoid morphology), the LASSO method selected 4: age at diagnosis, stage (I-II vs. III-IV), MIPI group (low, intermediate or high risk) and morphologic variant (classic vs. blastoid/pleomorphic). These variables, adjusted for POD24 as the endpoint, were first included in the RFS model to estimate the OS of each patient, obtaining a c-index of 0.78 in both the training and the test cohorts. We then fitted a GB model to predict the risk of E-POD, including the same LASSO-selected variables together with the survival risk prediction obtained from the RFS model, with relative weights of 9.86% (age), 0.38% (stage), 14.34% (MIPI), 16.18% (morphology) and 59.25% (estimated OS). The resulting model showed an area under the curve (AUC) of 0.82 and 0.75 in the training and test cohorts, respectively. The calibration curves for the prediction model in both cohorts are shown in Figure 1.
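Both discrimination measures reported above are concordance statistics. The following is a minimal sketch of how each is commonly computed, assuming Harrell's definition for the c-index and the pairwise definition for the AUC. The abstract does not say which estimators were used, and the function names are illustrative.

```python
def harrell_c_index(times, events, risks):
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when patient i had an observed event
    strictly before patient j's time. It is concordant when the patient
    with the earlier event also has the higher predicted risk; tied
    risks count as half. Pairs with tied times are skipped for simplicity.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

In both cases 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported 0.78, 0.82 and 0.75 should be read.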

Our model stratifies patients into tertiles with distinct risk of E-POD (high, intermediate and low). In the training and test cohorts, respectively, accuracy was 81% and 74%, sensitivity 81% and 70%, specificity 81% and 76%, positive predictive value 78% and 64%, and negative predictive value 85% and 81%.
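The five performance measures above all derive from a 2x2 table of predicted versus observed E-POD status. The abstract does not state how the three risk groups were reduced to a binary prediction, so the sketch below simply assumes one is available. The counts are made up for illustration and are not the study's data.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard 2x2 confusion-matrix metrics.

    tp/fn: E-POD patients predicted as E-POD / as L-POD.
    tn/fp: L-POD patients predicted as L-POD / as E-POD.
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # share of true E-POD that is detected
        "specificity": tn / (tn + fp),   # share of true L-POD correctly ruled out
        "ppv": tp / (tp + fp),           # probability of E-POD given a positive call
        "npv": tn / (tn + fn),           # probability of L-POD given a negative call
    }


# Illustrative counts only.
metrics = binary_metrics(tp=8, fp=3, tn=12, fn=2)
```

Unlike sensitivity and specificity, the predictive values depend on the proportion of E-POD patients in the cohort, which differs between the training (43%) and test (32%) cohorts.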

Because a gradient-boosting model is an ensemble of decision trees whose weighted outputs are combined to produce a prediction, it cannot be expressed as a fixed formula with coefficients, as a traditional linear model can. An online application is therefore under construction to make the calculation of this E-POD index score widely available.
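To make this point concrete, the toy example below shows how a boosted ensemble arrives at a probability: each tree contributes a value that depends on which branch the patient falls into, and the scaled sum is passed through a logistic function. Every tree, threshold, feature name and weight here is invented for illustration and bears no relation to the fitted model.

```python
import math


def stump(feature, threshold, left_value, right_value):
    """One-split decision tree: return left_value if x[feature] <= threshold."""
    def predict(x):
        return left_value if x[feature] <= threshold else right_value
    return predict


# Hypothetical ensemble; a real model would contain many deeper trees.
TOY_TREES = [
    stump("age", 65, -0.4, 0.3),
    stump("estimated_os_risk", 0.5, -0.6, 0.8),
    stump("blastoid", 0.5, -0.1, 0.5),
]
INITIAL_SCORE = 0.0   # assumed baseline log-odds
LEARNING_RATE = 0.5   # assumed shrinkage applied to every tree


def predict_epod_probability(x):
    """Sum the scaled tree outputs and map the result to a probability."""
    raw = INITIAL_SCORE + LEARNING_RATE * sum(tree(x) for tree in TOY_TREES)
    return 1.0 / (1.0 + math.exp(-raw))


high = predict_epod_probability({"age": 70, "estimated_os_risk": 0.7, "blastoid": 1})
low = predict_epod_probability({"age": 50, "estimated_os_risk": 0.2, "blastoid": 0})
```

The contribution of each variable changes with the branch taken in every tree, so there is no single coefficient per variable to report.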


This new index score can predict the risk of E-POD at diagnosis, allowing for an earlier identification of this high-risk patient subset in need of more effective treatment strategies. External validation of this E-POD index score in a larger cohort is warranted to allow a wider application of this prognostic tool.
