DOI: 10.1182/blood-2023-177884 ISSN: 0006-4971

An Unsupervised Machine Learning Method Stratifies Chronic Lymphocytic Leukemia Patients into Novel Categories with Different Risk of Early Treatment

Federico Pozzo, Francesca Cuturello, Edith Villegas Garcia, Francesca Rossi, Massimo Degan, Paola Nanni, Ilaria Cattarossi, Eva Zaina, Paola Varaschin, Alessandra Braida, Michele Berton, Laura Zannier, Filippo Vit, Erika Tissino, Tamara Bittolo, Roberta Laureana, Giovanni Francesco D'Arena, Luca Laurenti, Agostino Tafuri, Jacopo Olivieri, Francesco Zaja, Annalisa Chiarenza, Francesco Di Raimondo, Maria Ilaria Del Principe, Riccardo Bomben, Antonella Zucchetto, Alessio Ansuini, Chris Fegan, Chris Pepper, Andrea Pepper, Kari G. Rabe, Sameer A. Parikh, Neil E. Kay, Alberto Cazzaniga, Valter Gattei
  • Cell Biology
  • Hematology
  • Immunology
  • Biochemistry

Scoring systems designed to improve the accuracy of prognostication in chronic lymphocytic leukemia (CLL) usually rely on discretized or dichotomized values of the clinical and biological variables. Here we analyzed the immunophenotypic and (immuno)genetic profiles of 2,243 CLL patients with Rai stage 0-I-II by applying unsupervised machine learning methods that treat prognostic factors as continuous variables, in order to identify novel relationships and interactions likely missed in conventional hierarchical models. The study included an internal training cohort (n=863) and two external validation cohorts provided by Cardiff University/Brighton Medical School (n=455) and the Mayo Clinic (Rochester, MN; n=925). The primary endpoint was Time to First Treatment (TTFT).

Laboratory-based markers evaluated in the training cohort were: surface antigen expression by flow cytometry (as % of positive cells) of CD20, FMC7, CD49d, CD49c, CD38, CD23, CD43, CD22, ZAP-70; chromosomal aberrations (% of nuclei with abnormal signal) del13q, tris12, del11q and del17p; mutational status of TP53 by NGS as % VAF; IGHV mutational status as % mutation. 420/863 cases received treatment (median 46 months).

Univariate Cox regression identified FISH del11q, del17p and tris12, TP53 and IGHV mutations, CD38 and CD49d expression, as features associated with TTFT (p-value <0.001). An unsupervised k-means algorithm partitioned cases into 6 clusters (C1-C6), as selected by the elbow method, and centroid analysis evaluated which feature most contributed to each cluster (Panel A):

C1 (n=275) was heavily IGHV-mutated (mutation range 4.5-22.0%) with low representation of all other features;

C2 (n=208) was mostly IGHV-unmutated/intermediate (up to 4.5%) in the absence of other features;

C3 (n=169) were CD49d-expressing cases (>50% expression), with low representation of other features, except CD38;

C4 (n=127) were cases with trisomy 12 (>28%), also expressing CD49d and CD38;

C5 (n=48) were cases with highly clonal del11q (44-98%) and IGHV-unmutated, mutually exclusive with TP53 disruption;

C6 (n=34) contained TP53-disrupted cases with high mutation burden (VAF 36-97%) and/or del17p (range 40-96%), with low importance of all the other features. Of note, alternative use of either TP53 mutation or del17p as defining features of cluster 6 did not result in changes of the classification accuracy.
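The clustering step above (k-means, with the number of clusters chosen by the elbow method) can be illustrated with a minimal pure-Python sketch. It is not the authors' pipeline: the toy points, the function names, the random initialization and the absence of feature scaling are all assumptions made for illustration. The sketch computes the within-cluster sum of squares (inertia) for each candidate k; the "elbow" is the k beyond which this curve stops dropping sharply, usually judged from a plot.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centroids):
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)), key=lambda j: sq_dist(point, centroids[j]))

def kmeans_inertia(points, k, n_iter=50, seed=0):
    """Run plain Lloyd's k-means and return the final inertia."""
    rng = random.Random(seed)
    # Assumption: centroids are initialized from k distinct data points.
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [nearest(p, centroids) for p in points]
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    labels = [nearest(p, centroids) for p in points]
    return sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))

# Hypothetical toy data: two continuous features per case (not patient data).
toy = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
       (10.0, 10.0), (10.0, 11.0), (11.0, 10.0),
       (20.0, 0.0), (20.0, 1.0), (21.0, 0.0)]

# Inertia for every candidate k; inspect this curve to locate the elbow.
inertia_by_k = {k: kmeans_inertia(toy, k) for k in range(1, len(toy) + 1)}
for k, inertia in inertia_by_k.items():
    print(k, round(inertia, 2))
```

In practice the centroid coordinates of the chosen solution are then inspected to see which features dominate each cluster, as described for C1-C6.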

Clinically, clusters 3-4-5-6 showed median TTFT of 26, 22, 9 and 5 months, respectively, whereas median TTFT was not reached for clusters 1-2. Hierarchical agglomerative clustering aggregated the 6 clusters into 3 major risk tiers: high (C5-6), intermediate (C2-3-4) and low (C1), with median TTFT of 7 months, 45 months and not reached, respectively (Panel B). Multivariate Cox analysis (MVA) with Rai staging demonstrated that the 3-tier risk score contributed significantly and independently to the TTFT estimate (p<0.0001; Harrell's c-index=0.71).
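For readers unfamiliar with the concordance index reported above, the following is a minimal sketch of how Harrell's c-index can be computed for right-censored data. It is a simplified illustration, not the authors' analysis: in the abstract the index comes from a multivariate Cox model, whereas here any numeric risk score is accepted, pairs with tied event times are ignored, and the function name and toy values are hypothetical.

```python
def harrell_c_index(times, events, risk_scores):
    """Share of comparable pairs in which the higher-risk case is treated first.

    times       -- follow-up time per case (e.g. months to treatment or censoring)
    events      -- 1 if the case was treated, 0 if censored
    risk_scores -- higher value means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when case i had the event strictly before time j.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    # Tied predictions count as half, which matters for a 3-tier score.
                    concordant += 0.5
    return concordant / comparable

# Hypothetical toy cohort: risk tier coded as 1 (low), 2 (intermediate), 3 (high).
times = [5, 10, 20, 30]
events = [1, 1, 0, 1]
tiers = [3, 2, 2, 1]
c_index = harrell_c_index(times, events, tiers)
print(c_index)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of cases by time to treatment.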

To classify novel patients, we designed a hierarchical algorithm, based on each cluster's estimated risk and on the calculated cut-offs, i.e. C6, del17p/TP53>36%; C5, del11q>43%; C4, tris12>28%; C3, CD49d>50%; C2, IGHV≤4.5%; C1, the remaining patients with IGHV>4.5%. This approach proved highly robust in an internal training-validation bootstrap (cluster classification accuracy = 0.96±0.02).
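The hierarchical rule above can be sketched as a short cascade of threshold checks, evaluated from the highest-risk cluster downward. The cut-offs are those stated in the text; the `Patient` class, its field names and the function name are hypothetical, and reading "del17p/TP53>36%" as "either del17p or TP53 VAF above 36%" is an assumption.

```python
from dataclasses import dataclass

@dataclass
class Patient:
    # All values are percentages, on the scales described in the abstract.
    del17p: float    # % of nuclei with del17p (FISH)
    tp53_vaf: float  # TP53 variant allele frequency (NGS)
    del11q: float    # % of nuclei with del11q (FISH)
    tris12: float    # % of nuclei with trisomy 12 (FISH)
    cd49d: float     # % of CD49d-positive cells (flow cytometry)
    ighv_mut: float  # % IGHV mutation

def assign_cluster(p: Patient) -> str:
    """Apply the cut-offs in descending order of estimated risk."""
    # Assumption: either TP53 lesion above 36% is enough for C6.
    if p.del17p > 36 or p.tp53_vaf > 36:
        return "C6"
    if p.del11q > 43:
        return "C5"
    if p.tris12 > 28:
        return "C4"
    if p.cd49d > 50:
        return "C3"
    if p.ighv_mut <= 4.5:
        return "C2"
    return "C1"

# Hypothetical case: trisomy 12 in 30% of nuclei, CD49d 60%, IGHV 10% mutated.
example = Patient(del17p=0, tp53_vaf=0, del11q=0, tris12=30, cd49d=60, ighv_mut=10)
print(assign_cluster(example))
```

Because the checks run in order, a case meeting several criteria is assigned to the highest-risk cluster it qualifies for; here the example satisfies both the C4 and C3 cut-offs and is labeled C4.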

External samples from two independent cohorts were classified into the six clusters using the cut-offs defined by the hierarchical classifier, and then aggregated into the 3 risk tiers. The classifier performed robustly in both cohorts, again adding independent prognostic information to Rai stage in MVA, with p<0.0001 and a c-index of 0.76 for the Cardiff/Brighton cohort and p<0.0001 and a c-index of 0.74 for the Mayo cohort.
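The aggregation of cluster labels into risk tiers amounts to a fixed lookup. The sketch below uses the grouping reported for the training cohort (high: C5-6; intermediate: C2-3-4; low: C1); the variable names and the example labels are hypothetical.

```python
from collections import Counter

# Tier assignment as reported in the abstract.
TIER_BY_CLUSTER = {
    "C1": "low",
    "C2": "intermediate",
    "C3": "intermediate",
    "C4": "intermediate",
    "C5": "high",
    "C6": "high",
}

def to_risk_tiers(cluster_labels):
    """Map a sequence of cluster labels (C1-C6) to the 3 risk tiers."""
    return [TIER_BY_CLUSTER[c] for c in cluster_labels]

# Hypothetical cluster labels for six external cases.
labels = ["C1", "C6", "C3", "C2", "C5", "C4"]
tiers = to_risk_tiers(labels)
tier_counts = Counter(tiers)
print(tiers)
print(tier_counts)
```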

In conclusion, our machine-learning-driven, laboratory-based classification identifies clusters of patients who are at higher risk of requiring early treatment, independent of clinical staging, and may help to better identify patients who may benefit from early treatment or more frequent disease surveillance.

For some features, such as TP53 or IGHV mutational status, our unsupervised approach selected cut-offs that do not recapitulate the canonical ones, pointing to different biological activities in the TTFT setting. For example, TP53 mutations were clinically relevant only if present in the vast majority of the clone, suggesting the involvement of specific activities of the TP53 mutant in this context.
