DOI: 10.3390/ijms27135819 ISSN: 1422-0067

Integrating Multi-View Features via Deep Generalized Canonical Correlation Analysis for Single-Cell Clustering

Wenhao Liu, Wei Zhang, Xiaoying Zheng, Yuanyuan Li

Single-cell RNA sequencing data are characterized by high dimensionality, sparsity, and strong nonlinearity, which hinders conventional single-view clustering methods from capturing linear and nonlinear feature subspaces simultaneously. Features from distinct dimensionality reduction approaches are inherently complementary: PCA (Principal Component Analysis) preserves global linear structure, UMAP (Uniform Manifold Approximation and Projection) maintains topology and local neighborhoods, and PHATE (Potential of Heat-diffusion for Affinity-based Trajectory Embedding) depicts gradual transitions in cell differentiation. To fuse these complementary sources, we adopt an inter-view correlation maximization paradigm. Canonical Correlation Analysis (CCA) integrates two views by maximizing the correlation between their projections, but it is limited to pairwise scenarios. We therefore extend it to Generalized Canonical Correlation Analysis (GCCA) for multi-view alignment and introduce a deep autoencoder to construct the DeepGCCA (Deep Generalized Canonical Correlation Analysis) framework. The method generates three views via PCA, UMAP, and PHATE, extracts nonlinear latent features with the autoencoder, projects the multi-view representations into a unified subspace under weighted GCCA constraints, and performs K-means clustering on the result. Experiments on the two simulated and three real single-cell datasets evaluated in this study show that DeepGCCA performs competitively against all single-view baselines and compares favorably with several widely adopted methods. Moreover, downstream marker gene analysis supports the biological interpretability of the resulting clusters within these datasets. Within the scope of this benchmark, DeepGCCA serves as a reference for high-precision clustering of single-cell transcriptomic data and offers practical insights into multi-view integration and biological interpretability.
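The multi-view alignment step can be illustrated with a minimal sketch of linear, weighted GCCA in its standard MAX-VAR form, which seeks a shared embedding G with orthonormal columns that minimizes the weighted sum of squared distances between G and each linearly projected view. This is not the authors' implementation: the autoencoder is omitted, the three views are synthetic stand-ins for the PCA, UMAP, and PHATE embeddings, and the function name `weighted_gcca`, the equal weights, and all sizes are assumptions made for illustration.

```python
import numpy as np

def weighted_gcca(views, weights, k):
    """Linear MAX-VAR GCCA sketch: returns a shared embedding G (n x k), G.T @ G = I."""
    blocks = []
    for X, w in zip(views, weights):
        Xc = X - X.mean(axis=0)                      # centre each feature
        U, s, _ = np.linalg.svd(Xc, full_matrices=False)
        U = U[:, s > 1e-10 * s.max()]                # drop numerically null directions
        blocks.append(np.sqrt(w) * U)                # so that M @ M.T = sum_j w_j U_j U_j^T
    M = np.hstack(blocks)
    # The top-k left singular vectors of M are the top-k eigenvectors of the
    # weighted sum of per-view projection matrices, i.e. the MAX-VAR solution.
    G, _, _ = np.linalg.svd(M, full_matrices=False)
    return G[:, :k]

# Synthetic stand-ins for three embeddings of the same cells (assumption:
# a real pipeline would use PCA, UMAP, and PHATE coordinates here).
rng = np.random.default_rng(0)
n, k = 300, 2
labels = rng.integers(0, 3, size=n)                  # three hypothetical cell groups
centers = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
Z = centers[labels] + 0.3 * rng.normal(size=(n, 2))  # shared latent structure

views = [
    Z @ rng.normal(size=(2, 10)) + 0.1 * rng.normal(size=(n, 10)),         # linear view
    np.tanh(Z @ rng.normal(size=(2, 5))) + 0.1 * rng.normal(size=(n, 5)),  # nonlinear view
    Z + 0.1 * rng.normal(size=(n, 2)),                                     # noisy copy
]

G = weighted_gcca(views, weights=[1.0, 1.0, 1.0], k=k)
print(G.shape)
```

In a pipeline like the one described, the rows of `G` would then be passed to a clustering routine such as K-means; the view weights here are fixed by hand, whereas a learned or tuned weighting would be a separate design choice.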
