Statistical Methods for Detecting Nonlinear Relationships in Gene Expression and Omics Data: A Review
Łukasz Huminiecki
High-throughput technologies such as RNA-seq and single-cell transcriptomics generate increasingly large, high-dimensional gene expression datasets in which nonlinear dependence structures are common. Because classical methods such as Pearson correlation primarily capture linear associations, they may miss many biologically relevant patterns of dependence. To address this limitation, diverse nonlinear dependence measures have been developed, including information-theoretic, rank-based, kernel-based, distance-based, copula-based, and clustering-based approaches. However, the field remains fragmented, and comparative evaluations are often inconsistent. This review organizes nonlinear methods into these major methodological families and critically compares their statistical behavior, strengths, limitations, and characteristic failure modes. We emphasize that method selection depends on matching inferential objectives to estimator assumptions and analytical constraints. By identifying recurring trade-offs among flexibility, robustness, interpretability, and computational scalability, we provide scenario-based guidance for method selection in transcriptomics, network inference, and functional genomics, with the aim of supporting principled, application-specific use of nonlinear dependence methods in modern omics research.
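The claim that linear measures can miss nonlinear dependence is easy to demonstrate. This minimal NumPy sketch is illustrative and not taken from the review. For a symmetric quadratic relationship (y = x²), Pearson correlation is essentially zero, while a distance-based measure still detects the dependence. The function names `pearson` and `distance_correlation` are hypothetical helpers. The distance correlation uses the standard double-centered (V-statistic) sample estimator, and the toy data stand in for real expression values.

```python
import numpy as np


def pearson(x, y):
    """Sample Pearson correlation: sensitive only to linear association."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return float(xc @ yc / np.sqrt((xc @ xc) * (yc @ yc)))


def _double_centered(z):
    """Pairwise |z_i - z_j| distance matrix, double-centered (row, column, grand mean)."""
    d = np.abs(z[:, None] - z[None, :])
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()


def distance_correlation(x, y):
    """Sample distance correlation (V-statistic form) for 1-D samples.

    In the population it is zero only under independence (given finite
    first moments), so it can detect non-monotonic relationships.
    """
    a = _double_centered(np.asarray(x, dtype=float))
    b = _double_centered(np.asarray(y, dtype=float))
    dcov2_xy = (a * b).mean()  # squared distance covariance
    dvar2_x = (a * a).mean()   # squared distance variance of x
    dvar2_y = (b * b).mean()   # squared distance variance of y
    denom = np.sqrt(dvar2_x * dvar2_y)
    if denom == 0:
        return 0.0
    return float(np.sqrt(max(dcov2_xy, 0.0) / denom))


if __name__ == "__main__":
    # Toy stand-in for a nonlinear expression relationship (assumption, not real data).
    x = np.linspace(-1.0, 1.0, 201)
    y = x ** 2
    print(f"Pearson r:            {pearson(x, y):.3f}")
    print(f"distance correlation: {distance_correlation(x, y):.3f}")
```

Distance correlation is used here only as one representative of the distance-based family. Rank-based measures such as Spearman's ρ would also be near zero for this non-monotonic pattern, which illustrates why method choice must match the form of dependence expected.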