DOI: 10.1002/qub2.70044 ISSN: 2095-4689

Genome–phenome association prediction using weighted deep matrix factorization with a multisource graph attention network

Ran Duan, Wei Cao, Maozu Guo, Le Tian, Lingling Zhao

Abstract

Genome–phenome association (GPA) prediction can broaden the understanding of biological mechanisms underlying complex phenotypic traits (e.g., diseases and agronomic traits). Traditional deep matrix factorization (DMF)‐based GPA methods can integrate multiple data types and uncover nonlinear associations but often rely on low‐dimensional matrix representations, making it challenging to capture intricate node interactions under sparse data conditions. Existing approaches also inadequately integrate multi‐omics data and frequently neglect local neighborhood (LN) and graph structural information, particularly when handling incomplete or low‐quality attribute data. Furthermore, most methods fail to effectively address the scarcity of plant phenotype data. We propose GDMF, an end‐to‐end GPA prediction framework that integrates a multisource graph attention network (GAT) with weighted DMF. By constructing a heterogeneous molecular network, GDMF unifies multi‐omics data and utilizes GAT’s attention mechanism to process multisource associations (both cross‐type and intra‐type node interactions), thereby enhancing the capture of LN structures while preserving critical node information. The weighted DMF component dynamically assigns importance weights to global relationship matrices, enabling robust modeling of complex gene–phenotype interactions even with sparse and noisy attribute data. To mitigate the scarcity of plant phenotype data, GDMF incorporates textual semantic similarity of trait ontology terms to enrich phenotypic attribute representations. Extensive experiments on maize and human datasets demonstrate that GDMF significantly outperforms competitive methods, validating its efficacy and superiority in GPA prediction.
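As a rough illustration of the weighting idea mentioned in the abstract, the sketch below fits a plain (linear, non-deep) weighted matrix factorization with NumPy. It is not the GDMF model: the function name `weighted_mf`, the toy association matrix `A`, the fixed weight matrix `W`, and all hyperparameters are assumptions made for this example, and GDMF is described as learning its weights dynamically rather than fixing them in advance.

```python
import numpy as np

def weighted_mf(A, W, rank=2, lr=0.05, reg=0.01, epochs=300, seed=0):
    """Minimize sum(W * (A - U V^T)^2) + reg * (|U|^2 + |V|^2) by gradient descent.

    Simplified stand-in for a weighted DMF objective (assumption, not the
    authors' implementation): W up-weights trusted entries of A.
    """
    rng = np.random.default_rng(seed)
    n, m = A.shape
    U = 0.1 * rng.standard_normal((n, rank))  # row (e.g., gene) factors
    V = 0.1 * rng.standard_normal((m, rank))  # column (e.g., phenotype) factors
    losses = []
    for _ in range(epochs):
        residual = A - U @ V.T
        loss = np.sum(W * residual ** 2) + reg * (np.sum(U ** 2) + np.sum(V ** 2))
        losses.append(float(loss))
        weighted = W * residual                # weighted reconstruction error
        grad_U = -2.0 * weighted @ V + 2.0 * reg * U
        grad_V = -2.0 * weighted.T @ U + 2.0 * reg * V
        U -= lr * grad_U
        V -= lr * grad_V
    return U, V, losses

# Toy gene-by-phenotype association matrix (hypothetical data).
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0],
              [1.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
# Known associations get full weight; unobserved zeros get a lower weight,
# since a zero may mean "unknown" rather than "no association".
W = np.where(A == 1.0, 1.0, 0.2)

U, V, losses = weighted_mf(A, W)
scores = U @ V.T  # predicted association scores for every pair
```

Down-weighting the zero entries reflects the sparse-data setting the abstract describes, where a missing association is weaker evidence than an observed one.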
