Protein-Nucleic Acid Binding Site Prediction Using Interpretable Kolmogorov–Arnold Networks with Hypergraph Representation Learning
Yangfeng Zhu, Guicong Sun, Weimin Zhu, Yongxian Fan, Zeheng Wu, Xianchen Zheng, Xiaoyong Pan
Abstract
Motivation
In recent years, protein language models (pLMs) and graph neural networks (GNNs) have shown strong expressive and reasoning capabilities in modeling protein-RNA/DNA interactions. However, existing methods describe the relationships between residues with simple graphs, whose edges connect only pairs of residues, and therefore struggle to capture the high-order, multi-body residue interactions present in protein-nucleic acid complex structures. In fact, residues that are adjacent in space but distant in sequence often cooperatively determine nucleic acid binding capacity.
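To make the contrast with pairwise graphs concrete, the following is a minimal sketch of one way to group spatially neighboring residues into hyperedges. It is an illustration under stated assumptions, not the construction used by IKANbind: the function name `build_incidence`, the one-hyperedge-per-residue rule, the 8 Å cutoff and the toy coordinates are all hypothetical.

```python
import numpy as np

def build_incidence(coords, cutoff=8.0):
    """Return an (n_residues, n_hyperedges) 0/1 incidence matrix.

    Assumption (not from the paper): there is one hyperedge per residue,
    and hyperedge j contains every residue whose coordinate lies within
    `cutoff` angstroms of residue j, including residue j itself.
    """
    coords = np.asarray(coords, dtype=float)
    # Pairwise Euclidean distances between all residue coordinates.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    # H[i, j] = 1 when residue i belongs to the hyperedge centered on residue j.
    return (dist <= cutoff).astype(np.int8)

# Toy coordinates (hypothetical): residues 0 and 3 are far apart in
# sequence order but only 5 angstroms apart in space.
toy_coords = [
    [0.0, 0.0, 0.0],
    [10.0, 0.0, 0.0],
    [10.0, 10.0, 0.0],
    [0.0, 5.0, 0.0],
]
H = build_incidence(toy_coords, cutoff=8.0)
```

Unlike an adjacency matrix, each column of `H` can contain any number of residues, so a single hyperedge can represent a multi-body neighborhood rather than a pair.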
Results
In this study, we present IKANbind, a computational approach that combines hypergraph representation learning with interpretable Kolmogorov–Arnold Networks (KANs) to identify nucleic acid binding residues (NBRs) in proteins. By combining the advantages of pLMs, hypergraph neural networks and symbolic KANs, IKANbind outperforms existing methods on multiple NBR benchmark datasets. We also demonstrate that the pLM used in IKANbind can implicitly learn the physicochemical properties of binding residues, such as charge and hydrophobicity. In addition, the symbolic KAN, which uses a unique weighting mechanism over decomposable basis functions, can accurately identify the features that contribute most to NBR recognition. We find that polarity and charge contribute more to NBR prediction than other physicochemical properties or evolutionary information. Finally, IKANbind achieves promising performance when extended to other ligand-binding residue prediction tasks.
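The idea of weighting decomposable basis functions can be sketched as follows. This is a hedged illustration, not IKANbind's implementation: it assumes a single KAN-style unit in which each input feature passes through its own univariate function, written as a weighted sum of fixed basis functions, and it swaps in Gaussian radial basis functions where KANs typically use B-splines. The names `rbf_basis` and `kan_unit`, the weights, and the mean-absolute-contribution importance score are all hypothetical.

```python
import numpy as np

def rbf_basis(x, centers, width=1.0):
    """Evaluate Gaussian basis functions at x; shape (len(x), len(centers))."""
    x = np.asarray(x, dtype=float)[:, None]
    return np.exp(-((x - centers[None, :]) / width) ** 2)

def kan_unit(X, weights, centers, width=1.0):
    """Compute sum_j phi_j(x_j), where phi_j(x) = sum_k weights[j, k] * B_k(x).

    Returns the unit output, shape (n_samples,), and the per-feature
    contributions phi_j(x_j), shape (n_samples, n_features).
    """
    X = np.asarray(X, dtype=float)
    contrib = np.stack(
        [rbf_basis(X[:, j], centers, width) @ weights[j] for j in range(X.shape[1])],
        axis=1,
    )
    return contrib.sum(axis=1), contrib

# Hypothetical setup: 2 features, 5 basis functions on [-1, 1].
centers = np.linspace(-1.0, 1.0, 5)
weights = np.array([
    [0.0, 0.0, 0.0, 0.0, 0.0],    # feature 0: all-zero weights, no effect
    [1.0, 0.5, 0.0, -0.5, -1.0],  # feature 1: non-trivial univariate function
])
X = np.array([[0.3, -1.0], [0.9, 0.0], [-0.5, 1.0]])

y, contrib = kan_unit(X, weights, centers)
# Assumed importance score: mean absolute contribution of each feature.
importance = np.abs(contrib).mean(axis=0)
```

Because the output decomposes into one term per feature, each feature's contribution can be read off directly, which is the property that makes this kind of attribution straightforward.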
Availability
IKANbind is freely available at https://github.com/yangfengzhuguet/IKANBind.
Supplementary information
Supplementary data are available at Bioinformatics online.