DOI: 10.1093/bib/bbaf313 ISSN: 1467-5463

An artificial intelligence-based approach for identifying the proteins regulating liquid–liquid phase separation

Zahoor Ahmed, Kiran Shahzadi, Rui Li, Yu-Qing Jiang, Yan-Ting Jin, Muhammad Arif, Juan Feng

Abstract

Liquid–liquid phase separation (LLPS) is a biomolecular process that underpins the formation of membrane-less organelles within living cells. This phenomenon, along with the resulting condensate bodies, is increasingly recognized for its critical roles in various biological processes, such as ribonucleic acid (RNA) metabolism, chromatin rearrangement, and signal transduction. Notably, regulator proteins play a central role in the process of LLPS. They are essential for the formation, stabilization, and maintenance of the dynamic properties of LLPS, ensuring an appropriate phase separation response to cellular signals. Targeting these regulator proteins is the key to manipulating LLPS for applications in biotechnology, materials science, and medicine, including biomaterials, drug delivery, diagnostics, and synthetic biology. Given their importance, this study focused on an artificial intelligence-based approach to identify regulator proteins in LLPS. We constructed a dataset of 913 positive and 6584 negative protein sequences, and divided it into eight balanced training datasets and a test dataset. Semantic information from protein sequences was extracted using the ESM2_t36 pretrained protein language model, followed by training a multilayer perceptron classifier. The model achieved 0.78 accuracy on the test dataset, outperforming traditional sequence-based methods, one-hot encoding, and other pretrained embedding methods. SHapley Additive exPlanations (SHAP)-based interpretation revealed key biophysical patterns enriched in regulator proteins, including higher levels of charged and disordered residues. Our results show that deep contextual protein representations combined with neural network-based classifiers can accurately identify LLPS regulator proteins. This tool offers new opportunities for understanding condensate biology and designing synthetic phase-separating systems. All data and code are available at: https://github.com/bioplusAI/LLPS_regulators_pred.
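The dataset partitioning described above (913 positive and 6584 negative sequences divided into eight balanced training datasets and a test dataset) can be illustrated with a minimal sketch. The abstract does not state how the test dataset was drawn or the exact size of each subset, so the 10% hold-out fraction, the round-robin assignment of negatives, the function name `make_balanced_subsets`, and the placeholder sequence identifiers below are all assumptions made for illustration, not the authors' procedure. Under these assumptions each subset is only roughly balanced.

```python
import random


def make_balanced_subsets(pos_ids, neg_ids, n_subsets=8, test_fraction=0.1, seed=0):
    """Hold out a test set, then pair all remaining positives with
    disjoint chunks of the remaining negatives.

    test_fraction and the round-robin chunking are illustrative
    assumptions; the abstract does not specify them.
    """
    rng = random.Random(seed)
    pos, neg = list(pos_ids), list(neg_ids)
    rng.shuffle(pos)
    rng.shuffle(neg)

    # Hold out the same fraction of each class for the test dataset.
    n_pos_test = int(len(pos) * test_fraction)
    n_neg_test = int(len(neg) * test_fraction)
    test = {"pos": pos[:n_pos_test], "neg": neg[:n_neg_test]}
    train_pos, train_neg = pos[n_pos_test:], neg[n_neg_test:]

    # Deal the remaining negatives into n_subsets disjoint chunks.
    chunks = [train_neg[i::n_subsets] for i in range(n_subsets)]

    # Each training subset reuses every training positive with one chunk.
    subsets = [{"pos": list(train_pos), "neg": chunk} for chunk in chunks]
    return subsets, test


# Placeholder identifiers standing in for the real protein sequences.
pos_ids = [f"pos_{i}" for i in range(913)]
neg_ids = [f"neg_{i}" for i in range(6584)]
subsets, test = make_balanced_subsets(pos_ids, neg_ids)

for k, s in enumerate(subsets):
    print(k, len(s["pos"]), len(s["neg"]))
```

Reusing the minority class across subsets while splitting the majority class is a common way to handle class imbalance without discarding negatives: one classifier can be trained per subset and the predictions combined. Whether the study combined its eight models this way is not stated in the abstract.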
