DOI: 10.1093/bioinformatics/btaf189 ISSN: 1367-4803

From high-throughput evaluation to wet-lab studies: advancing mutation effect prediction with a retrieval-enhanced model

Yang Tan, Ruilin Wang, Banghao Wu, Liang Hong, Bingxin Zhou

Abstract

Motivation

Enzyme engineering is a critical approach for producing enzymes that meet industrial and research demands by modifying wild-type proteins to enhance properties such as catalytic activity and thermostability. Beyond traditional directed evolution and rational design, recent advances in deep learning offer cost-effective and high-performance alternatives. Pretrained deep learning models that encode implicit coevolutionary patterns have become powerful tools for this task, and the central challenge is to uncover the intricate relationships among protein sequence, structure, and function.

Results

We present VenusREM, a retrieval-enhanced protein language model designed to capture local amino acid interactions at both spatial and temporal scales. VenusREM achieves state-of-the-art performance on 217 assays from the ProteinGym benchmark. Beyond high-throughput validation on open benchmarks, we conducted a low-throughput post hoc analysis on more than 30 mutants to verify the model’s ability to improve the stability and binding affinity of a VHH antibody. We also validated the effectiveness of VenusREM by designing 10 novel mutants of a DNA polymerase and performing wet-lab experiments to evaluate their enhanced activity at elevated temperatures. Together, the in silico and experimental evaluations confirm the reliability of VenusREM as a computational tool for enzyme engineering and demonstrate a comprehensive evaluation framework for future computational studies in mutation effect prediction.

Availability and implementation

The implementation is available at https://github.com/tyang816/VenusREM.