SOLeNNoID: A Deep Learning Pipeline For Solenoid Residue Detection in Protein Structures
Georgi I Nikov, Daniella Pretorius, James W Murray
Abstract
Motivation
Solenoid proteins, a subset of tandem repeat proteins, have structurally distinct, modular, and elongated architectures that differentiate them from globular proteins. These proteins play essential roles in diverse biological processes, including protein binding, enzymatic catalysis, ice binding, and nucleic acid interactions. Despite their biological significance and growing commercial applications, such as engineered therapeutic DARPins and designed PPR proteins, accurate identification and annotation of solenoid structures remain challenging. Because solenoid structures are more conserved than their sequences, and recent advances in protein structure prediction make reliable structural models increasingly accessible, structure-based solenoid detection methods are preferable to sequence-based ones.
Results
We introduce SOLeNNoID, a deep-learning-based pipeline for predicting solenoid residues in protein structures. Our method employs a convolutional neural network (CNN) architecture to analyze protein distance matrices, enabling accurate identification of solenoid-containing regions. SOLeNNoID covers all three solenoid subclasses: α-, α/β-, and β-solenoids. Comparative evaluation against existing structure-based methods demonstrates the superior performance of our approach. Applying SOLeNNoID to the entire Protein Data Bank (PDB) led to a 71% increase in detected solenoid-containing entries compared to the gold-standard RepeatsDB database, significantly expanding the known solenoid protein repertoire.
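The abstract does not specify how the distance matrix is built or presented to the CNN. The sketch below illustrates one plausible input representation, assuming Cα-Cα Euclidean distances and fixed-size square windows tiled along the main diagonal so that every residue falls inside at least one window. The window size and step, and the names `distance_matrix`, `window_starts`, and `diagonal_windows`, are hypothetical and are not taken from the SOLeNNoID code; the example uses only the Python standard library.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]
Matrix = List[List[float]]


def distance_matrix(ca_coords: Sequence[Coord]) -> Matrix:
    """Symmetric matrix of pairwise Euclidean distances (in Å) between C-alpha atoms."""
    n = len(ca_coords)
    dm = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(ca_coords[i], ca_coords[j])
            dm[i][j] = dm[j][i] = d  # fill both halves; the diagonal stays 0.0
    return dm


def window_starts(n: int, size: int, step: int) -> List[int]:
    """Start indices of diagonal windows (hypothetical tiling scheme).

    Chains no longer than `size` get a single window starting at 0.
    Otherwise windows advance by `step`, and a final window ending at the
    last residue is appended so the C-terminus is always covered.
    """
    if n <= size:
        return [0]
    starts = list(range(0, n - size + 1, step))
    if starts[-1] != n - size:
        starts.append(n - size)
    return starts


def diagonal_windows(dm: Matrix, size: int, step: int) -> List[Tuple[int, Matrix]]:
    """Square sub-matrices centred on the main diagonal, paired with their start index."""
    windows = []
    for start in window_starts(len(dm), size, step):
        rows = dm[start:start + size]
        windows.append((start, [row[start:start + size] for row in rows]))
    return windows


# Toy chain: ten C-alpha atoms on a straight line, 3.8 Å apart.
coords = [(3.8 * i, 0.0, 0.0) for i in range(10)]
dm = distance_matrix(coords)
wins = diagonal_windows(dm, size=4, step=3)
print([start for start, _ in wins])  # window start residues
```

In a real pipeline each window would be passed to the trained network, and per-residue solenoid scores from overlapping windows would need to be merged; how SOLeNNoID performs that merging is not described here.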
Availability and Implementation
SOLeNNoID is implemented in Python and available on GitHub at https://github.com/gnik2018/SOLeNNoID. The source code and pre-trained models are accessible under a free-software license. Training data are available on Zenodo at https://zenodo.org/records/14927497.
Contact: James W Murray, j.w.murray@imperial.ac.uk
Supplementary information
Available online