An interpretable deep learning framework uncovers features governing CRISPR-Cas9 genome-editing efficiency
Nasim Bakhtiyari, Yosef Masoudi-Sobhanzadeh, Safar Farajnia, Sushant Kumar
Abstract
Motivation
CRISPR-Cas9 genome-editing efficiency is strongly influenced by the sequence composition and positional context of single-guide RNAs (sgRNAs). Although numerous deep learning–based models have been developed to predict Cas9 efficiency from sgRNA sequences, most operate as black boxes, offering limited insight into the sequence determinants underlying Cas9 activity. In addition, previous studies often overlook how the positional context of sequence motifs within sgRNAs influences their effects on Cas9 binding or cleavage.
Results
We introduce DeepCC9, an interpretable machine learning framework that combines explicit sequence feature extraction with a residual block–based deep architecture to improve interpretability and identify composition- and position-based motifs governing Cas9 genome-editing efficiency. We applied this method to multiple Cas9 variant datasets, achieving superior predictive performance compared with existing methods while enabling direct interpretation of sequence motifs and their positional effects. Our analysis uncovered 74 sequence motifs enriched or depleted at specific positions within sgRNAs and strongly associated with Cas9 efficiency, providing mechanistic insight into sequence features that influence guide performance. Together, these results establish DeepCC9 as a generalizable and interpretable framework for modeling sequence–function relationships and advancing the understanding of the sequence determinants underlying CRISPR-Cas9 genome editing.
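To make the idea of position-aware sequence features concrete, the following is a minimal sketch of one common way to encode an sgRNA as position-specific k-mer indicators. It is not the DeepCC9 feature set: the function name `encode_positional_kmers`, the choice of k = 2, and the example guide are assumptions made purely for illustration, and the authors' actual features are defined in their repository.

```python
from itertools import product


def encode_positional_kmers(sgrna, k=2, alphabet="ACGT"):
    """Encode a guide as a flat 0/1 vector with one slot per (position, k-mer).

    Hypothetical helper for illustration only; not part of DeepCC9.
    """
    seq = sgrna.upper()
    unknown = set(seq) - set(alphabet)
    if unknown:
        raise ValueError(f"unexpected characters in guide: {sorted(unknown)}")
    if len(seq) < k:
        raise ValueError("guide is shorter than k")

    # Enumerate all k-mers in a fixed order so every guide shares one layout.
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    index = {kmer: j for j, kmer in enumerate(kmers)}

    n_positions = len(seq) - k + 1
    vector = [0] * (n_positions * len(kmers))
    for pos in range(n_positions):
        # Each position owns its own block of len(kmers) slots, so the same
        # k-mer at two different positions maps to two different features.
        vector[pos * len(kmers) + index[seq[pos:pos + k]]] = 1
    return vector


if __name__ == "__main__":
    features = encode_positional_kmers("GACGT", k=2)
    print(len(features), sum(features))
```

Because each position has its own block of slots, a downstream model can assign different weights to the same motif at different positions, which is the kind of positional effect the analysis above is concerned with.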
Availability and implementation
DeepCC9 is implemented in Python (version 3.X) and is available at https://github.com/MasoudiYosef/DeepCC9 and https://zenodo.org/records/20073890.
Supplementary information
Supplementary data are available at Bioinformatics online.