DOI: 10.3390/ijms27125593 ISSN: 1422-0067

RNA-Binding Protein Occupancy Composition Predicts Long Noncoding RNA Subcellular Localization

Hidenori Tani

The subcellular localization of long noncoding RNAs (lncRNAs) is central to their function, yet the molecular factors that govern it remain incompletely defined, and most existing predictors rely on the primary sequence. Because RNA-binding proteins (RBPs) are the proximal effectors of RNA compartmentalization, this study tested whether the composition of RBPs bound to a lncRNA predicts its nuclear or cytoplasmic localization. Enhanced crosslinking and immunoprecipitation (eCLIP) occupancy for 139 RBPs in K562 cells was integrated with cytoplasmic–nuclear relative concentration indices (CN-RCIs) derived from matched subcellular fractionation, and localization was modeled under chromosome-grouped cross-validation with nested regularization. RBP-occupancy composition predicted localization beyond transcript size and total binding amount (incremental cross-validated coefficient of determination, ΔR² = 0.17; area under the receiver operating characteristic curve, AUC = 0.73, a moderate-strength association; Freedman–Lane permutation, p = 0.005). The increment persisted (ΔR² = 0.12; p = 0.005) against an expanded baseline that additionally included transcript abundance, intron content, and exon number, indicating predictive information not reducible to these transcript features, and the classifier was well calibrated (Brier score = 0.10; expected calibration error = 0.02). The signed coefficient profile separated RBPs systematically by function: factors acting in nuclear processes (splicing, 3′-end processing, and nuclear-matrix association) carried negative (nuclear-direction) weights, whereas factors acting in cytoplasmic processes (translation and messenger RNA stability) carried positive (cytoplasmic-direction) weights (Mann–Whitney p = 0.013).
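To make the incremental ΔR² under chromosome-grouped cross-validation concrete, the following is a minimal sketch, not the author's pipeline. It assumes ridge regression as the regularized model (the abstract says only "nested regularization"), leave-one-chromosome-out outer folds, an inner grouped loop that picks the penalty on training chromosomes only, and synthetic data. All function names, the penalty grid, and the feature stand-ins (two baseline columns for size and total binding, twenty occupancy columns) are hypothetical.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge on standardized features; the intercept is not penalized."""
    mu, sd = X.mean(axis=0), X.std(axis=0)
    sd[sd == 0] = 1.0
    Z = (X - mu) / sd
    y_mean = y.mean()
    beta = np.linalg.solve(Z.T @ Z + lam * np.eye(Z.shape[1]), Z.T @ (y - y_mean))
    return mu, sd, y_mean, beta

def ridge_predict(model, X):
    mu, sd, y_mean, beta = model
    return y_mean + ((X - mu) / sd) @ beta

def grouped_folds(groups):
    """Leave-one-group-out splits: every chromosome is held out exactly once."""
    for g in np.unique(groups):
        test = groups == g
        yield np.where(~test)[0], np.where(test)[0]

def nested_cv_predictions(X, y, groups, lambdas):
    """Out-of-fold predictions; the penalty is tuned by an inner grouped CV
    that only ever sees the outer training chromosomes."""
    pred = np.empty(len(y), dtype=float)
    for train, test in grouped_folds(groups):
        Xtr, ytr, gtr = X[train], y[train], groups[train]
        inner_sse = []
        for lam in lambdas:
            sse = 0.0
            for itr, iva in grouped_folds(gtr):
                m = ridge_fit(Xtr[itr], ytr[itr], lam)
                sse += np.sum((ytr[iva] - ridge_predict(m, Xtr[iva])) ** 2)
            inner_sse.append(sse)
        best = lambdas[int(np.argmin(inner_sse))]
        pred[test] = ridge_predict(ridge_fit(Xtr, ytr, best), X[test])
    return pred

def cv_r2(y, pred):
    """Pooled cross-validated R^2 over all out-of-fold predictions."""
    return 1.0 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)

# Synthetic demo: every value below is invented for illustration.
rng = np.random.default_rng(0)
n = 600
groups = np.repeat(np.arange(1, 7), n // 6)   # six hypothetical "chromosomes"
X_base = rng.normal(size=(n, 2))              # stand-ins for size and total binding
X_rbp = rng.normal(size=(n, 20))              # stand-ins for per-RBP occupancy
y = 0.5 * X_base[:, 0] + X_rbp[:, :5] @ np.full(5, 0.6) + rng.normal(size=n)

lambdas = [0.1, 1.0, 10.0, 100.0]
r2_base = cv_r2(y, nested_cv_predictions(X_base, y, groups, lambdas))
r2_full = cv_r2(y, nested_cv_predictions(np.hstack([X_base, X_rbp]), y, groups, lambdas))
delta_r2 = r2_full - r2_base                  # the "incremental" R^2
print(f"baseline R2={r2_base:.2f}  full R2={r2_full:.2f}  delta={delta_r2:.2f}")
```

Grouping folds by chromosome keeps transcripts from the same genomic neighborhood out of both training and test sets at once, and tuning the penalty inside each outer fold keeps the held-out chromosome from influencing model selection.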
The profile generalized across cell lines: a K562-trained model predicted HepG2 localization (transfer AUC = 0.71 using the 76 RBPs profiled in both lines), and HepG2 data reproduced the association independently (AUC = 0.77). The association is correlational and of moderate strength; it is offered as an interpretable, RBP-occupancy-based complement to sequence-based predictors of lncRNA localization.
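The cross-cell-line transfer reduces to a simple recipe: keep only the RBPs profiled in both cell lines, fit on the source line, and score the target line with a rank-based AUC. The sketch below illustrates this under stated assumptions. It uses ridge regression on a continuous CN-RCI-like target, standardizes the target-line features with source-line statistics, and treats a value above 0 as "cytoplasmic". That threshold, the function names, and the synthetic data are all hypothetical, not taken from the study.

```python
import numpy as np

def auc_rank(labels, scores):
    """ROC AUC via the Mann-Whitney U statistic, with average ranks for ties."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(scores, kind="mergesort")
    sorted_scores = scores[order]
    ranks = np.empty(len(scores))
    i = 0
    while i < len(scores):
        j = i
        while j + 1 < len(scores) and sorted_scores[j + 1] == sorted_scores[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0  # mean 1-based rank of the tie block
        i = j + 1
    n_pos, n_neg = labels.sum(), (~labels).sum()
    return (ranks[labels].sum() - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)

def ridge_fit_predict(X_train, y_train, X_test, lam=1.0):
    """Closed-form ridge; test features are scaled with training-line statistics."""
    mu, sd = X_train.mean(axis=0), X_train.std(axis=0)
    sd[sd == 0] = 1.0
    Z = (X_train - mu) / sd
    y_mean = y_train.mean()
    beta = np.linalg.solve(Z.T @ Z + lam * np.eye(Z.shape[1]), Z.T @ (y_train - y_mean))
    return y_mean + ((X_test - mu) / sd) @ beta

def transfer_auc(src_X, src_y, src_rbps, tgt_X, tgt_y, tgt_rbps, lam=1.0):
    """Train on the source line and score the target line on shared RBPs only."""
    shared = sorted(set(src_rbps) & set(tgt_rbps))
    si = [src_rbps.index(r) for r in shared]   # column order aligned by RBP name
    ti = [tgt_rbps.index(r) for r in shared]
    scores = ridge_fit_predict(src_X[:, si], src_y, tgt_X[:, ti], lam)
    labels = tgt_y > 0  # assumption: CN-RCI > 0 is called "cytoplasmic"
    return auc_rank(labels, scores), len(shared)

# Synthetic demo: RBP names, effects, and sample sizes are invented.
rng = np.random.default_rng(1)
coef = {f"RBP{k}": (1.0 if 10 <= k < 16 else 0.0) for k in range(40)}
src_rbps = [f"RBP{k}" for k in range(30)]           # "source" panel
tgt_rbps = [f"RBP{k}" for k in range(39, 9, -1)]    # overlapping "target" panel

def simulate(names, n):
    X = rng.normal(size=(n, len(names)))
    y = X @ np.array([coef[r] for r in names]) + rng.normal(size=n)
    return X, y

src_X, src_y = simulate(src_rbps, 400)
tgt_X, tgt_y = simulate(tgt_rbps, 300)
auc, n_shared = transfer_auc(src_X, src_y, src_rbps, tgt_X, tgt_y, tgt_rbps)
print(f"transfer AUC={auc:.2f} on {n_shared} shared RBPs")
```

Matching columns by RBP name instead of by position matters because the two eCLIP panels differ in membership and order. Because AUC depends only on the ranking of scores, any scale shift in occupancy between cell lines affects it only through the learned weights, not through the metric itself.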
