DOI: 10.1111/1755-0998.70176 ISSN: 1755-098X

EasyCen: A Lightweight Framework for Centromere Localisation and Repeat‐Organisation Profiling in Telomere‐to‐Telomere Genomes

Yunyun Lv, Yanping Li, Jia Li, Xidong Mu

ABSTRACT

Accurate identification of centromeres in telomere‐to‐telomere (T2T) genomes remains difficult because centromeric repeats evolve rapidly and lack conserved sequence features. In this study, we present EasyCen, a lightweight sequence‐based framework for centromere identification and repeat‐architecture profiling across diverse eukaryotes. Rather than relying on repeat annotation or homology, EasyCen recognises centromeres from recurrent positional features of repetitive DNA. In addition to centromere localisation, EasyCen incorporates a repeat‐pair profiling module for exploratory characterisation of internal repeat organisation. Benchmarking on Arabidopsis thaliana and Mus musculus showed high accuracy (typically >85% coordinate overlap with published annotations) and substantially reduced runtime. Cross‐species analysis revealed a ‘beads‐on‐a‐string’‐like repeat pattern in both mouse and human centromeres, associated with GC‐ and CpG‐rich subdomains. EasyCen performs effectively without pre‐existing repeat libraries, making it particularly useful for large, repeat‐rich or non‐model genomes. Our analyses further suggest that certain organisational features of centromeric repeats may recur across diverse eukaryotic lineages despite rapid sequence turnover.
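The abstract reports accuracy as "coordinate overlap with published annotations" but does not define the metric at this point. A minimal sketch, assuming overlap means the fraction of the published centromere interval covered by the predicted interval on the same chromosome, with the stricter Jaccard index shown as an alternative. The function names, the half‐open coordinate convention and the toy coordinates below are hypothetical illustrations, not EasyCen's code or benchmark data.

```python
from typing import Tuple

# Hypothetical convention: half-open [start, end) coordinates on one chromosome.
Interval = Tuple[int, int]


def overlap_bp(a: Interval, b: Interval) -> int:
    """Number of base pairs shared by two half-open intervals (0 if disjoint)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def reference_coverage(predicted: Interval, published: Interval) -> float:
    """Assumed metric: fraction of the published centromere covered by the prediction."""
    length = published[1] - published[0]
    if length <= 0:
        raise ValueError("published interval must have positive length")
    return overlap_bp(predicted, published) / length


def jaccard(predicted: Interval, published: Interval) -> float:
    """Intersection over union; unlike coverage, it also penalises over-long predictions."""
    inter = overlap_bp(predicted, published)
    union = (predicted[1] - predicted[0]) + (published[1] - published[0]) - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    # Toy coordinates, invented for illustration only.
    pred = (14_800_000, 17_200_000)
    pub = (15_000_000, 17_500_000)
    print(round(reference_coverage(pred, pub), 3))  # 0.88
    print(round(jaccard(pred, pub), 3))             # 0.815
```

Coverage alone would reward a prediction that spans the whole chromosome arm, which is why an interval‐union measure such as Jaccard is a common companion when judging whether a threshold like 85% is meaningful.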
