DOI: 10.1002/tpg2.70266 ISSN: 1940-3372

Haplotype‑resolved comparison of transcription factor superfamilies between wild and cultivated autotetraploid green jujube and prioritization of candidate transcription factors via machine learning

Xudong Zhu, Pengyan Chang, Yanzhen Liao, Huini Wu, Fan Jiang

Abstract

A framework beyond single-reference genomes is needed to understand transcription factor evolution. This study employed an integrated framework combining haplotype-resolved genomes, a transcriptome atlas, and machine learning to characterize the transcription factors of autotetraploid green jujube (Ziziphus mauritiana). It presents the first haplotype-resolved comparison of transcription factor superfamilies across eight haplotype genomes (HapGenome) representing one wild and one cultivated green jujube accession, encompassing 42 superfamilies and 12,123 gene copies. Evolutionary analyses revealed high structural conservation, with minimal copy number variation and gene presence/absence variation, together with strong purifying selection (Ka/Ks < 1). Dispersed duplication (47.24%), not whole-genome duplication (36.10%), was the most frequently observed duplication event in the expansion of transcription factor superfamilies. A haplotype-resolved transcriptome atlas demonstrated that tissue-specific expression divergence occurred both at the superfamily level and between core and dispensable genes. Integrating transcriptomic and metabolomic data, support vector machine classification with leave-one-out cross-validation distinguished three wild fruits from six cultivated fruits with an accuracy of 89% using the expression profiles of orthologous gene groups (OGGs). eXtreme gradient boosting was employed as an exploratory tool to prioritize OGGs related to metabolite changes. Finally, OGG-95, a Lesion Simulating Disease Zn finger transcription factor, was identified as a candidate: it was significantly upregulated in cultivated fruits, and its expression was significantly correlated with differential accumulation of nucleotides and organic acids, a relationship that requires further functional validation. This integrative study provides novel insights into the genomic architecture and regulatory evolution of transcription factors in a polyploid fruit crop, highlighting the power of multi-dimensional analyses for gene discovery.
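To make the leave-one-out step concrete, the following is a minimal sketch of how such a cross-validation loop works on nine samples (three wild, six cultivated). It is not the authors' pipeline: the expression values are invented for illustration, the two features stand in for OGG expression profiles, and a simple nearest-centroid classifier is swapped in for the support vector machine so that the example needs only the Python standard library. With nine held-out predictions, the only possible accuracies are multiples of 1/9, so an accuracy of 89% would be consistent with eight of nine samples classified correctly.

```python
from statistics import mean


def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    return [mean(col) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroid_predict(train_X, train_y, x):
    """Predict the label whose class centroid lies closest to x.

    Stand-in classifier (assumption): the abstract reports an SVM.
    """
    centroids = {}
    for label in sorted(set(train_y)):
        rows = [r for r, lab in zip(train_X, train_y) if lab == label]
        centroids[label] = centroid(rows)
    return min(centroids, key=lambda lab: sq_dist(centroids[lab], x))


def leave_one_out_accuracy(X, y):
    """Hold out each sample once, train on the rest, count correct calls."""
    n_correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if nearest_centroid_predict(train_X, train_y, X[i]) == y[i]:
            n_correct += 1
    return n_correct, len(X)


# Invented expression values for two hypothetical OGGs (not real data).
X = [
    [1.0, 5.0], [1.2, 4.8], [0.9, 5.1],              # wild fruits
    [3.0, 2.0], [3.2, 1.9], [2.9, 2.2],              # cultivated fruits
    [3.1, 2.1], [2.8, 1.8],
    [1.1, 4.9],  # cultivated sample deliberately placed near the wild group
]
y = ["wild"] * 3 + ["cultivated"] * 6

correct, total = leave_one_out_accuracy(X, y)
print(f"{correct}/{total} correct = {100 * correct / total:.0f}% accuracy")
```

Because every sample is held out exactly once, leave-one-out validation uses all nine fruits for testing without ever training on the sample being predicted, which is a common choice when the sample size is this small.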
