DOI: 10.1126/sciadv.aeb3363 ISSN: 2375-2548

Annotating the pangenome reveals the diversity in the genetic basis for metabolic enzymes

Omid Ardalani, Patrick V. Phaneuf, Kalpathy J. Krishnan, David Pride, Lars K. Nielsen, Bernhard O. Palsson

Affordable sequencing has flooded public databases with bacterial genomes, yet species-scale maps that connect variation in gene content to the metabolic functions essential to biotechnology and systems biology remain scarce. We address this gap by building a pangenome-wide gene-protein-reaction association and applying it to 2377 Escherichia coli genomes to reconstruct a pangenome-scale metabolic model (panGEM). We validate panGEM against Biolog carbon source utilization assays, achieving ≈0.99 precision in growth/no-growth predictions. Using panGEM, we identify >11,000 rare metabolic genes, yet only 35 metabolic reactions are rare. To explain this mismatch, we examined the rare genes and found that most are pseudogenes or diverged orthologs acquired by horizontal gene transfer (HGT). The results indicate a recurrent loss-reacquisition cycle in which a core allele is lost or pseudogenized and its function is restored by HGT. This cycle preserves function without expanding the reactome and generates genetic heterogeneity in a small subset (~3.6%) of reactions, which mark selection pressure hotspots in metabolism. Thus, pangenome annotation reveals the evolutionary dynamics that shape the genetic basis of metabolism.
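The mismatch between many rare genes and few rare reactions can be illustrated with a toy calculation. The sketch below is not the authors' pipeline. It assumes a simplified gene-protein-reaction mapping in which every gene listed for a reaction is an independent alternative (OR logic only), a hypothetical rarity cutoff of 15% of genomes, and invented gene, reaction, and genome names. It shows how a rare, horizontally acquired ortholog that substitutes for a lost core allele adds to the rare-gene count while the reaction it catalyzes stays common.

```python
from typing import Dict, List, Set

# Hypothetical cutoff: an item is "rare" if found in fewer than 15% of genomes.
RARE_THRESHOLD = 0.15

# Toy pangenome of 10 genomes (names are illustrative only).
genomes: List[str] = [f"g{i}" for i in range(10)]

# Gene presence: which genomes carry each gene.
gene_presence: Dict[str, Set[str]] = {
    "coreA": set(genomes[:9]),  # core allele, lost in g9
    "xenoA": {"g9"},            # diverged ortholog restoring the function in g9
    "rareB": {"g0"},            # rare gene with no alternative
}

# Simplified GPR: reaction -> genes, any one of which suffices (OR logic).
gpr: Dict[str, List[str]] = {
    "R1": ["coreA", "xenoA"],
    "R2": ["rareB"],
}


def reaction_presence(
    gene_presence: Dict[str, Set[str]], gpr: Dict[str, List[str]]
) -> Dict[str, Set[str]]:
    """A reaction is present in a genome if any of its genes is present."""
    presence: Dict[str, Set[str]] = {}
    for rxn, genes in gpr.items():
        carriers: Set[str] = set()
        for gene in genes:
            carriers |= gene_presence.get(gene, set())
        presence[rxn] = carriers
    return presence


def rare_items(presence: Dict[str, Set[str]], n_genomes: int) -> Set[str]:
    """Return items whose frequency across genomes is below the cutoff."""
    return {
        name
        for name, carriers in presence.items()
        if len(carriers) / n_genomes < RARE_THRESHOLD
    }


rxn_presence = reaction_presence(gene_presence, gpr)
rare_genes = rare_items(gene_presence, len(genomes))
rare_reactions = rare_items(rxn_presence, len(genomes))

print("rare genes:", sorted(rare_genes))
print("rare reactions:", sorted(rare_reactions))
```

In this toy case two genes are rare but only one reaction is, because the rare ortholog xenoA keeps reaction R1 present in every genome. Real gene-protein-reaction rules also contain AND relations for enzyme complexes, which this sketch deliberately omits.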
