DOI: 10.1126/sciadv.aeb3363 ISSN: 2375-2548
Annotating the pangenome reveals the diversity in the genetic basis for metabolic enzymes
Omid Ardalani, Patrick V. Phaneuf, Kalpathy J. Krishnan, David Pride, Lars K. Nielsen, Bernhard O. Palsson
Affordable sequencing has flooded public databases with bacterial genomes, yet species-scale maps that connect gene content variation to the metabolic functions essential to biotechnology and systems biology remain scarce. We address this gap by building a pangenome-wide gene-protein-reaction association and applying it to 2377 Escherichia coli genomes to reconstruct a pangenome-scale metabolic model (panGEM). We validate panGEM against Biolog carbon source utilization assays, achieving ≈0.99 precision in growth/no-growth predictions. Using panGEM, we identify >11,000 rare metabolic genes, yet only 35 metabolic reactions are rare. To explain this mismatch, we examined the rare genes and found that most are pseudogenes or diverged orthologs acquired by horizontal gene transfer (HGT). The results indicate a recurrent loss-reacquisition cycle in which a core allele is lost or pseudogenized and its function is restored by HGT. This cycle preserves function without expanding the reactome and generates genetic heterogeneity in a small subset (~3.6%) of reactions, which mark selection pressure hotspots of metabolism. Thus, pangenome annotation reveals the evolutionary dynamics that shape the genetic basis of metabolism.
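The gap between many rare genes and few rare reactions can be illustrated with a minimal sketch. It is not the authors' pipeline: the strains, gene IDs, reaction IDs, and the rarity threshold `RARE` are all hypothetical, and the gene-protein-reaction (GPR) rules are simplified to a list of alternative gene sets per reaction. When a common allele (`gA`) is replaced in some strains by diverged orthologs (`gA2`, `gA3`) that catalyze the same reaction, each ortholog is rare while the reaction itself remains universal.

```python
from collections import Counter

# Hypothetical toy pangenome: strain -> set of gene IDs it carries.
strains = {
    "s1": {"gA", "gC"},
    "s2": {"gA", "gC"},
    "s3": {"gA", "gC"},
    "s4": {"gA2", "gC"},   # gA lost, function restored by ortholog gA2
    "s5": {"gA3", "gC"},   # gA lost, function restored by ortholog gA3
    "s6": {"gA", "gC", "gD"},
}

# Simplified GPR rules: reaction -> alternative gene sets (OR of ANDs).
gpr = {
    "R1": [{"gA"}, {"gA2"}, {"gA3"}],
    "R2": [{"gC"}],
    "R3": [{"gD"}],
}

RARE = 0.2  # assumed rarity threshold (fraction of strains), for illustration only


def reaction_present(genes, alternatives):
    """A reaction is present if any alternative gene set is fully carried."""
    return any(alt <= genes for alt in alternatives)


def frequencies(strains, gpr):
    """Return per-gene and per-reaction frequencies across all strains."""
    n = len(strains)
    gene_counts = Counter(g for genes in strains.values() for g in genes)
    gene_freq = {g: c / n for g, c in gene_counts.items()}
    rxn_freq = {
        r: sum(reaction_present(genes, alts) for genes in strains.values()) / n
        for r, alts in gpr.items()
    }
    return gene_freq, rxn_freq


gene_freq, rxn_freq = frequencies(strains, gpr)
rare_genes = sorted(g for g, f in gene_freq.items() if f < RARE)
rare_rxns = sorted(r for r, f in rxn_freq.items() if f < RARE)

print("rare genes:", rare_genes)
print("rare reactions:", rare_rxns)
```

In this toy data three genes fall below the threshold but only one reaction does, because `R1` stays present in every strain through one allele or another.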