A more complete picture: capturing single nucleotide variant diversity in extended-spectrum beta-lactamase producing Escherichia coli using post-enrichment metagenomics
Sarah Gallichan, Tommi Mäklin, Esther Picton-Barlow, Claudia McKeown, Sally Forrest, Jukka Corander, Maria Moore, Nicholas A. Feasey, Eva Heinz, Fabrice E. Graf, Joseph M. Lewis
Inferring transmission relies on accurately distinguishing between isolates from the same source and those from different sources, and high-quality genomic data are frequently used to model transmission scenarios. The post-enrichment metagenome sequencing (pe-MGS) method uses a sequencing approach to analyse the diversity of a target pathogen enriched by pre-culturing and has been effectively used to analyse the transmission of nosocomial infections. However, implementation in large-scale clinical studies requires a direct comparison of single nucleotide variant (SNV) call accuracy, cost and feasibility between single-colony whole-genome sequence (sc-WGS) data and pe-MGS data for an antimicrobial-resistant bacterium of clinical importance, extended-spectrum beta-lactamase producing Escherichia coli (ESBL-EC). A spiked stool sample and rectal swabs from six study participants were pre-enriched in buffered peptone water and cultured on MacConkey agar with 1 mg l⁻¹ cefotaxime. From each plate, seven single colonies were picked and the remaining biomass of all colonies was collected; these were sequenced and analysed using the mSWEEP/mGEMS pipeline. We created a custom SNV calling workflow that allows heterozygous SNVs in a bacterial population and found that the choice of reference changed the number of measurable SNV distances between the sc-WGS and pe-MGS data. Using our custom workflow with a core-gene reference, we captured 99% of all the SNV calls from multiple sc-WGS data in the pe-MGS data of the same culture. The plate sweep method offers a feasible, cost-effective alternative to multiple single colony picks for describing within-host ESBL-EC diversity.
The workflow we developed allows effective SNV calling from pe-MGS data, yielding calls comparable to those from multiple sc-WGS data from the same sample.
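To make the comparison concrete, the proportion of sc-WGS SNV calls recovered in the pe-MGS data can be thought of as a recall over pooled colony calls. The following is a minimal illustrative sketch, not the authors' workflow: the function names (`pooled_snvs`, `snv_recall`), the tuple representation of an SNV, and the toy data are all assumptions introduced here. It assumes the pe-MGS calls are stored as a set of alleles per site, so that a heterozygous site in the plate sweep can match more than one colony allele.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical representation: an SNV is (contig, position, alternate base).
Snv = Tuple[str, int, str]
Site = Tuple[str, int]


def pooled_snvs(colonies: Iterable[Set[Snv]]) -> Set[Snv]:
    """Union of SNV calls across all single-colony (sc-WGS) samples."""
    pooled: Set[Snv] = set()
    for calls in colonies:
        pooled |= calls
    return pooled


def snv_recall(colonies: Iterable[Set[Snv]],
               sweep_alleles: Dict[Site, Set[str]]) -> float:
    """Fraction of pooled sc-WGS SNVs also present in the pe-MGS calls.

    sweep_alleles maps each site to the set of alternate alleles called
    in the plate sweep; more than one allele at a site models a
    heterozygous call in the mixed population.
    """
    expected = pooled_snvs(colonies)
    if not expected:
        return 1.0  # nothing to recover, so nothing was missed
    found = sum(
        1 for contig, pos, alt in expected
        if alt in sweep_alleles.get((contig, pos), set())
    )
    return found / len(expected)


# Toy data (invented for illustration only).
colonies = [
    {("geneA", 10, "T")},
    {("geneA", 10, "T"), ("geneB", 55, "G")},
    {("geneA", 10, "C")},
]
sweep = {("geneA", 10): {"T", "C"}}  # heterozygous site in the sweep

recall = snv_recall(colonies, sweep)  # 2 of 3 pooled SNVs recovered
```

In this toy case the two alleles at the `geneA` site are both recovered because the sweep call is allowed to be heterozygous, while the `geneB` SNV is missed; a caller that forced a single allele per site would recover at most one of the two `geneA` alleles.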