Integrated population clustering and genomic epidemiology with PopPIPE

Martin P McHugh, Samuel T Horsfield, Johanna von Wachsmann, Jacqueline Toussaint, Kerry A Pettigrew, Elzbieta Czarniak, Thomas J Evans, Alistair Leanord, Luke Tysall, Stephen H Gillespie, Kate E Templeton, Matthew T G Holden, Nicholas J Croucher, John A Lees*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Genetic distances between bacterial DNA sequences can be used to cluster populations into closely related subpopulations and as an additional source of information when detecting possible transmission events. Because bacterial genomes vary in gene content and order, reference-free methods offer more sensitive detection of genetic differences, especially among closely related samples found in outbreaks. However, across longer genetic distances, frequent recombination can make calculation and interpretation of these differences more challenging, requiring significant bioinformatic expertise and manual intervention during the analysis process. Here, we present a Population analysis PIPEline (PopPIPE) which combines rapid reference-free genome analysis methods to analyse bacterial genomes across these two scales, splitting whole populations into subclusters and detecting plausible transmission events within closely related clusters. We use k-mer sketching to split populations into strains, followed by split k-mer analysis and recombination removal to create alignments and subclusters within these strains. We first show that this approach creates high-quality subclusters on a population-wide dataset of Streptococcus pneumoniae. When applied to nosocomial vancomycin-resistant Enterococcus faecium samples, PopPIPE finds transmission clusters that are more epidemiologically plausible than core genome or multilocus sequence typing (MLST) approaches. Our pipeline is rapid and reproducible, creates interactive visualizations and can easily be reconfigured and re-run on new datasets. Therefore, PopPIPE provides a user-friendly pipeline for analyses spanning species-wide clustering to outbreak investigations.
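The first step named in the abstract, using k-mer sketches to split a population into strains, can be illustrated with a toy example. The sketch below is not PopPIPE's implementation and does not call any of its components. It assumes a simple bottom-s MinHash sketch, a Jaccard similarity estimate and single-linkage grouping. The function names, k-mer length, sketch size and threshold are all hypothetical choices made for illustration.

```python
import hashlib
from itertools import combinations


def kmers(seq, k):
    """Return the set of all k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def sketch(seq, k=5, size=16):
    """Bottom-`size` MinHash sketch: the `size` smallest k-mer hash values."""
    hashes = {int(hashlib.sha1(km.encode()).hexdigest(), 16) for km in kmers(seq, k)}
    return sorted(hashes)[:size]


def jaccard_estimate(a, b, size=16):
    """Estimate Jaccard similarity from two bottom-`size` sketches."""
    union = sorted(set(a) | set(b))[:size]
    if not union:
        return 0.0
    shared = set(a) & set(b)
    return sum(1 for h in union if h in shared) / len(union)


def cluster(names, seqs, threshold=0.5, k=5, size=16):
    """Single-linkage grouping: join samples whose estimated similarity
    is at least `threshold` (an illustrative cut-off, not a recommended value)."""
    sketches = {n: sketch(s, k, size) for n, s in zip(names, seqs)}
    parent = {n: n for n in names}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(names, 2):
        if jaccard_estimate(sketches[a], sketches[b], size) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    # Made-up sequences: two identical samples and one unrelated sample.
    seq_a = "ACGTTGCAATCGATCGGATC"
    seq_c = "TTTTTTTTTTTTTTTTTTTT"
    print(cluster(["a", "b", "c"], [seq_a, seq_a, seq_c]))
```

Real genomes would need far longer k-mers and larger sketches than this toy example uses. The later steps the abstract describes (split k-mer analysis and recombination removal) are not covered here.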
Original language: English
Article number: 001404
Pages (from-to): 1-9
Number of pages: 9
Journal: Microbial Genomics
Volume: 11
Issue number: 4
Publication status: Published - 28 Apr 2025

Keywords

  • Humans
  • Streptococcus pneumoniae/genetics
  • Genome, Bacterial
  • Genomics/methods
  • Multilocus sequence typing
  • Cluster analysis
  • Molecular epidemiology/methods
  • Enterococcus faecium/genetics
  • Computational biology/methods
  • Phylogeny
  • Cross infection/microbiology
