Abstract
Genetic distances between bacterial DNA sequences can be used to cluster populations into closely related subpopulations and as an additional source of information when detecting possible transmission events. Because bacterial genomes vary in gene content and order, reference-free methods offer more sensitive detection of genetic differences, especially among closely related samples found in outbreaks. However, across longer genetic distances, frequent recombination can make calculation and interpretation of these differences more challenging, requiring significant bioinformatic expertise and manual intervention during the analysis process. Here, we present a Population analysis PIPEline (PopPIPE) which combines rapid reference-free genome analysis methods to analyse bacterial genomes across these two scales, splitting whole populations into subclusters and detecting plausible transmission events within closely related clusters. We use k-mer sketching to split populations into strains, followed by split k-mer analysis and recombination removal to create alignments and subclusters within these strains. We first show that this approach creates high-quality subclusters on a population-wide dataset of Streptococcus pneumoniae. When applied to nosocomial vancomycin-resistant Enterococcus faecium samples, PopPIPE finds transmission clusters that are more epidemiologically plausible than those from core genome or multilocus sequence typing (MLST) approaches. Our pipeline is rapid and reproducible, creates interactive visualizations and can easily be reconfigured and re-run on new datasets. Therefore, PopPIPE provides a user-friendly pipeline for analyses spanning species-wide clustering to outbreak investigations.
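As a rough illustration of the k-mer sketching idea mentioned in the abstract, the following minimal sketch estimates the Jaccard similarity between two sequences from the smallest hashes of their k-mers. It is not PopPIPE's code: the function names, the choice of hash and the default `k` and `size` values are assumptions made for this example, and production tools use far more optimised sketches and handle reverse complements.

```python
import hashlib


def kmers(seq, k):
    """Return the set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_hash(kmer):
    """Deterministic 64-bit hash of a k-mer (hash choice is an assumption)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def sketch(seq, k=5, size=64):
    """Bottom-s sketch: the `size` smallest distinct k-mer hashes, sorted."""
    return sorted(kmer_hash(x) for x in kmers(seq, k))[:size]


def jaccard_estimate(a, b, size=64):
    """Estimate Jaccard similarity from two bottom-s sketches.

    Take the `size` smallest hashes of the merged sketches and count
    the fraction that occur in both inputs.
    """
    set_a, set_b = set(a), set(b)
    merged = sorted(set_a | set_b)[:size]
    if not merged:
        return 0.0
    shared = sum(1 for h in merged if h in set_a and h in set_b)
    return shared / len(merged)


if __name__ == "__main__":
    # Toy sequences, not real genomes.
    s1 = sketch("ACGTACGTGG", k=4)
    s2 = sketch("ACGTACGTCC", k=4)
    print(jaccard_estimate(s1, s2))
```

When a sequence has fewer distinct k-mers than `size`, the estimate equals the exact Jaccard similarity of the k-mer sets. For genome-scale input the sketch is a fixed-size sample, which is what makes pairwise comparison across a whole population fast.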
Original language | English |
---|---|
Article number | 001404 |
Pages (from-to) | 1-9 |
Number of pages | 9 |
Journal | Microbial Genomics |
Volume | 11 |
Issue number | 4 |
Publication status | Published - 28 Apr 2025 |
Keywords
- Humans
- Streptococcus pneumoniae/genetics
- Genome, Bacterial
- Genomics/methods
- Multilocus sequence typing
- Cluster analysis
- Molecular epidemiology/methods
- Enterococcus faecium/genetics
- Computational biology/methods
- Phylogeny
- Cross infection/microbiology
Fingerprint
Dive into the research topics of 'Integrated population clustering and genomic epidemiology with PopPIPE'. Together they form a unique fingerprint.
Datasets
- PopPIPE on vancomycin resistant Enterococcus faecium
  Lees, J. (Creator) & McHugh, M. (Creator), Figshare, 2025
  DOI: 10.6084/m9.figshare.28495571.v1
  Dataset
- VREfm Nosocomial Outbreak WGS Investigation
  McHugh, M. P. (Creator), Gillespie, S. H. (Creator) & Holden, M. (Creator), NCBI GenBank, 2025
  https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA997588
  Dataset