Abstract
Background: Cancer remains one of the leading causes of morbidity and mortality worldwide. Comprehensive datasets that combine histopathological images with genetic and survival data across various tumour sites are essential for advancing computational pathology and personalised medicine.
Results: We present SurGen, a dataset comprising 1,020 H&E-stained whole-slide images (WSIs) from 843 colorectal cancer cases. The dataset includes detailed annotations for key genetic mutations (KRAS, NRAS, BRAF) and mismatch repair status, as well as survival data for 426 cases. We illustrate SurGen’s utility with a proof-of-concept model that predicts mismatch repair status directly from WSIs, achieving an area under the receiver operating characteristic curve (AUROC) of 0.8273 on the test set. These preliminary results underscore the dataset’s potential to facilitate research in biomarker discovery, prognostic modelling, and advanced machine learning applications in colorectal cancer and beyond.
Conclusions: SurGen offers a valuable resource for the scientific community, enabling studies that require high-quality WSIs linked with comprehensive clinical and genetic information on colorectal cancer. Our initial findings affirm the dataset’s capacity to advance diagnostic precision and foster the development of personalised treatment strategies in colorectal oncology.
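The test AUROC of 0.8273 quoted in the abstract can be read as the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative case, with ties counting as half. The sketch below computes the metric from that pairwise (Mann-Whitney) definition. It is a generic illustration, not the authors' evaluation code. It assumes binary labels, treats label `1` as the positive class (hypothetically, MMR-deficient), and uses made-up toy scores rather than SurGen data.

```python
from itertools import product


def auroc(labels, scores):
    """Area under the ROC curve via the pairwise Mann-Whitney formulation.

    labels: 1 for the positive class (assumed here to be MMR-deficient), 0 otherwise.
    scores: model outputs where a higher value means "more likely positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    # Count positive/negative pairs that are ranked correctly; a tie counts as half.
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy slide-level scores (not SurGen data).
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.1, 0.8, 0.5]
print(round(auroc(labels, scores), 4))  # prints 0.8889
```

The quadratic pairwise loop keeps the definition visible. For larger test sets, a rank-based implementation or scikit-learn's `roc_auc_score` should give the same value more efficiently.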
| Original language | English |
|---|---|
| Article number | giaf086 |
| Pages (from-to) | 1-16 |
| Number of pages | 16 |
| Journal | GigaScience |
| Volume | 14 |
| Early online date | 8 Oct 2025 |
| DOIs | |
| Publication status | Published - 2025 |
Keywords
- Whole-slide image (WSI)
- Hematoxylin and eosin (H&E) stain
- Mismatch repair (MMR)
- Microsatellite instability (MSI)
- KRAS mutation
- NRAS mutation
- BRAF mutation
- Colorectal cancer
- Digital pathology
- Dataset
Projects
- ICAIRD: I-CAIRD: Industrial Centre for AI Research in Digital Diagnostics. Harris-Birtill, D. (PI) & Arandelovic, O. (CoI), 1/02/19 → 31/01/23. Project: Standard
- ICAIRD: I-CAIRD: Industrial Centre for AI Research in Digital Diagnostics. Harrison, D. (PI), 1/02/19 → 31/01/22. Project: Standard
Datasets
- Supporting data for "SurGen: 1020 H&E-stained Whole Slide Images With Survival and Genetic Markers". Myles, C. (Creator), Um, I. (Creator), Marshall, C. (Creator), Harris-Birtill, D. (Creator) & Harrison, D. (Creator), GigaDB, 2025. DOI: 10.5524/102725. Dataset
- SurGen: 1020 H&E-stained Whole Slide Images With Survival and Genetic Markers. Myles, C. G. G. (Creator), Um, I. H. (Creator), Marshall, C. (Creator), Harris-Birtill, D. C. C. (Creator) & Harrison, D. J. (Creator), EMBL-EBI, 24 Jul 2024. DOI: 10.6019/S-BIAD1285, https://www.ebi.ac.uk/biostudies/bioimages/studies/S-BIAD1285. Dataset
Student theses
- Machine learning whole slide image analysis for the prediction of colorectal cancer biomarkers. Myles, C. (Author), Harris-Birtill, D. (Supervisor) & Harrison, D. (Supervisor), 2 Jul 2026. Student thesis: Doctoral Thesis (PhD)