Abstract
Colorectal cancer (CRC) is the second leading cause of cancer death worldwide, and incidence is predicted to rise sharply in the next decade. Precision oncology requires the identification of clinically actionable biomarkers such as MMR/MSI status and KRAS, NRAS and BRAF mutations; however, immunohistochemistry and sequencing are slow and costly. This thesis shows that machine learning algorithms can rapidly and inexpensively predict these biomarkers directly from routine haematoxylin and eosin whole slide images (WSI), potentially accelerating triage for confirmatory testing and shortening the route to targeted therapy.
Baseline tissue segmentation and slide-level tumour detection are established using fully supervised convolutional networks. A data-constrained experiment then evaluates three foundation models (CTransPath, Phikon and UNI) on 423 CRC slides, confirming that self-supervised representations support reliable biomarker prediction when data are scarce. Next, the thesis introduces SurGen, a public dataset of 1020 slides from 843 patients with genetic and survival annotations, released with fully reproducible code exemplars, lowering the barrier to entry for future researchers.
An extensive generalisation study trains transformer aggregators on embeddings from UNI, UNI2-h and Virchow2 across five cohorts: TransSCOT, SurGen, TCGA-CRC, CPTAC-COAD and ORION. Models achieve state-of-the-art AUROCs of 0.81778 for KRAS, 0.72108 for NRAS, 0.92828 for BRAF and 0.9559 for MSI within individual cohorts, and retain high accuracy when applied to unseen datasets. A two-stage hyper-parameter search reduces computation by a factor of twenty without sacrificing performance.
The thesis supplies the first openly accessible large-scale CRC WSI resource with genomics and outcomes, rigorous baselines for both data-rich and data-scarce settings, and the most extensive evaluation to date of foundation model transfer in colorectal pathology. Combining open data with extensive experimentation and thorough benchmarking, this thesis advances WSI biomarker detection towards clinical triage workflows, potentially expediting access to personalised care for colorectal cancer patients.
| Date of Award | 2 Jul 2026 |
|---|---|
| Original language | English |
| Awarding Institution | |
| Supervisor | David Harris-Birtill (Supervisor) & David Harrison (Supervisor) |
Keywords
- Artificial intelligence
- Machine learning
- Digital pathology
- Computational pathology
- Computer vision
- Colorectal cancer
- Biomarker prediction
- Whole slide imaging
Access Status
- Full text embargoed until 12 Mar 2027