Machine learning and external validation of the IDENTIFY risk calculator for patients with haematuria referred to secondary care for suspected urinary tract cancer

Sinan Khadhouri*, Artsiom Hramyka, Kevin Gallagher, Alexander Light, Simona Ippoliti, Marie Edison, Cameron Alexander, Meghana Kulkarni, Eleanor Zimmermann, Arjun Nathan, Luca Orecchia, Ravi Banthia, Pietro Piazza, David Mak, Nikolaos Pyrgidis, Prabhat Narayan, Pablo Abad Lopez, Faisal Nawaz, Trung-Thanh Tran, Francesco Claps, Donnacha Hogan, Juan Gomez Rivas, Santiago Alonso, Ijeoma Chibuzo, Beatriz Gutierrez Hidalgo, Jessica Whitburn, Jeremy Teoh, Gautier Marcq, Alexandra Szostek, Jasper Bondad, Petros Sountoulides, Tom Kelsey, Veeru Kasivisvanathan

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review


Background
The IDENTIFY study developed a model to predict urinary tract cancer using patient characteristics from a large, multicentre, international cohort of patients referred with haematuria. In addition to calculating an individual’s cancer risk, it proposes thresholds to stratify patients into very-low-risk (<1%), low-risk (1–<5%), intermediate-risk (5–<20%), and high-risk (≥20%) groups.
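
The four risk groups can be expressed as a simple threshold lookup. The following is a minimal sketch, not the IDENTIFY calculator itself: it assumes a predicted cancer probability between 0 and 1 is already available, and the function name `risk_group` is hypothetical.

```python
def risk_group(risk: float) -> str:
    """Map a predicted cancer probability (0 to 1) to the risk groups
    quoted in the abstract. `risk_group` is a hypothetical helper name."""
    if not 0.0 <= risk <= 1.0:
        raise ValueError("risk must be a probability between 0 and 1")
    if risk < 0.01:   # <1%
        return "very-low-risk"
    if risk < 0.05:   # 1% to <5%
        return "low-risk"
    if risk < 0.20:   # 5% to <20%
        return "intermediate-risk"
    return "high-risk"  # >=20%
```

Each lower bound is inclusive and each upper bound exclusive, so a predicted risk of exactly 5% falls in the intermediate-risk group.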

Objective
To externally validate the IDENTIFY haematuria risk calculator and compare traditional regression with machine learning algorithms.

Design, setting, and participants
Prospective data were collected on patients referred to secondary care with new haematuria. Data were collected for patient variables included in the IDENTIFY risk calculator, cancer outcome, and TNM staging. Machine learning methods were used to evaluate whether better models than those developed with traditional regression methods existed.

Outcome measurements and statistical analysis
The area under the receiver operating characteristic curve (AUC) for the detection of urinary tract cancer, calibration coefficient, calibration in the large (CITL), and Brier score were determined.
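
For readers unfamiliar with these measures, the sketch below shows one common way to compute them from observed outcomes and predicted risks. It is an illustration under stated assumptions, not the study's analysis code: the abstract does not say which software was used, the function names are hypothetical, the toy data are invented for demonstration only, and "calibration coefficient" is read here as the calibration slope reported in the results.

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

def auc(y, p):
    """Probability that a random case is scored above a random non-case
    (ties count as half)."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum((a > b) + 0.5 * (a == b) for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

def brier(y, p):
    """Mean squared difference between predicted risk and observed outcome."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)

def fit_logistic(y, x, fit_slope, n_iter=50):
    """Newton-Raphson fit of logit(P(y=1)) = a + b*x.
    With fit_slope=False, b stays fixed at 1 (x acts as an offset)
    and only the intercept a is estimated."""
    a, b = 0.0, 1.0
    for _ in range(n_iter):
        mu = [expit(a + b * xi) for xi in x]
        w = [m * (1.0 - m) for m in mu]
        ga = sum(yi - m for yi, m in zip(y, mu))   # score for a
        haa = sum(w)
        if not fit_slope:
            a += ga / haa
            continue
        gb = sum((yi - m) * xi for yi, m, xi in zip(y, mu, x))  # score for b
        hab = sum(wi * xi for wi, xi in zip(w, x))
        hbb = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = haa * hbb - hab * hab
        a += (hbb * ga - hab * gb) / det
        b += (haa * gb - hab * ga) / det
    return a, b

def calibration(y, p):
    """Return (calibration slope, calibration in the large)."""
    lp = [logit(pi) for pi in p]                  # linear predictor
    _, slope = fit_logistic(y, lp, fit_slope=True)
    citl, _ = fit_logistic(y, lp, fit_slope=False)
    return slope, citl

# Invented toy data for illustration only (not study data).
y_toy = [0, 0, 1, 0, 1, 1, 0, 1]
p_toy = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
slope_toy, citl_toy = calibration(y_toy, p_toy)
```

In this formulation a calibration slope near 1 and a CITL near 0 indicate that predicted risks agree with observed cancer rates, while a lower Brier score indicates smaller overall prediction error.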

Results and limitations
There were 3582 patients in the validation cohort. The development and validation cohorts were well matched. The AUC of the IDENTIFY risk calculator on the validation cohort was 0.78. This improved to 0.80 in a subanalysis restricted to urothelial cancer-prevalent countries, with a calibration slope of 1.04, CITL of 0.24, and Brier score of 0.14. The best machine learning model was Random Forest, which achieved an AUC of 0.76 on the validation cohort. No cancers were stratified to the very-low-risk group in the validation cohort. Most cancers were stratified to the intermediate- and high-risk groups, with more aggressive cancers in higher-risk groups.

Conclusions
The IDENTIFY risk calculator performed well at predicting cancer in patients referred with haematuria on external validation. This tool can be used by urologists to better counsel patients on their cancer risks, to prioritise diagnostic resources on appropriate patients, and to avoid unnecessary invasive procedures in those with a very low risk of cancer.

Patient summary
We previously developed a calculator that predicts patients’ risk of cancer when they have blood in their urine, based on their personal characteristics. We have validated this risk calculator by testing it on a separate group of patients to ensure that it works as expected. Most patients found to have cancer were in the higher-risk groups, and the cancers in higher-risk groups tended to be more aggressive. Clinicians can use this tool to fast-track patients whom the calculator identifies as high risk and investigate them more thoroughly.
Original language: English
Number of pages: 9
Journal: European Urology Focus
Volume: Articles in press
Early online date: 21 Jun 2024
Publication status: E-pub ahead of print - 21 Jun 2024


Keywords
  • Haematuria
  • Predictive model
  • Prediction
  • Urinary tract cancer
  • Bladder cancer
  • Upper tract urothelial cancer
  • Renal cancer
  • Validation
  • Risk calculator
  • Cancer risk

