Endometrial whole-slide images dataset for detection of malignancy in endometrial biopsies

Mahnaz Mohammadi*, Christina Fell, Sarah Bell, Gareth Bryson, Sheeba Syed, Prakash Konanahalli, David Harris-Birtill, Ognjen Arandjelovic, Clare Orange, Prishma Shahi, In Hwa Um*, James D Blackwood, David J Harrison

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Background: Whole-slide imaging enables the digitization of entire histological slides at a high resolution, allowing pathologists and researchers to analyze tissue samples digitally rather than through traditional microscopy. This technology has become increasingly valuable in pathology for research, education, and clinical diagnostics. Endometrial biopsy is very common, often being undertaken to exclude noncancerous disease. This means that most cases do not contain cancer, and the challenge is to accurately and efficiently exclude serious pathology rather than simply make a diagnosis of malignancy. A well-curated, expert-annotated, endometrial whole-slide dataset covering a spread of cancer and noncancer diagnoses will support machine learning applications in automated diagnosis, facilitate research into the pathology of endometrial cancer, and serve as an educational resource for medical professionals.

Results: We introduce a newly constructed, large-scale dataset of endometrial biopsy specimens, comprising 2,909 whole-slide images in iSyntax format, each accompanied by a corresponding annotation file in JSON format. Each whole-slide image is labeled with a primary class label representing its final diagnosis and a subcategory label providing further details within that diagnostic class. These class labels are critical for machine learning applications, as they enable the development of artificial intelligence models capable of distinguishing between different types of endometrial abnormalities, improving automated classification, and guiding clinical decision-making.
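
As a rough illustration of how the slide and annotation pairing described above might be consumed, the Python sketch below matches each iSyntax file to a JSON file and reads two label fields. It rests on assumptions that the abstract does not confirm: that a slide and its annotation share a filename stem and a directory, and that the JSON stores the labels under the keys `primary_label` and `subcategory`. Both key names and both function names are hypothetical and would need to be replaced with the dataset's actual layout and schema.

```python
import json
from pathlib import Path


def pair_slides_with_annotations(root):
    """Map every .isyntax file under root to a same-stem .json file.

    ASSUMPTION: slide and annotation share a stem and a directory.
    Slides without a matching JSON file map to None.
    """
    root = Path(root)
    pairs = {}
    for slide in sorted(root.rglob("*.isyntax")):
        annotation = slide.with_suffix(".json")
        pairs[slide] = annotation if annotation.exists() else None
    return pairs


def read_labels(annotation_path, primary_key="primary_label", sub_key="subcategory"):
    """Return (primary label, subcategory label) from one annotation file.

    ASSUMPTION: the key names are hypothetical placeholders; a missing
    key yields None rather than raising.
    """
    with open(annotation_path, encoding="utf-8") as handle:
        data = json.load(handle)
    return data.get(primary_key), data.get(sub_key)
```

Reading pixel data from iSyntax files requires a dedicated reader and is outside the scope of this sketch, which touches only file names and JSON content.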

Conclusions: Constructing and curating a high-quality endometrial whole-slide dataset requires significant effort to ensure accurate annotations, data integrity, and patient privacy protection. However, the availability of a well-annotated dataset with detailed class labels is crucial for advancing digital pathology. Such a resource can enhance diagnostic accuracy, support personalized treatment strategies, and ultimately improve outcomes for patients with endometrial cancer and other endometrial conditions.

Original language: English
Article number: giaf147
Pages (from-to): 1-8
Number of pages: 8
Journal: GigaScience
Volume: 14
Early online date: 5 Dec 2025
Publication status: Published - 2025

Keywords

  • Endometrium
  • Whole-slide imaging
  • Endometrial cancer
  • Endometrial hyperplasia
  • Endometrial carcinoma
  • Digital slide repository
  • Image analysis
  • Image-segmentation
  • Histopathology
  • Deep learning
  • Machine learning
