Abstract
Background: Whole-slide imaging digitizes entire histological slides at high resolution, allowing pathologists and researchers to analyze tissue samples digitally rather than through traditional microscopy. This technology has become increasingly valuable in pathology for research, education, and clinical diagnostics. Endometrial biopsy is very common, often being undertaken to exclude noncancerous disease. This means that most cases do not contain cancer, and the challenge is to accurately and efficiently exclude serious pathology rather than simply to diagnose malignancy. A well-curated, expert-annotated endometrial whole-slide dataset covering a range of cancer and noncancer diagnoses will support machine learning applications in automated diagnosis, facilitate research into the pathology of endometrial cancer, and serve as an educational resource for medical professionals.
Results: We introduce a newly constructed, large-scale dataset of endometrial biopsy specimens, comprising 2,909 whole-slide images in iSyntax format, each accompanied by a corresponding annotation file in JSON format. Each whole-slide image is labeled with a primary class label representing its final diagnosis and a subcategory label providing further detail within that diagnostic class.
These class labels are critical for machine learning applications, as they enable the development of artificial intelligence models capable of distinguishing between different types of endometrial abnormalities, improving automated classification, and guiding clinical decision-making.
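As a minimal sketch of how the per-slide labels might be consumed, the Python below assumes that each JSON annotation file shares its file stem with the matching `.isyntax` slide and stores the primary class and subcategory under the hypothetical keys `class_label` and `subcategory`; the published schema may use different names or nesting. It only reads the JSON files and checks that the slide files exist; decoding iSyntax pixel data requires a dedicated reader and is not shown.

```python
import json
from collections import Counter
from pathlib import Path

# Hypothetical key names: the dataset's actual JSON schema may differ.
CLASS_KEY = "class_label"
SUBCLASS_KEY = "subcategory"


def load_labels(annotation_dir: Path) -> dict[str, tuple[str, str]]:
    """Map each slide ID (the JSON file stem) to its (primary class, subcategory) pair."""
    labels = {}
    for path in sorted(annotation_dir.glob("*.json")):
        with path.open(encoding="utf-8") as f:
            record = json.load(f)
        labels[path.stem] = (record[CLASS_KEY], record[SUBCLASS_KEY])
    return labels


def pair_with_slides(labels: dict[str, tuple[str, str]], slide_dir: Path):
    """Return (slide path, labels) for every slide whose .isyntax file is present.

    Assumes the hypothetical naming convention <slide_id>.isyntax.
    """
    pairs = []
    for slide_id, lab in labels.items():
        slide = slide_dir / f"{slide_id}.isyntax"
        if slide.exists():
            pairs.append((slide, lab))
    return pairs


def class_counts(labels: dict[str, tuple[str, str]]) -> Counter:
    """Count slides per primary class, e.g. to check class balance before training."""
    return Counter(cls for cls, _ in labels.values())
```

Counting per primary class before splitting the data is one simple way to see how heavily the collection leans toward noncancer cases, which matters when choosing a stratified split or a class-weighted loss.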
Conclusions: Constructing and curating a high-quality endometrial whole-slide dataset requires significant effort to ensure accurate annotations, data integrity, and patient privacy protection. However, the availability of a well-annotated dataset with detailed class labels is crucial for advancing digital pathology. Such a resource can enhance diagnostic accuracy, support personalized treatment strategies, and ultimately improve outcomes for patients with endometrial cancer and other endometrial conditions.
| Original language | English |
|---|---|
| Article number | giaf147 |
| Pages (from-to) | 1-8 |
| Number of pages | 8 |
| Journal | GigaScience |
| Volume | 14 |
| Early online date | 5 Dec 2025 |
| DOIs | |
| Publication status | Published - 2025 |
Keywords
- Endometrium
- Whole-slide imaging
- Endometrial cancer
- Endometrial hyperplasia
- Endometrial carcinoma
- Digital slide repository
- Image analysis
- Image-segmentation
- Histopathology
- Deep learning
- Machine learning
Projects
- ICAIRD: I-CAIRD: Industrial Centre for AI Research in Digital Diagnostics
  Harris-Birtill, D. (PI) & Arandelovic, O. (CoI)
  1/02/19 → 31/01/23
  Project: Standard
- ICAIRD: I-CAIRD: Industrial Centre for AI Research in Digital Diagnostics
  Harrison, D. (PI)
  1/02/19 → 31/01/22
  Project: Standard
Datasets
- Endometrial Whole Slide Images Dataset
  Um, I. (Creator), Mohammadi, M. (Creator), Fell, C. (Creator), Morrison, D. (Creator), Orange, C. E. L. (Creator), Harrison, D. (Creator), Harris-Birtill, D. (Creator), Arandelovic, O. (Creator) & Blackwood, J. (Creator), EMBL-EBI, 2024
  DOI: 10.6019/S-BIAD1199
  Dataset