On defining rules for cancer data fabrication

Juliana Kuster Filipe Bowles*, Agastya Silvina, Eyal Bin, Michael Vinov

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution


Data is essential for machine learning projects, and data accuracy is crucial for trusting the results obtained from the associated machine learning models. Previously, we developed machine learning models for predicting the treatment outcome of breast cancer patients who have undergone chemotherapy, together with a monitoring system for their treatment timeline that interactively shows the options and associated predictions. Available cancer datasets, such as the one used earlier, are often too small to yield significant results, which makes it difficult to explore ways to further improve the predictive capability of the models. In this paper, we explore an alternative: enhancing our datasets through synthetic data generation. From our original dataset, we extract rules to generate fabricated data that capture the different characteristics inherent in the dataset. Additional rules can be used to capture general medical knowledge. We show how to formulate rules for our cancer treatment data, and use the IBM solver to obtain a corresponding synthetic dataset. We discuss challenges for future work.
Original language: English
Title of host publication: Rules and Reasoning
Subtitle of host publication: 4th International Joint Conference, RuleML+RR 2020, Oslo, Norway, June 29–July 1, 2020, Proceedings
Editors: Victor Gutiérrez Basulto, Tomáš Kliegr, Ahmet Soylu, Martin Giese, Dumitru Roman
Place of Publication: Cham
Number of pages: 9
ISBN (Electronic): 9783030579777
ISBN (Print): 9783030579760
Publication status: Published - 2020
Event: 4th International Joint Conference on Rules and Reasoning (RuleML+RR 2020) - Online, Oslo, Norway
Duration: 29 Jun 2020 – 1 Jul 2020
Conference number: 4

Publication series

Name: Lecture Notes in Computer Science (Programming and Software Engineering)
Volume: 12173 LNCS
ISSN (Print): 0302-9743


Conference: 4th International Joint Conference on Rules and Reasoning (RuleML+RR 2020)
Abbreviated title: RuleML+RR 2020

Keywords

  • Cancer data
  • Synthetic data
  • Constraint solvers
  • Fabrication rules

