Templated text synthesis for expert-guided multi-label extraction from radiology reports

Patrick Schrempf, Hannah Watson, Eunsoo Park, Maciej Pajak, Hamish MacKinnon, Keith W. Muir, David Harris-Birtill, Alison Q. O’Neil

Research output: Contribution to journal › Article › peer-review



Training medical image analysis models traditionally requires large amounts of expertly annotated imaging data, which is time-consuming and expensive to obtain. One solution is to automatically extract scan-level labels from radiology reports. Previously, we showed that, by extending BERT with a per-label attention mechanism, we can train a single model to perform automatic extraction of many labels in parallel. However, if we rely on purely data-driven learning, the model sometimes fails to learn critical features or learns the correct answer via simplistic heuristics (e.g., that “likely” indicates positivity), and thus fails to generalise to rarer cases that have not been learned or where the heuristics break down (e.g., “likely represents prominent VR space or lacunar infarct”, which indicates uncertainty over two differential diagnoses). In this work, we propose template creation for data synthesis, which enables us to inject expert knowledge about unseen entities from medical ontologies, and to teach the model rules on how to label difficult cases, by producing relevant training examples. Using this technique alongside domain-specific pre-training for our underlying BERT architecture (i.e., PubMedBERT), we improve F1 micro from 0.903 to 0.939 and F1 macro from 0.512 to 0.737 on an independent test set for 33 labels in head CT reports for stroke patients. Our methodology offers a practical way to combine domain knowledge with machine learning for text classification tasks.
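
As a rough illustration of the template idea described in the abstract, the sketch below fills sentence templates with entity synonyms to produce labelled training sentences. It is not the authors' implementation: the template wording, the synonym lists (standing in for terms drawn from a medical ontology), the status names, and the function `synthesise` are all assumptions made for this example.

```python
from itertools import permutations

# Hypothetical single-entity templates: (sentence pattern, status assigned to the entity).
SINGLE_TEMPLATES = [
    ("There is evidence of {a}.", "positive"),
    ("No evidence of {a}.", "negative"),
]

# Hypothetical two-entity template modelling a differential diagnosis:
# both entities are labelled uncertain, even though "likely" appears.
PAIR_TEMPLATES = [
    ("Likely represents {a} or {b}.", "uncertain"),
]

# Hypothetical synonym lists standing in for ontology terms, keyed by label.
SYNONYMS = {
    "infarct": ["lacunar infarct", "ischaemic infarct"],
    "haemorrhage": ["haemorrhage", "intracranial bleed"],
}


def synthesise(single_templates, pair_templates, synonyms):
    """Return a list of (sentence, {label: status}) synthetic training examples."""
    examples = []
    # Fill every single-entity template with every synonym of every label.
    for label, terms in synonyms.items():
        for term in terms:
            for pattern, status in single_templates:
                examples.append((pattern.format(a=term), {label: status}))
    # Fill every two-entity template with every ordered pair of distinct labels.
    for (label_a, terms_a), (label_b, terms_b) in permutations(synonyms.items(), 2):
        for term_a in terms_a:
            for term_b in terms_b:
                for pattern, status in pair_templates:
                    sentence = pattern.format(a=term_a, b=term_b)
                    examples.append((sentence, {label_a: status, label_b: status}))
    return examples


EXAMPLES = synthesise(SINGLE_TEMPLATES, PAIR_TEMPLATES, SYNONYMS)
```

Each generated sentence carries its labels by construction, so examples of this kind could be mixed with annotated reports to expose a classifier to rare entities and to phrasing where a simple cue such as “likely” does not imply positivity.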
Original language: English
Pages (from-to): 299-317
Number of pages: 19
Journal: Machine Learning and Knowledge Extraction
Issue number: 2
Publication status: Published - 24 Mar 2021


  • NLP
  • Radiology report labelling
  • BERT
  • Data synthesis
  • Templates

