Use of data-driven methods to search Electronic Health Records (EHRs) in aid of clinical trial recruitment

  • Wen Shi

Student thesis: Doctoral Thesis (PhD)


Recruiting participants into clinical studies is resource-intensive, yet nearly half of trials
fail to recruit enough participants, which leads to early termination or an underpowered study.
Digitisation of health care records has ushered in secondary use of electronic health records
(EHRs), one application of which is participant identification and recruitment. The development of
advanced analytics has opened up further possibilities to that end. In this thesis, an evaluation of
an EHR-based recruitment support service is presented first. Following that is a study of the
content of eligibility criteria and the availability of that content in EHRs. A systematic review of
advanced analytics applied to EHRs for recruitment purposes is reported next. Finally, a
retrospective study of identifying eligible participants for nine clinical studies from EHRs using
a case-based reasoning method is presented. It was found that the EHR-based recruitment service
might have difficulty identifying patients with certain symptoms and minor conditions owing
to lack of access to the full set of health care data. The study of eligibility criteria corroborated
the need to access primary care data and to involve advanced analytics in cohort identification
in order to address different types of eligibility criteria. The review included 11 relevant papers
and found that most were in-silico studies, with only one interventional study. Performance
could not be synthesised owing to large differences in experimental set-up, including trial domain,
number of trials used, unit of analysis, outcome definition and evaluation method. A study using
NLP-incorporated case-based reasoning achieved good performance, as indicated by a relatively
comprehensive set of measures. Adapting the case-based reasoning method to EHRs for
patient recruitment in SHARE showed good differentiation performance in seven projects,
but it did not perform well when evaluated by information retrieval metrics. The results
indicated that structured data alone cannot realise the full potential of the computable method,
echoing the findings of the other studies.
Date of Award: 17 Jun 2022
Original language: English
Awarding Institution
  • University of St Andrews
Supervisors: Frank Sullivan & Tom Kelsey

Access Status

  • Full text embargoed until 18 April 2027
