Assessing the validity of AI-authored exam questions

Research output: Contribution to journal › Abstract

Abstract

This research aims to use an artificial intelligence (AI) large language model (LLM) to generate valid single best answer (SBA) exam questions for undergraduate medical students. The objective is to design a prompt that generates SBA questions that can be quality-assured using established methods to confirm their validity. This will enable rapid replenishment of assessment banks depleted by Covid-era open-book exams and provide students with more formative assessments.

Methods A commercially available LLM (OpenAI GPT-4¹) was prompted to generate 200 SBA questions based on Medical Schools Council guidance and Scottish Graduate-Entry Medicine (ScotGEM) Learning Outcomes (LOs). The questions were screened to ensure they conformed with the guidelines and LOs before a subset was included, alongside an equal number of human-authored questions, in an examination sat by students. Facility and discrimination indices were calculated for each item, and the performance of AI- and human-authored questions was compared.
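The abstract does not state how the two item statistics were computed. As a minimal sketch only, assuming the common classical definitions (facility as the proportion of candidates answering an item correctly, and discrimination as the upper-lower group difference with 27% groups), the per-item calculation might look like the following. The function names, the 27% fraction, and the sample response matrix are illustrative assumptions, not the authors' method or data.

```python
from statistics import mean


def facility(item_scores):
    """Proportion of candidates who answered the item correctly (0 to 1)."""
    return mean(item_scores)


def discrimination(item_scores, total_scores, fraction=0.27):
    """Upper-lower discrimination index (assumed definition).

    Facility of the item among the top-scoring group minus its facility
    among the bottom-scoring group, where each group holds `fraction`
    of the candidates ranked by total exam score.
    """
    n = len(item_scores)
    k = max(1, round(n * fraction))  # group size, at least one candidate
    order = sorted(range(n), key=lambda i: total_scores[i])
    low = [item_scores[i] for i in order[:k]]
    high = [item_scores[i] for i in order[-k:]]
    return mean(high) - mean(low)


# Hypothetical data: rows are candidates, columns are items (1 = correct).
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
totals = [sum(row) for row in responses]
first_item = [row[0] for row in responses]

item_facility = facility(first_item)
item_discrimination = discrimination(first_item, totals)
```

Candidates tied on total score at a group boundary are assigned by their original order here; a correlation-based index such as the point-biserial would avoid that arbitrariness, and the abstract does not say which approach was used.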

Results Most AI-generated SBAs were exam-ready with little or no modification. Adjustments were made to correct issues such as the inclusion of ‘all of the above’ answers, American spellings and non-alphabetised options.

Statistical analysis showed no significant difference between AI- and human-authored questions in terms of facility and discrimination index.²

Conclusion LLMs can produce questions adhering to best-practice guidelines and relevant LOs, though a quality-assurance process is needed to ensure proper formatting and alignment. Future work will refine AI prompts for more curriculum-specific question alignment.
Original language: English
Journal: The Clinical Teacher
Volume: 21
Issue number: S2
DOIs
Publication status: Published - 12 Nov 2024
