Substantially inflated type I error rates if propensity score method is not fixed in advance

Markus Neuhäuser, Julia M. Kraechter, Matthias Thielmann, Graeme D. Ruxton

Research output: Contribution to journal › Article › peer-review



Propensity scores are often used to adjust for between-group variation in covariates when individuals cannot be randomized to groups. There is great flexibility in how these scores can be appropriately used. This flexibility might encourage p-value hacking, where several alternative uses of propensity scores are explored and only the one yielding the lowest p-value is reported. Such unreported multiple testing must inevitably inflate type I error rates; our focus is on exploring how strong this inflation might be. Across three different scenarios, we compared the performance of four different methods. Each method taken individually gave type I error rates near the nominal (5%) value, but taking the minimum p-value across the four tests led to actual error rates between 150% and 200% of the nominal value (that is, roughly 7.5% to 10%). Hence, we strongly recommend pre-selection of the details of the statistical treatment of propensity scores to avoid the risk of seriously inflated type I error rates.
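The mechanism behind this inflation can be illustrated with a minimal simulation sketch. It is not the authors' simulation and does not involve propensity scores at all: it assumes, purely for illustration, four test statistics that are standard normal under the null hypothesis and share a common correlation `rho`. The function name `simulate`, the parameter `rho`, and the equicorrelated normal model are all hypothetical choices made here.

```python
import math
import random


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def simulate(rho, n_tests=4, n_sim=100_000, alpha=0.05, seed=1):
    """Estimate type I error rates under the null hypothesis.

    Assumption (illustrative only): the n_tests statistics are standard
    normal with pairwise correlation rho, built from one shared component
    and one independent component per test.

    Returns (rate for one pre-specified test, rate for the minimum p-value).
    """
    rng = random.Random(seed)
    single_rejections = 0
    min_p_rejections = 0
    for _ in range(n_sim):
        shared = rng.gauss(0, 1)
        p_values = []
        for _ in range(n_tests):
            own = rng.gauss(0, 1)
            z = math.sqrt(rho) * shared + math.sqrt(1 - rho) * own
            p_values.append(two_sided_p(z))
        # Pre-specified analysis: always report the first test.
        single_rejections += p_values[0] < alpha
        # Selective reporting: report whichever test gave the smallest p-value.
        min_p_rejections += min(p_values) < alpha
    return single_rejections / n_sim, min_p_rejections / n_sim


if __name__ == "__main__":
    for rho in (0.0, 0.5, 0.9, 1.0):
        single, min_p = simulate(rho)
        print(f"rho={rho:.1f}  single test: {single:.3f}  minimum of four: {min_p:.3f}")
```

Under this model the pre-specified test stays near 5% for every `rho`, whereas the minimum-p rule ranges from about 1 - 0.95^4 ≈ 18.5% for independent tests down to 5% for perfectly correlated ones. Error rates of 150% to 200% of the nominal value fall between these extremes, which would be consistent with four positively correlated analyses of the same data, although the actual dependence structure in the study is not stated here.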
Original language: English
Number of pages: 7
Journal: Communications in Statistics: Case Studies, Data Analysis and Applications
Early online date: 19 May 2020
Publication status: E-pub ahead of print - 19 May 2020


  • Propensity scores
  • Multiple testing
  • Observational study

