Abstract
Propensity scores are often used to adjust for between-group variation
in covariates, when individuals cannot be randomized to groups. There is
great flexibility in how these scores can be appropriately used. This
flexibility might encourage p-value hacking – where several alternative
uses of propensity scores are explored and the one yielding the lowest
p-value is selectively reported. Such unreported multiple testing must
inevitably inflate type I error rates – our focus is on exploring how
strong this inflation effect might be. Across three different scenarios,
we compared the performance of four different methods. Each method, taken
individually, gave type I error rates near the nominal (5%) value, but
taking the minimum p-value of the four tests led to actual error rates between
150% and 200% of the nominal value (roughly 7.5% to 10%). Hence, we strongly recommend
pre-selection of the details of the statistical treatment of propensity
scores to avoid risk of very serious over-inflation of type I error
rates.
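
The inflation mechanism described above can be illustrated with a minimal simulation sketch. It is not the study's design: instead of four propensity score methods applied to observational data, it assumes four equicorrelated standard normal test statistics under the null hypothesis. The function names, the correlation `rho = 0.9`, and the number of simulations are all hypothetical choices made for illustration.

```python
import math
import random


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def simulate_min_p(n_sims=20000, n_tests=4, rho=0.9, alpha=0.05, seed=1):
    """Estimate type I error for one test versus the minimum of several.

    Assumption (not from the paper): the n_tests statistics are standard
    normal under the null and share a common pairwise correlation rho,
    standing in for alternative analyses of the same data set.
    """
    rng = random.Random(seed)
    single_rejections = 0
    min_p_rejections = 0
    for _ in range(n_sims):
        # A shared component induces correlation rho between the statistics.
        shared = rng.gauss(0.0, 1.0)
        z_values = [
            math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
            for _ in range(n_tests)
        ]
        p_values = [two_sided_p(z) for z in z_values]
        # Pre-specified analysis: always report the first test.
        if p_values[0] < alpha:
            single_rejections += 1
        # Selective reporting: report whichever test gives the smallest p-value.
        if min(p_values) < alpha:
            min_p_rejections += 1
    return single_rejections / n_sims, min_p_rejections / n_sims


if __name__ == "__main__":
    single_rate, min_rate = simulate_min_p()
    print(f"single pre-specified test: {single_rate:.3f}")
    print(f"minimum of four tests:     {min_rate:.3f}")
```

Under these assumptions, the pre-specified test should reject at close to the nominal rate, while the minimum p-value rejects more often. The size of the gap depends on the assumed correlation: the more the alternative analyses disagree, the greater the inflation.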
Original language | English |
---|---|
Number of pages | 7 |
Journal | Communications in Statistics: Case Studies, Data Analysis and Applications |
Early online date | 19 May 2020 |
Publication status | E-pub ahead of print - 19 May 2020 |
Keywords
- Propensity scores
- Multiple testing
- Observational study