Abstract
Propensity scores are often used to adjust for between-group variation
in covariates when individuals cannot be randomized to groups. There is
great flexibility in how these scores can be appropriately used. This
flexibility might encourage p-value hacking, in which several alternative
uses of propensity scores are explored and the one yielding the lowest
p-value is selectively reported. Such unreported multiple testing must
inevitably inflate type I error rates; our focus is on exploring how
strong this inflation effect might be. Across three different scenarios,
we compared the performance of four different methods. Each method, taken
individually, gave type I error rates near the nominal (5%) value, but
taking the minimum p-value of the four tests led to actual error rates
between 150% and 200% of the nominal value (that is, roughly 7.5% to 10%).
Hence, we strongly recommend pre-specifying the details of the statistical
treatment of propensity scores to avoid the risk of seriously inflated
type I error rates.
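
The inflation mechanism described in the abstract can be illustrated with a minimal Monte Carlo sketch. It does not reproduce the paper's scenarios or its propensity score methods: it simply assumes four null test statistics that are standard normal and equally correlated, with a correlation `rho` chosen arbitrarily for illustration. The function names `two_sided_p`, `min_p_error_rate`, and `independent_bound` are hypothetical.

```python
import math
import random


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def min_p_error_rate(n_tests=4, rho=0.9, alpha=0.05, n_sims=20000, seed=1):
    """Estimate type I error for one fixed test and for the smallest of n_tests.

    Assumption (not from the paper): under the null, the n_tests z-statistics
    are equicorrelated with correlation rho, built from one shared normal
    component plus an independent component per test.
    """
    rng = random.Random(seed)
    shared_weight = math.sqrt(rho)
    own_weight = math.sqrt(1 - rho)
    fixed_rejections = 0
    selected_rejections = 0
    for _ in range(n_sims):
        shared = rng.gauss(0, 1)
        p_values = [
            two_sided_p(shared_weight * shared + own_weight * rng.gauss(0, 1))
            for _ in range(n_tests)
        ]
        # Analysis fixed in advance: always report the first test.
        fixed_rejections += p_values[0] < alpha
        # Selective reporting: report whichever test gives the lowest p-value.
        selected_rejections += min(p_values) < alpha
    return fixed_rejections / n_sims, selected_rejections / n_sims


def independent_bound(n_tests=4, alpha=0.05):
    """Error rate of the minimum p-value if the tests were fully independent."""
    return 1 - (1 - alpha) ** n_tests


if __name__ == "__main__":
    fixed, selected = min_p_error_rate()
    print(f"fixed test: {fixed:.3f}, minimum of four: {selected:.3f}")
    print(f"independent-test bound: {independent_bound():.3f}")
```

Under these assumptions the pre-specified test stays near the nominal level, while the selectively reported minimum falls somewhere between the nominal 5% (perfectly correlated tests) and about 18.5% (fully independent tests, from 1 - 0.95^4). The 7.5% to 10% range reported in the abstract lies inside that interval, as would be expected for methods applied to the same data.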
| Original language | English |
|---|---|
| Number of pages | 7 |
| Journal | Communications in Statistics: Case Studies, Data Analysis and Applications |
| Early online date | 19 May 2020 |
| Publication status | E-pub ahead of print - 19 May 2020 |
Keywords
- Propensity scores
- Multiple testing
- Observational study
Title
Substantially inflated type I error rates if propensity score method is not fixed in advance