Classical tests, linear models, and their extensions for the analysis of 2x2 contingency tables

Rebecca Nagel*, Graeme Douglas Ruxton, Michael Blair Morrissey

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review


Abstract

1. Ecologists and evolutionary biologists are regularly tasked with the comparison of binary data across groups. There is, however, some discussion in the biostatistics literature about the best methodology for the analysis of data comprising binary explanatory and response variables forming a 2 × 2 contingency table.
2. We assess several methodologies for the analysis of 2 × 2 contingency tables using a simulation scheme of different sample sizes with outcomes evenly or unevenly distributed between groups. Specifically, we assess the commonly recommended logistic (generalised linear model [GLM]) regression analysis, the classical Pearson chi-squared test and four conventional alternatives (Yates' correction, Fisher's exact, exact unconditional and mid-p), as well as the widely discouraged linear model (LM) regression.
3. We found that both LM and GLM analyses provided unbiased estimates of the difference in proportions between groups. LM and GLM analyses also provided accurate standard errors and confidence intervals when the experimental design was balanced. When the experimental design was unbalanced, sample size was small, and one of the two groups had a probability close to 1 or 0, LM analysis could substantially over- or under-represent statistical uncertainty. For null hypothesis significance testing, the performance of the chi-squared test and LM analysis was almost identical. Across all scenarios, both had high power to detect non-null effects while avoiding false positives. By contrast, the GLM analysis was underpowered when using z-based p-values, in particular when one of the two groups had a probability near 1 or 0. The GLM using the likelihood ratio test (LRT) had better power to detect non-null results.
4. Our simulation results suggest that, wherever a chi-squared test would be recommended, a linear regression is a suitable alternative for the analysis of 2 × 2 contingency table data. When researchers opt for more sophisticated procedures, we provide R functions to calculate the standard error of a difference between two probabilities from a Bernoulli GLM output using the delta method. We also explore approaches to complement GLM analysis of 2 × 2 contingency tables with credible intervals on the probability scale. These additional operations should help researchers make valid assessments of both statistical and practical significance.
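
The delta-method calculation mentioned in point 4 can be illustrated with a minimal sketch. This is not the authors' R code: it is a Python illustration using only the standard library, and the function names (`fit_saturated_logit`, `delta_method_difference`) and the example counts are hypothetical. It assumes a logit-link Bernoulli GLM with a single binary predictor, for which the maximum-likelihood coefficients and their covariance matrix have a closed form, and it requires both observed proportions to lie strictly between 0 and 1.

```python
import math


def expit(x):
    """Inverse logit: map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    """Log-odds of a probability p, with 0 < p < 1."""
    return math.log(p / (1.0 - p))


def fit_saturated_logit(successes_a, n_a, successes_b, n_b):
    """Closed-form fit of logit(p) = b0 + b1 * group for a 2 x 2 table.

    Group A is the reference level (group = 0), group B has group = 1.
    Returns the coefficients (b0, b1) and their 2 x 2 covariance matrix.
    """
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    b0 = logit(p_a)
    b1 = logit(p_b) - b0
    # Asymptotic variances of the two group-level log-odds.
    var_a = 1.0 / (n_a * p_a * (1.0 - p_a))
    var_b = 1.0 / (n_b * p_b * (1.0 - p_b))
    # b0 depends only on group A; b1 is the difference of the two log-odds.
    cov = [[var_a, -var_a],
           [-var_a, var_a + var_b]]
    return (b0, b1), cov


def delta_method_difference(coefs, cov):
    """Estimate p_B - p_A and its delta-method standard error."""
    b0, b1 = coefs
    p_a = expit(b0)
    p_b = expit(b0 + b1)
    # Gradient of g(b0, b1) = expit(b0 + b1) - expit(b0).
    grad = (p_b * (1.0 - p_b) - p_a * (1.0 - p_a),
            p_b * (1.0 - p_b))
    # First-order variance approximation: grad' * cov * grad.
    variance = sum(grad[i] * cov[i][j] * grad[j]
                   for i in range(2) for j in range(2))
    return p_b - p_a, math.sqrt(variance)


# Hypothetical counts: 12 of 40 successes in group A, 30 of 60 in group B.
coefs, cov = fit_saturated_logit(12, 40, 30, 60)
diff, se = delta_method_difference(coefs, cov)
print(f"difference = {diff:.4f}, delta-method SE = {se:.4f}")
```

In this saturated two-group case the delta-method variance reduces algebraically to p_A(1 - p_A)/n_A + p_B(1 - p_B)/n_B, which gives a convenient check on the output. With coefficients and a covariance matrix taken from fitted GLM software instead, only `delta_method_difference` would be needed.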
Original language: English
Pages (from-to): 843-855
Number of pages: 13
Journal: Methods in Ecology and Evolution
Volume: 15
Issue number: 5
Early online date: 1 Apr 2024
Publication status: Published - May 2024

Keywords

  • 2 x 2 contingency table
  • Chi-squared test
  • Linear models
  • Logistic GLMs
  • Uncertainty estimates
