Abstract
1. Ecologists and evolutionary biologists are regularly tasked with the comparison of binary data across groups. There is, however, some discussion in the biostatistics literature about the best methodology for the analysis of data comprising binary explanatory and response variables forming a 2 × 2 contingency table.
2. We assess several methodologies for the analysis of 2 × 2 contingency tables using a simulation scheme of different sample sizes with outcomes evenly or unevenly distributed between groups. Specifically, we assess the commonly recommended logistic (generalised linear model [GLM]) regression analysis, the classical Pearson chi-squared test and four conventional alternatives (Yates' correction, Fisher's exact, exact unconditional and mid-p), as well as the widely discouraged linear model (LM) regression.
3. We found that both LM and GLM analyses provided unbiased estimates of the difference in proportions between groups. LM and GLM analyses also provided accurate standard errors and confidence intervals when the experimental design was balanced. When the experimental design was unbalanced, sample size was small, and one of the two groups had a probability close to 1 or 0, LM analysis could substantially over- or under-represent statistical uncertainty. For null hypothesis significance testing, the performance of the chi-squared test and LM analysis was almost identical. Across all scenarios, both had high power to detect non-null effects and reject false positives. By contrast, the GLM analysis was underpowered when using z-based p-values, in particular when one of the two groups had a probability near 1 or 0. The GLM using the likelihood ratio test (LRT) had better power to detect non-null results.
4. Our simulation results suggest that, wherever a chi-squared test would be recommended, a linear regression is a suitable alternative for the analysis of 2 × 2 contingency table data. When researchers opt for more sophisticated procedures, we provide R functions to calculate the standard error of a difference between two probabilities from a Bernoulli GLM output using the delta method. We also explore approaches to complement GLM analysis of 2 × 2 contingency tables with credible intervals on the probability scale. These additional operations should support researchers to make valid assessments of both statistical and practical significance.
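The near-identical behaviour of the chi-squared test and the LM reported above has a simple algebraic background, which the following minimal Python sketch illustrates. It is not taken from the paper: the function names and the example counts are hypothetical. It computes the uncorrected Pearson chi-squared statistic and the ordinary least squares t-statistic for the group effect from the same 2 × 2 table. Both are functions of the same phi coefficient, with chi-squared = n·phi² and t² = (n − 2)·phi² / (1 − phi²).

```python
import math

def pearson_chi2(s0, n0, s1, n1):
    """Uncorrected Pearson chi-squared statistic for a 2 x 2 table.

    s0, s1: number of successes in group 0 and group 1 (hypothetical names)
    n0, n1: number of observations in group 0 and group 1
    """
    a, b = s0, n0 - s0          # group 0: successes, failures
    c, d = s1, n1 - s1          # group 1: successes, failures
    n = n0 + n1
    col1, col0 = a + c, b + d   # column totals (successes, failures)
    return n * (a * d - b * c) ** 2 / (n0 * n1 * col1 * col0)

def lm_slope_and_t(s0, n0, s1, n1):
    """OLS fit of the 0/1 response on a 0/1 group indicator.

    Returns the slope (difference in proportions) and its t-statistic,
    using the usual residual variance with n - 2 degrees of freedom.
    """
    p0, p1 = s0 / n0, s1 / n1
    n = n0 + n1
    # Residual sum of squares: within each group it equals n_g * p_g * (1 - p_g).
    rss = n0 * p0 * (1 - p0) + n1 * p1 * (1 - p1)
    sigma2 = rss / (n - 2)
    se = math.sqrt(sigma2 * (1 / n0 + 1 / n1))
    slope = p1 - p0
    return slope, slope / se

# Hypothetical example table: 12/40 successes in group 0, 9/15 in group 1.
chi2 = pearson_chi2(12, 40, 9, 15)
slope, t = lm_slope_and_t(12, 40, 9, 15)
n = 40 + 15
phi2 = chi2 / n
print(f"chi-squared = {chi2:.4f}, LM slope = {slope:.4f}, t^2 = {t * t:.4f}")
print(f"t^2 implied by phi = {(n - 2) * phi2 / (1 - phi2):.4f}")
```

Because the two statistics are monotone transformations of one another, any difference in the resulting p-values comes only from the reference distribution used (chi-squared with 1 degree of freedom versus t with n − 2 degrees of freedom).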
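As an illustration of the delta-method calculation mentioned in the abstract, here is a minimal Python sketch. It is not the authors' R code: the function names are hypothetical, and it assumes a logistic model with an intercept and a single 0/1 group indicator, for which the maximum likelihood estimates and their covariance matrix have a closed form whenever both groups contain successes and failures. With real GLM output, the coefficients and covariance matrix would come from the fitted model instead.

```python
import math

def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))

def logistic_fit_2x2(s0, n0, s1, n1):
    """Closed-form logistic MLE for a two-group (2 x 2) table.

    Assumes 0 < s0 < n0 and 0 < s1 < n1 so that all logits are finite.
    Returns (b0, b1, cov): intercept, group effect on the logit scale,
    and the 2 x 2 inverse Fisher information of (b0, b1).
    """
    p0, p1 = s0 / n0, s1 / n1
    b0 = math.log(p0 / (1 - p0))
    b1 = math.log(p1 / (1 - p1)) - b0
    w0 = n0 * p0 * (1 - p0)     # Fisher information weight, group 0
    w1 = n1 * p1 * (1 - p1)     # Fisher information weight, group 1
    cov = [[1 / w0, -1 / w0],
           [-1 / w0, 1 / w0 + 1 / w1]]
    return b0, b1, cov

def delta_se_diff(b0, b1, cov):
    """First-order delta method for g(b) = expit(b0 + b1) - expit(b0).

    Returns the estimated difference in probabilities and its standard error.
    """
    p0 = expit(b0)
    p1 = expit(b0 + b1)
    # Gradient of g with respect to (b0, b1).
    g0 = p1 * (1 - p1) - p0 * (1 - p0)
    g1 = p1 * (1 - p1)
    var = g0 * g0 * cov[0][0] + 2 * g0 * g1 * cov[0][1] + g1 * g1 * cov[1][1]
    return p1 - p0, math.sqrt(var)

# Hypothetical example table: 12/40 successes in group 0, 9/15 in group 1.
b0, b1, cov = logistic_fit_2x2(12, 40, 9, 15)
diff, se = delta_se_diff(b0, b1, cov)
print(f"difference = {diff:.4f}, delta-method SE = {se:.4f}")
```

For this two-group model the delta-method variance simplifies algebraically to p0(1 − p0)/n0 + p1(1 − p1)/n1, which gives a convenient check on the implementation.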
Original language  English 

Pages (from-to)  843-855 
Number of pages  13 
Journal  Methods in Ecology and Evolution 
Volume  15 
Issue number  5 
Early online date  1 Apr 2024 
DOIs  
Publication status  Published - May 2024 
Keywords
 2 x 2 contingency table
 Chi-squared test
 Linear models
 Logistic GLMs
 Uncertainty estimates
Fingerprint
Dive into the research topics of 'Classical tests, linear models, and their extensions for the analysis of 2x2 contingency tables'. Together they form a unique fingerprint.
Projects
 1 Finished

Royal Society Research Fellowship: A development evolutionary quantitative genetic theory.
1/10/14 → 30/09/19
Project: Fellowship