A Double machine learning trend model for citizen science data

Daniel Fink*, Alison Johnston, Matt Strimas-Mackey, Tom Auer, Wesley M. Hochachka, Shawn Ligocki, Lauren Oldham Jaromczyk, Orin Robinson, Chris Wood, Steve Kelling, Amanda D. Rodewald

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review


1. Citizen and community science datasets are typically collected using flexible protocols. These protocols enable large volumes of data to be collected globally every year; however, they typically lack the structure necessary to maintain consistent sampling across years. This can result in complex and pronounced interannual changes in the observation process, which complicate the estimation of population trends because population changes over time are confounded with changes in the observation process.

2. Here we describe a novel modelling approach designed to estimate spatially explicit species population trends while controlling for the interannual confounding common in citizen science data. The approach is based on Double machine learning, a statistical framework that uses machine learning (ML) methods to estimate population change and the propensity scores used to adjust for confounding discovered in the data. ML makes it possible to use large sets of features to control for confounding and to model spatial heterogeneity in trends. Additionally, we present a simulation method to identify and adjust for residual confounding missed by the propensity scores.

3. To illustrate the approach, we estimated species trends using data from the citizen science project eBird. We used a simulation study to assess the ability of the method to estimate spatially varying trends when faced with realistic confounding and temporal correlation. Results demonstrated the ability to distinguish between spatially constant and spatially varying trends. There were low error rates on the estimated direction of population change (increasing/decreasing) at each location and high correlations on the estimated magnitude of population change.

4. The ability to estimate spatially explicit trends while accounting for confounding inherent in citizen science data has the potential to fill important information gaps, helping to estimate population trends for species and/or regions lacking rigorous monitoring data.
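
The Double machine learning idea in point 2 can be illustrated with a minimal sketch. This is not the authors' implementation: the keywords suggest they use Causal Forests to model spatially varying trends, whereas this toy estimates a single constant trend with the standard cross-fitted "partialling-out" estimator. Everything here is an assumption made for illustration: the simulated data, the single `effort` feature, the binary `year` treatment, the k-nearest-neighbour learner, and the names `knn_predict` and `dml_trend`.

```python
import random


def knn_predict(train_x, train_y, query_x, k=15):
    """k-nearest-neighbour regression on one feature (a stand-in ML learner)."""
    preds = []
    for q in query_x:
        nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - q))[:k]
        preds.append(sum(train_y[i] for i in nearest) / len(nearest))
    return preds


def dml_trend(effort, year, count, n_folds=2, k=15):
    """Cross-fitted partialling-out estimate of the effect of year on count."""
    n = len(count)
    res_year = [0.0] * n   # year minus its prediction from effort (propensity residual)
    res_count = [0.0] * n  # count minus its prediction from effort (outcome residual)
    for f in range(n_folds):
        held_out = list(range(f, n, n_folds))
        held = set(held_out)
        train = [i for i in range(n) if i not in held]
        train_x = [effort[i] for i in train]
        query_x = [effort[i] for i in held_out]
        # Nuisance models are fitted on the other fold only (cross-fitting).
        year_hat = knn_predict(train_x, [year[i] for i in train], query_x, k)
        count_hat = knn_predict(train_x, [count[i] for i in train], query_x, k)
        for i, t_hat, y_hat in zip(held_out, year_hat, count_hat):
            res_year[i] = year[i] - t_hat
            res_count[i] = count[i] - y_hat
    # Regress outcome residuals on treatment residuals (no intercept).
    numerator = sum(a * b for a, b in zip(res_year, res_count))
    denominator = sum(a * a for a in res_year)
    return numerator / denominator


# Hypothetical simulation: observer effort rises in the later year and also
# raises reported counts, so effort confounds the year-to-year comparison.
rng = random.Random(0)
TRUE_TREND = -0.5  # assumed true change in abundance between the two years
effort, year, count = [], [], []
for _ in range(600):
    x = rng.random()                                # effort in [0, 1)
    t = 1 if rng.random() < 0.2 + 0.6 * x else 0    # later year is more likely at high effort
    y = 2.0 * x + TRUE_TREND * t + rng.gauss(0.0, 0.3)
    effort.append(x)
    year.append(t)
    count.append(y)

late = [c for c, t in zip(count, year) if t == 1]
early = [c for c, t in zip(count, year) if t == 0]
naive = sum(late) / len(late) - sum(early) / len(early)
dml_estimate = dml_trend(effort, year, count)

print(f"naive difference in means: {naive:.2f}")
print(f"DML estimate:              {dml_estimate:.2f}")
```

In this toy setup the naive year-to-year difference absorbs the effort change, while the residual-on-residual estimate should land much closer to the assumed trend. A spatially explicit version would replace the final regression with a learner that lets the trend vary with location features.
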
Original language: English
Pages (from-to): 2435-2448
Number of pages: 14
Journal: Methods in Ecology and Evolution
Issue number: 9
Early online date: 21 Jul 2023
Publication status: Published - 1 Sept 2023


  • Causal Forests
  • Causal inference
  • Citizen science
  • Confounding
  • Double machine learning
  • Machine learning
  • Propensity score
  • Trends

