A Double Machine Learning Trend Model for Citizen Science Data

  • Daniel Fink (Creator)
  • Alison Johnston (Creator)
  • Matthew Strimas-Mackey (Creator)
  • Tom Auer (Creator)
  • Wesley Hochachka (Creator)
  • Shawn Ligocki (Creator)
  • Lauren Oldham Jaromczyk (Creator)
  • Orin Robinson (Creator)
  • Chris Wood (Creator)
  • Steve Kelling (Creator)
  • Amanda D. Rodewald (Creator)



Citizen and community science datasets are typically collected using flexible protocols. These protocols enable large volumes of data to be collected globally every year, but they typically lack the structure necessary to maintain consistent sampling across years. The result can be complex and pronounced interannual changes in the observation process, which complicate the estimation of population trends because population changes over time are confounded with changes in the observation process. Here we describe a novel modeling approach designed to estimate spatially explicit species population trends while controlling for the interannual confounding common in citizen science data. The approach is based on Double Machine Learning, a statistical framework that uses machine learning methods to estimate both population change and the propensity scores used to adjust for confounding discovered in the data. Machine learning makes it possible to use large sets of features to control for confounding and to model spatial heterogeneity in trends. Additionally, we present a simulation method to identify and adjust for residual confounding missed by the propensity scores. To illustrate the approach, we estimated species trends using data from the citizen science project eBird. We used a simulation study to assess the ability of the method to estimate spatially varying trends when faced with realistic confounding and temporal correlation. Results demonstrated the ability to distinguish between spatially constant and spatially varying trends. Error rates for the estimated direction of population change (increasing/decreasing) at each location were low, and correlations for the estimated magnitude of population change were high.
The ability to estimate spatially explicit trends while accounting for confounding inherent in citizen science data has the potential to fill important information gaps, helping to estimate population trends for species and/or regions lacking rigorous monitoring data.
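
The core Double Machine Learning idea described above can be illustrated with a small, self-contained sketch. This is not the authors' model: it assumes the simplest "partialling-out" form of Double Machine Learning with a single spatially constant trend, one scalar confounder, and a histogram regressor standing in for the machine learning methods. All names here (`simulate`, `dml_trend`, `fit_binned_mean`, `naive_slope`) and the simulated data are hypothetical. The fitted expectation of the year index given the confounder plays the role that the propensity score plays in the description above.

```python
import random

def fit_binned_mean(xs, ys, n_bins=20, lo=0.0, hi=1.0):
    """Piecewise-constant regression of y on a scalar x in [lo, hi].
    A stand-in for a flexible machine learning nuisance model."""
    width = (hi - lo) / n_bins
    def bin_of(x):
        return min(n_bins - 1, max(0, int((x - lo) / width)))
    sums, counts = [0.0] * n_bins, [0] * n_bins
    for x, y in zip(xs, ys):
        b = bin_of(x)
        sums[b] += y
        counts[b] += 1
    overall = sum(ys) / len(ys)  # fallback for empty bins
    means = [s / c if c else overall for s, c in zip(sums, counts)]
    return lambda x: means[bin_of(x)]

def slope_through_origin(u, v):
    """Least-squares slope of v on u with no intercept."""
    return sum(a * b for a, b in zip(u, v)) / sum(a * a for a in u)

def naive_slope(t, y):
    """Ordinary regression of y on t, ignoring the confounder."""
    mt, my = sum(t) / len(t), sum(y) / len(y)
    return slope_through_origin([a - mt for a in t], [b - my for b in y])

def dml_trend(x, t, y, n_folds=2, seed=0):
    """Cross-fitted partialling-out estimate of the trend in y per unit t."""
    n = len(x)
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::n_folds] for k in range(n_folds)]
    t_res, y_res = [0.0] * n, [0.0] * n
    for held in folds:
        held_set = set(held)
        train = [i for i in idx if i not in held_set]
        # Nuisance models are fit on the other folds only (cross-fitting).
        m_hat = fit_binned_mean([x[i] for i in train], [t[i] for i in train])
        l_hat = fit_binned_mean([x[i] for i in train], [y[i] for i in train])
        for i in held:
            t_res[i] = t[i] - m_hat(x[i])  # year index net of the confounder
            y_res[i] = y[i] - l_hat(x[i])  # outcome net of the confounder
    # Regressing residual on residual isolates the trend.
    return slope_through_origin(t_res, y_res)

def simulate(n=4000, true_trend=-0.5, seed=1):
    """Hypothetical data: the confounder x (e.g. search effort) drifts with
    the year index t and also raises the reported abundance index y."""
    rng = random.Random(seed)
    x, t, y = [], [], []
    for _ in range(n):
        xi = rng.random()
        ti = 2.0 * xi + rng.gauss(0.0, 0.5)
        yi = true_trend * ti + 3.0 * xi ** 2 + rng.gauss(0.0, 0.5)
        x.append(xi)
        t.append(ti)
        y.append(yi)
    return x, t, y

if __name__ == "__main__":
    x, t, y = simulate()
    print("naive slope:", naive_slope(t, y))
    print("DML slope:  ", dml_trend(x, t, y))
```

In this simulated setup the confounder inflates the naive slope enough to reverse the sign of a truly declining trend, while the residual-on-residual estimate should land close to the simulated value. Extending the sketch to spatially varying trends, many features, or the residual-confounding adjustment described above would require substantially more machinery than is shown here.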
Date made available: 28 Jun 2023


  • Software
