TY - JOUR
T1 - Accounting for preferential sampling in species distribution models
AU - Pennino, Maria Grazia
AU - Paradinas, Iosu
AU - Illian, Janine B.
AU - Muñoz, Facundo
AU - Bellido, José María
AU - López-Quílez, Antonio
AU - Conesa, David
N1 - D. C., A. L. Q. and F. M. would like to thank the Ministerio de Educación y Ciencia (Spain) for financial support (jointly financed by the European Regional Development Fund) via Research Grants MTM2013‐42323‐P and MTM2016‐77501‐P, and ACOMP/2015/202 from Generalitat Valenciana (Spain).
PY - 2019/1/1
Y1 - 2019/1/1
N2 - Species distribution models (SDMs) are now being widely used in ecology
for management and conservation purposes across terrestrial, freshwater,
and marine realms. The increasing interest in SDMs has drawn the
attention of ecologists to spatial models and, in particular, to
geostatistical models, which are used to associate observations of
species occurrence or abundance with environmental covariates in a
finite number of locations in order to predict where (and how much of) a
species is likely to be present in unsampled locations. Standard
geostatistical methodology assumes that the choice of sampling locations
is independent of the values of the variable of interest. However, in
natural environments, due to practical limitations related to time and
financial constraints, this theoretical assumption is often violated. In
fact, data commonly derive from opportunistic sampling (e.g., whale or
bird watching), in which observers tend to look for a specific species
in areas where they expect to find it. These are examples of what is
referred to as preferential sampling, which can lead to biased
predictions of the distribution of the species. The aim of this study is
to discuss an SDM that addresses this problem and that is more
computationally efficient than existing MCMC methods. From a statistical
point of view, we interpret the data as a marked point pattern, where
the sampling locations form a point pattern and the measurements taken
in those locations (i.e., species abundance or occurrence) are the
associated marks. Inference and prediction of species distribution are
performed using a Bayesian approach, and integrated nested Laplace
approximation (INLA) methodology and software are used for model fitting
to minimize the computational burden. We show that abundance is highly
overestimated at low-abundance locations when preferential sampling
effects are not accounted for, in both a simulated example and a practical
application using fishery data. This highlights that ecologists should
be aware of the potential bias resulting from preferential sampling and
account for it in a model when a survey is based on non‐randomized
and/or non‐systematic sampling.
KW - Bayesian modelling
KW - Integrated nested Laplace approximation
KW - Point processes
KW - Species Distribution Models (SDMs)
KW - Stochastic partial differential equation
UR - http://www.scopus.com/inward/record.url?scp=85060489279&partnerID=8YFLogxK
U2 - 10.1002/ece3.4789
DO - 10.1002/ece3.4789
M3 - Article
SN - 2045-7758
VL - 9
SP - 653
EP - 663
JO - Ecology and Evolution
JF - Ecology and Evolution
IS - 1
ER -