The data and modelling objects created in this notebook can be downloaded directly to save computational time.
Users who wish to complete the exercises can download a small
template R
script. Assuming you have already downloaded the
data objects above, this script will load all data objects so that the
steps used to create them are not necessary to tackle the exercises.
This tutorial relates to content covered in Lecture 1 and Lecture 2, and relies on the following packages for manipulating data, shaping time series, fitting dynamic regression models and plotting:
library(dplyr)
library(mvgam)
library(gratia)
library(ggplot2); theme_set(theme_classic())
library(marginaleffects)
library(viridis)
We will work with time series of rodent captures from the Portal Project, a long-term monitoring study based near the town of Portal, Arizona. Researchers have been operating a standardized set of baited traps within 24 experimental plots at this site since the 1970s. Sampling follows the lunar monthly cycle, with observations occurring on average about 28 days apart. However, missing observations do occur due to difficulties accessing the site (weather events, COVID disruptions etc.). You can read about the sampling protocol in this preprint by Ernest et al. on bioRxiv.
All data from the Portal Project are made openly available in
near real-time so that they can provide maximum benefit to scientific
research and outreach (a set of open-source software tools make data readily
accessible). These data are extremely rich, containing monthly
counts of rodent captures for >20 species. But rather than accessing
the raw data, we will use some data that I have already processed and
put into a simple, usable form
data("portal_data")
As the data come pre-loaded with mvgam
, you can read a
little about it in the help page using ?portal_data
. Before
working with data, it is important to inspect how the data are
structured. There are various ways to inspect data in R
; I
typically find the glimpse()
function in
dplyr
to be useful for understanding how variables are
structured
dplyr::glimpse(portal_data)
## Rows: 320
## Columns: 5
## $ time <int> 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, …
## $ series <fct> DM, DO, PB, PP, DM, DO, PB, PP, DM, DO, PB, PP, DM, DO, PB, …
## $ captures <int> 20, 2, 0, 0, NA, NA, NA, NA, 36, 5, 0, 0, 40, 3, 0, 1, 29, 3…
## $ ndvi_ma12 <dbl> -0.172144125, -0.172144125, -0.172144125, -0.172144125, -0.2…
## $ mintemp <dbl> -0.79633807, -0.79633807, -0.79633807, -0.79633807, -1.33471…
We will analyse time series of captures for one specific rodent species, the Desert Pocket Mouse Chaetodipus penicillatus. This species is interesting in that it goes into a kind of “hibernation” during the colder months, leading to very low captures during the winter period
Manipulating the data into a 'long' (tidy) format is necessary for modelling
in mvgam
, mgcv
and brms
(see the
formatting data for use in mvgam
vignette
for more details). By ‘long’ format, we mean that each
series x time
observation needs to have its own entry in
the data.frame
or list
object that we wish to
use as data for modelling. A simple example can be viewed by simulating
data using sim_mvgam()
.
data <- sim_mvgam(n_series = 4, T = 24)
head(data$data_train, 12)
## y season year series time
## 1 0 1 1 series_1 1
## 2 0 1 1 series_2 1
## 3 2 1 1 series_3 1
## 4 0 1 1 series_4 1
## 5 0 2 1 series_1 2
## 6 0 2 1 series_2 2
## 7 0 2 1 series_3 2
## 8 0 2 1 series_4 2
## 9 0 3 1 series_1 3
## 10 0 3 1 series_2 3
## 11 1 3 1 series_3 3
## 12 0 3 1 series_4 3
Notice how we have four different time series in these simulated
data, but we do not spread these out into different columns. Rather,
there is only a single column for the outcome variable, labelled
y
in these simulated data. We also must supply a variable
labelled time
to ensure the modelling software knows how to
arrange the time series when building models. This setup still allows us
to formulate multivariate time series models, as we will see in later
tutorials. Below are the steps needed to shape our
portal_data
object into the correct form. First, we filter the data to our target species (PP), create a count variable for the outcome, add a year variable and select some predictors
portal_data %>%
  # Filter the data to only contain captures of the 'PP' series
dplyr::filter(series == 'PP') %>%
droplevels() %>%
# Create a 'count' variable for the outcome
dplyr::mutate(count = captures) %>%
# Add a 'year' variable
dplyr::mutate(year = sort(rep(1:8, 12))[time]) %>%
# Select the variables of interest to keep in the model_data
dplyr::select(series, year, time, count, mintemp, ndvi_ma12) -> model_data
The data now contain six variables:
series, a factor indexing which time series each observation belongs to
year, the year of sampling
time, the indicator of which time step each observation belongs to
count, the response variable representing the number of captures of the species PP in each sampling observation
mintemp, the monthly average minimum temperature at each time step
ndvi_ma12, a 12-month moving average of the monthly Normalized Difference Vegetation Index at each time step
Check the data structure again
dplyr::glimpse(model_data)
## Rows: 80
## Columns: 6
## $ series <fct> PP, PP, PP, PP, PP, PP, PP, PP, PP, PP, PP, PP, PP, PP, PP, …
## $ year <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, …
## $ time <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 1…
## $ count <int> 0, NA, 0, 1, 7, 7, 8, 8, 4, NA, 0, 0, 0, 0, 0, 0, NA, 2, 4, …
## $ mintemp <dbl> -0.79633807, -1.33471597, -1.24166462, -1.08048145, -0.42447…
## $ ndvi_ma12 <dbl> -0.172144125, -0.237363477, -0.212120638, -0.160438125, -0.0…
summary(model_data)
## series year time count mintemp
## PP:80 Min. :1.00 Min. : 1.00 Min. : 0.000 Min. :-2.0978
## 1st Qu.:2.00 1st Qu.:20.75 1st Qu.: 1.000 1st Qu.:-1.0808
## Median :4.00 Median :40.50 Median : 5.000 Median :-0.4091
## Mean :3.85 Mean :40.50 Mean : 5.222 Mean :-0.2151
## 3rd Qu.:5.25 3rd Qu.:60.25 3rd Qu.: 8.000 3rd Qu.: 0.6133
## Max. :7.00 Max. :80.00 Max. :21.000 Max. : 1.4530
## NA's :17
## ndvi_ma12
## Min. :-0.66884
## 1st Qu.:-0.20869
## Median :-0.16517
## Mean :-0.09501
## 3rd Qu.:-0.03440
## Max. : 0.74831
##
We have some NA
s in our response variable
count
. Visualize the data as a heatmap to get a sense of
where these are distributed (NA
s are shown as red bars)
image(
is.na(t(model_data %>%
dplyr::arrange(dplyr::desc(time)))), axes = F,
col = c('grey80', 'darkred')
)
axis(
3,
at = seq(0,1,
len = NCOL(model_data)),
labels = colnames(model_data)
)
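Before moving on, a quick numeric check complements the heatmap. This is just a base R sketch using the model_data object created above:
# Count the missing values in each column of model_data
colSums(is.na(model_data))

# And count how many rows contain at least one missing value
sum(!complete.cases(model_data))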
These observations will be thrown out by most modelling packages in
R
. But as you will see when we work through the tutorials,
mvgam
keeps these in the data so that predictions can be
automatically returned for the full dataset.
Exercises
1. Plot the count, mintemp and ndvi_ma12 variables as histograms to get a better sense of the types of values they can take. See ?hist for guidance.
2. Calculate how many observations in count are zeros, and what proportion of the total this represents. See ?which and ?== for guidance.
One way to count the zeros in count is shown below
# the which function is useful for finding indices in a variable that fit a certain pattern
which(model_data$count == 0)
# to see how many zeros there are, calculate the length of this set of indices
length(which(model_data$count == 0))
# to calculate proportion of observations that are zero, divide this by the total number of observations
length(which(model_data$count == 0)) / length(model_data$count)
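For the histogram exercise, one possible approach (a sketch only; see ?hist for the many plotting options) is:
# Histograms of the outcome and the two covariates;
# hist() drops non-finite values (including the NAs in count)
hist(model_data$count, breaks = 20, xlab = 'count', main = '')
hist(model_data$mintemp, breaks = 20, xlab = 'mintemp', main = '')
hist(model_data$ndvi_ma12, breaks = 20, xlab = 'ndvi_ma12', main = '')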
“The first thing to do in any data analysis task is to plot the data. Graphs enable many features of the data to be visualised, including patterns, unusual observations, changes over time, and relationships between variables” (Hyndman and Athanasopoulos; Forecasting: Principles and Practice (3rd ed))
A time series is a set of observations taken sequentially in time. When working with time series, the first step (after inspecting the data structure) is usually to plot observations over time, as either a line plot or scatterplot
ggplot(
data = model_data,
aes(x = time, y = count)
) +
geom_line(col = 'darkred') +
geom_point(col = 'darkred') +
labs(x = 'Time',
y = 'Counts')
## Warning: Removed 17 rows containing missing values or values outside the scale range
## (`geom_point()`).
This plot reveals several important features including: missing data, evidence of repeated seasonal cycles, lower bounding at zero and long-term undulations. Most of these features make it challenging to model series like this using conventional time series / forecasting software. But before we discuss modelling approaches, let’s build some more visuals to look for other key features
Lag plots are often used to inspect temporal autocorrelation at a
variety of lags for a univariate time series. While correlation measures
the extent of a linear relationship between two variables,
autocorrelations measure linear relationships between lagged values of a
time series. Ordinarily we’d want to plot our outcome of interest
(count
), but the default lag.plot
can’t handle
missing values (more on that below). But we can inspect correlations
among temperature values (using the mintemp
variable) and
lagged versions of mintemp
to see evidence of strong
positive autocorrelation at several lags
lag.plot(model_data$mintemp,
lags = 4,
pch = 16,
labels = FALSE,
do.lines = FALSE)
This function plots the lagged value of the series (i.e. the value of mintemp at time \(t-lag\)) against the observed value at time \(t\). To see what is happening under the hood, the code below builds a related plot by hand for a single lag, using diff() to compute the change in mintemp between successive time steps and plotting it against the current value
# Compute the first differences of mintemp (the change from the previous time step)
mintemp_lag1 <- diff(model_data$mintemp, lag = 1)
ggplot(
data.frame(lag1 = mintemp_lag1,
now = model_data$mintemp[2:NROW(model_data)]),
aes(x = lag1,
y = now)
) +
geom_point(col = 'darkred') +
labs(x = expression("mintemp"["t-1"]),
y = expression("mintemp"["t"]))
A Pearson's correlation test confirms that there is a positive relationship between the current mintemp value and the change in mintemp since the previous month
cor.test(mintemp_lag1,
model_data$mintemp[2:NROW(model_data)])
##
## Pearson's product-moment correlation
##
## data: mintemp_lag1 and model_data$mintemp[2:NROW(model_data)]
## t = 2.6121, df = 77, p-value = 0.01081
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.06851564 0.47636349
## sample estimates:
## cor
## 0.2853038
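Note that diff() returns first differences (the change between successive observations) rather than the lagged series itself. If you instead want the correlation between mintemp and its actual lag-1 values, a quick sketch using dplyr::lag() is:
# Correlate mintemp with its value one time step earlier;
# cor.test() drops the incomplete pair created by the lag
cor.test(model_data$mintemp,
         dplyr::lag(model_data$mintemp, n = 1))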
But unfortunately the lag plot will fail when the variable of
interest has missing values, which our outcome of interest
(count
) has plenty of. Rather than using a lag plot, we can
instead plot the autocorrelation function at a variety of lags. These
plots are often used for deciding which moving average order
to choose when building autoregressive models
acf(model_data$count,
na.action = na.pass,
bty = 'l',
lwd = 2,
ci.col = 'darkred',
main = 'Counts of PP trappings')
Several features are evident from this plot: First, we observe positive autocorrelation at small lags. When time series show a trend, autocorrelations for small lags will often be large and positive because observations nearby in time have similar values. So the ACF suggests there is evidence of a trend, as the autocorrelations decrease only slowly with increasing lag. Second, we see evidence of seasonality in captures. When data are seasonal, the autocorrelations will often be larger for the seasonal lags (at multiples of the seasonal period) than for other lags. We know this species shows evidence of torpor in the winter months, so it is not surprising to see a strong annual pattern in these autocorrelations.
The partial autocorrelation function shows something similar, but has controlled for the short-term lagged correlations. These plots are often used for selecting orders of autoregressive models. For example, if we choose an AR2 model then we hope our AR2 parameter will capture remaining autocorrelations at lag 2 when controlling for any autocorrelations at lag 1.
pacf(model_data$count,
na.action = na.pass,
bty = 'l',
lwd = 2,
ci.col = 'darkred',
main = 'Counts of PP trappings')
Both of these plots suggest there is temporal autocorrelation
present, as well as seasonality. We can view the ACF and some other
useful features of the data with one of mvgam
’s primary
functions, plot_mvgam_series
(see
?plot_mvgam_series()
for details):
plot_mvgam_series(data = model_data,
y = 'count')
Another standard visualisation is Seasonal-Trend decomposition (STL). It is
often helpful to split a time series into several components, each
representing an underlying pattern, to inspect features of the data that
may not be immediately apparent. STL decomposes a time series into
seasonal, trend and irregular components using loess
smoothing (see ?stl
for details). But just to get to a
point where we can use this with most ecological series, we have to deal
with missing observations. The usual practice is to impute these in some
way. We can use the na.approx
function from the
zoo
package for this, which will use linear interpolation
to impute missing values. This requires that we first convert the
count
observations to a time series object of class
ts
. We set the frequency to 12
as this is
close enough to what we have with our lunar monthly observation
intervals. We will not be using the imputed data for any modelling, so
please don’t worry too much if you aren’t up to speed on how linear
interpolation works. This is purely for visualization purposes
# First convert the counts to a time series object of class ts
count_ts <- ts(model_data$count,
frequency = 12)
# Impute NAs using linear interpolation
interp_series <- zoo::na.approx(count_ts)
Now we can perform the decomposition. This requires us to specify the seasonal window, which controls the span in lags over which the seasonal loess will operate. Smaller values allow for more rapid changes. A key advantage of this method is that the seasonal component is allowed to change over time. But a drawback is that this parameter must be chosen by the analyst, and changing its value can give different results. We will use a 5-year seasonal window in this first example
plot(
stl(interp_series,
s.window = 5)
)
The strong seasonality is obvious in this plot, as is the evidence of
an undulating trend. There is also a lot of variation in the
remainder
component, which demonstrates that the
seasonality and trend alone cannot capture all of the variation in the
observations over time. The bars on the right-hand side of these plots are drawn on a common scale, which helps us compare the relative magnitudes of each component. But what happens if we change the seasonal window and
force it to be stable through time?
plot(
stl(interp_series,
s.window = 'periodic')
)
Here the remainder
component is even more important, and
the trend has become less smooth. Try varying the s.window
argument to see how the conclusions about component contributions will
change. For more information on decomposition, see this chapter in Applied Time Series Analysis for Fisheries and
Environmental Sciences, by Holmes, Scheuerell and Ward.
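To follow the suggestion above about varying the seasonal window, the sketch below (purely for visual comparison; the candidate windows are arbitrary choices) loops over a few values of s.window:
# Compare STL decompositions across several seasonal windows
for(sw in list(5, 11, 'periodic')){
  plot(stl(interp_series, s.window = sw),
       main = paste('s.window =', sw))
}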
Exercises
1. Calculate a cross-correlation function between count and mintemp at lags of 1 - 15 months. This procedure is often used to help identify lags of the x-variable that might be useful predictors of the outcome. Note, you will need to use na.action = na.pass so that the missing values in count don't cause problems. See ?ccf for details about calculating cross-correlations using type = 'correlation'. Which lags of mintemp would you consider to be the most likely candidates for predicting our outcome count?
2. Plot an STL decomposition for the mintemp variable.
A template for the ccf calculation is below
# fill in the "?" with the correct variable / value
ccf(x = ?,
y = ?,
na.action = ?,
lag.max = ?,
# compute correlations at each lag
type = 'correlation',
# not necessary but makes for a prettier plot
bty = 'l',
lwd = 2,
ci.col = 'darkred',
main = expression(paste(italic(Cor),"(", mintemp[lag], ",", count, ")")))
Now onto some modelling! Our first task will be to fit a Generalized
Linear Model (GLM) that can capture the features of our
count
observations (integer data, lower bound at zero,
missing values) while also attempting to model temporal variation. We
will use a GLM with Poisson observations, a log link function and random
(hierarchical) intercepts for year
. This will allow us to
capture our prior belief that, although each year is unique, having been
sampled from the same population of effects, all years are connected and
thus might contain valuable information about one another. This will
capitalize on the partial pooling properties of hierarchical models:
Hierarchical (random) effects offer many advantages when
modelling data with grouping structures (i.e. multiple species,
locations, years etc…). But before we fit the model, we will need to
convert year
to a factor so that we can use a random effect
basis in mvgam
. See ?smooth.terms
and
?smooth.construct.re.smooth.spec
for details about the
re
basis construction that is used by both
mvgam
and mgcv
model_data %>%
# Convert 'year' to a factor variable
dplyr::mutate(year = factor(year)) -> model_data
levels(model_data$year)
We are now ready for our first mvgam
model. The syntax
will be familiar to users who have built models with mgcv
.
But for a refresher, see ?mvgam_formulae
and the examples
in ?gam.models
. Random effects can be specified using the
s()
wrapper with the re
basis. Note that we
can also suppress the primary intercept using the usual R
formula syntax - 1
. mvgam
has a number of
possible observation families that can be used, see
?mvgam_families
for more information. We will use
Stan
as the fitting engine, which deploys Hamiltonian Monte
Carlo (HMC) for full Bayesian inference. By default, 4 HMC chains will be run with a warmup of 500 iterations each; here we collect 225 posterior samples from each chain. Ordinarily we would take more posterior samples per chain, but 225 gives us an object that will be easier to download for
completing the exercises. Interested users should consult the Stan
user’s guide for more information
about the software and the enormous variety of models that can be
tackled with HMC.
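Before fitting, it can help to write the model down. Here is a sketch in standard notation (my summary of the model below; the priors shown are the mvgam defaults that appear in the generated Stan code later in this tutorial):
$$\begin{align*}
\text{count}_t & \sim \text{Poisson}(\lambda_t) \\
\log(\lambda_t) & = \beta_{year[t]} \\
\beta_{year} & \sim \text{Normal}(\mu_{year}, \sigma_{year}^2) \\
\mu_{year} & \sim \text{Normal}(0, 1) \\
\sigma_{year} & \sim \text{Half-StudentT}(3, 0, 2.5)
\end{align*}$$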
model1 <- mvgam(
count ~ s(year, bs = 're') - 1,
family = poisson(),
data = model_data,
samples = 225
)
With any model fitted in mvgam
, the Stan
code can be viewed using code() or stancode()
. This is helpful if you
aren’t sure what priors are used by default:
stancode(model1)
## // Stan model code generated by package mvgam
## data {
## int<lower=0> total_obs; // total number of observations
## int<lower=0> n; // number of timepoints per series
## int<lower=0> n_series; // number of series
## int<lower=0> num_basis; // total number of basis coefficients
## matrix[total_obs, num_basis] X; // mgcv GAM design matrix
## array[n, n_series] int<lower=0> ytimes; // time-ordered matrix (which col in X belongs to each [time, series] observation?)
## int<lower=0> n_nonmissing; // number of nonmissing observations
## array[n_nonmissing] int<lower=0> flat_ys; // flattened nonmissing observations
## matrix[n_nonmissing, num_basis] flat_xs; // X values for nonmissing observations
## array[n_nonmissing] int<lower=0> obs_ind; // indices of nonmissing observations
## }
## parameters {
## // raw basis coefficients
## vector[num_basis] b_raw;
##
## // random effect variances
## vector<lower=0>[1] sigma_raw;
##
## // random effect means
## vector[1] mu_raw;
## }
## transformed parameters {
## // basis coefficients
## vector[num_basis] b;
## b[1 : 7] = mu_raw[1] + b_raw[1 : 7] * sigma_raw[1];
## }
## model {
## // prior for random effect population variances
## sigma_raw ~ student_t(3, 0, 2.5);
##
## // prior for random effect population means
## mu_raw ~ std_normal();
##
## // prior (non-centred) for s(year)...
## b_raw[1 : 7] ~ std_normal();
## {
## // likelihood functions
## flat_ys ~ poisson_log_glm(flat_xs, 0.0, b);
## }
## }
## generated quantities {
## vector[total_obs] eta;
## matrix[n, n_series] mus;
## array[n, n_series] int ypred;
##
## // posterior predictions
## eta = X * b;
## for (s in 1 : n_series) {
## mus[1 : n, s] = eta[ytimes[1 : n, s]];
## ypred[1 : n, s] = poisson_log_rng(mus[1 : n, s]);
## }
## }
Once the model has finished, the first step is to inspect the
summary()
to ensure no major diagnostic warnings have been
produced and to summarise posterior effects
summary(model1)
## GAM formula:
## count ~ s(year, bs = "re") - 1
##
## Family:
## poisson
##
## Link function:
## log
##
## Trend model:
## None
##
## N series:
## 1
##
## N timepoints:
## 80
##
## Status:
## Fitted using Stan
## 4 chains, each with iter = 725; warmup = 500; thin = 1
## Total post-warmup draws = 900
##
##
## GAM coefficient (beta) estimates:
## 2.5% 50% 97.5% Rhat n_eff
## s(year).1 0.900 1.30 1.6 1 1279
## s(year).2 0.860 1.20 1.5 1 1280
## s(year).3 -0.045 0.56 1.0 1 926
## s(year).4 2.100 2.30 2.5 1 1247
## s(year).5 1.100 1.50 1.8 1 978
## s(year).6 1.600 1.80 2.1 1 1301
## s(year).7 1.900 2.10 2.3 1 1596
##
## GAM group-level estimates:
## 2.5% 50% 97.5% Rhat n_eff
## mean(s(year)) 0.67 1.50 1.9 1.01 155
## sd(s(year)) 0.37 0.69 1.6 1.03 123
##
## Approximate significance of GAM smooths:
## edf Ref.df Chi.sq p-value
## s(year) 5.66 7 252 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Stan MCMC diagnostics:
## n_eff / iter looks reasonable for all parameters
## Rhat looks reasonable for all parameters
## 0 of 900 iterations ended with a divergence (0%)
## 0 of 900 iterations saturated the maximum tree depth of 10 (0%)
## E-FMI indicated no pathological behavior
##
## Samples were drawn using NUTS(diag_e) at Thu Mar 06 8:29:03 AM 2025.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split MCMC chains
## (at convergence, Rhat = 1)
##
## Use how_to_cite(model1) to get started describing this model
Another useful function to call after fitting a model with mvgam() is how_to_cite(), which will generate a fully referenced text description that you can modify for the materials and methods sections of scientific publications
description <- how_to_cite(model1)
description
## Methods text skeleton
## We used the R package mvgam (version 1.1.5001; Clark & Wells, 2023) to
## construct, fit and interrogate the model. mvgam fits Bayesian
## State-Space models that can include flexible predictor effects in both
## the process and observation components by incorporating functionalities
## from the brms (Burkner 2017), mgcv (Wood 2017) and splines2 (Wang & Yan,
## 2023) packages. The mvgam-constructed model and observed data were
## passed to the probabilistic programming environment Stan (version
## 2.36.0; Carpenter et al. 2017, Stan Development Team 2025), specifically
## through the cmdstanr interface (Gabry & Cesnovar, 2021). We ran 4
## Hamiltonian Monte Carlo chains for 500 warmup iterations and 225
## sampling iterations for joint posterior estimation. Rank normalized
## split Rhat (Vehtari et al. 2021) and effective sample sizes were used to
## monitor convergence.
##
## Primary references
## Clark, NJ and Wells K (2023). Dynamic Generalized Additive Models
## (DGAMs) for forecasting discrete ecological time series. Methods in
## Ecology and Evolution, 14, 771-784. doi.org/10.1111/2041-210X.13974
## Burkner, PC (2017). brms: An R Package for Bayesian Multilevel Models
## Using Stan. Journal of Statistical Software, 80(1), 1-28.
## doi:10.18637/jss.v080.i01
## Wood, SN (2017). Generalized Additive Models: An Introduction with R
## (2nd edition). Chapman and Hall/CRC.
## Wang W and Yan J (2021). Shape-Restricted Regression Splines with R
## Package splines2. Journal of Data Science, 19(3), 498-517.
## doi:10.6339/21-JDS1020 https://doi.org/10.6339/21-JDS1020.
## Carpenter, B, Gelman, A, Hoffman, MD, Lee, D, Goodrich, B, Betancourt,
## M, Brubaker, M, Guo, J, Li, P and Riddell, A (2017). Stan: A
## probabilistic programming language. Journal of Statistical Software 76.
## Gabry J, Cesnovar R, Johnson A, and Bronder S (2025). cmdstanr: R
## Interface to 'CmdStan'. https://mc-stan.org/cmdstanr/,
## https://discourse.mc-stan.org.
## Vehtari A, Gelman A, Simpson D, Carpenter B, and Burkner P (2021).
## Rank-normalization, folding, and localization: An improved Rhat for
## assessing convergence of MCMC (with discussion). Bayesian Analysis 16(2)
## 667-718. https://doi.org/10.1214/20-BA1221.
##
## Other useful references
## Arel-Bundock, V, Greifer, N, and Heiss, A (2024). How to interpret
## statistical models using marginaleffects for R and Python. Journal of
## Statistical Software, 111(9), 1-32.
## https://doi.org/10.18637/jss.v111.i09
## Gabry J, Simpson D, Vehtari A, Betancourt M, and Gelman A (2019).
## Visualization in Bayesian workflow. Journal of the Royal Statatistical
## Society A, 182, 389-402. doi:10.1111/rssa.12378.
## Vehtari A, Gelman A, and Gabry J (2017). Practical Bayesian model
## evaluation using leave-one-out cross-validation and WAIC. Statistics and
## Computing, 27, 1413-1432. doi:10.1007/s11222-016-9696-4.
## Burkner, PC, Gabry, J, and Vehtari, A. (2020). Approximate
## leave-future-out cross-validation for Bayesian time series models.
## Journal of Statistical Computation and Simulation, 90(14), 2499-2523.
## https://doi.org/10.1080/00949655.2020.1783262
We can get some sense of the variation in yearly intercepts from the
summary, but it is easier to understand them by plotting. Plot posterior
distributions of the temporal random effects using
plot.mvgam()
with type = 're'
. See
?plot.mvgam
for more details
plot(model1,
type = 're')
A better way, however, to quickly plot all key effects in an
mvgam
model is to use conditional_effects()
,
which can give quick overviews of estimated effects on either the link
scale or the response scale (see ?conditional_effects.mvgam
for details)
conditional_effects(model1)
How do these yearly intercepts translate into time-varying
predictions? To understand this, we can plot posterior hindcasts from
this model. The hindcast()
function can be used to return
an object of class mvgam_forecast
. In the
hindcasts
slot, a matrix of predictions will be returned
for each series in the data (only one series in our example):
hcs <- hindcast(model1)
str(hcs)
## List of 15
## $ call :Class 'formula' language count ~ s(year, bs = "re") - 1
## .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
## $ trend_call : NULL
## $ family : chr "poisson"
## $ trend_model : chr "None"
## $ drift : logi FALSE
## $ use_lv : logi FALSE
## $ fit_engine : chr "stan"
## $ type : chr "response"
## $ series_names : chr "PP"
## $ train_observations:List of 1
## ..$ PP: int [1:80] 0 NA 0 1 7 7 8 8 4 NA ...
## $ train_times : int [1:80] 1 2 3 4 5 6 7 8 9 10 ...
## $ test_observations : NULL
## $ test_times : NULL
## $ hindcasts :List of 1
## ..$ PP: num [1:900, 1:80] 7 3 4 3 3 3 5 1 2 5 ...
## .. ..- attr(*, "dimnames")=List of 2
## .. .. ..$ : NULL
## .. .. ..$ : chr [1:80] "ypred[1,1]" "ypred[2,1]" "ypred[3,1]" "ypred[4,1]" ...
## $ forecasts : NULL
## - attr(*, "class")= chr "mvgam_forecast"
plot(hcs)
## No non-missing values in test_observations; cannot calculate forecast score
## Warning: Removed 17 rows containing missing values or values outside the scale range
## (`geom_point()`).
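Because the hindcasts slot stores a matrix of posterior draws (rows) by timepoints (columns) for each series, it can be summarised directly. A minimal sketch:
# Posterior median hindcast for each of the 80 timepoints of series PP
median_hcs <- apply(hcs$hindcasts$PP, 2, median)
head(median_hcs)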
In any regression, a key question is whether the residuals show any
patterns that might indicate un-modelled variation. For GLMs, we can
use a modified residual called the Dunn-Smyth,
or randomized quantile, residual. Inspect Dunn-Smyth residuals from
the model using plot()
with
type = 'residuals'
plot(model1,
type = 'residuals')
Exercise
mvgam calculates Dunn-Smyth residuals as a matrix of dimension n_draws x n_timepoints. Calculate posterior median residuals per time point and plot these (hint, use ?pp_check for guidance). Do these residuals look worrisome to you? A template to get you started is below
# fill in the "?" with the correct variable / value
# use pp_check() to plot residuals against a covariate from the model
pp_check(model1,
type = 'resid_ribbon',
x = ?,
ndraws = 200)
These random effects do not have a sense of “time”. Each yearly
intercept is not restricted to be similar to the previous yearly
intercept. This drawback is evident when we predict for a new year. To
do this, we repeat the exercise above but this time split the data into
training and testing sets before re-running the model. We can then
supply the test set as newdata
. For splitting, we will make
use of the filter()
function from dplyr
model_data %>%
dplyr::filter(time <= 70) -> data_train
model_data %>%
dplyr::filter(time > 70) -> data_test
model1b <- mvgam(
count ~ s(year, bs = 're') - 1,
family = poisson(),
data = data_train,
newdata = data_test
)
Using the forecast()
function and its associated
plot()
method gives insight into how the model’s
hierarchical formulation provides all structure needed to sample values
for un-modelled years
fcs <- forecast(model1b)
str(fcs)
## List of 16
## $ call :Class 'formula' language count ~ s(year, bs = "re") - 1
## .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
## $ trend_call : NULL
## $ family : chr "poisson"
## $ family_pars : NULL
## $ trend_model : chr "None"
## $ drift : logi FALSE
## $ use_lv : logi FALSE
## $ fit_engine : chr "stan"
## $ type : chr "response"
## $ series_names : Factor w/ 1 level "PP": 1
## $ train_observations:List of 1
## ..$ PP: int [1:70] 0 NA 0 1 7 7 8 8 4 NA ...
## $ train_times : int [1:70] 1 2 3 4 5 6 7 8 9 10 ...
## $ test_observations :List of 1
## ..$ PP: int [1:10] NA 4 11 8 5 2 5 8 14 14
## $ test_times : int [1:10] 71 72 73 74 75 76 77 78 79 80
## $ hindcasts :List of 1
## ..$ PP: num [1:2000, 1:70] 4 3 3 3 1 3 2 3 2 3 ...
## .. ..- attr(*, "dimnames")=List of 2
## .. .. ..$ : NULL
## .. .. ..$ : chr [1:70] "ypred[1,1]" "ypred[2,1]" "ypred[3,1]" "ypred[4,1]" ...
## $ forecasts :List of 1
## ..$ PP: num [1:2000, 1:10] 5 7 2 7 7 5 4 7 4 3 ...
## .. ..- attr(*, "dimnames")=List of 2
## .. .. ..$ : NULL
## .. .. ..$ : chr [1:10] "ypred[71,1]" "ypred[72,1]" "ypred[73,1]" "ypred[74,1]" ...
## - attr(*, "class")= chr "mvgam_forecast"
plot(fcs)
## Out of sample DRPS:
## 29.7823355
## Warning: Removed 17 rows containing missing values or values outside the scale range
## (`geom_point()`).
Here you can see that the predictions do not capture the temporal variation in the test set.
We nearly always wish to include predictor variables that may explain
some variation in our observations. Here, we will update the model from
above by including a parametric (fixed) effect of mintemp
.
This code also shows how you can start modifying priors, in this case by
using a standard normal prior for the mintemp
effect:
model2 <- mvgam(
count ~ s(year, bs = 're') +
mintemp - 1,
family = poisson(),
  # Standard normal prior on the mintemp coefficient
priors = prior(normal(0, 1), class = b),
data = data_train,
newdata = data_test
)
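In equation form, this updated model can be sketched as follows (my notation, not output from mvgam):
$$\begin{align*}
\text{count}_t & \sim \text{Poisson}(\lambda_t) \\
\log(\lambda_t) & = \beta_{year[t]} + \beta_{mintemp} \times \text{mintemp}_t
\end{align*}$$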
This adds \(\beta_{mintemp}\), a linear effect (on the log scale) of the mintemp value at each timepoint \(t\). Inspect the summary
summary(model2)
## GAM formula:
## count ~ mintemp + s(year, bs = "re") - 1
##
## Family:
## poisson
##
## Link function:
## log
##
## Trend model:
## None
##
## N series:
## 1
##
## N timepoints:
## 80
##
## Status:
## Fitted using Stan
## 4 chains, each with iter = 1000; warmup = 500; thin = 1
## Total post-warmup draws = 2000
##
##
## GAM coefficient (beta) estimates:
## 2.5% 50% 97.5% Rhat n_eff
## mintemp 0.320 0.46 0.6 1.00 1331
## s(year).1 0.770 1.10 1.4 1.00 2541
## s(year).2 0.910 1.20 1.5 1.00 2286
## s(year).3 -0.046 0.53 1.0 1.00 1761
## s(year).4 1.900 2.10 2.3 1.00 2073
## s(year).5 1.100 1.40 1.7 1.00 2208
## s(year).6 1.800 2.10 2.3 1.00 2503
## s(year).7 -0.760 1.30 3.0 1.01 492
##
## GAM group-level estimates:
## 2.5% 50% 97.5% Rhat n_eff
## mean(s(year)) 0.53 1.30 1.9 1.00 493
## sd(s(year)) 0.36 0.72 1.9 1.02 159
##
## Approximate significance of GAM smooths:
## edf Ref.df Chi.sq p-value
## s(year) 5.01 7 186 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Stan MCMC diagnostics:
## n_eff / iter looks reasonable for all parameters
## Rhat looks reasonable for all parameters
## 2 of 2000 iterations ended with a divergence (0.1%)
## *Try running with larger adapt_delta to remove the divergences
## 0 of 2000 iterations saturated the maximum tree depth of 10 (0%)
## E-FMI indicated no pathological behavior
##
## Samples were drawn using NUTS(diag_e) at Thu Mar 06 8:50:35 AM 2025.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split MCMC chains
## (at convergence, Rhat = 1)
##
## Use how_to_cite(model2) to get started describing this model
Look again at the estimated effects, but this time on the link scale
conditional_effects(model2,
type = 'link')
This plot indicates a positive effect of mintemp
on
log(counts)
. But given that our model used a nonlinear link
function, it can still be difficult to fully understand what
relationship our model is estimating between a predictor and the
response. We can use
conditional_effects(model2, type = 'response')
to again
see these on the response scale, or we can use
plot_predictions()
for more customisable plots:
plot_predictions(model2,
condition = "mintemp",
# include the observed count values
# as points, and show rugs for the observed
                 # mintemp and count values on the axes
points = 0.5, rug = TRUE) +
labs(y = 'Predicted count', x = 'Mintemp')
Now it is easier to get a sense of the positive effect of
mintemp
. But has this predictor made an impact on
predictions?
plot(
forecast(model2,
newdata = data_test)
)
## Out of sample DRPS:
## 36.38919975
## Warning: Removed 17 rows containing missing values or values outside the scale range
## (`geom_point()`).
These predictions certainly look different. Are the residual diagnostics improved?
plot(model2,
type = 'residuals')
Exercises
1. Inspect the relationship between ndvi_ma12 and our outcome count.
2. Update the model to include ndvi_ma12 as a second predictor. Be sure to first check whether there is a strong pairwise correlation between mintemp and ndvi_ma12 (hint, use ?cor.test for guidance).
3. Does your model estimate a positive effect of ndvi_ma12 on log(counts)? Or does inclusion of ndvi_ma12 lead to different inference on the effect of mintemp compared to when ndvi_ma12 was not in the model?
We will now incorporate a spline of time in place of the
in place of the
temporal yearly random effects. Nonlinear splines are commonly viewed as
variations of random effects in which coefficients that control the
shape of the spline are drawn from a joint, penalized distribution. This
strategy is often used in time series analysis to capture smooth
temporal variation. The lectures (particularly lecture 2) have gone into detail about smoothing
splines and basis functions, but it is worth quickly re-iterating some
of this detail here. When we construct smoothing splines, the workhorse
package mgcv
will calculate a set of basis functions that
will collectively control the shape and complexity of the resulting
spline. It is often helpful to visualize these basis functions to get a
better sense of how splines work. We’ll create a set of 6 basis
functions to represent possible variation in the effect of
time
on our outcome. This can easily be done using
functions in the gratia
package:
basis_funs <- basis(
s(time, bs = 'tp', k = 6),
data = model_data
)
Once the functions are created, the next step is to plot them
draw(basis_funs) +
labs(y = 'Basis function value', x = 'Time')
Upping the potential complexity of the smooth function, by increasing
k
(to 10 in this example), leads to a larger diversity of
basis functions:
draw(
basis(s(time, bs = 'tp', k = 10),
data = model_data)
) +
labs(y = 'Basis function value', x = 'Time')
The code below shows one way to construct and plot these basis functions yourself, using mgcv and base R routines
# Set up the smooth object that is needed to predict values of
# basis functions
temp_gam <- gam(count ~ s(time, bs = 'tp', k = 6),
data = model_data,
family = poisson(), fit = FALSE)
# We can understand how this penalty matrix is used to
# translate values of `time` into basis functions by
# generating a sequence of `time` values for sampling
# prior function realisations
cov_seq <- seq(min(model_data$time),
max(model_data$time),
length.out = 500)
summary(cov_seq)
# Generate the linear predictor matrix for this
# sequence of `time` values
Xp <- mgcv::PredictMat(temp_gam$smooth[[1]],
data.frame(time = cov_seq))
str(Xp)
# This linear predictor contains 5 columns, one for each
# function; the sequences of values in the columns represent
# the actual basis functions over the range of simulated
# `time` values
par(mar = c(4,4,2,1.5))
plot(1, type = 'n',
ylim = range(Xp),
xlim = range(cov_seq),
bty = 'l',
ylab = 'Basis function value',
xlab = 'Time')
cols <- viridis::viridis(n = NCOL(Xp), alpha = 0.8)
for(i in 1:NCOL(Xp)){
lines(x = cov_seq, y = Xp[,i], lwd = 3, col = 'white')
lines(x = cov_seq, y = Xp[,i], lwd = 2.5, col = cols[i])
}
box(bty = 'l', lwd = 2)
In addition to constructing basis functions, mgcv
also
creates a penalty matrix \(S\) that
works to constrain the spline’s wiggliness. When fitting a GAM, we must
estimate smoothing parameters (\(\lambda\)) to penalize these matrices,
resulting in constrained basis coefficients and smoother functions that
are less likely to overfit the data. This is the key to fitting GAMs in
a Bayesian framework, as we can jointly estimate the \(\lambda\)’s using informative priors to
prevent overfitting and expand the complexity of models we can tackle.
To see this in practice, we can now fit a model that replaces the yearly
random effects with a smooth function of time
. We will need
a reasonably complex function (large k
) to try and
accommodate the temporal variation in our observations. Following some
useful advice by Gavin Simpson, we will use a
b-spline basis for the temporal smooth. Because we no longer have
intercepts for each year, we also retain the primary intercept term in
this model (there is no -1
in the formula now):
model3 <- mvgam(
count ~ s(time, bs = 'bs', k = 15) +
ndvi_ma12 +
mintemp,
family = poisson(),
# Standard normal priors on all fixed effect coefficients
priors = prior(normal(0, 1), class = b),
data = data_train,
newdata = data_test
)
The spline of time is built from basis functions and a penalty matrix constructed by mgcv. The weights (\(\beta_{smooth}\)) are drawn from a penalized multivariate normal distribution in which the precision matrix (\(\Omega\)) is multiplied by a smoothing penalty (\(\lambda\)). If \(\lambda\) becomes large, this acts to squeeze the covariances among the weights (\(\beta_{smooth}\)), leading to a less wiggly spline. Note that sometimes there are multiple smoothing penalties that contribute to the covariance matrix, but I am only showing one here for simplicity.
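In symbols, this penalized prior can be sketched as (one penalty shown for simplicity; my notation):
$$\beta_{smooth} \sim \text{MVN}\big(0, (\Omega \lambda)^{-1}\big)$$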
View the summary
summary(model3)
## GAM formula:
## count ~ s(time, bs = "bs", k = 15) + ndvi_ma12 + mintemp
##
## Family:
## poisson
##
## Link function:
## log
##
## Trend model:
## None
##
## N series:
## 1
##
## N timepoints:
## 80
##
## Status:
## Fitted using Stan
## 4 chains, each with iter = 1000; warmup = 500; thin = 1
## Total post-warmup draws = 2000
##
##
## GAM coefficient (beta) estimates:
## 2.5% 50% 97.5% Rhat n_eff
## (Intercept) 0.87 1.100 1.400 1.00 894
## ndvi_ma12 -0.27 0.950 2.200 1.00 1103
## mintemp 0.20 0.480 0.730 1.00 904
## s(time).1 -6.70 -2.900 0.030 1.00 680
## s(time).2 0.51 2.500 5.000 1.01 449
## s(time).3 -8.70 -4.900 -2.200 1.00 722
## s(time).4 -0.61 1.400 4.000 1.01 528
## s(time).5 -2.40 -0.550 1.500 1.01 566
## s(time).6 -5.20 -2.800 -0.140 1.00 640
## s(time).7 -1.50 0.300 2.500 1.01 520
## s(time).8 -0.39 1.600 4.000 1.01 504
## s(time).9 -1.80 0.220 2.700 1.01 542
## s(time).10 -3.50 -0.940 1.400 1.00 663
## s(time).11 -2.60 -0.076 2.900 1.00 550
## s(time).12 -4.50 -2.200 -0.078 1.00 713
## s(time).13 1.70 4.100 7.100 1.01 461
## s(time).14 -11.00 -3.700 2.100 1.01 429
##
## Approximate significance of GAM smooths:
## edf Ref.df Chi.sq p-value
## s(time) 11.4 14 68.5 0.00027 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Stan MCMC diagnostics:
## n_eff / iter looks reasonable for all parameters
## Rhat looks reasonable for all parameters
## 0 of 2000 iterations ended with a divergence (0%)
## 0 of 2000 iterations saturated the maximum tree depth of 10 (0%)
## E-FMI indicated no pathological behavior
##
## Samples were drawn using NUTS(diag_e) at Thu Mar 06 8:41:28 AM 2025.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split MCMC chains
## (at convergence, Rhat = 1)
##
## Use how_to_cite(model3) to get started describing this model
The summary above now contains posterior estimates for the smoothing
parameters as well as basis coefficients for the effect of
time
. One quick way to get a sense of how “nonlinear” a
function might be is to compare the reference degrees of
freedom (Ref.df
) to the estimated effective
degrees of freedom edf
in the summary output. Larger
values of edf
indicate more wiggly terms, while the number
of basis functions supplied in the k
argument controls the
Ref.df
, i.e. the maximum possible edf
. When
the edf
is well below the Ref.df
, this
suggests the function is being shrunk down towards a smoother shape. But
if edf
is large (approaching Ref.df
), that
suggests the model needs all of those basis functions to capture
nonlinearities in the data. These values, alongside the estimated
smoothing parameters, are also useful for generating an approximate
significance test to ask if the function is statistically ‘different’
from a linear or flat function. Low p-values for this test also
suggest the function is capturing a lot of variation in the data. You
can look at the Details section in
?summary.gam
for more information about these tests.
Clearly we can see that our function of time
is highly
nonlinear here, as it has a relatively high edf
and a low
p-value.
Plot the effects as before, using
conditional_effects()
conditional_effects(model3)
We can also view some realizations of the underlying
time
function (here, each line is a different potential
curve drawn from the posterior of all possible curves):
plot(model3,
type = 'smooths',
realisations = TRUE,
n_realisations = 30)
A useful question when modelling using GAMs is to identify where the
function is changing most rapidly. To address this, we can plot
estimated 1st derivative (i.e. the slope) of the spline over a sequence
of time
values. As usual, we can use
marginaleffects
routines for this:
plot_slopes(model3,
variables = 'time',
condition = 'time',
type = 'link') +
geom_hline(yintercept = 0, linetype = 'dashed')
Here, values >0
indicate the function was
increasing at that time point, while values <0
indicate
the function was declining.
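If you would like the underlying first-derivative estimates rather than just the plot, something along these lines may work (a sketch only; it assumes that marginaleffects' slopes() and datagrid() behave for mvgam objects just as they do for plot_slopes() above, and time_slopes is simply a name chosen here):
# Posterior slope estimates of the time smooth over the training period
time_slopes <- slopes(model3,
                      variables = 'time',
                      newdata = datagrid(time = 1:70),
                      type = 'link')

# Arrange by absolute slope to see when the function changed fastest
time_slopes %>%
  dplyr::arrange(dplyr::desc(abs(estimate))) %>%
  dplyr::select(time, estimate, conf.low, conf.high) %>%
  head()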
Comparing models with loo()
Working with brms
or mvgam
objects can be
daunting because we don’t have anova()
or
AIC()
methods that ecologists are used to for model
comparison. But we have approximate leave-one-out cross-validation,
which works in a similar way. Using loo_compare()
on
multiple models will ask which model would be expected to do a better
job of predicting left-out observations within the training
period. A larger elpd value is better
loo_compare(model2,
model3)
## elpd_diff se_diff
## model3 0.0 0.0
## model2 -45.7 14.0
Clearly the nonlinear smooth of time
(in
model3
) gives a better in-sample fit than the random
effects of year
(model2
).
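To see the expected log predictive density (ELPD) estimates that feed into this comparison, the loo() method can also be called on each model individually (a minimal sketch; see ?loo.mvgam for details):
# Approximate leave-one-out cross-validation for a single model
loo(model3)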
Exercises
1. Change the linear ndvi_ma12 effect to a smooth function of ndvi_ma12 using the s() argument (hint, see ?s for guidance on how to define smooth terms in GAM formulae using mgcv syntax). Inspect the resulting function estimates for both ndvi_ma12 and time, and consider using some of the strategies outlined in this blog post on interpreting nonlinear effects in GAMs with mgcv and marginaleffects to understand the smooths in more depth.
2. Increase k for the smooth of time to a larger number, something like 30. Do you get any sense that there are problems with fitting this model? Do any of your conclusions change?
sessionInfo()
## R version 4.4.2 (2024-10-31 ucrt)
## Platform: x86_64-w64-mingw32/x64
## Running under: Windows 10 x64 (build 19045)
##
## Matrix products: default
##
##
## locale:
## [1] LC_COLLATE=English_Australia.utf8 LC_CTYPE=English_Australia.utf8
## [3] LC_MONETARY=English_Australia.utf8 LC_NUMERIC=C
## [5] LC_TIME=English_Australia.utf8
##
## time zone: Australia/Brisbane
## tzcode source: internal
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] rstan_2.32.6 StanHeaders_2.32.10 viridis_0.6.5
## [4] viridisLite_0.4.2 zoo_1.8-13 marginaleffects_0.25.0
## [7] ggplot2_3.5.1 gratia_0.10.0 mvgam_1.1.5001
## [10] dplyr_1.1.4 downloadthis_0.4.1
##
## loaded via a namespace (and not attached):
## [1] tidyselect_1.2.1 farver_2.1.2 loo_2.8.0.9000
## [4] fastmap_1.2.0 tensorA_0.36.2.1 digest_0.6.37
## [7] timechange_0.3.0 mime_0.12 lifecycle_1.0.4
## [10] processx_3.8.6 magrittr_2.0.3 posterior_1.6.0.9000
## [13] compiler_4.4.2 rlang_1.1.5 sass_0.4.9
## [16] tools_4.4.2 yaml_2.3.10 collapse_2.0.19
## [19] data.table_1.17.0 knitr_1.49 labeling_0.4.3
## [22] bridgesampling_1.1-2 curl_6.2.1 pkgbuild_1.4.6
## [25] cmdstanr_0.8.0 abind_1.4-8 klippy_0.0.0.9500
## [28] withr_3.0.2 purrr_1.0.4 grid_4.4.2
## [31] stats4_4.4.2 colorspace_2.1-1 inline_0.3.21
## [34] scales_1.3.0 insight_1.0.2 cli_3.6.4
## [37] mvtnorm_1.3-3 rmarkdown_2.29 generics_0.1.3
## [40] RcppParallel_5.1.10 rstudioapi_0.17.1 cachem_1.1.0
## [43] stringr_1.5.1 splines_4.4.2 bayesplot_1.11.1.9000
## [46] assertthat_0.2.1 parallel_4.4.2 matrixStats_1.5.0
## [49] brms_2.22.9 vctrs_0.6.5 V8_6.0.1
## [52] Matrix_1.7-1 jsonlite_1.9.0 patchwork_1.3.0
## [55] b64_0.1.3 jquerylib_0.1.4 tidyr_1.3.1
## [58] glue_1.8.0 ps_1.9.0 codetools_0.2-20
## [61] ggokabeito_0.1.0 mvnfast_0.2.8 distributional_0.5.0
## [64] lubridate_1.9.4 stringi_1.8.4 gtable_0.3.6
## [67] QuickJSR_1.5.2 munsell_0.5.1 tibble_3.2.1
## [70] pillar_1.10.1 htmltools_0.5.8.1 Brobdingnag_1.2-9
## [73] R6_2.6.1 evaluate_1.0.3 lattice_0.22-6
## [76] backports_1.5.0 bslib_0.9.0 rstantools_2.4.0.9000
## [79] bsplus_0.1.4 Rcpp_1.0.14 zip_2.3.2
## [82] coda_0.19-4.1 gridExtra_2.3 nlme_3.1-166
## [85] checkmate_2.3.2 mgcv_1.9-1 xfun_0.51
## [88] fs_1.6.5 pkgconfig_2.0.3