The data and modelling objects created in this notebook can be downloaded directly to save computational time.
Users who wish to complete the exercises can download a small
template R
script. Assuming you have already downloaded the
data objects above, this script will load all data objects so that the
steps used to create them are not necessary to tackle the exercises.
This tutorial relates to content covered in Lecture 2 and Lecture 3, and relies on the following packages for manipulating data, shaping time series, fitting dynamic regression models and plotting:
library(dplyr)
library(mvgam)
library(brms)
library(marginaleffects)
library(tidybayes)
library(ggplot2); theme_set(theme_classic())
We will continue with time series of rodent captures from the Portal
Project, a
long-term monitoring study based near the town of Portal, Arizona,
with particular focus on the time series of captures for the Desert
Pocket Mouse Chaetodipus penicillatus. Data manipulations will
proceed in a similar way as in Tutorial 1, except that we now use a lagged version
of mintemp
(at a lag of four lunar months; see
?dplyr::lag
for details) to give you a sense of how you can
create and use lagged predictors in time series models
data("portal_data")
portal_data %>%
# Filter the data to only contain captures of the 'PP'
dplyr::filter(series == 'PP') %>%
droplevels() %>%
# Create a 'count' variable for the outcome
dplyr::mutate(count = captures) %>%
# Create a lagged version of mintemp to use as a predictor in place of the
# contemporaneous mintemp we have been using previously. Note, the data must be
# in the correct order for this to work properly
dplyr::arrange(time) %>%
dplyr::mutate(mintemp_lag4 = dplyr::lag(mintemp, 4)) %>%
# Select the variables of interest to keep in the model_data
dplyr::select(series, time, count,
ndvi_ma12, mintemp_lag4) -> model_data
The data now contain five variables:
series
, a factor indexing which time series each
observation belongs to
time
, the indicator of which time step each observation
belongs to
count
, the response variable representing the number of
captures of the species PP
in each sampling
observation
ndvi_ma12
, a 12-month moving average of the monthly
Normalized Difference Vegetation Index at each time step
mintemp_lag4
, the lagged monthly average minimum
temperature at each time step
Using lagged versions of predictors is a standard practice in time
series analysis / forecasting. But unfortunately the software we are
using cannot easily handle missing values in the predictors. Because
we’ve calculated lags of mintemp
, the first 4
rows of our data now have missing values for this predictor:
head(model_data, 6)
## series time count ndvi_ma12 mintemp_lag4
## 1 PP 1 0 -0.17214413 NA
## 2 PP 2 NA -0.23736348 NA
## 3 PP 3 0 -0.21212064 NA
## 4 PP 4 1 -0.16043812 NA
## 5 PP 5 7 -0.08267729 -0.7963381
## 6 PP 6 7 -0.03692877 -1.3347160
Visualize the data as a heatmap to get a sense of where these are
distributed (NA
s are shown as red bars in the below
plot)
image(
is.na(
t(model_data %>%
dplyr::arrange(dplyr::desc(time)))
),
axes = F,
col = c('grey80', 'darkred'))
axis(3,
at = seq(0,
1,
len = NCOL(model_data)),
labels = colnames(model_data))
For now we will simply omit these missing values and restart the time
index at 1
:
model_data %>%
dplyr::filter(!is.na(mintemp_lag4)) %>%
dplyr::mutate(time = time - min(time) + 1) -> model_data_trimmed
head(model_data_trimmed)
As in the previous tutorial, we also split the data into training and testing subsets for inspecting forecast behaviours. We will do the same here, but use a shorter validation period in this example
model_data_trimmed %>%
dplyr::filter(time <= 68) -> data_train
model_data_trimmed %>%
dplyr::filter(time > 68) -> data_test
Our last tutorial ended with a model that included a smooth function
of time
. The lectures have illustrated why this might be a
bad idea if one of our goals is to forecast. We will produce forecasts
from this model to show why this can be problematic. Start by fitting a
similar model to the one we ended with in Tutorial 1, including the data_test
object as newdata
to ensure the model automatically
produces forecasts.
model0 <- mvgam(
count ~ s(time, bs = 'bs', k = 15) +
mintemp_lag4 +
ndvi_ma12,
family = poisson(),
data = data_train,
newdata = data_test
)
Inspect forecasts from this model, which are not great
plot(forecast(model0))
What happens if we change k
slightly by using 8 basis
functions rather than 15?
model0b <- mvgam(
count ~ s(time, bs = 'bs', k = 8) +
mintemp_lag4 +
ndvi_ma12,
family = poisson(),
data = data_train,
newdata = data_test
)
Forecasts from this model are completely different!
plot(forecast(model0b))
## Out of sample DRPS:
## 75.518577
Why is this happening? The forecasts are driven almost entirely by variation in the temporal spline, which is extrapolating linearly forever beyond the training data. Any slight wiggles near the boundary will result in wildly different forecasts. To visualize this, we can plot the extrapolated temporal functions for the two models. Here for the first model, with 15 basis functions:
plot_predictions(
model0,
by = 'time',
newdata = datagrid(time = 1:max(data_test$time)),
type = 'link'
) +
geom_vline(xintercept = max(data_train$time),
linetype = 'dashed')
And for the second model, with fewer basis functions (10). The difference is clear, with this model reacting to a slightly upward shift in the smooth toward the boundary that results in predictions heading to infinity:
plot_predictions(
model0b,
by = 'time',
newdata = datagrid(time = 1:max(data_test$time)),
type = 'link'
) +
geom_vline(xintercept = max(data_train$time),
linetype = 'dashed')
Try changing the value of k
and seeing how forecasts
change. This should give you more confidence that extrapolating from
penalized smooths can be unpredictable and difficult to control. There
are some strategies to help reign in this behaviour (once again this advice comes via a blogpost by Gavin
Simpson), but results will still be unpredictable and sensitive to
choices such as k
or the placement of knots. But if we
shouldn’t use smooth functions of time for forecasting, what other
options do we have?
?b.spline
for guidance)bs = 'cr'
)# Replace the ? with the correct value(s)
# Using ?b.spline you will see examples in the Description and Details sections about how these splines can handle multiple penalties
model0 <- mvgam(
count ~ s(time, bs = 'bs', k = 15,
m = c(?)) +
mintemp_lag4 +
ndvi_ma12,
family = poisson(),
data = data_train,
newdata = data_test
)
One useful strategy to deal with this is to fit models that incorporate autocorrelated residual processes within the
GAM. If our model’s residuals are autocorrelated, our estimates of
uncertainty around other regression parameters (such as the \(\beta\) coefficients) can be biased and
misleading. Failing to address this may lead to false inferences and
perhaps to poor forecasts. Fortunately there are ways we can tackle this
by modelling the autocorrelation in our model’s errors. This can be done
using mgcv
functionality, as the blog post above explains.
But we will use brms
here as it is more flexible and
provides an interface that is more consistent with
mvgam
.
brms
First, we will fit a similar model to those used in Tutorial 1 so to see how brms
works.
The interface is very familiar, relying on the usual R
formula syntax (see ?brmsformula
for details and examples).
But there are some key differences between brms
and
mvgam
. For example, brms
can handle a much
larger diversity of response distributions (see ?brmsfamily
for details). But for now, we will stick a similar Poisson model as
previously (a nonlinear function of mintemp_lag4
and a
parametric, linear function of mintemp
):
model1 <- brm(
count ~
s(mintemp_lag4, k = 9) +
ndvi_ma12,
family = poisson(),
# set a mildly informative prior on beta coefficients
prior = set_prior("normal(0, 2)", class = "b"),
data = data_train,
backend = 'cmdstanr',
iter = 1000,
cores = 4
)
Use summary()
to ensure no major diagnostic warnings
have been produced and inspect posterior distributions for key
parameters. You may notice some warnings of divergent transitions for
this model, which should give you some immediate concern about the
validity of the model’s inferences
summary(model1)
## Warning: There were 179 divergent transitions after warmup. Increasing
## adapt_delta above 0.8 may help. See
## http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
## Family: poisson
## Links: mu = log
## Formula: count ~ s(mintemp_lag4, k = 9) + ndvi_ma12
## Data: data_train (Number of observations: 52)
## Draws: 4 chains, each with iter = 1000; warmup = 500; thin = 1;
## total post-warmup draws = 2000
##
## Smoothing Spline Hyperparameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## sds(smintemp_lag4_1) 2.46 1.79 0.08 6.09 1.04 120
## Tail_ESS
## sds(smintemp_lag4_1) 205
##
## Regression Coefficients:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 1.52 0.07 1.38 1.65 1.01 383 517
## ndvi_ma12 0.12 0.25 -0.35 0.59 1.01 278 209
## smintemp_lag4_1 -2.37 1.50 -5.63 0.51 1.02 307 243
##
## Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
As witht he mvgam
models, for any brms
model we can inspect the underlying Stan
code using the
stancode
function:
stancode(model1)
Stan
code
## // generated with brms 2.22.9
## functions {
## }
## data {
## int<lower=1> N; // total number of observations
## array[N] int Y; // response variable
## int<lower=1> K; // number of population-level effects
## matrix[N, K] X; // population-level design matrix
## int<lower=1> Kc; // number of population-level effects after centering
## // data for splines
## int Ks; // number of linear effects
## matrix[N, Ks] Xs; // design matrix for the linear effects
## // data for spline 1
## int nb_1; // number of bases
## array[nb_1] int knots_1; // number of knots
## // basis function matrices
## matrix[N, knots_1[1]] Zs_1_1;
## int prior_only; // should the likelihood be ignored?
## }
## transformed data {
## matrix[N, Kc] Xc; // centered version of X without an intercept
## vector[Kc] means_X; // column means of X before centering
## for (i in 2:K) {
## means_X[i - 1] = mean(X[, i]);
## Xc[, i - 1] = X[, i] - means_X[i - 1];
## }
## }
## parameters {
## vector[Kc] b; // regression coefficients
## real Intercept; // temporary intercept for centered predictors
## vector[Ks] bs; // unpenalized spline coefficients
## // parameters for spline 1
## // standardized penalized spline coefficients
## vector[knots_1[1]] zs_1_1;
## vector<lower=0>[nb_1] sds_1; // SDs of penalized spline coefficients
## }
## transformed parameters {
## // penalized spline coefficients
## vector[knots_1[1]] s_1_1;
## real lprior = 0; // prior contributions to the log posterior
## // compute penalized spline coefficients
## s_1_1 = sds_1[1] * zs_1_1;
## lprior += normal_lpdf(b | 0, 2);
## lprior += student_t_lpdf(Intercept | 3, 1.4, 2.5);
## lprior += normal_lpdf(bs | 0, 2);
## lprior += student_t_lpdf(sds_1 | 3, 0, 2.5)
## - 1 * student_t_lccdf(0 | 3, 0, 2.5);
## }
## model {
## // likelihood including constants
## if (!prior_only) {
## // initialize linear predictor term
## vector[N] mu = rep_vector(0.0, N);
## mu += Intercept + Xs * bs + Zs_1_1 * s_1_1;
## target += poisson_log_glm_lpmf(Y | Xc, mu, b);
## }
## // priors including constants
## target += lprior;
## target += std_normal_lpdf(zs_1_1);
## }
## generated quantities {
## // actual population-level intercept
## real b_Intercept = Intercept - dot_product(means_X, b);
## }
Of course we still cannot interpret the smooth effects using the
posterior summary. We will need to make targeted plots to do that.
Fortunately brms
has a similar
conditional_effects()
function to mvgam
so
this is simple to do (note that by default calling
conditional_effects()
on brms
models will
return predictions on the expectation scale):
plot(
conditional_effects(model1),
theme = theme_classic(),
mean = FALSE,
rug = TRUE,
ask = FALSE
)
tidybayes
A notable difference between brms
and mvgam
is that the former does not automatically return predictions. These need
to be computed after model fitting, which means that residuals also have
to be computed. To inspect residual diagnostics we can enlist the help
of the tidybayes
package to produce randomized quantile
residuals. More information on this procedure can be found in this tidybayes vignette. Do not worry if it is difficult
to understand, it is not of major relevance to the learning objectives
of the course (and if you use mvgam
, this will be done for
you automatically anyway)
data_train %>%
# calculate posterior predictions
add_predicted_draws(model1) %>%
# use predictions to compute randomized quantile residuals
summarise(
p_lower = mean(.prediction < count),
p_upper = mean(.prediction <= count),
p_residual = runif(1,
max(p_lower, 0.00001),
min(p_upper, 0.99999)),
z_residual = qnorm(p_residual),
.groups = "drop_last"
) -> residual_data
Perform the usual residual diagnostics. For example, below are residual ACF, pACF and Q-Q plots, all of which show that the model’s predictions do not adequately match the features of the observed data
acf(residual_data$z_residual,
na.action = na.pass,
bty = 'l',
lwd = 2,
ci.col = 'darkred',
main = 'Dunn-Smyth residuals')
pacf(residual_data$z_residual,
na.action = na.pass,
bty = 'l',
lwd = 2,
ci.col = 'darkred',
main = 'Dunn-Smyth residuals')
ggplot(residual_data,
aes(sample = z_residual)) +
geom_qq() +
geom_abline(col = 'darkred') +
theme_classic()
We can compare inferences from the two models to see that they are broadly similar:
plot(
conditional_effects(model2),
ask = FALSE
)
conditional_effects(model3,
type = 'expected')
Now compare estimates for the residual AR1 parameters. First, extract
posterior draws for the AR1 parameter from the two models, which can be
done using the as.data.frame
:
ar1_ests <- rbind(
data.frame(
ar1 = as.data.frame(model3, variable = 'ar1[1]')$`ar1[1]`,
model = 'mvgam'),
data.frame(
ar1 = as.data.frame(model2, variable = 'ar[1]')$`ar[1]`,
model = 'brms')
)
We can use ggplot2
to compare these estimates, which
shows that they are similar, but not exactly the same:
ggplot(data = ar1_ests,
aes(x = ar1, fill = model)) +
geom_histogram(aes(y = after_stat(density)),
bins = 40,
col = 'white') +
facet_grid(rows = 'model') +
theme_bw() +
theme(legend.position = 'none') +
scale_fill_manual(values = c('darkblue',
'darkred')) +
labs(x = expression(ar1[error]),
y = 'Frequency') +
xlim(c(0,1))
There is a major drawback of the brms
model here, which
happens because any rows in data_train
that have
NA
in the outcome variable (count
) are being
thrown out before model fitting. In fact, there are quite a few of these
in our data:
length(which(is.na(data_train$count)))
## [1] 16
This means that the brms
model thinks that
certain observations are adjacent to one another in time
when actually they are not. For example, the brms
model
believes that timepoints 5
and 7
are right
next to each other, and also that timepoints 12
and
14
are right next to each other:
data_train %>%
dplyr::select(time, count) %>%
head(16)
## time count
## 1 1 7
## 2 2 7
## 3 3 8
## 4 4 8
## 5 5 4
## 6 6 NA
## 7 7 0
## 8 8 0
## 9 9 0
## 10 10 0
## 11 11 0
## 12 12 0
## 13 13 NA
## 14 14 2
## 15 15 4
## 16 16 7
This can often cause problems because we would not expect the level
of autocorrelation to be the same when jumping 2
timesteps
vs only 1
timestep. In fact, most regression packages in
R
will do this (i.e. they will fit autocorrelation models
to residuals after throwing out observations with missing
values). But, mvgam
does not do this! It
keeps the observations with NA
s in the data so that the
temporal spacing can be adequately preserved. Other drawbacks of the
brms
approach to correlated residuals include:
Of course there may be situations where we really do have unequal
temporal spacing among observations and we don’t want to force a model
into equal spacing by filling missing timepoints with NA
s.
In this case, mvgam
can be used with a Continuous Time
Autoregressive process:
\[\begin{align*} z_{i,t} & \sim \text{Normal}(ar1_i^{distance} * z_{i,t-1}, \sigma_i) \end{align*}\]
Where \(distance\) is a vector of
non-negative measurements of the time differences between successive
observations. See the Examples section in
?CAR
for an illustration of how to set these models
up.
As you may remember from the lectures, the larger this AR1 parameter becomes (in absolute value), the more a time series tends toward nonstationarity. We can look at some example functions to investigate this behaviour in more detail here
R
code used to
produce these simulations
# Define a function to simulate AR1 processes with a fixed error variance
simulate_ar1 = function(ar1 = 0.5, N = 50){
# simulate the initial value of the series
init <- rnorm(1, mean = 0, sd = 0.25)
# create an empty vector to store the time series values
states <- vector(length = N)
# set the first value of the states as the initial value
states[1] <- init
# loop over remaining time points and fill in the AR1 process
for(t in 2:N){
states[t] <- rnorm(1, mean = ar1 * states[t - 1],
sd = 0.25)
}
return(states)
}
# plot a single AR1 series with an AR1 parameter = 0.5
plot(x = 1:50,
y = simulate_ar1(ar1 = 0.5),
type = 'l',
lwd = 2, col = 'darkred',
ylab = expression(f(Time)),
xlab = 'Time', bty = 'l',
ylim = c(-2,2),
main = expression(AR[1]~'='~0.5))
# overlay an additional 20 possible series using different colours
for(i in 1:20){
lines(simulate_ar1(ar1 = 0.5),
col = sample(c("#DCBCBC",
"#C79999",
"#B97C7C",
"#A25050",
"#7C0000"), 1),
lwd = 2)
}
box(bty = 'l', lwd = 2)
This plot shows how an AR1 process with a smallish AR1 parameter
(0.5) approaches stationarity fairly quickly. What happens if we
increase this parameter toward the types of values that our models have
estimated? The code to produce these simulations is the same as above,
except the AR1 parameter was changed to 0.95
The variance of this particular series grows into the future for a
much longer time than did the series we simulated above. This is useful
for getting a sense about the relative stability of the underlying
dynamic process. We can enforce a Random Walk so that the dynamic
process will not be stationary (note, a Random Walk trend can only be
fit in mvgam
; there is currently no way of using such a
process in brms
, that I’m aware of):
model3b <- mvgam(
count ~
s(mintemp_lag4, k = 9) +
ndvi_ma12,
family = poisson(),
data = data_train,
newdata = data_test,
priors = prior(normal(0, 2), class = b),
trend_model = RW()
)
The model can be described mathematically as follows: \[\begin{align*} \boldsymbol{count}_t & \sim \text{Poisson}(\lambda_t) \\ log(\lambda_t) & = f(\boldsymbol{mintemp})_t + \beta_{ndvi} * \boldsymbol{ndvi}_t + z_t \\ z_t & \sim \text{Normal}(z_{t-1}, \sigma_{error}) \\ \sigma_{error} & \sim \text{Exponential}(2) \\ f(\boldsymbol{mintemp}) & = \sum_{k=1}^{K}b * \beta_{smooth} \\ \beta_{smooth} & \sim \text{MVNormal}(0, (\Omega * \lambda)^{-1}) \\ \beta_{ndvi} & \sim \text{Normal}(0, 1) \end{align*}\]
Note the similarity to model3 above. The only difference is that we no longer estimate the AR1 parameter, so the mean of the latent process at time \(t\) is simply the value of the latent process at time \(t-1\).Plotting the trend estimates for the AR1 and Random Walk models gives you further insight into the different assumptions they make about the temporal evolution of the system:
patchwork::wrap_plots(
plot(
forecast(model3,
type = 'trend')
) +
labs(title = 'AR1 trend'),
plot(
forecast(model3b,
type = 'trend')
) +
labs(title = 'RW trend'),
ncol = 1
)