The data and modelling objects created in this notebook can be downloaded directly to save computational time.
Users who wish to complete the exercises can download a small
template R
script. Assuming you have already downloaded the
data objects above, this script will load all data objects so that the
steps used to create them are not necessary to tackle the exercises.
This tutorial relates to content covered in Lecture 4, and relies on the following packages for manipulating data, shaping time series, fitting dynamic regression models and plotting:
library(dplyr)
library(mvgam)
library(tidybayes)
library(ggplot2); theme_set(theme_classic())
library(marginaleffects)
The data we will use for this tutorial is spawner-recruit data
for sockeye salmon (Oncorhynchus nerka) from the Bristol Bay
region in Southwest Alaska. Commercial fishermen in Alaska target this
important species using seines and gillnets for fresh or frozen fillet
sales and canning. Records for the rivers in Bristol Bay span the years
1952-2005, and are maintained within the RAM Legacy Stock
Assessment Database. Spawner estimates originated from Bristol Bay
Area Annual Management Reports from the Alaska Department of Fish and
Game, and are based on in-river spawner counts (including both tower
counts and aerial surveys). Age-specific catch rates were calculated
from records of catch and escapement according to age class. These data
were used to determine the number of recruits in each year.
Below we process the data, keeping observations for a single river and taking \(log(spawners_{t})\) to use as the offset in our models.
# access the data from the ATSA Github site
load(url('https://github.com/atsa-es/atsalibrary/raw/master/data/sockeye.RData'))
sockeye %>%
# Remove any previous grouping structures
dplyr::ungroup() %>%
# select the target river
dplyr::filter(region == "KVICHAK") %>%
# filter the data to only include years when number of spawners
# was estimated, as this will be used as the offset
dplyr::filter(!is.na(spawners)) %>%
# log the spawners variable to use as an appropriate offset
dplyr::mutate(log_spawners = log(spawners)) %>%
# add a 'time' variable
dplyr::mutate(time = dplyr::row_number()) %>%
# add a series variable (as a factor)
dplyr::mutate(series = as.factor('sockeye')) %>%
# keep variables of interest
dplyr::select(time, series, log_spawners, recruits) -> model_data
The data now contain four variables:
time, the indicator of which time step each observation belongs to
series, a factor indexing which time series each observation belongs to
log_spawners, the logged number of recorded sockeye spawners (in thousands), measured as the number of returning spawners minus catch at each timepoint, which will be used as an offset in models
recruits, the response variable representing the number of estimated sockeye recruits at each timepoint (in thousands)
Check the data structure
head(model_data)
## # A tibble: 6 × 4
## time series log_spawners recruits
## <int> <fct> <dbl> <dbl>
## 1 1 sockeye 9.15 39000
## 2 2 sockeye 7.95 4090
## 3 3 sockeye 6.28 289
## 4 4 sockeye 6.52 547
## 5 5 sockeye 9.59 55300
## 6 6 sockeye 8.22 3500
dplyr::glimpse(model_data)
## Rows: 50
## Columns: 4
## $ time <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17…
## $ series <fct> sockeye, sockeye, sockeye, sockeye, sockeye, sockeye, soc…
## $ log_spawners <dbl> 9.152711, 7.951559, 6.282267, 6.522093, 9.588777, 8.21878…
## $ recruits <dbl> 39000, 4090, 289, 547, 55300, 3500, 5360, 1120, 5750, 450…
Visualize the data as a heatmap to get a sense of where any NAs are distributed (NAs are shown as red bars in the plot below)
image(is.na(t(model_data %>%
dplyr::arrange(dplyr::desc(time)))), axes = F,
col = c('grey80', 'darkred'))
axis(3, at = seq(0,1, len = NCOL(model_data)), labels = colnames(model_data))
Plot properties of the time series, which indicate some overdispersion in the recruit count values that we’ll have to address
plot_mvgam_series(data = model_data, y = 'recruits')
mvgam
We will fit a number of models to these data to showcase the various methods that are available to us for evaluating and comparing dynamic GLMs / GAMs. We begin with a model that does not include a dynamic component, as this is always a useful starting point. It also allows us to highlight another useful feature of GLMs / GAMs when modelling count data, as we can include an offset variable that allows us to model rates of interest.
Sometimes it is more relevant to model the rate at which an event occurs per some underlying unit. For example, in epidemiological modelling the interest is often in the number of disease cases per head of population, but we never get to observe this rate directly. An offset allows us to model the underlying rate by algebraically accounting for variation in the number of possible counts in each observation unit. This works naturally in models that use a log link function, such as Poisson and Negative Binomial models, and including an offset is straightforward in most R modelling software, mvgam included. Our first model addresses the overdispersion in the observed counts with a Negative Binomial observation model and no trend, and its linear predictor contains only the implicit intercept plus an offset of the log_spawners variable that we calculated above, so that we estimate the rate of recruitment per spawner. As in previous tutorials, only 250 samples per chain are used to keep the size of the resulting object small (though we'd always want to use more samples in a real analysis)
model1 <- mvgam(recruits ~ offset(log_spawners),
family = nb(),
trend_model = 'None',
data = model_data,
samples = 250)
The model can be described mathematically as follows:
\[\begin{align*}
\boldsymbol{recruits}_t & \sim \text{NegBinomial}(\lambda_t, \phi) \\
log(\lambda_t/\boldsymbol{spawners_{t}}) & = \alpha \\
log(\lambda_t) & = \alpha + 1*log(\boldsymbol{spawners_{t}}) \\
\alpha & \sim \text{Normal}(0, 1) \\
\phi^{-1} & \sim \text{StudentT}(3, 0, 0.1)
\end{align*}\]
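To build intuition for how the offset enters the linear predictor, here is a minimal simulation sketch (using base R's glm() rather than mvgam, with made-up spawner numbers and a made-up true rate) showing that including log(exposure) as an offset in a log-link model recovers a per-unit rate:
# simulate counts whose expectation is a fixed rate multiplied by an exposure
set.seed(1)
spawners_sim <- round(runif(200, 500, 5000))
true_rate <- 3
counts_sim <- rpois(200, lambda = true_rate * spawners_sim)
# fit a log-link model with the exposure included as an offset
sim_mod <- glm(counts_sim ~ offset(log(spawners_sim)), family = poisson())
# the exponentiated intercept should be close to the true rate of 3
exp(coef(sim_mod))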
Note that this is not a classic density-dependent stock-recruitment model such as the Ricker or Beverton-Holt models. Rather, this model very simplistically assumes recruits can increase without bound as a function of the number of spawners.
The inclusion of the offset is important if we wish to make predictions on the scale of the observations, because no regression coefficient is assigned to this variable (its coefficient is fixed at 1). If you wish to make predictions manually, you'll need to take care to include the offset. It is returned as an attribute of the linear predictor matrix when we call predict.gam on the underlying mgcv model:
lpmatrix <- predict(model1$mgcv_model, type = 'lpmatrix')
attr(lpmatrix, 'model.offset')
## [1] 9.152711 7.951559 6.282267 6.522093 9.588777 8.218787 7.855545
## [8] 5.826000 6.863803 10.098232 8.237479 8.077137 7.847763 9.034796
## [15] 9.539644 7.779049 6.917706 5.424950 8.396155 9.480368 7.585789
## [22] 7.200425 8.330864 9.323669 9.769956 7.467371 7.038784 8.180321
## [29] 9.259131 8.883224 7.073270 8.711114 8.311398 9.026418 8.849371
## [36] 8.347590 8.461680 8.301522 9.028818 9.210340 7.279319 7.313220
## [43] 7.740664 8.732305 7.512071 7.003065 6.556778 7.432484 8.612503
## [50] 7.749322
To make predictions on the outcome scale, we might do something like the following:
# extract posterior estimates of the observation linear predictor betas
betas_post <- as.data.frame(model1, variable = 'betas')
# generate the linear predictor matrix for the fitted data
# (note, you can supply newdata here to predict for new scenarios)
lpmatrix <- predict(model1$mgcv_model, type = 'lpmatrix')
# set up a matrix to store linear predictor values
linpreds <- matrix(NA, nrow = NROW(betas_post), ncol = NROW(lpmatrix))
# loop across all posterior estimates
for(i in 1:NROW(betas_post)){
# multiply covariate values by the beta estimates (including intercepts)
linpreds[i, ] <- lpmatrix %*% betas_post[i, ] +
# add the offset, which will be equal to 0 if no offset was
# supplied to the model
as.vector(attr(lpmatrix, 'model.offset'))
}
# plot a histogram to see the distribution of these values
hist(linpreds,
col = 'darkred',
border = 'white',
xlab = expression(Linear~predictor~(log[mu])),
ylab = '',
yaxt = 'n',
main = '',
lwd = 2)
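The loop above is written for clarity; the same computation can be done in a single matrix operation. A sketch, assuming the betas_post and lpmatrix objects created above:
# loop-free version: each row of the result is one posterior draw
linpreds_mat <- as.matrix(betas_post) %*% t(lpmatrix) +
  matrix(as.vector(attr(lpmatrix, 'model.offset')),
         nrow = NROW(betas_post), ncol = NROW(lpmatrix), byrow = TRUE)
# should match the loop-based calculation (ignoring dimnames)
all.equal(linpreds, linpreds_mat, check.attributes = FALSE)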
This same type of process underlies many of mvgam's prediction / plotting functions. For example, to plot partial effects of smooth terms, the plot_mvgam_smooth function sets up a grid of evenly-spaced hypothetical values as newdata to predict the contribution of the smooth term to the linear predictor. It does this by essentially ignoring the contributions of all other predictors (setting their values to 0 in the linear predictor matrix). The same process also underlies the functions used to plot conditional effects and posterior contrasts (you can learn a great deal about how these newdata grids are set up by looking over the documentation in the marginaleffects book and by calling ?marginaleffects::datagrid)
Once these predictions are made for the linear predictor, you could
then compute posterior expectations (in this case by
exponentiating the values with exp(linpreds)
to “undo” the
log link). Or you could compute posterior predictions by
incorporating the uncertainty in the observation model:
# extract posterior estimates of the overdispersion parameter phi
phi_post <- as.data.frame(model1, variable = 'obs_params')
# set up a matrix to store posterior predictions
ypreds <- matrix(NA, nrow = NROW(betas_post), ncol = NROW(lpmatrix))
# loop across all posterior estimates
for(i in 1:NROW(betas_post)){
# take Negative Binomial draws, being sure to exponentiate the
# linear predictors and to use the appropriate overdispersion
# estimate for the 'size' argument
ypreds[i, ] <- rnbinom(n = NROW(lpmatrix),
mu = exp(linpreds[i, ]),
size = phi_post[i,])
}
# plot a histogram to see the distribution of these values
hist(ypreds,
xlim = c(0, quantile(as.vector(ypreds), probs = 0.95)),
breaks = 150,
col = 'darkred',
border = 'white',
xlab = 'Posterior predictions',
ylab = '',
yaxt = 'n',
main = '',
lwd = 2)
mvgam
makes this process straightforward using the
predict.mvgam
, hindcast.mvgam
and
forecast.mvgam
functions. For example, we could replicate
the linear prediction calculations using the following:
linpreds2 <- predict(model1,
type = 'link',
process_error = FALSE)
all.equal(linpreds, linpreds2)
## [1] "Attributes: < Length mismatch: comparison on first 1 components >"
## [2] "Attributes: < Component \"dim\": Mean relative difference: 0.9485714 >"
## [3] "Numeric: lengths (50000, 200) differ"
Notice how we explicitly set process_error to FALSE
so
that the calculations of the linear predictor will ignore any
contributions of dynamic trends. This isn’t applicable in this example,
but it becomes important to consider when making predictions from
dynamic models. We could also compute expectations using
predict.mvgam
:
linpreds3 <- predict(model1, type = 'expected', process_error = FALSE)
all.equal(exp(linpreds), linpreds3)
## [1] "Attributes: < Length mismatch: comparison on first 1 components >"
## [2] "Attributes: < Component \"dim\": Mean relative difference: 0.9485714 >"
## [3] "Numeric: lengths (50000, 200) differ"
And finally we can calculate posterior predictions, which consider uncertainty in the Negative Binomial sampling distribution:
ypreds2 <- predict(model1, type = 'response', process_error = FALSE)
all.equal(ypreds, ypreds2)
## [1] "Attributes: < Length mismatch: comparison on first 1 components >"
## [2] "Attributes: < Component \"dim\": Mean relative difference: 0.9485714 >"
## [3] "Numeric: lengths (50000, 200) differ"
These values will not be identical due to the random nature
of the sampling distribution. If you are computing predictions /
expectations for new data, the predict.mvgam
function
should be your primary choice. But if you merely want to return
predictions / expectations for the training data and/or any testing data
supplied to the model, you will often be better off using the
hindcast.mvgam
and forecast.mvgam
functions.
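As a quick sketch of the first case (the newdata values below are made up for illustration; the columns supplied in newdata mirror those used to fit the model):
# predict expected recruitment for some hypothetical spawner abundances
newdat <- data.frame(time = max(model_data$time) + 1:3,
                     series = factor('sockeye',
                                     levels = levels(model_data$series)),
                     log_spawners = log(c(2000, 4000, 8000)),
                     recruits = NA)
predict(model1, newdata = newdat, type = 'expected',
        process_error = FALSE)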
For the second case, here are the hindcasts of the expectations, which may differ very slightly from those we computed above due to variation in the floating point arithmetic used by Stan and R:
hc_expectations <- hindcast(model1, type = 'expected')
all.equal(exp(linpreds), hc_expectations$hindcasts$sockeye)
## [1] "Mean relative difference: 1.275639e-05"
As we’ve seen before, these can be plotted in the same general way as
other mvgam
plotting functions:
plot(hc_expectations)
The model summary is shown below, which indicates there are no major sampling issues to worry about
summary(model1)
## GAM formula:
## recruits ~ offset(log_spawners)
##
## Family:
## negative binomial
##
## Link function:
## log
##
## Trend model:
## None
##
## N series:
## 1
##
## N timepoints:
## 50
##
## Status:
## Fitted using Stan
## 4 chains, each with iter = 750; warmup = 500; thin = 1
## Total post-warmup draws = 1000
##
##
## Observation dispersion parameter estimates:
## 2.5% 50% 97.5% Rhat n_eff
## phi[1] 1.2 1.7 2.3 1 615
##
## GAM coefficient (beta) estimates:
## 2.5% 50% 97.5% Rhat n_eff
## (Intercept) 0.61 0.82 1.1 1.01 573
##
## Stan MCMC diagnostics:
## n_eff / iter looks reasonable for all parameters
## Rhat looks reasonable for all parameters
## 0 of 1000 iterations ended with a divergence (0%)
## 0 of 1000 iterations saturated the maximum tree depth of 12 (0%)
## E-FMI indicated no pathological behavior
##
## Samples were drawn using NUTS(diag_e) at Tue May 07 11:49:43 AM 2024.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split MCMC chains
## (at convergence, Rhat = 1)
Note though that the hindcast.mvgam
and
forecast.mvgam
functions simply extract any predictions
that were made in the original model, so will always include uncertainty
in any underlying dynamic components.
Posterior predictive checks are useful for understanding how well
your model can capture particular features of your observed data. In
mvgam
, the pp_check()
function is particularly
helpful as it relies on the very broad range of checking functions that
are available in bayesplot
. For example, we can plot a
density kernel of our observed counts and overlay simulated kernels from
our model. These simulated kernels are generated using
predict.mvgam()
, so they are informative for all
mvgam
models (regardless of whether or not the model
included dynamic components).
pp_check(model1, type = 'dens_overlay', ndraws = 25)
## Warning in pp_check.mvgam(model1, type = "dens_overlay", ndraws = 25): NA
## responses are not shown in 'pp_check'.
The model generally tends to capture the shape of the observed data. We can examine calibration more formally using the empirical CDF of the probability integral transform (PIT) values, which again compares observed vs predicted values. For a well-calibrated model the PIT values should be approximately standard uniform, so the observed CDF (in dark red) should fall within the uncertainty envelope (in light red).
pp_check(model1, type = 'pit_ecdf')
## Using all posterior draws for ppc type 'pit_ecdf' by default.
## Warning in pp_check.mvgam(model1, type = "pit_ecdf"): NA responses are not
## shown in 'pp_check'.
The pattern suggests the model is fitting the data well, which we can further confirm by comparing observed vs simulated CDFs:
pp_check(model1, type = 'ecdf_overlay')
## Using 10 posterior draws for ppc type 'ecdf_overlay' by default.
## Warning in pp_check.mvgam(model1, type = "ecdf_overlay"): NA responses are not
## shown in 'pp_check'.
pp_check() even accepts custom functions. For example, if we want to know whether our model simulates counts < 1000 as frequently as they occur in the true data, we can use:
prop_under1000 <- function(x){
length(which(x < 1000)) / length(x)
}
pp_check(model1, type = "stat", stat = "prop_under1000")
## Using all posterior draws for ppc type 'stat' by default.
## Warning in pp_check.mvgam(model1, type = "stat", stat = "prop_under1000"): NA
## responses are not shown in 'pp_check'.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
There are many other types of posterior predictive checks that can be implemented, and it is highly recommended that you design some to suit your own specific needs (i.e., which aspect(s) of the data are you most concerned that your model can adequately capture?). See ?mvgam::pp_check for details
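For instance, here is a sketch of one more custom statistic (max_count is just a name chosen here) to check whether the model can reproduce the largest counts seen in the observed series:
# custom statistic: the largest simulated count in each replicate dataset
max_count <- function(x){
  max(x, na.rm = TRUE)
}
pp_check(model1, type = "stat", stat = "max_count")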
Plot Dunn-Smyth residuals and posterior hindcasts to further inspect model fit
plot(model1, type = 'residuals')
plot(model1, type = 'forecast')
Here it is worth quickly revisiting the uncertainty in these
predicted values, and how that differs from uncertainty in our expected
values (calculated above as hc_expectations
). Remember that
our goal is to produce a forecast distribution because we do
not know what the future holds and we can therefore consider it to be a
random variable drawn from some probability distribution. Often
we visualise forecasts as lines and ribbons to show particular summaries
of this forecast distribution, but it can be easier to understand where
these summaries come from if we instead view realizations from
this distribution (i.e. random draws). For example, here are 5
realizations from our expected hindcast values…
plot(hc_expectations, realisations = TRUE, n_realisations = 5)
…and here are 20 realizations:
plot(hc_expectations, realisations = TRUE, n_realisations = 20)
Note how there isn’t much uncertainty in these values because all uncertainty is coming from variation in the intercept term \(\alpha\). In sharp contrast, there is larger uncertainty in our hindcast predictions because these predictions also consider variation in the Negative Binomial sampling distribution (including uncertainty in the overdispersion parameter \(\phi\)). As above, here are 5 realizations for the hindcast predictions…
hc_predictions <- hindcast(model1, type = 'response')
plot(hc_predictions, realisations = TRUE, n_realisations = 5)
…and here are 20 realizations:
plot(hc_predictions, realisations = TRUE, n_realisations = 20)
predict vs hindcast
Back to the modelling strategy. Although the model is trying to capture the dispersion in the data (the summary above shows that the \(\phi\) parameter is estimated to be very small, indicating a large amount of overdispersion remaining after accounting for the offset), there are still some patterns in the residuals, including substantial autocorrelation at a lag of 1. We need a dynamic model to capture this autocorrelation, so next we extend the model by including a latent AR1 residual process
model2 <- mvgam(recruits ~ offset(log_spawners),
family = nb(),
trend_model = AR(),
data = model_data)
Because this model has a dynamic trend component, it is worth quickly
revisiting the differences between the predict.mvgam
function and the hindcast.mvgam
/
forecast.mvgam
functions. First, we investigate how
ignoring the uncertainty in the trend component changes the predictions.
In this step, we calculate predictions for the observed time period in
two ways: first we average over the uncertainty from the latent trend,
and second we ignore this uncertainty. Plotting both of these together
shows how our prediction uncertainty changes when we include the
uncertainty in the dynamic trend component. A crucial detail to understand here is that the predict.mvgam function does not account for the actual 'times' of the data; rather, it assumes the dynamic process has reached stationarity and simply samples from the following distribution for the trend: \(z_t \sim \text{Normal}(0, \sigma_{error})\)
# calculate predictions for the training data by including the dynamic process
# uncertainty
ypreds <- predict(model2, type = 'response',
process_error = TRUE, summary = FALSE)
# calculate 95% empirical quantiles of predictions
cred <- sapply(1:NCOL(ypreds),
function(n) quantile(ypreds[,n],
probs = c(0.025, 0.975), na.rm = T))
pred_vals <- 1:NCOL(ypreds)
# graphical plotting parameters
layout(matrix(1:2, ncol = 2))
par(mar = c(2.5, 2.3, 2, 2),
oma = c(1, 1, 0, 0),
mgp = c(1.5, 0.5, 0))
# plot credible intervals
plot(1, type = "n", bty = 'L',
xlab = 'Time',
ylab = 'predict.mvgam predictions',
xlim = c(min(pred_vals), max(pred_vals)),
ylim = c(min(cred, na.rm = T),
max(cred, na.rm = T)))
polygon(c(pred_vals, rev(pred_vals)), c(cred[1,], rev(cred[2,])),
col = 'darkred', border = NA)
# now repeat, but this time ignore uncertainty in the dynamic component
ypreds2 <- predict(model2, type = 'response',
process_error = FALSE, summary = FALSE)
cred2 <- sapply(1:NCOL(ypreds2),
function(n) quantile(ypreds2[,n],
probs = c(0.025, 0.975), na.rm = T))
# overlay these predictions as a lighter colour
polygon(c(pred_vals, rev(pred_vals)), c(cred2[1,], rev(cred2[2,])),
col = "#DCBCBC", border = NA)
# overlay the actual observations as well
points(model_data$recruits, pch = 16, cex = 0.8, col = 'white')
points(model_data$recruits, pch = 16, cex = 0.65)
box(bty = 'l', lwd = 2)
# now plot the actual hindcasts, which did not assume the process was
# stationary but instead modelled its dynamics over the training time.
# use the same y-axis limits to see how these plots differ
plot_mvgam_fc(model2, ylim = c(min(cred, na.rm = T),
max(cred, na.rm = T)),
ylab = 'Hindcast predictions')
layout(1)
As you can see, the predictions differ. The
hindcast.mvgam
and forecast.mvgam
functions
use the time component of the data, so it is crucial that any
newdata
fed to the forecast.mvgam
function
follows on sequentially from the data that was used to fit the model
(this is not internally checked by the package because it might be a
headache to do so when data are not supplied in a specific time-order).
But the predictions from these functions might not be as useful for
scenario-planning because any scenarios would then be required to have
an explicit time dimension. Sometimes we’d rather get a sense of how
predictions might change on average if some external conditions
changed, for example if temperature were to increase by x degrees or if
we were to sample from a new group for a random effect. For exploring
these types of scenarios, the predict.mvgam
function is
more useful. This is the function that is used in any calls to
plot_predictions
, for example.
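As a sketch of this (the call below mirrors the marginaleffects interface and was not part of the original analysis; it assumes the offset variable can be conditioned on like any other predictor), we could visualise conditional predictions across a range of log_spawners values:
# conditional predictions across the observed range of log_spawners
plot_predictions(model2, condition = 'log_spawners')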
As another illustration of the many uses for the predict.mvgam function, here we can compute how many more recruits this model would expect to see if there were 5,000 spawners recorded vs only 3,000. We can take full advantage of the avg_comparisons function from the marginaleffects package to do this (see ?marginaleffects::avg_comparisons for details)
# take draws from the model that compute the average comparison between
# spawners recorded at 5,000 vs 3,000 (be sure to log first!)
post_contrasts <- avg_comparisons(model2,
variables = list(log_spawners = log(c(3000, 5000)))) %>%
posteriordraws()
# use the resulting posterior draw object to plot a density of the
# posterior contrasts
post_contrasts %>%
ggplot(aes(x = draw)) +
# use the stat_halfeye function from tidybayes for a nice visual
stat_halfeye(fill = "#C79999") +
labs(x = "(spawners = 5,000) − (spawners = 3,000)", y = "Density",
title = "Average posterior contrast") +
theme_classic()
Exercises
1. Plot a histogram of model1's posterior intercept coefficients \(\alpha\). Use ?mvgam::as.data.frame.mvgam for guidance.
2. Plot recruits vs log_spawners using the supplied model_data. Take a few notes on whether you think our primary modelling assumption, that log(recruits) scales linearly with log_spawners, is justified.
# Replace the ? with the correct value(s)
# check names of variables that can be extracted
vars_available <- variables(model1)
# we are interested in the observation model coefficients
vars_available$observation_betas
# extract the intercepts using the as.data.frame.mvgam function
# you will have two options for extracting this variable. first, you
# can use the alias
alphas <- as.data.frame(model1, variable = ?)
# alternatively, use the original name
alphas <- as.data.frame(model1, variable = ?)
# plot a histogram
hist(alphas[,1], xlab = expression(alpha),
main = '', col = 'darkred', border = 'white')
Now on to the topic of model comparisons. Plot Dunn-Smyth residuals for this dynamic model to see that there is no more autocorrelation
plot(model2, type = 'residuals')
However we must be cautious interpreting these residuals because the AR1 trend model is extremely flexible. The in-sample fits can sometimes be nearly perfect, meaning the residuals will very often look excellent! We can confirm this behaviour by inspecting hindcasts
plot(model2, type = 'forecast')
But PPCs can still give valid insights into model inadequacy:
pp_check(model2, type = 'dens_overlay')
## Using 10 posterior draws for ppc type 'dens_overlay' by default.
## Warning in pp_check.mvgam(model2, type = "dens_overlay"): NA responses are not
## shown in 'pp_check'.
pp_check(model2, type = 'pit_ecdf')
## Using all posterior draws for ppc type 'pit_ecdf' by default.
## Warning in pp_check.mvgam(model2, type = "pit_ecdf"): NA responses are not
## shown in 'pp_check'.
These plots show that the model still tends to predict well. But there are some more worrying and pressing matters we can address with this model. Inspect the summary to see what I mean:
summary(model2)
## GAM formula:
## recruits ~ offset(log_spawners)
##
## Family:
## negative binomial
##
## Link function:
## log
##
## Trend model:
## AR1
##
## N series:
## 1
##
## N timepoints:
## 50
##
## Status:
## Fitted using Stan
## 4 chains, each with iter = 1000; warmup = 500; thin = 1
## Total post-warmup draws = 2000
##
##
## Observation dispersion parameter estimates:
## 2.5% 50% 97.5% Rhat n_eff
## phi[1] 4.3 16 310 1.07 64
##
## GAM coefficient (beta) estimates:
## 2.5% 50% 97.5% Rhat n_eff
## (Intercept) -0.0077 0.6 2.1 1.09 32
##
## Latent trend parameter AR estimates:
## 2.5% 50% 97.5% Rhat n_eff
## ar1[1] 0.28 0.63 0.95 1.03 106
## sigma[1] 0.53 0.72 0.94 1.01 271
##
## Stan MCMC diagnostics:
## n_eff / iter looks reasonable for all parameters
## Rhats above 1.05 found for 43 parameters
## *Diagnose further to investigate why the chains have not mixed
## 56 of 2000 iterations ended with a divergence (2.8%)
## *Try running with larger adapt_delta to remove the divergences
## 0 of 2000 iterations saturated the maximum tree depth of 12 (0%)
## Chain 4: E-FMI = 0.0674
## *E-FMI below 0.2 indicates you may need to reparameterize your model
##
## Samples were drawn using NUTS(diag_e) at Tue May 07 11:50:47 AM 2024.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split MCMC chains
## (at convergence, Rhat = 1)
The HMC sampler is telling us there are clear issues when trying to estimate both a flexible latent trend AND an overdispersion parameter. These two components compete to capture the extra dispersion in the observations, leading to low effective sample sizes and poor convergence for the overdispersion parameter \(\phi\). It turns out to be very challenging to estimate both without making some strong adjustments to the model, so we have several options for moving forward.
Here we will illustrate two of these options. First, we keep the Negative Binomial observation model but use a smoother Gaussian Process for the trend
model3 <- mvgam(recruits ~ offset(log_spawners),
family = nb(),
trend_model = 'GP',
data = model_data)
Next we use a Poisson observation model with an AR1 trend
model4 <- mvgam(recruits ~ offset(log_spawners),
family = poisson(),
trend_model = AR(),
data = model_data)
Neither of these models has any major issues with sampling, which is a good sign!
summary(model3)
## GAM formula:
## recruits ~ offset(log_spawners)
##
## Family:
## negative binomial
##
## Link function:
## log
##
## Trend model:
## GP
##
## N series:
## 1
##
## N timepoints:
## 50
##
## Status:
## Fitted using Stan
## 4 chains, each with iter = 1000; warmup = 500; thin = 1
## Total post-warmup draws = 2000
##
##
## Observation dispersion parameter estimates:
## 2.5% 50% 97.5% Rhat n_eff
## phi[1] 1.4 2.2 3.2 1 1047
##
## GAM coefficient (beta) estimates:
## 2.5% 50% 97.5% Rhat n_eff
## (Intercept) 0.19 0.74 1.3 1 1067
##
## Latent trend marginal deviation (alpha) and length scale (rho) estimates:
## 2.5% 50% 97.5% Rhat n_eff
## alpha_gp[1] 0.11 0.55 1.1 1 705
## rho_gp[1] 1.10 2.80 18.0 1 607
##
## Stan MCMC diagnostics:
## n_eff / iter looks reasonable for all parameters
## Rhat looks reasonable for all parameters
## 13 of 2000 iterations ended with a divergence (0.65%)
## *Try running with larger adapt_delta to remove the divergences
## 0 of 2000 iterations saturated the maximum tree depth of 12 (0%)
## E-FMI indicated no pathological behavior
##
## Samples were drawn using NUTS(diag_e) at Tue May 07 11:51:57 AM 2024.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split MCMC chains
## (at convergence, Rhat = 1)
summary(model4)
## GAM formula:
## recruits ~ offset(log_spawners)
##
## Family:
## poisson
##
## Link function:
## log
##
## Trend model:
## AR1
##
## N series:
## 1
##
## N timepoints:
## 50
##
## Status:
## Fitted using Stan
## 4 chains, each with iter = 1000; warmup = 500; thin = 1
## Total post-warmup draws = 2000
##
##
## GAM coefficient (beta) estimates:
## 2.5% 50% 97.5% Rhat n_eff
## (Intercept) 0.074 0.49 1.1 1.02 246
##
## Latent trend parameter AR estimates:
## 2.5% 50% 97.5% Rhat n_eff
## ar1[1] 0.30 0.57 0.86 1 824
## sigma[1] 0.63 0.77 0.98 1 1306
##
## Stan MCMC diagnostics:
## n_eff / iter looks reasonable for all parameters
## Rhat looks reasonable for all parameters
## 0 of 2000 iterations ended with a divergence (0%)
## 32 of 2000 iterations saturated the maximum tree depth of 12 (1.6%)
## *Run with max_treedepth set to a larger value to avoid saturation
## E-FMI indicated no pathological behavior
##
## Samples were drawn using NUTS(diag_e) at Tue May 07 11:53:11 AM 2024.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split MCMC chains
## (at convergence, Rhat = 1)
Hindcasts and trend estimates are very different for the two models:
plot(model3, type = 'trend')
plot(model4, type = 'trend')
plot(model3, type = 'forecast')
plot(model4, type = 'forecast')
Here the extreme flexibility of the AR1 trend is apparent in the hindcasts for model4. Hopefully this gives you more of an idea of why we can't put too much faith in residual or fitted-value diagnostics for these flexible dynamic models. So if in-sample fits and residual diagnostics can't guide reasonable comparisons of these two models, how can we compare them? One-step-ahead residuals would be more appropriate for interrogation, but computing these is extremely computationally demanding. Instead, using loo_compare(), we can ask which model would likely give better predictions for new data within the training set. This method of approximating leave-one-out prediction error (using Pareto Smoothed Importance Sampling, PSIS) is widely used for comparing Bayesian models and is generally interpreted in the same way as AIC comparisons. Here I use a modified loo() function that will be implemented in the next CRAN release of mvgam to compute in-sample LOO values (don't worry about these technicalities):
loo_hcs = function(x, ...){
x$series_names <- levels(x$obs_data$series)
logliks <- logLik(x,
linpreds = predict(x,
newdata = x$obs_data,
type = 'link',
summary = FALSE,
process_error = TRUE),
newdata = x$obs_data,
family_pars = mvgam:::extract_family_pars(x))
logliks <- logliks[,!apply(logliks, 2, function(x) all(!is.finite(x)))]
releffs <- loo::relative_eff(exp(logliks),
chain_id = sort(rep(1:x$model_output@sim$chains,
(NROW(logliks) /
x$model_output@sim$chains))))
loo::loo(logliks, r_eff = releffs, ...)
}
loo::loo_compare(list(model3 = loo_hcs(model3),
model4 = loo_hcs(model4)))
## elpd_diff se_diff
## model3 0.0 0.0
## model4 -7095298.8 1057034.9
The Gaussian Process / Negative Binomial model is strongly favoured here, yielding much higher values of the Expected Log Predictive Density (ELPD; also known as the log score). This is one useful metric for choosing among models, but we’d also like to compare them on their abilities to predict future out of sample data.
mvgam includes functions that facilitate leave-future-out comparisons. Time series models are often evaluated using an expanding window training technique, where the model is initially trained on data from \(t = 1\) to \(t = n_{train}\) and is then used to produce forecasts for the next \(h\) time steps \((t = n_{train} + 1, \ldots, n_{train} + h)\). In the next iteration, the training data are expanded by a single time point and the process is repeated. This is obviously computationally challenging for Bayesian time series models, as the number of refits can be very large. mvgam uses an approximation based on importance sampling. Briefly, we refit the model using the first \(min_t\) observations and perform a single exact \(h\)-step-ahead forecast, which is evaluated against the next \(h\) out-of-sample observations using the ELPD, as in the LOO comparisons above. Next, we approximate each successive round of expanding-window forecasts by moving forward one step at a time (for \(i\) in \(1:N_{evaluations}\)) and re-weighting draws from the model's posterior predictive distribution using PSIS. In each iteration \(i\), PSIS weights are obtained for all observations that would have been included in the model had we re-fit it. If these importance ratios are stable, we consider the approximation adequate and use the re-weighted posterior's forecast for evaluating the next holdout set of testing observations \(((min_t + i + 1):(min_t + i + h))\). This is similar to particle filtering, where forecasts are updated in light of new data by re-weighting the posterior draws using importance weights. But at some point the variability of the importance ratios will become too large and importance sampling will be unreliable. This is indicated by the estimated shape parameter k of the generalized Pareto distribution crossing a certain threshold pareto_k_threshold. Only then do we refit the model using all of the observations up to the time of the failure, restart the process, and iterate forward until the next refit is triggered. The process, which is implemented in the lfo_cv function, is computationally much more efficient, as typically only a fraction of the evaluations requires a refit (the algorithm is described in detail by Bürkner et al. 2020).
Run the approximator for the GP / Negative Binomial model (model3), setting min_t = 20 and using a forecast horizon of fc_horizon = 3
lfo_gp <- lfo_cv(model3, min_t = 20,
pareto_k_threshold = 0.5,
fc_horizon = 3, n_cores = 4)
The S3
plotting function for these lfo_cv
objects will show the Pareto-k values and ELPD values over the
evaluation time points. For the Pareto k plot, a dashed red line
indicates the specified threshold chosen for triggering model refits.
For the ELPD plot, a dashed red line indicates the bottom 10% quantile
of ELPD values. Points below this threshold may represent outliers that
were more difficult to forecast
plot(lfo_gp)
The top plot (the Pareto k plot) indicates that we only needed one additional refit to give good approximate predictions for the Gaussian Process / Negative Binomial model
lfo_ar <- lfo_cv(model4, min_t = 20,
pareto_k_threshold = 0.5,
fc_horizon = 3, n_cores = 4)
plot(lfo_ar)
The Poisson / AR model, in contrast, needed to be refit many times to ensure the stability of importance-weighted predictions. We would generally expect more refits for this model as the underlying trend has less “memory” than the GP trend, so its forecasts could become irrelevant more quickly as a result.
Compare ELPDs for the two models across all evaluation timepoints where observations weren't missing by plotting the differences in their scores. This plot will give us an indication of whether one model consistently gives better h-step ahead forecasts when trained on successively more data
# determine which timepoints were not missing in the evaluation set
nonmiss_evalpoints <- which(!is.na(model_data$recruits[26:NROW(model_data)]))
# plot the GP ELPD values minus the AR ELPD values
# values will be > 0 if the GP model gave better predictions
# (ELPD is better when larger)
gp_minus_ar <- lfo_gp$elpds[nonmiss_evalpoints] - lfo_ar$elpds[nonmiss_evalpoints]
plot(gp_minus_ar,
ylab = 'GP - AR ELPD',
xlab = 'Time point',
pch = 16,
bty = 'l',
cex = 1,
col = 'white',
xaxt = 'n')
axis(side = 1, at = nonmiss_evalpoints,
labels = lfo_gp$eval_timepoints[nonmiss_evalpoints])
abline(h = 0, lty = 'dashed', lwd = 2)
# show all ELPD comparisons in black
points(gp_minus_ar, pch = 16, cex = 0.85)
# highlight the comparisons for when the GP predictions were better
points(ifelse(gp_minus_ar > 0, gp_minus_ar, NA),
pch = 16, cex = 0.85, col = 'darkred')
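A simple numeric summary of the same comparison is the proportion of evaluated timepoints at which the GP model scored better (a quick sketch using the gp_minus_ar vector computed above):
# proportion of non-missing evaluation timepoints where the GP model's
# ELPD was higher than the AR model's
mean(gp_minus_ar > 0, na.rm = TRUE)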
The Gaussian Process / Negative Binomial model tends to give better probabilistic predictions throughout the leave-future-out exercise. This result strongly agrees with the results from the leave-one-out approximator above, which makes model selection easy in this case. In-sample residuals for the Gaussian Process / Negative Binomial model look fairly good as well, suggesting we should definitely stick with this model going forward:
plot(model3, type = 'residuals')
If we have enough computational power and our model is simple / small
enough, it can often be worthwhile to compute exact leave-future-out
forecasts. The update.mvgam
function makes this relatively
simple to do in mvgam
, as all we need to do is supply the
new testing and training datasets and the model will be updated. Below I
show how to do this for our chosen model (model3
) by
splitting on timepoint 36 to create training and testing data.
data_train = model_data %>%
dplyr::filter(time <= 36)
data_test = model_data %>%
dplyr::filter(time > 36)
Now refit the model using the update.mvgam
function.
Note that you can supply an argument to state lfo = TRUE
if
you wish to ignore the computation of residuals and lighten the final
model. This can be helpful if all you plan to do is compare forecasts,
as it will make repeated calls to the update
function
faster. See ?mvgam::update.mvgam
for more details
model3_exact <- update(model3,
data = data_train,
newdata = data_test,
lfo = TRUE)
Once this model is finished, we are ready to compute exact forecast
evaluations. The first step in this process is to create an object of
class mvgam_forecast
using the forecast.mvgam
function:
fc <- forecast(model3_exact)
We've seen this class before when calling the hindcast.mvgam or forecast.mvgam functions, but we haven't looked in depth at the other ways we can use objects of this class. A primary purpose is to readily allow forecast evaluations for each series in the data, using a variety of possible scoring functions. See ?mvgam::score.mvgam_forecast to view the types of scores that are available. For these data, which have very large counts, a useful scoring metric is the Continuous Ranked Probability Score (CRPS). A CRPS value is similar to what we might get if we calculated a weighted absolute error using the full forecast distribution.
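For intuition about what this score measures, the CRPS for a single observation can be computed directly from a sample of forecast draws, for example with the scoringRules package (a sketch using made-up numbers, not drawn from our model):
library(scoringRules)
# a made-up forecast sample and a made-up observed value
fc_draws_sim <- rnbinom(1000, mu = 5000, size = 2)
crps_sample(y = 4200, dat = fc_draws_sim)
The score() call below returns a CRPS value for each evaluation timepoint in the forecast object: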
scores <- score(fc, score = 'crps')
scores$sockeye
## score in_interval interval_width eval_horizon score_type
## 1 6022.7278 0 0.9 1 crps
## 2 3560.9737 1 0.9 2 crps
## 3 6604.2062 1 0.9 3 crps
## 4 6794.8664 1 0.9 4 crps
## 5 881.5366 1 0.9 5 crps
## 6 1757.0302 1 0.9 6 crps
## 7 2622.2656 1 0.9 7 crps
## 8 NA NA 0.9 8 crps
## 9 NA NA 0.9 9 crps
## 10 NA NA 0.9 10 crps
## 11 NA NA 0.9 11 crps
## 12 NA NA 0.9 12 crps
## 13 NA NA 0.9 13 crps
## 14 NA NA 0.9 14 crps
The returned data.frame
now shows the CRPS score for
each evaluation in the testing data, along with some other useful
information about the fit of the forecast distribution. In particular,
we are given a logical value (1s and 0s) telling us whether the true
value was within a pre-specified credible interval (i.e. the coverage of
the forecast distribution). The default interval width is 0.9, so we
would hope that the values in the in_interval
column take a
1 approximately 90% of the time. This value can be changed if you wish
to compute different coverages, say using a 60% interval:
scores <- score(fc, score = 'crps', interval_width = 0.6)
scores$sockeye
## score in_interval interval_width eval_horizon score_type
## 1 6022.7278 0 0.6 1 crps
## 2 3560.9737 0 0.6 2 crps
## 3 6604.2062 0 0.6 3 crps
## 4 6794.8664 1 0.6 4 crps
## 5 881.5366 1 0.6 5 crps
## 6 1757.0302 0 0.6 6 crps
## 7 2622.2656 0 0.6 7 crps
## 8 NA NA 0.6 8 crps
## 9 NA NA 0.6 9 crps
## 10 NA NA 0.6 10 crps
## 11 NA NA 0.6 11 crps
## 12 NA NA 0.6 12 crps
## 13 NA NA 0.6 13 crps
## 14 NA NA 0.6 14 crps
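A quick sketch of turning the in_interval flags into an empirical coverage estimate (ignoring the horizons with missing observations):
# proportion of scored horizons where the truth fell inside the 60% interval
mean(scores$sockeye$in_interval, na.rm = TRUE)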
Although we would prefer to use distributional forecast evaluations,
point evaluations are also possible using outputs from the
forecast.mvgam
function. For example, we can calculate the
Mean Absolute Percentage Error (MAPE), which is a commonly used metric
to evaluate point forecasts when the scales of the time series vary
quite a lot:
# extract the true values from the test set
truth <- fc$test_observations$sockeye
# calculate mean prediction values for each column in the forecast matrix
mean_fc <- apply(fc$forecasts$sockeye, 2, mean)
# calculate the absolute percentage errors, scaled here by the mean
# forecast values. Note that we only use the first 7 observations
# in the test set because the rest are NA
p <- abs(100*((truth[1:7] - mean_fc[1:7]) / mean_fc[1:7]))
# take the mean of the absolute percentage errors
mean(p)
## [1] 73.3368
Finally, we can look at one more type of score. The lfo_cv function we used above for approximate leave-future-out CV currently can only use the ELPD to score forecasts. This is fine, as the ELPD is a strictly proper scoring rule that can be applied to any distributional forecast, but currently there is no straightforward way to calculate it for an object of class mvgam_forecast like our fc object. We can, however, calculate it for all data that were fed to the model (in this case the training and testing data) using the logLik.mvgam function. For example, we can calculate the predictive density for our refit like this:
# calculate predictive density values for each posterior draw for the
# training and testing observations
log_scores <- logLik(model3_exact, n_cores = 2)
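Before unpacking what these values represent, a quick sketch of the shape of the returned object (one row per posterior draw, one column per observation):
# draws x observations matrix of log-scores
dim(log_scores)
# per-observation averages of the log-scores
head(apply(log_scores, 2, mean, na.rm = TRUE))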
This function returns the log(probability) of the observation given
the distributional assumptions (i.e. that the truth is distributed
Negative Binomial with a mean centred around the predicted expectation
and overdispersion given by the estimated \(\phi\) parameter). We can illustrate how
this works with a simple example. Assume our model gave a posterior
expectation of 21 for a particular timepoint. The model has also
estimated an overdispersion parameter of 5, and the true observed value
at this timepoint is 39. The log score is then calculated using the
correct density function (dnbinom
in this case; see
?dnbinom
for details):
dnbinom(x = 39, mu = 21, size = 5, log = TRUE)
## [1] -4.849416
We would get a higher score (i.e. a better score) if our model’s expectation was closer to the truth:
dnbinom(x = 39, mu = 41, size = 5, log = TRUE)
## [1] -3.860454
In contrast, we would get a lower score (i.e. a worse score) if the expectation was further from the truth:
dnbinom(x = 39, mu = 12, size = 5, log = TRUE)
## [1] -7.979571
The logLik.mvgam function does this for us, computing the log(probability) scores using the correct distribution for the model and returning a matrix of scores (one for each posterior draw * observation combination). The ELPD for each out-of-sample observation is calculated by taking the mean of these scores. This option is available in mvgam as well, but note that you must use type = 'link' when producing forecasts so that expectations can be calculated. For this reason, interval coverages are not calculated when using type = 'elpd':
fc_linpreds <- forecast(model3_exact, type = 'link')
scores <- score(fc_linpreds, score = 'elpd')
scores$sockeye
## score eval_horizon score_type
## 1 -10.330861 1 elpd
## 2 -9.608175 2 elpd
## 3 -10.263423 3 elpd
## 4 -10.411050 4 elpd
## 5 -8.482237 5 elpd
## 6 -8.875313 6 elpd
## 7 -9.331130 7 elpd
## 8 NA 8 elpd
## 9 NA 9 elpd
## 10 NA 10 elpd
## 11 NA 11 elpd
## 12 NA 12 elpd
## 13 NA 13 elpd
## 14 NA 14 elpd
The pattern in these scores reflects the pattern from the
crps
scores above (remember that for ELPD, the goal is to
achieve higher scores).
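To see this side by side, here is a sketch that recomputes both sets of scores and plots them by forecast horizon (remember that lower is better for CRPS while higher is better for ELPD):
# recompute both score types for a per-horizon comparison
crps_scores <- score(fc, score = 'crps')$sockeye
elpd_scores <- score(fc_linpreds, score = 'elpd')$sockeye
layout(matrix(1:2, ncol = 2))
plot(crps_scores$eval_horizon, crps_scores$score, type = 'b', pch = 16,
     bty = 'l', xlab = 'Forecast horizon', ylab = 'CRPS (lower is better)')
plot(elpd_scores$eval_horizon, elpd_scores$score, type = 'b', pch = 16,
     bty = 'l', xlab = 'Forecast horizon', ylab = 'ELPD (higher is better)')
layout(1)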
Exercises
1. Calculate an additional point evaluation for the fc object using the formula provided by Hyndman and Athanasopoulos.
2. Refit model3 after splitting the data on timepoint 37 rather than 36. Compare CRPS values from this refit to the previous one to see whether the scores for timepoints 38-43 have changed when conditioning on the extra information in timepoint 37.
sessionInfo()
## R version 4.3.1 (2023-06-16 ucrt)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 11 x64 (build 22621)
##
## Matrix products: default
##
##
## locale:
## [1] LC_COLLATE=English_Australia.utf8 LC_CTYPE=English_Australia.utf8
## [3] LC_MONETARY=English_Australia.utf8 LC_NUMERIC=C
## [5] LC_TIME=English_Australia.utf8
##
## time zone: Australia/Brisbane
## tzcode source: internal
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] ggplot2_3.5.0 tidybayes_3.0.6
## [3] mvgam_1.1.0 insight_0.19.7
## [5] marginaleffects_0.17.0.9002 brms_2.21.0
## [7] Rcpp_1.0.12 mgcv_1.8-42
## [9] nlme_3.1-162 dplyr_1.1.4
## [11] downloadthis_0.3.3
##
## loaded via a namespace (and not attached):
## [1] tidyselect_1.2.1 svUnit_1.0.6 farver_2.1.1
## [4] scoringRules_1.1.1 loo_2.6.0 fastmap_1.1.1
## [7] tensorA_0.36.2.1 digest_0.6.34 timechange_0.2.0
## [10] mime_0.12 lifecycle_1.0.4 StanHeaders_2.26.28
## [13] magrittr_2.0.3 posterior_1.5.0 compiler_4.3.1
## [16] rlang_1.1.3 sass_0.4.8 tools_4.3.1
## [19] utf8_1.2.4 yaml_2.3.8 collapse_2.0.7
## [22] data.table_1.14.10 knitr_1.45 labeling_0.4.3
## [25] bridgesampling_1.1-2 pkgbuild_1.4.3 plyr_1.8.9
## [28] abind_1.4-5 klippy_0.0.0.9500 withr_3.0.0
## [31] purrr_1.0.2 grid_4.3.1 stats4_4.3.1
## [34] fansi_1.0.6 colorspace_2.1-0 inline_0.3.19
## [37] scales_1.3.0 MASS_7.3-60 cli_3.6.2
## [40] mvtnorm_1.2-4 rmarkdown_2.25 generics_0.1.3
## [43] RcppParallel_5.1.7 rstudioapi_0.15.0 reshape2_1.4.4
## [46] cachem_1.0.8 rstan_2.32.3 stringr_1.5.1
## [49] splines_4.3.1 bayesplot_1.10.0 assertthat_0.2.1
## [52] parallel_4.3.1 matrixStats_1.2.0 base64enc_0.1-3
## [55] vctrs_0.6.5 Matrix_1.6-4 jsonlite_1.8.8
## [58] arrayhelpers_1.1-0 ggdist_3.3.1 jquerylib_0.1.4
## [61] tidyr_1.3.1 glue_1.7.0 codetools_0.2-19
## [64] distributional_0.3.2 lubridate_1.9.3 stringi_1.8.3
## [67] gtable_0.3.4 QuickJSR_1.0.9 munsell_0.5.0
## [70] tibble_3.2.1 pillar_1.9.0 htmltools_0.5.7
## [73] Brobdingnag_1.2-9 R6_2.5.1 evaluate_0.23
## [76] lattice_0.21-8 highr_0.10 backports_1.4.1
## [79] bslib_0.6.1 rstantools_2.3.1.1 bsplus_0.1.4
## [82] zip_2.3.0 coda_0.19-4.1 gridExtra_2.3
## [85] checkmate_2.3.1 xfun_0.42 fs_1.6.3
## [88] pkgconfig_2.0.3