The data and modelling objects created in this notebook can be downloaded directly to save computational time.
Users who wish to complete the exercises can download a small
template R
script. Assuming you have already downloaded the
data objects above, this script will load all data objects so that the
steps used to create them are not necessary to tackle the exercises.
This tutorial relates to content covered in Lecture 4, and relies on the following packages for manipulating data, shaping time series, fitting dynamic regression models and plotting:
library(dplyr)
library(mvgam)
library(tidybayes)
library(ggplot2); theme_set(theme_classic())
library(marginaleffects)
The data we will use for this tutorial is spawner-recruit data
for sockeye salmon (Oncorhynchus nerka) from the Bristol Bay
region in Southwest Alaska. Commercial fishermen in Alaska target this
important species using seines and gillnets for fresh or frozen fillet
sales and canning. Records for the rivers in Bristol Bay span the years
1952-2005, and are maintained within the RAM Legacy Stock
Assessment Database. Spawner estimates originated from Bristol Bay
Area Annual Management Reports from the Alaska Department of Fish and
Game, and are based on in-river spawner counts (including both tower
counts and aerial surveys). Age-specific catch rates were calculated
from records of catch and escapement according to age class. These data
were used to determine the number of recruits in each year.
Below we process the data, keeping observations for a single river and taking \(log(spawners_{t})\) to use as the offset in our models.
# access the data from the ATSA Github site
load(url('https://github.com/atsa-es/atsalibrary/raw/master/data/sockeye.RData'))
sockeye %>%
# Remove any previous grouping structures
dplyr::ungroup() %>%
# select the target river
dplyr::filter(region == "KVICHAK") %>%
# filter the data to only include years when number of spawners
# was estimated, as this will be used as the offset
dplyr::filter(!is.na(spawners)) %>%
# log the spawners variable to use as an appropriate offset
dplyr::mutate(log_spawners = log(spawners)) %>%
# add a 'time' variable
dplyr::mutate(time = dplyr::row_number()) %>%
# add a series variable (as a factor)
dplyr::mutate(series = as.factor('sockeye')) %>%
# keep variables of interest
dplyr::select(time, series, log_spawners, recruits) -> model_data
The data now contain four variables:
time, the indicator of which time step each observation belongs to
series, a factor indexing which time series each observation belongs to
log_spawners, the logged number of recorded sockeye spawners (in thousands), measured as the number of returning spawners minus catch at each timepoint, which will be used as an offset in models
recruits, the response variable representing the number of estimated sockeye recruits at each timepoint (in thousands)
Check the data structure
head(model_data)
## # A tibble: 6 × 4
## time series log_spawners recruits
## <int> <fct> <dbl> <dbl>
## 1 1 sockeye 9.15 39000
## 2 2 sockeye 7.95 4090
## 3 3 sockeye 6.28 289
## 4 4 sockeye 6.52 547
## 5 5 sockeye 9.59 55300
## 6 6 sockeye 8.22 3500
dplyr::glimpse(model_data)
## Rows: 50
## Columns: 4
## $ time <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17…
## $ series <fct> sockeye, sockeye, sockeye, sockeye, sockeye, sockeye, soc…
## $ log_spawners <dbl> 9.152711, 7.951559, 6.282267, 6.522093, 9.588777, 8.21878…
## $ recruits <dbl> 39000, 4090, 289, 547, 55300, 3500, 5360, 1120, 5750, 450…
Visualize the data as a heatmap to get a sense of where any NAs are distributed (NAs are shown as red bars in the plot below)
image(is.na(t(model_data %>%
dplyr::arrange(dplyr::desc(time)))), axes = F,
col = c('grey80', 'darkred'))
axis(3, at = seq(0,1, len = NCOL(model_data)), labels = colnames(model_data))
Plot properties of the time series, which indicate some overdispersion in the recruit count values that we’ll have to address
plot_mvgam_series(data = model_data, y = 'recruits')
mvgam
We will fit a number of models to these data to showcase the various methods that are available to us for evaluating and comparing dynamic GLMs / GAMs. We begin with a model that does not include a dynamic component, as this is always a useful starting point. It also allows us to highlight another useful feature of GLMs / GAMs when modelling count data, as we can include an offset variable that allows us to model rates of interest.
Sometimes it is more relevant to model the rate at which an event occurs per some underlying unit. For example, in epidemiological modelling the interest is often in the number of disease cases per head of population, but we never get to observe this rate directly. An offset allows us to model the underlying rate by algebraically accounting for variation in the number of possible counts in each observation unit. This works naturally in models that use a log link function, such as Poisson and Negative Binomial models, and including an offset is straightforward in most R modelling software, mvgam included. Our first model addresses the overdispersion in the observed counts with a Negative Binomial observation model and no trend, and its linear predictor contains only the implicit intercept plus an offset of the log_spawners variable that we calculated above, so that we estimate the rate of recruitment per spawner. As in previous tutorials, only 250 samples per chain are used to keep the size of the resulting object small (though we'd always want to use more samples in a real analysis)
model1 <- mvgam(recruits ~ offset(log_spawners),
family = nb(),
trend_model = 'None',
data = model_data,
samples = 250)
The model can be described mathematically as follows:
\[\begin{align*}
\boldsymbol{recruits}_t & \sim \text{NegBinomial}(\lambda_t, \phi) \\
log(\lambda_t/\boldsymbol{spawners_{t}}) & = \alpha \\
log(\lambda_t) & = \alpha + 1*log(\boldsymbol{spawners_{t}}) \\
\alpha & \sim \text{Normal}(0, 1) \\
\phi^{-1} & \sim \text{StudentT}(3, 0, 0.1)
\end{align*}\]
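To build intuition for how the offset enters the linear predictor, here is a minimal simulation sketch (using base R's glm() rather than mvgam, with made-up spawner numbers and a made-up true rate) showing that including log(exposure) as an offset in a log-link model recovers a per-unit rate:
# simulate counts whose expectation is a fixed rate multiplied by an exposure
set.seed(1)
spawners_sim <- round(runif(200, 500, 5000))
true_rate <- 3
counts_sim <- rpois(200, lambda = true_rate * spawners_sim)
# fit a log-link model with the exposure included as an offset
sim_mod <- glm(counts_sim ~ offset(log(spawners_sim)), family = poisson())
# the exponentiated intercept should be close to the true rate of 3
exp(coef(sim_mod))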
Note that this is not a classic density-dependent stock-recruitment model such as the Ricker or Beverton-Holt models. Rather, this model very simplistically assumes recruits can increase without bound as a function of the number of spawners.
The inclusion of the offset is important if we wish to make predictions on the scale of the observations, because no regression coefficient is assigned to this variable (its coefficient is fixed at 1). If you wish to make predictions manually, you'll need to take care to include the offset. It is returned as an attribute of the linear predictor matrix when we call predict.gam on the underlying mgcv model:
lpmatrix <- predict(model1$mgcv_model, type = 'lpmatrix')
attr(lpmatrix, 'model.offset')
## [1] 9.152711 7.951559 6.282267 6.522093 9.588777 8.218787 7.855545
## [8] 5.826000 6.863803 10.098232 8.237479 8.077137 7.847763 9.034796
## [15] 9.539644 7.779049 6.917706 5.424950 8.396155 9.480368 7.585789
## [22] 7.200425 8.330864 9.323669 9.769956 7.467371 7.038784 8.180321
## [29] 9.259131 8.883224 7.073270 8.711114 8.311398 9.026418 8.849371
## [36] 8.347590 8.461680 8.301522 9.028818 9.210340 7.279319 7.313220
## [43] 7.740664 8.732305 7.512071 7.003065 6.556778 7.432484 8.612503
## [50] 7.749322
To make predictions on the outcome scale, we might do something like the following:
# extract posterior estimates of the observation linear predictor betas
betas_post <- as.data.frame(model1, variable = 'betas')
# generate the linear predictor matrix for the fitted data
# (note, you can supply newdata here to predict for new scenarios)
lpmatrix <- predict(model1$mgcv_model, type = 'lpmatrix')
# set up a matrix to store linear predictor values
linpreds <- matrix(NA, nrow = NROW(betas_post), ncol = NROW(lpmatrix))
# loop across all posterior estimates
for(i in 1:NROW(betas_post)){
# multiply covariate values by the beta estimates (including intercepts)
linpreds[i, ] <- lpmatrix %*% betas_post[i, ] +
# add the offset, which will be equal to 0 if no offset was
# supplied to the model
as.vector(attr(lpmatrix, 'model.offset'))
}
# plot a histogram to see the distribution of these values
hist(linpreds,
col = 'darkred',
border = 'white',
xlab = expression(Linear~predictor~(log[mu])),
ylab = '',
yaxt = 'n',
main = '',
lwd = 2)
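The loop above is written for clarity; the same computation can be done in a single matrix operation. A sketch, assuming the betas_post and lpmatrix objects created above:
# loop-free version: each row of the result is one posterior draw
linpreds_mat <- as.matrix(betas_post) %*% t(lpmatrix) +
  matrix(as.vector(attr(lpmatrix, 'model.offset')),
         nrow = NROW(betas_post), ncol = NROW(lpmatrix), byrow = TRUE)
# should match the loop-based calculation (ignoring dimnames)
all.equal(linpreds, linpreds_mat, check.attributes = FALSE)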
This same type of process underlies many of mvgam's prediction / plotting functions. For example, to plot partial effects of smooth terms, the plot_mvgam_smooth function sets up a grid of evenly-spaced hypothetical values as newdata to predict the contribution of the smooth term to the linear predictor. It does this by essentially ignoring the contributions of all other predictors (setting their values to 0 in the linear predictor matrix). The same process also underlies the functions used to plot conditional effects and posterior contrasts (you can learn a great deal about how these newdata grids are set up by looking over the documentation in the marginaleffects book and by calling ?marginaleffects::datagrid)
Once these predictions are made for the linear predictor, you could
then compute posterior expectations (in this case by
exponentiating the values with exp(linpreds)
to “undo” the
log link). Or you could compute posterior predictions by
incorporating the uncertainty in the observation model:
# extract posterior estimates of the overdispersion parameter phi
phi_post <- as.data.frame(model1, variable = 'obs_params')
# set up a matrix to store posterior predictions
ypreds <- matrix(NA, nrow = NROW(betas_post), ncol = NROW(lpmatrix))
# loop across all posterior estimates
for(i in 1:NROW(betas_post)){
# take Negative Binomial draws, being sure to exponentiate the
# linear predictors and to use the appropriate overdispersion
# estimate for the 'size' argument
ypreds[i, ] <- rnbinom(n = NROW(lpmatrix),
mu = exp(linpreds[i, ]),
size = phi_post[i,])
}
# plot a histogram to see the distribution of these values
hist(ypreds,
xlim = c(0, quantile(as.vector(ypreds), probs = 0.95)),
breaks = 150,
col = 'darkred',
border = 'white',
xlab = 'Posterior predictions',
ylab = '',
yaxt = 'n',
main = '',
lwd = 2)
mvgam
makes this process straightforward using the
predict.mvgam
, hindcast.mvgam
and
forecast.mvgam
functions. For example, we could replicate
the linear prediction calculations using the following:
linpreds2 <- predict(model1,
type = 'link',
process_error = FALSE)
all.equal(linpreds, linpreds2)
## [1] "Attributes: < Length mismatch: comparison on first 1 components >"
## [2] "Attributes: < Component \"dim\": Mean relative difference: 0.9485714 >"
## [3] "Numeric: lengths (50000, 200) differ"
Notice how we explicitly set process_error to FALSE
so
that the calculations of the linear predictor will ignore any
contributions of dynamic trends. This isn’t applicable in this example,
but it becomes important to consider when making predictions from
dynamic models. We could also compute expectations using
predict.mvgam
:
linpreds3 <- predict(model1, type = 'expected', process_error = FALSE)
all.equal(exp(linpreds), linpreds3)
## [1] "Attributes: < Length mismatch: comparison on first 1 components >"
## [2] "Attributes: < Component \"dim\": Mean relative difference: 0.9485714 >"
## [3] "Numeric: lengths (50000, 200) differ"
And finally we can calculate posterior predictions, which consider uncertainty in the Negative Binomial sampling distribution:
ypreds2 <- predict(model1, type = 'response', process_error = FALSE)
all.equal(ypreds, ypreds2)
## [1] "Attributes: < Length mismatch: comparison on first 1 components >"
## [2] "Attributes: < Component \"dim\": Mean relative difference: 0.9485714 >"
## [3] "Numeric: lengths (50000, 200) differ"
These values will not be identical due to the random nature
of the sampling distribution. If you are computing predictions /
expectations for new data, the predict.mvgam
function
should be your primary choice. But if you merely want to return
predictions / expectations for the training data and/or any testing data
supplied to the model, you will often be better off using the
hindcast.mvgam
and forecast.mvgam
functions.
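As a quick sketch of the first case (the newdata values below are made up for illustration; the columns supplied in newdata mirror those used to fit the model):
# predict expected recruitment for some hypothetical spawner abundances
newdat <- data.frame(time = max(model_data$time) + 1:3,
                     series = factor('sockeye',
                                     levels = levels(model_data$series)),
                     log_spawners = log(c(2000, 4000, 8000)),
                     recruits = NA)
predict(model1, newdata = newdat, type = 'expected',
        process_error = FALSE)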
For the second case, here are the hindcasts of the expectations, which may differ very slightly from those we computed above due to variation in the floating point arithmetic used by Stan and R:
hc_expectations <- hindcast(model1, type = 'expected')
all.equal(exp(linpreds), hc_expectations$hindcasts$sockeye)
## [1] "Mean relative difference: 1.275639e-05"
As we’ve seen before, these can be plotted in the same general way as
other mvgam
plotting functions:
plot(hc_expectations)
The model summary is shown below, which indicates there are no major sampling issues to worry about
summary(model1)
## GAM formula:
## recruits ~ offset(log_spawners)
##
## Family:
## negative binomial
##
## Link function:
## log
##
## Trend model:
## None
##
## N series:
## 1
##
## N timepoints:
## 50
##
## Status:
## Fitted using Stan
## 4 chains, each with iter = 750; warmup = 500; thin = 1
## Total post-warmup draws = 1000
##
##
## Observation dispersion parameter estimates:
## 2.5% 50% 97.5% Rhat n_eff
## phi[1] 1.2 1.7 2.3 1 615
##
## GAM coefficient (beta) estimates:
## 2.5% 50% 97.5% Rhat n_eff
## (Intercept) 0.61 0.82 1.1 1.01 573
##
## Stan MCMC diagnostics:
## n_eff / iter looks reasonable for all parameters
## Rhat looks reasonable for all parameters
## 0 of 1000 iterations ended with a divergence (0%)
## 0 of 1000 iterations saturated the maximum tree depth of 12 (0%)
## E-FMI indicated no pathological behavior
##
## Samples were drawn using NUTS(diag_e) at Tue May 07 11:49:43 AM 2024.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split MCMC chains
## (at convergence, Rhat = 1)
Note though that the hindcast.mvgam
and
forecast.mvgam
functions simply extract any predictions
that were made in the original model, so will always include uncertainty
in any underlying dynamic components.
Posterior predictive checks are useful for understanding how well
your model can capture particular features of your observed data. In
mvgam
, the pp_check()
function is particularly
helpful as it relies on the very broad range of checking functions that
are available in bayesplot
. For example, we can plot a
density kernel of our observed counts and overlay simulated kernels from
our model. These simulated kernels are generated using
predict.mvgam()
, so they are informative for all
mvgam
models (regardless of whether or not the model
included dynamic components).
pp_check(model1, type = 'dens_overlay', ndraws = 25)
## Warning in pp_check.mvgam(model1, type = "dens_overlay", ndraws = 25): NA
## responses are not shown in 'pp_check'.
The model generally tends to capture the shape of the observed data. We can examine calibration more formally using the empirical CDF of the probability integral transform (PIT) values, which again compares observed vs predicted values. For a well-calibrated model the PIT values should be approximately standard uniform, so the observed CDF (in dark red) should fall within the uncertainty envelope (in light red).
pp_check(model1, type = 'pit_ecdf')
## Using all posterior draws for ppc type 'pit_ecdf' by default.
## Warning in pp_check.mvgam(model1, type = "pit_ecdf"): NA responses are not
## shown in 'pp_check'.
The pattern suggests the model is fitting the data well, which we can further confirm by comparing observed vs simulated CDFs:
pp_check(model1, type = 'ecdf_overlay')
## Using 10 posterior draws for ppc type 'ecdf_overlay' by default.
## Warning in pp_check.mvgam(model1, type = "ecdf_overlay"): NA responses are not
## shown in 'pp_check'.
pp_check() even accepts custom functions. For example, if we want to know whether our model simulates counts < 1000 as frequently as they occur in the true data, we can use:
prop_under1000 <- function(x){
length(which(x < 1000)) / length(x)
}
pp_check(model1, type = "stat", stat = "prop_under1000")
## Using all posterior draws for ppc type 'stat' by default.
## Warning in pp_check.mvgam(model1, type = "stat", stat = "prop_under1000"): NA
## responses are not shown in 'pp_check'.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
There are many other types of posterior predictive checks that can be implemented, and it is highly recommended that you design some to suit your own specific needs (i.e., which aspect(s) of the data are you most concerned that your model can adequately capture?). See ?mvgam::pp_check for details
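For instance, here is a sketch of one more custom statistic (max_count is just a name chosen here) to check whether the model can reproduce the largest counts seen in the observed series:
# custom statistic: the largest simulated count in each replicate dataset
max_count <- function(x){
  max(x, na.rm = TRUE)
}
pp_check(model1, type = "stat", stat = "max_count")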
Plot Dunn-Smyth residuals and posterior hindcasts to further inspect model fit
plot(model1, type = 'residuals')
plot(model1, type = 'forecast')
Here it is worth quickly revisiting the uncertainty in these
predicted values, and how that differs from uncertainty in our expected
values (calculated above as hc_expectations
). Remember that
our goal is to produce a forecast distribution because we do
not know what the future holds and we can therefore consider it to be a
random variable drawn from some probability distribution. Often
we visualise forecasts as lines and ribbons to show particular summaries
of this forecast distribution, but it can be easier to understand where
these summaries come from if we instead view realizations from
this distribution (i.e. random draws). For example, here are 5
realizations from our expected hindcast values…
plot(hc_expectations, realisations = TRUE, n_realisations = 5)
…and here are 20 realizations:
plot(hc_expectations, realisations = TRUE, n_realisations = 20)
Note how there isn’t much uncertainty in these values because all uncertainty is coming from variation in the intercept term \(\alpha\). In sharp contrast, there is larger uncertainty in our hindcast predictions because these predictions also consider variation in the Negative Binomial sampling distribution (including uncertainty in the overdispersion parameter \(\phi\)). As above, here are 5 realizations for the hindcast predictions…
hc_predictions <- hindcast(model1, type = 'response')
plot(hc_predictions, realisations = TRUE, n_realisations = 5)
…and here are 20 realizations:
plot(hc_predictions, realisations = TRUE, n_realisations = 20)
predict vs hindcast
Back to the modelling strategy. Although the model is trying to capture the dispersion in the data (the summary above shows that the \(\phi\) parameter is estimated to be very small, indicating a large amount of overdispersion remaining after accounting for the offset), there are still some patterns in the residuals, including substantial autocorrelation at a lag of 1. We need a dynamic model to capture this autocorrelation, so next we extend the model by including a latent AR1 residual process
model2 <- mvgam(recruits ~ offset(log_spawners),
family = nb(),
trend_model = AR(),
data = model_data)
Because this model has a dynamic trend component, it is worth quickly
revisiting the differences between the predict.mvgam
function and the hindcast.mvgam
/
forecast.mvgam
functions. First, we investigate how
ignoring the uncertainty in the trend component changes the predictions.
In this step, we calculate predictions for the observed time period in
two ways: first we average over the uncertainty from the latent trend,
and second we ignore this uncertainty. Plotting both of these together
shows how our prediction uncertainty changes when we include the
uncertainty in the dynamic trend component. A crucial detail to understand here is that the predict.mvgam function does not account for the actual 'times' of the data; rather, it assumes the dynamic process has reached stationarity and simply samples from the following distribution for the trend: \(z_t \sim \text{Normal}(0, \sigma_{error})\)
# calculate predictions for the training data by including the dynamic process
# uncertainty
ypreds <- predict(model2, type = 'response',
process_error = TRUE, summary = FALSE)
# calculate 95% empirical quantiles of predictions
cred <- sapply(1:NCOL(ypreds),
function(n) quantile(ypreds[,n],
probs = c(0.025, 0.975), na.rm = T))
pred_vals <- 1:NCOL(ypreds)
# graphical plotting parameters
layout(matrix(1:2, ncol = 2))
par(mar = c(2.5, 2.3, 2, 2),
oma = c(1, 1, 0, 0),
mgp = c(1.5, 0.5, 0))
# plot credible intervals
plot(1, type = "n", bty = 'L',
xlab = 'Time',
ylab = 'predict.mvgam predictions',
xlim = c(min(pred_vals), max(pred_vals)),
ylim = c(min(cred, na.rm = T),
max(cred, na.rm = T)))
polygon(c(pred_vals, rev(pred_vals)), c(cred[1,], rev(cred[2,])),
col = 'darkred', border = NA)
# now repeat, but this time ignore uncertainty in the dynamic component
ypreds2 <- predict(model2, type = 'response',
process_error = FALSE, summary = FALSE)
cred2 <- sapply(1:NCOL(ypreds2),
function(n) quantile(ypreds2[,n],
probs = c(0.025, 0.975), na.rm = T))
# overlay these predictions as a lighter colour
polygon(c(pred_vals, rev(pred_vals)), c(cred2[1,], rev(cred2[2,])),
col = "#DCBCBC", border = NA)
# overlay the actual observations as well
points(model_data$recruits, pch = 16, cex = 0.8, col = 'white')
points(model_data$recruits, pch = 16, cex = 0.65)
box(bty = 'l', lwd = 2)
# now plot the actual hindcasts, which did not assume the process was
# stationary but instead modelled its dynamics over the training time.
# use the same y-axis limits to see how these plots differ
plot_mvgam_fc(model2, ylim = c(min(cred, na.rm = T),
max(cred, na.rm = T)),
ylab = 'Hindcast predictions')
layout(1)
As you can see, the predictions differ. The
hindcast.mvgam
and forecast.mvgam
functions
use the time component of the data, so it is crucial that any
newdata
fed to the forecast.mvgam
function
follows on sequentially from the data that was used to fit the model
(this is not internally checked by the package because it might be a
headache to do so when data are not supplied in a specific time-order).
But the predictions from these functions might not be as useful for
scenario-planning because any scenarios would then be required to have
an explicit time dimension. Sometimes we’d rather get a sense of how
predictions might change on average if some external conditions
changed, for example if temperature were to increase by x degrees or if
we were to sample from a new group for a random effect. For exploring
these types of scenarios, the predict.mvgam
function is
more useful. This is the function that is used in any calls to
plot_predictions
, for example.
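As a sketch of this (the call below mirrors the marginaleffects interface and was not part of the original analysis; it assumes the offset variable can be conditioned on like any other predictor), we could visualise conditional predictions across a range of log_spawners values:
# conditional predictions across the observed range of log_spawners
plot_predictions(model2, condition = 'log_spawners')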
As another illustration of the many uses for the predict.mvgam function, here we can compute how many more recruits this model would expect to see if there were 5,000 spawners recorded vs only 3,000. We can take full advantage of the avg_comparisons function from the marginaleffects package to do this (see ?marginaleffects::avg_comparisons for details)
# take draws from the model that compute the average comparison between
# spawners recorded at 5,000 vs 3,000 (be sure to log first!)
post_contrasts <- avg_comparisons(model2,
variables = list(log_spawners = log(c(3000, 5000)))) %>%
posteriordraws()
# use the resulting posterior draw object to plot a density of the
# posterior contrasts
post_contrasts %>%
ggplot(aes(x = draw)) +
# use the stat_halfeye function from tidybayes for a nice visual
stat_halfeye(fill = "#C79999") +
labs(x = "(spawners = 5,000) − (spawners = 3,000)", y = "Density",
title = "Average posterior contrast") +
theme_classic()
Exercises
1. Plot a histogram of model1's posterior intercept coefficients \(\alpha\). Use ?mvgam::as.data.frame.mvgam for guidance.
2. Plot recruits vs log_spawners using the supplied model_data. Take a few notes on whether you think our primary modelling assumption, that log(recruits) scales linearly with log_spawners, is justified.
# Replace the ? with the correct value(s)
# check names of variables that can be extracted
vars_available <- variables(model1)
# we are interested in the observation model coefficients
vars_available$observation_betas
# extract the intercepts using the as.data.frame.mvgam function
# you will have two options for extracting this variable. first, you
# can use the alias
alphas <- as.data.frame(model1, variable = ?)
# alternatively, use the original name
alphas <- as.data.frame(model1, variable = ?)
# plot a histogram
hist(alphas[,1], xlab = expression(alpha),
main = '', col = 'darkred', border = 'white')
Now on to the topic of model comparisons. Plot Dunn-Smyth residuals for this dynamic model to see that there is no more autocorrelation
plot(model2, type = 'residuals')
However we must be cautious interpreting these residuals because the AR1 trend model is extremely flexible. The in-sample fits can sometimes be nearly perfect, meaning the residuals will very often look excellent! We can confirm this behaviour by inspecting hindcasts
plot(model2, type = 'forecast')
But PPCs can still give valid insights into model inadequacy:
pp_check(model2, type = 'dens_overlay')
## Using 10 posterior draws for ppc type 'dens_overlay' by default.
## Warning in pp_check.mvgam(model2, type = "dens_overlay"): NA responses are not
## shown in 'pp_check'.
pp_check(model2, type = 'pit_ecdf')
## Using all posterior draws for ppc type 'pit_ecdf' by default.
## Warning in pp_check.mvgam(model2, type = "pit_ecdf"): NA responses are not
## shown in 'pp_check'.
These plots show that the model still tends to predict well. But there are some more worrying and pressing matters we can address with this model. Inspect the summary to see what I mean:
summary(model2)
## GAM formula:
## recruits ~ offset(log_spawners)
##
## Family:
## negative binomial
##
## Link function:
## log
##
## Trend model:
## AR1
##
## N series:
## 1
##
## N timepoints:
## 50
##
## Status:
## Fitted using Stan
## 4 chains, each with iter = 1000; warmup = 500; thin = 1
## Total post-warmup draws = 2000
##
##
## Observation dispersion parameter estimates:
## 2.5% 50% 97.5% Rhat n_eff
## phi[1] 4.3 16 310 1.07 64
##
## GAM coefficient (beta) estimates:
## 2.5% 50% 97.5% Rhat n_eff
## (Intercept) -0.0077 0.6 2.1 1.09 32
##
## Latent trend parameter AR estimates:
## 2.5% 50% 97.5% Rhat n_eff
## ar1[1] 0.28 0.63 0.95 1.03 106
## sigma[1] 0.53 0.72 0.94 1.01 271
##
## Stan MCMC diagnostics:
## n_eff / iter looks reasonable for all parameters
## Rhats above 1.05 found for 43 parameters
## *Diagnose further to investigate why the chains have not mixed
## 56 of 2000 iterations ended with a divergence (2.8%)
## *Try running with larger adapt_delta to remove the divergences
## 0 of 2000 iterations saturated the maximum tree depth of 12 (0%)
## Chain 4: E-FMI = 0.0674
## *E-FMI below 0.2 indicates you may need to reparameterize your model
##
## Samples were drawn using NUTS(diag_e) at Tue May 07 11:50:47 AM 2024.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split MCMC chains
## (at convergence, Rhat = 1)
The HMC sampler is telling us there are clear issues when trying to estimate both a flexible latent trend AND an overdispersion parameter. These two components compete to capture the extra dispersion in the observations, leading to low effective sample sizes and poor convergence for the overdispersion parameter \(\phi\). It turns out to be very challenging to estimate both without making some strong adjustments to the model, so we have several options for moving forward.
Here we will illustrate two of these options. First, we keep the Negative Binomial observation model but use a smoother Gaussian Process for the trend
model3 <- mvgam(recruits ~ offset(log_spawners),
family = nb(),
trend_model = 'GP',
data = model_data)
Next we use a Poisson observation model with an AR1 trend
model4 <- mvgam(recruits ~ offset(log_spawners),
family = poisson(),
trend_model = AR(),
data = model_data)
Neither of these models has any major issues with sampling, which is a good sign!
summary(model3)
## GAM formula:
## recruits ~ offset(log_spawners)
##
## Family:
## negative binomial
##
## Link function:
## log
##
## Trend model:
## GP
##
## N series:
## 1
##
## N timepoints:
## 50
##
## Status:
## Fitted using Stan
## 4 chains, each with iter = 1000; warmup = 500; thin = 1
## Total post-warmup draws = 2000
##
##
## Observation dispersion parameter estimates:
## 2.5% 50% 97.5% Rhat n_eff
## phi[1] 1.4 2.2 3.2 1 1047
##
## GAM coefficient (beta) estimates:
## 2.5% 50% 97.5% Rhat n_eff
## (Intercept) 0.19 0.74 1.3 1 1067
##
## Latent trend marginal deviation (alpha) and length scale (rho) estimates:
## 2.5% 50% 97.5% Rhat n_eff
## alpha_gp[1] 0.11 0.55 1.1 1 705
## rho_gp[1] 1.10 2.80 18.0 1 607
##
## Stan MCMC diagnostics:
## n_eff / iter looks reasonable for all parameters
## Rhat looks reasonable for all parameters
## 13 of 2000 iterations ended with a divergence (0.65%)
## *Try running with larger adapt_delta to remove the divergences
## 0 of 2000 iterations saturated the maximum tree depth of 12 (0%)
## E-FMI indicated no pathological behavior
##
## Samples were drawn using NUTS(diag_e) at Tue May 07 11:51:57 AM 2024.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split MCMC chains
## (at convergence, Rhat = 1)
summary(model4)
## GAM formula:
## recruits ~ offset(log_spawners)
##
## Family:
## poisson
##
## Link function:
## log
##
## Trend model:
## AR1
##
## N series:
## 1
##
## N timepoints:
## 50
##
## Status:
## Fitted using Stan
## 4 chains, each with iter = 1000; warmup = 500; thin = 1
## Total post-warmup draws = 2000
##
##
## GAM coefficient (beta) estimates:
## 2.5% 50% 97.5% Rhat n_eff
## (Intercept) 0.074 0.49 1.1 1.02 246
##
## Latent trend parameter AR estimates:
## 2.5% 50% 97.5% Rhat n_eff
## ar1[1] 0.30 0.57 0.86 1 824
## sigma[1] 0.63 0.77 0.98 1 1306
##
## Stan MCMC diagnostics:
## n_eff / iter looks reasonable for all parameters
## Rhat looks reasonable for all parameters
## 0 of 2000 iterations ended with a divergence (0%)
## 32 of 2000 iterations saturated the maximum tree depth of 12 (1.6%)
## *Run with max_treedepth set to a larger value to avoid saturation
## E-FMI indicated no pathological behavior
##
## Samples were drawn using NUTS(diag_e) at Tue May 07 11:53:11 AM 2024.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split MCMC chains
## (at convergence, Rhat = 1)
Hindcasts and trend estimates are very different for the two models:
plot(model3, type = 'trend')
plot(model4, type = 'trend')
plot(model3, type = 'forecast')
plot(model4, type = 'forecast')
Here the extreme flexibility of the AR1 trend is apparent in the hindcasts for model4. Hopefully this gives you more of an idea of why we can't put too much faith in residual or fitted-value diagnostics for these flexible dynamic models. So if in-sample fits and residual diagnostics can't guide reasonable comparisons of these two models, how can we compare them? One-step-ahead residuals would be more appropriate for interrogation, but computing these is extremely computationally demanding. Instead, using loo_compare(), we can ask which model would likely give better predictions for new data within the training set. This method of approximating leave-one-out prediction error (using Pareto Smoothed Importance Sampling, PSIS) is widely used for comparing Bayesian models and is generally interpreted in the same way as AIC comparisons. Here I use a modified loo() function that will be implemented in the next CRAN release of mvgam to compute in-sample LOO values (don't worry about these technicalities):
loo_hcs = function(x, ...){
x$series_names <- levels(x$obs_data$series)
logliks <- logLik(x,
linpreds = predict(x,
newdata = x$obs_data,
type = 'link',
summary = FALSE,
process_error = TRUE),
newdata = x$obs_data,
family_pars = mvgam:::extract_family_pars(x))
logliks <- logliks[,!apply(logliks, 2, function(x) all(!is.finite(x)))]
releffs <- loo::relative_eff(exp(logliks),
chain_id = sort(rep(1:x$model_output@sim$chains,
(NROW(logliks) /
x$model_output@sim$chains))))
loo::loo(logliks, r_eff = releffs, ...)
}
loo::loo_compare(list(model3 = loo_hcs(model3),
model4 = loo_hcs(model4)))
## elpd_diff se_diff
## model3 0.0 0.0
## model4 -7095298.8 1057034.9
The Gaussian Process / Negative Binomial model is strongly favoured here, yielding much higher values of the Expected Log Predictive Density (ELPD; also known as the log score). This is one useful metric for choosing among models, but we’d also like to compare them on their abilities to predict future out of sample data.
mvgam includes functions that facilitate leave-future-out comparisons. Time series models are often evaluated using an expanding window training technique, where the model is initially trained on data from \(t = 1\) to \(t = n_{train}\) and is then used to produce forecasts for the next \(h\) time steps \((t = n_{train} + 1, \ldots, n_{train} + h)\). In the next iteration, the training data are expanded by a single time point and the process is repeated. This is obviously computationally challenging for Bayesian time series models, as the number of refits can be very large. mvgam uses an approximation based on importance sampling. Briefly, we refit the model using the first \(min_t\) observations and perform a single exact \(h\)-step-ahead forecast, which is evaluated against the next \(h\) out-of-sample observations using the ELPD, as in the LOO comparisons above. Next, we approximate each successive round of expanding-window forecasts by moving forward one step at a time (for \(i\) in \(1:N_{evaluations}\)) and re-weighting draws from the model's posterior predictive distribution using PSIS. In each iteration \(i\), PSIS weights are obtained for all observations that would have been included in the model had we re-fit it. If these importance ratios are stable, we consider the approximation adequate and use the re-weighted posterior's forecast for evaluating the next holdout set of testing observations \(((min_t + i + 1):(min_t + i + h))\). This is similar to particle filtering, where forecasts are updated in light of new data by re-weighting the posterior draws using importance weights. But at some point the variability of the importance ratios will become too large and importance sampling will be unreliable. This is indicated by the estimated shape parameter k of the generalized Pareto distribution crossing a certain threshold pareto_k_threshold. Only then do we refit the model using all of the observations up to the time of the failure, restart the process, and iterate forward until the next refit is triggered. The process, which is implemented in the lfo_cv function, is computationally much more efficient, as typically only a fraction of the evaluations requires a refit (the algorithm is described in detail by Bürkner et al. 2020).
Run the approximator for the GP / Negative Binomial model (model3), setting min_t = 20 and using a forecast horizon of fc_horizon = 3
lfo_gp <- lfo_cv(model3, min_t = 20,
pareto_k_threshold = 0.5,
fc_horizon = 3, n_cores = 4)
The S3
plotting function for these lfo_cv
objects will show the Pareto-k values and ELPD values over the
evaluation time points. For the Pareto k plot, a dashed red line
indicates the specified threshold chosen for triggering model refits.
For the ELPD plot, a dashed red line indicates the bottom 10% quantile
of ELPD values. Points below this threshold may represent outliers that
were more difficult to forecast
plot(lfo_gp)
The top plot (the Pareto k plot) indicates that we only needed one additional refit to give good approximate predictions for the Gaussian Process / Negative Binomial model
lfo_ar <- lfo_cv(model4, min_t = 20,
pareto_k_threshold = 0.5,
fc_horizon = 3, n_cores = 4)
plot(lfo_ar)
The Poisson / AR model, in contrast, needed to be refit many times to ensure the stability of importance-weighted predictions. We would generally expect more refits for this model as the underlying trend has less “memory” than the GP trend, so its forecasts could become irrelevant more quickly as a result.
Compare ELPDs for the two models across all evaluation timepoints where observations weren't missing by plotting the differences in their scores. This plot will give us an indication of whether one model consistently gives better h-step ahead forecasts when trained on successively more data
# determine which timepoints were not missing in the evaluation set
nonmiss_evalpoints <- which(!is.na(model_data$recruits[26:NROW(model_data)]))
# plot the GP ELPD values minus the AR ELPD values
# values will be > 0 if the GP model gave better predictions
# (ELPD is better when larger)
gp_minus_ar <- lfo_gp$elpds[nonmiss_evalpoints] - lfo_ar$elpds[nonmiss_evalpoints]
plot(gp_minus_ar,
ylab = 'GP - AR ELPD',
xlab = 'Time point',
pch = 16,
bty = 'l',
cex = 1,
col = 'white',
xaxt = 'n')
axis(side = 1, at = nonmiss_evalpoints,
labels = lfo_gp$eval_timepoints[nonmiss_evalpoints])
abline(h = 0, lty = 'dashed', lwd = 2)
# show all ELPD comparisons in black
points(gp_minus_ar, pch = 16, cex = 0.85)
# highlight the comparisons for when the GP predictions were better
points(ifelse(gp_minus_ar > 0, gp_minus_ar, NA),
pch = 16, cex = 0.85, col = 'darkred')
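A simple numeric summary of the same comparison is the proportion of evaluated timepoints at which the GP model scored better (a quick sketch using the gp_minus_ar vector computed above):
# proportion of non-missing evaluation timepoints where the GP model's
# ELPD was higher than the AR model's
mean(gp_minus_ar > 0, na.rm = TRUE)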
The Gaussian Process / Negative Binomial model tends to give better probabilistic predictions throughout the leave-future-out exercise. This result strongly agrees with the results from the leave-one-out approximator above, which makes model selection easy in this case. In-sample residuals for the Gaussian Process / Negative Binomial model look fairly good as well, suggesting we should definitely stick with this model going forward:
plot(model3, type = 'residuals')
If we have enough computational power and our model is simple / small
enough, it can often be worthwhile to compute exact leave-future-out
forecasts. The update.mvgam
function makes this relatively
simple to do in mvgam
, as all we need to do is supply the
new testing and training datasets and the model will be updated. Below I
show how to do this for our chosen model (model3
) by
splitting on timepoint 36 to create training and testing data.
data_train = model_data %>%
dplyr::filter(time <= 36)
data_test = model_data %>%
dplyr::filter(time > 36)
Now refit the model using the update.mvgam
function.
Note that you can supply an argument to state lfo = TRUE
if
you wish to ignore the computation of residuals and lighten the final
model. This can be helpful if all you plan to do is compare forecasts,
as it will make repeated calls to the update
function
faster. See ?mvgam::update.mvgam
for more details
model3_exact <- update(model3,
data = data_train,
newdata = data_test,
lfo = TRUE)
Once this model is finished, we are ready to compute exact forecast
evaluations. The first step in this process is to create an object of
class mvgam_forecast
using the forecast.mvgam
function:
fc <- forecast(model3_exact)
We've seen this class before when calling the hindcast.mvgam or forecast.mvgam functions, but we haven't looked in depth at the other ways we can use objects of this class. A primary purpose is to readily allow forecast evaluations for each series in the data, using a variety of possible scoring functions. See ?mvgam::score.mvgam_forecast to view the types of scores that are available. For these data, which have very large counts, a useful scoring metric is the Continuous Ranked Probability Score (CRPS). A CRPS value is similar to what we might get if we calculated a weighted absolute error using the full forecast distribution.
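For intuition about what this score measures, the CRPS for a single observation can be computed directly from a sample of forecast draws, for example with the scoringRules package (a sketch using made-up numbers, not drawn from our model):
library(scoringRules)
# a made-up forecast sample and a made-up observed value
fc_draws_sim <- rnbinom(1000, mu = 5000, size = 2)
crps_sample(y = 4200, dat = fc_draws_sim)
The score() call below returns a CRPS value for each evaluation timepoint in the forecast object: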
scores <- score(fc, score = 'crps')
scores$sockeye
## score in_interval interval_width eval_horizon score_type
## 1 6022.7278 0 0.9 1 crps
## 2 3560.9737 1 0.9 2 crps
## 3 6604.2062 1 0.9 3 crps
## 4 6794.8664 1 0.9 4 crps
## 5 881.5366 1 0.9 5 crps
## 6 1757.0302 1 0.9 6 crps
## 7 2622.2656 1 0.9 7 crps
## 8 NA NA 0.9 8 crps
## 9 NA NA 0.9 9 crps
## 10 NA NA 0.9 10 crps
## 11 NA NA 0.9 11 crps
## 12 NA NA 0.9 12 crps
## 13 NA NA 0.9 13 crps
## 14 NA NA 0.9 14 crps
The returned data.frame
now shows the CRPS score for
each evaluation in the testing data, along with some other useful
information about the fit of the forecast distribution. In particular,
we are given a logical value (1s and 0s) telling us whether the true
value was within a pre-specified credible interval (i.e. the coverage of
the forecast distribution). The default interval width is 0.9, so we
would hope that the values in the in_interval
column take a
1 approximately 90% of the time. This value can be changed if you wish
to compute different coverages, say using a 60% interval:
scores <- score(fc, score = 'crps', interval_width = 0.6)
scores$sockeye
## score in_interval interval_width eval_horizon score_type
## 1 6022.7278 0 0.6 1 crps
## 2 3560.9737 0 0.6 2 crps
## 3 6604.2062 0 0.6 3 crps
## 4 6794.8664 1 0.6 4 crps
## 5 881.5366 1 0.6 5 crps
## 6 1757.0302 0 0.6 6 crps
## 7 2622.2656 0 0.6 7 crps
## 8 NA NA 0.6 8 crps
## 9 NA NA 0.6 9 crps
## 10 NA NA 0.6 10 crps
## 11 NA NA 0.6 11 crps
## 12 NA NA 0.6 12 crps
## 13 NA NA 0.6 13 crps
## 14 NA NA 0.6 14 crps
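A quick sketch of turning the in_interval flags into an empirical coverage estimate (ignoring the horizons with missing observations):
# proportion of scored horizons where the truth fell inside the 60% interval
mean(scores$sockeye$in_interval, na.rm = TRUE)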
Although we would prefer to use distributional forecast evaluations,
point evaluations are also possible using outputs from the
forecast.mvgam
function. For example, we can calculate the
Mean Absolute Percentage Error (MAPE), which is a commonly used metric
to evaluate point forecasts when the scales of the time series vary
quite a lot:
# extract the true values from the test set
truth <- fc$test_observations$sockeye
# calculate mean prediction values for each column in the forecast matrix
mean_fc <- apply(fc$forecasts$sockeye, 2, mean)
# calculate the absolute percentage errors, scaled here by the mean
# forecast values. Note that we only use the first 7 observations
# in the test set because the rest are NA
p <- abs(100*((truth[1:7] - mean_fc[1:7]) / mean_fc[1:7]))
# take the mean of the absolute percentage errors
mean(p)
## [1] 73.3368
Finally, we can look at one more type of score. The lfo_cv function we used above for approximate leave-future-out CV currently can only use the ELPD to score forecasts. This is fine, as the ELPD is a strictly proper scoring rule that can be applied to any distributional forecast, but currently there is no straightforward way to calculate it for an object of class mvgam_forecast like our fc object. We can, however, calculate it for all data that were fed to the model (in this case the training and testing data) using the logLik.mvgam function. For example, we can calculate the predictive density for our refit like this:
# calculate predictive density values for each posterior draw for the
# training and testing observations
log_scores <- logLik(model3_exact, n_cores = 2)
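Before unpacking what these values represent, a quick sketch of the shape of the returned object (one row per posterior draw, one column per observation):
# draws x observations matrix of log-scores
dim(log_scores)
# per-observation averages of the log-scores
head(apply(log_scores, 2, mean, na.rm = TRUE))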
This function returns the log(probability) of the observation given
the distributional assumptions (i.e. that the truth is distributed
Negative Binomial with a mean centred around the predicted expectation
and overdispersion given by the estimated \(\phi\) parameter). We can illustrate how
this works with a simple example. Assume our model gave a posterior
expectation of 21 for a particular timepoint. The model has also
estimated an overdispersion parameter of 5, and the true observed value
at this timepoint is 39. The log score is then calculated using the
correct density function (dnbinom
in this case; see
?dnbinom
for details):
dnbinom(x = 39, mu = 21, size = 5, log = TRUE)
## [1] -4.849416
We would get a higher score (i.e. a better score) if our model’s expectation was closer to the truth:
dnbinom(x = 39, mu = 41, size = 5, log = TRUE)
## [1] -3.860454
In contrast, we would get a lower score (i.e. a worse score) if the expectation was further from the truth:
dnbinom(x = 39, mu = 12, size = 5, log = TRUE)
## [1] -7.979571
The logLik.mvgam function does this for us, computing the log(probability) scores using the correct distribution for the model and returning a matrix of scores (one for each posterior draw * observation combination). The ELPD for each out-of-sample observation is calculated by taking the mean of these scores. This option is available in mvgam as well, but note that you must use type = 'link' when producing forecasts so that expectations can be calculated. For this reason, interval coverages are not calculated when using type = 'elpd':
fc_linpreds <- forecast(model3_exact, type = 'link')
scores <- score(fc_linpreds, score = 'elpd')
scores$sockeye
## score eval_horizon score_type
## 1 -10.330861 1 elpd
## 2 -9.608175 2 elpd
## 3 -10.263423 3 elpd
## 4 -10.411050 4 elpd
## 5 -8.482237 5 elpd
## 6 -8.875313 6 elpd
## 7 -9.331130 7 elpd
## 8 NA 8 elpd
## 9 NA 9 elpd
## 10 NA 10 elpd
## 11 NA 11 elpd
## 12 NA 12 elpd
## 13 NA 13 elpd
## 14 NA 14 elpd
The pattern in these scores reflects the pattern from the
crps
scores above (remember that for ELPD, the goal is to
achieve higher scores).
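To see this side by side, here is a sketch that recomputes both sets of scores and plots them by forecast horizon (remember that lower is better for CRPS while higher is better for ELPD):
# recompute both score types for a per-horizon comparison
crps_scores <- score(fc, score = 'crps')$sockeye
elpd_scores <- score(fc_linpreds, score = 'elpd')$sockeye
layout(matrix(1:2, ncol = 2))
plot(crps_scores$eval_horizon, crps_scores$score, type = 'b', pch = 16,
     bty = 'l', xlab = 'Forecast horizon', ylab = 'CRPS (lower is better)')
plot(elpd_scores$eval_horizon, elpd_scores$score, type = 'b', pch = 16,
     bty = 'l', xlab = 'Forecast horizon', ylab = 'ELPD (higher is better)')
layout(1)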
Exercises
1. Calculate an additional point evaluation for the fc object using the formula provided by Hyndman and Athanasopoulos.
2. Refit model3 after splitting the data on timepoint 37 rather than 36. Compare CRPS values from this refit to the previous one to see whether the scores for timepoints 38-43 have changed when conditioning on the extra information in timepoint 37.
sessionInfo()
## R version 4.3.1 (2023-06-16 ucrt)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 11 x64 (build 22621)
##
## Matrix products: default
##
##
## locale:
## [1] LC_COLLATE=English_Australia.utf8 LC_CTYPE=English_Australia.utf8
## [3] LC_MONETARY=English_Australia.utf8 LC_NUMERIC=C
## [5] LC_TIME=English_Australia.utf8
##
## time zone: Australia/Brisbane
## tzcode source: internal
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] ggplot2_3.5.0 tidybayes_3.0.6
## [3] mvgam_1.1.0 insight_0.19.7
## [5] marginaleffects_0.17.0.9002 brms_2.21.0
## [7] Rcpp_1.0.12 mgcv_1.8-42
## [9] nlme_3.1-162 dplyr_1.1.4
## [11] downloadthis_0.3.3
##
## loaded via a namespace (and not attached):
## [1] tidyselect_1.2.1 svUnit_1.0.6 farver_2.1.1
## [4] scoringRules_1.1.1 loo_2.6.0 fastmap_1.1.1
## [7] tensorA_0.36.2.1 digest_0.6.34 timechange_0.2.0
## [10] mime_0.12 lifecycle_1.0.4 StanHeaders_2.26.28
## [13] magrittr_2.0.3 posterior_1.5.0 compiler_4.3.1
## [16] rlang_1.1.3 sass_0.4.8 tools_4.3.1
## [19] utf8_1.2.4 yaml_2.3.8 collapse_2.0.7
## [22] data.table_1.14.10 knitr_1.45 labeling_0.4.3
## [25] bridgesampling_1.1-2 pkgbuild_1.4.3 plyr_1.8.9
## [28] abind_1.4-5 klippy_0.0.0.9500 withr_3.0.0
## [31] purrr_1.0.2 grid_4.3.1 stats4_4.3.1
## [34] fansi_1.0.6 colorspace_2.1-0 inline_0.3.19
## [37] scales_1.3.0 MASS_7.3-60 cli_3.6.2
## [40] mvtnorm_1.2-4 rmarkdown_2.25 generics_0.1.3
## [43] RcppParallel_5.1.7 rstudioapi_0.15.0 reshape2_1.4.4
## [46] cachem_1.0.8 rstan_2.32.3 stringr_1.5.1
## [49] splines_4.3.1 bayesplot_1.10.0 assertthat_0.2.1
## [52] parallel_4.3.1 matrixStats_1.2.0 base64enc_0.1-3
## [55] vctrs_0.6.5 Matrix_1.6-4 jsonlite_1.8.8
## [58] arrayhelpers_1.1-0 ggdist_3.3.1 jquerylib_0.1.4
## [61] tidyr_1.3.1 glue_1.7.0 codetools_0.2-19
## [64] distributional_0.3.2 lubridate_1.9.3 stringi_1.8.3
## [67] gtable_0.3.4 QuickJSR_1.0.9 munsell_0.5.0
## [70] tibble_3.2.1 pillar_1.9.0 htmltools_0.5.7
## [73] Brobdingnag_1.2-9 R6_2.5.1 evaluate_0.23
## [76] lattice_0.21-8 highr_0.10 backports_1.4.1
## [79] bslib_0.6.1 rstantools_2.3.1.1 bsplus_0.1.4
## [82] zip_2.3.0 coda_0.19-4.1 gridExtra_2.3
## [85] checkmate_2.3.1 xfun_0.42 fs_1.6.3
## [88] pkgconfig_2.0.3