physalia-forecasting-course

Ecological forecasting with `mvgam` and `brms`

Physalia-Courses

https://www.physalia-courses.org/

Nicholas J Clark

24–28th March, 2025

COURSE OVERVIEW

Time series analysis and forecasting are standard goals in applied ecology. But most time series courses focus only on traditional forecasting models such as ARIMA or exponential smoothing. These models cannot handle features that dominate ecological data, including overdispersion, clustering, missingness, discreteness and nonlinear effects. Using the flexible and powerful Bayesian modelling software Stan, we can now meet this complexity head-on. Packages such as {mvgam} and {brms} can build Stan code to specify ecologically appropriate models that include nonlinear effects, random effects and dynamic processes, all with simple interfaces that are familiar to most R users. In this course you will learn how to wrangle, visualize and explore ecological time series. You will also learn to use the {mvgam} and {brms} packages to analyse a wide range of ecological time series, gain useful insights and produce accurate forecasts. All course materials (presentations, practical exercises, data files, and commented R scripts) will be provided electronically to participants.

TARGET AUDIENCE AND ASSUMED BACKGROUND

This course is aimed at higher degree research students and early career researchers working with time series data in the natural sciences (with particular emphasis on ecology) who want to extend their knowledge by learning how to add dynamic processes to model temporal autocorrelation. Participants should ideally have some knowledge of regression including linear models, generalized linear models and hierarchical (random) effects. But we’ll briefly recap these as we connect them to time series modelling.

Participants should be familiar with RStudio and have some fluency in writing R code. This includes an ability to import, manipulate (e.g. modify variables) and visualise data. There will be a mix of lectures and hands-on practical exercises throughout the course.

LEARNING OUTCOMES

Understand how dynamic GLMs and GAMs work to capture both nonlinear covariate effects and temporal dependence
Be able to fit dynamic GLMs and GAMs in R using the {mvgam} and {brms} packages
Understand how to critique, visualize and compare fitted dynamic models
Know how to produce forecasts from dynamic models and evaluate their accuracy using probabilistic scoring rules

COURSE PREPARATION

Please make sure you have R version 4.2 or later installed (preferably 4.3 or above). Note that R and RStudio are two different things: updating RStudio alone does not update R, so you also need to install new versions of R as they are released.

To install RStudio, go to https://posit.co/download/rstudio-desktop/ and follow the instructions.

To download R, go to the CRAN download page (https://cran.r-project.org/) and follow the links to download R for your operating system.

To check what version of R you have installed, you can run

version

in R and look at the version.string entry (or the major and minor entries).
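If you would rather check this programmatically, the minimal sketch below classifies a version against the course requirements (at least 4.2, preferably 4.3 or above). The helper name check_r_version() is hypothetical and not part of any package; it uses only base R's getRversion() and package_version comparisons, which order versions numerically (so 4.10 counts as newer than 4.2, unlike a plain string comparison).

```r
# Hypothetical helper: classify an R version against the course
# requirements (minimum 4.2, preferably 4.3 or above)
check_r_version <- function(ver = getRversion()) {
  # package_version objects compare component by component
  ver <- as.package_version(ver)
  if (ver < "4.2.0") {
    "too old: please update R"
  } else if (ver < "4.3.0") {
    "ok, but 4.3 or above is preferred"
  } else {
    "ok"
  }
}

# check the version of R you are currently running
check_r_version()
```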

We will make use of several R packages that you’ll need to have installed. Prior to the start of the course, please run the following code to update your installed packages and then install the required packages:

# update any installed R packages
update.packages(ask = FALSE, checkBuilt = TRUE)

# install the development version of brms, including its dependencies
install.packages("remotes")
remotes::install_github("paul-buerkner/brms", dependencies = TRUE)

# install mvgam and a few other packages we will use for plotting
install.packages(c("mvgam", "gratia", "tidybayes"))

INSTALLING AND CHECKING STAN

When working in R, there are two primary interfaces we can use to fit models with Stan: {rstan} and {cmdstanr} (which calls the CmdStan backend). We highly recommend using the CmdStan backend with the {cmdstanr} interface rather than {rstan}, but either interface will work. What is more important is that you have an up-to-date version of Stan. For all {mvgam} and {brms} functionality to work properly, please ensure you have at least version 2.33 of Stan installed. The CRAN and GitHub development versions of {rstan} and CmdStan are currently several versions ahead of this, and all of these versions are stable.
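If you already have Stan installed, you can check whether it meets the 2.33 minimum before going further. This is a minimal sketch: meets_stan_minimum() is a hypothetical helper, while cmdstanr::cmdstan_version() and rstan::stan_version() are the version queries those packages provide. Each backend is only queried if its R package is installed.

```r
# Hypothetical helper: TRUE if a Stan version string is at least 2.33
meets_stan_minimum <- function(ver, minimum = "2.33.0") {
  !is.null(ver) && package_version(ver) >= minimum
}

if (requireNamespace("cmdstanr", quietly = TRUE)) {
  # error_on_NA = FALSE returns NULL instead of an error when no
  # CmdStan installation has been found
  cmdstan_ver <- cmdstanr::cmdstan_version(error_on_NA = FALSE)
  message("CmdStan: ",
          if (is.null(cmdstan_ver)) "not installed" else cmdstan_ver,
          "; meets 2.33 minimum: ", meets_stan_minimum(cmdstan_ver))
}

if (requireNamespace("rstan", quietly = TRUE)) {
  # version of Stan bundled with the installed {rstan}
  rstan_ver <- rstan::stan_version()
  message("rstan (Stan ", rstan_ver, "); meets 2.33 minimum: ",
          meets_stan_minimum(rstan_ver))
}
```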

Compiling a Stan program requires a modern C++ compiler and the GNU Make build utility (a.k.a. “gmake”). The correct versions of these tools vary by operating system, and most standard Windows and macOS machines do not come with them installed by default. The first step in installing Stan is therefore to update your C++ toolchain so that you can compile models correctly. The {cmdstanr} package usually makes this easy. First, install the R package {cmdstanr} by running the following command in a fresh R session:

install.packages("cmdstanr", repos = c("https://mc-stan.org/r-packages/", getOption("repos")))

If you don’t have CmdStan installed, {cmdstanr} can install it for you, provided you have a suitable C++ toolchain. To double-check that your toolchain is set up properly, call the check_cmdstan_toolchain() function:

library(cmdstanr)
check_cmdstan_toolchain(fix = TRUE)

On Windows, this may tell you that you need to update Rtools to match your current version of R; if so, please follow the instructions in the printed message. Once the toolchain is set up properly, {cmdstanr} requires a working installation of CmdStan, the shell interface to Stan. CmdStan can be installed by calling the install_cmdstan() function:

install_cmdstan(cores = 2)
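With CmdStan installed, you may want {brms} and {mvgam} to use it by default. Both brm() and mvgam() take a backend argument whose default is read from the brms.backend option (if the option is unset, brm() falls back to rstan), so one way to select {cmdstanr} for the current session is sketched below. Adding the same options() line to your .Rprofile file would make it persist across sessions; that is an optional convenience, not a course requirement.

```r
# Make {cmdstanr} the default backend for this R session;
# brm() and mvgam() read this option when backend is not supplied
options(brms.backend = "cmdstanr")

# confirm the option is set (should print "cmdstanr")
getOption("brms.backend")
```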

The exact version you have installed can be checked using cmdstanr::cmdstan_version(). You should now be able to follow the remaining instructions on the Getting Started with {cmdstanr} page to ensure that Stan models compile successfully on your machine. A quicker check, however, is to run the following script, which simulates some example time series with {mvgam} and fits a small model to them:

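library(mvgam)

# simulate a small set of example time series
simdat <- sim_mvgam()

# fit a model with a cyclic seasonal smooth and series-specific
# smooths of time; if this compiles and samples, Stan is working
mod <- mvgam(
  y ~ s(season, bs = 'cc', k = 5) +
    s(time, series, bs = 'fs', k = 8),
  data = simdat$data_train
)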

But issues can sometimes occur when:

If you run into any of these issues, it is best to consult your IT department for support. The Stan team provides detailed instructions on setting up the correct C++ toolchain for compiling models, so please refer to those when contacting your IT department.

PROGRAM

09:00 - 12:00 (Berlin time): live coding exercises and review of the lecture materials

3 additional hours: self-guided readings and viewing of lecture recordings

Monday (day 1)

Lecture 1 (html | pdf)
Lecture 2 (html | pdf)
Live code examples (Random effects)
Tutorial 1 (html)

Introduction to time series and time series visualization
Some traditional time series models and their assumptions
GLMs and GAMs for ecological modelling
Temporal random effects and temporal residual correlation structures

Tuesday (day 2)

Lecture 3 (html | pdf)
Live code examples (Interactions | Time-varying effects)
Tutorial 2 (html)

Dynamic GLMs and Dynamic GAMs
Autoregressive dynamic processes
Gaussian Processes
Dynamic coefficient models

Wednesday (day 3)

Lecture 4 (html | pdf)
Live code example (Distributed lags)
Tutorial 3 (html)

Bayesian posterior predictive checks
Forecasting from dynamic models
Point-based forecast evaluation
Probabilistic forecast evaluation

Thursday (day 4)

Lecture 5 (html | pdf)
Live code examples (Time-varying seasonality)
Tutorial 4 (html)

Multivariate ecological time series
Vector autoregressive processes
Dynamic factor models
Multivariate forecast evaluation

Friday (day 5)

Group-based practical examples / case studies
Review, feedback and open discussion