class: inverse, middle, left, my-title-slide, title-slide

.title[
# Ecological Time Series Analysis and Forecasting
]

.author[
### Nicholas Clark
]

.institute[
### School of Veterinary Science, University of Queensland, Australia
]

.date[
### Friday 13th December, 2024
]

---

class: middle center

# Welcome

???

---

## Workflow

Press the "o" key on your keyboard to navigate among HTML slides

Access the [workshop materials here](https://github.com/nicholasjclark/ESA_2024_timeseries)
- View the sample scripts in [live_code_examples](https://github.com/nicholasjclark/ESA_2024_timeseries/tree/main/live_code_examples)
- Use the [Google Doc](https://docs.google.com/document/d/1xd3icf1wxGxO3SVt2AmKO8CkeKv1QpsxgqK7rR15U08/edit?tab=t.0#heading=h.ds87nag4ykyb) to ask questions

Relevant open-source materials include:
- [Forecasting Principles and Practice](https://otexts.com/fpp3/)
- [Applied Time Series Analysis](https://atsa-es.github.io/atsa-labs/)
- [Ecological Forecasting & Dynamics Course](https://github.com/nicholasjclark/physalia-forecasting-course/tree/main)
- [A blog on how to use and interpret GAMs](https://ecogambler.netlify.app/blog/)

---

## This workshop's topics

Introductions

Why are time series difficult?

Common time series models

Why they fail in ecology

State-Space GAMs and the `mvgam` 📦

---

class: inverse middle center big-subsection

# Tell us about yourself

---

## Some challenges of time series

Temporal autocorrelation

Lagged effects

Non-Gaussian data and missing observations

Measurement error

Time-varying effects

Nonlinearities

Multi-series clustering

---

## A *positively* autocorrelated series

.panelset[
.panel[.panel-name[Code]

```r
# set seed for reproducibility
set.seed(1111)

# number of timepoints
T <- 100

# use arima.sim to simulate from an AR(1) model
series <- arima.sim(model = list(ar = 0.8), n = T, sd = 1)

# plot the time series as a line
plot(series, type = 'l', bty = 'l', lwd = 2,
     col = 'darkred', ylab = 'Y', xlab = 'Time')
```
]

.panel[.panel-name[Model]
`$$\boldsymbol{Y}_{t}\sim \text{Normal}(\color{darkred}{0.8} * \boldsymbol{Y}_{t-1},\color{darkred}{1})$$`
]

.panel[.panel-name[Plot]
.center[![](ESA_slidedeck_files/figure-html/ar_sim-1.svg)]
]
]

---

## A *negatively* autocorrelated series

.panelset[
.panel[.panel-name[Code]

```r
# set seed for reproducibility
set.seed(1111)

# number of timepoints
T <- 100

# use arima.sim to simulate from an AR(1) model
series <- arima.sim(model = list(ar = -0.8), n = T, sd = 1)

# plot the time series as a line
plot(series, type = 'l', bty = 'l', lwd = 2,
     col = 'darkred', ylab = 'Y', xlab = 'Time')
```
]

.panel[.panel-name[Model]
`$$\boldsymbol{Y}_{t}\sim \text{Normal}(\color{darkred}{-0.8} * \boldsymbol{Y}_{t-1},\color{darkred}{1})$$`
]

.panel[.panel-name[Plot]
.center[![](ESA_slidedeck_files/figure-html/ar_simneg-1.svg)]
]
]

---

class: full-size

## Seasonality

.pull-right-bigger[![Lynx](resources/canada-lynx-gary-pritts.jpg)]

Many time series show .emphasize[*repeated periodic cycles*]
- Breeding seasons
- Migration
- Green-ups / green-downs
- Lunar cycles
- Predator / prey dynamics

These cycles often change slowly over time

---

## Example seasonal series

<img src="ESA_slidedeck_files/figure-html/unnamed-chunk-3-1.svg" style="display: block; margin: auto;" />

---

## Decompose: trend + seasonality

<img src="ESA_slidedeck_files/figure-html/unnamed-chunk-4-1.svg" style="display: block; margin: auto;" />

---

class: middle center

### Modelling these multiple components, either additively or multiplicatively, is a major goal of most time series analysis procedures

---

## Common time series models

Random Walk ([RW](https://atsa-es.github.io/atsa-labs/sec-tslab-random-walks-rw.html))

Autoregressive ([AR](https://atsa-es.github.io/atsa-labs/sec-tslab-autoregressive-ar-models.html))

Autoregressive Integrated Moving Average ([ARIMA](https://otexts.com/fpp3/arima.html); requires [stationarity](https://otexts.com/fpp3/stationarity.html))

Exponential Smoothing ([ETS](https://otexts.com/fpp3/expsmooth.html))

[Regression with ARIMA errors](https://otexts.com/fpp3/regarima.html)
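---

## Decomposition in practice

Before fitting any of these, it often helps to look at the components directly. A minimal sketch using only base R: `stl()` estimates the seasonal and trend components shown earlier via loess, here for the built-in `co2` series

```r
# decompose the built-in Mauna Loa CO2 series into
# seasonal, trend and remainder components using loess
components <- stl(co2, s.window = 'periodic')

# plot the original series above its estimated components
plot(components)
```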
---

## *Very* easy to apply in R

<img src="resources/fc_logo.png" style="position:fixed; right:8%; top:4%; width:100px; height:117px; border:none;" />

Tools in the [`forecast` 📦](https://pkg.robjhyndman.com/forecast/) are hugely popular and accessible for time series analysis / forecasting

[ETS](https://pkg.robjhyndman.com/forecast/reference/ets.html) handles many types of seasonality and nonlinear trends

[Regression with ARIMA errors](https://pkg.robjhyndman.com/forecast/reference/auto.arima.html) includes additive fixed effects of predictors while capturing trends and seasonality

*Some* of these algorithms can handle missing data

*All* are extremely fast to fit and forecast
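---

## Example: automatic ARIMA

A minimal sketch of the `forecast` 📦 interface, assuming the package is installed and using the built-in `AirPassengers` series

```r
library(forecast)

# search across ARIMA orders (comparing AICc)
# for a monthly series with trend and seasonality
fit <- auto.arima(AirPassengers)

# forecast the next 24 months, with
# 80% and 95% prediction intervals
plot(forecast(fit, h = 24))
```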
<img src="resources/fc_logo.png" style="position:fixed; right:8%; top:4%; width:100px; height:117px; border:none;" /> Tools in the [`forecast` 📦](https://pkg.robjhyndman.com/forecast/) are hugely popular and accessible for time series analysis / forecasting [ETS](https://pkg.robjhyndman.com/forecast/reference/ets.html) handles many types of seasonality and nonlinear trends [Regression with ARIMA errors](https://pkg.robjhyndman.com/forecast/reference/auto.arima.html) includes additive fixed effects of predictors while capturing trends and seasonality *Some* of these algorithms can handle missing data *All* are extremely fast to fit and forecast --- ## Great! But what about these? .grey[Temporal autocorrelation Lagged effects] .emphasize[*Non-Gaussian data and missing observations* *Measurement error* *Time-varying effects* *Nonlinearities* *Multi-series clustering*] --- ## Ecological time series include Counts of multiple species over time Presence-absence of species Repeated captures in multiple plots Censored measures (OTUs / pollutants with limits of detection) Phenology records Tree rings etc... --- ## Example ecological time series </br> .pull-left[ <img src="ESA_slidedeck_files/figure-html/unnamed-chunk-5-1.svg" style="display: block; margin: auto;" /> ] .pull-right[ <img src="ESA_slidedeck_files/figure-html/unnamed-chunk-6-1.svg" style="display: block; margin: auto;" /> ] --- ## Another ecological time series </br> .pull-left[ <img src="ESA_slidedeck_files/figure-html/unnamed-chunk-7-1.svg" style="display: block; margin: auto;" /> ] .pull-right[ <img src="ESA_slidedeck_files/figure-html/unnamed-chunk-8-1.svg" style="display: block; margin: auto;" /> ] --- ## Collections of ecological series <img src="ESA_slidedeck_files/figure-html/unnamed-chunk-9-1.svg" style="display: block; margin: auto;" /> --- class: inverse white-subsection background-image: url('./resources/bwbirds.jpeg') background-size: cover ## All can have measurement error --- class: inverse middle center big-subsection # How can we do better? --- background-image: url('./resources/SS_model.svg') ## State-Space models --- ## State-Space models </br> <img align="center" width="1200" height="300" src="resources/auger.jpg"> .small[[Auger-Methe *et al* 2021](https://esajournals.onlinelibrary.wiley.com/doi/10.1002/ecm.1470)] --- ## State-Space linear models `\begin{align*} Y_t & \sim Normal(\alpha + \sum_{j=1}^{J}(\beta_j x_{jt}) + Zz_t, \sigma_y) \\ z_t & \sim Normal(f(z_{t-lag}) + \sum_{k=1}^{K}(\beta_k q_{kt}), \sigma_z) \end{align*}` <br/> Where: - `\(\beta_j\)` capture linear effects of the `\(J\)` observation model predictors - `\(z_t\)` is a .emphasize[*latent process*], weighted by a loading matrix `\(Z\)` - `\(\beta_k\)` capture linear effects of the `\(K\)` process model predictors --- background-image: url('./resources/df_with_series.gif') ## *Z* ⇨ induced correlations --- # Gaussian!?! 
---

# Gaussian!?!

Gaussian observation models won't give sensible predictions for bounded / discrete / non-Normal data

We can do better by choosing observation distributions that obey the constraints on our outcome variables

This .emphasize[*generalizes*] the linear regression by modelling parameters of other probability distributions with linear predictors

`\(\alpha + \sum_{j=1}^{J}(\beta_j x_{jt}) + Zz_t\)` ⇨ `\(g^{-1}(\alpha + \sum_{j=1}^{J}(\beta_j x_{jt}) + Zz_t)\)`

`\(g^{-1}\)` is the inverse of a nonlinear .emphasize[*link function*]

---

## Many relevant distributions

[Many useful GLM probability distributions exist](https://cran.r-project.org/web/packages/brms/vignettes/brms_families.html). Some of these include:

- .emphasize[*Negative Binomial*] — overdispersed integers in `\((0,1,2,...)\)`
- .emphasize[*Bernoulli*] — presence-absence data in `\(\{0,1\}\)`
- .emphasize[*Student's T*] — heavy-tailed real values in `\((-\infty, \infty)\)`
- .emphasize[*Lognormal*] — heavy-tailed (right skewed) real values in `\((0, \infty)\)`
- .emphasize[*Gamma*] — lighter-tailed (less skewed) real values in `\((0, \infty)\)`
- .emphasize[*Multinomial*] — integers representing `\(K\)` unordered categories in `\(\{1,2,...,K\}\)`
- .emphasize[*Ordinal*] — integers representing `\(K\)` ordered categories in `\(\{1,2,...,K\}\)`

---

class: middle center

### State-Space GLMs allow us to build models that respect the bounds and distributions of our observed data
<br>
### They traditionally assume the appropriately transformed mean response depends *linearly* on the predictors, as well as on the latent states
<br>
### But there are many other properties we'd like to model, including nonlinearities. Time to get .multicolor[W I G G L Y]

---

background-image: url('./resources/smooth_only.gif')

## GAMs use splines ...

---

background-image: url('./ESA_slidedeck_files/figure-html/basis-functions-1.svg')

## ... made of basis functions

---

background-image: url('./resources/basis-functions-weights-1.svg')

## Weighting basis functions ...

---

background-image: url('./resources/basis_weights.gif')

## ... gives a spline `\((f(x))\)`

---

background-image: url('./resources/smooth_to_data.gif')

## Penalize `\(f''(x)\)` to learn weights

---

background-image: url('./ESA_slidedeck_files/figure-html/complexity-1.svg')

---

## State-Space GAMs

`\begin{align*}
\mathbb{E}(\boldsymbol{Y_t}|\boldsymbol{X_t}, \boldsymbol{Q_t}) & = g^{-1}(\alpha + \sum_{j=1}^{J}f(x_{jt}) + Zz_t) \\
z_t & \sim Normal(f(z_{t-lag}) + \sum_{k=1}^{K}f(q_{kt}), \sigma_z)
\end{align*}`

<br/>
Where:
- `\(f(x)\)` are potentially nonlinear functions of the `\(J\)` predictors
- `\(z_t\)` is a .emphasize[*latent process*], weighted by a loading matrix `\(Z\)`
- `\(f(q)\)` are potentially nonlinear functions of the `\(K\)` predictors

---

class: inverse middle center big-subsection

# Questions?

---

<img src="resources/mvgam_logo.png" style="position:fixed; right:8%; top:4%; width:100px; height:117px; border:none;" />

## The [`mvgam` 📦](https://github.com/nicholasjclark/mvgam/tree/master)

Bayesian framework to fit State-Space GAMs
- Hierarchical intercepts, slopes, smooths and Gaussian Processes
- Learning `\(Z\)` ⇨ JSDMs, N-mixture models, Dynamic Factors

Built on [`mgcv`](https://cran.r-project.org/web/packages/mgcv/index.html), [`brms`](https://paulbuerkner.com/brms/) and [`splines2`](https://cran.r-project.org/web/packages/splines2/index.html) 📦's for flexible effects

Familiar
formula interface Uni- or multivariate series from a range of response distributions Uses [Stan](https://mc-stan.org/) for efficient Hamiltonian Monte Carlo sampling --- class: middle center big-subsection # [Package overview](https://nicholasjclark.github.io/mvgam/articles/mvgam_overview.html) --- ## Example of the interface ```r model <- mvgam( formula = y ~ s(series, bs = 're') + s(x0, series, bs = 're') + x1, trend_formula = ~ gp(x2, k = 20) + te(x3, x4, bs = c('cr', 'tp')), data = data, family = poisson(), trend_model = AR(p = 1, ma = TRUE, cor = TRUE), burnin = 500, samples = 500, chains = 4) ``` --- ## Produce all `Stan` code and objects ```r stancode(model) ``` .small[ ``` ## // Stan model code generated by package mvgam ## functions { ## /* Spectral density of a squared exponential Gaussian process ## * Args: ## * x: array of numeric values of dimension NB x D ## * sdgp: marginal SD parameter ## * lscale: vector of length-scale parameters ## * Returns: ## * numeric vector of length NB of the SPD evaluated at 'x' ## */ ## vector spd_gp_exp_quad(data array[] vector x, real sdgp, vector lscale) { ## int NB = dims(x)[1]; ## int D = dims(x)[2]; ## int Dls = rows(lscale); ## real constant = square(sdgp) * sqrt(2 * pi()) ^ D; ## vector[NB] out; ## if (Dls == 1) { ## // one dimensional or isotropic GP ## real neg_half_lscale2 = -0.5 * square(lscale[1]); ## constant = constant * lscale[1] ^ D; ## for (m in 1 : NB) { ## out[m] = constant * exp(neg_half_lscale2 * dot_self(x[m])); ## } ## } else { ## // multi-dimensional non-isotropic GP ## vector[Dls] neg_half_lscale2 = -0.5 * square(lscale); ## constant = constant * prod(lscale); ## for (m in 1 : NB) { ## out[m] = constant * exp(dot_product(neg_half_lscale2, square(x[m]))); ## } ## } ## return out; ## } ## } ## data { ## int<lower=1> k_gp_trend_x2_; // basis functions for approximate gp ## array[k_gp_trend_x2_] vector[1] l_gp_trend_x2_; // approximate gp eigenvalues ## array[20] int b_trend_idx_gp_x2_; // gp basis coefficient indices ## int<lower=0> total_obs; // total number of observations ## int<lower=0> n; // number of timepoints per series ## int<lower=0> n_sp_trend; // number of trend smoothing parameters ## int<lower=0> n_lv; // number of dynamic factors ## int<lower=0> n_series; // number of series ## matrix[n_series, n_lv] Z; // matrix mapping series to latent states ## int<lower=0> num_basis; // total number of basis coefficients ## int<lower=0> num_basis_trend; // number of trend basis coefficients ## vector[num_basis_trend] zero_trend; // prior locations for trend basis coefficients ## matrix[total_obs, num_basis] X; // mgcv GAM design matrix ## matrix[n * n_lv, num_basis_trend] X_trend; // trend model design matrix ## array[n, n_series] int<lower=0> ytimes; // time-ordered matrix (which col in X belongs to each [time, series] observation?) 
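## // index matrix for the latent states (which row in X_trend belongs to each [time, latent state]?)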
## array[n, n_lv] int ytimes_trend; ## int<lower=0> n_nonmissing; // number of nonmissing observations ## matrix[24, 72] S_trend2; // mgcv smooth penalty matrix S_trend2 ## array[n_nonmissing] int<lower=0> flat_ys; // flattened nonmissing observations ## matrix[n_nonmissing, num_basis] flat_xs; // X values for nonmissing observations ## array[n_nonmissing] int<lower=0> obs_ind; // indices of nonmissing observations ## } ## transformed data { ## vector[n_lv] trend_zeros = rep_vector(0.0, n_lv); ## } ## parameters { ## // gp term sd parameters ## real<lower=0> alpha_gp_trend_x2_; ## ## // gp term length scale parameters ## array[1] vector<lower=0>[1] rho_gp_trend_x2_; ## ## // gp term latent variables ## vector[k_gp_trend_x2_] z_gp_trend_x2_; ## ## // raw basis coefficients ## vector[num_basis] b_raw; ## vector[num_basis_trend] b_raw_trend; ## ## // latent state SD terms ## vector<lower=0>[n_lv] sigma; ## cholesky_factor_corr[n_lv] L_Omega; ## ## // random effect variances ## vector<lower=0>[2] sigma_raw; ## ## // random effect means ## vector[2] mu_raw; ## ## // latent state AR1 terms ## vector<lower=-1, upper=1>[n_lv] ar1; ## ## // ma coefficients ## matrix<lower=-1, upper=1>[n_lv, n_lv] theta; ## ## // dynamic error parameters ## array[n] vector[n_lv] error; ## ## // smoothing parameters ## vector<lower=0>[n_sp_trend] lambda_trend; ## } ## transformed parameters { ## // latent states and loading matrix ## vector[n * n_lv] trend_mus; ## matrix[n, n_series] trend; ## array[n] vector[n_lv] LV; ## array[n] vector[n_lv] epsilon; ## ## // LKJ form of covariance matrix ## matrix[n_lv, n_lv] L_Sigma; ## ## // computed error covariance matrix ## cov_matrix[n_lv] Sigma; ## matrix[n_series, n_lv] lv_coefs; ## ## // basis coefficients ## vector[num_basis] b; ## vector[num_basis_trend] b_trend; ## ## // observation model basis coefficients ## b[1 : 2] = b_raw[1 : 2]; ## b[3 : 6] = mu_raw[1] + b_raw[3 : 6] * sigma_raw[1]; ## b[7 : 10] = mu_raw[2] + b_raw[7 : 10] * sigma_raw[2]; ## ## // process model basis coefficients ## b_trend[1 : num_basis_trend] = b_raw_trend[1 : num_basis_trend]; ## b_trend[b_trend_idx_gp_x2_] = sqrt(spd_gp_exp_quad(l_gp_trend_x2_, ## alpha_gp_trend_x2_, ## rho_gp_trend_x2_[1])) ## .* z_gp_trend_x2_; ## ## // latent process linear predictors ## trend_mus = X_trend * b_trend; ## ## // derived latent states ## LV[1] = trend_mus[ytimes_trend[1, 1 : n_lv]] + error[1]; ## epsilon[1] = error[1]; ## for (i in 2 : n) { ## // lagged error ma process ## epsilon[i] = theta * error[i - 1]; ## ## // full ARMA process ## LV[i] = trend_mus[ytimes_trend[i, 1 : n_lv]] ## + ar1 .* (LV[i - 1] - trend_mus[ytimes_trend[i - 1, 1 : n_lv]]) ## + epsilon[i] + error[i]; ## } ## L_Sigma = diag_pre_multiply(sigma, L_Omega); ## Sigma = multiply_lower_tri_self_transpose(L_Sigma); ## lv_coefs = Z; ## for (i in 1 : n) { ## for (s in 1 : n_series) { ## trend[i, s] = dot_product(lv_coefs[s, : ], LV[i, : ]); ## } ## } ## } ## model { ## // prior for random effect population variances ## sigma_raw ~ student_t(3, 0, 2.5); ## ## // prior for random effect population means ## mu_raw ~ std_normal(); ## ## // prior for (Intercept)... ## b_raw[1] ~ student_t(3, 0, 2.5); ## ## // prior for x1B... ## b_raw[2] ~ student_t(3, 0, 2); ## ## // prior (non-centred) for s(series)... ## b_raw[3 : 6] ~ std_normal(); ## ## // prior (non-centred) for s(x0,series)... 
## b_raw[7 : 10] ~ std_normal(); ## ## // priors for AR parameters ## ar1 ~ std_normal(); ## ## // priors for latent state SD parameters ## sigma ~ student_t(3, 0, 2.5); ## ## // dynamic process models ## ## // prior for (Intercept)_trend... ## b_raw_trend[1] ~ student_t(3, 0, 2); ## ## // prior for te(x3,x4)_trend... ## b_raw_trend[22 : 45] ~ multi_normal_prec(zero_trend[22 : 45], ## S_trend2[1 : 24, 1 : 24] ## * lambda_trend[3] ## + S_trend2[1 : 24, 25 : 48] ## * lambda_trend[4] ## + S_trend2[1 : 24, 49 : 72] ## * lambda_trend[5]); ## ## // prior for gp(x2)_trend... ## z_gp_trend_x2_ ~ std_normal(); ## alpha_gp_trend_x2_ ~ student_t(3, 0, 2.5); ## rho_gp_trend_x2_[1] ~ inv_gamma(1.494197, 0.056607); ## b_raw_trend[b_trend_idx_gp_x2_] ~ std_normal(); ## lambda_trend ~ normal(5, 30); ## ## // contemporaneous errors ## L_Omega ~ lkj_corr_cholesky(2); ## for (i in 1 : n) { ## error[i] ~ multi_normal_cholesky(trend_zeros, L_Sigma); ## } ## ## // ma coefficients ## for (i in 1 : n_lv) { ## for (j in 1 : n_lv) { ## if (i != j) { ## theta[i, j] ~ normal(0, 0.2); ## } ## } ## } ## { ## // likelihood functions ## vector[n_nonmissing] flat_trends; ## flat_trends = to_vector(trend)[obs_ind]; ## flat_ys ~ poisson_log_glm(append_col(flat_xs, flat_trends), 0.0, ## append_row(b, 1.0)); ## } ## } ## generated quantities { ## vector[total_obs] eta; ## matrix[n, n_series] mus; ## vector[n_sp_trend] rho_trend; ## vector[n_lv] penalty; ## array[n, n_series] int ypred; ## penalty = 1.0 / (sigma .* sigma); ## rho_trend = log(lambda_trend); ## ## // posterior predictions ## eta = X * b; ## for (s in 1 : n_series) { ## mus[1 : n, s] = eta[ytimes[1 : n, s]] + trend[1 : n, s]; ## ypred[1 : n, s] = poisson_log_rng(mus[1 : n, s]); ## } ## } ``` ] --- ## Workflow Fit models that can include splines, GPs, and multivariate dynamic processes to sets of time series; use informative priors for effective regularization Use posterior predictive checks and Randomized Quantile (Dunn-Smyth) residuals to assess model failures Use `marginaleffects` 📦 to generate interpretable (and reportable) model predictions Produce probabilistic forecasts Evaluate forecast distributions using proper scoring rules --- class: middle center big-subsection # [`?mvgam`](https://nicholasjclark.github.io/mvgam/reference/mvgam.html#details) --- background-image: url('./resources/mvgam_cheatsheet.png') background-size: contain --- class: inverse middle center big-subsection # Live code examples
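---

## A sketch of that workflow

A minimal sketch of the steps above, assuming `model` is the fitted object from the earlier interface example and `newdata` holds future covariate values (both placeholders here)

```r
library(mvgam)

# posterior predictive checks of the observation model
pp_check(model)

# Randomized Quantile (Dunn-Smyth) residuals
plot(model, type = 'residuals')

# interpretable conditional predictions
marginaleffects::plot_predictions(model, condition = 'x1')

# probabilistic forecasts, evaluated with a proper scoring rule
fc <- forecast(model, newdata = newdata)
score(fc, score = 'crps')
```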