class: inverse, middle, left, my-title-slide, title-slide

.title[ # Ecological forecasting in R ]
.subtitle[ ## Lecture 4: multivariate ecological time series ]
.author[ ### Nicholas Clark ]
.institute[ ### School of Veterinary Science, University of Queensland ]
.date[ ### 0900–1200 CET Thursday 30th May, 2024 ]

---

## Workflow

Press the "o" key on your keyboard to navigate among slides

Access the [tutorial html here](https://nicholasjclark.github.io/physalia-forecasting-course/day4/tutorial_4_physalia)
- Download the data objects and exercise script from the html file
- Complete exercises and use Slack to ask questions

Relevant open-source materials include:
- [GAMs for time series](https://youtu.be/Ukfvd8akfco?si=tlmSm2-51ZZ1wQf6)
- [Smoothed dynamic factor analysis for identifying trends in multivariate time series](https://besjournals.onlinelibrary.wiley.com/doi/full/10.1111/2041-210X.13788)
- [Multivariate State-Space models](https://youtu.be/4nrZZGMY1bc?si=VuKYtJVKKaMBliI6)

---

## This lecture's topics

Multivariate ecological time series

Vector autoregressive processes

Dynamic factor models

Multivariate forecast evaluation

---
class: inverse middle center big-subsection

# Multivariate ecological time series

---

## We often measure *multiple* series

<img src="lecture_5_slidedeck_files/figure-html/unnamed-chunk-1-1.svg" style="display: block; margin: auto;" />

---

## Applicable for many situations

Multivariate time series arise when we have:
- Multiple species in one site
- Same species in multiple sites
- Multiple subjects in an experiment
- Multiple plots within a site
- etc...

Often the structure of the data is grouped in some way (i.e. .emphasize[*hierarchical*])

Both `mvgam` and `brms` 📦's were designed to handle this kind of data

---
class: middle center

### Hierarchical models *learn from all groups at once* to inform group-level estimates
<br>
### This induces *regularization*, where noisy estimates are pulled towards the overall mean
<br>
### The regularization is known as [partial pooling](https://www.jstor.org/stable/25471160)

---
background-image: url('./resources/partial_pool.gif')

## Partial pooling in action

<br><br><br><br><br><br><br><br><br><br><br><br>
.small[[McElreath 2023](https://www.youtube.com/watch?v=SocRgsf202M)]

---

## Noisy estimates *pulled* to the mean

<br>
.center[<img align="center" width="768" height="384" src="resources/partial_pool_estimates.png">]

.small[[Johnson *et al* 2021](https://www.bayesrulesbook.com/)]

---

## What about *nonlinear* effects?

<img src="resources/mozzie_cycles.jpg" style="position:fixed; right:13%; top:20%; width:896px; height:465px; border:none;"/>
<br><br><br><br><br><br><br><br><br><br><br><br><br>
.small[[Whittaker et al 2022](https://royalsocietypublishing.org/doi/10.1098/rspb.2022.0089)]

---

## What about *nonlinear* effects?

We very often expect to encounter nonlinear effects in ecology

But if we measure multiple species / plots / individuals etc. through time, we can also encounter hierarchical nonlinear effects
- Same species may respond similarly to environmental change over different sites
- Different species may respond similarly in the same site

Our data may not be rich enough to estimate all effects individually; so what can we do?
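---

## A partial pooling sketch

A minimal sketch of the shrinkage idea (not from the course materials; the plots, sample sizes and means below are simulated purely for illustration), using `mgcv`'s random effect basis

```r
library(mgcv)
set.seed(1)

# simulate Poisson counts for six plots; the last plots are
# sparsely sampled, so their raw means are noisy
n_per_plot <- c(40, 40, 30, 10, 5, 3)
plot_id <- factor(rep(1:6, times = n_per_plot))
true_log_means <- log(c(4, 6, 5, 5, 5, 5))
dat <- data.frame(
  y = rpois(sum(n_per_plot), lambda = exp(true_log_means[plot_id])),
  plot = plot_id
)

# no pooling: estimate each plot's mean independently
no_pool <- glm(y ~ plot - 1, family = poisson(), data = dat)

# partial pooling: plot-level intercepts as a random effect smooth,
# so sparse plots are pulled towards the shared mean
part_pool <- gam(y ~ s(plot, bs = 're'),
                 family = poisson(), data = dat)

# compare estimates on the log scale; shrinkage is strongest
# for the plots with the fewest observations
cbind(no_pool = coef(no_pool),
      partial_pool = coef(part_pool)[1] + coef(part_pool)[-1])
```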
---

## Simulate some hierarchical data

.panelset[
.panel[.panel-name[Code]

```r
library(mvgam)

# set a seed for reproducibility
set.seed(650)

# simulate four time series with very little
# trend and with hierarchical seasonality
simdat <- sim_mvgam(seasonality = 'hierarchical',
                    trend_rel = 0.05,
                    n_series = 4,
                    mu = c(1, 1, 2, 1.2))
```

]
.panel[.panel-name[Data]

<table class=" lightable-minimal" style='color: black; font-family: "Trebuchet MS", verdana, sans-serif; width: auto !important; margin-left: auto; margin-right: auto;'>
 <thead>
  <tr>
   <th style="text-align:right;"> y </th>
   <th style="text-align:right;"> season </th>
   <th style="text-align:right;"> year </th>
   <th style="text-align:left;"> series </th>
   <th style="text-align:right;"> time </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:right;"> 4 </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:left;"> series_1 </td>
   <td style="text-align:right;"> 1 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 0 </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:left;"> series_2 </td>
   <td style="text-align:right;"> 1 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 3 </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:left;"> series_3 </td>
   <td style="text-align:right;"> 1 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 8 </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:left;"> series_4 </td>
   <td style="text-align:right;"> 1 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 5 </td>
   <td style="text-align:right;"> 2 </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:left;"> series_1 </td>
   <td style="text-align:right;"> 2 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 4 </td>
   <td style="text-align:right;"> 2 </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:left;"> series_2 </td>
   <td style="text-align:right;"> 2 </td>
  </tr>
</tbody>
</table>

]
]

---

## The series

<img src="lecture_5_slidedeck_files/figure-html/unnamed-chunk-4-1.svg" style="display: block; margin: auto;" />

---

## Similar seasonal patterns

<img src="lecture_5_slidedeck_files/figure-html/unnamed-chunk-5-1.svg" style="display: block; margin: auto;" />

---
class: middle center

### Can we somehow estimate the average population smooth *and* a smooth to determine how each series deviates from the population?
<br>
### Yes! We can use .multicolor[hierarchical GAMs]

---

## Decomposing seasonality

<img src="lecture_5_slidedeck_files/figure-html/unnamed-chunk-6-1.svg" style="display: block; margin: auto;" />

---

## How did we model this?

```r
mod <- mvgam(y ~ s(season, bs = 'cc', k = 12) +
               s(season, series, k = 6, bs = 'fs'),
             data = data, family = poisson())
```

---

## How did we model this?

```r
mod <- mvgam(y ~
*              s(season, bs = 'cc', k = 12) +
               s(season, series, k = 6, bs = 'fs'),
             data = data, family = poisson())
```

A .emphasize[*shared*] smooth of seasonality

This is a group-level smooth, similar to what we might expect the average seasonal function to be in this set of series

---

## How did we model this?

```r
mod <- mvgam(y ~ s(season, bs = 'cc', k = 12) +
*              s(season, series, k = 6, bs = 'fs'),
             data = data, family = poisson())
```

Series-level .emphasize[*deviation*] smooths of seasonality, which all share a common smoothing penalty

These are individual-level smooths that capture how each series' seasonal pattern differs from the shared smooth
- There are a number of ways to do this using splines
- See [Pedersen et al 2019](https://peerj.com/articles/6876/) for useful guidance

---

## Conditional predictions

<img src="lecture_5_slidedeck_files/figure-html/unnamed-chunk-10-1.svg" style="display: block; margin: auto;" />

---

## Hierarchical GAMs (HGAMs)

By decomposing the linear predictor into a set of additive smooths, HGAMs offer a number of very useful ways to model multivariate data

Each of these strategies allows us to:
- Learn smooth functions for each series using data from .emphasize[*all series in the data*]
- Regularize functions when using noisy and/or sparse data
- Make predictions for .emphasize[*new series that haven't been measured yet*]

See [`?mgcv::factor.smooth.interaction`](https://stat.ethz.ch/R-manual/R-devel/library/mgcv/html/factor.smooth.html) for more details

---
class: middle center

### But what if we have smooths of two or more covariates? Can we still learn these hierarchically?

---

## Hierarchical smooth interactions

```r
mvgam(y ~
*       te(temp, month, k = c(4, 4)) +
*       te(temp, month, k = c(4, 4), by = series),
      family = gaussian(),
      data = plankton_train,
      newdata = plankton_test,
      trend_model = 'None')
```

Here each series is given a [tensor product smooth](https://onlinelibrary.wiley.com/doi/10.1111/j.1541-0420.2006.00574.x) of two covariates

But we learn the series-specific smooth as a deviation from the "shared" smooth

More on this example in [Tutorial 4](https://nicholasjclark.github.io/physalia-forecasting-course/day4/tutorial_4_physalia)

---

## Hierarchical distributed lags

```r
mvgam(y ~ s(ndvi_ma12, trend, bs = 're') +
*       te(mintemp, lag, k = c(3, 4), bs = c('tp', 'cr')) +
*       te(mintemp, lag, by = weights_dm, k = c(3, 4), bs = c('tp', 'cr')) +
*       te(mintemp, lag, by = weights_do, k = c(3, 4), bs = c('tp', 'cr')) +
*       te(mintemp, lag, by = weights_ot, k = c(3, 4), bs = c('tp', 'cr')) +
*       te(mintemp, lag, by = weights_pp, k = c(3, 4), bs = c('tp', 'cr')),
      ...)
```

Here the effect of minimum temperature changes smoothly over lags for each series, using the `by` argument with a set of series-specific weights for deviations

More on this type of example [in this recent blogpost](https://ecogambler.netlify.app/blog/distributed-lags-mgcv/)

---
class: middle center

### HGAMs offer a solution to estimate the hierarchical, nonlinear effects that we think are common in ecology
<br>
### This is a huge advantage over traditional time series models
<br>
### But how can we handle multivariate *dynamic components*?
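---

## Sketching the decomposition

Before the live code example: a minimal sketch of how the shared and deviation smooths can be pulled apart with `predict(..., type = 'terms')`. The data are simulated here (not the `sim_mvgam()` object from earlier) and the model is fitted with `mgcv` for speed, though the same formula works in `mvgam`

```r
library(mgcv)
set.seed(650)

# four series, three years of monthly observations, with a
# phase-shifted seasonal pattern for each series
dat <- data.frame(
  season = rep(rep(1:12, times = 3), times = 4),
  series = factor(rep(paste0('series_', 1:4), each = 36))
)
shift <- as.integer(dat$series) / 8
dat$y <- rpois(nrow(dat),
               lambda = exp(0.5 + sin(2 * pi * (dat$season / 12 + shift))))

# the hierarchical seasonal model from the previous slides
mod <- gam(y ~ s(season, bs = 'cc', k = 12) +
             s(season, series, k = 6, bs = 'fs'),
           data = dat, family = poisson())

# one column per smooth: the shared seasonal term and the
# series-level deviations from it
head(predict(mod, type = 'terms'))
```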
---
class: inverse middle center big-subsection

# Live code example

---
class: inverse middle center big-subsection

# Vector autoregressive processes

---

## VAR1

Similar to an AR1, but the response is now a .emphasize[*vector*]
<br/>
<br/>
`\begin{align*}
x_t & \sim \text{Normal}(A * x_{t-1}, \Sigma) \\
\end{align*}`

Where:
- `\(x_t\)` is a vector of time series observations at time `\(t\)`
- `\(\Sigma\)` determines the spread (or flexibility) of the process and any correlations among time series errors
- `\(A\)` is a matrix of coefficients estimating lagged dependence and cross-dependence between the elements of `\(x\)`

---
background-image: url('./resources/VAR.svg')
background-size: contain

## VAR1

---

## Properties of a VAR1

If off-diagonals in both `\(A\)` and `\(\Sigma\)` = 0, the process reduces to a set of independent AR1s

If off-diagonals in `\(A\)`, but not in `\(\Sigma\)`, = 0, the process is a set of AR1s with correlated errors

If `\(A\)` has no zero entries, the process can quickly become .emphasize[*nonstationary*], leading to explosive forecasts

---

## Advantages of VARs

They allow us to explore whether one response variable can be useful in predicting another, giving us a means to tackle .emphasize[*interactions*]

They allow for impulse response analyses, where the response of one variable to a sudden but temporary change in another variable can be predicted

They allow us to identify which other responses are most influential in driving our uncertainty in a focal response

---

## Some simulated data

<img src="lecture_5_slidedeck_files/figure-html/unnamed-chunk-14-1.svg" style="display: block; margin: auto;" />

---

## Latent VAR1s in `mvgam` 📦

```r
varmod <- mvgam(y ~ 1,
*               trend_model = VAR(),
                data = data_train,
                newdata = data_test,
                family = gaussian())
```

If multiple series are included in the data, we can use a VAR1 to estimate latent dynamics
- `trend_model = VAR()`: a VAR1 with uncorrelated process errors (off-diagonals in `\(\Sigma\)` set to zero)
- `trend_model = VAR(cor = TRUE)`: a VAR1 with possibly correlated process errors

---

## Enforcing stationarity

Forecasts from VAR models, especially those with many series, can very quickly become nonstationary

Not only will forecast variance increase indefinitely, but dynamics can become unstable and forecasts can .emphasize[*explode*]

`mvgam` 📦 enforces stationarity in all VAR1 models using careful prior choices
- Maps the process to unconstrained partial autocorrelations, which can be constrained to preserve stationarity
- Derived and explained in detail by Sarah Heaps [here](https://www.youtube.com/watch?v=W7rUEvQKyus) and [here](https://www.tandfonline.com/doi/full/10.1080/10618600.2022.2079648)
- A by-hand sketch of the stationarity condition closes this section

---

## Trend estimates

<img src="lecture_5_slidedeck_files/figure-html/unnamed-chunk-16-1.svg" style="display: block; margin: auto;" />

---

## Hindcasts / forecasts

<img src="lecture_5_slidedeck_files/figure-html/unnamed-chunk-17-1.svg" style="display: block; margin: auto;" />

---
class: middle center

### VAR models give us a tool to capture complex multi-series dynamics while providing sensible forecasts
<br>
### But often we have enough series that estimating VAR parameters becomes intractable
<br>
### How can we achieve dimension reduction when forecasting multiple series?
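---

## Aside: a VAR1 by hand

A minimal sketch of the stationarity condition discussed above: a VAR1 is stationary when every eigenvalue of `\(A\)` has modulus less than one. The coefficient values here are illustrative only, not taken from any fitted model

```r
set.seed(100)

# lagged dependence and cross-dependence between two series
A <- matrix(c( 0.7, 0.2,
              -0.1, 0.5), nrow = 2, byrow = TRUE)

# uncorrelated process errors
Sigma <- diag(0.25, 2)

# largest eigenvalue modulus is < 1, so the process is
# stationary and forecasts will not explode
max(Mod(eigen(A)$values))

# propagate the process forward from zero
x <- matrix(0, nrow = 100, ncol = 2)
for (t in 2:100) {
  x[t, ] <- as.vector(A %*% x[t - 1, ]) +
    MASS::mvrnorm(1, mu = c(0, 0), Sigma = Sigma)
}
matplot(x, type = 'l', lty = 1, bty = 'l',
        xlab = 'Time', ylab = 'x')
```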
---
class: inverse middle center big-subsection

# Dynamic factor models

---
background-image: url('./resources/df_with_series.gif')

## Inducing multiseries correlations

---

## Dynamic factors

Propagate a set of latent factors and allow each observed series to .emphasize[*depend*] on these factors with a set of weights
<br/>
<br/>
`\begin{align*}
x_t & = \theta*z_t \\
\end{align*}`

Where:
- `\(x_t\)` is a vector of time series observations at time `\(t\)`
- `\(z_t\)` is a vector of dynamic factor estimates at time `\(t\)`
- `\(\theta\)` is a matrix of loading coefficients that control how each series in `\(x\)` depends on the factors in `\(z\)`

---
class: middle center

### Dynamic factor models are hugely flexible
<br>
### The factors can take on a number of possible time series dynamics (RW, AR, GP)
<br>
### Similar loadings ⇨ correlated trends

---

## Dynamic factors in `mvgam` 📦

```r
dfmod <- mvgam(y ~ 1,
*              use_lv = TRUE,
*              n_lv = 2,
               trend_model = AR(),
               data = data_train,
               newdata = data_test,
               family = gaussian())
```

If multiple series are included in the data, we can use a dynamic factor model to estimate latent dynamics
- `n_lv`: the number of latent factors to estimate
- `trend_model`: can be `RW()`, `AR(p = 1, 2, or 3)` or `GP()`
- variance parameters for all factors fixed to ensure identifiability

---

## Trend estimates

<img src="lecture_5_slidedeck_files/figure-html/unnamed-chunk-19-1.svg" style="display: block; margin: auto;" />

---

## Hindcasts / forecasts

<img src="lecture_5_slidedeck_files/figure-html/unnamed-chunk-20-1.svg" style="display: block; margin: auto;" />

---

`plot_mvgam_factors(dfmod)`

<img src="lecture_5_slidedeck_files/figure-html/unnamed-chunk-21-1.svg" style="display: block; margin: auto;" />

---

`mcmc_hist(dfmod$model_output, regex_pars = 'lv_coefs')`

<img src="lecture_5_slidedeck_files/figure-html/unnamed-chunk-22-1.svg" style="display: block; margin: auto;" />

---

## Induced correlations

```r
mean_corrs <- lv_correlations(object = dfmod)$mean_correlations
mean_corrs
```

<table class=" lightable-minimal" style='color: black; font-family: "Trebuchet MS", verdana, sans-serif; width: auto !important; margin-left: auto; margin-right: auto;'>
 <thead>
  <tr>
   <th style="text-align:left;"> </th>
   <th style="text-align:right;"> series_1 </th>
   <th style="text-align:right;"> series_2 </th>
   <th style="text-align:right;"> series_3 </th>
   <th style="text-align:right;"> series_4 </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> series_1 </td>
   <td style="text-align:right;"> 1.00 </td>
   <td style="text-align:right;"> -0.92 </td>
   <td style="text-align:right;"> 1.00 </td>
   <td style="text-align:right;"> -0.97 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> series_2 </td>
   <td style="text-align:right;"> -0.92 </td>
   <td style="text-align:right;"> 1.00 </td>
   <td style="text-align:right;"> -0.92 </td>
   <td style="text-align:right;"> 0.83 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> series_3 </td>
   <td style="text-align:right;"> 1.00 </td>
   <td style="text-align:right;"> -0.92 </td>
   <td style="text-align:right;"> 1.00 </td>
   <td style="text-align:right;"> -0.96 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> series_4 </td>
   <td style="text-align:right;"> -0.97 </td>
   <td style="text-align:right;"> 0.83 </td>
   <td style="text-align:right;"> -0.96 </td>
   <td style="text-align:right;"> 1.00 </td>
  </tr>
</tbody>
</table>

---

`heatmap(mean_corrs, distfun = function(c) as.dist(1 - c))`

<img src="lecture_5_slidedeck_files/figure-html/unnamed-chunk-25-1.svg" style="display: block; margin: auto;" />
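---

## How loadings induce correlations

A small sketch of why similar loadings produce correlated trends: with independent, unit-variance factors, the covariance among the latent trends implied by the loadings is `\(\theta\theta^\top\)`. The loading values below are made up for illustration, not the `lv_coefs` estimates from `dfmod`

```r
# a hypothetical 4 series x 2 factor loading matrix
theta <- matrix(c( 0.8,  0.1,
                  -0.6,  0.4,
                   0.9,  0.0,
                  -0.7, -0.3), nrow = 4, byrow = TRUE)

# trend covariance implied by independent, unit-variance factors
V <- theta %*% t(theta)

# series whose loadings point in similar directions end up positively
# correlated; compare with the lv_correlations(object = dfmod) output
cov2cor(V)
```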
---
class: middle center

### We have many ways to estimate multivariate dynamic components in the `mvgam` framework
<br>
### VARs can approximate Granger causality and allow targeted hypotheses about interactions to be evaluated; dynamic factors can capture time series correlations with fewer parameters
<br>
### But how do we go about *evaluating* forecasts to choose among models?

---
class: inverse middle center big-subsection

# Live code example

---
class: inverse middle center big-subsection

# Multivariate forecast evaluation

---

## Energy score

Generalizes the CRPS for multivariate forecasts

`$$ES(F,y)= \frac{1}{m}\sum_{i=1}^{m}||F_i - y||-\frac{1}{2m^2}\sum_{i=1}^{m}\sum_{j=1}^{m}||F_i-F_j||$$`

Where:
- `\(F(\hat{y})\)` is a set of `\(m\)` samples from the forecast distribution
- `\(||\cdot||\)` is the Euclidean norm

Essentially a weighted distance between the forecast distribution and the distribution of observations

---

## Compare energy scores

.panelset[
.panel[.panel-name[Code]

```r
# score forecasts from each model and
# compute differences (VAR scores - DF scores)
fc_var <- forecast(varmod); fc_df <- forecast(dfmod)
diff_scores <- score(fc_var, score = 'energy')$all_series$score -
  score(fc_df, score = 'energy')$all_series$score

# plot the differences (negative means VAR1 was better;
# positive means DF was better)
plot(diff_scores, pch = 16, col = 'darkred',
     ylim = c(-1*max(abs(diff_scores), na.rm = TRUE),
              max(abs(diff_scores), na.rm = TRUE)),
     bty = 'l',
     xlab = 'Forecast horizon',
     ylab = expression(energy[VAR1]~-~energy[DF]))
abline(h = 0, lty = 'dashed', lwd = 2)
```

]
.panel[.panel-name[Plot]
.center[![](lecture_5_slidedeck_files/figure-html/energy_comp-1.svg)]
]
]

---
class: middle center

### The energy score generalizes the univariate CRPS and is readily applicable to a variety of forecasts (ensembles, samples)
<br>
### But it may not tell us whether a forecast misses correlation structures that are evident in out-of-sample observations
<br>
### We may also need a score that penalizes forecasts which do not replicate the observed correlation structure

---

## Variogram score

Asks if the forecast distribution captures the .emphasize[*correlation structure*] evident in the observations

`$$Variogram(F,y)= \sum_{i,j=1}^{d}w_{ij}(\sqrt{|y_i-y_j|}-\sqrt{|F_i-F_j|})^2$$`

Where:
- `\(F(\hat{y})\)` is the mean (or median) forecast for the `\(d\)` series
- `\(w\)` is a set of non-negative weights

Penalizes forecasts whose variograms don't match the observed variogram

---

## Compare variogram scores

.panelset[
.panel[.panel-name[Code]

```r
# score forecasts from each model and
# compute differences (VAR scores - DF scores)
fc_var <- forecast(varmod); fc_df <- forecast(dfmod)
diff_scores <- score(fc_var, score = 'variogram')$all_series$score -
  score(fc_df, score = 'variogram')$all_series$score

# plot the differences (negative means VAR1 was better;
# positive means DF was better)
plot(diff_scores, pch = 16, col = 'darkred',
     ylim = c(-1*max(abs(diff_scores), na.rm = TRUE),
              max(abs(diff_scores), na.rm = TRUE)),
     bty = 'l',
     xlab = 'Forecast horizon',
     ylab = expression(variogram[VAR1]~-~variogram[DF]))
abline(h = 0, lty = 'dashed', lwd = 2)
```

]
.panel[.panel-name[Plot]
.center[![](lecture_5_slidedeck_files/figure-html/var_comp-1.svg)]
]
]

---
class: middle center

### The variogram score is more useful for directly asking if a forecast distribution resembles the dependency structure in the observations
<br>
### But it only uses a summary of the forecast (i.e. the mean or median forecast for each series), so it does not address the sharpness or calibration of the forecast
<br>
### Weighted combinations of variogram and energy scores may be necessary to fully characterize the performance of a forecast (a short sketch follows the final slide)

---

## In the next lecture, we will cover

Extended practical examples using `mvgam`

Review and open discussion
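---

## Extra: combining scores

A sketch of one possible weighted combination, reusing the `fc_var` forecast object from the scoring slides. The equal weights, and the rescaling of each score so that neither dominates, are arbitrary illustrative choices

```r
# energy and variogram scores per horizon for the VAR1 model
energy <- score(fc_var, score = 'energy')$all_series$score
vario <- score(fc_var, score = 'variogram')$all_series$score

# rescale each score to [0, 1] before combining so that
# neither component dominates the weighted sum
rescale <- function(x){
  (x - min(x, na.rm = TRUE)) / diff(range(x, na.rm = TRUE))
}
combined <- 0.5 * rescale(energy) + 0.5 * rescale(vario)

plot(combined, pch = 16, col = 'darkred', bty = 'l',
     xlab = 'Forecast horizon',
     ylab = 'Weighted combined score')
```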