class: inverse, middle, left, my-title-slide, title-slide

.title[ # Ecological forecasting in R ]
.subtitle[ ## Lecture 4: multivariate ecological time series ]
.author[ ### Nicholas Clark ]
.institute[ ### School of Veterinary Science, University of Queensland ]
.date[ ### 0900–1200 CET Thursday 30th May, 2024 ]

---

## Workflow

Press the "o" key on your keyboard to navigate among slides

Access the [tutorial html here](https://nicholasjclark.github.io/physalia-forecasting-course/day4/tutorial_4_physalia)
- Download the data objects and exercise script from the html file
- Complete exercises and use Slack to ask questions

Relevant open-source materials include:
- [GAMs for time series](https://youtu.be/Ukfvd8akfco?si=tlmSm2-51ZZ1wQf6)
- [Smoothed dynamic factor analysis for identifying trends in multivariate time series](https://besjournals.onlinelibrary.wiley.com/doi/full/10.1111/2041-210X.13788)
- [Multivariate State-Space models](https://youtu.be/4nrZZGMY1bc?si=VuKYtJVKKaMBliI6)

---

## This lecture's topics

Multivariate ecological time series

Vector autoregressive processes

Dynamic factor models

Multivariate forecast evaluation

---
class: inverse middle center big-subsection

# Multivariate ecological time series

---

## We often measure *multiple* series

<img src="lecture_5_slidedeck_files/figure-html/unnamed-chunk-1-1.svg" style="display: block; margin: auto;" />

---

## Applicable for many situations

Multivariate time series arise when we have:
- Multiple species in one site
- Same species in multiple sites
- Multiple subjects in an experiment
- Multiple plots within a site
- etc...

Often the structure of the data is grouped in some way (i.e. .emphasize[*hierarchical*])

Both `mvgam` and `brms` 📦's were designed to handle this kind of data

---
class: middle center

### Hierarchical models *learn from all groups at once* to inform group-level estimates
<br>
### This induces *regularization*, where noisy estimates are pulled towards the overall mean
<br>
### The regularization is known as [partial pooling](https://www.jstor.org/stable/25471160)

---
background-image: url('./resources/partial_pool.gif')

## Partial pooling in action

<br><br><br><br><br><br><br><br><br><br><br><br>
.small[[McElreath 2023](https://www.youtube.com/watch?v=SocRgsf202M)]

---

## Noisy estimates *pulled* to the mean

<br>
.center[<img align="center" width="768" height="384" src="resources/partial_pool_estimates.png">]

.small[[Johnson *et al* 2021](https://www.bayesrulesbook.com/)]

---

## What about *nonlinear* effects?

<img src="resources/mozzie_cycles.jpg" style="position:fixed; right:13%; top:20%; width:896px; height:465px; border:none;"/>
<br><br><br><br><br><br><br><br><br><br><br><br><br>
.small[[Whittaker et al 2022](https://royalsocietypublishing.org/doi/10.1098/rspb.2022.0089)]

---

## What about *nonlinear* effects?

We very often expect to encounter nonlinear effects in ecology

But if we measure multiple species / plots / individuals etc. through time, we can also encounter hierarchical nonlinear effects
- Same species may respond similarly to environmental change over different sites
- Different species may respond similarly in the same site

Our data may not be rich enough to estimate all effects individually; so what can we do?
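---

## A partial pooling sketch

A minimal sketch of the shrinkage idea (not from the course materials; the plots, sample sizes and means below are simulated purely for illustration), using `mgcv`'s random effect basis

```r
library(mgcv)
set.seed(1)

# simulate Poisson counts for six plots; the last plots are
# sparsely sampled, so their raw means are noisy
n_per_plot <- c(40, 40, 30, 10, 5, 3)
plot_id <- factor(rep(1:6, times = n_per_plot))
true_log_means <- log(c(4, 6, 5, 5, 5, 5))
dat <- data.frame(
  y = rpois(sum(n_per_plot), lambda = exp(true_log_means[plot_id])),
  plot = plot_id
)

# no pooling: estimate each plot's mean independently
no_pool <- glm(y ~ plot - 1, family = poisson(), data = dat)

# partial pooling: plot-level intercepts as a random effect smooth,
# so sparse plots are pulled towards the shared mean
part_pool <- gam(y ~ s(plot, bs = 're'),
                 family = poisson(), data = dat)

# compare estimates on the log scale; shrinkage is strongest
# for the plots with the fewest observations
cbind(no_pool = coef(no_pool),
      partial_pool = coef(part_pool)[1] + coef(part_pool)[-1])
```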
---

## Simulate some hierarchical data

.panelset[
.panel[.panel-name[Code]

```r
library(mvgam)

# set a seed for reproducibility
set.seed(650)

# simulate four time series with very little
# trend and with hierarchical seasonality
simdat <- sim_mvgam(seasonality = 'hierarchical',
                    trend_rel = 0.05,
                    n_series = 4,
                    mu = c(1, 1, 2, 1.2))
```

]
.panel[.panel-name[Data]

<table class=" lightable-minimal" style='color: black; font-family: "Trebuchet MS", verdana, sans-serif; width: auto !important; margin-left: auto; margin-right: auto;'>
 <thead>
  <tr>
   <th style="text-align:right;"> y </th>
   <th style="text-align:right;"> season </th>
   <th style="text-align:right;"> year </th>
   <th style="text-align:left;"> series </th>
   <th style="text-align:right;"> time </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:right;"> 4 </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:left;"> series_1 </td>
   <td style="text-align:right;"> 1 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 0 </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:left;"> series_2 </td>
   <td style="text-align:right;"> 1 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 3 </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:left;"> series_3 </td>
   <td style="text-align:right;"> 1 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 8 </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:left;"> series_4 </td>
   <td style="text-align:right;"> 1 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 5 </td>
   <td style="text-align:right;"> 2 </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:left;"> series_1 </td>
   <td style="text-align:right;"> 2 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 4 </td>
   <td style="text-align:right;"> 2 </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:left;"> series_2 </td>
   <td style="text-align:right;"> 2 </td>
  </tr>
</tbody>
</table>

]
]

---

## The series

<img src="lecture_5_slidedeck_files/figure-html/unnamed-chunk-4-1.svg" style="display: block; margin: auto;" />

---

## Similar seasonal patterns

<img src="lecture_5_slidedeck_files/figure-html/unnamed-chunk-5-1.svg" style="display: block; margin: auto;" />

---
class: middle center

### Can we somehow estimate the average population smooth *and* a smooth to determine how each series deviates from the population?
<br>
### Yes! We can use .multicolor[hierarchical GAMs]

---

## Decomposing seasonality

<img src="lecture_5_slidedeck_files/figure-html/unnamed-chunk-6-1.svg" style="display: block; margin: auto;" />

---

## How did we model this?

```r
mod <- mvgam(y ~ s(season, bs = 'cc', k = 12) +
               s(season, series, k = 6, bs = 'fs'),
             data = data, family = poisson())
```

---

## How did we model this?

```r
mod <- mvgam(y ~
*              s(season, bs = 'cc', k = 12) +
               s(season, series, k = 6, bs = 'fs'),
             data = data, family = poisson())
```

A .emphasize[*shared*] smooth of seasonality

This is a group-level smooth, similar to what we might expect the average seasonal function to be in this set of series

---

## How did we model this?

```r
mod <- mvgam(y ~ s(season, bs = 'cc', k = 12) +
*              s(season, series, k = 6, bs = 'fs'),
             data = data, family = poisson())
```

Series-level .emphasize[*deviation*] smooths of seasonality, which all share a common smoothing penalty

These are individual-level smooths that capture how each series' seasonal pattern differs from the shared smooth
- There are a number of ways to do this using splines
- See [Pedersen et al 2019](https://peerj.com/articles/6876/) for useful guidance

---

## Conditional predictions

<img src="lecture_5_slidedeck_files/figure-html/unnamed-chunk-10-1.svg" style="display: block; margin: auto;" />

---

## Hierarchical GAMs (HGAMs)

By decomposing the linear predictor into a set of additive smooths, HGAMs offer a number of very useful ways to model multivariate data

Each of these strategies allows us to:
- Learn smooth functions for each series using data from .emphasize[*all series in the data*]
- Regularize functions when using noisy and/or sparse data
- Make predictions for .emphasize[*new series that haven't been measured yet*]

See [`?mgcv::factor.smooth.interaction`](https://stat.ethz.ch/R-manual/R-devel/library/mgcv/html/factor.smooth.html) for more details

---
class: middle center

### But what if we have smooths of two or more covariates? Can we still learn these hierarchically?

---

## Hierarchical smooth interactions

```r
mvgam(y ~
*       te(temp, month, k = c(4, 4)) +
*       te(temp, month, k = c(4, 4), by = series),
      family = gaussian(),
      data = plankton_train,
      newdata = plankton_test,
      trend_model = 'None')
```

Here each series is given a [tensor product smooth](https://onlinelibrary.wiley.com/doi/10.1111/j.1541-0420.2006.00574.x) of two covariates

But we learn the series-specific smooth as a deviation from the "shared" smooth

More on this example in [Tutorial 4](https://nicholasjclark.github.io/physalia-forecasting-course/day4/tutorial_4_physalia)

---

## Hierarchical distributed lags

```r
mvgam(y ~ s(ndvi_ma12, trend, bs = 're') +
*       te(mintemp, lag, k = c(3, 4), bs = c('tp', 'cr')) +
*       te(mintemp, lag, by = weights_dm, k = c(3, 4), bs = c('tp', 'cr')) +
*       te(mintemp, lag, by = weights_do, k = c(3, 4), bs = c('tp', 'cr')) +
*       te(mintemp, lag, by = weights_ot, k = c(3, 4), bs = c('tp', 'cr')) +
*       te(mintemp, lag, by = weights_pp, k = c(3, 4), bs = c('tp', 'cr')),
      ...)
```

Here the effect of minimum temperature changes smoothly over lags for each series, using the `by` argument with a set of series-specific weights for deviations

More on this type of example [in this recent blogpost](https://ecogambler.netlify.app/blog/distributed-lags-mgcv/)

---
class: middle center

### HGAMs offer a solution to estimate the hierarchical, nonlinear effects that we think are common in ecology
<br>
### This is a huge advantage over traditional time series models
<br>
### But how can we handle multivariate *dynamic components*?
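---

## Sketching the decomposition

Before the live code example: a minimal sketch of how the shared and deviation smooths can be pulled apart with `predict(..., type = 'terms')`. The data are simulated here (not the `sim_mvgam()` object from earlier) and the model is fitted with `mgcv` for speed, though the same formula works in `mvgam`

```r
library(mgcv)
set.seed(650)

# four series, three years of monthly observations, with a
# phase-shifted seasonal pattern for each series
dat <- data.frame(
  season = rep(rep(1:12, times = 3), times = 4),
  series = factor(rep(paste0('series_', 1:4), each = 36))
)
shift <- as.integer(dat$series) / 8
dat$y <- rpois(nrow(dat),
               lambda = exp(0.5 + sin(2 * pi * (dat$season / 12 + shift))))

# the hierarchical seasonal model from the previous slides
mod <- gam(y ~ s(season, bs = 'cc', k = 12) +
             s(season, series, k = 6, bs = 'fs'),
           data = dat, family = poisson())

# one column per smooth: the shared seasonal term and the
# series-level deviations from it
head(predict(mod, type = 'terms'))
```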
---
class: inverse middle center big-subsection

# Live code example

---
class: inverse middle center big-subsection

# Vector autoregressive processes

---

## VAR1

Similar to an AR1, but the response is now a .emphasize[*vector*]
<br/>
<br/>
`\begin{align*}
x_t & \sim \text{Normal}(A * x_{t-1}, \Sigma) \\
\end{align*}`

Where:
- `\(x_t\)` is a vector of time series observations at time `\(t\)`
- `\(\Sigma\)` determines the spread (or flexibility) of the process and any correlations among time series errors
- `\(A\)` is a matrix of coefficients estimating lagged dependence and cross-dependence between the elements of `\(x\)`

---
background-image: url('./resources/VAR.svg')
background-size: contain

## VAR1

---

## Properties of a VAR1

If off-diagonals in both `\(A\)` and `\(\Sigma\)` = 0, the process reduces to a set of independent AR1s

If off-diagonals in `\(A\)`, but not in `\(\Sigma\)`, = 0, the process is a set of AR1s with correlated errors

If `\(A\)` has no zero entries, the process can quickly become .emphasize[*nonstationary*], leading to explosive forecasts

---

## Advantages of VARs

They allow us to explore whether one response variable can be useful in predicting another, giving us a means to tackle .emphasize[*interactions*]

They allow for impulse response analyses, where the response of one variable to a sudden but temporary change in another variable can be predicted

They allow us to identify which other responses are most influential in driving our uncertainty in a focal response

---

## Some simulated data

<img src="lecture_5_slidedeck_files/figure-html/unnamed-chunk-14-1.svg" style="display: block; margin: auto;" />

---

## Latent VAR1s in `mvgam` 📦

```r
varmod <- mvgam(y ~ 1,
*               trend_model = VAR(),
                data = data_train,
                newdata = data_test,
                family = gaussian())
```

If multiple series are included in the data, we can use a VAR1 to estimate latent dynamics
- `trend_model = VAR()`: a VAR1 with uncorrelated process errors (off-diagonals in `\(\Sigma\)` set to zero)
- `trend_model = VAR(cor = TRUE)`: a VAR1 with possibly correlated process errors

---

## Enforcing stationarity

Forecasts from VAR models, especially those with many series, can very quickly become nonstationary

Not only will forecast variance increase indefinitely, but dynamics can become unstable and forecasts can .emphasize[*explode*]

`mvgam` 📦 enforces stationarity in all VAR1 models using careful prior choices
- Maps the process to unconstrained partial autocorrelations, which can be constrained to preserve stationarity
- Derived and explained in detail by Sarah Heaps [here](https://www.youtube.com/watch?v=W7rUEvQKyus) and [here](https://www.tandfonline.com/doi/full/10.1080/10618600.2022.2079648)
- A by-hand sketch of the stationarity condition closes this section

---

## Trend estimates

<img src="lecture_5_slidedeck_files/figure-html/unnamed-chunk-16-1.svg" style="display: block; margin: auto;" />

---

## Hindcasts / forecasts

<img src="lecture_5_slidedeck_files/figure-html/unnamed-chunk-17-1.svg" style="display: block; margin: auto;" />

---
class: middle center

### VAR models give us a tool to capture complex multi-series dynamics while providing sensible forecasts
<br>
### But often we have enough series that estimating VAR parameters becomes intractable
<br>
### How can we achieve dimension reduction when forecasting multiple series?
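---

## Aside: a VAR1 by hand

A minimal sketch of the stationarity condition discussed above: a VAR1 is stationary when every eigenvalue of `\(A\)` has modulus less than one. The coefficient values here are illustrative only, not taken from any fitted model

```r
set.seed(100)

# lagged dependence and cross-dependence between two series
A <- matrix(c( 0.7, 0.2,
              -0.1, 0.5), nrow = 2, byrow = TRUE)

# uncorrelated process errors
Sigma <- diag(0.25, 2)

# largest eigenvalue modulus is < 1, so the process is
# stationary and forecasts will not explode
max(Mod(eigen(A)$values))

# propagate the process forward from zero
x <- matrix(0, nrow = 100, ncol = 2)
for (t in 2:100) {
  x[t, ] <- as.vector(A %*% x[t - 1, ]) +
    MASS::mvrnorm(1, mu = c(0, 0), Sigma = Sigma)
}
matplot(x, type = 'l', lty = 1, bty = 'l',
        xlab = 'Time', ylab = 'x')
```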
---
class: inverse middle center big-subsection

# Dynamic factor models

---
background-image: url('./resources/df_with_series.gif')

## Inducing multiseries correlations

---

## Dynamic factors

Propagate a set of latent factors and allow each observed series to .emphasize[*depend*] on these factors with a set of weights
<br/>
<br/>
`\begin{align*}
x_t & = \theta*z_t \\
\end{align*}`

Where:
- `\(x_t\)` is a vector of time series observations at time `\(t\)`
- `\(z_t\)` is a vector of dynamic factor estimates at time `\(t\)`
- `\(\theta\)` is a matrix of loading coefficients that control how each series in `\(x\)` depends on the factors in `\(z\)`

---
class: middle center

### Dynamic factor models are hugely flexible
<br>
### The factors can take on a number of possible time series dynamics (RW, AR, GP)
<br>
### Similar loadings ⇨ correlated trends

---

## Dynamic factors in `mvgam` 📦

```r
dfmod <- mvgam(y ~ 1,
*              use_lv = TRUE,
*              n_lv = 2,
               trend_model = AR(),
               data = data_train,
               newdata = data_test,
               family = gaussian())
```

If multiple series are included in the data, we can use a dynamic factor model to estimate latent dynamics
- `n_lv`: the number of latent factors to estimate
- `trend_model`: can be `RW()`, `AR(p = 1, 2, or 3)` or `GP()`
- variance parameters for all factors fixed to ensure identifiability

---

## Trend estimates

<img src="lecture_5_slidedeck_files/figure-html/unnamed-chunk-19-1.svg" style="display: block; margin: auto;" />

---

## Hindcasts / forecasts

<img src="lecture_5_slidedeck_files/figure-html/unnamed-chunk-20-1.svg" style="display: block; margin: auto;" />

---

`plot_mvgam_factors(dfmod)`

<img src="lecture_5_slidedeck_files/figure-html/unnamed-chunk-21-1.svg" style="display: block; margin: auto;" />

---

`mcmc_hist(dfmod$model_output, regex_pars = 'lv_coefs')`

<img src="lecture_5_slidedeck_files/figure-html/unnamed-chunk-22-1.svg" style="display: block; margin: auto;" />

---

## Induced correlations

```r
mean_corrs <- lv_correlations(object = dfmod)$mean_correlations
mean_corrs
```

<table class=" lightable-minimal" style='color: black; font-family: "Trebuchet MS", verdana, sans-serif; width: auto !important; margin-left: auto; margin-right: auto;'>
 <thead>
  <tr>
   <th style="text-align:left;"> </th>
   <th style="text-align:right;"> series_1 </th>
   <th style="text-align:right;"> series_2 </th>
   <th style="text-align:right;"> series_3 </th>
   <th style="text-align:right;"> series_4 </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> series_1 </td>
   <td style="text-align:right;"> 1.00 </td>
   <td style="text-align:right;"> -0.92 </td>
   <td style="text-align:right;"> 1.00 </td>
   <td style="text-align:right;"> -0.97 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> series_2 </td>
   <td style="text-align:right;"> -0.92 </td>
   <td style="text-align:right;"> 1.00 </td>
   <td style="text-align:right;"> -0.92 </td>
   <td style="text-align:right;"> 0.83 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> series_3 </td>
   <td style="text-align:right;"> 1.00 </td>
   <td style="text-align:right;"> -0.92 </td>
   <td style="text-align:right;"> 1.00 </td>
   <td style="text-align:right;"> -0.96 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> series_4 </td>
   <td style="text-align:right;"> -0.97 </td>
   <td style="text-align:right;"> 0.83 </td>
   <td style="text-align:right;"> -0.96 </td>
   <td style="text-align:right;"> 1.00 </td>
  </tr>
</tbody>
</table>

---

`heatmap(mean_corrs, distfun = function(c) as.dist(1 - c))`

<img src="lecture_5_slidedeck_files/figure-html/unnamed-chunk-25-1.svg" style="display: block; margin: auto;" />
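---

## How loadings induce correlations

A small sketch of why similar loadings produce correlated trends: with independent, unit-variance factors, the covariance among the latent trends implied by the loadings is `\(\theta\theta^\top\)`. The loading values below are made up for illustration, not the `lv_coefs` estimates from `dfmod`

```r
# a hypothetical 4 series x 2 factor loading matrix
theta <- matrix(c( 0.8,  0.1,
                  -0.6,  0.4,
                   0.9,  0.0,
                  -0.7, -0.3), nrow = 4, byrow = TRUE)

# trend covariance implied by independent, unit-variance factors
V <- theta %*% t(theta)

# series whose loadings point in similar directions end up positively
# correlated; compare with the lv_correlations(object = dfmod) output
cov2cor(V)
```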
---
class: middle center

### We have many ways to estimate multivariate dynamic components in the `mvgam` framework
<br>
### VARs can approximate Granger causality and allow targeted hypotheses about interactions to be evaluated; dynamic factors can capture time series correlations with fewer parameters
<br>
### But how do we go about *evaluating* forecasts to choose among models?

---
class: inverse middle center big-subsection

# Live code example

---
class: inverse middle center big-subsection

# Multivariate forecast evaluation

---

## Energy score

Generalizes the CRPS for multivariate forecasts

`$$ES(F,y)= \frac{1}{m}\sum_{i=1}^{m}||F_i - y||-\frac{1}{2m^2}\sum_{i=1}^{m}\sum_{j=1}^{m}||F_i-F_j||$$`

Where:
- `\(F(\hat{y})\)` is a set of `\(m\)` samples from the forecast distribution
- `\(||\cdot||\)` is the Euclidean norm

Essentially a weighted distance between the forecast distribution and the distribution of observations

---

## Compare energy scores

.panelset[
.panel[.panel-name[Code]

```r
# score forecasts from each model and
# compute differences (VAR scores - DF scores)
fc_var <- forecast(varmod); fc_df <- forecast(dfmod)
diff_scores <- score(fc_var, score = 'energy')$all_series$score -
  score(fc_df, score = 'energy')$all_series$score

# plot the differences (negative means VAR1 was better;
# positive means DF was better)
plot(diff_scores, pch = 16, col = 'darkred',
     ylim = c(-1*max(abs(diff_scores), na.rm = TRUE),
              max(abs(diff_scores), na.rm = TRUE)),
     bty = 'l',
     xlab = 'Forecast horizon',
     ylab = expression(energy[VAR1]~-~energy[DF]))
abline(h = 0, lty = 'dashed', lwd = 2)
```

]
.panel[.panel-name[Plot]
.center[![](lecture_5_slidedeck_files/figure-html/energy_comp-1.svg)]
]
]

---
class: middle center

### The energy score generalizes the univariate CRPS and is readily applicable to a variety of forecasts (ensembles, samples)
<br>
### But it may not tell us whether a forecast misses correlation structures that are evident in out-of-sample observations
<br>
### We may also need a score that penalizes forecasts which do not replicate the observed correlation structure

---

## Variogram score

Asks if the forecast distribution captures the .emphasize[*correlation structure*] evident in the observations

`$$Variogram(F,y)= \sum_{i,j=1}^{d}w_{ij}(\sqrt{|y_i-y_j|}-\sqrt{|F_i-F_j|})^2$$`

Where:
- `\(F(\hat{y})\)` is the mean (or median) forecast for the `\(d\)` series
- `\(w\)` is a set of non-negative weights

Penalizes forecasts whose variograms don't match the observed variogram

---

## Compare variogram scores

.panelset[
.panel[.panel-name[Code]

```r
# score forecasts from each model and
# compute differences (VAR scores - DF scores)
fc_var <- forecast(varmod); fc_df <- forecast(dfmod)
diff_scores <- score(fc_var, score = 'variogram')$all_series$score -
  score(fc_df, score = 'variogram')$all_series$score

# plot the differences (negative means VAR1 was better;
# positive means DF was better)
plot(diff_scores, pch = 16, col = 'darkred',
     ylim = c(-1*max(abs(diff_scores), na.rm = TRUE),
              max(abs(diff_scores), na.rm = TRUE)),
     bty = 'l',
     xlab = 'Forecast horizon',
     ylab = expression(variogram[VAR1]~-~variogram[DF]))
abline(h = 0, lty = 'dashed', lwd = 2)
```

]
.panel[.panel-name[Plot]
.center[![](lecture_5_slidedeck_files/figure-html/var_comp-1.svg)]
]
]

---
class: middle center

### The variogram score is more useful for directly asking if a forecast distribution resembles the dependency structure in the observations
<br>
### But it only uses a summary of the forecast (i.e. the mean or median forecast for each series), so it does not address the sharpness or calibration of the forecast
<br>
### Weighted combinations of variogram and energy scores may be necessary to fully characterize the performance of a forecast (a short sketch follows the final slide)

---

## In the next lecture, we will cover

Extended practical examples using `mvgam`

Review and open discussion
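---

## Extra: combining scores

A sketch of one possible weighted combination, reusing the `fc_var` forecast object from the scoring slides. The equal weights, and the rescaling of each score so that neither dominates, are arbitrary illustrative choices

```r
# energy and variogram scores per horizon for the VAR1 model
energy <- score(fc_var, score = 'energy')$all_series$score
vario <- score(fc_var, score = 'variogram')$all_series$score

# rescale each score to [0, 1] before combining so that
# neither component dominates the weighted sum
rescale <- function(x){
  (x - min(x, na.rm = TRUE)) / diff(range(x, na.rm = TRUE))
}
combined <- 0.5 * rescale(energy) + 0.5 * rescale(vario)

plot(combined, pch = 16, col = 'darkred', bty = 'l',
     xlab = 'Forecast horizon',
     ylab = 'Weighted combined score')
```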