Overview of the mvgam package
Nicholas J Clark
2024-12-16
Source:vignettes/mvgam_overview.Rmd
mvgam_overview.Rmd
The purpose of this vignette is to give a general overview of the
mvgam
package and its primary functions.
Dynamic GAMs
mvgam
is designed to propagate unobserved temporal
processes to capture latent dynamics in the observed time series. This
works in a state-space format, with the temporal trend evolving
independently of the observation process. An introduction to the package
and some worked examples are also shown in this seminar: Ecological Forecasting with Dynamic Generalized Additive
Models. Briefly, assume \(\tilde{\boldsymbol{y}}_{i,t}\) is the
conditional expectation of response variable \(\boldsymbol{i}\) at time \(\boldsymbol{t}\). Assuming \(\boldsymbol{y_i}\) is drawn from an
exponential family distribution with an invertible link function, the linear
predictor for a multivariate Dynamic GAM can be written as:
\[for~i~in~1:N_{series}~...\] \[for~t~in~1:N_{timepoints}~...\]
\[g^{-1}(\tilde{\boldsymbol{y}}_{i,t})=\alpha_{i}+\sum\limits_{j=1}^J\boldsymbol{s}_{i,j,t}\boldsymbol{x}_{j,t}+\boldsymbol{Z}\boldsymbol{z}_{k,t}\,,\]
Here \(\alpha\) are the unknown
intercepts, the \(\boldsymbol{s}\)’s
are unknown smooth functions of covariates (\(\boldsymbol{x}\)’s), which can potentially
vary among the response series, and \(\boldsymbol{z}\) are dynamic latent
processes. Each smooth function \(\boldsymbol{s_j}\) is composed of basis
expansions whose coefficients, which must be estimated, control the
functional relationship between \(\boldsymbol{x}_{j}\) and \(g^{-1}(\tilde{\boldsymbol{y}})\). The size
of the basis expansion limits the smooth’s potential complexity. A
larger set of basis functions allows greater flexibility. For more
information on GAMs and how they can smooth through data, see this blogpost on how to interpret nonlinear effects from
Generalized Additive Models. Latent processes are captured with
\(\boldsymbol{Z}\boldsymbol{z}_{k,t}\),
where \(\boldsymbol{Z}\) is an \(i~by~k\) matrix of loading coefficients
(which can be fixed or a combination of fixed and freely estimated
parameters) and \(\boldsymbol{z}_{k,t}\) are a set of \(K\) latent factors that can also include
their own GAM linear predictors (see the State-Space models vignette, the N-mixtures vignette and the example in jsdgam to get an idea of how flexible these processes can be).
Several advantages of GAMs are that they can model a diversity of
response families, including discrete distributions (e.g. Poisson,
Negative Binomial) and skewed continuous distributions (e.g. Gamma) that accommodate common ecological features
such as zero-inflation or overdispersion, and that they can be
formulated to include hierarchical smoothing for multivariate responses.
mvgam
supports a number of different observation families,
which are summarized below:
Supported observation families
Distribution | Function | Support | Extra parameter(s) |
---|---|---|---|
Gaussian (identity link) | gaussian() | Real values in \((-\infty, \infty)\) | \(\sigma\) |
Student's T (identity link) | student_t() | Heavy-tailed real values in \((-\infty, \infty)\) | \(\sigma\), \(\nu\) |
LogNormal (identity link) | lognormal() | Positive real values in \([0, \infty)\) | \(\sigma\) |
Gamma (log link) | Gamma() | Positive real values in \([0, \infty)\) | \(\alpha\) |
Beta (logit link) | betar() | Real values (proportional) in \([0,1]\) | \(\phi\) |
Bernoulli (logit link) | bernoulli() | Binary data in \(\{0,1\}\) | - |
Poisson (log link) | poisson() | Non-negative integers in \((0,1,2,...)\) | - |
Negative Binomial2 (log link) | nb() | Non-negative integers in \((0,1,2,...)\) | \(\phi\) |
Binomial (logit link) | binomial() | Non-negative integers in \((0,1,2,...)\) | - |
Beta-Binomial (logit link) | beta_binomial() | Non-negative integers in \((0,1,2,...)\) | \(\phi\) |
Poisson Binomial N-mixture (log link) | nmix() | Non-negative integers in \((0,1,2,...)\) | - |
For all supported observation families, any extra parameters that
need to be estimated (i.e. the \(\sigma\) in a Gaussian model or the \(\phi\) in a Negative Binomial model) are by
default estimated independently for each series. However, users can opt
to force all series to share extra observation parameters using
share_obs_params = TRUE
in mvgam()
. Note that
default link functions cannot currently be changed.
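For example, a sketch of a call that shares the Negative Binomial overdispersion parameter \(\phi\) across all series (the formula, family and data object dat here are hypothetical placeholders, not objects used later in this vignette):
# estimate a single phi for all series instead of one per series
mod <- mvgam(y ~ s(x),
             family = nb(),
             share_obs_params = TRUE,
             data = dat)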
Supported temporal dynamic processes
As stated above, the latent processes can take a wide variety of
forms, some of which can be multivariate to allow the different
observational variables to interact or be correlated. When using the
mvgam()
function, the user chooses between different
process models with the trend_model
argument. Available
process models are described in detail below.
Correlated multivariate processes
If more than one observational unit (usually referred to as ‘series’)
is included in data
\((N_{series}
> 1)\), use trend_model = ZMVN()
to set up a
model where the outcomes for different observational units may be
correlated according to:
\[\begin{align*} z_{t} & \sim \text{MVNormal}(0, \Sigma) \end{align*}\]
The covariance matrix \(\Sigma\)
will capture potentially correlated process errors. It is parameterised
using a Cholesky factorization, which requires priors on the
series-level variances \(\sigma\) and
on the strength of correlations using Stan
’s
lkj_corr_cholesky
distribution. Note that this
trend_model
does not assume that measurements occur over
time, as users can specify what variable in the
data
represents the unit of analysis (i.e. outcomes could
be counts of different species across different sites
or regions, for example; see ?ZMVN()
for guidelines).
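As a sketch of the sort of call described above, assuming the data contain factor columns site and species, and assuming ZMVN()'s unit and subgr arguments (check ?ZMVN for the exact argument names):
# correlated errors among species, with 'site' (rather than time)
# acting as the unit of analysis
mod <- mvgam(y ~ s(x),
             family = poisson(),
             trend_model = ZMVN(unit = site, subgr = species),
             data = dat)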
Independent Random Walks
Use trend_model = 'RW'
or
trend_model = RW()
to set up a model where each series in
data
has independent latent temporal dynamics of the
form:
\[\begin{align*} z_{i,t} & \sim \text{Normal}(z_{i,t-1}, \sigma_i) \end{align*}\]
Process error parameters \(\sigma\)
are modeled independently for each series. If a moving average process
is required, use trend_model = RW(ma = TRUE)
to set up the
following:
\[\begin{align*} z_{i,t} & = z_{i,t-1} + \theta_i * error_{i,t-1} + error_{i,t} \\ error_{i,t} & \sim \text{Normal}(0, \sigma_i) \end{align*}\]
Moving average coefficients \(\theta\) are independently estimated for each series and will be forced to be stationary by default \((abs(\theta)<1)\). Only moving averages of order \(q=1\) are currently allowed.
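To make the connection to model fitting explicit, a random walk with a first-order moving average term could be requested along these lines (the formula, family and data object dat are placeholders):
# independent RW(1) latent processes with MA(1) errors for each series
mod <- mvgam(y ~ s(x),
             family = poisson(),
             trend_model = RW(ma = TRUE),
             data = dat)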
Multivariate Random Walks
If more than one series is included in data
\((N_{series} > 1)\), a multivariate
Random Walk can be set up using
trend_model = RW(cor = TRUE)
, resulting in the
following:
\[\begin{align*} z_{t} & \sim \text{MVNormal}(z_{t-1}, \Sigma) \end{align*}\]
Where the latent process estimate \(z_t\) now takes the form of a vector. The
covariance matrix \(\Sigma\) will
capture contemporaneously correlated process errors. It is parameterised
using a Cholesky factorization, which requires priors on the
series-level variances \(\sigma\) and
on the strength of correlations using Stan
’s
lkj_corr_cholesky
distribution.
Moving average terms can also be included for multivariate random walks, in which case the moving average coefficients \(\theta\) will be parameterised as an \(N_{series} * N_{series}\) matrix.
Autoregressive processes
Autoregressive models up to \(p=3\),
in which the autoregressive coefficients are estimated independently for
each series, can be used by specifying trend_model = 'AR1'
,
trend_model = 'AR2'
, trend_model = 'AR3'
, or
trend_model = AR(p = 1, 2, or 3)
. For example, a univariate
AR(1) model takes the form:
\[\begin{align*} z_{i,t} & \sim \text{Normal}(ar1_i * z_{i,t-1}, \sigma_i) \end{align*}\]
All options are the same as for Random Walks, but additional options
will be available for placing priors on the autoregressive coefficients.
By default, these coefficients will not be forced into stationarity, but
users can impose this restriction by changing the upper and lower bounds
on their priors. See ?get_mvgam_priors
for more
details.
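For example, a stationarity restriction on an AR(1) coefficient could be imposed by supplying a bounded prior at fitting time (a sketch only; the parameter class name ar1 and the brms::prior() syntax should be checked against the output of get_mvgam_priors() for your own model):
library(brms)
mod <- mvgam(y ~ s(x),
             family = poisson(),
             trend_model = AR(p = 1),
             priors = prior(normal(0, 0.5), class = ar1, lb = -1, ub = 1),
             data = dat)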
Vector Autoregressive processes
A Vector Autoregression of order \(p=1\) can be specified if \(N_{series} > 1\) using
trend_model = 'VAR1'
or trend_model = VAR()
. A
VAR(1) model takes the form:
\[\begin{align*} z_{t} & \sim \text{Normal}(A * z_{t-1}, \Sigma) \end{align*}\]
Where \(A\) is an \(N_{series} * N_{series}\) matrix of
autoregressive coefficients in which the diagonals capture lagged
self-dependence (i.e. the effect of a process at time \(t\) on its own estimate at time \(t+1\)), while off-diagonals capture lagged
cross-dependence (i.e. the effect of a process at time \(t\) on the process for another series at
time \(t+1\)). By default, the
covariance matrix \(\Sigma\) will
assume no process error covariance by fixing the off-diagonals to \(0\). To allow for correlated errors, use
trend_model = 'VAR1cor'
or
trend_model = VAR(cor = TRUE)
. A moving average of order
\(q=1\) can also be included using
trend_model = VAR(ma = TRUE, cor = TRUE)
.
Note that for all VAR models, stationarity of the process is enforced with a structured prior distribution that is described in detail in Heaps (2023):
Heaps, Sarah E. “Enforcing stationarity through the prior in vector autoregressions.” Journal of Computational and Graphical Statistics 32.1 (2023): 74-83.
Hierarchical processes
Several of the above-mentioned trend_model
options can
be modified to account for grouping structures in data
by
setting up hierarchical latent processes. If an optional grouping
variable (gr
; which must be a factor
in the
supplied data
) exists, users can model hierarchical
residual correlation structures, where the residual correlations for a
specific level of gr
are modelled hierarchically:
\[\begin{align*} \Omega_{group} & = \alpha_{cor}\Omega_{global} + (1 - \alpha_{cor})\Omega_{group, local} \end{align*}\]
where \(\Omega_{global}\) is a
global correlation matrix, \(\Omega_{group, local}\) is a local
deviation correlation matrix and \(\alpha_{cor}\) is a weighting parameter
controlling how strongly the group-level correlation matrix \(\Omega_{group}\) (i.e. the derived
correlation matrix that will be used for each level of the grouping
factor gr
) is shrunk towards the global correlation matrix
\(\Omega_{global}\) (larger values of
\(\alpha_{cor}\) indicate a greater
degree of shrinkage, i.e. a greater degree of partial pooling). This
option is valuable for many types of designs where the same
observational units (i.e. financial assets or species,
for example) are measured in different strata (i.e. regions,
countries or experimental units, for example).
Currently hierarchical correlations can be included for
AR()
, VAR()
or ZMVN()
trend_model
options.
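As an illustrative sketch, assuming the data contain a factor region (supplied as the grouping variable gr) and a factor species nested within regions (argument names other than gr, such as subgr, are assumptions that should be checked against ?AR):
# hierarchical error correlations among species, partially pooled
# across regions
mod <- mvgam(y ~ s(x),
             family = poisson(),
             trend_model = AR(p = 1, gr = region, subgr = species),
             data = dat)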
Gaussian Processes
The final option for modelling temporal dynamics is to use a Gaussian
Process with squared exponential kernel. These are set up independently
for each series (there is currently no multivariate GP option), using
trend_model = 'GP'
. The dynamics for each latent process
are modelled as:
\[\begin{align*} z & \sim \text{MVNormal}(0, \Sigma_{error}) \\ \Sigma_{error}[t_i, t_j] & = \alpha^2 * exp(-0.5 * ((|t_i - t_j| / \rho))^2) \end{align*}\]
The latent dynamic process evolves from a complex, high-dimensional
Multivariate Normal distribution which depends on \(\rho\) (often called the length scale
parameter) to control how quickly the correlations between the model’s
errors decay as a function of time. For these models, covariance decays
exponentially fast with the squared distance (in time) between the
observations. The functions also depend on a parameter \(\alpha\), which controls the marginal
variability of the temporal function at all points; in other words it
controls how much the GP term contributes to the linear predictor.
mvgam
capitalizes on some advances that allow GPs to be
approximated using Hilbert space basis functions, which considerably speed up computation at little cost to
accuracy or prediction performance.
Piecewise logistic and linear trends
Modelling growth for many types of time series is often similar to
modelling population growth in natural ecosystems, where the series
exhibits nonlinear growth that saturates at some particular carrying
capacity. The logistic trend model available in mvgam
allows for a time-varying capacity \(C(t)\) as well as a non-constant growth
rate. Changes in the base growth rate \(k\) are incorporated by explicitly defining
changepoints throughout the training period where the growth rate is
allowed to vary. The changepoint vector \(a\) is represented as a vector of
1
s and 0
s, and the rate of growth at time
\(t\) is represented as \(k+a(t)^T\delta\). Potential changepoints
are selected uniformly across the training period, and the number of
changepoints, as well as the flexibility of the potential rate changes
at these changepoints, can be controlled using
trend_model = PW()
. The full piecewise logistic growth
model is then:
\[\begin{align*} z_t & = \frac{C_t}{1 + \exp(-(k+a(t)^T\delta)(t-(m+a(t)^T\gamma)))} \end{align*}\]
For time series that do not appear to exhibit saturating growth, a piece-wise constant rate of growth can often provide a useful trend model. The piecewise linear trend is defined as:
\[\begin{align*} z_t & = (k+a(t)^T\delta)t + (m+a(t)^T\gamma) \end{align*}\]
In both trend models, \(m\) is an offset parameter that controls the trend intercept. Because of this parameter, it is not recommended that you include an intercept in your observation formula because this will not be identifiable. You can read about the full description of piecewise linear and logistic trends in this paper by Taylor and Letham.
Sean J. Taylor and Benjamin Letham. “Forecasting at scale.” The American Statistician 72.1 (2018): 37-45.
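An illustrative sketch of a piecewise logistic trend (argument names follow ?PW; logistic growth additionally expects carrying capacities to be supplied in the data, see ?PW for the required column name):
# 10 potential changepoints; changepoint_scale controls how flexible
# the growth-rate changes at those changepoints can be
mod <- mvgam(y ~ s(x) - 1,
             family = poisson(),
             trend_model = PW(n_changepoints = 10,
                              changepoint_scale = 0.05,
                              growth = 'logistic'),
             data = dat)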
Continuous time AR(1) processes
Most trend models in the mvgam()
function expect time to
be measured in regularly-spaced, discrete intervals (i.e. one
measurement per week, or one per year for example). But some time series
are taken at irregular intervals and we’d like to model autoregressive
properties of these. Setting trend_model = CAR() can be useful
can be useful
to set up these models, which currently only support autoregressive
processes of order 1
. The evolution of the latent dynamic
process follows the form:
\[\begin{align*} z_{i,t} & \sim \text{Normal}(ar1_i^{distance} * z_{i,t-1}, \sigma_i) \end{align*}\]
Where \(distance\) is a vector of
non-negative measurements of the time differences between successive
observations. See the Examples section in
?CAR
for an illustration of how to set these models up.
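A sketch of such a model, assuming the supplied data contain a numeric time variable with irregular spacing between observations:
# CAR(1): the autoregressive coefficient is raised to the power of the
# time difference between successive observations
mod <- mvgam(y ~ s(x),
             family = poisson(),
             trend_model = CAR(p = 1),
             data = dat)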
Regression formulae
mvgam
supports an observation model regression formula,
built off the mgcv
package, as well as an optional process
model regression formula. The formulae supplied to mvgam() are exactly like
those supplied to glm()
except that smooth terms,
s()
, te()
, ti()
and
t2()
, time-varying effects using dynamic()
,
monotonically increasing (using s(x, bs = 'moi')
) or
decreasing splines (using s(x, bs = 'mod')
; see
?smooth.construct.moi.smooth.spec
for details), as well as
Gaussian Process functions using gp()
, can be added to the
right hand side (and .
is not supported in
mvgam
formulae). See ?mvgam_formulae
for more
guidance.
For setting up State-Space models, the optional process model formula can be used (see the State-Space model vignette and the shared latent states vignette for guidance on using trend formulae).
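To give a flavour of the terms that can be combined on the right-hand side of an observation formula, a hypothetical example might be (x1 to x5 are placeholder covariates, not variables used later in this vignette):
y ~ s(x1, bs = 'cr') +     # penalized smooth of x1
  te(x2, x3) +             # tensor product smooth of x2 and x3
  s(x4, bs = 'moi') +      # monotonically increasing smooth of x4
  gp(x5, k = 20)           # approximate Gaussian Process of x5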
Example time series data
The ‘portal_data’ object contains time series of rodent captures from the Portal Project, a long-term monitoring study based near the town of Portal, Arizona. Researchers have been operating a standardized set of baited traps within 24 experimental plots at this site since the 1970s. Sampling follows the lunar monthly cycle, with observations occurring on average about 28 days apart. However, missing observations do occur due to difficulties accessing the site (weather events, COVID disruptions etc.). You can read about the full sampling protocol in this preprint by Ernest et al. on bioRxiv.
data("portal_data")
As the data come pre-loaded with the mvgam
package, you
can read a little about it in the help page using
?portal_data
. Before working with data, it is important to
inspect how the data are structured, first using head
:
head(portal_data)
#> moon DM DO PP OT year month mintemp precipitation ndvi
#> 1 329 10 6 0 2 2004 1 -9.710 37.8 1.465889
#> 2 330 14 8 1 0 2004 2 -5.924 8.7 1.558507
#> 3 331 9 1 2 1 2004 3 -0.220 43.5 1.337817
#> 4 332 NA NA NA NA 2004 4 1.931 23.9 1.658913
#> 5 333 15 8 10 1 2004 5 6.568 0.9 1.853656
#> 6 334 NA NA NA NA 2004 6 11.590 1.4 1.761330
But the glimpse
function in dplyr
is also
useful for understanding how variables are structured
dplyr::glimpse(portal_data)
#> Rows: 199
#> Columns: 10
#> $ moon <int> 329, 330, 331, 332, 333, 334, 335, 336, 337, 338, 339, 3…
#> $ DM <int> 10, 14, 9, NA, 15, NA, NA, 9, 5, 8, NA, 14, 7, NA, NA, 9…
#> $ DO <int> 6, 8, 1, NA, 8, NA, NA, 3, 3, 4, NA, 3, 8, NA, NA, 3, NA…
#> $ PP <int> 0, 1, 2, NA, 10, NA, NA, 16, 18, 12, NA, 3, 2, NA, NA, 1…
#> $ OT <int> 2, 0, 1, NA, 1, NA, NA, 1, 0, 0, NA, 2, 1, NA, NA, 1, NA…
#> $ year <int> 2004, 2004, 2004, 2004, 2004, 2004, 2004, 2004, 2004, 20…
#> $ month <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5, 6,…
#> $ mintemp <dbl> -9.710, -5.924, -0.220, 1.931, 6.568, 11.590, 14.370, 16…
#> $ precipitation <dbl> 37.8, 8.7, 43.5, 23.9, 0.9, 1.4, 20.3, 91.0, 60.5, 25.2,…
#> $ ndvi <dbl> 1.4658889, 1.5585069, 1.3378172, 1.6589129, 1.8536561, 1…
We will focus analyses on the time series of captures for one specific rodent species, the Desert Pocket Mouse Chaetodipus penicillatus. This species is interesting in that it goes into a kind of “hibernation” during the colder months, leading to very low captures during the winter period.
Manipulating data for modelling
Manipulating the data into a ‘long’ format is necessary for modelling
in mvgam
. By ‘long’ format, we mean that each
series x time
observation needs to have its own entry in
the dataframe
or list
object that we wish to
use as data for modelling. A simple example can be viewed by simulating
data using the sim_mvgam
function. See
?sim_mvgam
for more details
data <- sim_mvgam(n_series = 4, T = 24)
head(data$data_train, 12)
#> y season year series time
#> 1 1 1 1 series_1 1
#> 2 2 1 1 series_2 1
#> 3 1 1 1 series_3 1
#> 4 0 1 1 series_4 1
#> 5 0 2 1 series_1 2
#> 6 0 2 1 series_2 2
#> 7 0 2 1 series_3 2
#> 8 0 2 1 series_4 2
#> 9 0 3 1 series_1 3
#> 10 0 3 1 series_2 3
#> 11 0 3 1 series_3 3
#> 12 0 3 1 series_4 3
Notice how we have four different time series in these simulated
data, but we do not spread the outcome values into different columns.
Rather, there is only a single column for the outcome variable, labelled
y
in these simulated data. We also must supply a variable
labelled time
to ensure the modelling software knows how to
arrange the time series when building models. This setup still allows us
to formulate multivariate time series models, as you can see in the State-Space
vignette. Below are the steps needed to shape our
portal_data
object into the correct form. First, we create
a time
variable, select the column representing counts of
our target species (PP
), and select appropriate variables
that we can use as predictors
portal_data %>%
# mvgam requires a 'time' variable be present in the data to index
# the temporal observations. This is especially important when tracking
# multiple time series. In the Portal data, the 'moon' variable indexes the
# lunar monthly timestep of the trapping sessions
dplyr::mutate(time = moon - (min(moon)) + 1) %>%
# We can also provide a more informative name for the outcome variable, which
# is counts of the 'PP' species (Chaetodipus penicillatus) across all control
# plots
dplyr::mutate(count = PP) %>%
# The other requirement for mvgam is a 'series' variable, which needs to be a
# factor variable to index which time series each row in the data belongs to.
# Again, this is more useful when you have multiple time series in the data
dplyr::mutate(series = as.factor('PP')) %>%
# Select the variables of interest to keep in the model_data
dplyr::select(series, year, time, count, mintemp, ndvi) -> model_data
The data now contain six variables:
- series, a factor indexing which time series each observation belongs to
- year, the year of sampling
- time, the indicator of which time step each observation belongs to
- count, the response variable representing the number of captures of the species PP in each sampling observation
- mintemp, the monthly average minimum temperature at each time step
- ndvi, the monthly average Normalized Difference Vegetation Index at each time step
Now check the data structure again
head(model_data)
#> series year time count mintemp ndvi
#> 1 PP 2004 1 0 -9.710 1.465889
#> 2 PP 2004 2 1 -5.924 1.558507
#> 3 PP 2004 3 2 -0.220 1.337817
#> 4 PP 2004 4 NA 1.931 1.658913
#> 5 PP 2004 5 10 6.568 1.853656
#> 6 PP 2004 6 NA 11.590 1.761330
dplyr::glimpse(model_data)
#> Rows: 199
#> Columns: 6
#> $ series <fct> PP, PP, PP, PP, PP, PP, PP, PP, PP, PP, PP, PP, PP, PP, PP, PP…
#> $ year <int> 2004, 2004, 2004, 2004, 2004, 2004, 2004, 2004, 2004, 2004, 20…
#> $ time <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18,…
#> $ count <int> 0, 1, 2, NA, 10, NA, NA, 16, 18, 12, NA, 3, 2, NA, NA, 13, NA,…
#> $ mintemp <dbl> -9.710, -5.924, -0.220, 1.931, 6.568, 11.590, 14.370, 16.520, …
#> $ ndvi <dbl> 1.4658889, 1.5585069, 1.3378172, 1.6589129, 1.8536561, 1.76132…
You can also summarize multiple variables, which is helpful to search for data ranges and identify missing values
summary(model_data)
#> series year time count mintemp
#> PP:199 Min. :2004 Min. : 1.0 Min. : 0.00 Min. :-24.000
#> 1st Qu.:2008 1st Qu.: 50.5 1st Qu.: 2.50 1st Qu.: -3.884
#> Median :2012 Median :100.0 Median :12.00 Median : 2.130
#> Mean :2012 Mean :100.0 Mean :15.14 Mean : 3.504
#> 3rd Qu.:2016 3rd Qu.:149.5 3rd Qu.:24.00 3rd Qu.: 12.310
#> Max. :2020 Max. :199.0 Max. :65.00 Max. : 18.140
#> NA's :36
#> ndvi
#> Min. :0.2817
#> 1st Qu.:1.0741
#> Median :1.3501
#> Mean :1.4709
#> 3rd Qu.:1.8178
#> Max. :3.9126
#>
We have some NA
s in our response variable
count
. These observations will generally be thrown out by
most modelling packages in R. But as you will see when we work through
the tutorials, mvgam
keeps these in the data so that
predictions can be automatically returned for the full dataset. The time
series and some of its descriptive features can be plotted using
plot_mvgam_series()
:
plot_mvgam_series(data = model_data, series = 1, y = 'count')
GLMs with temporal random effects
Our first task will be to fit a Generalized Linear Model (GLM) that
can adequately capture the features of our count
observations (integer data, lower bound at zero, missing values) while
also attempting to model temporal variation. We are almost ready to fit
our first model, which will be a GLM with Poisson observations, a log
link function and random (hierarchical) intercepts for
year
. This will allow us to capture our prior belief that, although each year is unique, all years have been sampled from the same population of effects and so are connected, meaning they might contain valuable information about one another. This will be done by
capitalizing on the partial pooling properties of hierarchical models.
Hierarchical (also known as random) effects offer many advantages when
modelling data with grouping structures (i.e. multiple species,
locations, years etc…). The ability to incorporate these in time series
models is a huge advantage over traditional models such as ARIMA or
Exponential Smoothing. But before we fit the model, we will need to
convert year
to a factor so that we can use a random effect
basis in mvgam
. See ?smooth.terms
and
?smooth.construct.re.smooth.spec
for details about the
re
basis construction that is used by both
mvgam
and mgcv
model_data %>%
# Create a 'year_fac' factor version of 'year'
dplyr::mutate(year_fac = factor(year)) -> model_data
Preview the dataset to ensure year is now a factor with a unique factor level for each year in the data
dplyr::glimpse(model_data)
#> Rows: 199
#> Columns: 7
#> $ series <fct> PP, PP, PP, PP, PP, PP, PP, PP, PP, PP, PP, PP, PP, PP, PP, P…
#> $ year <int> 2004, 2004, 2004, 2004, 2004, 2004, 2004, 2004, 2004, 2004, 2…
#> $ time <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18…
#> $ count <int> 0, 1, 2, NA, 10, NA, NA, 16, 18, 12, NA, 3, 2, NA, NA, 13, NA…
#> $ mintemp <dbl> -9.710, -5.924, -0.220, 1.931, 6.568, 11.590, 14.370, 16.520,…
#> $ ndvi <dbl> 1.4658889, 1.5585069, 1.3378172, 1.6589129, 1.8536561, 1.7613…
#> $ year_fac <fct> 2004, 2004, 2004, 2004, 2004, 2004, 2004, 2004, 2004, 2004, 2…
levels(model_data$year_fac)
#> [1] "2004" "2005" "2006" "2007" "2008" "2009" "2010" "2011" "2012" "2013"
#> [11] "2014" "2015" "2016" "2017" "2018" "2019" "2020"
We are now ready for our first mvgam
model. The syntax
will be familiar to users who have previously built models with
mgcv
. But for a refresher, see ?formula.gam
and the examples in ?gam
. Random effects can be specified
using the s
wrapper with the re
basis. Note
that we can also suppress the primary intercept using the usual
R
formula syntax - 1
. mvgam
has a
number of possible observation families that can be used, see
?mvgam_families
for more information. We will use
Stan
as the fitting engine, which deploys Hamiltonian Monte
Carlo (HMC) for full Bayesian inference. By default, 4 HMC chains will
be run using a warmup of 500 iterations and collecting 500 posterior
samples from each chain. The package will also aim to use the
Cmdstan
backend when possible, so it is recommended that
users have an up-to-date installation of Cmdstan
and the
associated cmdstanr
interface on their machines (note that
you can set the backend yourself using the backend
argument: see ?mvgam
for details). Interested users should
consult the Stan
user’s guide for more information
about the software and the enormous variety of models that can be
tackled with HMC.
The model can be described mathematically for each timepoint \(t\) as follows: \[\begin{align*} \boldsymbol{count}_t & \sim \text{Poisson}(\lambda_t) \\ log(\lambda_t) & = \beta_{year[year_t]} \\ \beta_{year} & \sim \text{Normal}(\mu_{year}, \sigma_{year}) \end{align*}\]
Where the \(\beta_{year}\) effects
are drawn from a population distribution that is parameterized
by a common mean \((\mu_{year})\) and
variance \((\sigma_{year})\). Priors on
most of the model parameters can be interrogated and changed using
similar functionality to the options available in brms
. For
example, the default priors on \((\mu_{year})\) and \((\sigma_{year})\) can be viewed using the
following code:
get_mvgam_priors(count ~ s(year_fac, bs = 're') - 1,
family = poisson(),
data = model_data)
#> param_name param_length param_info
#> 1 vector[1] mu_raw; 1 s(year_fac) pop mean
#> 2 vector<lower=0>[1] sigma_raw; 1 s(year_fac) pop sd
#> prior example_change
#> 1 mu_raw ~ std_normal(); mu_raw ~ normal(-0.28, 0.5);
#> 2 sigma_raw ~ student_t(3, 0, 2.5); sigma_raw ~ exponential(0.16);
#> new_lowerbound new_upperbound
#> 1 NA NA
#> 2 NA NA
See examples in ?get_mvgam_priors to find out different ways that priors can be altered. We can now fit the model (a sketch of the call is shown below). Once the model has finished, the first step is to inspect the summary to ensure no major diagnostic warnings have been produced and to quickly summarise posterior distributions for key parameters
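The formula, family and data reported in the summary below correspond to a call along these lines (sampler settings are left at their defaults here, so treat this as a sketch rather than the exact call that produced model1):
model1 <- mvgam(count ~ s(year_fac, bs = 're') - 1,
                family = poisson(),
                data = model_data)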
summary(model1)
#> GAM formula:
#> count ~ s(year_fac, bs = "re") - 1
#>
#> Family:
#> poisson
#>
#> Link function:
#> log
#>
#> Trend model:
#> None
#>
#> N series:
#> 1
#>
#> N timepoints:
#> 199
#>
#> Status:
#> Fitted using Stan
#> 4 chains, each with iter = 1000; warmup = 500; thin = 1
#> Total post-warmup draws = 2000
#>
#>
#> GAM coefficient (beta) estimates:
#> 2.5% 50% 97.5% Rhat n_eff
#> s(year_fac).1 1.80 2.1 2.3 1.00 2540
#> s(year_fac).2 2.50 2.7 2.8 1.00 2867
#> s(year_fac).3 3.00 3.1 3.2 1.00 2947
#> s(year_fac).4 3.10 3.3 3.4 1.00 2811
#> s(year_fac).5 1.90 2.1 2.3 1.00 2625
#> s(year_fac).6 1.50 1.8 2.0 1.00 3346
#> s(year_fac).7 1.80 2.0 2.3 1.00 2690
#> s(year_fac).8 2.80 3.0 3.1 1.00 3911
#> s(year_fac).9 3.10 3.3 3.4 1.00 2377
#> s(year_fac).10 2.60 2.8 2.9 1.00 2601
#> s(year_fac).11 3.00 3.1 3.2 1.00 2507
#> s(year_fac).12 3.10 3.2 3.3 1.00 2478
#> s(year_fac).13 2.00 2.2 2.4 1.00 2289
#> s(year_fac).14 2.50 2.6 2.8 1.00 2799
#> s(year_fac).15 2.00 2.2 2.4 1.00 2406
#> s(year_fac).16 1.90 2.1 2.3 1.00 2723
#> s(year_fac).17 -0.27 1.0 1.9 1.01 400
#>
#> GAM group-level estimates:
#> 2.5% 50% 97.5% Rhat n_eff
#> mean(s(year_fac)) 2.10 2.40 2.8 1.02 182
#> sd(s(year_fac)) 0.45 0.69 1.1 1.02 262
#>
#> Approximate significance of GAM smooths:
#> edf Ref.df Chi.sq p-value
#> s(year_fac) 13.8 17 1627 <2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Stan MCMC diagnostics:
#> n_eff / iter looks reasonable for all parameters
#> Rhat looks reasonable for all parameters
#> 0 of 2000 iterations ended with a divergence (0%)
#> 0 of 2000 iterations saturated the maximum tree depth of 10 (0%)
#> E-FMI indicated no pathological behavior
#>
#> Samples were drawn using NUTS(diag_e) at Mon Dec 16 10:15:28 AM 2024.
#> For each parameter, n_eff is a crude measure of effective sample size,
#> and Rhat is the potential scale reduction factor on split MCMC chains
#> (at convergence, Rhat = 1)
#>
#> Use how_to_cite(model1) to get started describing this model
The diagnostic messages at the bottom of the summary show that the
HMC sampler did not encounter any problems or difficult posterior
spaces. This is a good sign. Posterior distributions for model
parameters can be extracted in any way that an object of class
brmsfit
can (see ?mvgam::mvgam_draws
for
details). For example, we can extract the coefficients related to the
GAM linear predictor (i.e. the \(\beta\)’s) into a data.frame
using:
beta_post <- as.data.frame(model1, variable = 'betas')
dplyr::glimpse(beta_post)
#> Rows: 2,000
#> Columns: 17
#> $ `s(year_fac).1` <dbl> 1.89769, 1.98366, 2.15896, 2.26582, 2.08635, 2.03994,…
#> $ `s(year_fac).2` <dbl> 2.76784, 2.68716, 2.79604, 2.61487, 2.68530, 2.65969,…
#> $ `s(year_fac).3` <dbl> 3.07994, 3.06957, 3.06070, 2.97259, 3.09135, 3.06799,…
#> $ `s(year_fac).4` <dbl> 3.20458, 3.26290, 3.26900, 3.34298, 3.23872, 3.26329,…
#> $ `s(year_fac).5` <dbl> 2.03800, 2.07135, 2.10599, 2.10248, 2.12629, 2.08648,…
#> $ `s(year_fac).6` <dbl> 1.72916, 1.76054, 1.61952, 1.83496, 1.80755, 1.60408,…
#> $ `s(year_fac).7` <dbl> 2.12459, 2.16718, 1.92828, 1.91316, 1.91632, 2.14008,…
#> $ `s(year_fac).8` <dbl> 2.85277, 2.99776, 3.02364, 3.07568, 3.06913, 2.85720,…
#> $ `s(year_fac).9` <dbl> 3.23542, 3.21788, 3.23683, 3.26629, 3.28237, 3.21785,…
#> $ `s(year_fac).10` <dbl> 2.73377, 2.74353, 2.80890, 2.76211, 2.74208, 2.78979,…
#> $ `s(year_fac).11` <dbl> 3.06019, 3.15316, 3.09879, 3.10128, 3.17136, 2.98653,…
#> $ `s(year_fac).12` <dbl> 3.19145, 3.25551, 3.25832, 3.14177, 3.27842, 3.16491,…
#> $ `s(year_fac).13` <dbl> 2.30975, 2.36029, 2.16900, 2.30780, 2.31025, 2.19012,…
#> $ `s(year_fac).14` <dbl> 2.54423, 2.68375, 2.67444, 2.58304, 2.62645, 2.64360,…
#> $ `s(year_fac).15` <dbl> 2.06438, 2.11249, 2.26017, 2.29234, 2.26032, 2.07294,…
#> $ `s(year_fac).16` <dbl> 2.17269, 2.20717, 1.93524, 2.08844, 2.00660, 2.11625,…
#> $ `s(year_fac).17` <dbl> 0.749502, 0.729834, 0.762282, -0.801137, 2.075410, 0.…
With any model fitted in mvgam
, the underlying
Stan
code can be viewed using the code
function:
code(model1)
#> // Stan model code generated by package mvgam
#> data {
#> int<lower=0> total_obs; // total number of observations
#> int<lower=0> n; // number of timepoints per series
#> int<lower=0> n_series; // number of series
#> int<lower=0> num_basis; // total number of basis coefficients
#> matrix[total_obs, num_basis] X; // mgcv GAM design matrix
#> array[n, n_series] int<lower=0> ytimes; // time-ordered matrix (which col in X belongs to each [time, series] observation?)
#> int<lower=0> n_nonmissing; // number of nonmissing observations
#> array[n_nonmissing] int<lower=0> flat_ys; // flattened nonmissing observations
#> matrix[n_nonmissing, num_basis] flat_xs; // X values for nonmissing observations
#> array[n_nonmissing] int<lower=0> obs_ind; // indices of nonmissing observations
#> }
#> parameters {
#> // raw basis coefficients
#> vector[num_basis] b_raw;
#>
#> // random effect variances
#> vector<lower=0>[1] sigma_raw;
#>
#> // random effect means
#> vector[1] mu_raw;
#> }
#> transformed parameters {
#> // basis coefficients
#> vector[num_basis] b;
#> b[1 : 17] = mu_raw[1] + b_raw[1 : 17] * sigma_raw[1];
#> }
#> model {
#> // prior for random effect population variances
#> sigma_raw ~ student_t(3, 0, 2.5);
#>
#> // prior for random effect population means
#> mu_raw ~ std_normal();
#>
#> // prior (non-centred) for s(year_fac)...
#> b_raw[1 : 17] ~ std_normal();
#> {
#> // likelihood functions
#> flat_ys ~ poisson_log_glm(flat_xs, 0.0, b);
#> }
#> }
#> generated quantities {
#> vector[total_obs] eta;
#> matrix[n, n_series] mus;
#> array[n, n_series] int ypred;
#>
#> // posterior predictions
#> eta = X * b;
#> for (s in 1 : n_series) {
#> mus[1 : n, s] = eta[ytimes[1 : n, s]];
#> ypred[1 : n, s] = poisson_log_rng(mus[1 : n, s]);
#> }
#> }
Plotting effects and residuals
Now for interrogating the model. We can get some sense of the
variation in yearly intercepts from the summary above, but it is easier
to understand them using targeted plots. Plot posterior distributions of
the temporal random effects using plot.mvgam
with
type = 're'
. See ?plot.mvgam
for more details
about the types of plots that can be produced from fitted
mvgam
objects
plot(model1, type = 're')
bayesplot
support
We can also capitalize on most of the useful MCMC plotting functions
from the bayesplot
package to visualize posterior
distributions and diagnostics (see ?mvgam::mcmc_plot.mvgam
for details):
mcmc_plot(object = model1,
variable = 'betas',
type = 'areas')
We can also use the wide range of posterior checking functions
available in bayesplot
(see
?mvgam::ppc_check.mvgam
for details):
pp_check(object = model1)
There is clearly some variation in these yearly intercept estimates.
But how do these translate into time-varying predictions? To understand
this, we can plot posterior hindcasts from this model for the training
period using plot.mvgam
with
type = 'forecast'
plot(model1, type = 'forecast')
If you wish to extract these hindcasts for other downstream analyses,
the hindcast
function can be used. This will return a list
object of class mvgam_forecast
. In the
hindcasts
slot, a matrix of posterior retrodictions will be
returned for each series in the data (only one series in our
example):
hc <- hindcast(model1)
str(hc)
#> List of 15
#> $ call :Class 'formula' language count ~ s(year_fac, bs = "re") - 1
#> .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
#> $ trend_call : NULL
#> $ family : chr "poisson"
#> $ trend_model : chr "None"
#> $ drift : logi FALSE
#> $ use_lv : logi FALSE
#> $ fit_engine : chr "stan"
#> $ type : chr "response"
#> $ series_names : chr "PP"
#> $ train_observations:List of 1
#> ..$ PP: int [1:199] 0 1 2 NA 10 NA NA 16 18 12 ...
#> $ train_times : num [1:199] 1 2 3 4 5 6 7 8 9 10 ...
#> $ test_observations : NULL
#> $ test_times : NULL
#> $ hindcasts :List of 1
#> ..$ PP: num [1:2000, 1:199] 7 11 7 8 8 5 8 14 10 13 ...
#> .. ..- attr(*, "dimnames")=List of 2
#> .. .. ..$ : NULL
#> .. .. ..$ : chr [1:199] "ypred[1,1]" "ypred[2,1]" "ypred[3,1]" "ypred[4,1]" ...
#> $ forecasts : NULL
#> - attr(*, "class")= chr "mvgam_forecast"
You can also extract these hindcasts on the linear predictor scale, which in this case is the log scale (our Poisson GLM used a log link function). Sometimes this can be useful for asking more targeted questions about drivers of variation.
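For example, posterior hindcasts on the link (log) scale can be requested through the type argument (a sketch; see ?hindcast to confirm the available prediction types):
hc_link <- hindcast(model1, type = 'link')
str(hc_link$hindcasts)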
In any regression analysis, a key question is whether the residuals
show any patterns that can be indicative of un-modelled sources of
variation. For GLMs, we can use a modified residual called the Dunn-Smyth,
or randomized quantile, residual. Inspect Dunn-Smyth residuals from
the model using plot.mvgam
with
type = 'residuals'
plot(model1, type = 'residuals')
Automatic forecasting for new data
These temporal random effects do not have a sense of “time”. Because
of this, each yearly random intercept is not restricted in some way to
be similar to the previous yearly intercept. This drawback becomes
evident when we predict for a new year. To do this, we can repeat the
exercise above but this time will split the data into training and
testing sets before re-running the model. We can then supply the test
set as newdata
. For splitting, we will make use of the
filter
function from dplyr
model_data %>%
dplyr::filter(time <= 160) -> data_train
model_data %>%
dplyr::filter(time > 160) -> data_test
model1b <- mvgam(count ~ s(year_fac, bs = 're') - 1,
family = poisson(),
data = data_train,
newdata = data_test)
We can view the test data in the forecast plot to see that the predictions do not capture the temporal variation in the test set
plot(model1b, type = 'forecast', newdata = data_test)
As with the hindcast
function, we can use the
forecast
function to automatically extract the posterior
distributions for these predictions. This also returns an object of
class mvgam_forecast
, but now it will contain both the
hindcasts and forecasts for each series in the data:
fc <- forecast(model1b)
str(fc)
#> List of 16
#> $ call :Class 'formula' language count ~ s(year_fac, bs = "re") - 1
#> .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
#> $ trend_call : NULL
#> $ family : chr "poisson"
#> $ family_pars : NULL
#> $ trend_model : chr "None"
#> $ drift : logi FALSE
#> $ use_lv : logi FALSE
#> $ fit_engine : chr "stan"
#> $ type : chr "response"
#> $ series_names : Factor w/ 1 level "PP": 1
#> $ train_observations:List of 1
#> ..$ PP: int [1:160] 0 1 2 NA 10 NA NA 16 18 12 ...
#> $ train_times : num [1:160] 1 2 3 4 5 6 7 8 9 10 ...
#> $ test_observations :List of 1
#> ..$ PP: int [1:39] NA 0 0 10 3 14 18 NA 28 46 ...
#> $ test_times : num [1:39] 161 162 163 164 165 166 167 168 169 170 ...
#> $ hindcasts :List of 1
#> ..$ PP: num [1:2000, 1:160] 7 6 12 7 5 10 7 8 6 8 ...
#> .. ..- attr(*, "dimnames")=List of 2
#> .. .. ..$ : NULL
#> .. .. ..$ : chr [1:160] "ypred[1,1]" "ypred[2,1]" "ypred[3,1]" "ypred[4,1]" ...
#> $ forecasts :List of 1
#> ..$ PP: num [1:2000, 1:39] 9 12 12 15 7 10 9 14 9 9 ...
#> .. ..- attr(*, "dimnames")=List of 2
#> .. .. ..$ : NULL
#> .. .. ..$ : chr [1:39] "ypred[161,1]" "ypred[162,1]" "ypred[163,1]" "ypred[164,1]" ...
#> - attr(*, "class")= chr "mvgam_forecast"
Adding predictors as “fixed” effects
Any users familiar with GLMs will know that we nearly always wish to
include predictor variables that may explain some of the variation in
our observations. Predictors are easily incorporated into GLMs / GAMs.
Here, we will update the model from above by including a parametric
(fixed) effect of ndvi
as a linear predictor:
model2 <- mvgam(count ~ s(year_fac, bs = 're') +
ndvi - 1,
family = poisson(),
data = data_train,
newdata = data_test)
The model can be described mathematically as follows: \[\begin{align*} \boldsymbol{count}_t & \sim \text{Poisson}(\lambda_t) \\ log(\lambda_t) & = \beta_{year[year_t]} + \beta_{ndvi} * \boldsymbol{ndvi}_t \\ \beta_{year} & \sim \text{Normal}(\mu_{year}, \sigma_{year}) \\ \beta_{ndvi} & \sim \text{Normal}(0, 1) \end{align*}\]
Where the \(\beta_{year}\) effects
are the same as before but we now have another predictor \((\beta_{ndvi})\) that applies to the
ndvi
value at each timepoint \(t\). Inspect the summary of this model
summary(model2)
#> GAM formula:
#> count ~ ndvi + s(year_fac, bs = "re") - 1
#>
#> Family:
#> poisson
#>
#> Link function:
#> log
#>
#> Trend model:
#> None
#>
#> N series:
#> 1
#>
#> N timepoints:
#> 199
#>
#> Status:
#> Fitted using Stan
#> 4 chains, each with iter = 1000; warmup = 500; thin = 1
#> Total post-warmup draws = 2000
#>
#>
#> GAM coefficient (beta) estimates:
#> 2.5% 50% 97.5% Rhat n_eff
#> ndvi 0.32 0.39 0.46 1 2005
#> s(year_fac).1 1.10 1.40 1.70 1 2523
#> s(year_fac).2 1.80 2.00 2.20 1 2335
#> s(year_fac).3 2.20 2.40 2.60 1 2389
#> s(year_fac).4 2.30 2.50 2.70 1 2085
#> s(year_fac).5 1.20 1.40 1.70 1 2454
#> s(year_fac).6 1.00 1.30 1.50 1 2371
#> s(year_fac).7 1.10 1.40 1.70 1 2564
#> s(year_fac).8 2.10 2.30 2.40 1 2248
#> s(year_fac).9 2.70 2.90 3.00 1 1563
#> s(year_fac).10 2.00 2.20 2.40 1 2731
#> s(year_fac).11 2.30 2.40 2.60 1 1966
#> s(year_fac).12 2.50 2.70 2.80 1 2042
#> s(year_fac).13 1.40 1.60 1.90 1 2510
#> s(year_fac).14 0.67 2.00 3.20 1 1091
#> s(year_fac).15 0.46 2.00 3.40 1 1361
#> s(year_fac).16 0.59 2.00 3.20 1 1667
#> s(year_fac).17 0.68 2.00 3.30 1 1569
#>
#> GAM group-level estimates:
#> 2.5% 50% 97.5% Rhat n_eff
#> mean(s(year_fac)) 1.60 2.00 2.3 1.00 482
#> sd(s(year_fac)) 0.41 0.61 1.0 1.01 456
#>
#> Approximate significance of GAM smooths:
#> edf Ref.df Chi.sq p-value
#> s(year_fac) 11 17 254 <2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Stan MCMC diagnostics:
#> n_eff / iter looks reasonable for all parameters
#> Rhat looks reasonable for all parameters
#> 1 of 2000 iterations ended with a divergence (0.05%)
#> *Try running with larger adapt_delta to remove the divergences
#> 0 of 2000 iterations saturated the maximum tree depth of 10 (0%)
#> E-FMI indicated no pathological behavior
#>
#> Samples were drawn using NUTS(diag_e) at Mon Dec 16 10:16:24 AM 2024.
#> For each parameter, n_eff is a crude measure of effective sample size,
#> and Rhat is the potential scale reduction factor on split MCMC chains
#> (at convergence, Rhat = 1)
#>
#> Use how_to_cite(model2) to get started describing this model
Rather than printing the summary each time, we can also quickly look
at the posterior empirical quantiles for the fixed effect of
ndvi
(and other linear predictor coefficients) using
coef
:
coef(model2)
#> 2.5% 50% 97.5% Rhat n_eff
#> ndvi 0.3230276 0.3890805 0.4570898 1 2005
#> s(year_fac).1 1.1275795 1.4045450 1.6595450 1 2523
#> s(year_fac).2 1.8023990 2.0004850 2.2030892 1 2335
#> s(year_fac).3 2.1793462 2.3796100 2.5674402 1 2389
#> s(year_fac).4 2.3224095 2.5001250 2.6889207 1 2085
#> s(year_fac).5 1.1788270 1.4261800 1.6550987 1 2454
#> s(year_fac).6 1.0207870 1.2733500 1.5135682 1 2371
#> s(year_fac).7 1.1300977 1.4140150 1.6817643 1 2564
#> s(year_fac).8 2.0752695 2.2657500 2.4497997 1 2248
#> s(year_fac).9 2.7158347 2.8529150 2.9860237 1 1563
#> s(year_fac).10 1.9771693 2.1829050 2.3737013 1 2731
#> s(year_fac).11 2.2609400 2.4346700 2.6034545 1 1966
#> s(year_fac).12 2.5373095 2.6929550 2.8403870 1 2042
#> s(year_fac).13 1.3838902 1.6186450 1.8529940 1 2510
#> s(year_fac).14 0.6714289 1.9529550 3.2495730 1 1091
#> s(year_fac).15 0.4556091 1.9552750 3.4275588 1 1361
#> s(year_fac).16 0.5860934 1.9766900 3.2446817 1 1667
#> s(year_fac).17 0.6764054 1.9659850 3.2985817 1 1569
Look at the estimated effect of ndvi
using a
histogram. This can be done by first extracting the posterior
coefficients:
beta_post <- as.data.frame(model2, variable = 'betas')
dplyr::glimpse(beta_post)
#> Rows: 2,000
#> Columns: 18
#> $ ndvi <dbl> 0.414028, 0.432178, 0.338904, 0.421866, 0.410825, 0.3…
#> $ `s(year_fac).1` <dbl> 1.43209, 1.24019, 1.38262, 1.40962, 1.22172, 1.23760,…
#> $ `s(year_fac).2` <dbl> 1.93211, 1.92994, 2.13627, 1.85369, 2.07502, 2.13946,…
#> $ `s(year_fac).3` <dbl> 2.35794, 2.29220, 2.55543, 2.39267, 2.28459, 2.32806,…
#> $ `s(year_fac).4` <dbl> 2.50069, 2.45566, 2.63147, 2.48173, 2.45837, 2.50887,…
#> $ `s(year_fac).5` <dbl> 1.57730, 1.31157, 1.54597, 1.28105, 1.52604, 1.53370,…
#> $ `s(year_fac).6` <dbl> 1.16172, 1.23905, 1.30011, 1.20797, 1.29448, 1.32374,…
#> $ `s(year_fac).7` <dbl> 1.32056, 1.35308, 1.43289, 1.37553, 1.32271, 1.59852,…
#> $ `s(year_fac).8` <dbl> 2.19325, 2.18904, 2.32978, 2.25644, 2.17717, 2.33082,…
#> $ `s(year_fac).9` <dbl> 2.76081, 2.81884, 2.88236, 2.85569, 2.81982, 2.88796,…
#> $ `s(year_fac).10` <dbl> 2.12438, 1.99728, 2.18540, 2.21379, 2.15305, 2.12597,…
#> $ `s(year_fac).11` <dbl> 2.42412, 2.39056, 2.47745, 2.32242, 2.45940, 2.41457,…
#> $ `s(year_fac).12` <dbl> 2.62501, 2.59292, 2.79199, 2.59333, 2.74426, 2.69922,…
#> $ `s(year_fac).13` <dbl> 1.73213, 1.44593, 1.53913, 1.73049, 1.40675, 1.55831,…
#> $ `s(year_fac).14` <dbl> 1.864030, 2.488070, 0.948617, 3.236940, 2.836550, 1.6…
#> $ `s(year_fac).15` <dbl> 1.199480, 1.447940, 2.373350, 2.533960, 2.300430, 1.9…
#> $ `s(year_fac).16` <dbl> 1.498060, 2.528860, 1.986840, 3.231060, 3.103550, 1.3…
#> $ `s(year_fac).17` <dbl> 1.205560, 1.371340, 2.767440, 1.800940, 1.846500, 2.3…
The posterior distribution for the effect of ndvi
is
stored in the ndvi
column. A quick histogram confirms our
inference that log(counts)
respond positively to increases
in ndvi
:
hist(beta_post$ndvi,
xlim = c(-1 * max(abs(beta_post$ndvi)),
max(abs(beta_post$ndvi))),
col = 'darkred',
border = 'white',
xlab = expression(beta[NDVI]),
ylab = '',
yaxt = 'n',
main = '',
lwd = 2)
abline(v = 0, lwd = 2.5)
marginaleffects
support
Given our model used a nonlinear link function (log link in this
example), it can still be difficult to fully understand what
relationship our model is estimating between a predictor and the
response. Fortunately, the marginaleffects
package makes
this relatively straightforward. Objects of class mvgam
can
be used with marginaleffects
to inspect contrasts,
scenario-based predictions, conditional and marginal effects, all on the
outcome scale. Like brms
, mvgam
has the simple
conditional_effects
function to make quick and informative
plots for main effects, which rely on marginaleffects
support. This will likely be your go-to function for quickly
understanding patterns from fitted mvgam
models
conditional_effects(model2)
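Functions from marginaleffects can also be called directly on the fitted object. For example, a sketch of scenario-based predictions across the observed range of ndvi (plot_predictions() and its condition and points arguments come from the marginaleffects package, and nothing here is specific to the vignette's workflow):
library(marginaleffects)
# predicted counts (outcome scale) across the observed range of ndvi
plot_predictions(model2, condition = 'ndvi', points = 0.5)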
Adding predictors as smooths
Smooth functions, using penalized splines, are a major feature of
mvgam
. Nonlinear splines are commonly viewed as variations
of random effects in which the coefficients that control the shape of
the spline are drawn from a joint, penalized distribution. This strategy
is very often used in ecological time series analysis to capture smooth
temporal variation in the processes we seek to study. When we construct
smoothing splines, the workhorse package mgcv
will
calculate a set of basis functions that will collectively control the
shape and complexity of the resulting spline. It is often helpful to
visualize these basis functions to get a better sense of how splines
work. We’ll create a set of 6 basis functions to represent possible
variation in the effect of time
on our outcome. In addition
to constructing the basis functions, mgcv
also creates a
penalty matrix \(S\), which contains
known coefficients that work to constrain the
wiggliness of the resulting smooth function. When fitting a GAM to data,
we must estimate the smoothing parameters (\(\lambda\)) that will penalize these
matrices, resulting in constrained basis coefficients and smoother
functions that are less likely to overfit the data. This is the key to
fitting GAMs in a Bayesian framework, as we can jointly estimate the
\(\lambda\)’s using informative priors
to prevent overfitting and expand the complexity of models we can
tackle. To see this in practice, we can now fit a model that replaces
the yearly random effects with a smooth function of time
.
We will need a reasonably complex function (large k
) to try
and accommodate the temporal variation in our observations. Following
some useful advice by Gavin Simpson, we will use a
b-spline basis for the temporal smooth. Because we no longer have
intercepts for each year, we also retain the primary intercept term in
this model (there is no -1
in the formula now):
model3 <- mvgam(count ~ s(time, bs = 'bs', k = 15) +
ndvi,
family = poisson(),
data = data_train,
newdata = data_test)
The model can be described mathematically as follows: \[\begin{align*} \boldsymbol{count}_t & \sim \text{Poisson}(\lambda_t) \\ log(\lambda_t) & = f(\boldsymbol{time})_t + \beta_{ndvi} * \boldsymbol{ndvi}_t \\ f(\boldsymbol{time}) & = \sum_{k=1}^{K}b * \beta_{smooth} \\ \beta_{smooth} & \sim \text{MVNormal}(0, (\Omega * \lambda)^{-1}) \\ \beta_{ndvi} & \sim \text{Normal}(0, 1) \end{align*}\]
Where the smooth function \(f_{time}\) is built by summing across a set
of weighted basis functions. The basis functions \((b)\) are constructed using a
B-spline basis in mgcv. The weights \((\beta_{smooth})\) are drawn from a
penalized multivariate normal distribution where the precision matrix
\((\Omega\)) is multiplied by a
smoothing penalty \((\lambda)\). If
\(\lambda\) becomes large, this acts to
squeeze the covariances among the weights \((\beta_{smooth})\), leading to a less
wiggly spline. Note that sometimes there are multiple smoothing
penalties that contribute to the covariance matrix, but I am only
showing one here for simplicity. View the summary as before
summary(model3)
#> GAM formula:
#> count ~ s(time, bs = "bs", k = 15) + ndvi
#>
#> Family:
#> poisson
#>
#> Link function:
#> log
#>
#> Trend model:
#> None
#>
#> N series:
#> 1
#>
#> N timepoints:
#> 199
#>
#> Status:
#> Fitted using Stan
#> 4 chains, each with iter = 1000; warmup = 500; thin = 1
#> Total post-warmup draws = 2000
#>
#>
#> GAM coefficient (beta) estimates:
#> 2.5% 50% 97.5% Rhat n_eff
#> (Intercept) 2.00 2.10 2.200 1.01 768
#> ndvi 0.26 0.33 0.400 1.01 733
#> s(time).1 -2.10 -1.10 0.075 1.00 463
#> s(time).2 0.47 1.30 2.400 1.01 338
#> s(time).3 -0.45 0.46 1.600 1.01 311
#> s(time).4 1.60 2.50 3.600 1.01 302
#> s(time).5 -1.10 -0.20 0.890 1.01 327
#> s(time).6 -0.59 0.38 1.600 1.01 334
#> s(time).7 -1.40 -0.51 0.630 1.01 315
#> s(time).8 0.62 1.50 2.600 1.01 311
#> s(time).9 1.20 2.10 3.300 1.01 308
#> s(time).10 -0.29 0.56 1.600 1.01 303
#> s(time).11 0.86 1.80 2.900 1.01 305
#> s(time).12 0.71 1.50 2.500 1.01 308
#> s(time).13 -1.10 -0.32 0.740 1.01 411
#> s(time).14 -7.50 -4.20 -1.300 1.01 448
#>
#> Approximate significance of GAM smooths:
#> edf Ref.df Chi.sq p-value
#> s(time) 8.92 14 70.6 <2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Stan MCMC diagnostics:
#> n_eff / iter looks reasonable for all parameters
#> Rhat looks reasonable for all parameters
#> 0 of 2000 iterations ended with a divergence (0%)
#> 0 of 2000 iterations saturated the maximum tree depth of 10 (0%)
#> E-FMI indicated no pathological behavior
#>
#> Samples were drawn using NUTS(diag_e) at Mon Dec 16 10:17:13 AM 2024.
#> For each parameter, n_eff is a crude measure of effective sample size,
#> and Rhat is the potential scale reduction factor on split MCMC chains
#> (at convergence, Rhat = 1)
#>
#> Use how_to_cite(model3) to get started describing this model
The summary above now contains posterior estimates for the smoothing
parameters as well as the basis coefficients for the nonlinear effect of
time
. We can visualize conditional_effects
as
before:
conditional_effects(model3, type = 'link')
Inspect the underlying Stan
code to gain some idea of
how the spline is being penalized:
code(model3)
#> // Stan model code generated by package mvgam
#> data {
#> int<lower=0> total_obs; // total number of observations
#> int<lower=0> n; // number of timepoints per series
#> int<lower=0> n_sp; // number of smoothing parameters
#> int<lower=0> n_series; // number of series
#> int<lower=0> num_basis; // total number of basis coefficients
#> vector[num_basis] zero; // prior locations for basis coefficients
#> matrix[total_obs, num_basis] X; // mgcv GAM design matrix
#> array[n, n_series] int<lower=0> ytimes; // time-ordered matrix (which col in X belongs to each [time, series] observation?)
#> matrix[14, 28] S1; // mgcv smooth penalty matrix S1
#> int<lower=0> n_nonmissing; // number of nonmissing observations
#> array[n_nonmissing] int<lower=0> flat_ys; // flattened nonmissing observations
#> matrix[n_nonmissing, num_basis] flat_xs; // X values for nonmissing observations
#> array[n_nonmissing] int<lower=0> obs_ind; // indices of nonmissing observations
#> }
#> parameters {
#> // raw basis coefficients
#> vector[num_basis] b_raw;
#>
#> // smoothing parameters
#> vector<lower=0>[n_sp] lambda;
#> }
#> transformed parameters {
#> // basis coefficients
#> vector[num_basis] b;
#> b[1 : num_basis] = b_raw[1 : num_basis];
#> }
#> model {
#> // prior for (Intercept)...
#> b_raw[1] ~ student_t(3, 2.6, 2.5);
#>
#> // prior for ndvi...
#> b_raw[2] ~ student_t(3, 0, 2);
#>
#> // prior for s(time)...
#> b_raw[3 : 16] ~ multi_normal_prec(zero[3 : 16],
#> S1[1 : 14, 1 : 14] * lambda[1]
#> + S1[1 : 14, 15 : 28] * lambda[2]);
#>
#> // priors for smoothing parameters
#> lambda ~ normal(5, 30);
#> {
#> // likelihood functions
#> flat_ys ~ poisson_log_glm(flat_xs, 0.0, b);
#> }
#> }
#> generated quantities {
#> vector[total_obs] eta;
#> matrix[n, n_series] mus;
#> vector[n_sp] rho;
#> array[n, n_series] int ypred;
#> rho = log(lambda);
#>
#> // posterior predictions
#> eta = X * b;
#> for (s in 1 : n_series) {
#> mus[1 : n, s] = eta[ytimes[1 : n, s]];
#> ypred[1 : n, s] = poisson_log_rng(mus[1 : n, s]);
#> }
#> }
The line below // prior for s(time)...
shows how the
spline basis coefficients are drawn from a zero-centred multivariate
normal distribution. The precision matrix \(S\) is penalized by two different smoothing
parameters (the \(\lambda\)’s) to
enforce smoothness and reduce overfitting
Latent dynamics in mvgam
Forecasts from the above model are not ideal:
plot(model3, type = 'forecast', newdata = data_test)
Why is this happening? The forecasts are driven almost entirely by variation in the temporal spline, which is extrapolating linearly forever beyond the edge of the training data. Any slight wiggles near the end of the training set will result in wildly different forecasts. To visualize this, we can plot the extrapolated temporal smooth into the out-of-sample test set. Here is the extrapolated function for this model, which used 15 basis functions:
plot_mvgam_smooth(model3, smooth = 's(time)',
# feed newdata to the plot function to generate
# predictions of the temporal smooth to the end of the
# testing period
newdata = data.frame(time = 1:max(data_test$time),
ndvi = 0))
abline(v = max(data_train$time), lty = 'dashed', lwd = 2)
This model is not doing well. Clearly we need to somehow account for
the strong temporal autocorrelation when modelling these data without
using a smooth function of time
. Now onto another prominent
feature of mvgam
: the ability to include (possibly latent)
autocorrelated residuals in regression models. To do so, we use the
trend_model
argument (see ?mvgam_trends
for
details of different dynamic trend models that are supported). This
model will use a separate sub-model for latent residuals that evolve as
an AR1 process (i.e. the error in the current time point is a function
of the error in the previous time point, plus some stochastic noise). We
also include a smooth function of ndvi
in this model,
rather than the parametric term that was used above, to showcase that
mvgam
can include combinations of smooths and dynamic
components:
model4 <- mvgam(count ~ s(ndvi, k = 6),
family = poisson(),
data = data_train,
newdata = data_test,
trend_model = 'AR1')
The model can be described mathematically as follows: \[\begin{align*} \boldsymbol{count}_t & \sim \text{Poisson}(\lambda_t) \\ log(\lambda_t) & = f(\boldsymbol{ndvi})_t + z_t \\ z_t & \sim \text{Normal}(ar1 * z_{t-1}, \sigma_{error}) \\ ar1 & \sim \text{Normal}(0, 1)[-1, 1] \\ \sigma_{error} & \sim \text{Exponential}(2) \\ f(\boldsymbol{ndvi}) & = \sum_{k=1}^{K}b * \beta_{smooth} \\ \beta_{smooth} & \sim \text{MVNormal}(0, (\Omega * \lambda)^{-1}) \end{align*}\]
Here the term \(z_t\) captures autocorrelated latent residuals, which are modelled using an AR1 process. You can also notice that this model is estimating autocorrelated errors for the full time period, even though some of these time points have missing observations. This is useful for getting more realistic estimates of the residual autocorrelation parameters. Summarise the model to see how it now returns posterior summaries for the latent AR1 process:
summary(model4)
#> GAM formula:
#> count ~ s(ndvi, k = 6)
#>
#> Family:
#> poisson
#>
#> Link function:
#> log
#>
#> Trend model:
#> AR1
#>
#> N series:
#> 1
#>
#> N timepoints:
#> 199
#>
#> Status:
#> Fitted using Stan
#> 4 chains, each with iter = 1000; warmup = 500; thin = 1
#> Total post-warmup draws = 2000
#>
#>
#> GAM coefficient (beta) estimates:
#> 2.5% 50% 97.5% Rhat n_eff
#> (Intercept) 0.960 1.9000 2.400 1.05 62
#> s(ndvi).1 -0.210 -0.0090 0.077 1.02 152
#> s(ndvi).2 -0.150 0.0160 0.300 1.00 529
#> s(ndvi).3 -0.054 -0.0011 0.050 1.00 905
#> s(ndvi).4 -0.230 0.1200 1.400 1.01 168
#> s(ndvi).5 -0.100 0.1400 0.360 1.02 291
#>
#> Approximate significance of GAM smooths:
#> edf Ref.df Chi.sq p-value
#> s(ndvi) 1.1 5 5.46 0.67
#>
#> Latent trend parameter AR estimates:
#> 2.5% 50% 97.5% Rhat n_eff
#> ar1[1] 0.70 0.81 0.93 1.01 253
#> sigma[1] 0.68 0.80 0.95 1.01 501
#>
#> Stan MCMC diagnostics:
#> n_eff / iter looks reasonable for all parameters
#> Rhats above 1.05 found for 11 parameters
#> *Diagnose further to investigate why the chains have not mixed
#> 0 of 2000 iterations ended with a divergence (0%)
#> 0 of 2000 iterations saturated the maximum tree depth of 10 (0%)
#> E-FMI indicated no pathological behavior
#>
#> Samples were drawn using NUTS(diag_e) at Mon Dec 16 10:18:22 AM 2024.
#> For each parameter, n_eff is a crude measure of effective sample size,
#> and Rhat is the potential scale reduction factor on split MCMC chains
#> (at convergence, Rhat = 1)
#>
#> Use how_to_cite(model4) to get started describing this model
View posterior hindcasts / forecasts and compare against the out of sample test data
plot(model4, type = 'forecast', newdata = data_test)
The trend is evolving as an AR1 process, which we can also view:
plot(model4, type = 'trend', newdata = data_test)
In-sample model performance can be interrogated using leave-one-out
cross-validation utilities from the loo
package (a higher
value is preferred for this metric):
loo_compare(model3, model4)
#> elpd_diff se_diff
#> model4 0.0 0.0
#> model3 -559.3 66.2
The higher estimated log predictive density (ELPD) value for the dynamic model suggests it provides a better fit to the in-sample data.
Though it should be obvious that this model provides better
forecasts, we can quantify forecast performance for models 3 and 4 using
the forecast
and score
functions. Here we will
compare models based on their Discrete Ranked Probability Scores (a
lower value is preferred for this metric)
fc_mod3 <- forecast(model3)
fc_mod4 <- forecast(model4)
score_mod3 <- score(fc_mod3, score = 'drps')
score_mod4 <- score(fc_mod4, score = 'drps')
sum(score_mod4$PP$score, na.rm = TRUE) - sum(score_mod3$PP$score, na.rm = TRUE)
#> [1] -137.438
A strongly negative value here suggests the score for the dynamic model (model 4) is much smaller than the score for the model with a smooth function of time (model 3), indicating better forecast performance.