Set up autoregressive or autoregressive moving average trend models in mvgam. These functions do not evaluate their arguments – they exist purely to help set up a model with particular autoregressive trend models.
Usage
RW(ma = FALSE, cor = FALSE, gr = NA, subgr = NA)
AR(p = 1, ma = FALSE, cor = FALSE, gr = NA, subgr = NA)
CAR(p = 1)
VAR(ma = FALSE, cor = FALSE, gr = NA, subgr = NA)
Arguments
- ma
Logical
Include moving average terms of order1
? Default isFALSE
.- cor
Logical
Include correlated process errors as part of a multivariate normal process model? IfTRUE
and ifn_series > 1
in the supplied data, a fully structured covariance matrix will be estimated for the process errors. Default isFALSE
.- gr
An optional grouping variable, which must be a
factor
in the supplieddata
, for setting up hierarchical residual correlation structures. If specified, this will automatically setcor = TRUE
and set up a model where the residual correlations for a specific level ofgr
are modelled hierarchically: \(\Omega_{group} = \alpha_{cor}\Omega_{global} + (1 - \alpha_{cor})\Omega_{group, local}\), where \(\Omega_{global}\) is a global correlation matrix, \(\Omega_{group, local}\) is a local deviation correlation matrix and \(\alpha_{cor}\) is a weighting parameter controlling how strongly the local correlation matrix \(\Omega_{group}\) is shrunk towards the global correlation matrix \(\Omega_{global}\) (larger values of \(\alpha_{cor}\) indicate a greater degree of shrinkage, i.e. a greater degree of partial pooling). Ifgr
is supplied thensubgr
must also be supplied- subgr
A subgrouping
factor
variable specifying which element indata
represents the different time series. Defaults toseries
, but note that models that use the hierarchical correlations, where thesubgr
time series are measured in each level ofgr
, should not include aseries
element indata
. Rather, this element will be created internally based on the supplied variables forgr
andsubgr
. For example, if you are modelling temporal counts for a group of species (labelled asspecies
indata
) across three different geographical regions (labelled asregion
), and you would like the residuals to be correlated within regions, then you should specifygr = region
andsubgr = species
. Internally,mvgam()
will create theseries
element for the data using:series = interaction(group, subgroup, drop = TRUE))
- p
A non-negative integer specifying the autoregressive (AR) order. Default is
1
. Cannot currently be larger than3
forAR
terms, and cannot be anything other than1
for continuous time AR (CAR
) terms
Value
An object of class mvgam_trend
, which contains a list of
arguments to be interpreted by the parsing functions in mvgam
Examples
# \donttest{
# A short example to illustrate CAR(1) models
# Function to simulate CAR1 data with seasonality
sim_corcar1 = function(n = 120,
phi = 0.5,
sigma = 1,
sigma_obs = 0.75){
# Sample irregularly spaced time intervals
time_dis <- c(0, runif(n - 1, -0.1, 1))
time_dis[time_dis < 0] <- 0; time_dis <- time_dis * 5
# Set up the latent dynamic process
x <- vector(length = n); x[1] <- -0.3
for(i in 2:n){
# zero-distances will cause problems in sampling, so mvgam uses a
# minimum threshold; this simulation function emulates that process
if(time_dis[i] == 0){
x[i] <- rnorm(1, mean = (phi ^ 1e-12) * x[i - 1], sd = sigma)
} else {
x[i] <- rnorm(1, mean = (phi ^ time_dis[i]) * x[i - 1], sd = sigma)
}
}
# Add 12-month seasonality
cov1 <- sin(2 * pi * (1 : n) / 12); cov2 <- cos(2 * pi * (1 : n) / 12)
beta1 <- runif(1, 0.3, 0.7); beta2 <- runif(1, 0.2, 0.5)
seasonality <- beta1 * cov1 + beta2 * cov2
# Take Gaussian observations with error and return
data.frame(y = rnorm(n, mean = x + seasonality, sd = sigma_obs),
season = rep(1:12, 20)[1:n],
time = cumsum(time_dis))
}
# Sample two time series
dat <- rbind(dplyr::bind_cols(sim_corcar1(phi = 0.65,
sigma_obs = 0.55),
data.frame(series = 'series1')),
dplyr::bind_cols(sim_corcar1(phi = 0.8,
sigma_obs = 0.35),
data.frame(series = 'series2'))) %>%
dplyr::mutate(series = as.factor(series))
# mvgam with CAR(1) trends and series-level seasonal smooths; the
# State-Space representation (using trend_formula) will be more efficient
mod <- mvgam(formula = y ~ 1,
trend_formula = ~ s(season, bs = 'cc',
k = 5, by = trend),
trend_model = CAR(),
data = dat,
family = gaussian(),
samples = 300,
chains = 2,
silent = 2)
#> In file included from stan/lib/stan_math/stan/math/prim/prob/von_mises_lccdf.hpp:5,
#> from stan/lib/stan_math/stan/math/prim/prob/von_mises_ccdf_log.hpp:4,
#> from stan/lib/stan_math/stan/math/prim/prob.hpp:359,
#> from stan/lib/stan_math/stan/math/prim.hpp:16,
#> from stan/lib/stan_math/stan/math/rev.hpp:16,
#> from stan/lib/stan_math/stan/math.hpp:19,
#> from stan/src/stan/model/model_header.hpp:4,
#> from C:/Users/uqnclar2/AppData/Local/Temp/RtmpotLup8/model-9e8c3e2173a3.hpp:2:
#> stan/lib/stan_math/stan/math/prim/prob/von_mises_cdf.hpp: In function 'stan::return_type_t<T_x, T_sigma, T_l> stan::math::von_mises_cdf(const T_x&, const T_mu&, const T_k&)':
#> stan/lib/stan_math/stan/math/prim/prob/von_mises_cdf.hpp:194: note: '-Wmisleading-indentation' is disabled from this point onwards, since column-tracking was disabled due to the size of the code/headers
#> 194 | if (cdf_n < 0.0)
#> |
#> stan/lib/stan_math/stan/math/prim/prob/von_mises_cdf.hpp:194: note: adding '-flarge-source-files' will allow for more column-tracking support, at the expense of compilation time and memory
# View usual summaries and plots
summary(mod)
#> GAM observation formula:
#> y ~ 1
#> <environment: 0x000001d61abb4b00>
#>
#> GAM process formula:
#> ~s(season, bs = "cc", k = 5, by = trend)
#> <environment: 0x000001d61abb4b00>
#>
#> Family:
#> gaussian
#>
#> Link function:
#> identity
#>
#> Trend model:
#> CAR()
#>
#>
#> N process models:
#> 2
#>
#> N series:
#> 2
#>
#> N timepoints:
#> 120
#>
#> Status:
#> Fitted using Stan
#> 2 chains, each with iter = 800; warmup = 500; thin = 1
#> Total post-warmup draws = 600
#>
#>
#> Observation error parameter estimates:
#> 2.5% 50% 97.5% Rhat n_eff
#> sigma_obs[1] 0.32 0.61 0.93 1.02 25
#> sigma_obs[2] 0.17 0.30 0.54 1.08 21
#>
#> GAM observation model coefficient (beta) estimates:
#> 2.5% 50% 97.5% Rhat n_eff
#> (Intercept) -2.1 -0.86 -0.026 2.18 3
#>
#> Process model AR parameter estimates:
#> 2.5% 50% 97.5% Rhat n_eff
#> ar1[1] 0.20 0.56 0.85 1.03 61
#> ar1[2] 0.69 0.80 0.89 1.00 183
#>
#> Process error parameter estimates:
#> 2.5% 50% 97.5% Rhat n_eff
#> sigma[1] 0.64 0.95 1.2 1.01 39
#> sigma[2] 0.73 0.89 1.1 1.02 87
#>
#> GAM process model coefficient (beta) estimates:
#> 2.5% 50% 97.5% Rhat n_eff
#> (Intercept)_trend -0.14 0.71 2.000 2.1 3
#> s(season):trendtrend1.1_trend -0.17 0.18 0.560 1.0 423
#> s(season):trendtrend1.2_trend -0.75 -0.30 0.120 1.0 522
#> s(season):trendtrend1.3_trend -0.86 -0.46 -0.044 1.0 488
#> s(season):trendtrend2.1_trend -0.13 0.24 0.700 1.0 388
#> s(season):trendtrend2.2_trend -0.95 -0.41 0.076 1.0 398
#> s(season):trendtrend2.3_trend -1.00 -0.50 -0.094 1.0 491
#>
#> Approximate significance of GAM process smooths:
#> edf Ref.df Chi.sq p-value
#> s(season):seriestrend1 2.29 3 10.5 0.061 .
#> s(season):seriestrend2 2.19 3 15.1 0.051 .
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> Stan MCMC diagnostics:
#> n_eff / iter looks reasonable for all parameters
#> Rhats above 1.05 found for 485 parameters
#> *Diagnose further to investigate why the chains have not mixed
#> 11 of 600 iterations ended with a divergence (1.8333%)
#> *Try running with larger adapt_delta to remove the divergences
#> 0 of 600 iterations saturated the maximum tree depth of 10 (0%)
#> Chain 1: E-FMI = 0.1507
#> Chain 2: E-FMI = 0.1239
#> *E-FMI below 0.2 indicates you may need to reparameterize your model
#>
#> Samples were drawn using NUTS(diag_e) at Wed Feb 19 11:59:08 AM 2025.
#> For each parameter, n_eff is a crude measure of effective sample size,
#> and Rhat is the potential scale reduction factor on split MCMC chains
#> (at convergence, Rhat = 1)
#>
#> Use how_to_cite(mod) to get started describing this model
conditional_effects(mod, type = 'expected')
plot(mod, type = 'trend', series = 1)
plot(mod, type = 'trend', series = 2)
plot(mod, type = 'residuals', series = 1)
plot(mod, type = 'residuals', series = 2)
# Now an example illustrating hierarchical dynamics
set.seed(123)
# Simulate three species monitored in three different
# regions, where dynamics can potentially vary across regions
simdat1 <- sim_mvgam(trend_model = VAR(cor = TRUE),
prop_trend = 0.95,
n_series = 3,
mu = c(1, 2, 3))
simdat2 <- sim_mvgam(trend_model = VAR(cor = TRUE),
prop_trend = 0.95,
n_series = 3,
mu = c(1, 2, 3))
simdat3 <- sim_mvgam(trend_model = VAR(cor = TRUE),
prop_trend = 0.95,
n_series = 3,
mu = c(1, 2, 3))
# Set up the data but DO NOT include 'series'
all_dat <- rbind(simdat1$data_train %>%
dplyr::mutate(region = 'qld'),
simdat2$data_train %>%
dplyr::mutate(region = 'nsw'),
simdat3$data_train %>%
dplyr::mutate(region = 'vic')) %>%
dplyr::mutate(species = gsub('series', 'species', series),
species = as.factor(species),
region = as.factor(region)) %>%
dplyr::arrange(series, time) %>%
dplyr::select(-series)
# Check priors for a hierarchical AR1 model
get_mvgam_priors(formula = y ~ species,
trend_model = AR(gr = region, subgr = species),
data = all_dat)
#> param_name param_length param_info
#> 1 (Intercept) 1 (Intercept)
#> 2 speciesspecies_2 1 speciesspecies_2 fixed effect
#> 3 speciesspecies_3 1 speciesspecies_3 fixed effect
#> 4 vector<lower=-1,upper=1>[n_series] ar1; 9 trend AR1 coefficient
#> 5 vector<lower=0>[n_series] sigma; 9 trend sd
#> prior example_change new_lowerbound new_upperbound
#> 1 (Intercept) ~ student_t(3, 1.9, 2.5); (Intercept) ~ normal(0, 1); NA NA
#> 2 speciesspecies_2 ~ student_t(3, 0, 2); speciesspecies_2 ~ normal(0, 1); NA NA
#> 3 speciesspecies_3 ~ student_t(3, 0, 2); speciesspecies_3 ~ normal(0, 1); NA NA
#> 4 ar1 ~ std_normal(); ar1 ~ normal(-0.79, 0.86); NA NA
#> 5 sigma ~ student_t(3, 0, 2.5); sigma ~ exponential(0.37); NA NA
# Fit the model
mod <- mvgam(formula = y ~ species,
trend_model = AR(gr = region, subgr = species),
data = all_dat,
chains = 2,
silent = 2)
#> In file included from stan/lib/stan_math/stan/math/prim/prob/von_mises_lccdf.hpp:5,
#> from stan/lib/stan_math/stan/math/prim/prob/von_mises_ccdf_log.hpp:4,
#> from stan/lib/stan_math/stan/math/prim/prob.hpp:359,
#> from stan/lib/stan_math/stan/math/prim.hpp:16,
#> from stan/lib/stan_math/stan/math/rev.hpp:16,
#> from stan/lib/stan_math/stan/math.hpp:19,
#> from stan/src/stan/model/model_header.hpp:4,
#> from C:/Users/uqnclar2/AppData/Local/Temp/RtmpotLup8/model-9e8c48444ff2.hpp:2:
#> stan/lib/stan_math/stan/math/prim/prob/von_mises_cdf.hpp: In function 'stan::return_type_t<T_x, T_sigma, T_l> stan::math::von_mises_cdf(const T_x&, const T_mu&, const T_k&)':
#> stan/lib/stan_math/stan/math/prim/prob/von_mises_cdf.hpp:194: note: '-Wmisleading-indentation' is disabled from this point onwards, since column-tracking was disabled due to the size of the code/headers
#> 194 | if (cdf_n < 0.0)
#> |
#> stan/lib/stan_math/stan/math/prim/prob/von_mises_cdf.hpp:194: note: adding '-flarge-source-files' will allow for more column-tracking support, at the expense of compilation time and memory
# Check standard outputs
summary(mod)
#> GAM formula:
#> y ~ species
#> <environment: 0x000001d61abb4b00>
#>
#> Family:
#> poisson
#>
#> Link function:
#> log
#>
#> Trend model:
#> AR(gr = region, subgr = species)
#>
#>
#> N series:
#> 9
#>
#> N timepoints:
#> 75
#>
#> Status:
#> Fitted using Stan
#> 2 chains, each with iter = 1000; warmup = 500; thin = 1
#> Total post-warmup draws = 1000
#>
#>
#> GAM coefficient (beta) estimates:
#> 2.5% 50% 97.5% Rhat n_eff
#> (Intercept) 0.90 1.10 1.3 1.00 175
#> speciesspecies_2 0.79 0.99 1.2 1.02 332
#> speciesspecies_3 1.60 1.90 2.1 1.01 125
#>
#> Latent trend parameter AR estimates:
#> 2.5% 50% 97.5% Rhat n_eff
#> ar1[1] 0.220 0.480 0.700 1.01 202
#> ar1[2] 0.120 0.330 0.550 1.02 157
#> ar1[3] -0.017 0.370 0.680 1.03 54
#> ar1[4] 0.230 0.640 0.870 1.02 136
#> ar1[5] -0.190 0.074 0.300 1.00 146
#> ar1[6] -0.430 -0.220 -0.015 1.00 120
#> ar1[7] -0.034 0.290 0.610 1.00 168
#> ar1[8] 0.570 0.730 0.890 1.00 178
#> ar1[9] 0.330 0.560 0.780 1.00 111
#> sigma[1] 0.810 1.000 1.300 1.00 531
#> sigma[2] 0.650 0.800 0.970 1.00 613
#> sigma[3] 0.820 0.970 1.200 1.00 358
#> sigma[4] 0.310 0.480 0.710 1.03 117
#> sigma[5] 0.630 0.760 0.900 1.00 495
#> sigma[6] 0.680 0.790 0.960 1.00 666
#> sigma[7] 0.610 0.790 1.000 1.00 256
#> sigma[8] 0.560 0.700 0.910 1.00 556
#> sigma[9] 0.690 0.820 1.000 1.00 1267
#>
#> Hierarchical correlation weighting parameter (alpha_cor) estimates:
#> 2.5% 50% 97.5% Rhat n_eff
#> alpha_cor 0.016 0.075 0.21 1 447
#>
#> Stan MCMC diagnostics:
#> n_eff / iter looks reasonable for all parameters
#> Rhat looks reasonable for all parameters
#> 0 of 1000 iterations ended with a divergence (0%)
#> 0 of 1000 iterations saturated the maximum tree depth of 10 (0%)
#> E-FMI indicated no pathological behavior
#>
#> Samples were drawn using NUTS(diag_e) at Wed Feb 19 12:00:03 PM 2025.
#> For each parameter, n_eff is a crude measure of effective sample size,
#> and Rhat is the potential scale reduction factor on split MCMC chains
#> (at convergence, Rhat = 1)
#>
#> Use how_to_cite(mod) to get started describing this model
conditional_effects(mod, type = 'link')
# Inspect posterior estimates for the correlation weighting parameter
mcmc_plot(mod, variable = 'alpha_cor', type = 'hist')
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
# }