Skip to contents

Set up autoregressive or autoregressive moving average trend models in mvgam. These functions do not evaluate their arguments – they exist purely to help set up a model with particular autoregressive trend models.

Usage

RW(ma = FALSE, cor = FALSE, gr = NA, subgr = NA)

AR(p = 1, ma = FALSE, cor = FALSE, gr = NA, subgr = NA)

CAR(p = 1)

VAR(ma = FALSE, cor = FALSE, gr = NA, subgr = NA)

Arguments

ma

Logical Include moving average terms of order 1? Default is FALSE.

cor

Logical Include correlated process errors as part of a multivariate normal process model? If TRUE and if n_series > 1 in the supplied data, a fully structured covariance matrix will be estimated for the process errors. Default is FALSE.

gr

An optional grouping variable, which must be a factor in the supplied data, for setting up hierarchical residual correlation structures. If specified, this will automatically set cor = TRUE and set up a model where the residual correlations for a specific level of gr are modelled hierarchically: \(\Omega_{group} = \alpha_{cor}\Omega_{global} + (1 - \alpha_{cor})\Omega_{group, local}\), where \(\Omega_{global}\) is a global correlation matrix, \(\Omega_{group, local}\) is a local deviation correlation matrix and \(\alpha_{cor}\) is a weighting parameter controlling how strongly the local correlation matrix \(\Omega_{group}\) is shrunk towards the global correlation matrix \(\Omega_{global}\) (larger values of \(\alpha_{cor}\) indicate a greater degree of shrinkage, i.e. a greater degree of partial pooling). If gr is supplied then subgr must also be supplied

subgr

A subgrouping factor variable specifying which element in data represents the different time series. Defaults to series, but note that models that use the hierarchical correlations, where the subgr time series are measured in each level of gr, should not include a series element in data. Rather, this element will be created internally based on the supplied variables for gr and subgr. For example, if you are modelling temporal counts for a group of species (labelled as species in data) across three different geographical regions (labelled as region), and you would like the residuals to be correlated within regions, then you should specify gr = region and subgr = species. Internally, mvgam() will create the series element for the data using: series = interaction(group, subgroup, drop = TRUE))

p

A non-negative integer specifying the autoregressive (AR) order. Default is 1. Cannot currently be larger than 3 for AR terms, and cannot be anything other than 1 for continuous time AR (CAR) terms

Value

An object of class mvgam_trend, which contains a list of arguments to be interpreted by the parsing functions in mvgam

Examples

# \donttest{
# A short example to illustrate CAR(1) models
# Function to simulate CAR1 data with seasonality
sim_corcar1 = function(n = 120,
                      phi = 0.5,
                      sigma = 1,
                      sigma_obs = 0.75){
# Sample irregularly spaced time intervals
time_dis <- c(0, runif(n - 1, -0.1, 1))
time_dis[time_dis < 0] <- 0; time_dis <- time_dis * 5

# Set up the latent dynamic process
x <- vector(length = n); x[1] <- -0.3
for(i in 2:n){
 # zero-distances will cause problems in sampling, so mvgam uses a
 # minimum threshold; this simulation function emulates that process
 if(time_dis[i] == 0){
   x[i] <- rnorm(1, mean = (phi ^ 1e-12) * x[i - 1], sd = sigma)
  } else {
    x[i] <- rnorm(1, mean = (phi ^ time_dis[i]) * x[i - 1], sd = sigma)
  }
}

# Add 12-month seasonality
cov1 <- sin(2 * pi * (1 : n) / 12); cov2 <- cos(2 * pi * (1 : n) / 12)
beta1 <- runif(1, 0.3, 0.7); beta2 <- runif(1, 0.2, 0.5)
seasonality <- beta1 * cov1 + beta2 * cov2

# Take Gaussian observations with error and return
data.frame(y = rnorm(n, mean = x + seasonality, sd = sigma_obs),
           season = rep(1:12, 20)[1:n],
           time = cumsum(time_dis))
}

# Sample two time series
dat <- rbind(dplyr::bind_cols(sim_corcar1(phi = 0.65,
                                         sigma_obs = 0.55),
                             data.frame(series = 'series1')),
            dplyr::bind_cols(sim_corcar1(phi = 0.8,
                             sigma_obs = 0.35),
                             data.frame(series = 'series2'))) %>%
      dplyr::mutate(series = as.factor(series))

# mvgam with CAR(1) trends and series-level seasonal smooths; the
# State-Space representation (using trend_formula) will be more efficient
mod <- mvgam(formula = y ~ 1,
            trend_formula = ~ s(season, bs = 'cc',
                                k = 5, by = trend),
            trend_model = CAR(),
            data = dat,
            family = gaussian(),
            samples = 300,
            chains = 2,
            silent = 2)
#> In file included from stan/lib/stan_math/stan/math/prim/prob/von_mises_lccdf.hpp:5,
#>                  from stan/lib/stan_math/stan/math/prim/prob/von_mises_ccdf_log.hpp:4,
#>                  from stan/lib/stan_math/stan/math/prim/prob.hpp:359,
#>                  from stan/lib/stan_math/stan/math/prim.hpp:16,
#>                  from stan/lib/stan_math/stan/math/rev.hpp:16,
#>                  from stan/lib/stan_math/stan/math.hpp:19,
#>                  from stan/src/stan/model/model_header.hpp:4,
#>                  from C:/Users/uqnclar2/AppData/Local/Temp/RtmpotLup8/model-9e8c3e2173a3.hpp:2:
#> stan/lib/stan_math/stan/math/prim/prob/von_mises_cdf.hpp: In function 'stan::return_type_t<T_x, T_sigma, T_l> stan::math::von_mises_cdf(const T_x&, const T_mu&, const T_k&)':
#> stan/lib/stan_math/stan/math/prim/prob/von_mises_cdf.hpp:194: note: '-Wmisleading-indentation' is disabled from this point onwards, since column-tracking was disabled due to the size of the code/headers
#>   194 |       if (cdf_n < 0.0)
#>       | 
#> stan/lib/stan_math/stan/math/prim/prob/von_mises_cdf.hpp:194: note: adding '-flarge-source-files' will allow for more column-tracking support, at the expense of compilation time and memory

# View usual summaries and plots
summary(mod)
#> GAM observation formula:
#> y ~ 1
#> <environment: 0x000001d61abb4b00>
#> 
#> GAM process formula:
#> ~s(season, bs = "cc", k = 5, by = trend)
#> <environment: 0x000001d61abb4b00>
#> 
#> Family:
#> gaussian
#> 
#> Link function:
#> identity
#> 
#> Trend model:
#> CAR()
#> 
#> 
#> N process models:
#> 2 
#> 
#> N series:
#> 2 
#> 
#> N timepoints:
#> 120 
#> 
#> Status:
#> Fitted using Stan 
#> 2 chains, each with iter = 800; warmup = 500; thin = 1 
#> Total post-warmup draws = 600
#> 
#> 
#> Observation error parameter estimates:
#>              2.5%  50% 97.5% Rhat n_eff
#> sigma_obs[1] 0.32 0.61  0.93 1.02    25
#> sigma_obs[2] 0.17 0.30  0.54 1.08    21
#> 
#> GAM observation model coefficient (beta) estimates:
#>             2.5%   50%  97.5% Rhat n_eff
#> (Intercept) -2.1 -0.86 -0.026 2.18     3
#> 
#> Process model AR parameter estimates:
#>        2.5%  50% 97.5% Rhat n_eff
#> ar1[1] 0.20 0.56  0.85 1.03    61
#> ar1[2] 0.69 0.80  0.89 1.00   183
#> 
#> Process error parameter estimates:
#>          2.5%  50% 97.5% Rhat n_eff
#> sigma[1] 0.64 0.95   1.2 1.01    39
#> sigma[2] 0.73 0.89   1.1 1.02    87
#> 
#> GAM process model coefficient (beta) estimates:
#>                                2.5%   50%  97.5% Rhat n_eff
#> (Intercept)_trend             -0.14  0.71  2.000  2.1     3
#> s(season):trendtrend1.1_trend -0.17  0.18  0.560  1.0   423
#> s(season):trendtrend1.2_trend -0.75 -0.30  0.120  1.0   522
#> s(season):trendtrend1.3_trend -0.86 -0.46 -0.044  1.0   488
#> s(season):trendtrend2.1_trend -0.13  0.24  0.700  1.0   388
#> s(season):trendtrend2.2_trend -0.95 -0.41  0.076  1.0   398
#> s(season):trendtrend2.3_trend -1.00 -0.50 -0.094  1.0   491
#> 
#> Approximate significance of GAM process smooths:
#>                         edf Ref.df Chi.sq p-value  
#> s(season):seriestrend1 2.29      3   10.5   0.061 .
#> s(season):seriestrend2 2.19      3   15.1   0.051 .
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Stan MCMC diagnostics:
#> n_eff / iter looks reasonable for all parameters
#> Rhats above 1.05 found for 485 parameters
#>  *Diagnose further to investigate why the chains have not mixed
#> 11 of 600 iterations ended with a divergence (1.8333%)
#>  *Try running with larger adapt_delta to remove the divergences
#> 0 of 600 iterations saturated the maximum tree depth of 10 (0%)
#> Chain 1: E-FMI = 0.1507
#> Chain 2: E-FMI = 0.1239
#>  *E-FMI below 0.2 indicates you may need to reparameterize your model
#> 
#> Samples were drawn using NUTS(diag_e) at Wed Feb 19 11:59:08 AM 2025.
#> For each parameter, n_eff is a crude measure of effective sample size,
#> and Rhat is the potential scale reduction factor on split MCMC chains
#> (at convergence, Rhat = 1)
#> 
#> Use how_to_cite(mod) to get started describing this model
conditional_effects(mod, type = 'expected')

plot(mod, type = 'trend', series = 1)

plot(mod, type = 'trend', series = 2)

plot(mod, type = 'residuals', series = 1)

plot(mod, type = 'residuals', series = 2)


# Now an example illustrating hierarchical dynamics
set.seed(123)
# Simulate three species monitored in three different
# regions, where dynamics can potentially vary across regions
simdat1 <- sim_mvgam(trend_model = VAR(cor = TRUE),
                    prop_trend = 0.95,
                    n_series = 3,
                    mu = c(1, 2, 3))
simdat2 <- sim_mvgam(trend_model = VAR(cor = TRUE),
                    prop_trend = 0.95,
                    n_series = 3,
                    mu = c(1, 2, 3))
simdat3 <- sim_mvgam(trend_model = VAR(cor = TRUE),
                    prop_trend = 0.95,
                    n_series = 3,
                    mu = c(1, 2, 3))

# Set up the data but DO NOT include 'series'
all_dat <- rbind(simdat1$data_train %>%
                  dplyr::mutate(region = 'qld'),
                simdat2$data_train %>%
                  dplyr::mutate(region = 'nsw'),
                simdat3$data_train %>%
                  dplyr::mutate(region = 'vic')) %>%
dplyr::mutate(species = gsub('series', 'species', series),
              species = as.factor(species),
              region = as.factor(region)) %>%
dplyr::arrange(series, time) %>%
dplyr::select(-series)

# Check priors for a hierarchical AR1 model
get_mvgam_priors(formula = y ~ species,
                trend_model = AR(gr = region, subgr = species),
                data = all_dat)
#>                                param_name param_length                    param_info
#> 1                             (Intercept)            1                   (Intercept)
#> 2                        speciesspecies_2            1 speciesspecies_2 fixed effect
#> 3                        speciesspecies_3            1 speciesspecies_3 fixed effect
#> 4 vector<lower=-1,upper=1>[n_series] ar1;            9         trend AR1 coefficient
#> 5        vector<lower=0>[n_series] sigma;            9                      trend sd
#>                                    prior                   example_change new_lowerbound new_upperbound
#> 1  (Intercept) ~ student_t(3, 1.9, 2.5);      (Intercept) ~ normal(0, 1);             NA             NA
#> 2 speciesspecies_2 ~ student_t(3, 0, 2); speciesspecies_2 ~ normal(0, 1);             NA             NA
#> 3 speciesspecies_3 ~ student_t(3, 0, 2); speciesspecies_3 ~ normal(0, 1);             NA             NA
#> 4                    ar1 ~ std_normal();       ar1 ~ normal(-0.79, 0.86);             NA             NA
#> 5          sigma ~ student_t(3, 0, 2.5);       sigma ~ exponential(0.37);             NA             NA

# Fit the model
mod <- mvgam(formula = y ~ species,
            trend_model = AR(gr = region, subgr = species),
            data = all_dat,
            chains = 2,
            silent = 2)
#> In file included from stan/lib/stan_math/stan/math/prim/prob/von_mises_lccdf.hpp:5,
#>                  from stan/lib/stan_math/stan/math/prim/prob/von_mises_ccdf_log.hpp:4,
#>                  from stan/lib/stan_math/stan/math/prim/prob.hpp:359,
#>                  from stan/lib/stan_math/stan/math/prim.hpp:16,
#>                  from stan/lib/stan_math/stan/math/rev.hpp:16,
#>                  from stan/lib/stan_math/stan/math.hpp:19,
#>                  from stan/src/stan/model/model_header.hpp:4,
#>                  from C:/Users/uqnclar2/AppData/Local/Temp/RtmpotLup8/model-9e8c48444ff2.hpp:2:
#> stan/lib/stan_math/stan/math/prim/prob/von_mises_cdf.hpp: In function 'stan::return_type_t<T_x, T_sigma, T_l> stan::math::von_mises_cdf(const T_x&, const T_mu&, const T_k&)':
#> stan/lib/stan_math/stan/math/prim/prob/von_mises_cdf.hpp:194: note: '-Wmisleading-indentation' is disabled from this point onwards, since column-tracking was disabled due to the size of the code/headers
#>   194 |       if (cdf_n < 0.0)
#>       | 
#> stan/lib/stan_math/stan/math/prim/prob/von_mises_cdf.hpp:194: note: adding '-flarge-source-files' will allow for more column-tracking support, at the expense of compilation time and memory

# Check standard outputs
summary(mod)
#> GAM formula:
#> y ~ species
#> <environment: 0x000001d61abb4b00>
#> 
#> Family:
#> poisson
#> 
#> Link function:
#> log
#> 
#> Trend model:
#> AR(gr = region, subgr = species)
#> 
#> 
#> N series:
#> 9 
#> 
#> N timepoints:
#> 75 
#> 
#> Status:
#> Fitted using Stan 
#> 2 chains, each with iter = 1000; warmup = 500; thin = 1 
#> Total post-warmup draws = 1000
#> 
#> 
#> GAM coefficient (beta) estimates:
#>                  2.5%  50% 97.5% Rhat n_eff
#> (Intercept)      0.90 1.10   1.3 1.00   175
#> speciesspecies_2 0.79 0.99   1.2 1.02   332
#> speciesspecies_3 1.60 1.90   2.1 1.01   125
#> 
#> Latent trend parameter AR estimates:
#>            2.5%    50%  97.5% Rhat n_eff
#> ar1[1]    0.220  0.480  0.700 1.01   202
#> ar1[2]    0.120  0.330  0.550 1.02   157
#> ar1[3]   -0.017  0.370  0.680 1.03    54
#> ar1[4]    0.230  0.640  0.870 1.02   136
#> ar1[5]   -0.190  0.074  0.300 1.00   146
#> ar1[6]   -0.430 -0.220 -0.015 1.00   120
#> ar1[7]   -0.034  0.290  0.610 1.00   168
#> ar1[8]    0.570  0.730  0.890 1.00   178
#> ar1[9]    0.330  0.560  0.780 1.00   111
#> sigma[1]  0.810  1.000  1.300 1.00   531
#> sigma[2]  0.650  0.800  0.970 1.00   613
#> sigma[3]  0.820  0.970  1.200 1.00   358
#> sigma[4]  0.310  0.480  0.710 1.03   117
#> sigma[5]  0.630  0.760  0.900 1.00   495
#> sigma[6]  0.680  0.790  0.960 1.00   666
#> sigma[7]  0.610  0.790  1.000 1.00   256
#> sigma[8]  0.560  0.700  0.910 1.00   556
#> sigma[9]  0.690  0.820  1.000 1.00  1267
#> 
#> Hierarchical correlation weighting parameter (alpha_cor) estimates:
#>            2.5%   50% 97.5% Rhat n_eff
#> alpha_cor 0.016 0.075  0.21    1   447
#> 
#> Stan MCMC diagnostics:
#> n_eff / iter looks reasonable for all parameters
#> Rhat looks reasonable for all parameters
#> 0 of 1000 iterations ended with a divergence (0%)
#> 0 of 1000 iterations saturated the maximum tree depth of 10 (0%)
#> E-FMI indicated no pathological behavior
#> 
#> Samples were drawn using NUTS(diag_e) at Wed Feb 19 12:00:03 PM 2025.
#> For each parameter, n_eff is a crude measure of effective sample size,
#> and Rhat is the potential scale reduction factor on split MCMC chains
#> (at convergence, Rhat = 1)
#> 
#> Use how_to_cite(mod) to get started describing this model
conditional_effects(mod, type = 'link')


# Inspect posterior estimates for the correlation weighting parameter
mcmc_plot(mod, variable = 'alpha_cor', type = 'hist')
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

# }