Set up autoregressive or autoregressive moving average trend models in mvgam. These functions do not evaluate their arguments – they exist purely to help set up a model with particular autoregressive trend models.
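To make "do not evaluate their arguments" concrete, the following base R sketch shows the general idea of capturing an argument as an unevaluated name. The constructor toy_trend() is hypothetical and is not mvgam's implementation; it only illustrates why a bare variable name such as region can be passed even though no object of that name exists in the calling environment.

```r
# Hypothetical constructor (NOT part of mvgam): stores `gr` and `subgr`
# as strings without ever evaluating them
toy_trend <- function(p = 1, gr = NA, subgr = NA) {
  structure(
    list(p = p,
         gr = deparse(substitute(gr)),
         subgr = deparse(substitute(subgr))),
    class = "toy_trend"
  )
}

# `region` and `species` are not defined anywhere, yet this call succeeds
spec <- toy_trend(gr = region, subgr = species)
spec$gr
spec$subgr
```

The stored names can later be looked up in whatever data the modelling function receives, which is why these helpers are only meaningful inside a model call.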
Usage
RW(ma = FALSE, cor = FALSE, gr = NA, subgr = NA)
AR(p = 1, ma = FALSE, cor = FALSE, gr = NA, subgr = NA)
CAR(p = 1)
VAR(ma = FALSE, cor = FALSE, gr = NA, subgr = NA)
Arguments
- ma
Logical. Include moving average terms of order 1? Default is FALSE.
- cor
Logical. Include correlated process errors as part of a multivariate normal process model? If TRUE and if n_series > 1 in the supplied data, a fully structured covariance matrix will be estimated for the process errors. Default is FALSE.
- gr
An optional grouping variable, which must be a factor in the supplied data, for setting up hierarchical residual correlation structures. If specified, this will automatically set cor = TRUE and set up a model where the residual correlations for a specific level of gr are modelled hierarchically: \(\Omega_{group} = \alpha_{cor}\Omega_{global} + (1 - \alpha_{cor})\Omega_{group, local}\), where \(\Omega_{global}\) is a global correlation matrix, \(\Omega_{group, local}\) is a local deviation correlation matrix and \(\alpha_{cor}\) is a weighting parameter controlling how strongly the local correlation matrix \(\Omega_{group}\) is shrunk towards the global correlation matrix \(\Omega_{global}\) (larger values of \(\alpha_{cor}\) indicate a greater degree of shrinkage, i.e. a greater degree of partial pooling). When used within a VAR() model, this essentially sets up a hierarchical panel vector autoregression where both the autoregressive and correlation matrices are learned hierarchically. If gr is supplied then subgr must also be supplied.
- subgr
A subgrouping factor variable specifying which element in data represents the different time series. Defaults to series, but note that models that use the hierarchical correlations, where the subgr time series are measured in each level of gr, should not include a series element in data. Rather, this element will be created internally based on the supplied variables for gr and subgr. For example, if you are modelling temporal counts for a group of species (labelled as species in data) across three different geographical regions (labelled as region), and you would like the residuals to be correlated within regions, then you should specify gr = region and subgr = species. Internally, mvgam() will create the series element for the data using: series = interaction(group, subgroup, drop = TRUE), where group and subgroup stand for the variables supplied to gr and subgr.
- p
A non-negative integer specifying the autoregressive (AR) order. Default is 1. Cannot currently be larger than 3 for AR terms, and cannot be anything other than 1 for continuous time AR (CAR) terms.
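The gr and subgr descriptions can be illustrated with a small base R sketch. It does not call mvgam; the region and species labels, the two correlation matrices and the value of alpha_cor are all made up for illustration. The first part shows how interaction() yields one series per region-by-species combination, and the second part applies the weighting formula for \(\Omega_{group}\) to fixed matrices (in a fitted model these quantities are estimated, not supplied).

```r
# Toy data: three species observed in each of two regions (made-up labels)
dat <- expand.grid(species = factor(c("sp_a", "sp_b", "sp_c")),
                   region = factor(c("north", "south")))

# One series per observed region-by-species combination
dat$series <- interaction(dat$region, dat$species, drop = TRUE)
nlevels(dat$series)

# Illustrative global and local correlation matrices for one region
omega_global <- matrix(c(1.0, 0.6, 0.2,
                         0.6, 1.0, 0.4,
                         0.2, 0.4, 1.0), nrow = 3, byrow = TRUE)
omega_local <- matrix(c( 1.0, -0.3, 0.1,
                        -0.3,  1.0, 0.5,
                         0.1,  0.5, 1.0), nrow = 3, byrow = TRUE)

# Weighted combination: larger alpha_cor pulls the result towards omega_global
alpha_cor <- 0.25
omega_group <- alpha_cor * omega_global + (1 - alpha_cor) * omega_local
omega_group
```

Because the result is a convex combination of two correlation matrices, it keeps a unit diagonal and remains a valid correlation matrix for any alpha_cor between 0 and 1.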
Value
An object of class mvgam_trend, which contains a list of arguments to be interpreted by the parsing functions in mvgam.
Examples
# \donttest{
# A short example to illustrate CAR(1) models
# Function to simulate CAR1 data with seasonality
sim_corcar1 <- function(n = 125,
                        phi = 0.5,
                        sigma = 2,
                        sigma_obs = 0.75) {
  # Sample irregularly spaced time intervals
  time_dis <- c(0, runif(n - 1, -0.1, 1))
  time_dis[time_dis < 0] <- 0; time_dis <- time_dis * 5
  # Set up the latent dynamic process
  x <- vector(length = n); x[1] <- -0.3
  for (i in 2:n) {
    # zero-distances will cause problems in sampling, so mvgam uses a
    # minimum threshold; this simulation function emulates that process
    if (time_dis[i] == 0) {
      x[i] <- rnorm(1, mean = (phi ^ 1e-12) * x[i - 1], sd = sigma)
    } else {
      x[i] <- rnorm(1, mean = (phi ^ time_dis[i]) * x[i - 1], sd = sigma)
    }
  }
  # Add 12-month seasonality
  cov1 <- sin(2 * pi * (1 : n) / 12); cov2 <- cos(2 * pi * (1 : n) / 12)
  beta1 <- runif(1, 0.3, 0.7); beta2 <- runif(1, 0.2, 0.5)
  seasonality <- beta1 * cov1 + beta2 * cov2
  # Take Gaussian observations with error and return
  data.frame(y = rnorm(n, mean = x + seasonality, sd = sigma_obs),
             season = rep(1:12, 20)[1:n],
             time = cumsum(time_dis))
}
# Sample two time series
dat <- rbind(dplyr::bind_cols(sim_corcar1(phi = 0.65,
                                          sigma_obs = 0.55),
                              data.frame(series = 'series1')),
             dplyr::bind_cols(sim_corcar1(phi = 0.8,
                                          sigma_obs = 0.35),
                              data.frame(series = 'series2'))) %>%
  dplyr::mutate(series = as.factor(series))
# mvgam with CAR(1) trends and series-level seasonal smooths; the
# State-Space representation (using trend_formula) will be more efficient;
# using informative priors on the sigmas often helps with convergence
mod <- mvgam(formula = y ~ -1,
             trend_formula = ~ s(season, bs = 'cc',
                                 k = 5, by = trend),
             trend_model = CAR(),
             priors = c(prior(exponential(3),
                              class = sigma),
                        prior(beta(4, 4),
                              class = sigma_obs)),
             data = dat,
             family = gaussian(),
             chains = 2,
             silent = 2)
# View usual summaries and plots
summary(mod)
#> GAM observation formula:
#> y ~ 1
#> <environment: 0x55e5eecc6608>
#>
#> GAM process formula:
#> ~s(season, bs = "cc", k = 5, by = trend)
#> <environment: 0x55e5eecc6608>
#>
#> Family:
#> gaussian
#>
#> Link function:
#> identity
#>
#> Trend model:
#> CAR()
#>
#>
#> N process models:
#> 2
#>
#> N series:
#> 2
#>
#> N timepoints:
#> 125
#>
#> Status:
#> Fitted using Stan
#> 2 chains, each with iter = 1000; warmup = 500; thin = 1
#> Total post-warmup draws = 1000
#>
#>
#> Observation error parameter estimates:
#> 2.5% 50% 97.5% Rhat n_eff
#> sigma_obs[1] 0.30 0.56 0.83 1.02 42
#> sigma_obs[2] 0.22 0.49 0.79 1.03 36
#>
#> GAM observation model coefficient (beta) estimates:
#> 2.5% 50% 97.5% Rhat n_eff
#> (Intercept) 0 0 0 NaN NaN
#>
#> Process model AR parameter estimates:
#> 2.5% 50% 97.5% Rhat n_eff
#> ar1[1] 0.60 0.74 0.85 1 1019
#> ar1[2] 0.72 0.81 0.88 1 800
#>
#> Process error parameter estimates:
#> 2.5% 50% 97.5% Rhat n_eff
#> sigma[1] 1.6 1.9 2.2 1.00 371
#> sigma[2] 1.7 1.9 2.2 1.01 304
#>
#> GAM process model coefficient (beta) estimates:
#> 2.5% 50% 97.5% Rhat n_eff
#> (Intercept)_trend -0.50 0.051 0.57 1 876
#> s(season):trendtrend1.1_trend -0.71 0.044 0.71 1 682
#> s(season):trendtrend1.2_trend -1.20 -0.310 0.40 1 948
#> s(season):trendtrend1.3_trend -1.70 -0.480 0.17 1 456
#> s(season):trendtrend2.1_trend -0.50 0.071 0.71 1 686
#> s(season):trendtrend2.2_trend -1.00 -0.260 0.37 1 581
#> s(season):trendtrend2.3_trend -1.10 -0.250 0.28 1 667
#>
#> Approximate significance of GAM process smooths:
#> edf Ref.df Chi.sq p-value
#> s(season):seriestrend1 1.77 3 13.95 0.62
#> s(season):seriestrend2 1.91 3 4.96 0.75
#>
#> Stan MCMC diagnostics:
#> ✔ No issues with effective samples per iteration
#> ✖ Rhats above 1.05 found for some parameters
#> Use pairs() and mcmc_plot() to investigate
#> ✔ No issues with divergences
#> ✔ No issues with maximum tree depth
#>
#> Samples were drawn using sampling(hmc). For each parameter, n_eff is a
#> crude measure of effective sample size, and Rhat is the potential scale
#> reduction factor on split MCMC chains (at convergence, Rhat = 1)
#>
#> Use how_to_cite() to get started describing this model
conditional_effects(mod, type = 'expected')
plot(mod, type = 'trend', series = 1)
plot(mod, type = 'trend', series = 2)
plot(mod, type = 'residuals', series = 1)
plot(mod, type = 'residuals', series = 2)
mcmc_plot(mod,
          variable = 'ar1',
          regex = TRUE,
          type = 'hist')
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
# Now an example illustrating hierarchical dynamics
set.seed(123)
# Simulate three species monitored in three different
# regions, where dynamics can potentially vary across regions
simdat1 <- sim_mvgam(trend_model = VAR(cor = TRUE),
                     prop_trend = 0.95,
                     n_series = 3,
                     mu = c(1, 2, 3))
simdat2 <- sim_mvgam(trend_model = VAR(cor = TRUE),
                     prop_trend = 0.95,
                     n_series = 3,
                     mu = c(1, 2, 3))
simdat3 <- sim_mvgam(trend_model = VAR(cor = TRUE),
                     prop_trend = 0.95,
                     n_series = 3,
                     mu = c(1, 2, 3))
# Set up the data but DO NOT include 'series'
all_dat <- rbind(simdat1$data_train %>%
                   dplyr::mutate(region = 'qld'),
                 simdat2$data_train %>%
                   dplyr::mutate(region = 'nsw'),
                 simdat3$data_train %>%
                   dplyr::mutate(region = 'vic')) %>%
  dplyr::mutate(species = gsub('series', 'species', series),
                species = as.factor(species),
                region = as.factor(region)) %>%
  dplyr::arrange(series, time) %>%
  dplyr::select(-series)
# Check priors for a hierarchical AR1 model
get_mvgam_priors(formula = y ~ species,
                 trend_model = AR(gr = region, subgr = species),
                 data = all_dat)
#> param_name param_length
#> 1 (Intercept) 1
#> 2 speciesspecies_2 1
#> 3 speciesspecies_3 1
#> 4 vector<lower=-1,upper=1>[n_series] ar1; 9
#> 5 vector<lower=0>[n_series] sigma; 9
#> param_info prior
#> 1 (Intercept) (Intercept) ~ student_t(3, 1.9, 2.5);
#> 2 speciesspecies_2 fixed effect speciesspecies_2 ~ student_t(3, 0, 2);
#> 3 speciesspecies_3 fixed effect speciesspecies_3 ~ student_t(3, 0, 2);
#> 4 trend AR1 coefficient ar1 ~ std_normal();
#> 5 trend sd sigma ~ inv_gamma(1.418, 0.452);
#> example_change new_lowerbound new_upperbound
#> 1 (Intercept) ~ normal(0, 1); NA NA
#> 2 speciesspecies_2 ~ normal(0, 1); NA NA
#> 3 speciesspecies_3 ~ normal(0, 1); NA NA
#> 4 ar1 ~ normal(-0.79, 0.86); NA NA
#> 5 sigma ~ exponential(0.37); NA NA
# Fit the model
mod <- mvgam(formula = y ~ species,
             trend_model = AR(gr = region, subgr = species),
             data = all_dat,
             chains = 2,
             silent = 2)
# Check standard outputs
summary(mod)
#> GAM formula:
#> y ~ species
#> <environment: 0x55e5eecc6608>
#>
#> Family:
#> poisson
#>
#> Link function:
#> log
#>
#> Trend model:
#> AR(gr = region, subgr = species)
#>
#>
#> N series:
#> 9
#>
#> N timepoints:
#> 75
#>
#> Status:
#> Fitted using Stan
#> 2 chains, each with iter = 1000; warmup = 500; thin = 1
#> Total post-warmup draws = 1000
#>
#>
#> GAM coefficient (beta) estimates:
#> 2.5% 50% 97.5% Rhat n_eff
#> (Intercept) 0.92 1.10 1.3 1.00 216
#> speciesspecies_2 0.79 0.99 1.2 1.00 268
#> speciesspecies_3 1.60 1.80 2.1 1.02 107
#>
#> Latent trend parameter AR estimates:
#> 2.5% 50% 97.5% Rhat n_eff
#> ar1[1] 0.2700 0.500 0.7200 1.01 98
#> ar1[2] 0.1300 0.340 0.5200 1.01 146
#> ar1[3] 0.0570 0.360 0.6800 1.05 46
#> ar1[4] 0.3000 0.630 0.8500 1.01 132
#> ar1[5] -0.1400 0.086 0.3000 1.03 117
#> ar1[6] -0.4200 -0.230 0.0021 1.03 114
#> ar1[7] 0.0015 0.320 0.6500 1.00 154
#> ar1[8] 0.5700 0.750 0.9100 1.01 223
#> ar1[9] 0.3300 0.540 0.8000 1.02 87
#> sigma[1] 0.7900 0.990 1.3000 1.00 447
#> sigma[2] 0.6400 0.780 0.9500 1.00 381
#> sigma[3] 0.8000 0.960 1.2000 1.00 817
#> sigma[4] 0.3100 0.470 0.6900 1.02 86
#> sigma[5] 0.6100 0.730 0.8800 1.00 564
#> sigma[6] 0.6700 0.780 0.9300 1.00 753
#> sigma[7] 0.5900 0.760 0.9800 1.00 294
#> sigma[8] 0.5400 0.690 0.8800 1.00 382
#> sigma[9] 0.6800 0.820 0.9800 1.00 762
#>
#> Hierarchical correlation weighting parameter (alpha_cor) estimates:
#> 2.5% 50% 97.5% Rhat n_eff
#> alpha_cor 0.014 0.072 0.2 1 519
#>
#> Stan MCMC diagnostics:
#> ✔ No issues with effective samples per iteration
#> ✖ Rhats above 1.05 found for some parameters
#> Use pairs() and mcmc_plot() to investigate
#> ✔ No issues with divergences
#> ✔ No issues with maximum tree depth
#>
#> Samples were drawn using sampling(hmc). For each parameter, n_eff is a
#> crude measure of effective sample size, and Rhat is the potential scale
#> reduction factor on split MCMC chains (at convergence, Rhat = 1)
#>
#> Use how_to_cite() to get started describing this model
conditional_effects(mod, type = 'link')
# Inspect posterior estimates for the correlation weighting parameter
mcmc_plot(mod, variable = 'alpha_cor', type = 'hist')
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
# }