Set up latent correlated multivariate Gaussian residual processes in mvgam. This function does not evaluate it's arguments – it exists purely to help set up a model with particular error processes.
Arguments
- unit
The unquoted name of the variable that represents the unit of analysis in
data
over which latent residuals should be correlated. This variable should be either anumeric
orinteger
variable in the supplieddata
. Defaults totime
to be consistent with other functionalities in mvgam, though note that the data need not be time series in this case. See examples below for further details and explanations- gr
An optional grouping variable, which must be a
factor
in the supplieddata
, for setting up hierarchical residual correlation structures. If specified, this will automatically set up a model where the residual correlations for a specific level ofgr
are modelled hierarchically: \(\Omega_{group} = p\Omega_{global} + (1 - p)\Omega_{group, local}\), where \(\Omega_{global}\) is a global correlation matrix, \(\Omega_{group, local}\) is a local deviation correlation matrix and \(p\) is a weighting parameter controlling how strongly the local correlation matrix \(\Omega_{group}\) is shrunk towards the global correlation matrix \(\Omega_{global}\). Ifgr
is supplied thensubgr
must also be supplied- subgr
A subgrouping
factor
variable specifying which element indata
represents the different observational units. Defaults toseries
to be consistent with other functionalities in mvgam, though note that the data need not be time series in this case. But note that models that use the hierarchical correlations (by supplying a value forgr
) should not include aseries
element indata
. Rather, this element will be created internally based on the supplied variables forgr
andsubgr
. For example, if you are modelling counts for a group of species (labelled asspecies
in the data) across sampling sites (labelled assite
in the data) in three different geographical regions (labelled asregion
), and you would like the residuals to be correlated within regions, then you should specifyunit = site
,gr = region
, andsubgr = species
. Internally,mvgam()
will appropriately order the data byunit
(in this case, bysite
) and create theseries
element for the data using something like:series = as.factor(paste0(group, '_', subgroup))
Value
An object of class mvgam_trend
, which contains a list of
arguments to be interpreted by the parsing functions in mvgam
Examples
# \donttest{
# Simulate counts of four species over ten sampling locations
site_dat <- data.frame(site = rep(1:10, 4),
species = as.factor(sort(rep(letters[1:4], 10))),
y = c(NA, rpois(39, 3)))
head(site_dat)
#> site species y
#> 1 1 a NA
#> 2 2 a 2
#> 3 3 a 0
#> 4 4 a 1
#> 5 5 a 3
#> 6 6 a 4
# Set up a correlated residual (i.e. Joint Species Distribution) model,
# where 'site' represents the unit of analysis
trend_model <- ZMVN(unit = site, subgr = species)
mod <- mvgam(y ~ species,
trend_model = ZMVN(unit = site,
subgr = species),
data = site_dat,
chains = 2,
silent = 2)
#> In file included from stan/lib/stan_math/stan/math/prim/prob/von_mises_lccdf.hpp:5,
#> from stan/lib/stan_math/stan/math/prim/prob/von_mises_ccdf_log.hpp:4,
#> from stan/lib/stan_math/stan/math/prim/prob.hpp:359,
#> from stan/lib/stan_math/stan/math/prim.hpp:16,
#> from stan/lib/stan_math/stan/math/rev.hpp:16,
#> from stan/lib/stan_math/stan/math.hpp:19,
#> from stan/src/stan/model/model_header.hpp:4,
#> from C:/Users/uqnclar2/AppData/Local/Temp/RtmpotLup8/model-9e8c7aea2ba7.hpp:2:
#> stan/lib/stan_math/stan/math/prim/prob/von_mises_cdf.hpp: In function 'stan::return_type_t<T_x, T_sigma, T_l> stan::math::von_mises_cdf(const T_x&, const T_mu&, const T_k&)':
#> stan/lib/stan_math/stan/math/prim/prob/von_mises_cdf.hpp:194: note: '-Wmisleading-indentation' is disabled from this point onwards, since column-tracking was disabled due to the size of the code/headers
#> 194 | if (cdf_n < 0.0)
#> |
#> stan/lib/stan_math/stan/math/prim/prob/von_mises_cdf.hpp:194: note: adding '-flarge-source-files' will allow for more column-tracking support, at the expense of compilation time and memory
# Inspect the estimated species-species residual covariances
mcmc_plot(mod, variable = 'Sigma', regex = TRUE, type = 'hist')
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
# A hierarchical correlation example; set up correlated counts
# for three species across two sampling locations
Sigma <- matrix(c(1, -0.4, 0.5,
-0.4, 1, 0.3,
0.5, 0.3, 1),
byrow = TRUE,
nrow = 3)
Sigma
#> [,1] [,2] [,3]
#> [1,] 1.0 -0.4 0.5
#> [2,] -0.4 1.0 0.3
#> [3,] 0.5 0.3 1.0
make_site_dat = function(...){
errors <- mgcv::rmvn(n = 30,
mu = c(0.6, 0.8, 1.8),
V = Sigma)
site_dat <- do.call(rbind, lapply(1:3, function(spec){
data.frame(y = rpois(30,
lambda = exp(errors[, spec])),
species = paste0('species',
spec),
site = 1:30)
}))
site_dat
}
site_dat <- rbind(make_site_dat() %>%
dplyr::mutate(group = 'group1'),
make_site_dat() %>%
dplyr::mutate(group = 'group2')) %>%
dplyr::mutate(species = as.factor(species),
group = as.factor(group))
# Fit the hierarchical correlated residual model
mod <- mvgam(y ~ species,
trend_model = ZMVN(unit = site,
gr = group,
subgr = species),
data = site_dat)
#> Compiling Stan program using cmdstanr
#>
#> In file included from stan/lib/stan_math/stan/math/prim/prob/von_mises_lccdf.hpp:5,
#> from stan/lib/stan_math/stan/math/prim/prob/von_mises_ccdf_log.hpp:4,
#> from stan/lib/stan_math/stan/math/prim/prob.hpp:359,
#> from stan/lib/stan_math/stan/math/prim.hpp:16,
#> from stan/lib/stan_math/stan/math/rev.hpp:16,
#> from stan/lib/stan_math/stan/math.hpp:19,
#> from stan/src/stan/model/model_header.hpp:4,
#> from C:/Users/uqnclar2/AppData/Local/Temp/RtmpotLup8/model-9e8c51c52b08.hpp:2:
#> stan/lib/stan_math/stan/math/prim/prob/von_mises_cdf.hpp: In function 'stan::return_type_t<T_x, T_sigma, T_l> stan::math::von_mises_cdf(const T_x&, const T_mu&, const T_k&)':
#> stan/lib/stan_math/stan/math/prim/prob/von_mises_cdf.hpp:194: note: '-Wmisleading-indentation' is disabled from this point onwards, since column-tracking was disabled due to the size of the code/headers
#> 194 | if (cdf_n < 0.0)
#> |
#> stan/lib/stan_math/stan/math/prim/prob/von_mises_cdf.hpp:194: note: adding '-flarge-source-files' will allow for more column-tracking support, at the expense of compilation time and memory
#> Start sampling
#> Running MCMC with 4 parallel chains...
#>
#> Chain 1 Iteration: 1 / 1000 [ 0%] (Warmup)
#> Chain 2 Iteration: 1 / 1000 [ 0%] (Warmup)
#> Chain 3 Iteration: 1 / 1000 [ 0%] (Warmup)
#> Chain 4 Iteration: 1 / 1000 [ 0%] (Warmup)
#> Chain 1 Iteration: 100 / 1000 [ 10%] (Warmup)
#> Chain 2 Iteration: 100 / 1000 [ 10%] (Warmup)
#> Chain 3 Iteration: 100 / 1000 [ 10%] (Warmup)
#> Chain 4 Iteration: 100 / 1000 [ 10%] (Warmup)
#> Chain 1 Iteration: 200 / 1000 [ 20%] (Warmup)
#> Chain 2 Iteration: 200 / 1000 [ 20%] (Warmup)
#> Chain 3 Iteration: 200 / 1000 [ 20%] (Warmup)
#> Chain 4 Iteration: 200 / 1000 [ 20%] (Warmup)
#> Chain 4 Iteration: 300 / 1000 [ 30%] (Warmup)
#> Chain 1 Iteration: 300 / 1000 [ 30%] (Warmup)
#> Chain 2 Iteration: 300 / 1000 [ 30%] (Warmup)
#> Chain 3 Iteration: 300 / 1000 [ 30%] (Warmup)
#> Chain 1 Iteration: 400 / 1000 [ 40%] (Warmup)
#> Chain 2 Iteration: 400 / 1000 [ 40%] (Warmup)
#> Chain 3 Iteration: 400 / 1000 [ 40%] (Warmup)
#> Chain 4 Iteration: 400 / 1000 [ 40%] (Warmup)
#> Chain 4 Iteration: 500 / 1000 [ 50%] (Warmup)
#> Chain 4 Iteration: 501 / 1000 [ 50%] (Sampling)
#> Chain 1 Iteration: 500 / 1000 [ 50%] (Warmup)
#> Chain 2 Iteration: 500 / 1000 [ 50%] (Warmup)
#> Chain 3 Iteration: 500 / 1000 [ 50%] (Warmup)
#> Chain 1 Iteration: 501 / 1000 [ 50%] (Sampling)
#> Chain 2 Iteration: 501 / 1000 [ 50%] (Sampling)
#> Chain 2 Iteration: 600 / 1000 [ 60%] (Sampling)
#> Chain 3 Iteration: 501 / 1000 [ 50%] (Sampling)
#> Chain 3 Iteration: 600 / 1000 [ 60%] (Sampling)
#> Chain 4 Iteration: 600 / 1000 [ 60%] (Sampling)
#> Chain 1 Iteration: 600 / 1000 [ 60%] (Sampling)
#> Chain 2 Iteration: 700 / 1000 [ 70%] (Sampling)
#> Chain 3 Iteration: 700 / 1000 [ 70%] (Sampling)
#> Chain 4 Iteration: 700 / 1000 [ 70%] (Sampling)
#> Chain 1 Iteration: 700 / 1000 [ 70%] (Sampling)
#> Chain 2 Iteration: 800 / 1000 [ 80%] (Sampling)
#> Chain 3 Iteration: 800 / 1000 [ 80%] (Sampling)
#> Chain 4 Iteration: 800 / 1000 [ 80%] (Sampling)
#> Chain 1 Iteration: 800 / 1000 [ 80%] (Sampling)
#> Chain 2 Iteration: 900 / 1000 [ 90%] (Sampling)
#> Chain 4 Iteration: 900 / 1000 [ 90%] (Sampling)
#> Chain 1 Iteration: 900 / 1000 [ 90%] (Sampling)
#> Chain 3 Iteration: 900 / 1000 [ 90%] (Sampling)
#> Chain 2 Iteration: 1000 / 1000 [100%] (Sampling)
#> Chain 4 Iteration: 1000 / 1000 [100%] (Sampling)
#> Chain 2 finished in 2.3 seconds.
#> Chain 4 finished in 2.2 seconds.
#> Chain 1 Iteration: 1000 / 1000 [100%] (Sampling)
#> Chain 3 Iteration: 1000 / 1000 [100%] (Sampling)
#> Chain 1 finished in 2.3 seconds.
#> Chain 3 finished in 2.3 seconds.
#>
#> All 4 chains finished successfully.
#> Mean chain execution time: 2.3 seconds.
#> Total execution time: 2.5 seconds.
#>
# Inspect the estimated species-species residual covariances
mcmc_plot(mod, variable = 'Sigma', regex = TRUE, type = 'hist')
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
# }