This function allows a previously fitted mvgam model to be updated.
Usage
# S3 method for mvgam
update(
object,
formula,
trend_formula,
knots,
trend_knots,
trend_model,
family,
share_obs_params,
data,
newdata,
trend_map,
use_lv,
n_lv,
priors,
chains,
burnin,
samples,
threads,
algorithm,
lfo = FALSE,
...
)
# S3 method for jsdgam
update(
object,
formula,
factor_formula,
knots,
factor_knots,
data,
newdata,
n_lv,
family,
share_obs_params,
priors,
chains,
burnin,
samples,
threads,
algorithm,
lfo = FALSE,
...
)
Arguments
- object
list
object returned frommvgam
. Seemvgam()
- formula
Optional new
formula
object. Note,mvgam
currently does not support dynamic formula updates such as removal of specific terms with- term
. When updating, the entire formula needs to be supplied.- trend_formula
An optional
formula
object specifying the GAM process model formula. If supplied, a linear predictor will be modelled for the latent trends to capture process model evolution separately from the observation model.Important notes:
Should not have a response variable specified on the left-hand side (e.g.,
~ season + s(year)
)Use
trend
instead ofseries
for effects that vary across time seriesIn
nmix()
family models, sets up linear predictor for latent abundanceConsider dropping one intercept using
- 1
convention to avoid estimation challenges
- knots
An optional
list
containing user specified knot values for basis construction. For most bases the user simply supplies the knots to be used, which must match up with thek
value supplied. Different terms can use different numbers of knots, unless they share a covariate.- trend_knots
As for
knots
above, this is an optionallist
of knot values for smooth functions within thetrend_formula
.- trend_model
character
orfunction
specifying the time series dynamics for the latent trend.Available options:
None
: No latent trend component (GAM component only, likegam
)ZMVN
orZMVN()
: Zero-Mean Multivariate Normal (Stan only)'RW'
orRW()
: Random Walk'AR1'
,'AR2'
,'AR3'
orAR(p = 1, 2, 3)
: Autoregressive models'CAR1'
orCAR(p = 1)
: Continuous-time AR (Ornstein–Uhlenbeck process)'VAR1'
orVAR()
: Vector Autoregressive (Stan only)'PWlogistic'
,'PWlinear'
orPW()
: Piecewise trends (Stan only)'GP'
orGP()
: Gaussian Process with squared exponential kernel (Stan only)
Additional features:
Moving average and/or correlated process error terms available for most types (e.g.,
RW(cor = TRUE)
for multivariate Random Walk)Hierarchical correlations possible for structured data
See mvgam_trends for details and
ZMVN()
for examples
- family
family
specifying the exponential observation family for the series.Supported families:
gaussian()
: Real-valued databetar()
: Proportional data on(0,1)
lognormal()
: Non-negative real-valued datastudent_t()
: Real-valued dataGamma()
: Non-negative real-valued databernoulli()
: Binary datapoisson()
: Count data (default)nb()
: Overdispersed count databinomial()
: Count data with imperfect detection when number of trials is known (usecbind()
to bind observations and trials)beta_binomial()
: Asbinomial()
but allows for overdispersionnmix()
: Count data with imperfect detection when number of trials is unknown (State-Space N-Mixture model with Poisson latent states and Binomial observations)
See
mvgam_families
for more details.- share_obs_params
logical
. IfTRUE
and thefamily
has additional family-specific observation parameters (e.g., variance components, dispersion parameters), these will be shared across all outcome variables. Useful when multiple outcomes share properties. Default isFALSE
.- data
A
dataframe
orlist
containing the model response variable and covariates required by the GAMformula
and optionaltrend_formula
.Required columns for most models:
series
: Afactor
index of the series IDs (number of levels should equal number of unique series labels)time
:numeric
orinteger
index of time points. For most dynamic trend types, time should be measured in discrete, regularly spaced intervals (i.e.,c(1, 2, 3, ...)
). Irregular spacing is allowed fortrend_model = CAR(1)
, but zero intervals are adjusted to1e-12
to prevent sampling errors.
Special cases:
Models with hierarchical temporal correlation (e.g.,
AR(gr = region, subgr = species)
) should NOT include aseries
identifierModels without temporal dynamics (
trend_model = 'None'
ortrend_model = ZMVN()
) don't require atime
variable
- newdata
Optional
dataframe
orlist
of test data containing the same variables as indata
. If included, observations in variabley
will be set toNA
when fitting the model so that posterior simulations can be obtained.- trend_map
Optional
data.frame
specifying which series should depend on which latent trends. Enables multiple series to depend on the same latent trend process with different observation processes.Required structure:
Column
series
: Single unique entry for each series (matching factor levels in data)Column
trend
: Integer values indicating which trend each series depends on
Notes:
Sets up latent factor model by enabling
use_lv = TRUE
Process model intercept is NOT automatically suppressed
Not yet supported for continuous time models (
CAR()
)
- use_lv
logical
. IfTRUE
, use dynamic factors to estimate series' latent trends in a reduced dimension format. Only available forRW()
,AR()
andGP()
trend models. Default isFALSE
. Seelv_correlations
for examples.- n_lv
integer
specifying the number of latent dynamic factors to use ifuse_lv == TRUE
. Cannot exceedn_series
. Default ismin(2, floor(n_series / 2))
.- priors
An optional
data.frame
with prior definitions or, preferably, a vector ofbrmsprior
objects (seeprior()
). Seeget_mvgam_priors()
and Details for more information.- chains
integer
specifying the number of parallel chains for the model. Ignored for variational inference algorithms.- burnin
integer
specifying the number of warmup iterations to tune sampling algorithms. Ignored for variational inference algorithms.- samples
integer
specifying the number of post-warmup iterations for sampling the posterior distribution.- threads
integer
. Experimental option for within-chain parallelisation in Stan usingreduce_sum
. Recommended only for experienced Stan users with slow models. Currently works for all families exceptnmix()
and when using Cmdstan backend.- algorithm
Character string naming the estimation approach:
"sampling"
: MCMC (default)"meanfield"
: Variational inference with factorized normal distributions"fullrank"
: Variational inference with multivariate normal distribution"laplace"
: Laplace approximation (cmdstanr only)"pathfinder"
: Pathfinder algorithm (cmdstanr only)
Can be set globally via
"brms.algorithm"
option. Limited testing suggests"meanfield"
performs best among non-MCMC approximations for dynamic GAMs.- lfo
logical
. Indicates whether this is part of lfo_cv.mvgam call. Returns lighter model version for speed. Users should leave asFALSE
.- ...
- factor_formula
Optional new
formula
object for the factor linear predictors- factor_knots
An optional
list
containing user specified knot values to be used for basis construction of any smooth terms infactor_formula
. For most bases the user simply supplies the knots to be used, which must match up with thek
value supplied (note that the number of knots is not always justk
). Different terms can use different numbers of knots, unless they share a covariate
Value
A list
object of class mvgam
containing model output,
the text representation of the model file, the mgcv model output (for
easily generating simulations at unsampled covariate values), Dunn-Smyth
residuals for each outcome variable and key information needed for other
functions in the package. See mvgam-class
for details. Use
methods(class = "mvgam")
for an overview on available methods.
A list
object of class mvgam
containing model output,
the text representation of the model file, the mgcv model output (for
easily generating simulations at unsampled covariate values), Dunn-Smyth
residuals for each series and key information needed for other functions in
the package. See mvgam-class
for details. Use
methods(class = "mvgam")
for an overview on available methods.
Examples
# \donttest{
# Simulate some data and fit a Poisson AR1 model
simdat <- sim_mvgam(n_series = 1, trend_model = AR())
mod <- mvgam(
y ~ s(season, bs = 'cc'),
trend_model = AR(),
noncentred = TRUE,
data = simdat$data_train,
chains = 2
)
#> Compiling Stan program using cmdstanr
#>
#> Start sampling
#> Running MCMC with 2 parallel chains...
#>
#> Chain 1 Iteration: 1 / 1000 [ 0%] (Warmup)
#> Chain 1 Iteration: 100 / 1000 [ 10%] (Warmup)
#> Chain 2 Iteration: 1 / 1000 [ 0%] (Warmup)
#> Chain 2 Iteration: 100 / 1000 [ 10%] (Warmup)
#> Chain 1 Iteration: 200 / 1000 [ 20%] (Warmup)
#> Chain 1 Iteration: 300 / 1000 [ 30%] (Warmup)
#> Chain 1 Iteration: 400 / 1000 [ 40%] (Warmup)
#> Chain 1 Iteration: 500 / 1000 [ 50%] (Warmup)
#> Chain 1 Iteration: 501 / 1000 [ 50%] (Sampling)
#> Chain 2 Iteration: 200 / 1000 [ 20%] (Warmup)
#> Chain 2 Iteration: 300 / 1000 [ 30%] (Warmup)
#> Chain 2 Iteration: 400 / 1000 [ 40%] (Warmup)
#> Chain 2 Iteration: 500 / 1000 [ 50%] (Warmup)
#> Chain 2 Iteration: 501 / 1000 [ 50%] (Sampling)
#> Chain 1 Iteration: 600 / 1000 [ 60%] (Sampling)
#> Chain 1 Iteration: 700 / 1000 [ 70%] (Sampling)
#> Chain 1 Iteration: 800 / 1000 [ 80%] (Sampling)
#> Chain 2 Iteration: 600 / 1000 [ 60%] (Sampling)
#> Chain 2 Iteration: 700 / 1000 [ 70%] (Sampling)
#> Chain 2 Iteration: 800 / 1000 [ 80%] (Sampling)
#> Chain 1 Iteration: 900 / 1000 [ 90%] (Sampling)
#> Chain 1 Iteration: 1000 / 1000 [100%] (Sampling)
#> Chain 2 Iteration: 900 / 1000 [ 90%] (Sampling)
#> Chain 2 Iteration: 1000 / 1000 [100%] (Sampling)
#> Chain 1 finished in 0.4 seconds.
#> Chain 2 finished in 0.4 seconds.
#>
#> Both chains finished successfully.
#> Mean chain execution time: 0.4 seconds.
#> Total execution time: 0.5 seconds.
#>
summary(mod)
#> GAM formula:
#> y ~ s(season, bs = "cc")
#> <environment: 0x558346d879c0>
#>
#> Family:
#> poisson
#>
#> Link function:
#> log
#>
#> Trend model:
#> AR()
#>
#> N series:
#> 1
#>
#> N timepoints:
#> 75
#>
#> Status:
#> Fitted using Stan
#> 2 chains, each with iter = 1000; warmup = 500; thin = 1
#> Total post-warmup draws = 1000
#>
#> GAM coefficient (beta) estimates:
#> 2.5% 50% 97.5% Rhat n_eff
#> (Intercept) 0.10 0.35 0.570 1 1147
#> s(season).1 -1.50 -0.73 -0.060 1 1635
#> s(season).2 -1.20 -0.49 0.210 1 1443
#> s(season).3 0.20 0.67 1.200 1 1488
#> s(season).4 0.76 1.30 1.700 1 1478
#> s(season).5 -1.90 -0.88 -0.023 1 1245
#> s(season).6 0.56 1.10 1.600 1 1499
#> s(season).7 1.40 1.90 2.400 1 1554
#> s(season).8 -1.20 -0.48 0.110 1 1321
#>
#> Approximate significance of GAM smooths:
#> edf Ref.df Chi.sq p-value
#> s(season) 6.535 8 71.73 <2e-16 ***
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> standard deviation:
#> 2.5% 50% 97.5% Rhat n_eff
#> sigma[1] 0.074 0.17 0.37 1 548
#>
#> precision parameter:
#> 2.5% 50% 97.5% Rhat n_eff
#> tau[1] 7.5 36 180 1 799
#>
#> autoregressive coef 1:
#> 2.5% 50% 97.5% Rhat n_eff
#> ar1[1] -0.92 -0.058 0.87 1 922
#>
#> Stan MCMC diagnostics:
#> ✔ No issues with effective samples per iteration
#> ✔ Rhat looks good for all parameters
#> ✔ No issues with divergences
#> ✔ No issues with maximum tree depth
#>
#> Samples were drawn using sampling(hmc). For each parameter, n_eff is a
#> crude measure of effective sample size, and Rhat is the potential scale
#> reduction factor on split MCMC chains (at convergence, Rhat = 1)
#>
#> Use how_to_cite() to get started describing this model
conditional_effects(mod, type = 'link')
# Update to an AR2 model
updated_mod <- update(
mod,
trend_model = AR(p = 2),
noncentred = TRUE
)
#> Compiling Stan program using cmdstanr
#>
#> Start sampling
#> Running MCMC with 2 parallel chains...
#>
#> Chain 1 Iteration: 1 / 1000 [ 0%] (Warmup)
#> Chain 1 Iteration: 100 / 1000 [ 10%] (Warmup)
#> Chain 2 Iteration: 1 / 1000 [ 0%] (Warmup)
#> Chain 2 Iteration: 100 / 1000 [ 10%] (Warmup)
#> Chain 1 Iteration: 200 / 1000 [ 20%] (Warmup)
#> Chain 1 Iteration: 300 / 1000 [ 30%] (Warmup)
#> Chain 1 Iteration: 400 / 1000 [ 40%] (Warmup)
#> Chain 1 Iteration: 500 / 1000 [ 50%] (Warmup)
#> Chain 1 Iteration: 501 / 1000 [ 50%] (Sampling)
#> Chain 1 Iteration: 600 / 1000 [ 60%] (Sampling)
#> Chain 2 Iteration: 200 / 1000 [ 20%] (Warmup)
#> Chain 2 Iteration: 300 / 1000 [ 30%] (Warmup)
#> Chain 2 Iteration: 400 / 1000 [ 40%] (Warmup)
#> Chain 2 Iteration: 500 / 1000 [ 50%] (Warmup)
#> Chain 2 Iteration: 501 / 1000 [ 50%] (Sampling)
#> Chain 1 Iteration: 700 / 1000 [ 70%] (Sampling)
#> Chain 1 Iteration: 800 / 1000 [ 80%] (Sampling)
#> Chain 1 Iteration: 900 / 1000 [ 90%] (Sampling)
#> Chain 2 Iteration: 600 / 1000 [ 60%] (Sampling)
#> Chain 2 Iteration: 700 / 1000 [ 70%] (Sampling)
#> Chain 1 Iteration: 1000 / 1000 [100%] (Sampling)
#> Chain 2 Iteration: 800 / 1000 [ 80%] (Sampling)
#> Chain 2 Iteration: 900 / 1000 [ 90%] (Sampling)
#> Chain 2 Iteration: 1000 / 1000 [100%] (Sampling)
#> Chain 1 finished in 0.4 seconds.
#> Chain 2 finished in 0.4 seconds.
#>
#> Both chains finished successfully.
#> Mean chain execution time: 0.4 seconds.
#> Total execution time: 0.5 seconds.
#>
summary(updated_mod)
#> GAM formula:
#> y ~ s(season, bs = "cc")
#> <environment: 0x558346d879c0>
#>
#> Family:
#> poisson
#>
#> Link function:
#> log
#>
#> Trend model:
#> AR(p = 2)
#>
#> N series:
#> 1
#>
#> N timepoints:
#> 75
#>
#> Status:
#> Fitted using Stan
#> 2 chains, each with iter = 1000; warmup = 500; thin = 1
#> Total post-warmup draws = 1000
#>
#> GAM coefficient (beta) estimates:
#> 2.5% 50% 97.5% Rhat n_eff
#> (Intercept) 0.10 0.34 0.570 1 1080
#> s(season).1 -1.60 -0.76 -0.054 1 1285
#> s(season).2 -1.30 -0.47 0.200 1 1448
#> s(season).3 0.17 0.67 1.200 1 1484
#> s(season).4 0.75 1.20 1.800 1 778
#> s(season).5 -1.70 -0.84 -0.100 1 1095
#> s(season).6 0.55 1.10 1.600 1 1626
#> s(season).7 1.40 1.90 2.400 1 1189
#> s(season).8 -1.30 -0.52 0.140 1 1131
#>
#> Approximate significance of GAM smooths:
#> edf Ref.df Chi.sq p-value
#> s(season) 7.168 8 65.63 <2e-16 ***
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> standard deviation:
#> 2.5% 50% 97.5% Rhat n_eff
#> sigma[1] 0.069 0.16 0.35 1 492
#>
#> precision parameter:
#> 2.5% 50% 97.5% Rhat n_eff
#> tau[1] 8.1 37 210 1 844
#>
#> autoregressive coef 1:
#> 2.5% 50% 97.5% Rhat n_eff
#> ar1[1] -0.92 -0.069 0.88 1.01 943
#>
#> autoregressive coef 2:
#> 2.5% 50% 97.5% Rhat n_eff
#> ar2[1] -0.92 -0.041 0.91 1 547
#>
#> Stan MCMC diagnostics:
#> ✔ No issues with effective samples per iteration
#> ✔ Rhat looks good for all parameters
#> ✔ No issues with divergences
#> ✔ No issues with maximum tree depth
#>
#> Samples were drawn using sampling(hmc). For each parameter, n_eff is a
#> crude measure of effective sample size, and Rhat is the potential scale
#> reduction factor on split MCMC chains (at convergence, Rhat = 1)
#>
#> Use how_to_cite() to get started describing this model
conditional_effects(updated_mod, type = 'link')
# Now update to a Binomial AR1 by adding information on trials
# requires that we supply newdata that contains the 'trials' variable
simdat$data_train$trials <- max(simdat$data_train$y) + 15
updated_mod <- update(
mod,
formula = cbind(y, trials) ~ s(season, bs = 'cc'),
noncentred = TRUE,
data = simdat$data_train,
family = binomial()
)
#> Compiling Stan program using cmdstanr
#>
#> Start sampling
#> Running MCMC with 2 parallel chains...
#>
#> Chain 1 Iteration: 1 / 1000 [ 0%] (Warmup)
#> Chain 1 Iteration: 100 / 1000 [ 10%] (Warmup)
#> Chain 1 Iteration: 200 / 1000 [ 20%] (Warmup)
#> Chain 2 Iteration: 1 / 1000 [ 0%] (Warmup)
#> Chain 2 Iteration: 100 / 1000 [ 10%] (Warmup)
#> Chain 1 Iteration: 300 / 1000 [ 30%] (Warmup)
#> Chain 1 Iteration: 400 / 1000 [ 40%] (Warmup)
#> Chain 1 Iteration: 500 / 1000 [ 50%] (Warmup)
#> Chain 1 Iteration: 501 / 1000 [ 50%] (Sampling)
#> Chain 1 Iteration: 600 / 1000 [ 60%] (Sampling)
#> Chain 2 Iteration: 200 / 1000 [ 20%] (Warmup)
#> Chain 2 Iteration: 300 / 1000 [ 30%] (Warmup)
#> Chain 2 Iteration: 400 / 1000 [ 40%] (Warmup)
#> Chain 2 Iteration: 500 / 1000 [ 50%] (Warmup)
#> Chain 2 Iteration: 501 / 1000 [ 50%] (Sampling)
#> Chain 1 Iteration: 700 / 1000 [ 70%] (Sampling)
#> Chain 1 Iteration: 800 / 1000 [ 80%] (Sampling)
#> Chain 1 Iteration: 900 / 1000 [ 90%] (Sampling)
#> Chain 2 Iteration: 600 / 1000 [ 60%] (Sampling)
#> Chain 2 Iteration: 700 / 1000 [ 70%] (Sampling)
#> Chain 2 Iteration: 800 / 1000 [ 80%] (Sampling)
#> Chain 1 Iteration: 1000 / 1000 [100%] (Sampling)
#> Chain 2 Iteration: 900 / 1000 [ 90%] (Sampling)
#> Chain 2 Iteration: 1000 / 1000 [100%] (Sampling)
#> Chain 1 finished in 0.4 seconds.
#> Chain 2 finished in 0.4 seconds.
#>
#> Both chains finished successfully.
#> Mean chain execution time: 0.4 seconds.
#> Total execution time: 0.5 seconds.
#>
summary(updated_mod)
#> GAM formula:
#> cbind(y, trials) ~ s(season, bs = "cc")
#> <environment: 0x558346d879c0>
#>
#> Family:
#> binomial
#>
#> Link function:
#> logit
#>
#> Trend model:
#> AR()
#>
#> N series:
#> 1
#>
#> N timepoints:
#> 75
#>
#> Status:
#> Fitted using Stan
#> 2 chains, each with iter = 1000; warmup = 500; thin = 1
#> Total post-warmup draws = 1000
#>
#> GAM coefficient (beta) estimates:
#> 2.5% 50% 97.5% Rhat n_eff
#> (Intercept) -3.30 -3.00 -2.800 1 1448
#> s(season).1 -1.70 -0.82 -0.045 1 1642
#> s(season).2 -1.30 -0.53 0.230 1 1973
#> s(season).3 0.15 0.68 1.200 1 1498
#> s(season).4 0.82 1.40 1.900 1 1079
#> s(season).5 -2.10 -1.10 -0.240 1 953
#> s(season).6 0.67 1.30 1.800 1 1012
#> s(season).7 1.60 2.20 2.700 1 1088
#> s(season).8 -1.50 -0.65 0.058 1 1232
#>
#> Approximate significance of GAM smooths:
#> edf Ref.df Chi.sq p-value
#> s(season) 6.399 8 92.53 <2e-16 ***
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> standard deviation:
#> 2.5% 50% 97.5% Rhat n_eff
#> sigma[1] 0.08 0.2 0.48 1 505
#>
#> precision parameter:
#> 2.5% 50% 97.5% Rhat n_eff
#> tau[1] 4.4 25 160 1 712
#>
#> autoregressive coef 1:
#> 2.5% 50% 97.5% Rhat n_eff
#> ar1[1] -0.9 -0.12 0.88 1 718
#>
#> Stan MCMC diagnostics:
#> ✔ No issues with effective samples per iteration
#> ✔ Rhat looks good for all parameters
#> ✔ No issues with divergences
#> ✔ No issues with maximum tree depth
#>
#> Samples were drawn using sampling(hmc). For each parameter, n_eff is a
#> crude measure of effective sample size, and Rhat is the potential scale
#> reduction factor on split MCMC chains (at convergence, Rhat = 1)
#>
#> Use how_to_cite() to get started describing this model
conditional_effects(updated_mod, type = 'link')
# }