Update an existing mvgam model object — update.mvgam • mvgam

This function allows a previously fitted mvgam model to be updated.

Usage

# S3 method for mvgam
update(
  object,
  formula,
  trend_formula,
  knots,
  trend_knots,
  trend_model,
  family,
  share_obs_params,
  data,
  newdata,
  trend_map,
  use_lv,
  n_lv,
  priors,
  chains,
  burnin,
  samples,
  threads,
  algorithm,
  lfo = FALSE,
  ...
)

# S3 method for jsdgam
update(
  object,
  formula,
  factor_formula,
  knots,
  factor_knots,
  data,
  newdata,
  n_lv,
  family,
  share_obs_params,
  priors,
  chains,
  burnin,
  samples,
  threads,
  algorithm,
  lfo = FALSE,
  ...
)

Arguments

object

list object returned from mvgam. See mvgam()

formula

Optional new formula object. Note, mvgam currently does not support dynamic formula updates such as removal of specific terms with - term. When updating, the entire formula needs to be supplied.

trend_formula

An optional formula object specifying the GAM process model formula. If supplied, a linear predictor will be modelled for the latent trends to capture process model evolution separately from the observation model.

Important notes:

Should not have a response variable specified on the left-hand side (e.g., ~ season + s(year))
Use trend instead of series for effects that vary across time series
Only available for RW(), AR() and VAR() trend models
In nmix() family models, sets up linear predictor for latent abundance
Consider dropping one intercept using - 1 convention to avoid estimation challenges

knots

An optional list containing user specified knot values for basis construction. For most bases the user simply supplies the knots to be used, which must match up with the k value supplied. Different terms can use different numbers of knots, unless they share a covariate.

trend_knots

As for knots above, this is an optional list of knot values for smooth functions within the trend_formula.

trend_model

character or function specifying the time series dynamics for the latent trend.

Available options:

None: No latent trend component (GAM component only, like gam)
ZMVN or ZMVN(): Zero-Mean Multivariate Normal (Stan only)
'RW' or RW(): Random Walk
'AR1', 'AR2', 'AR3' or AR(p = 1, 2, 3): Autoregressive models
'CAR1' or CAR(p = 1): Continuous-time AR (Ornstein–Uhlenbeck process)
'VAR1' or VAR(): Vector Autoregressive (Stan only)
'PWlogistic', 'PWlinear' or PW(): Piecewise trends (Stan only)
'GP' or GP(): Gaussian Process with squared exponential kernel (Stan only)

Additional features:

Moving average and/or correlated process error terms available for most types (e.g., RW(cor = TRUE) for multivariate Random Walk)
Hierarchical correlations possible for structured data
See mvgam_trends for details and ZMVN() for examples

family

family specifying the exponential observation family for the series.

Supported families:

gaussian(): Real-valued data
betar(): Proportional data on (0,1)
lognormal(): Non-negative real-valued data
student_t(): Real-valued data
Gamma(): Non-negative real-valued data
bernoulli(): Binary data
poisson(): Count data (default)
nb(): Overdispersed count data
binomial(): Count data with imperfect detection when number of trials is known (use cbind() to bind observations and trials)
beta_binomial(): As binomial() but allows for overdispersion
nmix(): Count data with imperfect detection when number of trials is unknown (State-Space N-Mixture model with Poisson latent states and Binomial observations)

See mvgam_families for more details.

share_obs_params

logical. If TRUE and the family has additional family-specific observation parameters (e.g., variance components, dispersion parameters), these will be shared across all outcome variables. Useful when multiple outcomes share properties. Default is FALSE.

data

A dataframe or list containing the model response variable and covariates required by the GAM formula and optional trend_formula.

Required columns for most models:

series: A factor index of the series IDs (number of levels should equal number of unique series labels)
time: numeric or integer index of time points. For most dynamic trend types, time should be measured in discrete, regularly spaced intervals (i.e., c(1, 2, 3, ...)). Irregular spacing is allowed for trend_model = CAR(1), but zero intervals are adjusted to 1e-12 to prevent sampling errors.

Special cases:

Models with hierarchical temporal correlation (e.g., AR(gr = region, subgr = species)) should NOT include a series identifier
Models without temporal dynamics (trend_model = 'None' or trend_model = ZMVN()) don't require a time variable

newdata

Optional dataframe or list of test data containing the same variables as in data. If included, observations in variable y will be set to NA when fitting the model so that posterior simulations can be obtained.

trend_map

Optional data.frame specifying which series should depend on which latent trends. Enables multiple series to depend on the same latent trend process with different observation processes.

Required structure:

Column series: Single unique entry for each series (matching factor levels in data)
Column trend: Integer values indicating which trend each series depends on

Notes:

Sets up latent factor model by enabling use_lv = TRUE
Process model intercept is NOT automatically suppressed
Not yet supported for continuous time models (CAR())

use_lv

logical. If TRUE, use dynamic factors to estimate series' latent trends in a reduced dimension format. Only available for RW(), AR() and GP() trend models. Default is FALSE. See lv_correlations for examples.

n_lv

integer specifying the number of latent dynamic factors to use if use_lv == TRUE. Cannot exceed n_series. Default is min(2, floor(n_series / 2)).

priors

An optional data.frame with prior definitions or, preferably, a vector of brmsprior objects (see prior()). See get_mvgam_priors() and Details for more information.

chains

integer specifying the number of parallel chains for the model. Ignored for variational inference algorithms.

burnin

integer specifying the number of warmup iterations to tune sampling algorithms. Ignored for variational inference algorithms.

samples

integer specifying the number of post-warmup iterations for sampling the posterior distribution.

threads

integer. Experimental option for within-chain parallelisation in Stan using reduce_sum. Recommended only for experienced Stan users with slow models. Currently works for all families except nmix() and when using Cmdstan backend.

algorithm

Character string naming the estimation approach:

"sampling": MCMC (default)
"meanfield": Variational inference with factorized normal distributions
"fullrank": Variational inference with multivariate normal distribution
"laplace": Laplace approximation (cmdstanr only)
"pathfinder": Pathfinder algorithm (cmdstanr only)

Can be set globally via "brms.algorithm" option. Limited testing suggests "meanfield" performs best among non-MCMC approximations for dynamic GAMs.

lfo

logical. Indicates whether this is part of lfo_cv.mvgam call. Returns lighter model version for speed. Users should leave as FALSE.

...

Other arguments to be passed to mvgam or jsdgam

factor_formula

Optional new formula object for the factor linear predictors

factor_knots

An optional list containing user specified knot values to be used for basis construction of any smooth terms in factor_formula. For most bases the user simply supplies the knots to be used, which must match up with the k value supplied (note that the number of knots is not always just k). Different terms can use different numbers of knots, unless they share a covariate

Value

A list object of class mvgam containing model output, the text representation of the model file, the mgcv model output (for easily generating simulations at unsampled covariate values), Dunn-Smyth residuals for each outcome variable and key information needed for other functions in the package. See mvgam-class for details. Use methods(class = "mvgam") for an overview on available methods.

A list object of class mvgam containing model output, the text representation of the model file, the mgcv model output (for easily generating simulations at unsampled covariate values), Dunn-Smyth residuals for each series and key information needed for other functions in the package. See mvgam-class for details. Use methods(class = "mvgam") for an overview on available methods.

Author

Nicholas J Clark

Examples

# \donttest{
# Simulate some data and fit a Poisson AR1 model
simdat <- sim_mvgam(n_series = 1, trend_model = AR())

mod <- mvgam(
  y ~ s(season, bs = 'cc'),
  trend_model = AR(),
  noncentred = TRUE,
  data = simdat$data_train,
  chains = 2
)
#> Compiling Stan program using cmdstanr
#> 
#> Start sampling
#> Running MCMC with 2 parallel chains...
#> 
#> Chain 1 Iteration:   1 / 1000 [  0%]  (Warmup) 
#> Chain 1 Iteration: 100 / 1000 [ 10%]  (Warmup) 
#> Chain 2 Iteration:   1 / 1000 [  0%]  (Warmup) 
#> Chain 2 Iteration: 100 / 1000 [ 10%]  (Warmup) 
#> Chain 1 Iteration: 200 / 1000 [ 20%]  (Warmup) 
#> Chain 1 Iteration: 300 / 1000 [ 30%]  (Warmup) 
#> Chain 1 Iteration: 400 / 1000 [ 40%]  (Warmup) 
#> Chain 1 Iteration: 500 / 1000 [ 50%]  (Warmup) 
#> Chain 1 Iteration: 501 / 1000 [ 50%]  (Sampling) 
#> Chain 2 Iteration: 200 / 1000 [ 20%]  (Warmup) 
#> Chain 2 Iteration: 300 / 1000 [ 30%]  (Warmup) 
#> Chain 2 Iteration: 400 / 1000 [ 40%]  (Warmup) 
#> Chain 2 Iteration: 500 / 1000 [ 50%]  (Warmup) 
#> Chain 2 Iteration: 501 / 1000 [ 50%]  (Sampling) 
#> Chain 1 Iteration: 600 / 1000 [ 60%]  (Sampling) 
#> Chain 1 Iteration: 700 / 1000 [ 70%]  (Sampling) 
#> Chain 1 Iteration: 800 / 1000 [ 80%]  (Sampling) 
#> Chain 2 Iteration: 600 / 1000 [ 60%]  (Sampling) 
#> Chain 2 Iteration: 700 / 1000 [ 70%]  (Sampling) 
#> Chain 2 Iteration: 800 / 1000 [ 80%]  (Sampling) 
#> Chain 1 Iteration: 900 / 1000 [ 90%]  (Sampling) 
#> Chain 1 Iteration: 1000 / 1000 [100%]  (Sampling) 
#> Chain 2 Iteration: 900 / 1000 [ 90%]  (Sampling) 
#> Chain 2 Iteration: 1000 / 1000 [100%]  (Sampling) 
#> Chain 1 finished in 0.4 seconds.
#> Chain 2 finished in 0.4 seconds.
#> 
#> Both chains finished successfully.
#> Mean chain execution time: 0.4 seconds.
#> Total execution time: 0.5 seconds.
#> 

summary(mod)
#> GAM formula:
#> y ~ s(season, bs = "cc")
#> <environment: 0x558346d879c0>
#> 
#> Family:
#> poisson
#> 
#> Link function:
#> log
#> 
#> Trend model:
#> AR()
#> 
#> N series:
#> 1 
#> 
#> N timepoints:
#> 75 
#> 
#> Status:
#> Fitted using Stan 
#> 2 chains, each with iter = 1000; warmup = 500; thin = 1 
#> Total post-warmup draws = 1000
#> 
#> GAM coefficient (beta) estimates:
#>              2.5%   50%  97.5% Rhat n_eff
#> (Intercept)  0.10  0.35  0.570    1  1147
#> s(season).1 -1.50 -0.73 -0.060    1  1635
#> s(season).2 -1.20 -0.49  0.210    1  1443
#> s(season).3  0.20  0.67  1.200    1  1488
#> s(season).4  0.76  1.30  1.700    1  1478
#> s(season).5 -1.90 -0.88 -0.023    1  1245
#> s(season).6  0.56  1.10  1.600    1  1499
#> s(season).7  1.40  1.90  2.400    1  1554
#> s(season).8 -1.20 -0.48  0.110    1  1321
#> 
#> Approximate significance of GAM smooths:
#>             edf Ref.df Chi.sq p-value    
#> s(season) 6.535      8  71.73  <2e-16 ***
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> standard deviation:
#>           2.5%  50% 97.5% Rhat n_eff
#> sigma[1] 0.074 0.17  0.37    1   548
#> 
#> precision parameter:
#>        2.5% 50% 97.5% Rhat n_eff
#> tau[1]  7.5  36   180    1   799
#> 
#> autoregressive coef 1:
#>         2.5%    50% 97.5% Rhat n_eff
#> ar1[1] -0.92 -0.058  0.87    1   922
#> 
#> Stan MCMC diagnostics:
#> ✔ No issues with effective samples per iteration
#> ✔ Rhat looks good for all parameters
#> ✔ No issues with divergences
#> ✔ No issues with maximum tree depth
#> 
#> Samples were drawn using sampling(hmc). For each parameter, n_eff is a
#>   crude measure of effective sample size, and Rhat is the potential scale
#>   reduction factor on split MCMC chains (at convergence, Rhat = 1)
#> 
#> Use how_to_cite() to get started describing this model
conditional_effects(mod, type = 'link')


# Update to an AR2 model
updated_mod <- update(
  mod,
  trend_model = AR(p = 2),
  noncentred = TRUE
)
#> Compiling Stan program using cmdstanr
#> 
#> Start sampling
#> Running MCMC with 2 parallel chains...
#> 
#> Chain 1 Iteration:   1 / 1000 [  0%]  (Warmup) 
#> Chain 1 Iteration: 100 / 1000 [ 10%]  (Warmup) 
#> Chain 2 Iteration:   1 / 1000 [  0%]  (Warmup) 
#> Chain 2 Iteration: 100 / 1000 [ 10%]  (Warmup) 
#> Chain 1 Iteration: 200 / 1000 [ 20%]  (Warmup) 
#> Chain 1 Iteration: 300 / 1000 [ 30%]  (Warmup) 
#> Chain 1 Iteration: 400 / 1000 [ 40%]  (Warmup) 
#> Chain 1 Iteration: 500 / 1000 [ 50%]  (Warmup) 
#> Chain 1 Iteration: 501 / 1000 [ 50%]  (Sampling) 
#> Chain 1 Iteration: 600 / 1000 [ 60%]  (Sampling) 
#> Chain 2 Iteration: 200 / 1000 [ 20%]  (Warmup) 
#> Chain 2 Iteration: 300 / 1000 [ 30%]  (Warmup) 
#> Chain 2 Iteration: 400 / 1000 [ 40%]  (Warmup) 
#> Chain 2 Iteration: 500 / 1000 [ 50%]  (Warmup) 
#> Chain 2 Iteration: 501 / 1000 [ 50%]  (Sampling) 
#> Chain 1 Iteration: 700 / 1000 [ 70%]  (Sampling) 
#> Chain 1 Iteration: 800 / 1000 [ 80%]  (Sampling) 
#> Chain 1 Iteration: 900 / 1000 [ 90%]  (Sampling) 
#> Chain 2 Iteration: 600 / 1000 [ 60%]  (Sampling) 
#> Chain 2 Iteration: 700 / 1000 [ 70%]  (Sampling) 
#> Chain 1 Iteration: 1000 / 1000 [100%]  (Sampling) 
#> Chain 2 Iteration: 800 / 1000 [ 80%]  (Sampling) 
#> Chain 2 Iteration: 900 / 1000 [ 90%]  (Sampling) 
#> Chain 2 Iteration: 1000 / 1000 [100%]  (Sampling) 
#> Chain 1 finished in 0.4 seconds.
#> Chain 2 finished in 0.4 seconds.
#> 
#> Both chains finished successfully.
#> Mean chain execution time: 0.4 seconds.
#> Total execution time: 0.5 seconds.
#> 

summary(updated_mod)
#> GAM formula:
#> y ~ s(season, bs = "cc")
#> <environment: 0x558346d879c0>
#> 
#> Family:
#> poisson
#> 
#> Link function:
#> log
#> 
#> Trend model:
#> AR(p = 2)
#> 
#> N series:
#> 1 
#> 
#> N timepoints:
#> 75 
#> 
#> Status:
#> Fitted using Stan 
#> 2 chains, each with iter = 1000; warmup = 500; thin = 1 
#> Total post-warmup draws = 1000
#> 
#> GAM coefficient (beta) estimates:
#>              2.5%   50%  97.5% Rhat n_eff
#> (Intercept)  0.10  0.34  0.570    1  1080
#> s(season).1 -1.60 -0.76 -0.054    1  1285
#> s(season).2 -1.30 -0.47  0.200    1  1448
#> s(season).3  0.17  0.67  1.200    1  1484
#> s(season).4  0.75  1.20  1.800    1   778
#> s(season).5 -1.70 -0.84 -0.100    1  1095
#> s(season).6  0.55  1.10  1.600    1  1626
#> s(season).7  1.40  1.90  2.400    1  1189
#> s(season).8 -1.30 -0.52  0.140    1  1131
#> 
#> Approximate significance of GAM smooths:
#>             edf Ref.df Chi.sq p-value    
#> s(season) 7.168      8  65.63  <2e-16 ***
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> standard deviation:
#>           2.5%  50% 97.5% Rhat n_eff
#> sigma[1] 0.069 0.16  0.35    1   492
#> 
#> precision parameter:
#>        2.5% 50% 97.5% Rhat n_eff
#> tau[1]  8.1  37   210    1   844
#> 
#> autoregressive coef 1:
#>         2.5%    50% 97.5% Rhat n_eff
#> ar1[1] -0.92 -0.069  0.88 1.01   943
#> 
#> autoregressive coef 2:
#>         2.5%    50% 97.5% Rhat n_eff
#> ar2[1] -0.92 -0.041  0.91    1   547
#> 
#> Stan MCMC diagnostics:
#> ✔ No issues with effective samples per iteration
#> ✔ Rhat looks good for all parameters
#> ✔ No issues with divergences
#> ✔ No issues with maximum tree depth
#> 
#> Samples were drawn using sampling(hmc). For each parameter, n_eff is a
#>   crude measure of effective sample size, and Rhat is the potential scale
#>   reduction factor on split MCMC chains (at convergence, Rhat = 1)
#> 
#> Use how_to_cite() to get started describing this model
conditional_effects(updated_mod, type = 'link')


# Now update to a Binomial AR1 by adding information on trials
# requires that we supply newdata that contains the 'trials' variable
simdat$data_train$trials <- max(simdat$data_train$y) + 15

updated_mod <- update(
  mod,
  formula = cbind(y, trials) ~ s(season, bs = 'cc'),
  noncentred = TRUE,
  data = simdat$data_train,
  family = binomial()
)
#> Compiling Stan program using cmdstanr
#> 
#> Start sampling
#> Running MCMC with 2 parallel chains...
#> 
#> Chain 1 Iteration:   1 / 1000 [  0%]  (Warmup) 
#> Chain 1 Iteration: 100 / 1000 [ 10%]  (Warmup) 
#> Chain 1 Iteration: 200 / 1000 [ 20%]  (Warmup) 
#> Chain 2 Iteration:   1 / 1000 [  0%]  (Warmup) 
#> Chain 2 Iteration: 100 / 1000 [ 10%]  (Warmup) 
#> Chain 1 Iteration: 300 / 1000 [ 30%]  (Warmup) 
#> Chain 1 Iteration: 400 / 1000 [ 40%]  (Warmup) 
#> Chain 1 Iteration: 500 / 1000 [ 50%]  (Warmup) 
#> Chain 1 Iteration: 501 / 1000 [ 50%]  (Sampling) 
#> Chain 1 Iteration: 600 / 1000 [ 60%]  (Sampling) 
#> Chain 2 Iteration: 200 / 1000 [ 20%]  (Warmup) 
#> Chain 2 Iteration: 300 / 1000 [ 30%]  (Warmup) 
#> Chain 2 Iteration: 400 / 1000 [ 40%]  (Warmup) 
#> Chain 2 Iteration: 500 / 1000 [ 50%]  (Warmup) 
#> Chain 2 Iteration: 501 / 1000 [ 50%]  (Sampling) 
#> Chain 1 Iteration: 700 / 1000 [ 70%]  (Sampling) 
#> Chain 1 Iteration: 800 / 1000 [ 80%]  (Sampling) 
#> Chain 1 Iteration: 900 / 1000 [ 90%]  (Sampling) 
#> Chain 2 Iteration: 600 / 1000 [ 60%]  (Sampling) 
#> Chain 2 Iteration: 700 / 1000 [ 70%]  (Sampling) 
#> Chain 2 Iteration: 800 / 1000 [ 80%]  (Sampling) 
#> Chain 1 Iteration: 1000 / 1000 [100%]  (Sampling) 
#> Chain 2 Iteration: 900 / 1000 [ 90%]  (Sampling) 
#> Chain 2 Iteration: 1000 / 1000 [100%]  (Sampling) 
#> Chain 1 finished in 0.4 seconds.
#> Chain 2 finished in 0.4 seconds.
#> 
#> Both chains finished successfully.
#> Mean chain execution time: 0.4 seconds.
#> Total execution time: 0.5 seconds.
#> 

summary(updated_mod)
#> GAM formula:
#> cbind(y, trials) ~ s(season, bs = "cc")
#> <environment: 0x558346d879c0>
#> 
#> Family:
#> binomial
#> 
#> Link function:
#> logit
#> 
#> Trend model:
#> AR()
#> 
#> N series:
#> 1 
#> 
#> N timepoints:
#> 75 
#> 
#> Status:
#> Fitted using Stan 
#> 2 chains, each with iter = 1000; warmup = 500; thin = 1 
#> Total post-warmup draws = 1000
#> 
#> GAM coefficient (beta) estimates:
#>              2.5%   50%  97.5% Rhat n_eff
#> (Intercept) -3.30 -3.00 -2.800    1  1448
#> s(season).1 -1.70 -0.82 -0.045    1  1642
#> s(season).2 -1.30 -0.53  0.230    1  1973
#> s(season).3  0.15  0.68  1.200    1  1498
#> s(season).4  0.82  1.40  1.900    1  1079
#> s(season).5 -2.10 -1.10 -0.240    1   953
#> s(season).6  0.67  1.30  1.800    1  1012
#> s(season).7  1.60  2.20  2.700    1  1088
#> s(season).8 -1.50 -0.65  0.058    1  1232
#> 
#> Approximate significance of GAM smooths:
#>             edf Ref.df Chi.sq p-value    
#> s(season) 6.399      8  92.53  <2e-16 ***
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> standard deviation:
#>          2.5% 50% 97.5% Rhat n_eff
#> sigma[1] 0.08 0.2  0.48    1   505
#> 
#> precision parameter:
#>        2.5% 50% 97.5% Rhat n_eff
#> tau[1]  4.4  25   160    1   712
#> 
#> autoregressive coef 1:
#>        2.5%   50% 97.5% Rhat n_eff
#> ar1[1] -0.9 -0.12  0.88    1   718
#> 
#> Stan MCMC diagnostics:
#> ✔ No issues with effective samples per iteration
#> ✔ Rhat looks good for all parameters
#> ✔ No issues with divergences
#> ✔ No issues with maximum tree depth
#> 
#> Samples were drawn using sampling(hmc). For each parameter, n_eff is a
#>   crude measure of effective sample size, and Rhat is the potential scale
#>   reduction factor on split MCMC chains (at convergence, Rhat = 1)
#> 
#> Use how_to_cite() to get started describing this model
conditional_effects(updated_mod, type = 'link')

# }