Skip to contents

Extract the LOOIC (leave-one-out information criterion) using loo::loo()

Usage

# S3 method for mvgam
loo(x, incl_dynamics = TRUE, ...)

# S3 method for mvgam
loo_compare(x, ..., model_names = NULL, incl_dynamics = TRUE)

Arguments

x

Object of class mvgam

incl_dynamics

Logical; indicates if any latent dynamic structures that were included in the model should be considered when calculating in-sample log-likelihoods. Defaults to TRUE

...

More mvgam objects.

model_names

If NULL (the default) will use model names derived from deparsing the call. Otherwise will use the passed values as model names.

Value

for loo.mvgam, an object of class psis_loo (see loo::loo()

for details). For loo_compare.mvgam, an object of class compare.loo ( loo::loo_compare() for details)

Details

When comparing two (or more) fitted mvgam models, we can estimate the difference in their in-sample predictive accuracies using the Expcted Log Predictive Density (ELPD). This metric can be approximated using Pareto Smoothed Importance Sampling, which is a method to re-weight posterior draws to approximate what predictions the models might have made for a given datapoint had that datapoint not been included in the original model fit (i.e. if we were to run a leave-one-out cross-validation and then made a prediction for the held-out datapoint). See details from loo::loo() and loo::loo_compare() for further information on how this importance sampling works.

There are two fundamentally different ways to calculate ELPD from mvgam models that included dynamic latent processes (i.e. "trend_models"). The first is to use the predictions that were generated when estimating these latent processes by setting incl_dynamics = TRUE. This works in the same way that setting incl_autocor = TRUE in brms::prepare_predictions(). But it may also be desirable to compare predictions by considering that the dynamic processes are nuisance parameters that we'd wish to account for when making inferences about other processes in the model (i.e. the linear predictor effects). Setting incl_dynamics = FALSE will accomplish this by ignoring the dynamic processes when making predictions. This option matches up with what mvgam's prediction functions return (i.e. predict.mvgam, ppc, pp_check.mvgam, posterior_epred.mvgam) and will be far less forgiving of models that may be overfitting the training data due to highly flexible dynamic processes (such as Random Walks, for example). However setting incl_dynamics = FALSE will often result in less stable Pareto k diagnostics for models with dynamic trends, making ELPD comparisons difficult and unstable. It is therefore recommended to generally stick with incl_dynamics = TRUE when comparing models based on in-sample fits, and then to perhaps use forecast evaluations for further scrutiny of models (see for example forecast.mvgam, score.mvgam_forecast and lfo_cv)

Examples

# \donttest{
# Simulate 4 time series with hierarchical seasonality
# and independent AR1 dynamic processes
set.seed(111)
simdat <- sim_mvgam(seasonality = 'hierarchical',
                   trend_model = AR(),
                   family = gaussian())

# Fit a model with shared seasonality
mod1 <- mvgam(y ~ s(season, bs = 'cc', k = 6),
             data = rbind(simdat$data_train,
             simdat$data_test),
             family = gaussian(),
             chains = 2,
             silent = 2)
#> Error in get_mvgam_priors(formula = formula, trend_formula = trend_formula,     data = data, family = family, use_lv = use_lv, n_lv = n_lv,     use_stan = TRUE, trend_model = trend_model, trend_map = trend_map,     drift = drift, knots = knots): object 'silent' not found

# Inspect the model and calculate LOO
conditional_effects(mod1)
#> Error in conditional_effects(mod1): object 'mod1' not found
mc.cores.def <- getOption('mc.cores')
options(mc.cores = 1)
loo(mod1)
#> Error in loo(mod1): object 'mod1' not found

# Now fit a model with hierarchical seasonality
mod2 <- update(mod1,
              formula = y ~ s(season, bs = 'cc', k = 6) +
              s(season, series, bs = 'fs',
              xt = list(bs = 'cc'), k = 4),
              chains = 2,
              silent = 2)
#> Error in update(mod1, formula = y ~ s(season, bs = "cc", k = 6) + s(season,     series, bs = "fs", xt = list(bs = "cc"), k = 4), chains = 2,     silent = 2): object 'mod1' not found
conditional_effects(mod2)
#> Error in conditional_effects(mod2): object 'mod2' not found
loo(mod2)
#> Error in loo(mod2): object 'mod2' not found

# Now add AR1 dynamic errors to mod2
mod3 <- update(mod2,
              trend_model = AR(),
              chains = 2,
              silent = 2)
#> Error in update(mod2, trend_model = AR(), chains = 2, silent = 2): object 'mod2' not found
conditional_effects(mod3)
#> Error in conditional_effects(mod3): object 'mod3' not found
plot(mod3, type = 'trend')
#> Error in plot(mod3, type = "trend"): object 'mod3' not found
loo(mod3)
#> Error in loo(mod3): object 'mod3' not found

# Compare models using LOO
loo_compare(mod1, mod2, mod3)
#> Error in loo_compare(mod1, mod2, mod3): object 'mod1' not found
options(mc.cores = mc.cores.def)

# Compare forecast abilities using an expanding training window and
# forecasting ahead 1 timepoint from each window; the first window by includes
# the first 92 timepoints (of the 100 that were simulated)
max(mod2$obs_data$time)
#> Error in eval(expr, envir, enclos): object 'mod2' not found
lfo_mod2 <- lfo_cv(mod2, min_t = 92)
#> Error in lfo_cv(mod2, min_t = 92): object 'mod2' not found
lfo_mod3 <- lfo_cv(mod3, min_t = 92)
#> Error in lfo_cv(mod3, min_t = 92): object 'mod3' not found

# Take the difference in forecast ELPDs; a model with higher ELPD is preferred,
# so negative values here indicate that mod3 gave better forecasts for a particular
# out of sample timepoint
plot(y = lfo_mod2$elpds - lfo_mod3$elpds,
    x = lfo_mod2$eval_timepoints, pch = 16,
    ylab = 'ELPD_mod2 - ELPD_mod3',
    xlab = 'Evaluation timepoint')
#> Error in plot(y = lfo_mod2$elpds - lfo_mod3$elpds, x = lfo_mod2$eval_timepoints,     pch = 16, ylab = "ELPD_mod2 - ELPD_mod3", xlab = "Evaluation timepoint"): object 'lfo_mod2' not found
abline(h = 0, lty = 'dashed')
#> Error in int_abline(a = a, b = b, h = h, v = v, untf = untf, ...): plot.new has not been called yet
# }