Skip to contents

Predict from the GAM component of an mvgam model

Usage

# S3 method for mvgam
predict(
  object,
  newdata,
  data_test,
  type = "link",
  process_error = TRUE,
  summary = TRUE,
  robust = FALSE,
  probs = c(0.025, 0.975),
  ...
)

Arguments

object

list object returned from mvgam. See mvgam()

newdata

Optional dataframe or list of test data containing the same variables that were included in the original data used to fit the model. If not supplied, predictions are generated for the original observations used for the model fit.

data_test

Deprecated. Still works in place of newdata but users are recommended to use newdata instead for more seamless integration into R workflows

type

When this has the value link (default) the linear predictor is calculated on the link scale. If expected is used, predictions reflect the expectation of the response (the mean) but ignore uncertainty in the observation process. When response is used, the predictions take uncertainty in the observation process into account to return predictions on the outcome scale. When variance is used, the variance of the response with respect to the mean (mean-variance relationship) is returned. When type = "terms", each component of the linear predictor is returned separately in the form of a list (possibly with standard errors, if summary = TRUE): this includes parametric model components, followed by each smooth component, but excludes any offset and any intercept. Two special cases are also allowed: type latent_N will return the estimated latent abundances from an N-mixture distribution, while type detection will return the estimated detection probability from an N-mixture distribution

process_error

Logical. If TRUE and a dynamic trend model was fit, expected uncertainty in the process model is accounted for by using draws from the latent trend SD parameters. If FALSE, uncertainty in the latent trend component is ignored when calculating predictions

summary

Should summary statistics be returned instead of the raw values? Default is TRUE..

robust

If FALSE (the default) the mean is used as the measure of central tendency and the standard deviation as the measure of variability. If TRUE, the median and the median absolute deviation (MAD) are applied instead. Only used if summary is TRUE.

probs

The percentiles to be computed by the quantile function. Only used if summary is TRUE.

...

Ignored

Value

Predicted values on the appropriate scale. If summary = FALSE and type != "terms", the output is a matrix of dimension n_draw x n_observations

containing predicted values for each posterior draw in object.

If summary = TRUE and type != "terms", the output is an n_observations x E

matrix. The number of summary statistics E is equal to 2 + length(probs): The Estimate column contains point estimates (either mean or median depending on argument robust), while the Est.Error column contains uncertainty estimates (either standard deviation or median absolute deviation depending on argument robust). The remaining columns starting with Q contain quantile estimates as specified via argument probs.

If type = "terms" and summary = FALSE, the output is a named list

containing a separate slot for each effect, with the effects returned as matrices of dimension n_draw x 1. If summary = TRUE, the output resembles that from predict.gam when using the call predict.gam(object, type = "terms", se.fit = TRUE), where mean contributions from each effect are returned in matrix form while standard errors (representing the interval: (max(probs) - min(probs)) / 2) are returned in a separate matrix

Details

Note that for all types of predictions for models that did not include a trend_formula, uncertainty in the dynamic trend component can be ignored by setting process_error = FALSE. However, if a trend_formula was supplied in the model, predictions for this component cannot be ignored. If process_error = TRUE, trend predictions will ignore autocorrelation coefficients or GP length scale coefficients, ultimately assuming the process is stationary. This method is similar to the types of posterior predictions returned from brms models when using autocorrelated error predictions for newdata. This function is therefore more suited to posterior simulation from the GAM components of a mvgam model, while the forecasting functions plot_mvgam_fc and forecast.mvgam are better suited to generate h-step ahead forecasts that respect the temporal dynamics of estimated latent trends.

Examples

# \donttest{
# Simulate 4 time series with hierarchical seasonality
# and independent AR1 dynamic processes
set.seed(111)
simdat <- sim_mvgam(seasonality = 'hierarchical',
                   trend_model = 'AR1',
                   family = gaussian())

# Fit a model with shared seasonality
mod1 <- mvgam(y ~ s(season, bs = 'cc', k = 6),
             data = simdat$data_train,
             family = gaussian(),
             trend_model = AR(),
             noncentred = TRUE,
             chains = 2)
#> Compiling Stan program using cmdstanr
#> 
#> In file included from stan/lib/stan_math/stan/math/prim/prob/von_mises_lccdf.hpp:5,
#>                  from stan/lib/stan_math/stan/math/prim/prob/von_mises_ccdf_log.hpp:4,
#>                  from stan/lib/stan_math/stan/math/prim/prob.hpp:359,
#>                  from stan/lib/stan_math/stan/math/prim.hpp:16,
#>                  from stan/lib/stan_math/stan/math/rev.hpp:16,
#>                  from stan/lib/stan_math/stan/math.hpp:19,
#>                  from stan/src/stan/model/model_header.hpp:4,
#>                  from C:/Users/uqnclar2/AppData/Local/Temp/RtmpARA4r4/model-1333c716465e2.hpp:2:
#> stan/lib/stan_math/stan/math/prim/prob/von_mises_cdf.hpp: In function 'stan::return_type_t<T_x, T_sigma, T_l> stan::math::von_mises_cdf(const T_x&, const T_mu&, const T_k&)':
#> stan/lib/stan_math/stan/math/prim/prob/von_mises_cdf.hpp:194: note: '-Wmisleading-indentation' is disabled from this point onwards, since column-tracking was disabled due to the size of the code/headers
#>   194 |       if (cdf_n < 0.0)
#>       | 
#> stan/lib/stan_math/stan/math/prim/prob/von_mises_cdf.hpp:194: note: adding '-flarge-source-files' will allow for more column-tracking support, at the expense of compilation time and memory
#> Start sampling
#> Running MCMC with 2 parallel chains...
#> 
#> Chain 1 Iteration:   1 / 1000 [  0%]  (Warmup) 
#> Chain 2 Iteration:   1 / 1000 [  0%]  (Warmup) 
#> Chain 1 Iteration: 100 / 1000 [ 10%]  (Warmup) 
#> Chain 2 Iteration: 100 / 1000 [ 10%]  (Warmup) 
#> Chain 1 Iteration: 200 / 1000 [ 20%]  (Warmup) 
#> Chain 2 Iteration: 200 / 1000 [ 20%]  (Warmup) 
#> Chain 1 Iteration: 300 / 1000 [ 30%]  (Warmup) 
#> Chain 1 Iteration: 400 / 1000 [ 40%]  (Warmup) 
#> Chain 2 Iteration: 300 / 1000 [ 30%]  (Warmup) 
#> Chain 1 Iteration: 500 / 1000 [ 50%]  (Warmup) 
#> Chain 2 Iteration: 400 / 1000 [ 40%]  (Warmup) 
#> Chain 1 Iteration: 501 / 1000 [ 50%]  (Sampling) 
#> Chain 2 Iteration: 500 / 1000 [ 50%]  (Warmup) 
#> Chain 2 Iteration: 501 / 1000 [ 50%]  (Sampling) 
#> Chain 1 Iteration: 600 / 1000 [ 60%]  (Sampling) 
#> Chain 2 Iteration: 600 / 1000 [ 60%]  (Sampling) 
#> Chain 1 Iteration: 700 / 1000 [ 70%]  (Sampling) 
#> Chain 2 Iteration: 700 / 1000 [ 70%]  (Sampling) 
#> Chain 1 Iteration: 800 / 1000 [ 80%]  (Sampling) 
#> Chain 2 Iteration: 800 / 1000 [ 80%]  (Sampling) 
#> Chain 1 Iteration: 900 / 1000 [ 90%]  (Sampling) 
#> Chain 1 Iteration: 1000 / 1000 [100%]  (Sampling) 
#> Chain 2 Iteration: 900 / 1000 [ 90%]  (Sampling) 
#> Chain 1 finished in 1.9 seconds.
#> Chain 2 Iteration: 1000 / 1000 [100%]  (Sampling) 
#> Chain 2 finished in 2.0 seconds.
#> 
#> Both chains finished successfully.
#> Mean chain execution time: 1.9 seconds.
#> Total execution time: 2.1 seconds.
#> 

# Generate predictions against observed data
preds <- predict(mod1, summary = TRUE)
head(preds)
#>        Estimate Est.Error      Q2.5     Q97.5
#> [1,] -0.1871753 0.7028183 -1.537458 1.2493342
#> [2,] -0.1736423 0.8792129 -1.866770 1.6123694
#> [3,] -0.1753754 0.5374867 -1.369066 0.9080175
#> [4,] -0.1408346 0.6724145 -1.423809 1.2863821
#> [5,] -0.1830376 0.9264275 -2.041098 1.5994410
#> [6,] -0.1722438 0.5117049 -1.178778 0.8864782

# Generate predictions against test data
preds <- predict(mod1, newdata = simdat$data_test, summary = TRUE)
head(preds)
#>       Estimate Est.Error       Q2.5    Q97.5
#> [1,] 0.3547489 0.6825374 -0.9814035 1.735038
#> [2,] 0.3745088 0.8929372 -1.3321812 2.123352
#> [3,] 0.3489994 0.5446603 -0.8114392 1.555448
#> [4,] 0.7608778 0.6994472 -0.6726953 2.235134
#> [5,] 0.7860978 0.9348880 -1.0391548 2.706290
#> [6,] 0.7910239 0.5181663 -0.2329447 1.946642
# }