Predict from the GAM component of an mvgam model
Arguments
- object
list
object returned frommvgam
. Seemvgam()
- newdata
Optional
dataframe
orlist
of test data containing the same variables that were included in the originaldata
used to fit the model. If not supplied, predictions are generated for the original observations used for the model fit.- data_test
Deprecated. Still works in place of
newdata
but users are recommended to usenewdata
instead for more seamless integration intoR
workflows- type
When this has the value
link
(default) the linear predictor is calculated on the link scale. Ifexpected
is used, predictions reflect the expectation of the response (the mean) but ignore uncertainty in the observation process. Whenresponse
is used, the predictions take uncertainty in the observation process into account to return predictions on the outcome scale. Whenvariance
is used, the variance of the response with respect to the mean (mean-variance relationship) is returned. Whentype = "terms"
, each component of the linear predictor is returned separately in the form of alist
(possibly with standard errors, ifsummary = TRUE
): this includes parametric model components, followed by each smooth component, but excludes any offset and any intercept. Two special cases are also allowed: typelatent_N
will return the estimated latent abundances from an N-mixture distribution, while typedetection
will return the estimated detection probability from an N-mixture distribution- process_error
Logical. If
TRUE
and a dynamic trend model was fit, expected uncertainty in the process model is accounted for by using draws from the latent trend SD parameters. IfFALSE
, uncertainty in the latent trend component is ignored when calculating predictions- summary
Should summary statistics be returned instead of the raw values? Default is
TRUE
..- robust
If
FALSE
(the default) the mean is used as the measure of central tendency and the standard deviation as the measure of variability. IfTRUE
, the median and the median absolute deviation (MAD) are applied instead. Only used ifsummary
isTRUE
.- probs
The percentiles to be computed by the
quantile
function. Only used ifsummary
isTRUE
.- ...
Ignored
Value
Predicted values on the appropriate scale.
If summary = FALSE
and type != "terms"
,
the output is a matrix of dimension n_draw x n_observations
containing predicted values for each posterior draw in object
.
If summary = TRUE
and type != "terms"
, the output is an n_observations
x E
matrix. The number of summary statistics E
is equal to 2 +
length(probs)
: The Estimate
column contains point estimates (either
mean or median depending on argument robust
), while the
Est.Error
column contains uncertainty estimates (either standard
deviation or median absolute deviation depending on argument
robust
). The remaining columns starting with Q
contain
quantile estimates as specified via argument probs
.
If type = "terms"
and summary = FALSE
, the output is a named list
containing a separate slot for each effect, with the effects returned as
matrices of dimension n_draw x 1
. If summary = TRUE
, the output resembles that
from predict.gam
when using the call
predict.gam(object, type = "terms", se.fit = TRUE)
, where mean contributions
from each effect are returned in matrix
form while standard errors (representing
the interval: (max(probs) - min(probs)) / 2
) are returned in a separate matrix
Details
Note that for all types of predictions for models that did not include
a trend_formula
, uncertainty in the dynamic trend
component can be ignored by setting process_error = FALSE
. However,
if a trend_formula
was supplied in the model, predictions for this component cannot be
ignored. If process_error = TRUE
, trend predictions will ignore autocorrelation
coefficients or GP length scale coefficients, ultimately assuming the process is stationary.
This method is similar to the types of posterior predictions returned from brms
models
when using autocorrelated error predictions for newdata.
This function is therefore more suited to posterior simulation from the GAM components
of a mvgam
model, while the forecasting functions
plot_mvgam_fc
and forecast.mvgam
are better suited to generate h-step ahead forecasts
that respect the temporal dynamics of estimated latent trends.
Examples
# \donttest{
# Simulate 4 time series with hierarchical seasonality
# and independent AR1 dynamic processes
set.seed(111)
simdat <- sim_mvgam(seasonality = 'hierarchical',
trend_model = 'AR1',
family = gaussian())
# Fit a model with shared seasonality
mod1 <- mvgam(y ~ s(season, bs = 'cc', k = 6),
data = simdat$data_train,
family = gaussian(),
trend_model = AR(),
noncentred = TRUE,
chains = 2)
#> Compiling Stan program using cmdstanr
#>
#> In file included from stan/lib/stan_math/stan/math/prim/prob/von_mises_lccdf.hpp:5,
#> from stan/lib/stan_math/stan/math/prim/prob/von_mises_ccdf_log.hpp:4,
#> from stan/lib/stan_math/stan/math/prim/prob.hpp:359,
#> from stan/lib/stan_math/stan/math/prim.hpp:16,
#> from stan/lib/stan_math/stan/math/rev.hpp:16,
#> from stan/lib/stan_math/stan/math.hpp:19,
#> from stan/src/stan/model/model_header.hpp:4,
#> from C:/Users/uqnclar2/AppData/Local/Temp/RtmpARA4r4/model-1333c716465e2.hpp:2:
#> stan/lib/stan_math/stan/math/prim/prob/von_mises_cdf.hpp: In function 'stan::return_type_t<T_x, T_sigma, T_l> stan::math::von_mises_cdf(const T_x&, const T_mu&, const T_k&)':
#> stan/lib/stan_math/stan/math/prim/prob/von_mises_cdf.hpp:194: note: '-Wmisleading-indentation' is disabled from this point onwards, since column-tracking was disabled due to the size of the code/headers
#> 194 | if (cdf_n < 0.0)
#> |
#> stan/lib/stan_math/stan/math/prim/prob/von_mises_cdf.hpp:194: note: adding '-flarge-source-files' will allow for more column-tracking support, at the expense of compilation time and memory
#> Start sampling
#> Running MCMC with 2 parallel chains...
#>
#> Chain 1 Iteration: 1 / 1000 [ 0%] (Warmup)
#> Chain 2 Iteration: 1 / 1000 [ 0%] (Warmup)
#> Chain 1 Iteration: 100 / 1000 [ 10%] (Warmup)
#> Chain 2 Iteration: 100 / 1000 [ 10%] (Warmup)
#> Chain 1 Iteration: 200 / 1000 [ 20%] (Warmup)
#> Chain 2 Iteration: 200 / 1000 [ 20%] (Warmup)
#> Chain 1 Iteration: 300 / 1000 [ 30%] (Warmup)
#> Chain 1 Iteration: 400 / 1000 [ 40%] (Warmup)
#> Chain 2 Iteration: 300 / 1000 [ 30%] (Warmup)
#> Chain 1 Iteration: 500 / 1000 [ 50%] (Warmup)
#> Chain 2 Iteration: 400 / 1000 [ 40%] (Warmup)
#> Chain 1 Iteration: 501 / 1000 [ 50%] (Sampling)
#> Chain 2 Iteration: 500 / 1000 [ 50%] (Warmup)
#> Chain 2 Iteration: 501 / 1000 [ 50%] (Sampling)
#> Chain 1 Iteration: 600 / 1000 [ 60%] (Sampling)
#> Chain 2 Iteration: 600 / 1000 [ 60%] (Sampling)
#> Chain 1 Iteration: 700 / 1000 [ 70%] (Sampling)
#> Chain 2 Iteration: 700 / 1000 [ 70%] (Sampling)
#> Chain 1 Iteration: 800 / 1000 [ 80%] (Sampling)
#> Chain 2 Iteration: 800 / 1000 [ 80%] (Sampling)
#> Chain 1 Iteration: 900 / 1000 [ 90%] (Sampling)
#> Chain 1 Iteration: 1000 / 1000 [100%] (Sampling)
#> Chain 2 Iteration: 900 / 1000 [ 90%] (Sampling)
#> Chain 1 finished in 1.9 seconds.
#> Chain 2 Iteration: 1000 / 1000 [100%] (Sampling)
#> Chain 2 finished in 2.0 seconds.
#>
#> Both chains finished successfully.
#> Mean chain execution time: 1.9 seconds.
#> Total execution time: 2.1 seconds.
#>
# Generate predictions against observed data
preds <- predict(mod1, summary = TRUE)
head(preds)
#> Estimate Est.Error Q2.5 Q97.5
#> [1,] -0.1871753 0.7028183 -1.537458 1.2493342
#> [2,] -0.1736423 0.8792129 -1.866770 1.6123694
#> [3,] -0.1753754 0.5374867 -1.369066 0.9080175
#> [4,] -0.1408346 0.6724145 -1.423809 1.2863821
#> [5,] -0.1830376 0.9264275 -2.041098 1.5994410
#> [6,] -0.1722438 0.5117049 -1.178778 0.8864782
# Generate predictions against test data
preds <- predict(mod1, newdata = simdat$data_test, summary = TRUE)
head(preds)
#> Estimate Est.Error Q2.5 Q97.5
#> [1,] 0.3547489 0.6825374 -0.9814035 1.735038
#> [2,] 0.3745088 0.8929372 -1.3321812 2.123352
#> [3,] 0.3489994 0.5446603 -0.8114392 1.555448
#> [4,] 0.7608778 0.6994472 -0.6726953 2.235134
#> [5,] 0.7860978 0.9348880 -1.0391548 2.706290
#> [6,] 0.7910239 0.5181663 -0.2329447 1.946642
# }