Evaluate forecasts from fitted mvgam objects
Usage
eval_mvgam(
  object,
  n_samples = 5000,
  eval_timepoint = 3,
  fc_horizon = 3,
  n_cores = 1,
  score = "drps",
  log = FALSE,
  weights
)

roll_eval_mvgam(
  object,
  n_evaluations = 5,
  evaluation_seq,
  n_samples = 5000,
  fc_horizon = 3,
  n_cores = 1,
  score = "drps",
  log = FALSE,
  weights
)

compare_mvgams(
  model1,
  model2,
  n_samples = 1000,
  fc_horizon = 3,
  n_evaluations = 10,
  n_cores = 1,
  score = "drps",
  log = FALSE,
  weights
)
Arguments
- object
list object returned from mvgam

- n_samples
integer specifying the number of samples to generate from the model's posterior distribution

- eval_timepoint
integer indexing the timepoint that represents our last 'observed' set of outcome data

- fc_horizon
integer specifying the length of the forecast horizon for evaluating forecasts

- n_cores
Deprecated. Parallel processing is no longer supported.

- score
character specifying the type of ranked probability score to use for evaluation. Options are: variogram, drps or crps

- log
logical. Should the forecasts and truths be logged prior to scoring? This is often appropriate for comparing performance of models when series vary in their observation ranges

- weights
optional vector of weights (where length(weights) == n_series) for weighting pairwise correlations when evaluating the variogram score for multivariate forecasts. Useful for down-weighting series that have larger magnitude observations or that are of less interest when forecasting. Ignored if score != 'variogram'. A brief sketch of weighted variogram scoring is given after this list.

- n_evaluations
integer specifying the total number of evaluations to perform

- evaluation_seq
Optional integer sequence specifying the exact set of timepoints for evaluating the model's forecasts. This sequence cannot have values < 3 or > max(training timepoints) - fc_horizon

- model1
list object returned from mvgam representing the first model to be evaluated

- model2
list object returned from mvgam representing the second model to be evaluated
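As a sketch of how the weights argument can be used (not run; 'mod_multi' is a hypothetical mvgam object fitted to three series, not one created in the Examples below):

# Down-weight the third series when computing the variogram score
wts <- c(1, 1, 0.5)
eval_mvgam(mod_multi,
           eval_timepoint = 30,
           fc_horizon = 3,
           n_samples = 1000,
           score = 'variogram',
           weights = wts)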
Value
For eval_mvgam, a list object containing information on specific evaluations for each series
(if using drps or crps as the score) or a vector of scores when using variogram.
For roll_eval_mvgam, a list object containing information on specific evaluations for each series as well as
a total evaluation summary (taken by summing the forecast score for each series at each evaluation and averaging
the coverages at each evaluation).
For compare_mvgams, a series of plots comparing forecast Ranked Probability Scores for each competing
model. A lower score is preferred. Note, however, that it is possible to select a model that ultimately
would perform poorly in true out-of-sample forecasting. For example, if a wiggly smooth function of 'year'
is included in the model, then this function will be learned prior to evaluating rolling window forecasts,
and the model could generate very tight predictions as a result. But when forecasting ahead to timepoints
that the model has not seen (i.e. next year), the smooth function will end up extrapolating, sometimes
in very strange and unexpected ways. It is therefore recommended to only use smooth functions for
covariates that are adequately measured in the data (e.g. 'seasonality') to reduce possible
extrapolation of smooths and to let the latent trends in the mvgam model capture any
temporal dependencies in the data. These trends are time series models and so will provide much more
stable forecasts.
Details
eval_mvgam
may be useful when both repeated fitting of a model using update.mvgam
for exact leave-future-out cross-validation and approximate
leave-future-out cross-validation using lfo_cv
are impractical. The function generates a set of samples representing fixed parameters estimated from the full
mvgam
model and latent trend states at a given point in time. The trends are rolled forward
a total of fc_horizon
timesteps according to their estimated state space dynamics to
generate an 'out-of-sample' forecast that is evaluated against the true observations in the horizon window.
This function therefore simulates a situation where the model's parameters had already been estimated but
we have only observed data up to the evaluation timepoint and would like to generate forecasts from the
latent trends that have been observed up to that timepoint. Evaluation involves calculating an
appropriate Ranked Probability Score and a binary indicator
for whether or not the true value lies within the forecast's 90% prediction interval.
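As a minimal sketch (not run), assuming a fitted model such as mod_ar2 from the Examples below and treating timepoint 30 as the last 'observed' timepoint:

# Evaluate a 3-step ahead forecast generated from timepoint 30
ev <- eval_mvgam(mod_ar2,
                 eval_timepoint = 30,
                 fc_horizon = 3,
                 n_samples = 1000,
                 score = 'drps')
# The returned list contains per-series scores and 90% interval coverage indicators
str(ev)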
roll_eval_mvgam
sets up a sequence of evaluation timepoints along a rolling window and iteratively
calls eval_mvgam
to evaluate 'out-of-sample' forecasts.
Evaluation involves calculating the Ranked Probability Scores and a binary indicator
for whether or not the true value lies within the forecast's 90% prediction interval.
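A rolling version of the sketch above might look like the following (not run; again assuming mod_ar2 from the Examples below):

# Evaluate 3-step ahead forecasts at five timepoints chosen automatically
roll_eval_mvgam(mod_ar2,
                n_evaluations = 5,
                fc_horizon = 3,
                n_samples = 1000,
                score = 'drps')

# Or specify the evaluation timepoints exactly
roll_eval_mvgam(mod_ar2,
                evaluation_seq = seq(30, 45, by = 5),
                fc_horizon = 3,
                n_samples = 1000,
                score = 'drps')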
compare_mvgams
automates the evaluation to compare two fitted models using rolling window forecast evaluation and
provides a series of summary plots to facilitate model selection. It is essentially a wrapper for
roll_eval_mvgam.
Examples
if (FALSE) {
# Simulate from a Poisson-AR2 model with a seasonal smooth
set.seed(100)
dat <- sim_mvgam(T = 75,
                 n_series = 1,
                 prop_trend = 0.75,
                 trend_model = 'AR2',
                 family = poisson())

# Fit an appropriate model
mod_ar2 <- mvgam(y ~ s(season, bs = 'cc'),
                 trend_model = AR(p = 2),
                 family = poisson(),
                 data = dat$data_train,
                 newdata = dat$data_test,
                 chains = 2)

# Fit a less appropriate model
mod_rw <- mvgam(y ~ s(season, bs = 'cc'),
                trend_model = RW(),
                family = poisson(),
                data = dat$data_train,
                newdata = dat$data_test,
                chains = 2)

# Compare Discrete Ranked Probability Scores for the testing period
fc_ar2 <- forecast(mod_ar2)
fc_rw <- forecast(mod_rw)
score_ar2 <- score(fc_ar2, score = 'drps')
score_rw <- score(fc_rw, score = 'drps')
sum(score_ar2$series_1$score)
sum(score_rw$series_1$score)

# Use rolling evaluation for approximate comparisons of 3-step ahead
# forecasts across the training period
compare_mvgams(mod_ar2,
               mod_rw,
               fc_horizon = 3,
               n_samples = 1000,
               n_evaluations = 5)

# Now use approximate leave-future-out CV to compare
# rolling forecasts; start at time point 40 to reduce
# computational time and to ensure enough data is available
# for estimating model parameters
lfo_ar2 <- lfo_cv(mod_ar2,
                  min_t = 40,
                  fc_horizon = 3)
lfo_rw <- lfo_cv(mod_rw,
                 min_t = 40,
                 fc_horizon = 3)

# Plot Pareto-K values and ELPD estimates
plot(lfo_ar2)
plot(lfo_rw)

# Proportion of timepoints in which the AR2 model gives
# better forecasts
length(which((lfo_ar2$elpds - lfo_rw$elpds) > 0)) /
  length(lfo_ar2$elpds)

# A higher total ELPD is preferred
lfo_ar2$sum_ELPD
lfo_rw$sum_ELPD
}