Skip to contents

This function estimates the posterior distribution for Generalised Additive Models (GAMs) that can include smooth spline functions, specified in the GAM formula, as well as latent temporal processes, specified by trend_model. Further modelling options include State-Space representations to allow covariates and dynamic processes to occur on the latent 'State' level while also capturing observation-level effects. Prior specifications are flexible and explicitly encourage users to apply prior distributions that actually reflect their beliefs. In addition, model fits can easily be assessed and compared with posterior predictive checks, forecast comparisons and leave-one-out / leave-future-out cross-validation.

Usage

mvgam(
  formula,
  trend_formula,
  knots,
  trend_knots,
  trend_model = "None",
  noncentred = FALSE,
  family = poisson(),
  share_obs_params = FALSE,
  data,
  newdata,
  use_lv = FALSE,
  n_lv,
  trend_map,
  priors,
  run_model = TRUE,
  prior_simulation = FALSE,
  residuals = TRUE,
  return_model_data = FALSE,
  backend = getOption("brms.backend", "cmdstanr"),
  algorithm = getOption("brms.algorithm", "sampling"),
  control = list(max_treedepth = 10, adapt_delta = 0.8),
  chains = 4,
  burnin = 500,
  samples = 500,
  thin = 1,
  parallel = TRUE,
  threads = 1,
  save_all_pars = FALSE,
  silent = 1,
  autoformat = TRUE,
  refit = FALSE,
  lfo = FALSE,
  ...
)

Arguments

formula

A formula object specifying the GAM observation model formula. These are exactly like the formula for a GLM except that smooth terms, s(), te(), ti(), t2(), as well as time-varying dynamic() terms, nonparametric gp() terms and offsets using offset(), can be added to the right hand side to specify that the linear predictor depends on smooth functions of predictors (or linear functionals of these). In nmix() family models, the formula is used to set up a linear predictor for the detection probability. Details of the formula syntax used by mvgam can be found in mvgam_formulae

trend_formula

An optional formula object specifying the GAM process model formula. If supplied, a linear predictor will be modelled for the latent trends to capture process model evolution separately from the observation model. Should not have a response variable specified on the left-hand side of the formula (i.e. a valid option would be ~ season + s(year)). Also note that you should not use the identifier series in this formula to specify effects that vary across time series. Instead you should use trend. This will ensure that models in which a trend_map is supplied will still work consistently (i.e. by allowing effects to vary across process models, even when some time series share the same underlying process model). This feature is only currently available for RW(), AR() and VAR() trend models. In nmix() family models, the trend_formula is used to set up a linear predictor for the underlying latent abundance. Be aware that it can be very challenging to simultaneously estimate intercept parameters for both the observation mode (captured by formula) and the process model (captured by trend_formula). Users are recommended to drop one of these using the - 1 convention in the formula right hand side.

knots

An optional list containing user specified knot values to be used for basis construction. For most bases the user simply supplies the knots to be used, which must match up with the k value supplied (note that the number of knots is not always just k). Different terms can use different numbers of knots, unless they share a covariate

trend_knots

As for knots above, this is an optional list of knot values for smooth functions within the trend_formula

trend_model

character or function specifying the time series dynamics for the latent trend. Options are:

  • None (no latent trend component; i.e. the GAM component is all that contributes to the linear predictor, and the observation process is the only source of error; similarly to what is estimated by gam)

  • ZMVN or ZMVN() (Zero-Mean Multivariate Normal; only available in Stan)

  • 'RW' or RW()

  • 'AR1' or AR(p = 1)

  • 'AR2' or AR(p = 2)

  • 'AR3' or AR(p = 3)

  • 'CAR1' or CAR(p = 1)

  • 'VAR1' or VAR()(only available in Stan)

  • 'PWlogistic, 'PWlinear' or PW() (only available in Stan)

  • 'GP' or GP() (Gaussian Process with squared exponential kernel; only available in Stan)

For all trend types apart from ZMVN(), GP(), CAR() and PW(), moving average and/or correlated process error terms can also be estimated (for example, RW(cor = TRUE) will set up a multivariate Random Walk if n_series > 1). It is also possible for many multivariate trends to estimate hierarchical correlations if the data are structured among levels of a relevant grouping factor. See mvgam_trends for more details and see ZMVN for an example.

noncentred

logical Use the non-centred parameterisation for autoregressive trend models? Setting to TRUE will reparameterise the model to avoid possible degeneracies that can show up when estimating the latent dynamic random effects. For some models, this can produce big gains in efficiency, meaning that fewer burnin and sampling iterations are required for posterior exploration. But for other models, where the data are highly informative about the latent dynamic processes, this can actually lead to worse performance. Only available for certain trend models (i.e. RW(), AR(), or CAR(), or for trend = 'None' when using a trend_formula). Not yet available for moving average or correlated error models

family

family specifying the exponential observation family for the series. Currently supported families are:

  • gaussian() for real-valued data

  • betar() for proportional data on (0,1)

  • lognormal() for non-negative real-valued data

  • student_t() for real-valued data

  • Gamma() for non-negative real-valued data

  • bernoulli() for binary data

  • poisson() for count data

  • nb() for overdispersed count data

  • binomial() for count data with imperfect detection when the number of trials is known; note that the cbind() function must be used to bind the discrete observations and the discrete number of trials

  • beta_binomial() as for binomial() but allows for overdispersion

  • nmix() for count data with imperfect detection when the number of trials is unknown and should be modeled via a State-Space N-Mixture model. The latent states are Poisson, capturing the 'true' latent abundance, while the observation process is Binomial to account for imperfect detection. See mvgam_families for an example of how to use this family

Default is poisson(). See mvgam_families for more details

share_obs_params

logical. If TRUE and the family has additional family-specific observation parameters (e.g. variance components in student_t() or gaussian(), or dispersion parameters in nb() or betar()), these parameters will be shared across all outcome variables. This is handy if you have multiple outcomes (time series in most mvgam models) that you believe share some properties, such as being from the same species over different spatial units. Default is FALSE.

data

A dataframe or list containing the model response variable and covariates required by the GAM formula and optional trend_formula. Most models should include columns:

  • series (a factor index of the series IDs; the number of levels should be identical to the number of unique series labels (i.e. n_series = length(levels(data$series))))

  • time (numeric or integer index of the time point for each observation). For most dynamic trend types available in mvgam (see argument trend_model), time should be measured in discrete, regularly spaced intervals (i.e. c(1, 2, 3, ...)). However you can use irregularly spaced intervals if using trend_model = CAR(1), though note that any temporal intervals that are exactly 0 will be adjusted to a very small number (1e-12) to prevent sampling errors. See an example of CAR() trends in CAR

Note however that there are special cases where these identifiers are not needed. For example, models with hierarchical temporal correlation processes (e.g. AR(gr = region, subgr = species)) should NOT include a series identifier, as this will be constructed internally (see mvgam_trends and AR for details). mvgam can also fit models that do not include a time variable if there are no temporal dynamic structures included (i.e. trend_model = 'None' or trend_model = ZMVN()). data should also include any other variables to be included in the linear predictor of formula

newdata

Optional dataframe or list of test data containing the same variables as in data. If included, the observations in variable y will be set to NA when fitting the model so that posterior simulations can be obtained

use_lv

logical. If TRUE, use dynamic factors to estimate series' latent trends in a reduced dimension format. Only available for RW(), AR() and GP() trend models. Defaults to FALSE

n_lv

integer the number of latent dynamic factors to use if use_lv == TRUE. Cannot be > n_series. Defaults arbitrarily to min(2, floor(n_series / 2))

trend_map

Optional data.frame specifying which series should depend on which latent trends. Useful for allowing multiple series to depend on the same latent trend process, but with different observation processes. If supplied, a latent factor model is set up by setting use_lv = TRUE and using the mapping to set up the shared trends. Needs to have column names series and trend, with integer values in the trend column to state which trend each series should depend on. The series column should have a single unique entry for each series in the data (names should perfectly match factor levels of the series variable in data). Note that if this is supplied, the intercept parameter in the process model will NOT be automatically suppressed. See examples for details

priors

An optional data.frame with prior definitions or, preferentially, a vector containing objects of class brmsprior (see. prior for details). See get_mvgam_priors and Details' for more information on changing default prior distributions

run_model

logical. If FALSE, the model is not fitted but instead the function will return the model file and the data / initial values that are needed to fit the model outside of mvgam

prior_simulation

logical. If TRUE, no observations are fed to the model, and instead simulations from prior distributions are returned

residuals

Logical indicating whether to compute series-level randomized quantile residuals and include them as part of the returned object. Defaults to TRUE, but you can set to FALSE to save computational time and reduce the size of the returned object (users can always add residuals to an object of class mvgam using add_residuals)

return_model_data

logical. If TRUE, the list of data that is needed to fit the model is returned, along with the initial values for smooth and AR parameters, once the model is fitted. This will be helpful if users wish to modify the model file to add other stochastic elements that are not currently available in mvgam. Default is FALSE to reduce the size of the returned object, unless run_model == FALSE

backend

Character string naming the package to use as the backend for fitting the Stan model. Options are "cmdstanr" (the default) or "rstan". Can be set globally for the current R session via the "brms.backend" option (see options). Details on the rstan and cmdstanr packages are available at https://mc-stan.org/rstan/ and https://mc-stan.org/cmdstanr/, respectively

algorithm

Character string naming the estimation approach to use. Options are "sampling" for MCMC (the default), "meanfield" for variational inference with factorized normal distributions, "fullrank" for variational inference with a multivariate normal distribution, "laplace" for a Laplace approximation (only available when using cmdstanr as the backend) or "pathfinder" for the pathfinder algorithm (only currently available when using cmdstanr as the backend). Can be set globally for the current R session via the "brms.algorithm" option (see options). Limited testing suggests that "meanfield" performs best out of the non-MCMC approximations for dynamic GAMs, possibly because of the difficulties estimating covariances among the many spline parameters and latent trend parameters. But rigorous testing has not been carried out

control

A named list for controlling the sampler's behaviour. Currently only accepts settings for

chains

integer specifying the number of parallel chains for the model. Ignored if algorithm %in% c('meanfield', 'fullrank', 'pathfinder', 'laplace')

burnin

integer specifying the number of warmup iterations of the Markov chain to run to tune sampling algorithms. Ignored if algorithm %in% c('meanfield', 'fullrank', 'pathfinder', 'laplace')

samples

integer specifying the number of post-warmup iterations of the Markov chain to run for sampling the posterior distribution

thin

Thinning interval for monitors. Ignored if algorithm %in% c('meanfield', 'fullrank', 'pathfinder', 'laplace')

parallel

logical specifying whether multiple cores should be used for generating MCMC simulations in parallel. If TRUE, the number of cores to use will be min(c(chains, parallel::detectCores() - 1))

threads

integer Experimental option to use multithreading for within-chain parallelisation in Stan. We recommend its use only if you are experienced with Stan's reduce_sum function and have a slow running model that cannot be sped up by any other means. Currently works for all families apart from nmix() and when using Cmdstan as the backend

save_all_pars

Logical flag to indicate if draws from all variables defined in Stan's parameters block should be saved (default is FALSE).

silent

Verbosity level between 0 and 2. If 1 (the default), most of the informational messages of compiler and sampler are suppressed. If 2, even more messages are suppressed. The actual sampling progress is still printed. Set refresh = 0 to turn this off as well. If using backend = "rstan" you can also set open_progress = FALSE to prevent opening additional progress bars.

autoformat

Logical. Use the stanc parser to automatically format the Stan code and check for deprecations. Only for development purposes, so leave to TRUE

refit

Logical indicating whether this is a refit, called using update.mvgam. Users should leave as FALSE

lfo

Logical indicating whether this is part of a call to lfo_cv.mvgam. Returns a lighter version of the model with no residuals and fewer monitored parameters to speed up post-processing. But other downstream functions will not work properly, so users should always leave this set as FALSE

...

Further arguments passed to Stan. For backend = "rstan" the arguments are passed to sampling or vb. For backend = "cmdstanr" the arguments are passed to the cmdstanr::sample, cmdstanr::variational, cmdstanr::laplace or cmdstanr::pathfinder method

Value

A list object of class mvgam containing model output, the text representation of the model file, the mgcv model output (for easily generating simulations at unsampled covariate values), Dunn-Smyth residuals for each series and key information needed for other functions in the package. See mvgam-class for details. Use methods(class = "mvgam") for an overview on available methods.

Details

Dynamic GAMs are useful when we wish to predict future values from time series that show temporal dependence but we do not want to rely on extrapolating from a smooth term (which can sometimes lead to unpredictable and unrealistic behaviours). In addition, smooths can often try to wiggle excessively to capture any autocorrelation that is present in a time series, which exacerbates the problem of forecasting ahead. As GAMs are very naturally viewed through a Bayesian lens, and we often must model time series that show complex distributional features and missing data, parameters for mvgam models are estimated in a Bayesian framework using Markov Chain Monte Carlo by default. A general overview is provided in the primary vignettes: vignette("mvgam_overview") and vignette("data_in_mvgam"). For a full list of available vignettes see vignette(package = "mvgam")

Formula syntax: Details of the formula syntax used by mvgam can be found in mvgam_formulae. Note that it is possible to supply an empty formula where there are no predictors or intercepts in the observation model (i.e. y ~ 0 or y ~ -1). In this case, an intercept-only observation model will be set up but the intercept coefficient will be fixed at zero. This can be handy if you wish to fit pure State-Space models where the variation in the dynamic trend controls the average expectation, and/or where intercepts are non-identifiable (as in piecewise trends, see examples below)

Families and link functions: Details of families supported by mvgam can be found in mvgam_families.

Trend models: Details of latent error process models supported by mvgam can be found in mvgam_trends.

Priors: Default priors for intercepts and any scale parameters are generated using the same practice as brms. Prior distributions for most important model parameters can be altered by the user to inspect model sensitivities to given priors (see get_mvgam_priors for details). Note that latent trends are estimated on the link scale so choose priors accordingly. However more control over the model specification can be accomplished by first using mvgam as a baseline, then editing the returned model accordingly. The model file can be edited and run outside of mvgam by setting run_model = FALSE and this is encouraged for complex modelling tasks. Note, no priors are formally checked to ensure they are in the right syntax so it is up to the user to ensure these are correct

Random effects: For any smooth terms using the random effect basis (smooth.construct.re.smooth.spec), a non-centred parameterisation is automatically employed to avoid degeneracies that are common in hierarchical models. Note however that centred versions may perform better for series that are particularly informative, so as with any foray into Bayesian modelling, it is worth building an understanding of the model's assumptions and limitations by following a principled workflow. Also note that models are parameterised using drop.unused.levels = FALSE in jagam to ensure predictions can be made for all levels of the supplied factor variable

Observation level parameters: When more than one series is included in data and an observation family that contains more than one parameter is used, additional observation family parameters (i.e. phi for nb() or sigma for gaussian()) are by default estimated independently for each series. But if you wish for the series to share the same observation parameters, set share_obs_params = TRUE

Residuals: For each series, randomized quantile (i.e. Dunn-Smyth) residuals are calculated for inspecting model diagnostics If the fitted model is appropriate then Dunn-Smyth residuals will be standard normal in distribution and no autocorrelation will be evident. When a particular observation is missing, the residual is calculated by comparing independent draws from the model's posterior distribution

Using Stan: mvgam is primarily designed to use Hamiltonian Monte Carlo for parameter estimation via the software Stan (using either the cmdstanr or rstan interface). There are great advantages when using Stan over Gibbs / Metropolis Hastings samplers, which includes the option to estimate nonlinear effects via Hilbert space approximate Gaussian Processes, the availability of a variety of inference algorithms (i.e. variational inference, laplacian inference etc...) and capabilities to enforce stationarity for complex Vector Autoregressions. Because of the many advantages of Stan over JAGS, further development of the package will only be applied to Stan. This includes the planned addition of more response distributions, plans to handle zero-inflation, and plans to incorporate a greater variety of trend models. Users are strongly encouraged to opt for Stan over JAGS in any proceeding workflows

How to start?: The mvgam cheatsheet is a good starting place if you are just learning to use the package. It gives an overview of the package's key functions and objects, as well as providing a reasonable workflow that new users can follow. In general it is recommended to

References

Nicholas J Clark & Konstans Wells (2020). Dynamic generalised additive models (DGAMs) for forecasting discrete ecological time series. Methods in Ecology and Evolution. 14:3, 771-784.

Author

Nicholas J Clark

Examples

# \donttest{
# Simulate a collection of three time series that have shared seasonal dynamics
# and independent AR1 trends, with a Poisson observation process
set.seed(0)
dat <- sim_mvgam(T = 80,
                n_series = 3,
                mu = 2,
                trend_model = AR(p = 1),
                prop_missing = 0.1,
                prop_trend = 0.6)

# Plot key summary statistics for a single series
plot_mvgam_series(data = dat$data_train, series = 1)


# Plot all series together
plot_mvgam_series(data = dat$data_train, series = 'all')


# Formulate a model using Stan where series share a cyclic smooth for
# seasonality and each series has an independent AR1 temporal process.
# Note that 'noncentred = TRUE' will likely give performance gains.
# Set run_model = FALSE to inspect the returned objects
mod1 <- mvgam(formula = y ~ s(season, bs = 'cc', k = 6),
             data = dat$data_train,
             trend_model = AR(),
             family = poisson(),
             noncentred = TRUE,
             run_model = FALSE)

# View the model code in Stan language
stancode(mod1)
#> // Stan model code generated by package mvgam
#> data {
#>   int<lower=0> total_obs; // total number of observations
#>   int<lower=0> n; // number of timepoints per series
#>   int<lower=0> n_sp; // number of smoothing parameters
#>   int<lower=0> n_series; // number of series
#>   int<lower=0> num_basis; // total number of basis coefficients
#>   vector[num_basis] zero; // prior locations for basis coefficients
#>   matrix[total_obs, num_basis] X; // mgcv GAM design matrix
#>   array[n, n_series] int<lower=0> ytimes; // time-ordered matrix (which col in X belongs to each [time, series] observation?)
#>   matrix[4, 4] S1; // mgcv smooth penalty matrix S1
#>   int<lower=0> n_nonmissing; // number of nonmissing observations
#>   array[n_nonmissing] int<lower=0> flat_ys; // flattened nonmissing observations
#>   matrix[n_nonmissing, num_basis] flat_xs; // X values for nonmissing observations
#>   array[n_nonmissing] int<lower=0> obs_ind; // indices of nonmissing observations
#> }
#> parameters {
#>   // raw basis coefficients
#>   vector[num_basis] b_raw;
#>   
#>   // latent trend AR1 terms
#>   vector<lower=-1, upper=1>[n_series] ar1;
#>   
#>   // latent trend variance parameters
#>   vector<lower=0>[n_series] sigma;
#>   
#>   // raw latent trends
#>   matrix[n, n_series] trend_raw;
#>   
#>   // smoothing parameters
#>   vector<lower=0>[n_sp] lambda;
#> }
#> transformed parameters {
#>   // basis coefficients
#>   vector[num_basis] b;
#>   
#>   // latent trends
#>   matrix[n, n_series] trend;
#>   trend = trend_raw .* rep_matrix(sigma', rows(trend_raw));
#>   for (s in 1 : n_series) {
#>     trend[2 : n, s] += ar1[s] * trend[1 : (n - 1), s];
#>   }
#>   b[1 : num_basis] = b_raw[1 : num_basis];
#> }
#> model {
#>   // prior for (Intercept)...
#>   b_raw[1] ~ student_t(3, 1.9, 2.5);
#>   
#>   // prior for s(season)...
#>   b_raw[2 : 5] ~ multi_normal_prec(zero[2 : 5], S1[1 : 4, 1 : 4] * lambda[1]);
#>   
#>   // priors for AR parameters
#>   ar1 ~ std_normal();
#>   
#>   // priors for smoothing parameters
#>   lambda ~ normal(5, 30);
#>   
#>   // priors for latent trend variance parameters
#>   sigma ~ student_t(3, 0, 2.5);
#>   to_vector(trend_raw) ~ std_normal();
#>   {
#>     // likelihood functions
#>     vector[n_nonmissing] flat_trends;
#>     flat_trends = to_vector(trend)[obs_ind];
#>     flat_ys ~ poisson_log_glm(append_col(flat_xs, flat_trends), 0.0,
#>                               append_row(b, 1.0));
#>   }
#> }
#> generated quantities {
#>   vector[total_obs] eta;
#>   matrix[n, n_series] mus;
#>   vector[n_sp] rho;
#>   vector[n_series] tau;
#>   array[n, n_series] int ypred;
#>   rho = log(lambda);
#>   for (s in 1 : n_series) {
#>     tau[s] = pow(sigma[s], -2.0);
#>   }
#>   
#>   // posterior predictions
#>   eta = X * b;
#>   for (s in 1 : n_series) {
#>     mus[1 : n, s] = eta[ytimes[1 : n, s]] + trend[1 : n, s];
#>     ypred[1 : n, s] = poisson_log_rng(mus[1 : n, s]);
#>   }
#> }
#> 
#> 

# View the data objects needed to fit the model in Stan
sdata1 <- standata(mod1)
str(sdata1)
#> List of 18
#>  $ y           : num [1:60, 1:3] 4 5 7 39 51 26 6 6 4 2 ...
#>  $ n           : int 60
#>  $ X           : num [1:180, 1:5] 1 1 1 1 1 1 1 1 1 1 ...
#>   ..- attr(*, "dimnames")=List of 2
#>   .. ..$ : NULL
#>   .. ..$ : chr [1:5] "X.Intercept." "V2" "V3" "V4" ...
#>  $ S1          : num [1:4, 1:4] 1.244 -0.397 0.384 0.619 -0.397 ...
#>  $ zero        : num [1:5] 0 0 0 0 0
#>  $ p_coefs     : Named num 0
#>   ..- attr(*, "names")= chr "(Intercept)"
#>  $ p_taus      : num 0.853
#>  $ ytimes      : int [1:60, 1:3] 1 4 7 10 13 16 19 22 25 28 ...
#>  $ n_series    : int 3
#>  $ sp          : Named num 0.368
#>   ..- attr(*, "names")= chr "s(season)"
#>  $ y_observed  : num [1:60, 1:3] 1 1 1 1 1 1 1 1 1 1 ...
#>  $ total_obs   : int 180
#>  $ num_basis   : int 5
#>  $ n_sp        : num 1
#>  $ n_nonmissing: int 164
#>  $ obs_ind     : int [1:164] 1 2 3 4 5 6 7 8 9 10 ...
#>  $ flat_ys     : num [1:164] 4 5 7 39 51 26 6 6 4 2 ...
#>  $ flat_xs     : num [1:164, 1:5] 1 1 1 1 1 1 1 1 1 1 ...
#>   ..- attr(*, "dimnames")=List of 2
#>   .. ..$ : NULL
#>   .. ..$ : chr [1:5] "X.Intercept." "V2" "V3" "V4" ...
#>  - attr(*, "trend_model")= chr "AR1"

# Now fit the model
mod1 <- mvgam(formula = y ~ s(season, bs = 'cc', k = 6),
              data = dat$data_train,
              trend_model = AR(),
              family = poisson(),
              noncentred = TRUE,
              chains = 2)
#> Compiling Stan program using cmdstanr
#> 
#> Start sampling
#> Running MCMC with 2 parallel chains...
#> 
#> Chain 1 Iteration:   1 / 1000 [  0%]  (Warmup) 
#> Chain 2 Iteration:   1 / 1000 [  0%]  (Warmup) 
#> Chain 1 Iteration: 100 / 1000 [ 10%]  (Warmup) 
#> Chain 2 Iteration: 100 / 1000 [ 10%]  (Warmup) 
#> Chain 1 Iteration: 200 / 1000 [ 20%]  (Warmup) 
#> Chain 2 Iteration: 200 / 1000 [ 20%]  (Warmup) 
#> Chain 2 Iteration: 300 / 1000 [ 30%]  (Warmup) 
#> Chain 1 Iteration: 300 / 1000 [ 30%]  (Warmup) 
#> Chain 2 Iteration: 400 / 1000 [ 40%]  (Warmup) 
#> Chain 1 Iteration: 400 / 1000 [ 40%]  (Warmup) 
#> Chain 2 Iteration: 500 / 1000 [ 50%]  (Warmup) 
#> Chain 2 Iteration: 501 / 1000 [ 50%]  (Sampling) 
#> Chain 1 Iteration: 500 / 1000 [ 50%]  (Warmup) 
#> Chain 1 Iteration: 501 / 1000 [ 50%]  (Sampling) 
#> Chain 2 Iteration: 600 / 1000 [ 60%]  (Sampling) 
#> Chain 1 Iteration: 600 / 1000 [ 60%]  (Sampling) 
#> Chain 2 Iteration: 700 / 1000 [ 70%]  (Sampling) 
#> Chain 1 Iteration: 700 / 1000 [ 70%]  (Sampling) 
#> Chain 1 Iteration: 800 / 1000 [ 80%]  (Sampling) 
#> Chain 2 Iteration: 800 / 1000 [ 80%]  (Sampling) 
#> Chain 2 Iteration: 900 / 1000 [ 90%]  (Sampling) 
#> Chain 1 Iteration: 900 / 1000 [ 90%]  (Sampling) 
#> Chain 2 Iteration: 1000 / 1000 [100%]  (Sampling) 
#> Chain 2 finished in 1.9 seconds.
#> Chain 1 Iteration: 1000 / 1000 [100%]  (Sampling) 
#> Chain 1 finished in 2.2 seconds.
#> 
#> Both chains finished successfully.
#> Mean chain execution time: 2.0 seconds.
#> Total execution time: 2.5 seconds.
#> 

# Extract the model summary
summary(mod1)
#> GAM formula:
#> y ~ s(season, bs = "cc", k = 6)
#> <environment: 0x0000025921362870>
#> 
#> Family:
#> poisson
#> 
#> Link function:
#> log
#> 
#> Trend model:
#> AR()
#> 
#> 
#> N series:
#> 3 
#> 
#> N timepoints:
#> 60 
#> 
#> Status:
#> Fitted using Stan 
#> 2 chains, each with iter = 1000; warmup = 500; thin = 1 
#> Total post-warmup draws = 1000
#> 
#> 
#> GAM coefficient (beta) estimates:
#>               2.5%   50% 97.5% Rhat n_eff
#> (Intercept)  1.900  2.00  2.10    1   704
#> s(season).1  0.047  0.30  0.52    1   586
#> s(season).2  0.590  0.82  1.10    1   491
#> s(season).3 -0.045  0.18  0.42    1   676
#> s(season).4 -0.660 -0.42 -0.18    1   698
#> 
#> Approximate significance of GAM smooths:
#>           edf Ref.df Chi.sq p-value    
#> s(season) 3.8      4   35.4  <2e-16 ***
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Latent trend parameter AR estimates:
#>           2.5%   50% 97.5% Rhat n_eff
#> ar1[1]    0.33  0.71 0.980 1.00   335
#> ar1[2]   -0.96 -0.41 0.032 1.01   197
#> ar1[3]    0.21  0.69 0.970 1.00   279
#> sigma[1]  0.41  0.56 0.770 1.01   413
#> sigma[2]  0.33  0.49 0.700 1.00   338
#> sigma[3]  0.37  0.51 0.700 1.00   337
#> 
#> Stan MCMC diagnostics:
#> n_eff / iter looks reasonable for all parameters
#> Rhat looks reasonable for all parameters
#> 0 of 1000 iterations ended with a divergence (0%)
#> 0 of 1000 iterations saturated the maximum tree depth of 10 (0%)
#> E-FMI indicated no pathological behavior
#> 
#> Samples were drawn using NUTS(diag_e) at Fri Nov 22 7:55:50 AM 2024.
#> For each parameter, n_eff is a crude measure of effective sample size,
#> and Rhat is the potential scale reduction factor on split MCMC chains
#> (at convergence, Rhat = 1)

# Plot the estimated historical trend and forecast for one series
plot(mod1, type = 'trend', series = 1)

plot(mod1, type = 'forecast', series = 1)


# Residual diagnostics
plot(mod1, type = 'residuals', series = 1)

resids <- residuals(mod1)
str(resids)
#>  num [1:180, 1:4] -0.152 NaN -0.123 0.295 -0.862 ...
#>  - attr(*, "dimnames")=List of 2
#>   ..$ : NULL
#>   ..$ : chr [1:4] "Estimate" "Est.Error" "Q2.5" "Q97.5"

# Fitted values and residuals can also be added to training data
augment(mod1)
#> # A tibble: 180 × 14
#>        y season  year series    time .observed .fitted .fit.variability
#>    <int>  <int> <int> <fct>    <int>     <int>   <dbl>            <dbl>
#>  1     4      1     1 series_1     1         4    4.67             1.59
#>  2    NA      1     1 series_2     1        NA    6.76             3.47
#>  3     4      1     1 series_3     1         4    4.56             1.50
#>  4     5      2     1 series_1     2         5    4.72             1.72
#>  5     2      2     1 series_2     2         2    3.87             1.43
#>  6    NA      2     1 series_3     2        NA    5.10             2.75
#>  7     7      3     1 series_1     3         7    8.53             2.39
#>  8    12      3     1 series_2     3        12   11.4              2.88
#>  9     4      3     1 series_3     3         4    5.34             1.81
#> 10    39      4     1 series_1     4        39   35.9              5.66
#> # ℹ 170 more rows
#> # ℹ 6 more variables: .fit.cred.low <dbl>, .fit.cred.high <dbl>, .resid <dbl>,
#> #   .resid.variability <dbl>, .resid.cred.low <dbl>, .resid.cred.high <dbl>

# Compute the forecast using covariate information in data_test
fc <- forecast(mod1, newdata = dat$data_test)
str(fc)
#> List of 16
#>  $ call              :Class 'formula'  language y ~ s(season, bs = "cc", k = 6)
#>   .. ..- attr(*, ".Environment")=<environment: 0x0000025921362870> 
#>  $ trend_call        : NULL
#>  $ family            : chr "poisson"
#>  $ family_pars       : NULL
#>  $ trend_model       :List of 7
#>   ..$ trend_model: chr "AR1"
#>   ..$ ma         : logi FALSE
#>   ..$ cor        : logi FALSE
#>   ..$ unit       : chr "time"
#>   ..$ gr         : chr "NA"
#>   ..$ subgr      : chr "series"
#>   ..$ label      : language AR()
#>   ..- attr(*, "class")= chr "mvgam_trend"
#>  $ drift             : logi FALSE
#>  $ use_lv            : logi FALSE
#>  $ fit_engine        : chr "stan"
#>  $ type              : chr "response"
#>  $ series_names      : Factor w/ 3 levels "series_1","series_2",..: 1 2 3
#>  $ train_observations:List of 3
#>   ..$ series_1: int [1:60] 4 5 7 39 51 26 6 6 4 2 ...
#>   ..$ series_2: int [1:60] NA 2 12 16 6 31 9 15 5 3 ...
#>   ..$ series_3: int [1:60] 4 NA 4 NA NA 16 7 7 3 NA ...
#>  $ train_times       : int [1:60] 1 2 3 4 5 6 7 8 9 10 ...
#>  $ test_observations :List of 3
#>   ..$ series_1: int [1:20] 1 NA NA 13 18 20 16 6 NA 4 ...
#>   ..$ series_2: int [1:20] 4 36 8 6 7 NA NA 1 6 4 ...
#>   ..$ series_3: int [1:20] 6 8 5 5 19 14 1 1 7 0 ...
#>  $ test_times        : int [1:20] 61 62 63 64 65 66 67 68 69 70 ...
#>  $ hindcasts         :List of 3
#>   ..$ series_1: num [1:1000, 1:60] 0 3 5 2 7 7 7 4 7 3 ...
#>   .. ..- attr(*, "dimnames")=List of 2
#>   .. .. ..$ : NULL
#>   .. .. ..$ : chr [1:60] "ypred[1,1]" "ypred[2,1]" "ypred[3,1]" "ypred[4,1]" ...
#>   ..$ series_2: num [1:1000, 1:60] 13 3 18 5 7 6 20 0 3 3 ...
#>   .. ..- attr(*, "dimnames")=List of 2
#>   .. .. ..$ : NULL
#>   .. .. ..$ : chr [1:60] "ypred[1,2]" "ypred[2,2]" "ypred[3,2]" "ypred[4,2]" ...
#>   ..$ series_3: num [1:1000, 1:60] 8 3 3 7 12 6 5 3 3 9 ...
#>   .. ..- attr(*, "dimnames")=List of 2
#>   .. .. ..$ : NULL
#>   .. .. ..$ : chr [1:60] "ypred[1,3]" "ypred[2,3]" "ypred[3,3]" "ypred[4,3]" ...
#>  $ forecasts         :List of 3
#>   ..$ series_1: int [1:1000, 1:20] 8 2 1 11 5 5 3 7 16 4 ...
#>   ..$ series_2: int [1:1000, 1:20] 5 2 3 5 28 24 9 7 3 7 ...
#>   ..$ series_3: int [1:1000, 1:20] 4 12 15 4 5 13 10 1 4 8 ...
#>  - attr(*, "class")= chr "mvgam_forecast"
plot(fc)
#> Out of sample DRPS:
#> 57.690845


# Plot the estimated seasonal smooth function
plot(mod1, type = 'smooths')


# Plot estimated first derivatives of the smooth
plot(mod1, type = 'smooths', derivatives = TRUE)


# Plot partial residuals of the smooth
plot(mod1, type = 'smooths', residuals = TRUE)


# Plot posterior realisations for the smooth
plot(mod1, type = 'smooths', realisations = TRUE)


# Plot conditional response predictions using marginaleffects
library(marginaleffects)
conditional_effects(mod1)

plot_predictions(mod1, condition = 'season', points = 0.5)
#> Warning: Removed 16 rows containing missing values or values outside the scale range
#> (`geom_point()`).


# Generate posterior predictive checks using bayesplot
pp_check(mod1)
#> Using 10 posterior draws for ppc type 'dens_overlay' by default.
#> Warning: NA responses are not shown in 'pp_check'.


# Extract observation model beta coefficient draws as a data.frame
beta_draws_df <- as.data.frame(mod1, variable = 'betas')
head(beta_draws_df)
#>   (Intercept) s(season).1 s(season).2 s(season).3 s(season).4
#> 1     2.03254    0.299574    0.729872 -0.08774370   -0.152966
#> 2     1.93907    0.389768    0.775010  0.17542300   -0.338085
#> 3     1.96962    0.373643    0.756057  0.14505700   -0.108823
#> 4     1.99381    0.293762    0.869831  0.10669500   -0.353948
#> 5     2.02935    0.156223    0.861661 -0.00555928   -0.397319
#> 6     1.92826    0.288386    0.894447  0.07278630   -0.402713
str(beta_draws_df)
#> 'data.frame':	1000 obs. of  5 variables:
#>  $ (Intercept): num  2.03 1.94 1.97 1.99 2.03 ...
#>  $ s(season).1: num  0.3 0.39 0.374 0.294 0.156 ...
#>  $ s(season).2: num  0.73 0.775 0.756 0.87 0.862 ...
#>  $ s(season).3: num  -0.08774 0.17542 0.14506 0.10669 -0.00556 ...
#>  $ s(season).4: num  -0.153 -0.338 -0.109 -0.354 -0.397 ...

# Investigate model fit
mc.cores.def <- getOption('mc.cores')
options(mc.cores = 1)
loo(mod1)
#> Warning: Some Pareto k diagnostic values are too high. See help('pareto-k-diagnostic') for details.
#> 
#> Computed from 1000 by 164 log-likelihood matrix.
#> 
#>          Estimate   SE
#> elpd_loo   -462.1  9.0
#> p_loo        91.9  5.0
#> looic       924.3 18.0
#> ------
#> MCSE of elpd_loo is NA.
#> MCSE and ESS estimates assume MCMC draws (r_eff in [0.4, 1.9]).
#> 
#> Pareto k diagnostic values:
#>                           Count Pct.    Min. ESS
#> (-Inf, 0.67]   (good)     73    44.5%   50      
#>    (0.67, 1]   (bad)      82    50.0%   <NA>    
#>     (1, Inf)   (very bad)  9     5.5%   <NA>    
#> See help('pareto-k-diagnostic') for details.
options(mc.cores = mc.cores.def)


# Example of supplying a trend_map so that some series can share
# latent trend processes
sim <- sim_mvgam(n_series = 3)
mod_data <- sim$data_train

# Here, we specify only two latent trends; series 1 and 2 share a trend,
# while series 3 has it's own unique latent trend
trend_map <- data.frame(series = unique(mod_data$series),
                       trend = c(1, 1, 2))

# Fit the model using AR1 trends
mod <- mvgam(y ~ s(season, bs = 'cc', k = 6),
             trend_map = trend_map,
             trend_model = AR(),
             data = mod_data,
             return_model_data = TRUE,
             chains = 2)
#> Compiling Stan program using cmdstanr
#> 
#> Start sampling
#> Running MCMC with 2 parallel chains...
#> 
#> Chain 1 Iteration:   1 / 1000 [  0%]  (Warmup) 
#> Chain 2 Iteration:   1 / 1000 [  0%]  (Warmup) 
#> Chain 1 Iteration: 100 / 1000 [ 10%]  (Warmup) 
#> Chain 2 Iteration: 100 / 1000 [ 10%]  (Warmup) 
#> Chain 2 Iteration: 200 / 1000 [ 20%]  (Warmup) 
#> Chain 1 Iteration: 200 / 1000 [ 20%]  (Warmup) 
#> Chain 2 Iteration: 300 / 1000 [ 30%]  (Warmup) 
#> Chain 2 Iteration: 400 / 1000 [ 40%]  (Warmup) 
#> Chain 1 Iteration: 300 / 1000 [ 30%]  (Warmup) 
#> Chain 2 Iteration: 500 / 1000 [ 50%]  (Warmup) 
#> Chain 2 Iteration: 501 / 1000 [ 50%]  (Sampling) 
#> Chain 1 Iteration: 400 / 1000 [ 40%]  (Warmup) 
#> Chain 2 Iteration: 600 / 1000 [ 60%]  (Sampling) 
#> Chain 1 Iteration: 500 / 1000 [ 50%]  (Warmup) 
#> Chain 1 Iteration: 501 / 1000 [ 50%]  (Sampling) 
#> Chain 2 Iteration: 700 / 1000 [ 70%]  (Sampling) 
#> Chain 1 Iteration: 600 / 1000 [ 60%]  (Sampling) 
#> Chain 2 Iteration: 800 / 1000 [ 80%]  (Sampling) 
#> Chain 1 Iteration: 700 / 1000 [ 70%]  (Sampling) 
#> Chain 2 Iteration: 900 / 1000 [ 90%]  (Sampling) 
#> Chain 1 Iteration: 800 / 1000 [ 80%]  (Sampling) 
#> Chain 2 Iteration: 1000 / 1000 [100%]  (Sampling) 
#> Chain 2 finished in 2.4 seconds.
#> Chain 1 Iteration: 900 / 1000 [ 90%]  (Sampling) 
#> Chain 1 Iteration: 1000 / 1000 [100%]  (Sampling) 
#> Chain 1 finished in 3.0 seconds.
#> 
#> Both chains finished successfully.
#> Mean chain execution time: 2.7 seconds.
#> Total execution time: 3.3 seconds.
#> 

# The mapping matrix is now supplied as data to the model in the 'Z' element
mod$model_data$Z
#>      [,1] [,2]
#> [1,]    1    0
#> [2,]    1    0
#> [3,]    0    1
code(mod)
#> // Stan model code generated by package mvgam
#> data {
#>   int<lower=0> total_obs; // total number of observations
#>   int<lower=0> n; // number of timepoints per series
#>   int<lower=0> n_lv; // number of dynamic factors
#>   int<lower=0> n_sp; // number of smoothing parameters
#>   int<lower=0> n_series; // number of series
#>   matrix[n_series, n_lv] Z; // matrix mapping series to latent trends
#>   int<lower=0> num_basis; // total number of basis coefficients
#>   vector[num_basis] zero; // prior locations for basis coefficients
#>   matrix[total_obs, num_basis] X; // mgcv GAM design matrix
#>   array[n, n_series] int<lower=0> ytimes; // time-ordered matrix (which col in X belongs to each [time, series] observation?)
#>   matrix[4, 4] S1; // mgcv smooth penalty matrix S1
#>   int<lower=0> n_nonmissing; // number of nonmissing observations
#>   array[n_nonmissing] int<lower=0> flat_ys; // flattened nonmissing observations
#>   matrix[n_nonmissing, num_basis] flat_xs; // X values for nonmissing observations
#>   array[n_nonmissing] int<lower=0> obs_ind; // indices of nonmissing observations
#> }
#> transformed data {
#>   
#> }
#> parameters {
#>   // raw basis coefficients
#>   vector[num_basis] b_raw;
#>   
#>   // latent factor SD terms
#>   vector<lower=0>[n_lv] sigma;
#>   
#>   // latent factor AR1 terms
#>   vector<lower=-1, upper=1>[n_lv] ar1;
#>   
#>   // dynamic factors
#>   matrix[n, n_lv] LV;
#>   
#>   // smoothing parameters
#>   vector<lower=0>[n_sp] lambda;
#> }
#> transformed parameters {
#>   // trends and dynamic factor loading matrix
#>   matrix[n, n_series] trend;
#>   
#>   // basis coefficients
#>   vector[num_basis] b;
#>   b[1 : num_basis] = b_raw[1 : num_basis];
#>   
#>   // derived latent trends
#>   for (i in 1 : n) {
#>     for (s in 1 : n_series) {
#>       trend[i, s] = dot_product(Z[s,  : ], LV[i,  : ]);
#>     }
#>   }
#> }
#> model {
#>   // prior for (Intercept)...
#>   b_raw[1] ~ student_t(3, 0, 2.5);
#>   
#>   // prior for s(season)...
#>   b_raw[2 : 5] ~ multi_normal_prec(zero[2 : 5], S1[1 : 4, 1 : 4] * lambda[1]);
#>   
#>   // priors for AR parameters
#>   ar1 ~ std_normal();
#>   
#>   // priors for smoothing parameters
#>   lambda ~ normal(5, 30);
#>   
#>   // priors for factor SD parameters
#>   sigma ~ student_t(3, 0, 2.5);
#>   
#>   // dynamic factor estimates
#>   LV[1, 1 : n_lv] ~ normal(0, sigma);
#>   for (j in 1 : n_lv) {
#>     LV[2 : n, j] ~ normal(ar1[j] * LV[1 : (n - 1), j], sigma[j]);
#>   }
#>   {
#>     // likelihood functions
#>     vector[n_nonmissing] flat_trends;
#>     flat_trends = to_vector(trend)[obs_ind];
#>     flat_ys ~ poisson_log_glm(append_col(flat_xs, flat_trends), 0.0,
#>                               append_row(b, 1.0));
#>   }
#> }
#> generated quantities {
#>   vector[total_obs] eta;
#>   matrix[n, n_series] mus;
#>   vector[n_sp] rho;
#>   vector[n_lv] penalty;
#>   array[n, n_series] int ypred;
#>   rho = log(lambda);
#>   penalty = 1.0 / (sigma .* sigma);
#>   
#>   matrix[n_series, n_lv] lv_coefs = Z;
#>   // posterior predictions
#>   eta = X * b;
#>   for (s in 1 : n_series) {
#>     mus[1 : n, s] = eta[ytimes[1 : n, s]] + trend[1 : n, s];
#>     ypred[1 : n, s] = poisson_log_rng(mus[1 : n, s]);
#>   }
#> }
#> 
#> 

# The first two series share an identical latent trend; the third is different
plot(mod, type = 'trend', series = 1)

plot(mod, type = 'trend', series = 2)

plot(mod, type = 'trend', series = 3)



# Example of how to use dynamic coefficients
# Simulate a time-varying coefficient for the effect of temperature
set.seed(123)
N <- 200
beta_temp <- vector(length = N)
beta_temp[1] <- 0.4
for(i in 2:N){
 beta_temp[i] <- rnorm(1, mean = beta_temp[i - 1] - 0.0025, sd = 0.05)
}
plot(beta_temp)


# Simulate a covariate called 'temp'
temp <- rnorm(N, sd = 1)

# Simulate some noisy Gaussian observations
out <- rnorm(N, mean = 4 + beta_temp * temp,
             sd = 0.5)

# Gather necessary data into a data.frame; split into training / testing
data = data.frame(out, temp, time = seq_along(temp))
data_train <- data[1:180,]
data_test <- data[181:200,]

# Fit the model using the dynamic() formula helper
mod <- mvgam(out ~
              dynamic(temp,
                      scale = FALSE,
                      k = 40),
             family = gaussian(),
             data = data_train,
             newdata = data_test,
             chains = 2)
#> Warning: gp effects in mvgam cannot yet handle autogrouping
#> resetting all instances of 'gr = TRUE' to 'gr = FALSE'
#> This warning is displayed once per session.
#> Compiling Stan program using cmdstanr
#> 
#> Start sampling
#> Running MCMC with 2 parallel chains...
#> 
#> Chain 1 Iteration:   1 / 1000 [  0%]  (Warmup) 
#> Chain 2 Iteration:   1 / 1000 [  0%]  (Warmup) 
#> Chain 1 Iteration: 100 / 1000 [ 10%]  (Warmup) 
#> Chain 1 Iteration: 200 / 1000 [ 20%]  (Warmup) 
#> Chain 2 Iteration: 100 / 1000 [ 10%]  (Warmup) 
#> Chain 1 Iteration: 300 / 1000 [ 30%]  (Warmup) 
#> Chain 1 Iteration: 400 / 1000 [ 40%]  (Warmup) 
#> Chain 2 Iteration: 200 / 1000 [ 20%]  (Warmup) 
#> Chain 1 Iteration: 500 / 1000 [ 50%]  (Warmup) 
#> Chain 1 Iteration: 501 / 1000 [ 50%]  (Sampling) 
#> Chain 2 Iteration: 300 / 1000 [ 30%]  (Warmup) 
#> Chain 1 Iteration: 600 / 1000 [ 60%]  (Sampling) 
#> Chain 2 Iteration: 400 / 1000 [ 40%]  (Warmup) 
#> Chain 2 Iteration: 500 / 1000 [ 50%]  (Warmup) 
#> Chain 2 Iteration: 501 / 1000 [ 50%]  (Sampling) 
#> Chain 1 Iteration: 700 / 1000 [ 70%]  (Sampling) 
#> Chain 2 Iteration: 600 / 1000 [ 60%]  (Sampling) 
#> Chain 1 Iteration: 800 / 1000 [ 80%]  (Sampling) 
#> Chain 2 Iteration: 700 / 1000 [ 70%]  (Sampling) 
#> Chain 1 Iteration: 900 / 1000 [ 90%]  (Sampling) 
#> Chain 2 Iteration: 800 / 1000 [ 80%]  (Sampling) 
#> Chain 1 Iteration: 1000 / 1000 [100%]  (Sampling) 
#> Chain 2 Iteration: 900 / 1000 [ 90%]  (Sampling) 
#> Chain 1 finished in 1.3 seconds.
#> Chain 2 Iteration: 1000 / 1000 [100%]  (Sampling) 
#> Chain 2 finished in 1.4 seconds.
#> 
#> Both chains finished successfully.
#> Mean chain execution time: 1.4 seconds.
#> Total execution time: 1.8 seconds.
#> 

# Inspect the model summary, forecast and time-varying coefficient distribution
summary(mod)
#> GAM formula:
#> out ~ gp(time, by = temp, c = 5/4, k = 40, scale = FALSE)
#> <environment: 0x0000025921362870>
#> 
#> Family:
#> gaussian
#> 
#> Link function:
#> identity
#> 
#> Trend model:
#> None
#> 
#> N series:
#> 1 
#> 
#> N timepoints:
#> 200 
#> 
#> Status:
#> Fitted using Stan 
#> 2 chains, each with iter = 1000; warmup = 500; thin = 1 
#> Total post-warmup draws = 1000
#> 
#> 
#> Observation error parameter estimates:
#>              2.5%  50% 97.5% Rhat n_eff
#> sigma_obs[1] 0.44 0.49  0.55    1  1879
#> 
#> GAM coefficient (beta) estimates:
#>                    2.5%      50% 97.5% Rhat n_eff
#> (Intercept)       4.000  4.0e+00 4.100 1.00  2559
#> gp(time):temp.1   0.380  3.1e+00 6.400 1.00  1090
#> gp(time):temp.2  -3.100  1.6e+00 6.800 1.00   948
#> gp(time):temp.3  -5.700 -1.3e+00 3.500 1.00  1072
#> gp(time):temp.4  -5.600 -1.5e+00 2.500 1.00  1183
#> gp(time):temp.5  -3.700  1.7e-01 3.500 1.00  1228
#> gp(time):temp.6  -2.300  6.0e-01 4.400 1.00  1211
#> gp(time):temp.7  -3.300 -2.7e-01 2.700 1.00  1416
#> gp(time):temp.8  -2.600 -1.3e-02 2.500 1.00  1391
#> gp(time):temp.9  -1.400  4.5e-01 3.400 1.00   990
#> gp(time):temp.10 -2.400 -1.4e-02 2.200 1.00  1310
#> gp(time):temp.11 -2.700 -3.3e-01 1.100 1.00  1218
#> gp(time):temp.12 -1.400  5.4e-02 2.000 1.00  1154
#> gp(time):temp.13 -0.990  1.0e-01 2.100 1.00   815
#> gp(time):temp.14 -1.700 -5.6e-03 1.100 1.00  1094
#> gp(time):temp.15 -1.400 -7.2e-03 1.100 1.00  1073
#> gp(time):temp.16 -0.940  6.9e-05 1.200 1.00  1071
#> gp(time):temp.17 -0.870  4.3e-09 1.200 1.00   912
#> gp(time):temp.18 -1.000 -1.7e-19 1.000 1.00   987
#> gp(time):temp.19 -1.000 -3.3e-13 0.740 1.00  1411
#> gp(time):temp.20 -0.880 -3.1e-07 0.580 1.00  1158
#> gp(time):temp.21 -0.660  4.3e-18 0.670 1.00  1237
#> gp(time):temp.22 -0.280  4.4e-06 0.930 1.01   949
#> gp(time):temp.23 -0.720 -6.5e-16 0.380 1.00  1653
#> gp(time):temp.24 -0.760 -6.3e-10 0.220 1.00   559
#> gp(time):temp.25 -0.300  5.5e-26 0.560 1.00  1198
#> gp(time):temp.26 -0.270  3.1e-20 0.450 1.00  1242
#> gp(time):temp.27 -0.600 -4.0e-21 0.210 1.00  1101
#> gp(time):temp.28 -0.350 -2.3e-36 0.280 1.00  1191
#> gp(time):temp.29 -0.110  2.2e-25 0.460 1.00   681
#> gp(time):temp.30 -0.260 -6.4e-57 0.220 1.00  1228
#> gp(time):temp.31 -0.460 -1.2e-40 0.120 1.00   997
#> gp(time):temp.32 -0.190  4.2e-51 0.270 1.00   876
#> gp(time):temp.33 -0.120  1.5e-38 0.250 1.00  1287
#> gp(time):temp.34 -0.150 -3.9e-60 0.140 1.00  1146
#> gp(time):temp.35 -0.120 -1.8e-59 0.110 1.00   774
#> gp(time):temp.36 -0.094 -7.0e-80 0.140 1.00  1139
#> gp(time):temp.37 -0.052 -1.8e-63 0.110 1.00   646
#> gp(time):temp.38 -0.095 -6.8e-81 0.054 1.00   888
#> gp(time):temp.39 -0.075 -1.7e-88 0.059 1.00   711
#> gp(time):temp.40 -0.039  1.8e-65 0.110 1.00  2892
#> 
#> GAM gp term marginal deviation (alpha) and length scale (rho) estimates:
#>                      2.5%   50%  97.5% Rhat n_eff
#> alpha_gp(time):temp  0.16  0.34   0.69 1.00   580
#> rho_gp(time):temp   11.00 31.00 100.00 1.02   127
#> 
#> Stan MCMC diagnostics:
#> n_eff / iter looks reasonable for all parameters
#> Rhat looks reasonable for all parameters
#> 41 of 1000 iterations ended with a divergence (4.1%)
#>  *Try running with larger adapt_delta to remove the divergences
#> 0 of 1000 iterations saturated the maximum tree depth of 10 (0%)
#> E-FMI indicated no pathological behavior
#> 
#> Samples were drawn using NUTS(diag_e) at Fri Nov 22 7:57:41 AM 2024.
#> For each parameter, n_eff is a crude measure of effective sample size,
#> and Rhat is the potential scale reduction factor on split MCMC chains
#> (at convergence, Rhat = 1)
plot(mod, type = 'smooths')

fc <- forecast(mod, newdata = data_test)
plot(fc)
#> Out of sample CRPS:
#> 6.41565320625866


# Propagating the smooth term shows how the coefficient is expected to evolve
plot_mvgam_smooth(mod, smooth = 1, newdata = data)
abline(v = 180, lty = 'dashed', lwd = 2)
points(beta_temp, pch = 16)



# Example showing how to incorporate an offset; simulate some count data
# with different means per series
set.seed(100)
dat <- sim_mvgam(prop_trend = 0, mu = c(0, 2, 2),
                 seasonality = 'hierarchical')

# Add offset terms to the training and testing data
dat$data_train$offset <- 0.5 * as.numeric(dat$data_train$series)
dat$data_test$offset <- 0.5 * as.numeric(dat$data_test$series)

# Fit a model that includes the offset in the linear predictor as well as
# hierarchical seasonal smooths
mod <- mvgam(formula = y ~ offset(offset) +
              s(series, bs = 're') +
              s(season, bs = 'cc') +
              s(season, by = series, m = 1, k = 5),
             data = dat$data_train,
             chains = 2)
#> Compiling Stan program using cmdstanr
#> 
#> Start sampling
#> Running MCMC with 2 parallel chains...
#> 
#> Chain 1 Iteration:   1 / 1000 [  0%]  (Warmup) 
#> Chain 2 Iteration:   1 / 1000 [  0%]  (Warmup) 
#> Chain 1 Iteration: 100 / 1000 [ 10%]  (Warmup) 
#> Chain 2 Iteration: 100 / 1000 [ 10%]  (Warmup) 
#> Chain 2 Iteration: 200 / 1000 [ 20%]  (Warmup) 
#> Chain 1 Iteration: 200 / 1000 [ 20%]  (Warmup) 
#> Chain 2 Iteration: 300 / 1000 [ 30%]  (Warmup) 
#> Chain 1 Iteration: 300 / 1000 [ 30%]  (Warmup) 
#> Chain 2 Iteration: 400 / 1000 [ 40%]  (Warmup) 
#> Chain 1 Iteration: 400 / 1000 [ 40%]  (Warmup) 
#> Chain 2 Iteration: 500 / 1000 [ 50%]  (Warmup) 
#> Chain 2 Iteration: 501 / 1000 [ 50%]  (Sampling) 
#> Chain 1 Iteration: 500 / 1000 [ 50%]  (Warmup) 
#> Chain 1 Iteration: 501 / 1000 [ 50%]  (Sampling) 
#> Chain 2 Iteration: 600 / 1000 [ 60%]  (Sampling) 
#> Chain 1 Iteration: 600 / 1000 [ 60%]  (Sampling) 
#> Chain 2 Iteration: 700 / 1000 [ 70%]  (Sampling) 
#> Chain 1 Iteration: 700 / 1000 [ 70%]  (Sampling) 
#> Chain 2 Iteration: 800 / 1000 [ 80%]  (Sampling) 
#> Chain 1 Iteration: 800 / 1000 [ 80%]  (Sampling) 
#> Chain 2 Iteration: 900 / 1000 [ 90%]  (Sampling) 
#> Chain 1 Iteration: 900 / 1000 [ 90%]  (Sampling) 
#> Chain 2 Iteration: 1000 / 1000 [100%]  (Sampling) 
#> Chain 2 finished in 12.7 seconds.
#> Chain 1 Iteration: 1000 / 1000 [100%]  (Sampling) 
#> Chain 1 finished in 13.8 seconds.
#> 
#> Both chains finished successfully.
#> Mean chain execution time: 13.2 seconds.
#> Total execution time: 14.0 seconds.
#> 

# Inspect the model file to see the modification to the linear predictor
# (eta)
code(mod)
#> // Stan model code generated by package mvgam
#> data {
#>   int<lower=0> total_obs; // total number of observations
#>   int<lower=0> n; // number of timepoints per series
#>   int<lower=0> n_sp; // number of smoothing parameters
#>   int<lower=0> n_series; // number of series
#>   int<lower=0> num_basis; // total number of basis coefficients
#>   vector[num_basis] zero; // prior locations for basis coefficients
#>   vector[total_obs] off_set; // offset vector
#>   matrix[total_obs, num_basis] X; // mgcv GAM design matrix
#>   array[n, n_series] int<lower=0> ytimes; // time-ordered matrix (which col in X belongs to each [time, series] observation?)
#>   matrix[8, 8] S1; // mgcv smooth penalty matrix S1
#>   matrix[4, 4] S2; // mgcv smooth penalty matrix S2
#>   matrix[4, 4] S3; // mgcv smooth penalty matrix S3
#>   matrix[4, 4] S4; // mgcv smooth penalty matrix S4
#>   int<lower=0> n_nonmissing; // number of nonmissing observations
#>   array[n_nonmissing] int<lower=0> flat_ys; // flattened nonmissing observations
#>   matrix[n_nonmissing, num_basis] flat_xs; // X values for nonmissing observations
#>   array[n_nonmissing] int<lower=0> obs_ind; // indices of nonmissing observations
#> }
#> parameters {
#>   // raw basis coefficients
#>   vector[num_basis] b_raw;
#>   
#>   // random effect variances
#>   vector<lower=0>[1] sigma_raw;
#>   
#>   // random effect means
#>   vector[1] mu_raw;
#>   
#>   // smoothing parameters
#>   vector<lower=0>[n_sp] lambda;
#> }
#> transformed parameters {
#>   // basis coefficients
#>   vector[num_basis] b;
#>   b[1 : 21] = b_raw[1 : 21];
#>   b[22 : 24] = mu_raw[1] + b_raw[22 : 24] * sigma_raw[1];
#> }
#> model {
#>   // prior for random effect population variances
#>   sigma_raw ~ student_t(3, 0, 2.5);
#>   
#>   // prior for random effect population means
#>   mu_raw ~ std_normal();
#>   
#>   // prior for (Intercept)...
#>   b_raw[1] ~ student_t(3, 1.6, 2.5);
#>   
#>   // prior for s(season)...
#>   b_raw[2 : 9] ~ multi_normal_prec(zero[2 : 9], S1[1 : 8, 1 : 8] * lambda[1]);
#>   
#>   // prior for s(season):seriesseries_1...
#>   b_raw[10 : 13] ~ multi_normal_prec(zero[10 : 13],
#>                                      S2[1 : 4, 1 : 4] * lambda[2]);
#>   
#>   // prior for s(season):seriesseries_2...
#>   b_raw[14 : 17] ~ multi_normal_prec(zero[14 : 17],
#>                                      S3[1 : 4, 1 : 4] * lambda[3]);
#>   
#>   // prior for s(season):seriesseries_3...
#>   b_raw[18 : 21] ~ multi_normal_prec(zero[18 : 21],
#>                                      S4[1 : 4, 1 : 4] * lambda[4]);
#>   
#>   // prior (non-centred) for s(series)...
#>   b_raw[22 : 24] ~ std_normal();
#>   
#>   // priors for smoothing parameters
#>   lambda ~ normal(5, 30);
#>   {
#>     // likelihood functions
#>     flat_ys ~ poisson_log_glm(flat_xs, off_set[obs_ind], b);
#>   }
#> }
#> generated quantities {
#>   vector[total_obs] eta;
#>   matrix[n, n_series] mus;
#>   vector[n_sp] rho;
#>   array[n, n_series] int ypred;
#>   rho = log(lambda);
#>   
#>   // posterior predictions
#>   eta = X * b + off_set;
#>   for (s in 1 : n_series) {
#>     mus[1 : n, s] = eta[ytimes[1 : n, s]];
#>     ypred[1 : n, s] = poisson_log_rng(mus[1 : n, s]);
#>   }
#> }
#> 
#> 

# Forecasts for the first two series will differ in magnitude
fc <- forecast(mod, newdata = dat$data_test)
layout(matrix(1:2, ncol = 2))
plot(fc, series = 1, ylim = c(0, 75))
#> Out of sample DRPS:
#> 26.749404
plot(fc, series = 2, ylim = c(0, 75))
#> Out of sample DRPS:
#> 101.807653

layout(1)

# Changing the offset for the testing data should lead to changes in
# the forecast
dat$data_test$offset <- dat$data_test$offset - 2
fc <- forecast(mod, newdata = dat$data_test)
plot(fc)
#> Out of sample DRPS:
#> 41.465009


# Relative Risks can be computed by fixing the offset to the same value
# for each series
dat$data_test$offset <- rep(1, NROW(dat$data_test))
preds_rr <- predict(mod, type = 'link', newdata = dat$data_test,
                    summary = FALSE)
series1_inds <- which(dat$data_test$series == 'series_1')
series2_inds <- which(dat$data_test$series == 'series_2')

# Relative Risks are now more comparable among series
layout(matrix(1:2, ncol = 2))
plot(preds_rr[1, series1_inds], type = 'l', col = 'grey75',
     ylim = range(preds_rr),
     ylab = 'Series1 Relative Risk', xlab = 'Time')
for(i in 2:50){
 lines(preds_rr[i, series1_inds], col = 'grey75')
}

plot(preds_rr[1, series2_inds], type = 'l', col = 'darkred',
     ylim = range(preds_rr),
     ylab = 'Series2 Relative Risk', xlab = 'Time')
for(i in 2:50){
 lines(preds_rr[i, series2_inds], col = 'darkred')
 }

layout(1)


# Example showcasing how cbind() is needed for Binomial observations
# Simulate two time series of Binomial trials
trials <- sample(c(20:25), 50, replace = TRUE)
x <- rnorm(50)
detprob1 <- plogis(-0.5 + 0.9*x)
detprob2 <- plogis(-0.1 -0.7*x)
dat <- rbind(data.frame(y = rbinom(n = 50, size = trials, prob = detprob1),
                        time = 1:50,
                        series = 'series1',
                        x = x,
                        ntrials = trials),
             data.frame(y = rbinom(n = 50, size = trials, prob = detprob2),
                        time = 1:50,
                        series = 'series2',
                        x = x,
                        ntrials = trials))
dat <- dplyr::mutate(dat, series = as.factor(series))
dat <- dplyr::arrange(dat, time, series)
plot_mvgam_series(data = dat, series = 'all')


# Fit a model using the binomial() family; must specify observations
# and number of trials in the cbind() wrapper
mod <- mvgam(cbind(y, ntrials) ~ series + s(x, by = series),
             family = binomial(),
             data = dat,
             chains = 2)
#> Warning: Binomial and Beta-binomial families require cbind(n_successes, n_trials)
#> in the formula left-hand side. Do not use cbind(n_successes, n_failures)!
#> This warning is displayed once per session.
#> Compiling Stan program using cmdstanr
#> 
#> Start sampling
#> Running MCMC with 2 parallel chains...
#> 
#> Chain 1 Iteration:   1 / 1000 [  0%]  (Warmup) 
#> Chain 2 Iteration:   1 / 1000 [  0%]  (Warmup) 
#> Chain 1 Iteration: 100 / 1000 [ 10%]  (Warmup) 
#> Chain 2 Iteration: 100 / 1000 [ 10%]  (Warmup) 
#> Chain 1 Iteration: 200 / 1000 [ 20%]  (Warmup) 
#> Chain 2 Iteration: 200 / 1000 [ 20%]  (Warmup) 
#> Chain 1 Iteration: 300 / 1000 [ 30%]  (Warmup) 
#> Chain 2 Iteration: 300 / 1000 [ 30%]  (Warmup) 
#> Chain 1 Iteration: 400 / 1000 [ 40%]  (Warmup) 
#> Chain 2 Iteration: 400 / 1000 [ 40%]  (Warmup) 
#> Chain 1 Iteration: 500 / 1000 [ 50%]  (Warmup) 
#> Chain 1 Iteration: 501 / 1000 [ 50%]  (Sampling) 
#> Chain 2 Iteration: 500 / 1000 [ 50%]  (Warmup) 
#> Chain 2 Iteration: 501 / 1000 [ 50%]  (Sampling) 
#> Chain 1 Iteration: 600 / 1000 [ 60%]  (Sampling) 
#> Chain 2 Iteration: 600 / 1000 [ 60%]  (Sampling) 
#> Chain 1 Iteration: 700 / 1000 [ 70%]  (Sampling) 
#> Chain 2 Iteration: 700 / 1000 [ 70%]  (Sampling) 
#> Chain 1 Iteration: 800 / 1000 [ 80%]  (Sampling) 
#> Chain 2 Iteration: 800 / 1000 [ 80%]  (Sampling) 
#> Chain 1 Iteration: 900 / 1000 [ 90%]  (Sampling) 
#> Chain 2 Iteration: 900 / 1000 [ 90%]  (Sampling) 
#> Chain 1 Iteration: 1000 / 1000 [100%]  (Sampling) 
#> Chain 2 Iteration: 1000 / 1000 [100%]  (Sampling) 
#> Chain 1 finished in 2.7 seconds.
#> Chain 2 finished in 2.6 seconds.
#> 
#> Both chains finished successfully.
#> Mean chain execution time: 2.6 seconds.
#> Total execution time: 2.9 seconds.
#> 
summary(mod)
#> GAM formula:
#> cbind(y, ntrials) ~ series + s(x, by = series)
#> <environment: 0x0000025921362870>
#> 
#> Family:
#> binomial
#> 
#> Link function:
#> logit
#> 
#> Trend model:
#> None
#> 
#> N series:
#> 2 
#> 
#> N timepoints:
#> 50 
#> 
#> Status:
#> Fitted using Stan 
#> 2 chains, each with iter = 1000; warmup = 500; thin = 1 
#> Total post-warmup draws = 1000
#> 
#> 
#> GAM coefficient (beta) estimates:
#>                        2.5%      50%  97.5% Rhat n_eff
#> (Intercept)          -0.610 -0.48000 -0.350 1.00   693
#> seriesseries2         0.110  0.28000  0.460 1.00   835
#> s(x):seriesseries1.1 -0.270 -0.01500  0.120 1.01   305
#> s(x):seriesseries1.2 -0.170  0.00300  0.180 1.00   481
#> s(x):seriesseries1.3 -0.083 -0.00340  0.055 1.01   323
#> s(x):seriesseries1.4 -0.120 -0.00017  0.100 1.00   414
#> s(x):seriesseries1.5 -0.046 -0.00130  0.031 1.00   881
#> s(x):seriesseries1.6 -0.074 -0.00030  0.087 1.00   418
#> s(x):seriesseries1.7 -0.030 -0.00075  0.027 1.00   638
#> s(x):seriesseries1.8 -0.590 -0.00230  0.460 1.00   392
#> s(x):seriesseries1.9  0.490  0.88000  1.100 1.01   308
#> s(x):seriesseries2.1 -0.150  0.00460  0.160 1.00   662
#> s(x):seriesseries2.2 -0.160  0.00490  0.190 1.00   364
#> s(x):seriesseries2.3 -0.046  0.00330  0.062 1.00   505
#> s(x):seriesseries2.4 -0.100  0.00690  0.120 1.01   312
#> s(x):seriesseries2.5 -0.030  0.00190  0.040 1.00   786
#> s(x):seriesseries2.6 -0.099 -0.00620  0.069 1.00   358
#> s(x):seriesseries2.7 -0.024 -0.00047  0.023 1.00   624
#> s(x):seriesseries2.8 -0.440  0.04900  0.620 1.00   336
#> s(x):seriesseries2.9 -0.990 -0.74000 -0.470 1.00   679
#> 
#> Approximate significance of GAM smooths:
#>                     edf Ref.df Chi.sq p-value    
#> s(x):seriesseries1 1.07      9   43.7 5.8e-05 ***
#> s(x):seriesseries2 2.25      9   29.2 < 2e-16 ***
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Stan MCMC diagnostics:
#> n_eff / iter looks reasonable for all parameters
#> Rhat looks reasonable for all parameters
#> 0 of 1000 iterations ended with a divergence (0%)
#> 0 of 1000 iterations saturated the maximum tree depth of 10 (0%)
#> E-FMI indicated no pathological behavior
#> 
#> Samples were drawn using NUTS(diag_e) at Fri Nov 22 7:59:26 AM 2024.
#> For each parameter, n_eff is a crude measure of effective sample size,
#> and Rhat is the potential scale reduction factor on split MCMC chains
#> (at convergence, Rhat = 1)
pp_check(mod, type = "bars_grouped",
         group = "series", ndraws = 50)

pp_check(mod, type = "ecdf_overlay_grouped",
         group = "series", ndraws = 50)

conditional_effects(mod, type = 'link')


# }