
Set up piecewise linear or logistic trend models in mvgam. These functions do not evaluate their arguments; they exist purely to help set up a model with a particular piecewise trend.

Usage

PW(
  n_changepoints = 10,
  changepoint_range = 0.8,
  changepoint_scale = 0.05,
  growth = "linear"
)

Arguments

n_changepoints

A non-negative integer specifying the number of potential changepoints. Potential changepoints are selected uniformly from the first changepoint_range proportion of timepoints in data. Default is 10.

changepoint_range

Proportion of history in data in which trend changepoints will be estimated. Defaults to 0.8 for the first 80%.
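As an illustration of how these two arguments interact (a sketch only; the exact placement is handled internally by mvgam), candidate changepoint locations under the defaults can be pictured as evenly spaced positions over the first 80% of a 100-timepoint series:

```r
# Illustrative sketch (not mvgam's internal code): 10 potential changepoints
# spread uniformly over the first 80% of 100 timepoints
n_timepoints <- 100
changepoint_range <- 0.8
n_changepoints <- 10

changepoints <- round(seq(1, floor(changepoint_range * n_timepoints),
                          length.out = n_changepoints))
changepoints
```

Timepoints beyond the first 80% receive no candidate changepoints, which stops the trend from overfitting the most recent observations.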

changepoint_scale

Parameter modulating the flexibility of the automatic changepoint selection by altering the scale parameter of a Laplace distribution. The resulting prior will be double_exponential(0, changepoint_scale). Large values will allow many changepoints and a more flexible trend, while small values will allow few changepoints. Default is 0.05.
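To see why smaller scale values yield fewer effective changepoints, it helps to compare draws from the double-exponential (Laplace) prior at two scales. This is a standalone sketch using base R (the `rlaplace` helper below is defined here for illustration, not exported by mvgam):

```r
# Sketch: the Laplace prior on rate changes concentrates mass near zero
# when changepoint_scale is small, so most potential changepoints have
# a negligible effect on the trend
set.seed(1)
rlaplace <- function(n, scale) {
  # A Laplace(0, scale) draw is the difference of two Exponential(1/scale) draws
  rexp(n, rate = 1 / scale) - rexp(n, rate = 1 / scale)
}

mean(abs(rlaplace(1000, scale = 0.05)))  # draws cluster tightly around zero
mean(abs(rlaplace(1000, scale = 1)))     # draws allow much larger rate changes
```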

growth

Character string specifying either 'linear' or 'logistic' growth of the trend. If 'logistic', a variable labelled cap MUST be in data to specify the maximum saturation point for the trend (see details and examples in mvgam for more information). Default is 'linear'.

Value

An object of class mvgam_trend, which contains a list of arguments to be interpreted by the parsing functions in mvgam.

Details

Offsets and intercepts: For each of these trend models, an offset parameter is included in the trend estimation process. This parameter will be very difficult to identify if you also include an intercept in the observation formula. For that reason, it is highly recommended that you drop the intercept from the formula (i.e. y ~ x + 0 or y ~ x - 1, where x represents your optional predictor terms).

Logistic growth and the cap variable: When forecasting growth, there is often some maximum achievable point that a time series can reach, for example total market size, total population size or carrying capacity in population dynamics. It can be advantageous for the forecast to saturate at or near this point so that predictions are more sensible. This function allows you to make forecasts using a logistic growth trend model with a specified carrying capacity. Note that this capacity does not need to be static over time; it can vary with each series x timepoint combination if necessary. However, you must supply a cap value for each observation in the data when using growth = 'logistic'. For observation families that use a non-identity link function, the cap value will be internally transformed to the link scale (i.e. your specified cap will be log-transformed if you are using a poisson() or nb() family). It is therefore important that you specify the cap values on the scale of your outcome. Note also that no missing values are allowed in cap.
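A minimal sketch of supplying a time-varying cap, using hypothetical simulated data (the column names y, time, series and cap follow the data format described above; the values themselves are invented for illustration):

```r
# Sketch: a cap column supplied on the outcome scale, varying over time.
# With a poisson() family, mvgam would log-transform these values internally.
set.seed(123)
dat <- data.frame(
  y = rpois(50, lambda = 10),
  time = 1:50,
  series = factor('series_1'),
  cap = seq(40, 60, length.out = 50)  # capacity grows over the series
)

# Every observation has a cap and none are missing
stopifnot(nrow(dat) == 50, !anyNA(dat$cap))
```

This data frame could then be passed to mvgam() with trend_model = PW(growth = 'logistic'), as in the example below.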

References

Taylor, Sean J., and Benjamin Letham. "Forecasting at scale." The American Statistician 72.1 (2018): 37-45.

Examples

# \donttest{
# Example of logistic growth with possible changepoints
# Simple logistic growth model
dNt = function(r, N, k){
  r * N * (k - N)
}

# Iterate growth through time
Nt = function(r, N, t, k) {
  for (i in 1:(t - 1)) {
    # Population at the next time step is the current population plus growth,
    # but we introduce several 'shocks' as changepoints
    if (i %in% c(5, 15, 25, 41, 45, 60, 80)) {
      N[i + 1] <- max(1, N[i] + dNt(r + runif(1, -0.1, 0.1),
                                    N[i], k))
    } else {
      N[i + 1] <- max(1, N[i] + dNt(r, N[i], k))
    }
  }
  N
}

# Simulate expected values
set.seed(11)
expected <- Nt(0.004, 2, 100, 30)
plot(expected, xlab = 'Time')


# Take Poisson draws
y <- rpois(100, expected)
plot(y, xlab = 'Time')


# Assemble data into dataframe and model. We set a
# fixed carrying capacity of 35 for this example, but note that
# this value is not required to be fixed at each timepoint
mod_data <- data.frame(y = y,
                       time = 1:100,
                       cap = 35,
                       series = as.factor('series_1'))
plot_mvgam_series(data = mod_data)


# The intercept is nonidentifiable when using piecewise
# trends because the trend functions have their own offset
# parameters 'm'; it is recommended to always drop intercepts
# when using these trend models
mod <- mvgam(y ~ 0,
             trend_model = PW(growth = 'logistic'),
             family = poisson(),
             data = mod_data,
             chains = 2)
#> Compiling Stan program using cmdstanr
#> 
#> In file included from stan/lib/stan_math/stan/math/prim/prob/von_mises_lccdf.hpp:5,
#>                  from stan/lib/stan_math/stan/math/prim/prob/von_mises_ccdf_log.hpp:4,
#>                  from stan/lib/stan_math/stan/math/prim/prob.hpp:359,
#>                  from stan/lib/stan_math/stan/math/prim.hpp:16,
#>                  from stan/lib/stan_math/stan/math/rev.hpp:16,
#>                  from stan/lib/stan_math/stan/math.hpp:19,
#>                  from stan/src/stan/model/model_header.hpp:4,
#>                  from C:/Users/uqnclar2/AppData/Local/Temp/Rtmp2bnpq5/model-3cd01ce020d2.hpp:2:
#> stan/lib/stan_math/stan/math/prim/prob/von_mises_cdf.hpp: In function 'stan::return_type_t<T_x, T_sigma, T_l> stan::math::von_mises_cdf(const T_x&, const T_mu&, const T_k&)':
#> stan/lib/stan_math/stan/math/prim/prob/von_mises_cdf.hpp:194: note: '-Wmisleading-indentation' is disabled from this point onwards, since column-tracking was disabled due to the size of the code/headers
#>   194 |       if (cdf_n < 0.0)
#>       | 
#> stan/lib/stan_math/stan/math/prim/prob/von_mises_cdf.hpp:194: note: adding '-flarge-source-files' will allow for more column-tracking support, at the expense of compilation time and memory
#> Start sampling
#> Running MCMC with 2 parallel chains...
#> 
#> Chain 1 Iteration:   1 / 1000 [  0%]  (Warmup) 
#> Chain 2 Iteration:   1 / 1000 [  0%]  (Warmup) 
#> Chain 1 Iteration: 100 / 1000 [ 10%]  (Warmup) 
#> Chain 2 Iteration: 100 / 1000 [ 10%]  (Warmup) 
#> Chain 1 Iteration: 200 / 1000 [ 20%]  (Warmup) 
#> Chain 2 Iteration: 200 / 1000 [ 20%]  (Warmup) 
#> Chain 1 Iteration: 300 / 1000 [ 30%]  (Warmup) 
#> Chain 2 Iteration: 300 / 1000 [ 30%]  (Warmup) 
#> Chain 1 Iteration: 400 / 1000 [ 40%]  (Warmup) 
#> Chain 2 Iteration: 400 / 1000 [ 40%]  (Warmup) 
#> Chain 1 Iteration: 500 / 1000 [ 50%]  (Warmup) 
#> Chain 1 Iteration: 501 / 1000 [ 50%]  (Sampling) 
#> Chain 2 Iteration: 500 / 1000 [ 50%]  (Warmup) 
#> Chain 2 Iteration: 501 / 1000 [ 50%]  (Sampling) 
#> Chain 2 Iteration: 600 / 1000 [ 60%]  (Sampling) 
#> Chain 1 Iteration: 600 / 1000 [ 60%]  (Sampling) 
#> Chain 2 Iteration: 700 / 1000 [ 70%]  (Sampling) 
#> Chain 1 Iteration: 700 / 1000 [ 70%]  (Sampling) 
#> Chain 2 Iteration: 800 / 1000 [ 80%]  (Sampling) 
#> Chain 1 Iteration: 800 / 1000 [ 80%]  (Sampling) 
#> Chain 2 Iteration: 900 / 1000 [ 90%]  (Sampling) 
#> Chain 1 Iteration: 900 / 1000 [ 90%]  (Sampling) 
#> Chain 2 Iteration: 1000 / 1000 [100%]  (Sampling) 
#> Chain 2 finished in 6.7 seconds.
#> Chain 1 Iteration: 1000 / 1000 [100%]  (Sampling) 
#> Chain 1 finished in 7.0 seconds.
#> 
#> Both chains finished successfully.
#> Mean chain execution time: 6.9 seconds.
#> Total execution time: 7.1 seconds.
#> 
summary(mod)
#> GAM formula:
#> y ~ 1
#> <environment: 0x000001b494c449b0>
#> 
#> Family:
#> poisson
#> 
#> Link function:
#> log
#> 
#> Trend model:
#> PW(growth = "logistic")
#> 
#> N series:
#> 1 
#> 
#> N timepoints:
#> 100 
#> 
#> Status:
#> Fitted using Stan 
#> 2 chains, each with iter = 1000; warmup = 500; thin = 1 
#> Total post-warmup draws = 1000
#> 
#> 
#> GAM coefficient (beta) estimates:
#>             2.5% 50% 97.5% Rhat n_eff
#> (Intercept)    0   0     0  NaN   NaN
#> 
#> Latent trend growth rate estimates:
#>             2.5%   50%  97.5% Rhat n_eff
#> k_trend[1] -0.26 -0.15 -0.071 1.01   348
#> 
#> Latent trend offset estimates:
#>            2.5%  50% 97.5% Rhat n_eff
#> m_trend[1]  -14 -4.3 -0.36 1.01   347
#> 
#> Stan MCMC diagnostics:
#> n_eff / iter looks reasonable for all parameters
#> Rhat looks reasonable for all parameters
#> 0 of 1000 iterations ended with a divergence (0%)
#> 0 of 1000 iterations saturated the maximum tree depth of 12 (0%)
#> E-FMI indicated no pathological behavior
#> 
#> Samples were drawn using NUTS(diag_e) at Wed Oct 30 3:12:56 PM 2024.
#> For each parameter, n_eff is a crude measure of effective sample size,
#> and Rhat is the potential scale reduction factor on split MCMC chains
#> (at convergence, Rhat = 1)

# Plot the posterior hindcast
plot(mod, type = 'forecast')


# View the changepoints with ggplot2 utilities
library(ggplot2)
#> Warning: package ‘ggplot2’ was built under R version 4.3.3
mcmc_plot(mod, variable = 'delta_trend',
          regex = TRUE) +
scale_y_discrete(labels = mod$trend_model$changepoints) +
labs(y = 'Potential changepoint',
     x = 'Rate change')
#> Scale for y is already present.
#> Adding another scale for y, which will replace the existing scale.

# }