Extracting Fitted or Predicted Probability Distributions from gamlss2 Models

Description

Methods for gamlss2 model objects for extracting fitted (in-sample) or predicted (out-of-sample) probability distributions as distributions3 objects.

Usage

## S3 method for class 'gamlss2'
prodist(object, ...)

Arguments

object A model object of class gamlss2.
Arguments passed on to predict.gamlss2, e.g., newdata.

Details

To facilitate making probabilistic forecasts based on gamlss2 model objects, the prodist method extracts fitted or predicted probability distribution objects. Internally, the predict.gamlss2 method is used first to obtain the distribution parameters (mu, sigma, tau, nu, or a subset thereof). Subsequently, the corresponding distribution object is set up using the GAMLSS class from the gamlss.dist package, enabling the workflow provided by the distributions3 package (see Zeileis et al. 2022).

Note that these probability distributions only reflect the random variation in the dependent variable based on the model employed (and its associated distributional assumption for the dependent variable). This does not capture the uncertainty in the parameter estimates.

Value

An object of class GAMLSS inheriting from distribution.

References

Zeileis A, Lang MN, Hayes A (2022). “distributions3: From Basic Probability to Probabilistic Regression.” Presented at useR! 2022 - The R User Conference. Slides, video, vignette, code at https://www.zeileis.org/news/user2022/.

See Also

GAMLSS, predict.gamlss2

Examples

library("gamlss2")


## packages, code, and data
library("distributions3")
data("cars", package = "datasets")

## fit heteroscedastic normal GAMLSS model
## stopping distance (ft) explained by speed (mph)
m <- gamlss2(dist ~ s(speed) | s(speed), data = cars, family = NO)
GAMLSS-RS iteration  1: Global Deviance = 407.3647 eps = 0.125474     
GAMLSS-RS iteration  2: Global Deviance = 406.1777 eps = 0.002913     
GAMLSS-RS iteration  3: Global Deviance = 406.147 eps = 0.000075     
GAMLSS-RS iteration  4: Global Deviance = 406.1463 eps = 0.000001     
## obtain predicted distributions for three levels of speed
d <- prodist(m, newdata = data.frame(speed = c(10, 20, 30)))
print(d)
                                     1                                      2 
"GAMLSS NO(mu = 23.31, sigma = 10.09)" "GAMLSS NO(mu = 58.58, sigma = 18.62)" 
                                     3 
"GAMLSS NO(mu = 93.85, sigma = 34.37)" 
## obtain quantiles (works the same for any distribution object 'd' !)
quantile(d, 0.5)
       1        2        3 
23.30817 58.58058 93.85473 
quantile(d, c(0.05, 0.5, 0.95), elementwise = FALSE)
     q_0.05    q_0.5    q_0.95
1  6.712065 23.30817  39.90428
2 27.945695 58.58058  89.21546
3 37.320084 93.85473 150.38937
quantile(d, c(0.05, 0.5, 0.95), elementwise = TRUE)
         1          2          3 
  6.712065  58.580577 150.389369 
## visualization
plot(dist ~ speed, data = cars)
nd <- data.frame(speed = 0:240/4)
nd$dist <- prodist(m, newdata = nd)
nd$fit <- quantile(nd$dist, c(0.05, 0.5, 0.95))
matplot(nd$speed, nd$fit, type = "l", lty = 1, col = "slategray", add = TRUE)

## moments
mean(d)
       1        2        3 
23.30817 58.58058 93.85473 
variance(d)
        1         2         3 
 101.8024  346.8789 1181.3396 
## simulate random numbers
random(d, 5)
        r_1      r_2       r_3       r_4      r_5
1  25.82404 30.94005  37.51492  21.90094 43.00572
2  11.97414 56.63541  67.55349  62.35356 79.19364
3 -27.88451 95.10403 134.06271 104.47424 30.17880
## density and distribution
pdf(d, 50 * -2:2)
        d_-100        d_-50          d_0        d_50        d_100
1 1.460714e-34 1.361314e-13 0.0027429608 0.001194948 1.125502e-14
2 3.874569e-18 8.920877e-10 0.0001522568 0.019263306 1.806633e-03
3 1.435625e-09 1.823198e-06 0.0002789684 0.005142852 1.142302e-02
cdf(d, 50 * -2:2)
        p_-100        p_-50          p_0      p_50     p_100
1 1.198038e-34 1.856498e-13 0.0104415378 0.9959209 1.0000000
2 8.362856e-18 2.772565e-09 0.0008295292 0.3225034 0.9869224
3 8.496162e-09 1.423335e-05 0.0031603140 0.1009890 0.5709504
## Poisson example
data("FIFA2018", package = "distributions3")
m2 <- gamlss2(goals ~ s(difference), data = FIFA2018, family = PO)
GAMLSS-RS iteration  1: Global Deviance = 355.3922 eps = 0.045332     
GAMLSS-RS iteration  2: Global Deviance = 355.3922 eps = 0.000000     
d2 <- prodist(m2, newdata = data.frame(difference = 0))
print(d2)
                      1 
"GAMLSS PO(mu = 1.237)" 
quantile(d2, c(0.05, 0.5, 0.95))
[1] 0 1 3
## note that log_pdf() can replicate logLik() value
sum(log_pdf(prodist(m2), FIFA2018$goals))
[1] -177.6961
logLik(m2)
'log Lik.' -177.6961 (df=2.005144)