Generalized Additive Models for Location Scale and Shape

Description

The function gamlss2() fits generalized additive models for location, scale, and shape (GAMLSS). It estimates one additive predictor for each parameter of the chosen response distribution, such as mu, sigma, nu, or tau. The number and meaning of these parameters are determined by the selected family object, see gamlss2.family.

In addition to ordinary linear terms, gamlss2() supports smooth terms from mgcv as well as the model-term infrastructure from gamlss.

Usage

gamlss2(formula, ...)

## S3 method for class 'formula'
gamlss2(formula, data, family = NO,
  subset, na.action, weights, offset, start = NULL,
  knots = NULL, control = gamlss2_control(...), ...)

## S3 method for class 'list'
gamlss2(formula, ...)

Arguments

formula A GAM-type formula or Formula describing the additive predictors for the distribution parameters. All smooth terms from mgcv are supported, see also formula.gam. For gamlss.list(), formula is a list of formulas.
data A data frame, list, or environment containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which gamlss2() is called.
family A gamlss.family or gamlss2.family object defining the response distribution and the link functions of its parameters.
subset An optional vector specifying a subset of observations to be used in the fitting process.
na.action NA processing for setting up the model.frame.
weights An optional vector of prior weights to be used in the fitting process. Should be NULL or a numeric vector.
offset Optional offset terms to be included in the linear predictors during fitting. If a single numeric vector is supplied, it is assigned to the first parameter of the distribution. If offsets are needed for several parameters, provide a data frame or list in the same order as the parameter names in the family object.
start Optional starting values for the estimation algorithm, see gamlss2_start.
knots An optional list of user-specified knots, see smoothCon.
control A list of control arguments, see gamlss2_control.
Arguments passed to gamlss2_control.

Details

The function gamlss2() is the main entry point for fitting distributional regression models in gamlss2.

  • The response distribution is specified through a family object, either from gamlss.dist or created as a gamlss2.family object.

  • By default, estimation is carried out with the RS algorithm. The optimizer can be changed through gamlss2_control. A family object may also provide its own optimizer.

  • The returned object depends on the chosen optimizer. In the standard case, this is an object of class “gamlss2”, for which summary, plotting, prediction, and extractor methods are available.

Value

The object returned by the selected optimizer. By default, this is an object of class “gamlss2” produced by RS. Standard methods and extractor functions are available for this class.

References

Rigby RA, Stasinopoulos DM (2005). “Generalized Additive Models for Location, Scale and Shape (with Discussion).” Journal of the Royal Statistical Society, Series C (Applied Statistics), 54, 507–554. doi:10.1111/j.1467-9876.2005.00510.x

Rigby RA, Stasinopoulos DM, Heller GZ, De Bastiani F (2019). Distributions for Modeling Location, Scale, and Shape: Using GAMLSS in R, Chapman and Hall/CRC. doi:10.1201/9780429298547

Stasinopoulos DM, Rigby RA (2007). “Generalized Additive Models for Location Scale and Shape (GAMLSS) in R.” Journal of Statistical Software, 23(7), 1–46. doi:10.18637/jss.v023.i07

Stasinopoulos DM, Rigby RA, Heller GZ, Voudouris V, De Bastiani F (2017). Flexible Regression and Smoothing: Using GAMLSS in R, Chapman and Hall/CRC. doi:10.1201/b21973

See Also

RS, gamlss2_control, gamlss2.family, gamlss2_start

Examples

library("gamlss2")


## load the abdominal circumference data
data("abdom", package = "gamlss.data")

## specify a distributional model:
## location and scale vary with x,
## shape parameters remain constant
f <- y ~ s(x) | s(x) | 1 | 1

## estimate model
m1 <- gamlss2(f, data = abdom, family = BCT)
GAMLSS-RS iteration  1: Global Deviance = 4772.1542 eps = 0.534571     
GAMLSS-RS iteration  2: Global Deviance = 4772.1489 eps = 0.000001     
## model summary
summary(m1)
Call:
gamlss2(formula = f, data = abdom, family = BCT)
---
Family: BCT 
Link functions: mu = identity, sigma = log, nu = identity, tau = log
*--------
Coefficients:
                   Estimate Std. Error  t value Pr(>|t|)    
mu.(Intercept)    226.32913    1.34793  167.909   <2e-16 ***
sigma.(Intercept)  -2.92346    0.01241 -235.563   <2e-16 ***
nu.(Intercept)     -0.11564    0.05086   -2.274   0.0233 *  
tau.(Intercept)     2.48885    0.01465  169.831   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
---
Smooth terms:
    mu.s(x) sigma.s(x)
edf  4.5885     2.6274
*--------
n = 610 df =  11.22 res.df =  598.78
Deviance = 4772.1489 Null Dev. Red. = 33.37%
AIC = 4794.5805 elapsed =  0.81sec
## plot parameter-specific effects
plot(m1, which = "effects")

## plot diagnostics
plot(m1, which = "resid")

## predict selected distribution parameters
par <- predict(m1, model = c("mu", "sigma"))

## fitted centile curves
pq <- quantile(m1, probs = c(0.05, 0.5, 0.95))

## visualize
plot(y ~ x, data = abdom, pch = 19,
  col = rgb(0.1, 0.1, 0.1, alpha = 0.3))
matplot(abdom$x, pq, type = "l", lwd = 2,
  lty = 1, col = 4, add = TRUE)

## count response example using HarzTraffic data
data("HarzTraffic", package = "gamlss2")

## seasonal count model for motorcycles
m2 <- gamlss2(
  bikes ~ s(yday, bs = "cc") | s(yday, bs = "cc"),
  data = HarzTraffic,
  family = NBI
)
GAMLSS-RS iteration  1: Global Deviance = 10151.1432 eps = 0.149402     
GAMLSS-RS iteration  2: Global Deviance = 10150.8948 eps = 0.000024     
GAMLSS-RS iteration  3: Global Deviance = 10150.8818 eps = 0.000001     
## summary
summary(m2)
Call:
gamlss2(formula = bikes ~ s(yday, bs = "cc") | s(yday, bs = "cc"), 
    data = HarzTraffic, family = NBI)
---
Family: NBI 
Link functions: mu = log, sigma = log
*--------
Coefficients:
                  Estimate Std. Error t value Pr(>|t|)    
mu.(Intercept)     3.99347    0.03228  123.72   <2e-16 ***
sigma.(Intercept)  0.46777    0.03436   13.61   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
---
Smooth terms:
    mu.s(yday) sigma.s(yday)
edf     6.3823        6.2095
*--------
n = 1057 df =  14.59 res.df =  1042.41
Deviance = 10150.8818 Null Dev. Red. = 12.62%
AIC = 10180.0654 elapsed =  0.25sec
## effect plots
plot(m2)

## residual diagnostics
plot(m2, which = "resid")

## quantiles over the day of the year
nd <- data.frame(yday = 1:365)
pq <- quantile(m2, newdata = nd, probs = c(0.05, 0.5, 0.95))

## visualize fitted quantiles
plot(bikes ~ yday, data = HarzTraffic, pch = 19,
  col = gray(0.1, alpha = 0.3))
matplot(nd$yday, pq, type = "l", lty = c(2, 1, 2),
  lwd = 2, col = 4, add = TRUE)