Family Objects in gamlss2

Description

Family objects within the package gamlss2 are used to specify the information required to use a model fitting function. This includes details such as parameter names, corresponding link functions, the density function, log-likelihood function and derivatives of the log-likelihood with respect to the predictors. In addition, family objects are used in the calculation of post-modeling statistics, such as residual diagnostics and random number generation. An overview can be found in the accompanying details and examples.

Details

A gamlss2 family object must be a list of class “gamlss2.family”, containing the necessary elements to define the response distribution and how it is handled during model fitting. The minimum requirements are as follows:

  • family: the name of the distribution (character string).

  • names: a character vector of parameter names (e.g., c(“mu”, “sigma”)).

  • links: a named character vector specifying the link function for each parameter (e.g., c(mu = “identity”, sigma = “log”)), or a list of link functions, e.g., see softplus.

  • pdf(y, par, log = FALSE, …): a function to evaluate the (log-)density.

The pdf() function must accept the response y, a named list par of evaluated parameter values (e.g., par$mu, par$sigma), a logical log, and optional additional arguments.

Optionally, the family object may include:

  • score: a named list of functions (one per parameter), each computing the first derivative of the log-likelihood with respect to the linear predictor: score[param].

  • hess: a named list of functions computing second derivatives (negative Hessian). For parameters mu and sigma, this includes: hess[“mu”], hess[“sigma”], and optionally cross derivatives like hess[“mu.sigma”].

  • loglik(y, par, …): a function computing the total log-likelihood.

  • cdf(y, par, …): cumulative distribution function.

  • quantile(p, par, …): quantile function.

  • random(n, par, …): random number generator.

  • mean(par, …): mean function.

  • variance(par, …): variance function.

  • skewness(par, …), kurtosis(par, …): optional higher-order moment functions.

  • initialize: a named list of initialization functions, one per parameter (e.g., initialize$mu(y, …)), used to generate starting values.

  • valid.response(x): a function that checks whether the response is valid (e.g., numeric, non-factor).

  • optimizer(): an optional function to define a custom optimization method for use with gamlss2, see also link{RS}.

If the analytical score or Hessian functions are not provided, they will be approximated numerically. If quantile residuals are to be computed, a cdf() function must be provided.

See Also

gamlss2

Examples

library("gamlss2")


Normal <- function(...) {
  fam <- list(
    "family" = "Normal",
    "names" = c("mu", "sigma"),
    "links" = c("mu" = "identity", "sigma" = "log"),
    "score" = list(
      "mu" = function(y, par, ...) {
        (y - par$mu) / (par$sigma^2)
      },
      "sigma" = function(y, par, ...) {
        -1 + (y - par$mu)^2 / (par$sigma^2)
      }
    ),
    "hess" = list(
      "mu" = function(y, par, ...) {
        1 / (par$sigma^2)
      },
      "sigma" = function(y, par, ...) {
        rep(2, length(y))
      },
      "mu.sigma" = function(y, par, ...) {
        rep(0, length(y))
      }
    ),
    "loglik" = function(y, par, ...) {
      sum(dnorm(y, par$mu, par$sigma, log = TRUE))
    },
    "mu" = function(par, ...) {
      par$mu
    },
    "pdf" = function(y, par, log = FALSE) {
      dnorm(y, mean = par$mu, sd = par$sigma, log = log)
    },
    "cdf" = function(y, par, ...) {
      pnorm(y, mean = par$mu, sd = par$sigma, ...)
    },
    "random" = function(n, par) {
      rnorm(n, mean = par$mu, sd = par$sigma)
    },
    "quantile" = function(p, par) {
      qnorm(p, mean = par$mu, sd = par$sigma)
    },
    "initialize" = list(
      "mu"    = function(y, ...) { (y + mean(y)) / 2 },
      "sigma" = function(y, ...) { rep(sd(y), length(y)) }
    ),
    "mean"      = function(par) par$mu,
    "variance"  = function(par) par$sigma^2,
    "valid.response" = function(x) {
      if(is.factor(x) | is.character(x))
        stop("the response should be numeric!")
      return(TRUE)
    }
  )

  class(fam) <- "gamlss2.family"

  return(fam)
}

## load the abdominal circumference data
data("abdom", package = "gamlss.data")

## specify the model Formula
f <- y ~ s(x) | s(x)

## estimate model
b <- gamlss2(f, data = abdom, family = Normal)

## plot estimated effects
plot(b, which = "effects")

## plot diagnostics
plot(b, which = "resid")

## predict parameters
par <- predict(b)

## predict quantiles
pq <- sapply(c(0.05, 0.5, 0.95), function(q) family(b)$quantile(q, par))

## visualize
plot(y ~ x, data = abdom, pch = 19,
  col = rgb(0.1, 0.1, 0.1, alpha = 0.3))
matplot(abdom$x, pq, type = "l", lwd = 2,
  lty = 1, col = 4, add = TRUE)

## another example using only the density
## function, all derivatives are approximated
## in this case; for residual diagnostics,
## the $cdf() and $quantile() function is needed, too.
Gamma <- function(...) {
  fam <- list(
    "names" = c("mu", "sigma"),
    "links" = c("mu" = "log", "sigma" = "log"),
    "pdf" = function(y, par, log = FALSE, ...) {
      shape <- par$sigma
      scale <- par$mu/par$sigma
      dgamma(y, shape = shape, scale = scale, log = log)
    },
    "cdf" = function(y, par, lower.tail = TRUE, log.p = FALSE) {
      shape <- par$sigma
      scale <- par$mu/par$sigma
      pgamma(y, shape = shape, scale = scale,
        lower.tail = lower.tail, log.p = log.p)
    },
    "quantile" = function(p, par, lower.tail = TRUE, log.p = FALSE) {
      shape <- par$sigma
      scale <- par$mu/par$sigma
       qgamma(p, shape = shape, scale = scale,
         lower.tail = lower.tail, log.p = log.p)
    }
  )

  class(fam) <- "gamlss2.family"

  return(fam)
}

## example using the Munich rent data
data("rent", package = "gamlss.data")

## model formula
f <- R ~ ti(Fl) + ti(A) + ti(Fl, A, bs = "ps") |
  ti(Fl) + ti(A) + ti(Fl, A, bs = "ps")

## estimate model
b <- gamlss2(f, data = rent, family = Gamma)

## visualize estimated effects
plot(b, which = "effects")

## diagnostics, needs the $cdf() and $quantile() function!
plot(b, which = "resid")