Family Objects

Note that all family objects of the gamlss.dist package can be used for modeling. However, for users wanting to specify their own (new) distribution model, this document provides a guide on how to define custom family objects within the gamlss2 framework.

Family objects in the gamlss2 package play an essential role in defining the models used for fitting data to distributions. These objects encapsulate the necessary details about the distribution and its parameters, such as the parameter names, the link functions, and the density function.

This document provides an overview of how to construct and use family objects within gamlss2. By the end, you should have a good understanding of how to implement a custom family for use in statistical models.

Defining Family Objects

A family object in gamlss2 is a list that must meet the following minimum criteria:

  • Family Name: The object must contain the family name as a character string.
  • Parameters: The object must list the parameters of the distribution (e.g., "mu" and "sigma" for a normal distribution).
  • Link Functions: It must specify the link functions associated with each parameter.
  • Density Function: A d() function must be provided to evaluate the (log-)density of the distribution.

Optionally, a family object can include functions for the log-likelihood, random number generation, the cumulative distribution function (CDF), and the quantile function.

Here’s an example of a minimal family object for the normal distribution.

Normal <- function(...) {
  fam <- list(
    "family" = "Normal",
    "names" = c("mu", "sigma"),
    "links" = c("mu" = "identity", "sigma" = "log"),
    "d" = function(y, par, log = FALSE, ...) {
      dnorm(y, par$mu, par$sigma, log = log)
    }
  )
  class(fam) <- "gamlss2.family"
  return(fam)
}

In this example, we define a normal distribution with two parameters: "mu" (mean) and "sigma" (standard deviation). The link function for "mu" is the identity, and for "sigma" it is the log function. The density function uses R's standard dnorm() function to calculate the normal density.
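As a quick check before using such a family in a model, the object can be built and its density evaluated by hand. The following is a minimal sketch that repeats the Normal() definition so it runs on its own; it does not require gamlss2 to be loaded, and the variable names fam and dens are only illustrative.

```r
# Minimal family from the text, repeated so this snippet is self-contained.
Normal <- function(...) {
  fam <- list(
    "family" = "Normal",
    "names" = c("mu", "sigma"),
    "links" = c("mu" = "identity", "sigma" = "log"),
    "d" = function(y, par, log = FALSE, ...) {
      dnorm(y, par$mu, par$sigma, log = log)
    }
  )
  class(fam) <- "gamlss2.family"
  return(fam)
}

fam <- Normal()

# Inspect the required elements; unclass() sidesteps any package print method.
str(unclass(fam)[c("family", "names", "links")])

# Evaluate the density at y = 0 for mu = 0 and sigma = 1.
dens <- fam$d(0, par = list(mu = 0, sigma = 1))
```

The value in dens should agree with dnorm(0), since the family only wraps dnorm().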

Density Function

The density function must accept the following arguments:

d(y, par, log = FALSE, ...)
  • y: The response variable.
  • par: A named list of parameters (e.g., "mu", "sigma" for the normal distribution).
  • log: A logical value indicating whether to return the log-density.
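To illustrate how these arguments work together, the sketch below evaluates a density with this signature on a few simulated observations and sums the log-densities to obtain a log-likelihood. The simulated data and the names ll and ll2 are assumptions made for illustration only.

```r
# Density with the d(y, par, log = FALSE, ...) signature described above.
d <- function(y, par, log = FALSE, ...) {
  dnorm(y, par$mu, par$sigma, log = log)
}

set.seed(1)
y <- rnorm(5, mean = 2, sd = 0.5)

# In this sketch, par holds one parameter value per observation.
par <- list(mu = rep(2, 5), sigma = rep(0.5, 5))

# Log-likelihood as the sum of the log-densities.
ll <- sum(d(y, par, log = TRUE))

# The same quantity computed from the density on the original scale.
ll2 <- sum(log(d(y, par)))
```

Requesting log = TRUE directly is usually preferable to taking log() of the density afterwards, because it avoids underflow for observations far in the tails.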

Optional Derivatives

Family objects can optionally include functions to compute the first and second derivatives of the log-likelihood with respect to the predictors (or the expected values of these derivatives). These derivatives are used for optimization during model fitting.

The derivative functions follow the form:

function(y, par, ...)

The first-order derivative functions must be provided as a named list called "score", with one list element for each parameter of the distribution. The list of second-order derivative functions is named "hess". Note that these functions must return the derivatives with respect to the predictor, and the "hess" functions must return the negative (expected) second derivatives.
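As a worked case, the derivatives for the normal distribution with an identity link for "mu" and a log link for "sigma" can be derived as follows. The symbols \ell (log-density of one observation), \eta_\mu and \eta_\sigma (predictors) are notation introduced here for illustration.

```latex
\ell = -\log\sigma - \frac{(y - \mu)^2}{2\sigma^2} - \tfrac{1}{2}\log(2\pi),
\qquad \eta_\mu = \mu, \qquad \eta_\sigma = \log\sigma

% First derivatives with respect to the predictors ("score")
\frac{\partial \ell}{\partial \eta_\mu} = \frac{y - \mu}{\sigma^2},
\qquad
\frac{\partial \ell}{\partial \eta_\sigma}
  = \sigma \, \frac{\partial \ell}{\partial \sigma}
  = -1 + \frac{(y - \mu)^2}{\sigma^2}

% Negative (expected) second derivatives ("hess")
-\frac{\partial^2 \ell}{\partial \eta_\mu^2} = \frac{1}{\sigma^2},
\qquad
\mathrm{E}\!\left[-\frac{\partial^2 \ell}{\partial \eta_\sigma^2}\right]
  = \mathrm{E}\!\left[\frac{2\,(y - \mu)^2}{\sigma^2}\right] = 2
```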

An example of setting up first and second order derivatives for the normal is provided in the following code:

Normal <- function(...) {
  fam <- list(
    "family" = "Normal",
    "names" = c("mu", "sigma"),
    "links" = c("mu" = "identity", "sigma" = "log"),
    "d" = function(y, par, log = FALSE, ...) {
      dnorm(y, par$mu, par$sigma, log = log)
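    },
    # First derivatives of the log-likelihood w.r.t. each predictor.
    "score" = list(
      # Identity link: derivative w.r.t. mu itself.
      "mu" = function(y, par, ...) {
        (y - par$mu) / (par$sigma^2)
      },
      # Log link: derivative w.r.t. log(sigma).
      "sigma" = function(y, par, ...) {
        -1 + (y - par$mu)^2 / (par$sigma^2)
      }
    ),
    # Negative (expected) second derivatives w.r.t. each predictor.
    "hess" = list(
      "mu" = function(y, par, ...) {
        1 / (par$sigma^2)
      },
      # The expectation is the constant 2, returned once per observation.
      "sigma" = function(y, par, ...) {
        rep(2, length(y))
      }
    )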
  )
  class(fam) <- "gamlss2.family"
  return(fam)
}

If no derivatives are provided, numerical approximations will be used by the package.
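When derivatives are coded by hand, it can be useful to compare them against finite differences on the predictor scale before fitting a model. The following is a minimal sketch of such a check using central differences; it is not the approximation scheme used inside the package, and the helper logdens_eta() and the step size h are assumptions made for illustration.

```r
# Analytical score functions for the normal family, as in the text.
score <- list(
  mu = function(y, par, ...) (y - par$mu) / par$sigma^2,
  sigma = function(y, par, ...) -1 + (y - par$mu)^2 / par$sigma^2
)

# Log-density as a function of the predictors (hypothetical helper):
# identity link for mu, log link for sigma, so sigma = exp(eta_sigma).
logdens_eta <- function(y, eta_mu, eta_sigma) {
  dnorm(y, mean = eta_mu, sd = exp(eta_sigma), log = TRUE)
}

y <- c(-1.2, 0.3, 2.5)
eta_mu <- 0.5
eta_sigma <- log(1.5)
par <- list(mu = eta_mu, sigma = exp(eta_sigma))
h <- 1e-5

# Central finite differences with respect to each predictor.
num_mu <- (logdens_eta(y, eta_mu + h, eta_sigma) -
           logdens_eta(y, eta_mu - h, eta_sigma)) / (2 * h)
num_sigma <- (logdens_eta(y, eta_mu, eta_sigma + h) -
              logdens_eta(y, eta_mu, eta_sigma - h)) / (2 * h)

# Largest absolute disagreement between numerical and analytical scores.
max_err_mu <- max(abs(num_mu - score$mu(y, par)))
max_err_sigma <- max(abs(num_sigma - score$sigma(y, par)))
```

Both errors should be close to zero. A large discrepancy for "sigma" typically indicates that the derivative was taken with respect to the parameter rather than the predictor.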

Additional Functions

Family objects can also include other functions such as:

  • Cumulative distribution function (p()).
  • Quantile function (q()).
  • Random number generation (r()).

These functions should adhere to the same structure as the density function, taking the response (y), parameters (par), and other relevant arguments.
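For the normal distribution, these optional elements might be sketched as follows. The argument names p (probabilities) for the quantile function and n (number of draws) for the random number generator are assumptions that mirror base R conventions; the exact signatures expected by gamlss2 should be checked against the package documentation.

```r
# Optional elements for a normal family; argument names p and n are assumed.
extras <- list(
  # Cumulative distribution function evaluated at y.
  "p" = function(y, par, ...) {
    pnorm(y, par$mu, par$sigma, ...)
  },
  # Quantile function evaluated at probabilities p.
  "q" = function(p, par, ...) {
    qnorm(p, par$mu, par$sigma, ...)
  },
  # Random number generation of n draws.
  "r" = function(n, par, ...) {
    rnorm(n, par$mu, par$sigma)
  }
)

par <- list(mu = 1, sigma = 2)

prob <- extras$p(1, par)     # CDF at the mean
med <- extras$q(0.5, par)    # median

set.seed(123)
draws <- extras$r(10, par)   # ten random draws
```

In a complete family, these elements would sit in the same list as "d", "score", and "hess".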

Conclusion

Family objects in the gamlss2 package are a fundamental component for defining flexible, distribution-based regression models and beyond. By encapsulating the necessary elements, such as parameters, link functions, and density functions, they provide a powerful framework for customizing models to fit specific data. The flexibility to define custom families, as demonstrated with the Kumaraswamy() distribution, enables users to extend the package beyond its default families, making it adaptable to a wide range of modeling scenarios. Furthermore, the ability to define both static and dynamic link functions enhances the versatility of gamlss2 for distributional regression, allowing users to tailor models to their data and research needs.
