Cross-Validation for gamlss2 Models

Description

cv_gamlss2() implements K-fold cross-validation for models fitted with gamlss2(). Different scoring rules can be supplied via the metric argument; the convenience metric functions log_pdf_metric(), rqres_metric() and mse_metric() are provided.

Usage

## K-fold cross-validation
cv_gamlss2(..., data, folds = 5,
  metric = log_pdf_metric, parallel = FALSE, simplify = TRUE)

## log-pdf for each observation
log_pdf_metric(model, data)

## randomized quantile residuals
rqres_metric(model, data)

## mean squared error
mse_metric(model, data)

Arguments

... model specification passed to gamlss2(), such as formula, family, etc.
data a data.frame containing the variables in the model. For functions supplied to argument metric, the data.frame (the held-out test fold) on which predictions or residuals are evaluated.
folds either an integer specifying the number of folds, or a list, matrix, or data frame of index sets for test folds. Defaults to 5.
metric a function of the form metric(model, data) returning a score for the given fitted model and test data. Defaults to log_pdf_metric.
parallel logical. If TRUE, computation is carried out in parallel using future.apply.
simplify logical. If TRUE, the results are simplified to a vector or a data frame, depending on the output of the metric.
model a fitted gamlss2 model.
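The folds and metric arguments can be illustrated with a short base-R sketch. It builds a list of test-index sets by hand, the kind of object folds accepts, and defines a custom metric with the required metric(model, data) signature. The name mae_metric is hypothetical (not part of gamlss2), and lm() stands in for a fitted gamlss2 model; the sketch assumes predict(model, newdata = data) returns the conditional mean, which holds for lm() but should be checked for the model class actually used.

```r
## Hypothetical illustration, not gamlss2 code: hand-built test folds and a
## custom metric following the metric(model, data) contract.
set.seed(1)
n <- 100
K <- 5

## assign each observation at random to one of K folds of equal size
fold_id <- sample(rep(seq_len(K), length.out = n))
folds <- split(seq_len(n), fold_id)  # list of test-index sets

## 'mae_metric' is a hypothetical name: absolute error of the conditional
## mean, one value per test observation. Assumes predict() returns the mean,
## as it does for lm().
mae_metric <- function(model, data) {
  abs(data$y - predict(model, newdata = data))
}

## usage with lm() standing in for a gamlss2 fit
d <- data.frame(x = runif(n))
d$y <- 2 * d$x + rnorm(n, sd = 0.3)
test <- folds[[1]]
fit <- lm(y ~ x, data = d[-test, ])
scores <- mae_metric(fit, d[test, ])
```

Passing the same folds list to several cv_gamlss2() calls keeps the test sets identical across models, so their scores can be compared fold by fold.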

Details

cv_gamlss2() splits the data into K folds. Each fold serves once as the held-out test set: the model is fitted to the remaining folds (the training data), and the chosen metric is evaluated on the test data. By default, the scoring rule is the log predictive density (log_pdf_metric), but other metrics can be used, such as randomized quantile residuals (rqres_metric) or the mean squared error of the conditional mean (mse_metric).
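The idea behind randomized quantile residuals (Dunn and Smyth, 1996) can be sketched in a few lines. For a continuous response, the residual is qnorm(F(y)), where F is the fitted cumulative distribution function; for a discrete response, a uniform draw between F(y - 1) and F(y) is mapped through qnorm(). The helper rqres_pois() below is hypothetical and covers only a Poisson response with known mean; it is not the rqres_metric() implementation.

```r
## Hypothetical sketch of randomized quantile residuals for a Poisson
## response; not the rqres_metric() implementation.
rqres_pois <- function(y, lambda) {
  a <- ppois(y - 1, lambda)  # F(y - 1), equals 0 for y = 0
  b <- ppois(y, lambda)      # F(y)
  u <- runif(length(y), min = a, max = b)  # randomize within the CDF jump
  qnorm(u)                   # map to the standard normal scale
}

## if the model is correct, the residuals are approximately standard normal
set.seed(2)
y <- rpois(5000, lambda = 3)
r <- rqres_pois(y, lambda = 3)
c(mean = mean(r), sd = sd(r))

## for a continuous response no randomization is needed:
## qnorm(pnorm(y, mu, sigma)) reduces to the standardized residual (y - mu) / sigma
```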

The function returns a list of fold-wise results or, if simplify = TRUE, either a named vector (for metrics returning one value per fold) or a data frame aligned with the original observations (for metrics returning one value per observation).
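The cross-validation loop can be sketched in base R. In this minimal sketch, lm() with normal errors stands in for gamlss2(), and the hypothetical kfold_log_pdf() returns a data frame of fold membership and per-observation log predictive densities, aligned with the rows of the input data. It illustrates K-fold cross-validation with a log-density score; it is not the cv_gamlss2() source.

```r
## Minimal sketch of K-fold cross-validation with a log predictive density
## score; lm() with normal errors stands in for a gamlss2 model.
kfold_log_pdf <- function(formula, data, K = 5) {
  n <- nrow(data)
  fold_id <- sample(rep(seq_len(K), length.out = n))  # random fold labels
  yname <- all.vars(formula)[1]  # assumes the response is a bare variable
  score <- numeric(n)
  for (k in seq_len(K)) {
    test <- which(fold_id == k)
    ## fit only on the training observations
    fit <- lm(formula, data = data[-test, , drop = FALSE])
    newd <- data[test, , drop = FALSE]
    mu <- predict(fit, newdata = newd)
    ## log density of the held-out responses under the fitted normal model
    score[test] <- dnorm(newd[[yname]], mean = mu, sd = sigma(fit), log = TRUE)
  }
  data.frame(fold = fold_id, score = score)
}

## usage: compare two mean models on identical folds (same seed)
set.seed(42)
d <- data.frame(x = runif(200))
d$y <- sin(2 * pi * d$x) + rnorm(200, sd = 0.2)
set.seed(1); cv_lin <- kfold_log_pdf(y ~ x, d)
set.seed(1); cv_cub <- kfold_log_pdf(y ~ poly(x, 3), d)
## cross-validated log-likelihood; larger is better
c(linear = sum(cv_lin$score), cubic = sum(cv_cub$score))
```

Drawing fold labels with sample(rep(...)) keeps fold sizes within one observation of each other, and resetting the seed before each call gives both models the same test sets, so the summed scores are directly comparable.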

Value

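If simplify = TRUE and the metric returns scalars, a named numeric vector of fold scores is returned; if the metric returns one value per observation, a data frame with fold membership and per-observation scores is returned. If simplify = FALSE, a list of fold-wise results is returned.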
The convenience metrics return a numeric vector of scores or residuals.
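Per-observation scores can be reduced to one value per fold with tapply(). The toy values below are made up for illustration, and the column names fold and score are assumptions about the per-observation data frame (the examples access a score column as cv1$score).

```r
## Toy per-observation output (made-up numbers): fold membership and a score,
## e.g. squared errors of the conditional mean. Column names are assumptions.
res <- data.frame(fold  = c(1, 1, 2, 2, 3, 3),
                  score = c(0.5, 1.5, 2.0, 4.0, 1.0, 3.0))

## one value per fold: mean of the per-observation scores within each fold
fold_means <- tapply(res$score, res$fold, mean)

## for log-density scores, summing instead gives each fold's test log-likelihood
fold_sums <- tapply(res$score, res$fold, sum)
```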

See Also

gamlss2, log_pdf

Examples

library("gamlss2")


data("abdom", package = "gamlss.data")

## cross-validation using the NO distribution
## only model the mean with s(x)
cv1 <- cv_gamlss2(y ~ s(x), data = abdom, family = NO)

## now, switch to the BCT distribution and also model sigma with s(x)
cv2 <- cv_gamlss2(y ~ s(x) | s(x), data = abdom, family = BCT)

## cross-validated (out-of-sample) log-likelihood; larger is better
sum(cv1$score)
sum(cv2$score)