library("gamlss2")
data("abdom", package = "gamlss.data")
## cross-validation using the NO distribution
## only model the mean with s(x)
cv1 <- cv_gamlss2(y ~ s(x), data = abdom, family = NO)
## now, also model the standard deviation with s(x)
cv2 <- cv_gamlss2(y ~ s(x) | s(x), data = abdom, family = NO)
## evaluate log-likelihood
sum(cv1$score)
sum(cv2$score)
Cross Validation for gamlss2 Models
Description
cv_gamlss2() implements K-fold cross validation for models fitted with gamlss2. Different scoring rules can be supplied via the metric argument. Convenience metric functions (log_pdf_metric(), rqres_metric(), mse_metric()) are provided.
Usage
## K-fold cross-validation
cv_gamlss2(..., data, folds = 5,
  metric = log_pdf_metric, parallel = FALSE, simplify = TRUE)
## log-pdf for each observation
log_pdf_metric(model, data)
## randomized quantile residuals
rqres_metric(model, data)
## mean squared error
mse_metric(model, data)
Arguments
... | model specification passed to gamlss2 such as formula, family, etc.
data | a data.frame containing the variables in the model. For functions supplied to argument metric, a data.frame for evaluating predictions or residuals.
folds | either an integer specifying the number of folds, or a list, matrix, or data frame of index sets for test folds. Defaults to 5.
metric | a function of the form metric(model, data) returning a score for the given fitted model and test data. Defaults to log_pdf_metric.
parallel | logical. If TRUE, computation is carried out in parallel using future.apply.
simplify | logical. If TRUE, results are returned in a simplified vector or data frame depending on the metric output.
model | a fitted gamlss2 model.
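The folds argument also accepts explicit index sets for the test folds. The following is a minimal base R sketch of how such a list could be built. It assumes that each list element holds the row indices of one test fold, which this page does not spell out in detail. make_folds() is a hypothetical helper written for this illustration and is not part of gamlss2.

```r
## hypothetical helper (not part of gamlss2):
## split the row indices 1..n into k disjoint test folds
make_folds <- function(n, k = 5, seed = 123) {
  set.seed(seed)
  ## shuffle the indices, then deal them out round-robin into k groups
  idx <- sample(seq_len(n))
  unname(split(idx, rep(seq_len(k), length.out = n)))
}

## five test folds for a data set with 100 rows
folds <- make_folds(n = 100, k = 5)
lengths(folds)
```

Under the stated assumption, a list like this could be passed as folds = make_folds(nrow(abdom)) so that several models are compared on identical splits.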
Details
cv_gamlss2() splits the data into training and test folds. For each fold the model is fitted on the training data, and the chosen metric is evaluated on the held-out test data. By default, the scoring rule is the log predictive density (log_pdf_metric), but other metrics can be used, such as randomized quantile residuals (rqres_metric) or mean squared error of the conditional mean (mse_metric).
The function returns either a list of fold-wise results or, if simplify = TRUE, a named vector or data frame aligned with the original observations.
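The procedure described above can be sketched by hand in base R. This is only an illustration of the K-fold idea and not the implementation used by cv_gamlss2(): it uses simulated data, lm() as a stand-in for the model fit, and a Gaussian log-density with constant standard deviation as the score. The object names (d, fold, res) are chosen for this example only.

```r
## simulated data so the sketch runs without extra packages
set.seed(1)
n <- 200
d <- data.frame(x = runif(n, 0, 10))
d$y <- 2 + 0.5 * d$x + rnorm(n, sd = 1 + 0.1 * d$x)

## assign each observation to one of k folds at random
k <- 5
fold <- sample(rep(seq_len(k), length.out = n))

score <- numeric(n)
for (j in seq_len(k)) {
  test <- fold == j
  ## fit on the training folds only
  fit <- lm(y ~ x, data = d[!test, ])
  ## Gaussian log predictive density on the held-out fold,
  ## using the residual standard deviation as a constant sigma
  mu <- predict(fit, newdata = d[test, ])
  score[test] <- dnorm(d$y[test], mean = mu, sd = sigma(fit), log = TRUE)
}

## per-observation scores together with fold membership
res <- data.frame(fold = fold, score = score)
sum(res$score)                    # overall cross-validated log-likelihood
tapply(res$score, res$fold, sum)  # contribution of each fold
```

Summing the per-observation scores mirrors the sum(cv1$score) call in the examples: a larger value indicates better out-of-sample predictive density.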
Value
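If simplify = TRUE and the metric returns a single scalar per fold, a named numeric vector of fold scores is returned. If simplify = TRUE and the metric returns one value per observation, a data frame with fold membership and scores per observation is returned. If simplify = FALSE, a list of fold-wise results is returned.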
The convenience metrics return a numeric vector of scores or residuals.
See Also
gamlss2, log_pdf