library("gamlss2")
data("abdom", package = "gamlss.data")
## cross-validation using the NO distribution
## only model the mean with s(x)
cv1 <- cv_gamlss2(y ~ s(x), data = abdom, family = NO)
cv1
## now, also model the standard deviation with s(x)
cv2 <- cv_gamlss2(y ~ s(x) | s(x), data = abdom, family = NO)
cv2
## evaluate log-likelihood
sum(cv1$score)
sum(cv2$score)
Cross Validation for gamlss2 Models
Description
cv_gamlss2() implements K-fold cross validation for models fitted with gamlss2. Different scoring rules can be supplied via the metric argument. Convenience metric functions (log_pdf_metric(), rqres_metric(), mse_metric()) are provided.
Usage
## K-fold cross-validation
cv_gamlss2(..., data, folds = 5,
metric = log_pdf_metric, parallel = FALSE, simplify = TRUE)
## log-pdf for each observation
log_pdf_metric(model, data)
## randomized quantile residuals
rqres_metric(model, data)
## mean squared error
mse_metric(model, data)
Arguments
| Argument | Description |
|---|---|
| … | model specification passed to gamlss2, such as formula, family, etc. |
| data | a data.frame containing the variables in the model. For functions supplied to argument metric, a data.frame for evaluating predictions or residuals. |
| folds | either an integer specifying the number of folds, or a list, matrix, or data frame of index sets for test folds. Defaults to 5. |
| metric | a function of the form metric(model, data) returning a score for the given fitted model and test data. Defaults to log_pdf_metric. |
| parallel | logical. If TRUE, computation is carried out in parallel using future.apply. |
| simplify | logical. If TRUE, results are returned in a simplified vector or data frame depending on the metric output. |
| model | a fitted gamlss2 model. |
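The folds argument also accepts explicit index sets for the test folds. The following is a minimal base R sketch of building such a list, assuming each list element holds the row indices of one test fold. The names n, K, fold_id and test_folds are illustrative and not part of gamlss2.

```r
## illustrative sizes, not taken from any particular data set
set.seed(123)
n <- 100  # number of observations
K <- 5    # number of folds

## shuffle the row indices and deal them into K groups
fold_id <- rep(seq_len(K), length.out = n)
test_folds <- split(sample(n), fold_id)

## number of test observations per fold
lengths(test_folds)

## every observation lands in exactly one test fold
stopifnot(identical(sort(unlist(test_folds, use.names = FALSE)), seq_len(n)))
```

A list like this could then be supplied as folds = test_folds, provided the list layout assumed here matches what cv_gamlss2() expects.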
Details
cv_gamlss2() splits the data into training and test folds. For each fold the model is fitted on the training data, and the chosen metric is evaluated on the held-out test data. By default, the scoring rule is the log predictive density (log_pdf_metric), but other metrics can be used, such as randomized quantile residuals (rqres_metric) or mean squared error of the conditional mean (mse_metric).
The function returns either a list of fold-wise results or, if simplify = TRUE, a named vector or data frame aligned with the original observations.
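The procedure described above can be sketched in base R. This is not the gamlss2 implementation: lm() stands in for the model-fitting step, the data are simulated, and log_density(), d and res are hypothetical names chosen for illustration. The sketch only shows the generic pattern of fitting on the training folds and applying a metric(model, data) function to the held-out fold.

```r
## generic K-fold loop; lm() is a stand-in for the model fit (assumption)
set.seed(1)
d <- data.frame(x = runif(200))
d$y <- 1 + 2 * d$x + rnorm(200, sd = 0.5)

## metric(model, data): Gaussian log-density of each held-out response
log_density <- function(model, data) {
  mu <- predict(model, newdata = data)
  sigma <- summary(model)$sigma  # residual standard error of the training fit
  dnorm(data$y, mean = mu, sd = sigma, log = TRUE)
}

## assign each observation to one of K folds at random
K <- 5
fold <- sample(rep(seq_len(K), length.out = nrow(d)))

scores <- numeric(nrow(d))
for (k in seq_len(K)) {
  test <- fold == k
  fit <- lm(y ~ x, data = d[!test, ])          # fit on the training folds
  scores[test] <- log_density(fit, d[test, ])  # score the held-out fold
}

## fold membership and score per observation
res <- data.frame(fold = fold, score = scores)

## out-of-sample log-likelihood
sum(res$score)
```

Because the metric returns one value per observation, the result is a data frame with one row per observation, and summing the score column gives an out-of-sample log-likelihood that can be compared across model specifications.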
Value
If simplify = TRUE and the metric returns scalars, a named numeric vector of fold scores is returned. Otherwise a data frame with fold membership and scores per observation is returned.
The convenience metrics return a numeric vector of scores or residuals.
See Also
gamlss2, log_pdf