library("gamlss2")
set.seed(123)
## number of observations
<- 1000
n
## covariates
<- 100
k <- matrix(runif(n * k, -3, 3), ncol = k)
x colnames(x) <- paste0("x", 1:k)
<- as.data.frame(x)
d
## true effects
$f1 <- sin(d$x1)
d$f2 <- exp(d$x2)/15 - 1
d$f3 <- -d$x3 / 3
d$f4 <- d$x4^2/5 - 1
d$f5 <- cos(d$x5)
d
## true parameters
<- with(d, f1 + f3 + f4)
mu <- with(d, exp(1.5 + f2 + f4 + f5))
sigma
## simulate response
$y <- rnorm(n, mean = mu, sd = sigma)
d
## model formula
<- paste("~", paste0("s(x", 1:k, ")", collapse = "+"))
f <- as.formula(f)
f <- list(f, f)
f 1]] <- update(f[[1]], y ~ .)
f[[
## estimate model
<- select_gamlss2(f, data = d, family = NO)
b
## plot selected estimated effects
plot(b)
## final model
new_formula(b)
## example taken from ?gam.selection
library("mgcv")
set.seed(3)
<- 200
n
## simulate data
<- gamSim(1, n = n, scale = .15, dist = "poisson")
dat
## spurious
$x4 <- runif(n, 0, 1)
dat$x5 <- runif(n, 0, 1)
dat
## formula
<- y ~ s(x0) + s(x1) + s(x2) + s(x3) + s(x4) + s(x5)
f
## estimate model
<- gam(f, data = dat, family = poisson,
b1 select = TRUE, method = "REML")
summary(b1)
plot(b1, pages = 1)
## same with gamlss2
<- select_gamlss2(f, data = dat, family = PO)
b2
## plot selected effects
plot(b2)
## final model
new_formula(b2)
Smooth Model Term Selection with Additional Penalties
Description
The function select_gamlss2()
allows for penalizing all mgcv model terms with an additional shrinkage penalty, which can shrink some terms to zero, effectively selecting them out of the model. In addition to this penalty, model terms are selected based on two criteria: the estimated degrees of freedom of the term and the percentage of the predictor range covered by the model term. These two thresholds aim to mimic a natural selection process, similar to what one might do by inspecting summaries and effect plots, ensuring that only relevant terms are retained in the final model.
Usage
## Wrapper function.
select_gamlss2(formula, ..., criterion = "BIC", thres = c(0.9, 0.2))
## Modified RS optimizer function.
sRS(x, y, specials, family, offsets,
weights, start, xterms, sterms, control)
Arguments
formula
|
A GAM-type formula or Formula . All smooth terms of the mgcv package are supported, see also formula.gam . For gamlss.list() formula is a list of formula s.
|
…
|
Arguments passed to argument control in function sRS() .
|
criterion
|
The information criterion to be used for estimating the shrinkage parameters. This can also be a vector of length 2, where the first element specifies the criterion to be used during the selection step, and the second element specifies the criterion to be used during the refitting step. Possible options are “BIC” , “GCV” , “AIC” , “AICc” and “GAIC” with user defined penalty K , default is K = 2 .
|
thres
|
A vector of thresholds used for model term selection. The first element controls the minimum allowed estimated degrees of freedom for a model term to enter the model. The second element specifies the minimum percentage of the total estimated predictor range required for a term to be included in the model. |
x
|
The full model matrix to be used for fitting. |
y
|
The response vector or matrix. |
specials
|
A named list of special model terms, e.g., including design and penalty matrices for fitting smooth terms using smooth.construct .
|
family
|
A family object, see gamlss2.family .
|
offsets
|
If supplied, a list or data frame of possible model offset. |
weights
|
If supplied, a numeric vector of weights. |
start
|
Starting values, either for the parameters of the response distribution or, if specified as a named list in which each element of length one is named with “(Intercept)” , the respective intercepts are initialized. If starting values are specified as a named list, data frame or matrix, where each element/column is a vector with the same length as the number of observations in the data, the respective predictors are initialized with these. See the examples for gamlss2 .
|
xterms
|
A named list specifying the linear model terms. Each named list element represents one parameter as specified in the family object. |
sterms
|
A named list specifying the special model terms. Each named list element represents one parameter as specified in the family object. |
control
|
Further control arguments as specified within the call of gamlss2 .
|
Details
The function select_gamlss2
selects model terms by identifying those with estimated degrees of freedom greater than a pre-specified threshold (e.g., thres = 0.9
). In addition, model term selection can also be based on the percentage of the total predictor range covered by the model term to ensure that the term represents a substantial portion of the total effect. After the selection step, the model is refitted using only the selected terms, excluding the additional penalty applied during the initial fitting process. The additional penalty for automatic term selection is described in ?gam.selection
. Please note that this is experimental, and careful consideration should always be given to the modeling process.
Value
An object of class “gamlss2”
.
See Also
gamlss2
, new_formula
.