Stepwise Model Term Selection Using GAIC

Description

The optimizer function stepwise() performs stepwise model term selection using a Generalized Akaike Information Criterion (GAIC). Estimation is based on the Rigby and Stasinopoulos (RS) & Cole and Green (CG) algorithm as implemented in function RS.
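As a rough illustration of the criterion (not the package's internal code), the GAIC of a fitted model is commonly written as -2 times the log-likelihood plus K times the degrees of freedom, so K = 2 reproduces the AIC and K = log(n) the BIC. The helper gaic() below is a hypothetical name, and the sketch uses a base R lm() fit on the built-in cars data as a stand-in for a gamlss2 model.

```r
## Hypothetical helper, not part of gamlss2:
## GAIC = -2 * log-likelihood + K * degrees of freedom.
gaic <- function(fit, K = 2) {
  ll <- logLik(fit)
  -2 * as.numeric(ll) + K * attr(ll, "df")
}

## Base R stand-in model on the built-in cars data.
fit <- lm(dist ~ speed, data = cars)
n <- nobs(fit)

## K = 2 corresponds to the AIC, K = log(n) to the BIC.
gaic_aic <- gaic(fit, K = 2)
gaic_bic <- gaic(fit, K = log(n))

print(c(GAIC_K2 = gaic_aic, AIC = AIC(fit)))
print(c(GAIC_Klogn = gaic_bic, BIC = BIC(fit)))
```

A larger K penalizes additional model terms more strongly, so the stepwise search tends to stop at smaller models.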

Usage

## Wrapper function for stepwise GAMLSS estimation.
step_gamlss2(formula, ..., K = 2,
  strategy = c("both.linear", "both"), keeporder = FALSE,
  cores = 1L)

## Stepwise optimizer function.
stepwise(x, y, specials, family, offsets,
  weights, start, xterms, sterms, control)

Arguments

formula A model formula for gamlss2.
... Arguments passed to gamlss2.
K Numeric, the penalty for the GAIC.
strategy Character, the strategy that should be applied for the stepwise algorithm. Possible options are “forward.linear”, “forward”, “backward”, “backward.linear”, “replace”, “replace.linear”, “both”, “both.linear”. See the details.
keeporder Logical. Should the updates in the different strategies of the stepwise algorithm be performed sequentially, following the order of the parameters of the response distribution as specified in the family (see gamlss2.family), or should the selection search be performed across all parameters?
cores Integer. If cores > 1L, function mclapply is used to speed up computations within the selection steps by using multiple cores.
x The full model matrix to be used for fitting.
y The response vector or matrix.
specials A named list of special model terms, e.g., including design and penalty matrices for fitting smooth terms using smooth.construct.
family A family object, see gamlss2.family.
offsets If supplied, a list or data frame of possible model offsets.
weights If supplied, a numeric vector of weights.
start Starting values for the parameters of the response distribution. Alternatively, if specified as a named list in which each element of length one is named “(Intercept)”, the respective intercepts are initialized. If starting values are given as a named list, data frame or matrix in which each element/column is a vector with as many entries as there are observations in the data, the respective predictors are initialized with these values. See the examples for gamlss2.
xterms A named list specifying the linear model terms. Each named list element represents one parameter as specified in the family object.
sterms A named list specifying the special model terms. Each named list element represents one parameter as specified in the family object.
control Further control arguments as specified within the call of gamlss2.
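
The role of cores can be pictured with a small base R sketch: within one selection step every candidate model is fitted independently, so the fits can be distributed with parallel::mclapply(). The candidate formulas, the helper eval_candidate() and the use of lm() on mtcars are assumptions for illustration only; this is not a description of how stepwise() is implemented internally. Base R's AIC(fit, k = K) serves as a stand-in for the GAIC with penalty K. Because mclapply() relies on forking, the sketch falls back to one core on Windows.

```r
library("parallel")

## Candidate models for one selection step (illustrative only).
candidates <- list(
  mpg ~ wt,
  mpg ~ hp,
  mpg ~ qsec,
  mpg ~ drat
)

## Forking is not available on Windows, so fall back to one core there.
cores <- if (.Platform$OS.type == "windows") 1L else 2L

## Hypothetical helper: fit one candidate and return its criterion,
## using AIC(fit, k = K) as a stand-in for the GAIC.
eval_candidate <- function(f, K = 2) AIC(lm(f, data = mtcars), k = K)

## Parallel and serial evaluation give the same values.
crit_par <- unlist(mclapply(candidates, eval_candidate, mc.cores = cores))
crit_ser <- unlist(lapply(candidates, eval_candidate))

## The candidate with the smallest criterion wins this step.
best <- candidates[[which.min(crit_par)]]
print(best)
```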

Details

The wrapper function step_gamlss2() calls gamlss2 using the stepwise() optimizer function.

The stepwise algorithm follows these rules and selection steps:

  1. Each predictor must include an intercept.

  2. In a forward selection step, model terms with the largest improvement in the GAIC are selected.

  3. In a replacement step, each model term is tested to see if an exchange with a model term not yet selected will improve the GAIC.

  4. In a backward step, model terms are deselected if this further improves the GAIC.

  5. In a bidirectional step, model terms can be either added or removed.

  6. In addition, the forward, backward and replace selection steps can be combined.

The selected strategies are iterated until no further improvement is achieved.
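
The iteration described above can be sketched in base R as a toy bidirectional search. This is a simplified illustration under stated assumptions, not the gamlss2 implementation: it uses lm() on the built-in mtcars data for a single predictor, AIC(fit, k = K) as a stand-in for the GAIC, and the hypothetical helpers fit_terms() and toy_stepwise(). The intercept is always kept, and in each iteration one term is either added or removed until the criterion no longer improves.

```r
## Hypothetical helper: fit a linear model for a set of terms.
## The intercept is always included.
fit_terms <- function(terms, data) {
  rhs <- if (length(terms)) paste(terms, collapse = " + ") else "1"
  lm(as.formula(paste("mpg ~", rhs)), data = data)
}

## Toy bidirectional search: add or drop one term per iteration,
## stop as soon as the criterion cannot be improved any further.
toy_stepwise <- function(scope, data, K = 2) {
  selected <- character(0)
  best <- AIC(fit_terms(selected, data), k = K)
  repeat {
    ## Candidate term sets: add one unselected term or drop one selected term.
    add <- lapply(setdiff(scope, selected), function(j) c(selected, j))
    drop <- lapply(selected, function(j) setdiff(selected, j))
    cand <- c(add, drop)
    if (!length(cand)) break
    crit <- vapply(cand, function(s) AIC(fit_terms(s, data), k = K), numeric(1))
    ## No candidate improves the criterion: stop.
    if (min(crit) >= best) break
    best <- min(crit)
    selected <- cand[[which.min(crit)]]
  }
  list(terms = selected, gaic = best)
}

res <- toy_stepwise(c("wt", "hp", "qsec", "drat"), mtcars, K = 2)
print(res$terms)
print(res$gaic)
```

Restricting the candidates to additions gives a forward step, and restricting them to removals gives a backward step.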

The different strategies can be selected using argument strategy. Please see the examples. Possible values are strategy = c("both", "forward", "backward", "replace", "all"). Here, strategy = "all" combines the forward, backward and replace selection steps.

In addition, each of steps 2-4 can be applied to linear model terms only, prior to performing the steps for all model terms. This can be done by additionally setting strategy = c("both.linear", "forward.linear", "backward.linear", "replace.linear", "all.linear").

The default is strategy = c("both.linear", "both") and keeporder = FALSE.

Note that each of steps 2-4 can be performed while maintaining the order of the parameters of the response distribution: if keeporder = TRUE is set, the parameters are updated in the order specified in the gamlss2.family. With backward elimination, the model terms are deselected in reverse order.

Value

The optimizer function stepwise() returns the final model as a named list of class “gamlss2”. See the return value of function RS. The wrapper function step_gamlss2() also returns the final model.

See Also

new_formula, gamlss2, gamlss2_control, RS

Examples

library("gamlss2")


data("rent", package = "gamlss.data")

## because of possible linear interactions,
## scale the covariates first
rent$Fl <- scale(rent$Fl)
rent$A <- scale(rent$A)

## the formula defines the search scope
f <- R ~ Fl + A + Fl:A + loc + s(Fl) + s(A) + te(Fl, A) |
  Fl + A + loc + Fl:A + s(Fl) + s(A) + te(Fl, A)

## estimate a Gamma model using the stepwise algorithm
b <- step_gamlss2(f, data = rent, family = GA, K = 2)

## equivalent to
## b <- gamlss2(f, data = rent, family = GA, optimizer = stepwise, K = 2)

## show the new formula of selected model terms
new_formula(b)

## final model summary
summary(b)

## effect plots
plot(b)

## diagnostic plots
plot(b, which = "resid")

## plot GAIC
plot(b, which = "selection")

## use forward linear, replace and backward strategy
b <- step_gamlss2(f, data = rent, family = GA, K = 2,
  strategy = c("forward.linear", "replace", "backward"))

## more complex model
## note, the third parameter
## nu does not include any model terms
f <- R ~ Fl + A + Fl:A + loc + s(Fl) + s(A) + te(Fl, A) |
  Fl + A + loc + Fl:A + s(Fl) + s(A) + te(Fl, A) |
  1 |
  Fl + A + loc + Fl:A + s(Fl) + s(A) + te(Fl, A)

## model using the BCT family
b <- step_gamlss2(f, data = rent, family = BCT,
  K = 2, strategy = c("forward.linear", "both"),
  keeporder = TRUE)

## plot GAIC
plot(b, which = "selection")