Lasso Model Terms

Description

Constructor function and plotting for Lasso penalized model terms for GAMLSS.

Usage

## Model term constructor function.
la(x, type = 1, const = 1e-05, ...)

## Plotting function.
plot_lasso(x, terms = NULL,
  which = c("criterion", "coefficients"),
  zoom = c(3, 4), spar = TRUE, ...)

Arguments

x For function la(), a numeric vector or matrix, or a formula. See the examples. For function plot_lasso(), an object returned from gamlss2.
type Integer or character, the type of the Lasso penalty. type = 1 or type = “normal” uses the normal penalty, type = 2 or type = “group” the group penalty, type = 3 or type = “ordinal” the ordinal fusion penalty and type = 4 or type = “nominal” the nominal fusion penalty.
const Numeric, the constant that is used for approximating the absolute function.
terms Character or integer, the model term that should be plotted. The default terms = NULL is plotting all model terms.
which Character, should the information criterion or the coefficient paths be plotted? See the examples.
zoom Numeric vector of length 2, the zooming factors for plotting information criteria curves and coefficient paths. The first element sets the distance from the optimum shrinkage parameter lambda to the left side, and the second element to the right side, respectively.
spar Logical, should plotting parameters be automatically set in par?
For function la() further control arguments can be passed: The criterion = “bic” (BIC, default) for shrinkage parameter selection (other options are “aic”, “aicc”, “gcv” or “gaic” with default K = 2), arguments for creating the model.matrix if the model term is specified using a formula. For function plot_lasso() arguments like lwd, col, main, etc., that control plotting parameters can be supplied. An additional ridge penalty (elastic net) can be added to each la() term be setting add_ridge = TRUE in the gamlss2 call.

Details

To implement the Lasso penalty, an approximation of the absolute value function is used, following the approach by Oelker and Tutz (2015). This enables the use of standard Newton-Raphson-type algorithms for estimation. Each Lasso model term has its own shrinkage parameter, allowing a mix of different penalty types within the model. The framework builds on the methodology of Groll et al. (2019), where coefficients are updated through iteratively reweighted least squares (IWLS). This is feasible due to the absolute function approximation, which results in a quadratic penalty matrix similar to that used in penalized splines. By default, the shrinkage parameters are selected using the corrected Akaike Information Criterion (cAIC).

la() differs from the model term constructor gnet. While gnet() delegates estimation to glmnet and applies the standard Lasso penalty to a design matrix generated by model.matrix, la() provides a unified framework for several Lasso-type penalties (normal, group, ordinal and nominal fusion). This allows for structured shrinkage, for example by fusing levels of categorical covariates, which is not available in gnet(). In addition, an optional ridge component can be added via add_ridge = TRUE in the gamlss2 call, yielding an elastic-net-type penalty for each la() term.

For numerical stability and comparability of shrinkage parameters across different Lasso-type penalties, the design matrix of each la() model term is internally scaled before estimation. The scaling depends on the selected penalty type and is chosen such that it preserves the structural meaning of the penalty.

type = “group”

For group Lasso penalties, factor-specific design matrices are block-standardized using a QR decomposition. The columns within each factor block are transformed to an orthonormal basis (up to a factor \(\sqrt{n}\)), ensuring that the group penalty is invariant to the chosen contrast coding and that groups of different sizes are penalized on a comparable scale. This transformation may mix columns within a block but preserves the column space of the design matrix.

type = “ordinal” and type = “nominal”

For ordinal and nominal fusion penalties, no column mixing is applied. Instead, each factor block is multiplied by a single scalar chosen such that the root mean square column norm equals \(\sqrt{n}\). This scalar scaling preserves the interpretation of adjacent or pairwise differences between factor levels, which form the basis of the fusion penalty.

type = “normal”

For normal Lasso penalties, numeric covariates and matrices are scaled column-wise so that each column has Euclidean norm \(\sqrt{n}\). This corresponds to standard Lasso scaling and ensures that shrinkage is applied uniformly across covariates. If a factor is used with a normal penalty, a single scalar scaling is applied, analogous to the ordinal and nominal cases.

The applied scaling is stored internally and automatically reused during prediction. When plot_lasso() is called with rescale = TRUE, coefficient paths are transformed back to the original parameterization to facilitate interpretation.

Value

The la() function is used internally within gamlss2 and provides the necessary details for estimating Lasso-type model terms. Essentially, it serves as a special model term, as outlined in specials.

Currently, the plot_lasso() function does not return any output.

References

Groll A, Hambuckers J, Kneib T, Umlauf N (2019). Lasso-Type Penalization in the Framework of Generalized Additive Models for Location, Scale and Shape. Computational Statistics & Data Analysis, 140(12), 59-73. doi:10.1016/j.csda.2019.06.005

Oelker MR, Tutz G (2017). A Uniform Framework for Combination of Penalties in Generalized Structured Models. Advances in Data Analysis and Classification, 11(1), 97-120. doi:10.1007/s11634-015-0205-y

See Also

gnet, gamlss2, specials.

Examples

library("gamlss2")


data("rent", package = "gamlss.data")

## transform numeric to factor variables
rent$Flc <- cut(rent$Fl, breaks = seq(20, 160, by = 10),
  include.lowest = TRUE)
rent$Ac <- cut(rent$A, breaks = seq(1890, 1990, by = 10),
  include.lowest = TRUE)

## set up the model formula for a BCT model
f <- R ~ la(Flc,type=3) + la(Ac,type=3) + la(loc,type=3) | . | .

## estimation
b <- gamlss2(f, data = rent, family = BCT)

## summary, shows the estimated degrees of freedom
## for each model term
summary(b)

## plot estimated coefficients
plot(b)

## plot information criteria curves
## for each model term.
plot_lasso(b)

## plot parameter paths.
plot_lasso(b, which = "coefficients")

## plot a single model term.
plot_lasso(b, which = "coefficients", term = 5)

## same with
plot_lasso(b, which = "coefficients", term = "sigma.la(Ac")

## zoom out
plot_lasso(b, which = "coefficients", term = 5,
  zoom = c(8, 7))

## set names
plot_lasso(b, which = "coefficients", term = 5,
  zoom = c(8, 7), names = c("A", "B", "C"))

## set title
plot_lasso(b, which = "coefficients", term = 5,
  zoom = c(8, 7), main = "Fused Lasso")

## simulated example using the normal lasso
## and a matrix as argument for la()
set.seed(123)

## number of observations and covariates
n <- 500
k <- 50

## model matrix
X <- matrix(rnorm(n * k), n, k)
colnames(X) <- paste0("x", 1:k)

## true coefficients
beta <- list(
  "mu" = rbinom(k, 1, 0.1),
  "sigma" = rbinom(k, 1, 0.1) * 0.3
)

## parameters
mu <- X %*% beta$mu
sigma <- exp(-1 + X %*% beta$sigma)

## response
y <- rnorm(n, mean = mu, sd = sigma)

## model formula with nominal fused lasso
f <- y ~ la(X,type=4) | la(X,type=4)

## estimate model incl. extra ridge penalty
## for all la() model terms
b <- gamlss2(f, add_ridge = TRUE)

## plot information criteria curves
plot_lasso(b)

## coefficient paths
plot_lasso(b, which = "coefficients")

## zoom out
plot_lasso(b, which = "coefficients",
  zoom = c(8, 9))

## extract coefficients
cb <- coef(b, full = TRUE)

## compare (without intercept)
cb_mu <- cb[grep("mu.", names(cb))][-1]
cb_sigma <- cb[grep("sigma.", names(cb))][-1]

thres <- 0.01

## true positive rate
tp <- mean(c(abs(cb_mu[beta$mu > 0]) > thres,
  abs(cb_sigma[beta$sigma > 0]) > thres))
print(tp)

## false positive rate, needs threshold
fp <- mean(c(abs(cb_mu[beta$mu == 0]) > thres,
  abs(cb_sigma[beta$sigma == 0]) > thres))
print(fp)

## fused ordinal Lasso P-spline example with
## seasonal variation of motorcycle counts at Sonnenberg/Harz
data("HarzTraffic", package = "gamlss2")
plot(bikes ~ yday, data = HarzTraffic)

## P-spline design matrix
X <- smooth.construct(s(yday,bs="ps",k=20), data = HarzTraffic, NULL)$X
i <- order(HarzTraffic$yday)
with(HarzTraffic, matplot(yday[i], X[i, ], type = "l", lty = 1))

## estimate model
f <- bikes ~ la(X,type=3,ridge=TRUE) | . | .
b <- gamlss2(f, data = HarzTraffic, family = SICHEL)
summary(b)
plot(b, which = "resid")

## coefficient paths
plot_lasso(b, which = "coefficients", zoom = 15)

## plot estimated qauntiles
qu <- quantile(b, probs = c(0.05, 0.5, 0.95))
plot(bikes ~ yday, data = HarzTraffic, col = adjustcolor(1, alpha = 0.3))
with(HarzTraffic, matplot(yday[i], qu[i, ],
  type = "l", col = c(3, 1, 2), lty = 1, lwd = 2, add = TRUE))