Title: | Model Consistent Lasso Estimation Through the Bootstrap |
---|---|
Description: | Implements the bolasso algorithm for consistent variable selection and estimation accuracy. Includes support for many parallel backends via the future package. For details see: Bach (2008), 'Bolasso: model consistent Lasso estimation through the bootstrap', <arXiv:0804.1302>. |
Authors: | Daniel Molitor [aut, cre] |
Maintainer: | Daniel Molitor <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.2.0 |
Built: | 2024-11-13 05:14:58 UTC |
Source: | https://github.com/dmolitor/bolasso |
This function implements model-consistent Lasso estimation through the bootstrap. It supports parallel processing by way of the future package, allowing the user to flexibly specify many parallelization methods. This method was developed as a variable-selection algorithm, but this package also supports making ensemble predictions on new data using the bagged Lasso models.
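The bootstrap step behind this description can be sketched in base R. This is only an illustration of standard nonparametric resampling (rows drawn with replacement); how the package resamples internally is an assumption here, and the names n_boot, boot_idx, and boot_sets are hypothetical.

```r
# A minimal sketch of bootstrap resampling, assuming the usual
# "sample rows with replacement" scheme. Not the package's own code.
set.seed(1)

n      <- nrow(mtcars)  # number of observations
n_boot <- 5             # hypothetical number of bootstrap replicates

# One vector of row indices per replicate, drawn with replacement
boot_idx <- replicate(
  n_boot,
  sample(n, size = n, replace = TRUE),
  simplify = FALSE
)

# One resampled data frame per replicate; in a Bolasso-style procedure
# a Lasso model would then be fitted to each of these
boot_sets <- lapply(boot_idx, function(i) mtcars[i, ])

length(boot_sets)
```

Each resampled data set has the same number of rows as the original, but some rows appear more than once and others not at all, which is what lets selection stability be measured across replicates.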
bolasso( formula, data, n.boot = 100, progress = TRUE, implement = "glmnet", x = NULL, y = NULL, ... )
formula |
An optional object of class formula (or one that can be
coerced to that class): a symbolic description of the model to be fitted.
Can be omitted when |
data |
An optional object of class data.frame that contains the
modeling variables referenced in |
n.boot |
An integer specifying the number of bootstrap replicates. |
progress |
A boolean indicating whether to display progress across bootstrap replicates. |
implement |
A character; either 'glmnet' or 'gamlr', specifying which
Lasso implementation to utilize. For specific modeling details, see
|
x |
An optional predictor matrix in lieu of |
y |
An optional response vector in lieu of |
... |
Additional parameters to pass to either
|
An object of class bolasso. This object is a list of length n.boot of cv.glmnet or cv.gamlr objects.
Bach FR (2008). “Bolasso: model consistent Lasso estimation through the bootstrap.” CoRR, abs/0804.1302. 0804.1302, https://arxiv.org/abs/0804.1302.
glmnet::cv.glmnet and gamlr::cv.gamlr for full details on the respective implementations and the arguments that can be passed via the ... argument.
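Because parallel processing is delegated to the future package, a backend would typically be chosen before calling bolasso(). The following is a hedged sketch, assuming the bolasso, glmnet, and future packages are installed; the worker count of 2 is an arbitrary choice for illustration, and nfolds is forwarded to the underlying cross-validation routine as in the package's own examples.

```r
# A minimal sketch of running the bootstrap fits on a parallel backend.
# Assumes bolasso, glmnet, and future are installed.
library(bolasso)

# Select a multisession backend with 2 background R workers
future::plan(future::multisession, workers = 2)

set.seed(123)
fit <- bolasso(
  mpg ~ .,
  data = mtcars,
  n.boot = 20,
  progress = FALSE,  # suppress the progress display
  nfolds = 5         # passed through ... to the Lasso implementation
)

# Return to sequential processing when done
future::plan(future::sequential)

length(fit)
```

Resetting the plan to sequential afterwards avoids leaving background workers running for the rest of the session.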
mtcars[, c(2, 10:11)] <- lapply(mtcars[, c(2, 10:11)], as.factor)
idx <- sample(nrow(mtcars), 22)
mtcars_train <- mtcars[idx, ]
mtcars_test <- mtcars[-idx, ]

## Formula Interface

# Train model
set.seed(123)
bolasso_form <- bolasso(
  formula = mpg ~ .,
  data = mtcars_train,
  n.boot = 20,
  nfolds = 5,
  implement = "glmnet"
)

# Extract selected variables
selected_vars(bolasso_form, threshold = 0.9, select = "lambda.min")

# Bagged ensemble prediction on test data
predict(bolasso_form, new.data = mtcars_test, select = "lambda.min")

## Alternative Matrix Interface

# Train model
set.seed(123)
bolasso_mat <- bolasso(
  x = model.matrix(mpg ~ . - 1, mtcars_train),
  y = mtcars_train[, 1],
  data = mtcars_train,
  n.boot = 20,
  nfolds = 5,
  implement = "glmnet"
)

# Extract selected variables
selected_vars(bolasso_mat, threshold = 0.9, select = "lambda.min")

# Bagged ensemble prediction on test data
predict(
  bolasso_mat,
  new.data = model.matrix(mpg ~ . - 1, mtcars_test),
  select = "lambda.min"
)
Identifies the independent variables that the Bolasso algorithm selects in at least a user-defined fraction (the threshold) of bootstrap replicates. A typical threshold is 0.9, and lower values are generally not recommended.
selected_vars(object, threshold = 0.9, summarise = TRUE, ...)
object |
An object of class bolasso. |
threshold |
A numeric between 0 and 1, specifying the fraction of bootstrap replicates for which Lasso must select a variable for it to be considered a selected variable. |
summarise |
A Boolean indicator where |
... |
Additional arguments to pass to |
A tibble with each selected variable and its respective coefficient for each bootstrap replicate.
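The thresholding rule described for this function can be illustrated in base R. This is a sketch of the idea only, not the package's implementation: the selection indicators below are made-up values, and the names selected, selection_freq, and kept are hypothetical.

```r
# A minimal sketch of threshold-based selection across bootstrap replicates.
# Rows are 10 hypothetical replicates; TRUE means the Lasso fit on that
# replicate gave the variable a nonzero coefficient. Values are invented.
selected <- cbind(
  wt   = rep(TRUE, 10),                         # selected in 10 of 10
  hp   = rep(c(TRUE, FALSE), times = c(8, 2)),  # selected in 8 of 10
  qsec = rep(c(TRUE, FALSE), times = c(1, 9))   # selected in 1 of 10
)

# Fraction of replicates in which each variable was selected
selection_freq <- colMeans(selected)

# Keep variables selected in at least `threshold` of the replicates
threshold <- 0.9
kept <- names(selection_freq)[selection_freq >= threshold]

print(selection_freq)
print(kept)
```

With a threshold of 0.9, only a variable chosen in nearly every replicate survives, which is what makes the procedure conservative about including noise variables.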
glmnet::predict.glmnet() and gamlr:::predict.gamlr() for details on the additional arguments that can be passed via the ... argument.