Performs approximated exhaustive model selection on the input GLM model.
apes(
model,
k = NULL,
estimator = "leaps",
fitted_values = NULL,
time_limit = 10L,
really_big = FALSE,
verbose = FALSE,
n_boot = 0,
workers = 1L
)
model: A "full" model with all variables of interest fitted to it. Accepts either a "glm" class or a "coxph" class object.

k: Model size(s) to explore. Defaults to NULL, which searches through all variables in the model. Alternatively, the user can supply a vector of sizes. If estimator is:
"leaps", then models up to size max(k) will be explored.
"mio", then only models of the specified sizes will be explored.

estimator: Either "leaps" (default) or "mio", corresponding to the optimisation algorithms available in the leaps and bestsubset packages, respectively.

fitted_values: Defaults to NULL, in which case fitted values are extracted from the fitted model itself. Alternatively, users can supply a vector of fitted values to improve the performance of APES, as per Wang et al. (2019).

time_limit: The maximum time allocated to each model size when the "mio" estimator is selected. It has no effect on computational speed when "leaps" is the estimator.

really_big: If FALSE (the default), variable selection on more than 30 variables is prevented. We recommend setting this argument to TRUE only if the number of variables exceeds 30 and the "estimator" argument is simultaneously set to "mio".

verbose: Whether to print messages during computation.

n_boot: Number of bootstrap runs. Defaults to 0, which performs no bootstrap sampling.

workers: Number of cores used for parallel processing of the bootstrap runs.

Returns either an object of class "apes" when "n_boot" is zero, or an object of class "boot_apes" when "n_boot" is a positive integer.
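As a sketch of how k interacts with the estimator (assuming the APES package is attached; base R's plogis() stands in for the package's expit() used in the examples below):

```r
library(APES)  # package name as referenced in this documentation

set.seed(1)
n = 100
p = 10
x = matrix(rnorm(n * p), ncol = p)
colnames(x) = paste0("X", 1:p)
y = rbinom(n, size = 1, prob = plogis(x[, 1] - x[, 2]))
model = glm(y ~ ., data = data.frame(y, x), family = "binomial")

# With the default "leaps" estimator, k = 1:3 explores all sizes up to 3:
apes(model = model, k = 1:3)

# With the "mio" estimator (requires the bestsubset package), only the
# specified sizes are explored, subject to time_limit per size:
# apes(model = model, k = c(2, 5), estimator = "mio", time_limit = 10L)
```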
This is the main function of the APES package. Note that the feature dimension equals the number of columns if all columns are numeric. If factor/categorical variables are present in the input "glm" object, then the feature dimension also counts the levels of those factors. See the vignette on birth weight.
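The factor expansion can be checked directly with base R's model.matrix() (an illustrative sketch, independent of APES):

```r
d = data.frame(
  y = rnorm(6),
  x1 = rnorm(6),
  g = factor(rep(c("a", "b", "c"), 2))  # a 3-level factor
)
fit = glm(y ~ ., data = d)
# The design matrix expands g into dummy columns, so the feature
# dimension is larger than the number of data columns:
colnames(model.matrix(fit))
#> [1] "(Intercept)" "x1" "gb" "gc"
```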
set.seed(10)
n = 100
p = 10
beta = c(5, -5, rep(0, p-2))
x = matrix(rnorm(n*p), ncol = p)
colnames(x) = paste0("X", 1:p)
## Logistic regression example
y = rbinom(n = n, size = 1, prob = expit(x %*% beta))
data = data.frame(y, x)
model = glm(y ~ ., data = data, family = "binomial")
#> Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
apes_result = apes(model = model)
#> No variable size specified, searching all sizes from 1 to p...
#> Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
#> Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
#> Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
#> Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
#> Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
#> Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
#> Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
#> Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
apes_result
#> Time taken: 5.30084e-05 minutes
#>
#> APES - AIC selected the following variables
#> intercept X1 X2 X3 X4 X5 X6 X7
#> -0.077 9.644 -8.097 0.000 0.000 0.000 0.000 0.000
#> X8 X9 X10
#> 1.501 0.000 0.000
#>
#> APES - BIC selected the following variables
#> intercept X1 X2 X3 X4 X5 X6 X7
#> -0.050 7.264 -5.972 0.000 0.000 0.000 0.000 0.000
#> X8 X9 X10
#> 0.000 0.000 0.000
class(apes_result)
#> [1] "apes"
names(apes_result)
#> [1] "apes_model_df" "apes_mle_beta" "apes_mle_beta_binary"
#> [4] "time_used" "selected_model_beta" "model_avg_beta"
#> [7] "response_tibble"
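Continuing with apes_result from above, the listed components can be accessed by name (a sketch; the component names follow the names() output printed above):

```r
# Per-model-size summary table and MLE coefficient estimates:
apes_result$apes_model_df
apes_result$apes_mle_beta
# Time taken, as also shown by the print method:
apes_result$time_used
```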
## Poisson regression example
y = rpois(n = n, lambda = exp(x %*% beta))
data = data.frame(y, x)
model = glm(y ~ ., data = data, family = "poisson")
apes(model = model)
#> No variable size specified, searching all sizes from 1 to p...
#> Time taken: 3.810724e-06 minutes
#>
#> APES - AIC selected the following variables
#> intercept X1 X2 X3 X4 X5 X6 X7
#> 10.844 1.689 0.000 0.000 0.000 0.000 0.000 0.000
#> X8 X9 X10
#> 0.000 0.000 0.000
#>
#> APES - BIC selected the following variables
#> intercept X1 X2 X3 X4 X5 X6 X7
#> 10.844 1.689 0.000 0.000 0.000 0.000 0.000 0.000
#> X8 X9 X10
#> 0.000 0.000 0.000
## Bootstrap examples
apes(model = model, n_boot = 2)
#> No variable size specified, searching all sizes from 1 to p...
#> Time taken: 0.000437359 minutes
#> Total number of bootstrap APES results: 2
apes(model = model, n_boot = 2, workers = 1)
#> No variable size specified, searching all sizes from 1 to p...
#> Time taken: 0.0007522821 minutes
#> Total number of bootstrap APES results: 2
## apes(model = model, n_boot = 2, workers = 2)
## Cox regression example
hx = exp(x %*% beta)
time = rexp(n, hx)
censor = rbinom(n = n, prob = 0.3, size = 1) # censoring indicator
data = data.frame(x, time = time, censor = censor)
model = survival::coxph(survival::Surv(time, censor) ~ ., data = data)
apes(model = model)
#> No variable size specified, searching all sizes from 1 to p...
#> Time taken: 4.11272e-06 minutes
#>
#> APES - AIC selected the following variables
#> intercept X1 X2 X3 X4 X5 X6 X7
#> 0.000 3.267 -2.844 -0.356 0.416 -0.282 0.000 0.547
#> X8 X9 X10
#> 0.794 0.000 0.000
#>
#> APES - BIC selected the following variables
#> intercept X1 X2 X3 X4 X5 X6 X7
#> 0.000 2.781 -2.186 0.000 0.000 0.000 0.000 0.000
#> X8 X9 X10
#> 0.729 0.000 0.000