Performs approximated exhaustive model selection on the input GLM or Cox model.

apes(
  model,
  k = NULL,
  estimator = "leaps",
  fitted_values = NULL,
  time_limit = 10L,
  really_big = FALSE,
  verbose = FALSE,
  n_boot = 0,
  workers = 1L
)

Arguments

model

A "full" model fitted with all variables of interest. Accepts either a "glm" class or "coxph" class object.

k

Model sizes to explore. Defaults to NULL, which searches through all variables in the model. Alternatively, users can supply a vector of model sizes (see the sketch after this list). If estimator is:

  • "leaps", then models up to size max(k) will be explored.

  • "mio", then models with specified values will be explored.
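
A minimal sketch of the two behaviours, assuming model is a fitted "glm" object as in the Examples below:

  apes(model, k = 1:5, estimator = "leaps")    # explores all sizes up to max(k) = 5
  apes(model, k = c(2, 4), estimator = "mio")  # explores sizes 2 and 4 only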

estimator

Either "leaps" (default) or "mio", which correspond to the optimisation algorithms available in the leaps and bestsubset packages, respectively.

fitted_values

Defaults to NULL, in which case fitted values are extracted from the fitted model itself. However, users can supply a vector of fitted values to improve the performance of APES, as per Wang et al. (2019).
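
For example, a sketch that simply re-supplies the model's own fitted values; any numeric vector of the same length could be substituted:

  apes(model, fitted_values = fitted(model))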

time_limit

The maximum time allocated to each model size when the "mio" estimator is selected. It does not affect computational speed when "leaps" is selected as the estimator.

really_big

If set to FALSE (the default), variable selection on more than 30 variables is prevented. We recommend setting this argument to TRUE only if the number of variables exceeds 30, while simultaneously setting the estimator argument to "mio".
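
For example, a sketch for a wide model, assuming it has more than 30 variables:

  apes(model, estimator = "mio", really_big = TRUE)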

verbose

Whether to print messages during computation.

n_boot

Number of bootstrap runs; defaults to 0, which performs no bootstrap sampling.

workers

Number of cores used for parallel processing of the bootstrap runs.

Value

An object of class "apes" if n_boot is set to zero, or an object of class "boot_apes" if n_boot is set to a positive integer.

Details

This is the main function of the APES package. Note that the feature dimension equals the number of columns if all columns are numeric. If factor/categorical variables are present in the input "glm" object, then the feature dimension also counts the levels of those factors. See the vignette on birth weight.
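
Below is a minimal sketch of how a factor inflates the feature dimension beyond the raw column count; the data frame and variable names are illustrative only:

d = data.frame(y = rbinom(50, size = 1, prob = 0.5),
               x1 = rnorm(50),
               g = factor(sample(c("a", "b", "c"), 50, replace = TRUE)))
fit = glm(y ~ ., data = d, family = "binomial")
length(coef(fit)) - 1  # 3 features: x1 plus two dummy-coded levels of g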

Examples

library(APES)

set.seed(10)
n = 100
p = 10
beta = c(5, -5, rep(0, p-2))
x = matrix(rnorm(n*p), ncol = p)
colnames(x) = paste0("X", 1:p)

## Logistic regression example
y = rbinom(n = n, size = 1, prob = expit(x %*% beta))
data = data.frame(y, x)
model = glm(y ~ ., data = data, family = "binomial")
#> Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
apes_result = apes(model = model)
#> No variable size specified, searching all sizes from 1 to p... 
#> Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
#> Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
#> Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
#> Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
#> Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
#> Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
#> Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
#> Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
apes_result
#> Time taken:  5.30084e-05  minutes 
#> 
#>  APES - AIC selected the following variables 
#> intercept        X1        X2        X3        X4        X5        X6        X7 
#>    -0.077     9.644    -8.097     0.000     0.000     0.000     0.000     0.000 
#>        X8        X9       X10 
#>     1.501     0.000     0.000 
#> 
#>  APES - BIC selected the following variables 
#> intercept        X1        X2        X3        X4        X5        X6        X7 
#>    -0.050     7.264    -5.972     0.000     0.000     0.000     0.000     0.000 
#>        X8        X9       X10 
#>     0.000     0.000     0.000 
class(apes_result)
#> [1] "apes"
names(apes_result)
#> [1] "apes_model_df"        "apes_mle_beta"        "apes_mle_beta_binary"
#> [4] "time_used"            "selected_model_beta"  "model_avg_beta"      
#> [7] "response_tibble"     

## Poisson regression example
y = rpois(n = n, lambda = exp(x %*% beta))
data = data.frame(y, x)
model = glm(y ~ ., data = data, family = "poisson")
apes(model = model)
#> No variable size specified, searching all sizes from 1 to p... 
#> Time taken:  3.810724e-06  minutes 
#> 
#>  APES - AIC selected the following variables 
#> intercept        X1        X2        X3        X4        X5        X6        X7 
#>    10.844     1.689     0.000     0.000     0.000     0.000     0.000     0.000 
#>        X8        X9       X10 
#>     0.000     0.000     0.000 
#> 
#>  APES - BIC selected the following variables 
#> intercept        X1        X2        X3        X4        X5        X6        X7 
#>    10.844     1.689     0.000     0.000     0.000     0.000     0.000     0.000 
#>        X8        X9       X10 
#>     0.000     0.000     0.000 

## Bootstrap examples
apes(model = model, n_boot = 2)
#> No variable size specified, searching all sizes from 1 to p... 
#> Time taken:  0.000437359  minutes 
#> Total number of bootstrap APES results:  2
apes(model = model, n_boot = 2, workers = 1)
#> No variable size specified, searching all sizes from 1 to p... 
#> Time taken:  0.0007522821  minutes 
#> Total number of bootstrap APES results:  2
## Parallel bootstrap with more than one worker (not run):
## apes(model = model, n_boot = 2, workers = 2)

## Cox regression example
hx = exp(x %*% beta)
time = rexp(n, hx)
censor = rbinom(n = n, prob = 0.3, size = 1) # status indicator: 1 = event, 0 = censored
data = data.frame(x, time = time, censor = censor)
model = survival::coxph(survival::Surv(time, censor) ~ ., data = data)
apes(model = model)
#> No variable size specified, searching all sizes from 1 to p... 
#> Time taken:  4.11272e-06  minutes 
#> 
#>  APES - AIC selected the following variables 
#> intercept        X1        X2        X3        X4        X5        X6        X7 
#>     0.000     3.267    -2.844    -0.356     0.416    -0.282     0.000     0.547 
#>        X8        X9       X10 
#>     0.794     0.000     0.000 
#> 
#>  APES - BIC selected the following variables 
#> intercept        X1        X2        X3        X4        X5        X6        X7 
#>     0.000     2.781    -2.186     0.000     0.000     0.000     0.000     0.000 
#>        X8        X9       X10 
#>     0.729     0.000     0.000