vignettes/articles/speed.Rmd
The APES method relies on a best subset selection algorithm for linear regression models. The current implementation of the APES package offers two backend options: the leaps-and-bound algorithm from the `leaps` package, and Mixed Integer Optimisation (MIO) from the `bestsubset` package, which connects to the commercial Gurobi solver. You can learn how to install Gurobi and the `bestsubset` package from here.

We will use a simulated example to demonstrate APES with Gurobi Mixed Integer Optimisation (MIO) via the `bestsubset` package. This option is invoked by specifying `estimator = "mio"`.
## Simulating data
```r
library(APES)

set.seed(10)
n = 100
p = 10
beta = c(1, -1, rep(0, p - 2))
x = matrix(rnorm(n * p), ncol = p)
colnames(x) = paste0("X", 1:p)
y = rbinom(n = n, size = 1, prob = expit(x %*% beta))
data = data.frame(y, x)
model = glm(y ~ ., data = data, family = "binomial")
```
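In the chunk above, `expit` is the inverse-logit function exported by the APES package. For readers who want to reproduce the simulated probabilities without APES installed, a minimal base-R equivalent (assuming `expit` is the usual inverse logit) is:

```r
# Inverse logit: maps a linear predictor to a probability in (0, 1).
expit = function(x) 1 / (1 + exp(-x))

# This agrees with base R's plogis(), which computes the same function.
all.equal(expit(c(-2, 0, 3)), plogis(c(-2, 0, 3)))
```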
## Running APES
```r
apes_result = apes(model = model, estimator = "mio")
#> Registered S3 method overwritten by 'bestsubset':
#>   method     from
#>   predict.bs splines
#> 0. Computing max eigenvalue of X^T X, for the step size in projected gradient descent.
#> 1. Solving best subset selection with k=1.
#> a. Performing projected gradient runs: 1 ... 10 ... 20 ... 30 ... 40 ... 50 ...
#> b. Running Gurobi's mixed integer program solver ... Return status: OPTIMAL.
#> 2. Solving best subset selection with k=2.
#> a. Performing projected gradient runs: 1 ... 10 ... 20 ... 30 ... 40 ... 50 ...
#> b. Running Gurobi's mixed integer program solver ... Return status: OPTIMAL.
#> 3. Solving best subset selection with k=3.
#> a. Performing projected gradient runs: 1 ... 10 ... 20 ... 30 ... 40 ... 50 ...
#> b. Running Gurobi's mixed integer program solver ... Return status: OPTIMAL.
#> 4. Solving best subset selection with k=4.
#> a. Performing projected gradient runs: 1 ... 10 ... 20 ... 30 ... 40 ... 50 ...
#> b. Running Gurobi's mixed integer program solver ... Return status: OPTIMAL.
#> 5. Solving best subset selection with k=5.
#> a. Performing projected gradient runs: 1 ... 10 ... 20 ... 30 ... 40 ... 50 ...
#> b. Running Gurobi's mixed integer program solver ... Return status: OPTIMAL.
#> 6. Solving best subset selection with k=6.
#> a. Performing projected gradient runs: 1 ... 10 ... 20 ... 30 ... 40 ... 50 ...
#> b. Running Gurobi's mixed integer program solver ... Return status: OPTIMAL.
#> 7. Solving best subset selection with k=7.
#> a. Performing projected gradient runs: 1 ... 10 ... 20 ... 30 ... 40 ... 50 ...
#> b. Running Gurobi's mixed integer program solver ... Return status: OPTIMAL.
#> 8. Solving best subset selection with k=8.
#> a. Performing projected gradient runs: 1 ... 10 ... 20 ... 30 ... 40 ... 50 ...
#> b. Running Gurobi's mixed integer program solver ... Return status: OPTIMAL.
#> 9. Solving best subset selection with k=9.
#> a. Performing projected gradient runs: 1 ... 10 ... 20 ... 30 ... 40 ... 50 ...
#> b. Running Gurobi's mixed integer program solver ... Return status: OPTIMAL.
#> 10. Solving best subset selection with k=10.
#> a. Performing projected gradient runs: 1 ... 10 ... 20 ... 30 ... 40 ... 50 ...
#> b. Running Gurobi's mixed integer program solver ... Return status: OPTIMAL.
```
```r
apes_result
#> Time taken:  0.006937699 minutes
#>
#> APES - AIC selected the following variables
#> intercept        X1        X2        X3        X4        X5        X6        X7
#>    -0.054     0.910    -1.123     0.000     0.000     0.000     0.000     0.000
#>        X8        X9       X10
#>     0.000     0.000     0.000
#>
#> APES - BIC selected the following variables
#> intercept        X1        X2        X3        X4        X5        X6        X7
#>    -0.041     0.912    -1.145     0.000     0.000     0.000    -0.392     0.000
#>        X8        X9       X10
#>     0.000     0.000     0.000

class(apes_result)
#> [1] "apes"

names(apes_result)
#> [1] "apes_model_df"        "apes_mle_beta"        "apes_mle_beta_binary"
#> [4] "time_used"            "selected_model_beta"  "model_avg_beta"
#> [7] "response_tibble"
```
## Solving for `k` directly: `leaps` vs `bestsubset` (Gurobi MIO)
Imagine that we have data with \(p = 10\) regressors and we are interested in the best model of size \(k = 5\), where \(k\) is the number of regressors with non-zero coefficient estimates. The `leaps` package is designed in such a way that all models of size \(1, \dots, k\) are computed and returned. This behaviour is rooted in the leaps-and-bound algorithm, and the intermediate models cannot be skipped. For large values of \(p\), this can be computationally heavy and unnecessary if we only want a model of one particular size. The MIO algorithm, on the other hand, can optimise for a model of size \(k\) directly, and it performs well for large values of \(p\).

Continuing with the example above, if we specify an additional `k` parameter in our call to the `apes` function, then MIO returns the result for only one model, whereas `leaps` returns five models. Additionally, MIO can solve for model sizes in a range, e.g. by setting `k = 5:9`. The sequential behaviour of `leaps` means that it will be slower than MIO for large values of \(p\), but faster than MIO for small values of \(p\), due to the design of the underlying APIs.
## leaps
```r
leaps_result = apes(model = model, estimator = "leaps", k = 5)

leaps_result$apes_model_df
#> # A tibble: 5 x 7
#>   model_name  model_size ic_opt_models apes_mle_loglike mle_aic mle_bic status
#>   <chr>            <dbl> <chr>                    <dbl>   <dbl>   <dbl> <chr>
#> 1 apes_model…          2 ""                       -59.4    123.    128. leaps_o…
#> 2 apes_model…          3 "apes_min_bi…            -53.3    113.    120. leaps_o…
#> 3 apes_model…          4 "apes_min_ai…            -51.8    112.    122. leaps_o…
#> 4 apes_model…          5 ""                       -51.2    112.    125. leaps_o…
#> 5 apes_model…          6 ""                       -50.6    113.    129. leaps_o…
```
## mio
```r
mio_result = apes(model = model, estimator = "mio", k = 5)
#> Registered S3 method overwritten by 'bestsubset':
#>   method     from
#>   predict.bs splines
#> 0. Computing max eigenvalue of X^T X, for the step size in projected gradient descent.
#> 1. Solving best subset selection with k=5.
#> a. Performing projected gradient runs: 1 ... 10 ... 20 ... 30 ... 40 ... 50 ...
#> b. Running Gurobi's mixed integer program solver ... Return status: OPTIMAL.

mio_result$apes_model_df
#> # A tibble: 1 x 7
#>   model_name  model_size ic_opt_models   apes_mle_loglike mle_aic mle_bic status
#>   <chr>            <dbl> <chr>                      <dbl>   <dbl>   <dbl> <chr>
#> 1 apes_model…          6 apes_min_aicap…            -50.6    113.    129. OPTIM…
```
```r
mio_result2 = apes(model = model, estimator = "mio", k = 5:9)
#> 0. Computing max eigenvalue of X^T X, for the step size in projected gradient descent.
#> 1. Solving best subset selection with k=5.
#> a. Performing projected gradient runs: 1 ... 10 ... 20 ... 30 ... 40 ... 50 ...
#> b. Running Gurobi's mixed integer program solver ... Return status: OPTIMAL.
#> 2. Solving best subset selection with k=6.
#> a. Performing projected gradient runs: 1 ... 10 ... 20 ... 30 ... 40 ... 50 ...
#> b. Running Gurobi's mixed integer program solver ... Return status: OPTIMAL.
#> 3. Solving best subset selection with k=7.
#> a. Performing projected gradient runs: 1 ... 10 ... 20 ... 30 ... 40 ... 50 ...
#> b. Running Gurobi's mixed integer program solver ... Return status: OPTIMAL.
#> 4. Solving best subset selection with k=8.
#> a. Performing projected gradient runs: 1 ... 10 ... 20 ... 30 ... 40 ... 50 ...
#> b. Running Gurobi's mixed integer program solver ... Return status: OPTIMAL.
#> 5. Solving best subset selection with k=9.
#> a. Performing projected gradient runs: 1 ... 10 ... 20 ... 30 ... 40 ... 50 ...
#> b. Running Gurobi's mixed integer program solver ... Return status: OPTIMAL.

mio_result2$apes_model_df
#> # A tibble: 5 x 7
#>   model_name  model_size ic_opt_models   apes_mle_loglike mle_aic mle_bic status
#>   <chr>            <dbl> <chr>                      <dbl>   <dbl>   <dbl> <chr>
#> 1 apes_model…          6 "apes_min_aica…            -50.6    113.    129. OPTIM…
#> 2 apes_model…          7 ""                         -50.1    114.    132. OPTIM…
#> 3 apes_model…          8 ""                         -50.1    116.    137. OPTIM…
#> 4 apes_model…          9 ""                         -50.1    118.    142. OPTIM…
#> 5 apes_model…         10 ""                         -50.0    120.    146. OPTIM…
```
Another advantage of MIO over the leaps-and-bound algorithm is that MIO can stop before the optimisation fully converges and still return the model considered best at that point in time. This is a very attractive property, as we can specify a time limit when searching for a model instead of waiting for an unknown amount of time.

In `APES`, this is controlled by the `time_limit` parameter, whose unit is seconds. In the result below, we specify `time_limit = 0.001`, which is small enough to trigger the time limit. In this case, MIO returns a result (`mio_result3`) with a status of `TIME_LIMIT`. However, notice that this result is identical to the one where the computation reached the global optimum (`mio_result`), except for the time taken. This is known as the sub-optimality property of the MIO algorithm.
```r
mio_result3 = apes(model = model, estimator = "mio", k = 5, time_limit = 0.001)
#> 0. Computing max eigenvalue of X^T X, for the step size in projected gradient descent.
#> 1. Solving best subset selection with k=5.
#> a. Performing projected gradient runs: 1 ... 10 ... 20 ... 30 ... 40 ... 50 ...
#> b. Running Gurobi's mixed integer program solver ... Return status: TIME_LIMIT.

mio_result3$apes_model_df
#> # A tibble: 1 x 7
#>   model_name  model_size ic_opt_models   apes_mle_loglike mle_aic mle_bic status
#>   <chr>            <dbl> <chr>                      <dbl>   <dbl>   <dbl> <chr>
#> 1 apes_model…          6 apes_min_aicap…            -50.6    113.    129. TIME_…

all.equal(mio_result, mio_result3)
#> [1] "Component \"apes_model_df\": Component \"status\": 1 string mismatch"
#> [2] "Component \"time_used\": Mean relative difference: 0.599496"
```
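The `all.equal` call above compares the two result objects component by component and reports only the fields that differ (`status` and `time_used`). This is the standard behaviour of `all.equal` on named lists, which can be illustrated with base R alone (the list below is a toy stand-in for the APES result objects):

```r
# Two toy result lists that differ in their solver status and timing.
a = list(status = "OPTIMAL",    time = 0.002)
b = list(status = "TIME_LIMIT", time = 0.005)

# Differences are reported per component; matching components stay silent.
all.equal(a, b)
```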
We will conclude this vignette with a figure that shows the different computation options and the time cost associated with each. We hope this helps end users select the best combination of input parameters for their data problems.
In the plot below, we fix `k = 15` and `n = 500`, where `n` is the number of observations. We gradually increase the number of regressors (`p`) in the simulated data and plot it on the x-axis. We explored four options:

- APES-leaps, which uses `leaps` to sequentially search for the optimal models of size up to `k`. This can be invoked with `apes(model = model, estimator = "leaps", k = k)`, or with `apes(model = model)`, which is the default.
- APES-MIO-sequential, which uses MIO to sequentially search for the optimal models up to size `k`, with no time limit. This can be invoked with `apes(model = model, estimator = "mio", k = 1:k)`.
- APES-MIO-sequential-threshold, which uses MIO to sequentially search for the optimal models up to size `k`, with a time limit of 5 seconds. This can be invoked with `apes(model = model, estimator = "mio", k = 1:k, time_limit = 5)`.
- APES-MIO-direct, which uses MIO to directly search for the optimal model of size `k`. This can be invoked with `apes(model = model, estimator = "mio", k = k)`.
It can be seen that the leaps implementation can be orders of magnitude faster than the MIO implementations when \(p < 35\). However, as \(p\) increases from 40 to 50, the time taken by leaps rises much more steeply and it falls behind all MIO implementations. This simple demonstration shows that leaps is preferred when \(p\) is small or moderate, while MIO is virtually the only viable option for \(p > 50\).
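The benchmark behind the figure can be sketched roughly as follows. This is a hedged outline, not the exact script used to produce the plot: the simulation setup mirrors the earlier examples, the helper `time_options` is hypothetical, and a working Gurobi installation (with APES and `bestsubset`) is assumed; exact timings will vary with hardware.

```r
library(APES)

# Time the four options for one value of p (n = 500, k = 15, as in the text).
time_options = function(p, n = 500, k = 15) {
  beta = c(rep(1, k), rep(0, p - k))   # k non-zero coefficients
  x = matrix(rnorm(n * p), ncol = p)
  colnames(x) = paste0("X", 1:p)
  y = rbinom(n, size = 1, prob = expit(x %*% beta))
  model = glm(y ~ ., data = data.frame(y, x), family = "binomial")
  c(
    leaps      = system.time(apes(model = model, estimator = "leaps", k = k))["elapsed"],
    mio_seq    = system.time(apes(model = model, estimator = "mio", k = 1:k))["elapsed"],
    mio_seq_5s = system.time(apes(model = model, estimator = "mio", k = 1:k,
                                  time_limit = 5))["elapsed"],
    mio_direct = system.time(apes(model = model, estimator = "mio", k = k))["elapsed"]
  )
}

# Gradually increase p and collect elapsed times for plotting.
p_grid = seq(20, 50, by = 10)
timings = sapply(p_grid, time_options)
```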
## Session information

```
## R version 4.3.2 (2023-10-31)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.3 LTS
##
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0
##
## locale:
##  [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8
##  [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8
##  [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C
## [10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
##
## time zone: UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base
##
## other attached packages:
## [1] APES_1.0.0
##
## loaded via a namespace (and not attached):
##  [1] tidyr_1.3.0        sass_0.4.7         utf8_1.2.4         future_1.33.0
##  [5] generics_0.1.3     lattice_0.21-9     stringi_1.7.12     listenv_0.9.0
##  [9] digest_0.6.33      magrittr_2.0.3     RColorBrewer_1.1-3 evaluate_0.23
## [13] grid_4.3.2         fastmap_1.1.1      Matrix_1.6-1.1     rprojroot_2.0.3
## [17] jsonlite_1.8.7     ggrepel_0.9.4      survival_3.5-7     purrr_1.0.2
## [21] fansi_1.0.5        scales_1.2.1       codetools_0.2-19   textshaping_0.3.7
## [25] jquerylib_0.1.4    cli_3.6.1          rlang_1.1.1        parallelly_1.36.0
## [29] splines_4.3.2      munsell_0.5.0      cachem_1.0.8       yaml_2.3.7
## [33] tools_4.3.2        parallel_4.3.2     memoise_2.0.1      dplyr_1.1.3
## [37] colorspace_2.1-0   ggplot2_3.4.4      forcats_1.0.0      globals_0.16.2
## [41] vctrs_0.6.4        R6_2.5.1           lifecycle_1.0.3    stringr_1.5.0
## [45] fs_1.6.3           ragg_1.2.6         furrr_0.3.1        pkgconfig_2.0.3
## [49] desc_1.4.2         pkgdown_2.0.7      bslib_0.5.1        pillar_1.9.0
## [53] gtable_0.3.4       Rcpp_1.0.11        glue_1.6.2         systemfonts_1.0.5
## [57] xfun_0.41          tibble_3.2.1       tidyselect_1.2.0   knitr_1.45
## [61] htmltools_0.5.6.1  rmarkdown_2.25     compiler_4.3.2
```