vignettes/articles/speed.Rmd
The APES method relies on a best subset selection algorithm for linear regression models. The current implementation of the APES package offers two backend options: the leaps-and-bound algorithm from the `leaps` package, and Mixed Integer Optimisation (MIO) from the `bestsubset` package, which connects to the commercial Gurobi solver. You can learn how to install Gurobi and the `bestsubset` package from here.

We will use a simulated example to demonstrate APES with Gurobi Mixed Integer Optimisation (MIO) via the `bestsubset` package. This option is invoked by specifying `estimator = "mio"`.
## Simulating data
```r
library(APES)

set.seed(10)
n = 100
p = 10
beta = c(1, -1, rep(0, p - 2))
x = matrix(rnorm(n * p), ncol = p)
colnames(x) = paste0("X", 1:p)
y = rbinom(n = n, size = 1, prob = expit(x %*% beta))
data = data.frame(y, x)
model = glm(y ~ ., data = data, family = "binomial")
```
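In the chunk above, `expit` is the inverse-logit function exported by the APES package. For readers who want to reproduce the simulated probabilities without APES installed, a minimal base-R equivalent (assuming `expit` is the usual inverse logit) is:

```r
# Inverse logit: maps a linear predictor to a probability in (0, 1).
expit = function(x) 1 / (1 + exp(-x))

# This agrees with base R's plogis(), which computes the same function.
all.equal(expit(c(-2, 0, 3)), plogis(c(-2, 0, 3)))
```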
## Running APES
```r
apes_result = apes(model = model, estimator = "mio")
#> Registered S3 method overwritten by 'bestsubset':
#>   method     from
#>   predict.bs splines
#> 0. Computing max eigenvalue of X^T X, for the step size in projected gradient descent.
#> 1. Solving best subset selection with k=1.
#> a. Performing projected gradient runs: 1 ... 10 ... 20 ... 30 ... 40 ... 50 ...
#> b. Running Gurobi's mixed integer program solver ... Return status: OPTIMAL.
#> 2. Solving best subset selection with k=2.
#> a. Performing projected gradient runs: 1 ... 10 ... 20 ... 30 ... 40 ... 50 ...
#> b. Running Gurobi's mixed integer program solver ... Return status: OPTIMAL.
#> 3. Solving best subset selection with k=3.
#> a. Performing projected gradient runs: 1 ... 10 ... 20 ... 30 ... 40 ... 50 ...
#> b. Running Gurobi's mixed integer program solver ... Return status: OPTIMAL.
#> 4. Solving best subset selection with k=4.
#> a. Performing projected gradient runs: 1 ... 10 ... 20 ... 30 ... 40 ... 50 ...
#> b. Running Gurobi's mixed integer program solver ... Return status: OPTIMAL.
#> 5. Solving best subset selection with k=5.
#> a. Performing projected gradient runs: 1 ... 10 ... 20 ... 30 ... 40 ... 50 ...
#> b. Running Gurobi's mixed integer program solver ... Return status: OPTIMAL.
#> 6. Solving best subset selection with k=6.
#> a. Performing projected gradient runs: 1 ... 10 ... 20 ... 30 ... 40 ... 50 ...
#> b. Running Gurobi's mixed integer program solver ... Return status: OPTIMAL.
#> 7. Solving best subset selection with k=7.
#> a. Performing projected gradient runs: 1 ... 10 ... 20 ... 30 ... 40 ... 50 ...
#> b. Running Gurobi's mixed integer program solver ... Return status: OPTIMAL.
#> 8. Solving best subset selection with k=8.
#> a. Performing projected gradient runs: 1 ... 10 ... 20 ... 30 ... 40 ... 50 ...
#> b. Running Gurobi's mixed integer program solver ... Return status: OPTIMAL.
#> 9. Solving best subset selection with k=9.
#> a. Performing projected gradient runs: 1 ... 10 ... 20 ... 30 ... 40 ... 50 ...
#> b. Running Gurobi's mixed integer program solver ... Return status: OPTIMAL.
#> 10. Solving best subset selection with k=10.
#> a. Performing projected gradient runs: 1 ... 10 ... 20 ... 30 ... 40 ... 50 ...
#> b. Running Gurobi's mixed integer program solver ... Return status: OPTIMAL.
```
```r
apes_result
#> Time taken:  0.006937699 minutes
#>
#> APES - AIC selected the following variables
#> intercept        X1        X2        X3        X4        X5        X6        X7
#>    -0.054     0.910    -1.123     0.000     0.000     0.000     0.000     0.000
#>        X8        X9       X10
#>     0.000     0.000     0.000
#>
#> APES - BIC selected the following variables
#> intercept        X1        X2        X3        X4        X5        X6        X7
#>    -0.041     0.912    -1.145     0.000     0.000     0.000    -0.392     0.000
#>        X8        X9       X10
#>     0.000     0.000     0.000

class(apes_result)
#> [1] "apes"

names(apes_result)
#> [1] "apes_model_df"        "apes_mle_beta"        "apes_mle_beta_binary"
#> [4] "time_used"            "selected_model_beta"  "model_avg_beta"
#> [7] "response_tibble"
```
## Solving for `k` directly: `leaps` vs `bestsubset` (Gurobi MIO)
Imagine that we have data with \(p = 10\) regressors and we are interested in the best model of size \(k = 5\), where \(k\) is the number of regressors with non-zero coefficient estimates. The `leaps` package is designed in such a way that all models of size \(1, \dots, k\) are computed and returned. This behaviour is rooted in the leaps-and-bound algorithm, and the intermediate models cannot be skipped. For large values of \(p\), this can be computationally heavy and unnecessary if we only want a model of one particular size. The MIO algorithm, on the other hand, can optimise for a model of size \(k\) directly, and it performs well for large values of \(p\).

Continuing with the example above, if we specify an additional `k` parameter in our call to the `apes` function, then MIO returns the result for only one model, whereas `leaps` returns five models. Additionally, MIO can solve for model sizes in a range, e.g. by setting `k = 5:9`. The sequential behaviour of `leaps` means that it will be slower than MIO for large values of \(p\), but faster than MIO for small values of \(p\), due to the design of the underlying APIs.
## leaps
```r
leaps_result = apes(model = model, estimator = "leaps", k = 5)

leaps_result$apes_model_df
#> # A tibble: 5 x 7
#>   model_name  model_size ic_opt_models apes_mle_loglike mle_aic mle_bic status
#>   <chr>            <dbl> <chr>                    <dbl>   <dbl>   <dbl> <chr>
#> 1 apes_model…          2 ""                       -59.4    123.    128. leaps_o…
#> 2 apes_model…          3 "apes_min_bi…            -53.3    113.    120. leaps_o…
#> 3 apes_model…          4 "apes_min_ai…            -51.8    112.    122. leaps_o…
#> 4 apes_model…          5 ""                       -51.2    112.    125. leaps_o…
#> 5 apes_model…          6 ""                       -50.6    113.    129. leaps_o…
```
## mio
```r
mio_result = apes(model = model, estimator = "mio", k = 5)
#> Registered S3 method overwritten by 'bestsubset':
#>   method     from
#>   predict.bs splines
#> 0. Computing max eigenvalue of X^T X, for the step size in projected gradient descent.
#> 1. Solving best subset selection with k=5.
#> a. Performing projected gradient runs: 1 ... 10 ... 20 ... 30 ... 40 ... 50 ...
#> b. Running Gurobi's mixed integer program solver ... Return status: OPTIMAL.

mio_result$apes_model_df
#> # A tibble: 1 x 7
#>   model_name  model_size ic_opt_models   apes_mle_loglike mle_aic mle_bic status
#>   <chr>            <dbl> <chr>                      <dbl>   <dbl>   <dbl> <chr>
#> 1 apes_model…          6 apes_min_aicap…            -50.6    113.    129. OPTIM…
```
```r
mio_result2 = apes(model = model, estimator = "mio", k = 5:9)
#> 0. Computing max eigenvalue of X^T X, for the step size in projected gradient descent.
#> 1. Solving best subset selection with k=5.
#> a. Performing projected gradient runs: 1 ... 10 ... 20 ... 30 ... 40 ... 50 ...
#> b. Running Gurobi's mixed integer program solver ... Return status: OPTIMAL.
#> 2. Solving best subset selection with k=6.
#> a. Performing projected gradient runs: 1 ... 10 ... 20 ... 30 ... 40 ... 50 ...
#> b. Running Gurobi's mixed integer program solver ... Return status: OPTIMAL.
#> 3. Solving best subset selection with k=7.
#> a. Performing projected gradient runs: 1 ... 10 ... 20 ... 30 ... 40 ... 50 ...
#> b. Running Gurobi's mixed integer program solver ... Return status: OPTIMAL.
#> 4. Solving best subset selection with k=8.
#> a. Performing projected gradient runs: 1 ... 10 ... 20 ... 30 ... 40 ... 50 ...
#> b. Running Gurobi's mixed integer program solver ... Return status: OPTIMAL.
#> 5. Solving best subset selection with k=9.
#> a. Performing projected gradient runs: 1 ... 10 ... 20 ... 30 ... 40 ... 50 ...
#> b. Running Gurobi's mixed integer program solver ... Return status: OPTIMAL.

mio_result2$apes_model_df
#> # A tibble: 5 x 7
#>   model_name  model_size ic_opt_models   apes_mle_loglike mle_aic mle_bic status
#>   <chr>            <dbl> <chr>                      <dbl>   <dbl>   <dbl> <chr>
#> 1 apes_model…          6 "apes_min_aica…            -50.6    113.    129. OPTIM…
#> 2 apes_model…          7 ""                         -50.1    114.    132. OPTIM…
#> 3 apes_model…          8 ""                         -50.1    116.    137. OPTIM…
#> 4 apes_model…          9 ""                         -50.1    118.    142. OPTIM…
#> 5 apes_model…         10 ""                         -50.0    120.    146. OPTIM…
```
Another advantage of MIO over the leaps-and-bound algorithm is that MIO can stop before the optimisation fully converges and still return the model considered best at that point in time. This is a very attractive property, as we can specify a time limit when searching for a model instead of waiting for an unknown amount of time.

In `APES`, this is controlled by the `time_limit` parameter, whose unit is seconds. In the result below, we specify `time_limit = 0.001`, which is small enough to trigger the time limit. In this case, MIO returns a result (`mio_result3`) with a status of `TIME_LIMIT`. However, notice that this result is identical to the one where the computation reached the global optimum (`mio_result`), except for the time taken. This is known as the sub-optimality property of the MIO algorithm.
```r
mio_result3 = apes(model = model, estimator = "mio", k = 5, time_limit = 0.001)
#> 0. Computing max eigenvalue of X^T X, for the step size in projected gradient descent.
#> 1. Solving best subset selection with k=5.
#> a. Performing projected gradient runs: 1 ... 10 ... 20 ... 30 ... 40 ... 50 ...
#> b. Running Gurobi's mixed integer program solver ... Return status: TIME_LIMIT.

mio_result3$apes_model_df
#> # A tibble: 1 x 7
#>   model_name  model_size ic_opt_models   apes_mle_loglike mle_aic mle_bic status
#>   <chr>            <dbl> <chr>                      <dbl>   <dbl>   <dbl> <chr>
#> 1 apes_model…          6 apes_min_aicap…            -50.6    113.    129. TIME_…

all.equal(mio_result, mio_result3)
#> [1] "Component \"apes_model_df\": Component \"status\": 1 string mismatch"
#> [2] "Component \"time_used\": Mean relative difference: 0.599496"
```
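The `all.equal` call above compares the two result objects component by component and reports only the fields that differ (`status` and `time_used`). This is the standard behaviour of `all.equal` on named lists, which can be illustrated with base R alone (the list below is a toy stand-in for the APES result objects):

```r
# Two toy result lists that differ in their solver status and timing.
a = list(status = "OPTIMAL",    time = 0.002)
b = list(status = "TIME_LIMIT", time = 0.005)

# Differences are reported per component; matching components stay silent.
all.equal(a, b)
```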
We will conclude this vignette with a figure that shows the different computation options and the time cost associated with each. We hope this helps end users select the best combination of input parameters for their data problems.
In the plot below, we fix `k = 15` and `n = 500`, where `n` is the number of observations. We gradually increase the number of regressors (`p`) in the simulated data and plot it on the x-axis. We explored four options:

- APES-leaps, which uses `leaps` to sequentially search for the optimal models of size up to `k`. This can be invoked with `apes(model = model, estimator = "leaps", k = k)`, or with `apes(model = model)`, which is the default.
- APES-MIO-sequential, which uses MIO to sequentially search for the optimal models up to size `k`, with no time limit. This can be invoked with `apes(model = model, estimator = "mio", k = 1:k)`.
- APES-MIO-sequential-threshold, which uses MIO to sequentially search for the optimal models up to size `k`, with a time limit of 5 seconds. This can be invoked with `apes(model = model, estimator = "mio", k = 1:k, time_limit = 5)`.
- APES-MIO-direct, which uses MIO to directly search for the optimal model of size `k`. This can be invoked with `apes(model = model, estimator = "mio", k = k)`.
It can be seen that the leaps implementation can be orders of magnitude faster than the MIO implementations when \(p < 35\). However, as \(p\) increases from 40 to 50, the time taken by leaps rises much more steeply and it falls behind all MIO implementations. This simple demonstration shows that leaps is preferred when \(p\) is small or moderate, while MIO is virtually the only viable option for \(p > 50\).
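The benchmark behind the figure can be sketched roughly as follows. This is a hedged outline, not the exact script used to produce the plot: the simulation setup mirrors the earlier examples, the helper `time_options` is hypothetical, and a working Gurobi installation (with APES and `bestsubset`) is assumed; exact timings will vary with hardware.

```r
library(APES)

# Time the four options for one value of p (n = 500, k = 15, as in the text).
time_options = function(p, n = 500, k = 15) {
  beta = c(rep(1, k), rep(0, p - k))   # k non-zero coefficients
  x = matrix(rnorm(n * p), ncol = p)
  colnames(x) = paste0("X", 1:p)
  y = rbinom(n, size = 1, prob = expit(x %*% beta))
  model = glm(y ~ ., data = data.frame(y, x), family = "binomial")
  c(
    leaps      = system.time(apes(model = model, estimator = "leaps", k = k))["elapsed"],
    mio_seq    = system.time(apes(model = model, estimator = "mio", k = 1:k))["elapsed"],
    mio_seq_5s = system.time(apes(model = model, estimator = "mio", k = 1:k,
                                  time_limit = 5))["elapsed"],
    mio_direct = system.time(apes(model = model, estimator = "mio", k = k))["elapsed"]
  )
}

# Gradually increase p and collect elapsed times for plotting.
p_grid = seq(20, 50, by = 10)
timings = sapply(p_grid, time_options)
```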
## Session information

```
## R version 4.3.2 (2023-10-31)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.3 LTS
##
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0
##
## locale:
##  [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8
##  [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8
##  [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C
## [10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
##
## time zone: UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base
##
## other attached packages:
## [1] APES_1.0.0
##
## loaded via a namespace (and not attached):
##  [1] tidyr_1.3.0        sass_0.4.7         utf8_1.2.4         future_1.33.0
##  [5] generics_0.1.3     lattice_0.21-9     stringi_1.7.12     listenv_0.9.0
##  [9] digest_0.6.33      magrittr_2.0.3     RColorBrewer_1.1-3 evaluate_0.23
## [13] grid_4.3.2         fastmap_1.1.1      Matrix_1.6-1.1     rprojroot_2.0.3
## [17] jsonlite_1.8.7     ggrepel_0.9.4      survival_3.5-7     purrr_1.0.2
## [21] fansi_1.0.5        scales_1.2.1       codetools_0.2-19   textshaping_0.3.7
## [25] jquerylib_0.1.4    cli_3.6.1          rlang_1.1.1        parallelly_1.36.0
## [29] splines_4.3.2      munsell_0.5.0      cachem_1.0.8       yaml_2.3.7
## [33] tools_4.3.2        parallel_4.3.2     memoise_2.0.1      dplyr_1.1.3
## [37] colorspace_2.1-0   ggplot2_3.4.4      forcats_1.0.0      globals_0.16.2
## [41] vctrs_0.6.4        R6_2.5.1           lifecycle_1.0.3    stringr_1.5.0
## [45] fs_1.6.3           ragg_1.2.6         furrr_0.3.1        pkgconfig_2.0.3
## [49] desc_1.4.2         pkgdown_2.0.7      bslib_0.5.1        pillar_1.9.0
## [53] gtable_0.3.4       Rcpp_1.0.11        glue_1.6.2         systemfonts_1.0.5
## [57] xfun_0.41          tibble_3.2.1       tidyselect_1.2.0   knitr_1.45
## [61] htmltools_0.5.6.1  rmarkdown_2.25     compiler_4.3.2
```