CPOP modelling — cpop

CPOP consists of three steps. Step 1 selects features common to two transformed data sets. Note that the input must be pairwise differences between the original data columns. Step 2 selects features in the constructed models that share similar characteristics. Step 3 constructs the final model used for prediction.
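The pairwise-difference transformation mentioned above can be sketched in base R. This is a minimal illustration, not the package's own implementation: `pairwise_diff_sketch` is a hypothetical helper, and the `--` separator in the column names is an assumption taken from the feature names printed in the Examples section.

```r
# Hypothetical helper (not part of CPOP): build all pairwise column differences.
pairwise_diff_sketch <- function(x) {
  stopifnot(is.matrix(x), !is.null(colnames(x)), ncol(x) >= 2)
  # 2 x choose(p, 2) matrix of column index pairs (i < j)
  pairs <- utils::combn(ncol(x), 2)
  # Column i minus column j for every pair
  z <- x[, pairs[1, ], drop = FALSE] - x[, pairs[2, ], drop = FALSE]
  # Name each difference after the two original columns (separator is an assumption)
  colnames(z) <- paste(colnames(x)[pairs[1, ]], colnames(x)[pairs[2, ]], sep = "--")
  z
}

set.seed(1)
# Toy data: 5 samples, 4 features named X01..X04
x <- matrix(rnorm(5 * 4), nrow = 5,
            dimnames = list(NULL, sprintf("X%02d", 1:4)))
z <- pairwise_diff_sketch(x)
dim(z)       # 5 rows and choose(4, 2) = 6 columns
colnames(z)
```

With p original columns this yields choose(p, 2) difference features. For p = 20 that is 190, which would be consistent with the "out of 190" messages in the example output.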

cpop_model(
  x1,
  x2,
  y1,
  y2,
  w = NULL,
  n_features = 50,
  n_iter = 20,
  alpha = 1,
  family = "binomial",
  s = "lambda.min",
  cpop2_break = TRUE,
  cpop2_type = "sign",
  cpop2_mag = 1,
  cpop1_method = "normal",
  intercept = FALSE,
  z1,
  z2,
  ...
)

Arguments

x1: A data matrix of size n (number of samples) times p (number of features).
x2: A data matrix of size n (number of samples) times p (number of features). Column names should be identical to those of x1.
y1: A vector of the response variable. Same length as the number of rows of x1.
y2: A vector of the response variable. Same length as the number of rows of x2.
w: A vector of weights. Defaults to NULL, which uses `identity_dist`.
n_features: The CPOP Step 1 loop stops once this number of features is reached. Defaults to 50.
n_iter: Number of iterations in Steps 1 and 2. Defaults to 20.
alpha: The alpha parameter for elastic net models. See the alpha argument in glmnet::glmnet. Defaults to 1.
family: The family argument passed to glmnet. Defaults to "binomial".
s: CV-Lasso lambda choice. Defaults to "lambda.min"; see cv.glmnet in the glmnet package.
cpop2_break: Should the CPOP Step 2 loop be broken the first time? Defaults to TRUE.
cpop2_type: Should CPOP Step 2 select features based on the sign of the features or on magnitude? Either "sign" (default) or "mag".
cpop2_mag: A threshold for CPOP Step 2 when selecting features based on coefficient difference magnitude. Differential betas are removed. Defaults to 1.
cpop1_method: CPOP Step 1 selection method. See the documentation for `cpop1`. Defaults to "normal".
intercept: Should an intercept be fitted? Defaults to FALSE.
z1: (Deprecated) A data matrix whose columns are pairwise differences between the original data columns.
z2: (Deprecated) A data matrix whose columns are pairwise differences between the original data columns.
...: Extra parameter settings for cv.glmnet in the glmnet package.
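To make the cpop2_type and cpop2_mag arguments more concrete, here is a rough sketch of the two kinds of Step 2 filtering on toy coefficients. The numbers are made up, and both rules are assumptions about what "sign" and "mag" selection mean (matching signs, and an absolute coefficient difference no larger than the threshold). The actual logic inside cpop_model may differ.

```r
# Toy coefficients for the same four features fitted on two data sets
# (made-up numbers, for illustration only).
beta1 <- c(f1 = 0.5, f2 = -0.2, f3 = 0.3, f4 = 1.8)
beta2 <- c(f1 = 0.4, f2 = 0.1, f3 = 0.6, f4 = 0.2)

# Cross-tabulate coefficient signs, in the spirit of the
# "sign matrix between the two data" that cpop_model prints.
sign_levels <- c(-1, 0, 1)
sign_tbl <- table(
  data1 = factor(sign(beta1), levels = sign_levels),
  data2 = factor(sign(beta2), levels = sign_levels)
)
print(sign_tbl)

# "sign"-style rule (assumption): keep features whose signs agree.
keep_sign <- names(beta1)[sign(beta1) == sign(beta2)]

# "mag"-style rule (assumption): keep features whose coefficients
# differ by at most the threshold; larger differences are dropped.
mag_threshold <- 1
keep_mag <- names(beta1)[abs(beta1 - beta2) <= mag_threshold]

keep_sign
keep_mag
```

In this toy case the two rules disagree: f2 flips sign but changes little in magnitude, while f4 keeps its sign but changes a lot. That is the kind of trade-off the choice between "sign" and "mag" controls.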

Value

A CPOP object containing:

model: the CPOP model as a glmnet object
coef_tbl: a tibble (data frame) of CPOP feature coefficients
cpop1_features: a vector of features selected in CPOP Step 1

Examples

data(cpop_data_binary, package = 'CPOP')
## Loading simulated matrices and vectors
x1 = cpop_data_binary$x1
x2 = cpop_data_binary$x2
y1 = cpop_data_binary$y1
y2 = cpop_data_binary$y2
set.seed(1)
cpop_result = cpop_model(x1 = x1, x2 = x2, y1 = y1, y2 = y2, alpha = 1, n_features = 10)
#> Absolute colMeans difference will be used as the weights for CPOP
#> Fitting CPOP model using alpha = 1
#> Based on previous alpha, 0 features are kept 
#> CPOP1 - Step 01: Number of selected features: 0 out of 190
#> CPOP1 - Step 02: Number of selected features: 9 out of 190
#> CPOP1 - Step 03: Number of selected features: 16 out of 190
#> 10 features was reached. 
#> A total of 16 features were selected. 
#> Removing sources of collinearity gives 13 features. 
#> 10 features was reached. 
#> A total of 13 features were selected. 
#> CPOP2 - Sign: Step 01: Number of leftover features: 9 out of 13
#> The sign matrix between the two data:
#>     
#>      -1 0 1
#>   -1  0 0 1
#>   0   0 0 0
#>   1   3 0 0
#> CPOP2 - Sign: Step 02: Number of leftover features: 8 out of 13
#> The sign matrix between the two data:
#>     
#>      -1 0 1
#>   -1  0 0 0
#>   0   0 0 0
#>   1   1 0 0
#> CPOP2 - Sign: Step 03: Number of leftover features: 8 out of 13
#> The sign matrix between the two data:
#>     
#>      -1 0 1
#>   -1  0 0 0
#>   0   0 0 0
#>   1   0 0 0
cpop_result
#> CPOP model with  8 features 
#> # A tibble: 9 × 3
#>   coef_name    coef1    coef2
#>   <chr>        <dbl>    <dbl>
#> 1 (Intercept)  0      0      
#> 2 X01--X10    -0.322 -0.246  
#> 3 X09--X17     0.722  0.521  
#> 4 X11--X14     0.130  0.00292
#> 5 X12--X20     0.404  0.170  
#> 6 X01--X07    -0.437 -0.408  
#> 7 X01--X15    -0.158 -0.334  
#> 8 X01--X17    -0.901 -0.644  
#> 9 X04--X12     0.353  0.431
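Each coefficient name in the printed table appears to join two original column names with `--`. Assuming that convention holds, the names can be split back into their original features to see which ones recur in the selected pairs. The names below are copied from the output above; the splitting code is only an illustrative sketch and does not use any CPOP function.

```r
# Feature names copied from the printed coefficient table (intercept omitted)
coef_names <- c("X01--X10", "X09--X17", "X11--X14", "X12--X20",
                "X01--X07", "X01--X15", "X01--X17", "X04--X12")

# Split each name on the literal "--" separator (assumed naming convention)
parts <- do.call(rbind, strsplit(coef_names, "--", fixed = TRUE))
colnames(parts) <- c("feature_a", "feature_b")

# Count how often each original feature takes part in a selected pair
feature_counts <- sort(table(as.vector(parts)), decreasing = TRUE)
head(feature_counts, 3)
```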