cpop_model.Rd
CPOP consists of three steps. Step 1 selects features common to the two transformed data sets. Note that the input must be pairwise differences between the original data columns. Step 2 selects features in the constructed models that share similar characteristics. Step 3 constructs the final model used for prediction.
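The pairwise-difference transformation mentioned above can be sketched in base R. This is an illustration only: `pairwise_diff_sketch` is a hypothetical helper, not a function documented on this page, and the `--` separator is assumed from the feature names printed in the example output (e.g. `X01--X10`).

```r
# Hypothetical helper (not part of the documented API): build all pairwise
# column differences of a numeric matrix with column names.
pairwise_diff_sketch <- function(x) {
  stopifnot(is.matrix(x), !is.null(colnames(x)))
  pairs <- utils::combn(ncol(x), 2)  # 2 x choose(p, 2) matrix of column indices
  z <- x[, pairs[1, ], drop = FALSE] - x[, pairs[2, ], drop = FALSE]
  # Assumed naming convention: "<first>--<second>"
  colnames(z) <- paste(colnames(x)[pairs[1, ]], colnames(x)[pairs[2, ]], sep = "--")
  z
}

set.seed(1)
x <- matrix(rnorm(5 * 4), nrow = 5,
            dimnames = list(NULL, sprintf("X%02d", 1:4)))
z <- pairwise_diff_sketch(x)
dim(z)       # 5 rows and choose(4, 2) = 6 columns
colnames(z)
```

With p original columns this yields choose(p, 2) difference features, which would be consistent with the 190 features reported in the example output if the original matrices have 20 columns.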
cpop_model(
x1,
x2,
y1,
y2,
w = NULL,
n_features = 50,
n_iter = 20,
alpha = 1,
family = "binomial",
s = "lambda.min",
cpop2_break = TRUE,
cpop2_type = "sign",
cpop2_mag = 1,
cpop1_method = "normal",
intercept = FALSE,
z1,
z2,
...
)
A data matrix of size n (number of samples) times p (number of features).
A data matrix of size n (number of samples) times p (number of features). Column names should be identical to x1.
A vector of the response variable, with the same length as the number of rows of x1.
A vector of the response variable, with the same length as the number of rows of x2.
A vector of weights. Defaults to NULL, which uses `identity_dist`.
Break the CPOP Step 1 loop once this number of features is reached. Defaults to 50.
Number of iterations in Steps 1 and 2. Defaults to 20.
The alpha parameter for elastic net models. See the alpha argument in glmnet::glmnet. Defaults to 1.
The family argument passed to glmnet. Defaults to "binomial".
CV-Lasso lambda choice. Defaults to "lambda.min"; see cv.glmnet in the glmnet package.
Should the CPOP Step 2 loop be broken the first time? Defaults to TRUE.
Should CPOP Step 2 select features based on the sign of the features or their magnitude? Either "sign" (default) or "mag".
A threshold for CPOP Step 2 when selecting features based on coefficient difference magnitude. Differential betas are removed. Defaults to 1.
CPOP Step 1 selection method. See the documentation for `cpop1`. Defaults to "normal".
Whether to include an intercept term. Defaults to FALSE.
(Deprecated) a data matrix, columns are pairwise-differences between the original data columns.
(Deprecated) a data matrix, columns are pairwise-differences between the original data columns.
Extra parameter settings for cv.glmnet in the glmnet package.
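When `w` is left as NULL, the example output on this page reports that the absolute difference of column means is used as the weights. A minimal base R sketch of that quantity follows. `colmeans_weight_sketch` is a hypothetical name, and this is not the package's `identity_dist` implementation; how the weights then enter the model fit is not shown here.

```r
# Sketch only: absolute difference of column means between two matrices
# that share the same column names. The function name is hypothetical.
colmeans_weight_sketch <- function(z1, z2) {
  stopifnot(identical(colnames(z1), colnames(z2)))
  abs(colMeans(z1) - colMeans(z2))
}

# Two tiny matrices of pairwise-difference features (made-up values)
z1 <- matrix(c(1, 3, 2, 4), nrow = 2, dimnames = list(NULL, c("A--B", "A--C")))
z2 <- matrix(c(2, 4, 2, 4), nrow = 2, dimnames = list(NULL, c("A--B", "A--C")))

w <- colmeans_weight_sketch(z1, z2)
w  # one non-negative weight per feature
```

A feature whose mean is the same in both data sets gets a weight of zero under this sketch, while features whose means differ get larger weights.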
A CPOP object containing:
model: the CPOP model as a glmnet object
coef_tbl: a tibble (data frame) of CPOP feature coefficients
cpop1_features: a vector of CPOP
data(cpop_data_binary, package = 'CPOP')
## Loading simulated matrices and vectors
x1 = cpop_data_binary$x1
x2 = cpop_data_binary$x2
y1 = cpop_data_binary$y1
y2 = cpop_data_binary$y2
set.seed(1)
cpop_result = cpop_model(x1 = x1, x2 = x2, y1 = y1, y2 = y2, alpha = 1, n_features = 10)
#> Absolute colMeans difference will be used as the weights for CPOP
#> Fitting CPOP model using alpha = 1
#> Based on previous alpha, 0 features are kept
#> CPOP1 - Step 01: Number of selected features: 0 out of 190
#> CPOP1 - Step 02: Number of selected features: 9 out of 190
#> CPOP1 - Step 03: Number of selected features: 16 out of 190
#> 10 features was reached.
#> A total of 16 features were selected.
#> Removing sources of collinearity gives 13 features.
#> 10 features was reached.
#> A total of 13 features were selected.
#> CPOP2 - Sign: Step 01: Number of leftover features: 9 out of 13
#> The sign matrix between the two data:
#>
#> -1 0 1
#> -1 0 0 1
#> 0 0 0 0
#> 1 3 0 0
#> CPOP2 - Sign: Step 02: Number of leftover features: 8 out of 13
#> The sign matrix between the two data:
#>
#> -1 0 1
#> -1 0 0 0
#> 0 0 0 0
#> 1 1 0 0
#> CPOP2 - Sign: Step 03: Number of leftover features: 8 out of 13
#> The sign matrix between the two data:
#>
#> -1 0 1
#> -1 0 0 0
#> 0 0 0 0
#> 1 0 0 0
cpop_result
#> CPOP model with 8 features
#> # A tibble: 9 × 3
#> coef_name coef1 coef2
#> <chr> <dbl> <dbl>
#> 1 (Intercept) 0 0
#> 2 X01--X10 -0.322 -0.246
#> 3 X09--X17 0.722 0.521
#> 4 X11--X14 0.130 0.00292
#> 5 X12--X20 0.404 0.170
#> 6 X01--X07 -0.437 -0.408
#> 7 X01--X15 -0.158 -0.334
#> 8 X01--X17 -0.901 -0.644
#> 9 X04--X12 0.353 0.431
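The Step 2 idea of comparing coefficients between the two fitted models can be illustrated on the table printed above. The sketch below re-enters the printed (rounded) coefficients by hand, cross-tabulates their signs, and checks absolute differences against a threshold of 1 (the `cpop2_mag` default). The exact rule the package applies is not documented on this page, so the comparison used here is an assumption.

```r
# Coefficients copied from the printed coef_tbl (intercept row omitted)
coef_tbl <- data.frame(
  coef_name = c("X01--X10", "X09--X17", "X11--X14", "X12--X20",
                "X01--X07", "X01--X15", "X01--X17", "X04--X12"),
  coef1 = c(-0.322, 0.722, 0.130, 0.404, -0.437, -0.158, -0.901, 0.353),
  coef2 = c(-0.246, 0.521, 0.00292, 0.170, -0.408, -0.334, -0.644, 0.431)
)

# Cross-tabulate coefficient signs from the two models
sign_tab <- table(
  coef1 = factor(sign(coef_tbl$coef1), levels = c(-1, 0, 1)),
  coef2 = factor(sign(coef_tbl$coef2), levels = c(-1, 0, 1))
)
sign_tab

# Sign-based check: do both models agree on the direction of every feature?
same_sign <- sign(coef_tbl$coef1) == sign(coef_tbl$coef2)
all(same_sign)

# Magnitude-based check (assumed rule): flag features whose coefficients
# differ by more than the threshold
abs_diff <- abs(coef_tbl$coef1 - coef_tbl$coef2)
coef_tbl$coef_name[abs_diff > 1]
```

For these eight features every sign agrees and no absolute difference exceeds 1, so neither check would flag a feature.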