impute_cpop.Rd
Imputing gene expression values using CPOP model
impute_cpop(cpop_result, x1, x2, newx)
cpop_result: A fitted model, as returned by cpop_model.
x1: Original feature data matrix 1.
x2: Original feature data matrix 2.
newx: New feature data matrix, on the original feature scale, containing missing values.
A vector
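This page does not describe how the missing values are filled in. Purely as an illustration, the sketch below shows one simple strategy: each missing cell in the new matrix is replaced by the average of that column's means in the two reference matrices. The helper impute_by_reference_mean and the toy_* objects are hypothetical, and this is an assumption about a plausible approach, not necessarily what impute_cpop does internally.

```r
# Hypothetical helper: NOT the CPOP implementation, only one simple strategy.
impute_by_reference_mean = function(x1, x2, newx) {
  stopifnot(identical(colnames(x1), colnames(x2)),
            identical(colnames(x1), colnames(newx)))
  # Average the per-column means of the two reference matrices
  ref_means = (colMeans(x1, na.rm = TRUE) + colMeans(x2, na.rm = TRUE)) / 2
  # Locate the missing cells and fill each with its column's reference mean
  na_idx = which(is.na(newx), arr.ind = TRUE)
  newx[na_idx] = ref_means[na_idx[, "col"]]
  newx
}

# Toy data (assumed for illustration; unrelated to cpop_data_binary)
set.seed(1)
toy_names = paste0("X", 1:4)
toy_x1 = matrix(rnorm(20), nrow = 5, dimnames = list(NULL, toy_names))
toy_x2 = matrix(rnorm(20), nrow = 5, dimnames = list(NULL, toy_names))
toy_new = matrix(rnorm(20), nrow = 5, dimnames = list(NULL, toy_names))
toy_new[, 2] = NA

toy_imp = impute_by_reference_mean(toy_x1, toy_x2, toy_new)
anyNA(toy_imp)
```

Columns without missing values pass through unchanged, so the helper can be applied to a matrix whether or not it contains any NA entries.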
data(cpop_data_binary, package = 'CPOP')
## Loading simulated matrices and vectors
x1 = cpop_data_binary$x1
x2 = cpop_data_binary$x2
x3 = cpop_data_binary$x3
y1 = cpop_data_binary$y1
y2 = cpop_data_binary$y2
y3 = cpop_data_binary$y3
set.seed(1)
cpop_result = cpop_model(x1 = x1, x2 = x2, y1 = y1, y2 = y2, alpha = 0.1, n_features = 10)
#> Absolute colMeans difference will be used as the weights for CPOP
#> Fitting CPOP model using alpha = 0.1
#> Based on previous alpha, 0 features are kept
#> CPOP1 - Step 01: Number of selected features: 0 out of 190
#> CPOP1 - Step 02: Number of selected features: 43 out of 190
#> 10 features was reached.
#> A total of 43 features were selected.
#> Removing sources of collinearity gives 18 features.
#> 10 features was reached.
#> A total of 18 features were selected.
#> CPOP2 - Sign: Step 01: Number of leftover features: 12 out of 18
#> The sign matrix between the two data:
#>
#>      -1 0 1
#>   -1  0 0 3
#>   0   0 0 0
#>   1   3 0 0
#> CPOP2 - Sign: Step 02: Number of leftover features: 12 out of 18
#> The sign matrix between the two data:
#>
#>      -1 0 1
#>   -1  0 0 0
#>   0   0 0 0
#>   1   0 0 0
cpop_result
#> CPOP model with 12 features
#> # A tibble: 13 × 3
#> coef_name coef1 coef2
#> <chr> <dbl> <dbl>
#> 1 (Intercept) 0 0
#> 2 X01--X02 -0.305 -0.216
#> 3 X01--X03 -0.139 -0.109
#> 4 X01--X06 -0.284 -0.193
#> 5 X01--X07 -0.216 -0.150
#> 6 X01--X09 -0.745 -0.382
#> 7 X01--X11 -0.372 -0.264
#> 8 X01--X13 -0.319 -0.205
#> 9 X01--X14 -0.0488 -0.138
#> 10 X01--X17 -0.176 -0.0962
#> 11 X01--X18 -0.338 -0.260
#> 12 X01--X19 -0.0219 -0.247
#> 13 X04--X20 0.481 0.286
x3_pred_result = predict_cpop(cpop_result, newx = x3)
head(x3_pred_result)
#> # A tibble: 6 × 6
#> samples cpop_model1 cpop_model2 cpop_model_avg cpop_model_avg_prob
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 1 0.437 -0.107 0.165 0.540
#> 2 2 -0.824 -0.0719 -0.448 0.394
#> 3 3 0.248 -0.115 0.0664 0.516
#> 4 4 -0.877 -0.198 -0.537 0.372
#> 5 5 0.955 0.671 0.813 0.692
#> 6 6 -0.623 -1.05 -0.836 0.304
#> # … with 1 more variable: cpop_model_avg_class <chr>
## Introduce a column of missing values in a new matrix, x4.
x4 = x3
x4[,2] = NA
## Without imputation, predict_cpop does not work properly on x4.
## Instead, it prompts the user to impute the missing values first.
## head(predict_cpop(cpop_result, newx = x4))
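## Why one missing column matters: the coefficient names above (e.g. X01--X02)
## suggest the model uses pairwise column differences. A toy sketch, assuming a
## plain difference x[, i] - x[, j] for every pair i < j. toy_pairwise_diff and
## toy_x are hypothetical names for illustration, not CPOP functions or data.
toy_pairwise_diff = function(x) {
  pairs = utils::combn(ncol(x), 2)
  z = x[, pairs[1, ], drop = FALSE] - x[, pairs[2, ], drop = FALSE]
  colnames(z) = paste0(colnames(x)[pairs[1, ]], "--", colnames(x)[pairs[2, ]])
  z
}
toy_x = matrix(as.numeric(1:20), nrow = 5,
               dimnames = list(NULL, paste0("X0", 1:4)))
toy_x[, 2] = NA
toy_z = toy_pairwise_diff(toy_x)
## Every pair involving X02 is now entirely NA (3 of the 6 pairs in this toy case)
colnames(toy_z)[colSums(is.na(toy_z)) > 0]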
## CPOP can impute the missing values in x4 before this matrix is converted into z4.
x4_imp = impute_cpop(cpop_result, x1 = x1, x2 = x2, newx = x4)
x4_pred_result = predict_cpop(cpop_result, newx = x4_imp)
head(x4_pred_result)
#> # A tibble: 6 × 6
#> samples cpop_model1 cpop_model2 cpop_model_avg cpop_model_avg_prob
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 1 -0.192 -0.553 -0.373 0.409
#> 2 2 -1.63 -0.646 -1.14 0.254
#> 3 3 0.470 0.0415 0.256 0.563
#> 4 4 -1.41 -0.574 -0.991 0.278
#> 5 5 0.416 0.290 0.353 0.587
#> 6 6 -1.32 -1.54 -1.43 0.193
#> # … with 1 more variable: cpop_model_avg_class <chr>
## Compare predicted probabilities from the complete data (x3) and the imputed data (x4_imp)
plot(x3_pred_result$cpop_model_avg_prob, x4_pred_result$cpop_model_avg_prob)
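To judge how closely predictions on the imputed matrix track those on the complete matrix, an identity line and a couple of numeric summaries can help. The sketch below is a minimal illustration that uses only the six probabilities printed in the two head() outputs above, copied by hand, so it says nothing about the remaining samples. The names prob_complete and prob_imputed are introduced here for illustration.

```r
# Probabilities copied from the head() outputs above (first six samples only)
prob_complete = c(0.540, 0.394, 0.516, 0.372, 0.692, 0.304)
prob_imputed  = c(0.409, 0.254, 0.563, 0.278, 0.587, 0.193)

plot(prob_complete, prob_imputed,
     xlab = "Predicted probability (complete x3)",
     ylab = "Predicted probability (imputed x4)",
     xlim = c(0, 1), ylim = c(0, 1))
abline(a = 0, b = 1, lty = 2)  # identity line: points on it agree exactly

# Agreement summaries for these six samples
cor(prob_complete, prob_imputed)
mean(abs(prob_complete - prob_imputed))
```

With the full result tibbles available, the same calls could be applied to the cpop_model_avg_prob columns of x3_pred_result and x4_pred_result instead of the hand-copied vectors.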