Selection Method | Description |
---|---|
No feature selection (NO FS) | All 49,386 probes were used. |
Differentially Expressed Genes (DEGs) | Array probes that have a statistically significant Spearman correlation (P < 0.05) with drug response. |
LIMMA | Linear empirical Bayes model with a moderated t-statistic, as implemented in the LIMMA Bioconductor package in R. Genes were selected by running LIMMA on the top and bottom 25% of cell lines ranked by drug response (most sensitive versus most resistant). A false discovery rate of 5% was used as the cutoff. |
Bonferroni Correction (BC) | Bonferroni-corrected threshold \( \rho_{BC} = \frac{\alpha}{m} \), where \( \alpha \) is the significance level (0.05) and \( m \) is the number of features tested (49,386), giving \( \rho_{BC} \approx 1.0 \times 10^{-6} \). |
DEG Bootstrap (BS) | Array probes that have a statistically significant Spearman correlation (P < 0.05) in fifty random subsets, each containing 75% of the training data. |
Histotype specific Bootstrap (BS-Hist) | 50 subsets of the training data were generated such that each subset contained only one cell line from a specific histotype. Probes that have a significant Spearman correlation (P < 0.05) in 50% of the splits were selected. ** Data not shown; reported in Additional file 2. |
Maximum Relevance Minimum Redundancy (MRMR) | 1000 probes are chosen such that they have maximal correlation with drug response and minimal cross-correlation with the other chosen probes. |
Control 1 (CTR1) | Probes are randomly selected from all 49,386 probes, with the number selected equal to the number of DEGs for each model/trial. For example, bleomycin dataset 1 yielded 5377 DEGs under DEG feature selection, so 5377 probes are selected randomly in control 1 experiments. |
Control 2 (CTR2) | The complement of the DEGs. For example, for bleomycin dataset 1, control 2 would include the 38,009 probes that are not among the 5377 probes selected as DEGs. |
Random Control (RCTR) | A number, N, of probes equal to the number of DEGs is randomly selected. This gives N vectors, with each entry corresponding to a cell line in the training set. Each vector is then shuffled randomly so that a value is no longer associated with its original cell line, yielding an arbitrary feature matrix. |
Histotype Only (HIST) | Each cell line is associated with a 55-dimensional vector in which the nth entry is 1 if the cell line comes from the corresponding histotype and 0 otherwise (one-hot encoded). |
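
The Bonferroni threshold, the three control strategies, and the histotype encoding in the table can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, toy probe identifiers, and histotype labels are hypothetical, and shuffling each probe's values independently across cell lines is only one reading of the RCTR description.

```python
import random


def bonferroni_threshold(alpha, m):
    """Per-test significance threshold alpha / m."""
    return alpha / m


def control_1(all_probes, n_degs, rng):
    """CTR1: draw as many probes at random as there are DEGs."""
    return rng.sample(all_probes, n_degs)


def control_2(all_probes, degs):
    """CTR2: the complement of the DEG set, in the original probe order."""
    deg_set = set(degs)
    return [p for p in all_probes if p not in deg_set]


def random_control(expression, n_degs, rng):
    """RCTR: pick n_degs probes at random, then shuffle each probe's
    values across cell lines (assumed: one independent shuffle per probe)."""
    chosen = rng.sample(sorted(expression), n_degs)
    shuffled = {}
    for probe in chosen:
        values = list(expression[probe])  # copy so the input is untouched
        rng.shuffle(values)
        shuffled[probe] = values
    return shuffled


def histotype_one_hot(cell_histotypes, histotypes):
    """HIST: one-hot vector per cell line over the list of histotypes."""
    index = {h: i for i, h in enumerate(histotypes)}
    vectors = []
    for h in cell_histotypes:
        v = [0] * len(histotypes)
        v[index[h]] = 1
        vectors.append(v)
    return vectors


# Toy data (hypothetical): 10 probes measured on 5 cell lines.
rng = random.Random(0)
probes = [f"probe_{i}" for i in range(10)]
degs = ["probe_1", "probe_4", "probe_7"]
expression = {p: [rng.random() for _ in range(5)] for p in probes}

threshold = bonferroni_threshold(0.05, 49386)
ctr1 = control_1(probes, len(degs), rng)
ctr2 = control_2(probes, degs)
rctr = random_control(expression, len(degs), rng)
hist = histotype_one_hot(["lung", "breast", "lung"], ["breast", "lung", "skin"])

print(f"Bonferroni threshold: {threshold:.2e}")
print("CTR1:", ctr1)
print("CTR2 size:", len(ctr2))
print("HIST:", hist)
```

In the study the one-hot vectors would have 55 entries, one per histotype; three are used here only to keep the example readable.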