No feature selection (NO FS)
|
All 49,386 probes were used.
|
Differentially Expressed Genes (DEGs)
|
Array probes with a statistically significant Spearman correlation (P < 0.05) with drug response.
|
LIMMA
|
Linear model with an empirical Bayes moderated t-statistic, as implemented in the LIMMA Bioconductor package in R. Genes were selected by running LIMMA on the 25% most sensitive and the 25% most resistant cell lines. A false discovery rate of 5% was used as the cutoff.
|
Bonferroni Correction (BC)
|
Bonferroni correction: \( {\rho}_{BC}=\frac{\alpha }{m} \), where \( \alpha \) is the significance level of 0.05 and \( m \) is the number of features tested, 49,386, giving \( {\rho}_{BC}\approx 1.0\times {10}^{-6} \).
|
DEG Bootstrap (BS)
|
Array probes with a statistically significant Spearman correlation (P < 0.05) with drug response in fifty random subsets, each containing 75% of the training data.
|
Histotype specific Bootstrap (BS-Hist)
|
Fifty subsets of the training data were generated such that each subset contained only one cell line from any given histotype. Probes with a significant Spearman correlation (P < 0.05) in 50% of the splits were selected. ** Data not shown; reported in Additional file 2.
|
Maximum Relevance Minimum Redundancy (MRMR)
|
1000 probes are chosen to have maximal correlation with drug response and minimal cross-correlation with the other chosen probes.
|
Control 1 (CTR1)
|
Probes are selected at random from all 49,386 probes, with the number selected equal to the number of DEGs for each model/trial. For example, bleomycin dataset 1 yielded 5377 DEGs in DEG feature selection, so 5377 probes are selected at random in the control 1 experiments.
|
Control 2 (CTR2)
|
The complement of the DEGs. For example, for bleomycin dataset 1, the control 2 genes would be the 38,009 probes that remain after excluding the 5377 probes selected as DEGs.
|
Random Control (RCTR)
|
A number, N, of probes equal to the number of DEGs is selected at random. This gives N vectors, with each entry of a vector corresponding to a cell line in the training set. Each vector is then shuffled randomly so that the original values are no longer associated with the same cell lines, yielding an arbitrary feature matrix.
|
Histotype Only (HIST)
|
Each cell line is associated with a 55-dimensional one-hot encoded vector in which the nth entry is 1 if the cell line comes from the corresponding histotype and 0 otherwise.
|
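The Spearman-based selections (DEGs, BC, BS) and the three control sets (CTR1, CTR2, RCTR) described in the table can be illustrated with a short sketch. This is not the authors' implementation: it assumes an expression matrix `X` with cell lines as rows and probes as columns and a drug response vector `y`, and all function names are hypothetical. The table does not state in how many of the fifty subsets a probe must be significant for BS, so `min_frac` is an assumed parameter (1.0 means every subset).

```python
import numpy as np
from scipy.stats import spearmanr


def spearman_pvalues(X, y):
    """Spearman p-value of each probe (column of X) against drug response y."""
    return np.array([spearmanr(X[:, j], y).pvalue for j in range(X.shape[1])])


def select_degs(X, y, alpha=0.05):
    """DEGs: indices of probes with Spearman P < alpha."""
    return np.flatnonzero(spearman_pvalues(X, y) < alpha)


def select_bonferroni(X, y, alpha=0.05):
    """BC: same test with threshold alpha / m, where m is the number of probes."""
    return np.flatnonzero(spearman_pvalues(X, y) < alpha / X.shape[1])


def select_bootstrap(X, y, rng, n_subsets=50, subset_frac=0.75,
                     min_frac=1.0, alpha=0.05):
    """BS: probes significant in at least min_frac of the random subsets.

    min_frac is an assumption; the table does not specify it for BS.
    """
    n = X.shape[0]
    k = int(round(subset_frac * n))
    hits = np.zeros(X.shape[1])
    for _ in range(n_subsets):
        rows = rng.choice(n, size=k, replace=False)  # 75% of the cell lines
        hits += spearman_pvalues(X[rows], y[rows]) < alpha
    return np.flatnonzero(hits >= min_frac * n_subsets)


def control_1(n_probes, n_degs, rng):
    """CTR1: as many randomly chosen probes as there are DEGs."""
    return rng.choice(n_probes, size=n_degs, replace=False)


def control_2(n_probes, deg_idx):
    """CTR2: every probe that was not selected as a DEG."""
    return np.setdiff1d(np.arange(n_probes), deg_idx)


def random_control(X, n_degs, rng):
    """RCTR: random probes whose values are shuffled across cell lines."""
    cols = rng.choice(X.shape[1], size=n_degs, replace=False)
    shuffled = X[:, cols].copy()
    for j in range(shuffled.shape[1]):
        # Permute each probe vector independently of the others.
        shuffled[:, j] = rng.permutation(shuffled[:, j])
    return shuffled, cols
```

Sizing both CTR1 and RCTR to the number of DEGs keeps the feature count fixed across the comparison, so any difference in model performance reflects which probes were chosen rather than how many.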