No feature selection (NO FS)

All 49,386 probes were used.

Differentially Expressed Genes (DEGs)

Array probes that have a statistically significant Spearman correlation (P < 0.05) with drug response.
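The DEG criterion can be illustrated with a minimal sketch. The source does not state how the P value was computed, so the permutation test used here is an assumption, as are the function names (`select_degs`, `spearman_permutation_p`) and the dictionary layout of the expression data. It is not the authors' implementation.

```python
import random

def ranks(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5

def spearman_permutation_p(x, y, n_perm=2000, seed=0):
    """Two-sided permutation P value for the Spearman correlation (assumed test)."""
    rng = random.Random(seed)
    rx, ry = ranks(x), ranks(y)
    observed = abs(pearson(rx, ry))  # Spearman = Pearson on ranks
    shuffled = list(ry)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(rx, shuffled)) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)

def select_degs(expression, response, alpha=0.05):
    """expression: dict of probe id -> expression values, one per cell line."""
    return [probe for probe, values in expression.items()
            if spearman_permutation_p(values, response) < alpha]
```

Calling `select_degs` with a probe that rises monotonically with drug response and a probe that is constant across cell lines keeps only the first one.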

LIMMA

Linear empirical Bayes with a moderated t-statistic, as implemented in the LIMMA Bioconductor package in R. Genes were selected by running LIMMA on the 25% most sensitive and the 25% most resistant cell lines. A false discovery rate of 5% was chosen as the cutoff.

Bonferroni Correction (BC)

Bonferroni correction, \( \rho_{BC} = \frac{\alpha}{m} \), where \( \alpha \) is the significance level of 0.05 and \( m \) is the number of features tested (49,386), giving \( \rho_{BC} \approx 1.0 \times 10^{-6} \).
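The threshold arithmetic can be checked with a few lines of Python. The helper `bonferroni_select` and its dictionary input are hypothetical names used only for illustration.

```python
ALPHA = 0.05   # significance level from the text
M = 49_386     # number of features (probes) tested

# Bonferroni-corrected threshold: alpha divided by the number of tests.
threshold = ALPHA / M

def bonferroni_select(p_values, alpha=ALPHA):
    """p_values: dict of probe id -> uncorrected P value (hypothetical layout)."""
    cutoff = alpha / len(p_values)
    return [probe for probe, p in p_values.items() if p < cutoff]

print(f"{threshold:.2e}")  # about 1.0e-06, matching the value quoted above
```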

DEG Bootstrap (BS)

Array probes that have a statistically significant Spearman correlation (P < 0.05) in fifty random subsets, each containing 75% of the training data.
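A minimal sketch of the resampling loop follows. The text does not say whether a probe must be significant in every subset, so `min_fraction=1.0` is an assumption, as is drawing each subset without replacement. `is_significant` is a hypothetical callback standing in for the Spearman test at P < 0.05.

```python
import random

def bootstrap_select(probe_ids, n_cell_lines, is_significant,
                     n_subsets=50, fraction=0.75, min_fraction=1.0, seed=0):
    """Keep probes that are significant in enough random subsets.

    is_significant(probe_id, subset) -> bool, where subset is a sorted list
    of cell-line indices (hypothetical interface).
    """
    rng = random.Random(seed)
    subset_size = round(fraction * n_cell_lines)
    counts = {probe: 0 for probe in probe_ids}
    for _ in range(n_subsets):
        # Draw 75% of the training cell lines without replacement.
        subset = sorted(rng.sample(range(n_cell_lines), subset_size))
        for probe in probe_ids:
            if is_significant(probe, subset):
                counts[probe] += 1
    return [p for p in probe_ids if counts[p] >= min_fraction * n_subsets]
```

Lowering `min_fraction` relaxes the rule to "significant in at least that share of subsets", which may or may not match the criterion actually used.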

Histotype specific Bootstrap (BSHist)

Fifty subsets of the training data were generated such that each subset contained only one cell line from a specific histotype. Probes that have a significant Spearman correlation (P < 0.05) in 50% of the splits were selected. ** Data not shown; reported in Additional file 2.
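One possible reading of this procedure is sketched below: each subset takes one randomly chosen cell line per histotype, and a probe is kept if it is significant in at least half of the subsets. Both readings are assumptions, and the function names are hypothetical.

```python
import random
from collections import defaultdict

def one_per_histotype_subsets(histotypes, n_subsets=50, seed=0):
    """histotypes: list with the histotype label of each training cell line.

    Returns n_subsets lists of cell-line indices, one index per histotype.
    """
    rng = random.Random(seed)
    by_type = defaultdict(list)
    for idx, label in enumerate(histotypes):
        by_type[label].append(idx)
    return [sorted(rng.choice(members) for members in by_type.values())
            for _ in range(n_subsets)]

def select_by_vote(significant_counts, n_subsets=50, min_fraction=0.5):
    """significant_counts: dict of probe id -> number of subsets where P < 0.05."""
    return [probe for probe, count in significant_counts.items()
            if count >= min_fraction * n_subsets]
```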

Maximum Relevance Minimum Redundancy (MRMR)

1000 probes are chosen such that they have maximum correlation with drug response and minimal cross-correlation with the other chosen probes.
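The idea can be sketched as a greedy search. This is one common MRMR variant (relevance minus mean redundancy, using absolute Pearson correlation) and is an assumption: the source does not specify the scoring function or the correlation measure. The name `mrmr_select` is hypothetical.

```python
def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5

def mrmr_select(expression, response, k):
    """Greedily pick k probes with high relevance and low redundancy.

    expression: dict of probe id -> expression values, one per cell line.
    """
    relevance = {p: abs(pearson(v, response)) for p, v in expression.items()}
    selected = []
    remaining = set(expression)

    def score(probe):
        if not selected:
            return relevance[probe]
        # Mean absolute correlation with the probes already chosen.
        redundancy = sum(abs(pearson(expression[probe], expression[q]))
                         for q in selected) / len(selected)
        return relevance[probe] - redundancy

    while remaining and len(selected) < k:
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With `k=1000` this would mirror the probe count quoted above. On a toy input holding two perfectly correlated probes and one independent probe, the second pick is the independent probe rather than the duplicate.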

Control 1 (CTR1)

A number of probes equal to the number of DEGs for each model/trial is randomly selected from all 49,386 probes. For example, bleomycin dataset 1 yielded 5377 DEGs in DEG feature selection, so 5377 probes are selected randomly in control 1 experiments.

Control 2 (CTR2)

The complement of the DEGs. For example, for bleomycin dataset 1, control 2 genes would include the 38,009 probes excluded from the 5377 probes selected as DEGs.
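Controls 1 and 2 can be sketched as follows. The function names and the list-based probe representation are assumptions made for illustration.

```python
import random

def control_1(all_probes, degs, seed=0):
    """Randomly draw as many probes as there are DEGs, from all probes."""
    rng = random.Random(seed)
    return rng.sample(list(all_probes), len(degs))

def control_2(all_probes, degs):
    """Return every probe that was not selected as a DEG."""
    deg_set = set(degs)
    return [probe for probe in all_probes if probe not in deg_set]
```

Control 1 draws from the full probe list, so it may overlap with the DEGs by chance, whereas control 2 is disjoint from them by construction.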

Random Control (RCTR)

A number, N, of probes equal to the number of DEGs is randomly selected. This gives N vectors, with each entry corresponding to a cell line in the training set. Each vector is then shuffled randomly so that the original value is no longer associated with the same cell line, yielding a feature matrix that is arbitrary.
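A minimal sketch of this control is given below, assuming each probe vector is shuffled independently of the others (the text does not say whether one permutation is shared across probes). The name `random_control` is hypothetical.

```python
import random

def random_control(expression, n_degs, seed=0):
    """Pick n_degs probes at random and shuffle each one across cell lines.

    expression: dict of probe id -> expression values, one per cell line.
    Returns a new dict; the input is left unchanged.
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(expression), n_degs)
    shuffled = {}
    for probe in chosen:
        values = list(expression[probe])  # copy before shuffling in place
        rng.shuffle(values)
        shuffled[probe] = values
    return shuffled
```

Each shuffled vector keeps the same set of values as the original probe, so only the pairing between value and cell line is destroyed.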

Histotype Only (HIST)

Each cell line is associated with a 55-dimensional vector where the nth entry is 1 if the cell line comes from the corresponding histotype and 0 otherwise (one-hot encoded).
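The encoding can be sketched as follows. The function name, the dictionary layout, and the three-histotype example in the usage note are assumptions. In the study the vector would have 55 entries.

```python
def one_hot_histotypes(cell_line_histotypes, histotypes):
    """Map each cell line to a 0/1 vector with a single 1 at its histotype.

    cell_line_histotypes: dict of cell line name -> histotype label.
    histotypes: ordered list of all histotype labels (55 in the study).
    """
    index = {label: i for i, label in enumerate(histotypes)}
    encoded = {}
    for cell_line, label in cell_line_histotypes.items():
        vector = [0] * len(histotypes)
        vector[index[label]] = 1
        encoded[cell_line] = vector
    return encoded
```

For instance, with the hypothetical ordering `["breast", "lung", "skin"]`, a lung cell line is encoded as `[0, 1, 0]`.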
