Fig. 4
From: Improving lung cancer risk stratification leveraging whole transcriptome RNA sequencing and machine learning across multiple cohorts

Gene expression variation associated with smoking status, specimen collection timing, cohort and inhaled medication. PCA was performed using scaled, normalized (variance-stabilizing transformation, VST) expression data for: (a) 17,954 genes from all subjects in the smoking index training set (N = 1578); (b, c) 17,954 genes from within-indication subjects in the training set (N = 311); (d) 998 benign vs malignant differentially expressed (DE) genes from within-indication subjects in the training set (N = 311). (e) Distribution of basal, blood, cilia and immune cell type indexes in within-indication subjects in the training set, separated by specimen collection timing.
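
The caption states only that PCA was run on scaled, VST-normalized expression values. As a rough illustration of that kind of analysis, and not the authors' actual pipeline, the sketch below scales each gene to zero mean and unit variance and then obtains principal component scores via SVD. The function name `pca_scores`, the synthetic matrix, and the two-group covariate are all assumptions made for the example; real input would be a samples-by-genes matrix that has already been VST-normalized.

```python
import numpy as np


def pca_scores(expr, n_components=2):
    """PCA on a samples x genes matrix after per-gene scaling.

    Hypothetical helper for illustration; assumes `expr` has already
    been normalized (e.g. VST) upstream.
    """
    expr = np.asarray(expr, dtype=float)
    # Scale each gene (column) to zero mean and unit variance.
    mean = expr.mean(axis=0)
    std = expr.std(axis=0, ddof=1)
    keep = std > 0  # constant genes carry no variance and are dropped
    z = (expr[:, keep] - mean[keep]) / std[keep]
    # SVD of the scaled matrix: scores are U * S, and the squared
    # singular values are proportional to the variance explained.
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    var_ratio = (s ** 2) / np.sum(s ** 2)
    return scores, var_ratio[:n_components]


# Synthetic stand-in data (not from the study): 60 samples, 200 genes,
# with the first 50 genes shifted in one of two groups.
rng = np.random.default_rng(0)
n_samples, n_genes = 60, 200
group = np.repeat([0, 1], n_samples // 2)
expr = rng.normal(size=(n_samples, n_genes))
expr[:, :50] += 2.0 * group[:, None]

scores, var_ratio = pca_scores(expr)
```

Plotting `scores[:, 0]` against `scores[:, 1]` and coloring points by a covariate (here the synthetic `group`; in the figure, smoking status, collection timing, cohort or inhaled medication) is the usual way to see which covariates align with the leading components.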