Identification of predictive, correlated gene clusters in a simulated data set. A simulated expression data set was generated as described in the Methods. It contained three independent clusters of correlated genes, two of which included a gene associated with outcome (gene1 and gene2), together with an additional set of uncorrelated probes. Each panel shows the 901 simulated genes with a univariate hazard ratio (HR) p-value less than 0.5. Each panel is a scatter plot of the negative log of a gene's univariate HR p-value against that gene's correlation with a principal component (PC) variable. The PC variable was derived from the expression values of the 5 top-ranked genes in the most predictive correlated gene cluster identified in the current iteration. Large values on the y-axis correspond to small p-values and therefore indicate genes strongly associated with outcome. A. The first set of correlated predictive genes, identified in an analysis of the unadjusted expression data. B. The second set of correlated genes, identified after the expression data were adjusted for the PC variable identified in (A); HR p-values were computed from the adjusted data. C. A third set of correlated genes, revealed after the data were sequentially adjusted for the PC variables identified in (A) and (B). No additional correlated gene clusters with small p-values were identified, indicating that the two PC variables captured the two major clusters of outcome-predictive genes in the simulated data set and thereby confirming the efficacy of this approach.
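The iterative procedure behind panels A to C (rank genes by univariate p-value, build a PC variable from the top 5 genes, correlate every gene with it, adjust the data for that PC, repeat) can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the univariate HR p-value is approximated with a Cox score test at beta = 0 (the caption does not name the test), the top genes are standardized before taking the first PC, and "adjusted for the PC" is interpreted as replacing each gene with its residuals from a least-squares regression on the PC scores. All function names and the toy `simulate` generator (cluster sizes, effect sizes, seed) are hypothetical and do not reproduce the simulation described in the Methods.

```python
import math
import random


def cox_score_pvalue(x, time, event):
    """Univariate Cox score-test p-value for H0: beta = 0 (Breslow handling of ties)."""
    n = len(x)
    u = 0.0     # score statistic
    info = 0.0  # Fisher information at beta = 0
    for i in range(n):
        if not event[i]:
            continue
        # Risk set: every sample still under observation at this event time.
        s = s2 = 0.0
        r = 0
        for j in range(n):
            if time[j] >= time[i]:
                s += x[j]
                s2 += x[j] * x[j]
                r += 1
        m = s / r
        u += x[i] - m
        info += s2 / r - m * m
    if info <= 0.0:
        return 1.0
    chi2 = u * u / info
    return math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square with 1 df


def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb) if saa > 0 and sbb > 0 else 0.0


def first_pc_scores(genes, n_power_iter=200):
    """Sample scores on the first PC of the standardized genes (power iteration)."""
    n = len(genes[0])
    z = []
    for g in genes:
        m = sum(g) / n
        sd = math.sqrt(sum((v - m) ** 2 for v in g) / (n - 1))
        z.append([(v - m) / sd for v in g])
    k = len(z)
    cov = [[sum(z[a][s] * z[b][s] for s in range(n)) / (n - 1) for b in range(k)]
           for a in range(k)]
    w = [1.0] * k
    for _ in range(n_power_iter):
        w = [sum(cov[a][b] * w[b] for b in range(k)) for a in range(k)]
        norm = math.sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]
    return [sum(w[a] * z[a][s] for a in range(k)) for s in range(n)]


def regress_out(g, pc):
    """Residuals of g after least-squares regression on pc (the mean of g is kept)."""
    n = len(g)
    mg, mp = sum(g) / n, sum(pc) / n
    b = sum((x - mg) * (y - mp) for x, y in zip(g, pc)) / sum((y - mp) ** 2 for y in pc)
    return [x - b * (y - mp) for x, y in zip(g, pc)]


def find_predictive_clusters(expr, time, event, n_iter=3, top_k=5):
    """Rank genes, build a PC from the top_k, record correlations, adjust, repeat."""
    expr = [list(g) for g in expr]
    results = []
    for _ in range(n_iter):
        pvals = [cox_score_pvalue(g, time, event) for g in expr]  # y-axis uses -log(p)
        top = sorted(range(len(expr)), key=lambda i: pvals[i])[:top_k]
        pc = first_pc_scores([expr[i] for i in top])
        corr = [pearson(g, pc) for g in expr]  # x-axis of each panel
        results.append((top, pvals, corr))
        expr = [regress_out(g, pc) for g in expr]  # adjust the data for this PC
    return results


def simulate(n_samples=120, genes_per_cluster=10, n_noise=30, seed=1):
    """Hypothetical toy data: clusters A and B (predictive), C (not), plus noise genes."""
    rng = random.Random(seed)
    expr = []
    for _ in range(3):
        f = [rng.gauss(0, 1) for _ in range(n_samples)]  # shared cluster factor
        for _ in range(genes_per_cluster):
            expr.append([v + 0.5 * rng.gauss(0, 1) for v in f])
    for _ in range(n_noise):
        expr.append([rng.gauss(0, 1) for _ in range(n_samples)])
    gene1, gene2 = expr[0], expr[genes_per_cluster]
    # Exponential survival times whose hazard depends only on gene1 and gene2.
    time = [rng.expovariate(math.exp(1.5 * a + 0.8 * b)) for a, b in zip(gene1, gene2)]
    return expr, time, [1] * n_samples
```

For example, `find_predictive_clusters(*simulate())` returns one `(top, pvals, corr)` tuple per iteration; plotting `-log(p)` against `corr` for each tuple gives an analogue of panels A, B and C. The score test is used here only because it needs no iterative model fitting; a likelihood-ratio or Wald test from a fitted Cox model would serve the same role.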