Hierarchical cluster analysis of the top 100 genes identified by a text-mining approach as strongly associated with the EMT program, shown for 326 MCC colon tumors. The 100-gene set contains individual genes (CDH1, CLDN9, FGFR1, FN1, TWIST1, TWIST2, AXL, VIM) as well as gene signatures (PC1, EMT, TGFbeta, Proliferation, MYC, and RAS). Genes and signatures up-regulated in mesenchymal tumors are shown in magenta, and those up-regulated in epithelial tumors are shown in cyan. Gene signature names are shown in black. Samples (rows) are sorted by PC1. Genes (columns) are clustered using Pearson correlation and Ward linkage. The heatmap shows mean-centered probe intensities.
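The row and column ordering described in this legend can be sketched as follows. This is a minimal illustration, not the analysis code used for the figure. It assumes a samples × genes matrix `X`, mean-centering per probe (column), PC1 scores taken from an SVD of the centered matrix (the PC1 sign is arbitrary, so the sort direction may be flipped relative to the figure), and the correlation distance sqrt(2(1 − r)). That distance is proportional to the Euclidean distance between z-scored profiles, which keeps Ward linkage well defined. The original analysis may have used a different correlation distance, such as 1 − r. The function name `order_heatmap` is hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, leaves_list
from scipy.spatial.distance import squareform


def order_heatmap(X):
    """Order a samples x genes matrix: rows by PC1 score, columns by Ward clustering.

    Hypothetical helper for illustration; X has samples as rows and genes/probes as columns.
    """
    X = np.asarray(X, dtype=float)
    # Mean-center each probe (column) so values are deviations from the probe mean.
    Xc = X - X.mean(axis=0, keepdims=True)

    # PC1 score per sample from the SVD of the centered matrix (sign is arbitrary).
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    pc1 = U[:, 0] * S[0]
    row_order = np.argsort(pc1)

    # Pearson correlation between genes (columns).
    r = np.corrcoef(Xc, rowvar=False)
    # Correlation distance sqrt(2(1 - r)); clip guards against tiny negative values from rounding.
    d = np.sqrt(np.clip(2.0 * (1.0 - r), 0.0, None))
    np.fill_diagonal(d, 0.0)

    # Ward linkage on the condensed distance matrix; leaf order gives the column order.
    Z = linkage(squareform(d, checks=False), method="ward")
    col_order = leaves_list(Z)

    return Xc[np.ix_(row_order, col_order)], row_order, col_order
```

For example, `H, rows, cols = order_heatmap(X)` returns the reordered, mean-centered matrix `H`, which could then be drawn with any heatmap routine using a diverging color scale.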