EMT is the dominant program in human colon cancer
© Loboda et al; licensee BioMed Central Ltd. 2011
Received: 23 July 2010
Accepted: 20 January 2011
Published: 20 January 2011
Colon cancer has been classically described by clinicopathologic features that permit the prediction of outcome only after surgical resection and staging.
We performed an unsupervised analysis of microarray data from 326 colon cancers to identify the first principal component (PC1) of the most variable set of genes. PC1 deciphered two primary, intrinsic molecular subtypes of colon cancer that predicted disease progression and recurrence.
Here we report that the most dominant pattern of intrinsic gene expression in colon cancer (PC1) was tightly correlated (Pearson R = 0.92, P < 10^-135) with the EMT signature, both in gene identity and directionality. In a global microRNA screen, we further identified miR-200, known to regulate EMT, as the microRNA most anti-correlated with PC1.
These data demonstrate that the biology underpinning the native, molecular classification of human colon cancer, previously thought to be highly heterogeneous, was clarified through the lens of comprehensive transcriptome analysis.
Colon cancer has long been postulated to be a molecularly heterogeneous disease. This heterogeneity has been proposed as the reason why it has been difficult to identify unifying molecular hypotheses explaining the biology and behavior of the disease. Molecular profiling of colon cancer has been a relatively effective approach for predicting the prognosis of early and intermediate stage disease. We and others have identified biologically complex signatures that affect multiple programs such as adhesion, invasion, and angiogenesis and correlate well with cancer progression and recurrence. These signatures appear to support Weinberg's hypothesis [1] of multiple programs leading to cancer development and progression. They have generally been developed using supervised machine learning techniques that train their models on pre-determined good vs. poor prognosis patient populations [2–6]. Unlike breast cancer, where luminal and basal "intrinsic" subtypes have been identified [7–13], or bladder cancer, where intrinsic signatures of recurrence have been established [14, 15], colon cancer has yet to be classified by unsupervised molecular profiling approaches. We believed it was important to attempt to uncover unbiased, native biological traits that might underpin colon cancer.
Colon Cancer Samples
A total of 326 human colon cancer samples from the Moffitt Cancer Center had previously been assessed using a single Affymetrix U133Plus2.0 platform and a single standard operating procedure. Formalin-fixed, paraffin-embedded (FFPE) blocks were obtained for 69 of these cases and used to extract tumor RNA after macrodissection. Tumor RNA was submitted for global microRNA analysis using an Applied Biosystems platform covering ~700 unique microRNA species. The gene expression data were then compared directly to the microRNA data derived from the same samples. All patient samples and clinical information for the 326 colon samples were obtained through a protocol approved by The University of South Florida Institutional Review Board.
Identification of the cell line derived EMT signature
The EMT signature was derived from a microarray dataset of 93 lung cancer cell lines by performing a t-test comparing cell lines exhibiting a mesenchymal-like gene expression pattern (high levels of VIM and low levels of CDH1) vs. cell lines with an epithelial-like gene expression pattern (low levels of VIM and high levels of CDH1). Genes with p-value < 0.01 by t-test were selected and split into those up-regulated in mesenchymal-like cell lines and those up-regulated in epithelial-like cell lines. Each of the two gene sets was then restricted to approximately 200 unique gene symbols based on the absolute value of the fold change.
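The selection steps described above (t-test, p-value cutoff, split by direction, ranking by absolute fold change) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the toy expression values, and the cell line labels are hypothetical, the two groups are assumed to have been assigned already from VIM and CDH1 levels, and the p-value uses a normal approximation to the t distribution (an assumption made to keep the sketch free of external libraries; an exact t distribution would be used in practice).

```python
import math
from statistics import NormalDist, mean, variance


def welch_t(a, b):
    """Welch t statistic for two independent groups (needs non-zero variance)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def approx_p(t):
    """Two-sided p-value from a normal approximation (assumption, see lead-in)."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(t)))


def derive_signature(expr, mesenchymal, epithelial, alpha=0.01, top_n=200):
    """Return (up, down) gene lists for mesenchymal-like vs. epithelial-like lines.

    expr maps gene symbol -> {cell line -> log-scale expression value}.
    """
    up, down = [], []
    for gene, values in expr.items():
        m = [values[c] for c in mesenchymal]
        e = [values[c] for c in epithelial]
        if approx_p(welch_t(m, e)) >= alpha:
            continue  # not significantly different between the groups
        diff = mean(m) - mean(e)  # log fold change on log-scale data
        (up if diff > 0 else down).append((abs(diff), gene))
    # Keep the top_n genes with the largest absolute fold change per direction.
    up = [g for _, g in sorted(up, reverse=True)[:top_n]]
    down = [g for _, g in sorted(down, reverse=True)[:top_n]]
    return up, down


# Hypothetical toy data: lines A-C are mesenchymal-like, D-F epithelial-like.
expr = {
    "VIM": {"A": 9.1, "B": 9.4, "C": 8.9, "D": 3.0, "E": 3.3, "F": 2.8},
    "CDH1": {"A": 2.1, "B": 2.5, "C": 1.9, "D": 8.8, "E": 9.2, "F": 9.0},
    "ACTB": {"A": 10.0, "B": 10.2, "C": 9.9, "D": 10.1, "E": 9.8, "F": 10.2},
}
up, down = derive_signature(expr, ["A", "B", "C"], ["D", "E", "F"])
```

In this toy example VIM lands in the mesenchymal (up) set, CDH1 in the epithelial (down) set, and ACTB is filtered out because it does not differ between the groups.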
Identification of PC1
Unsupervised analysis of the most variable genes expressed in the colon cancer data set (n = 326) was undertaken to discover new, "intrinsic" biology of colon cancer. Principal component analysis was performed on the entire gene expression data set of 326 CRC samples using the princomp function in Matlab (Mathworks Inc.). The first principal component (PC1), corresponding to the highest eigenvalue of the covariance matrix, was selected as describing the dominant inherent variability of the data.
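To illustrate what "the first principal component corresponding to the highest eigenvalue of the covariance matrix" means computationally, the sketch below extracts PC1 by power iteration on the sample covariance matrix. It is not the Matlab routine used in the study; the function name, the starting vector, and the toy matrix are assumptions chosen for illustration. The sign of a principal component is arbitrary, and power iteration can fail if the starting vector happens to be orthogonal to PC1.

```python
import math


def first_principal_component(samples, n_iter=500, tol=1e-12):
    """Return (loadings, scores) of PC1 for a samples-by-genes matrix."""
    n, p = len(samples), len(samples[0])
    means = [sum(row[j] for row in samples) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in samples]

    def cov_times(v):
        # Computes (X^T X / (n - 1)) v without forming the p-by-p matrix.
        proj = [sum(x * w for x, w in zip(row, v)) for row in centered]
        return [sum(centered[i][j] * proj[i] for i in range(n)) / (n - 1)
                for j in range(p)]

    # Deterministic, non-uniform starting vector (an arbitrary choice).
    start = [j + 1.0 for j in range(p)]
    norm = math.sqrt(sum(x * x for x in start))
    v = [x / norm for x in start]

    for _ in range(n_iter):
        w = cov_times(v)
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
        delta = max(abs(a - b) for a, b in zip(w, v))
        v = w
        if delta < tol:
            break

    # PC1 score of each sample: projection of the centered row onto v.
    scores = [sum(x * w for x, w in zip(row, v)) for row in centered]
    return v, scores


# Hypothetical toy matrix: 4 samples x 3 genes; gene 2 is twice gene 1,
# gene 3 is constant, so all variance lies along the direction (1, 2, 0).
toy = [[1.0, 2.0, 5.0], [2.0, 4.0, 5.0], [3.0, 6.0, 5.0], [4.0, 8.0, 5.0]]
loadings, pc1_scores = first_principal_component(toy)
```

The per-sample scores returned here play the role of the PC1 score used throughout the paper: one number per tumor that orders the samples along the dominant axis of variation.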
Derivation of colon signatures
We identified gene sets associated with different endpoints related to tumor histology. A signature was created for each of the following comparisons: right/left (RT/LT) colon, by comparing 60 samples collected in the RT colon vs. 18 samples collected in the LT colon; mucinous/non-mucinous colon carcinoma, by comparing 35 mucinous colon carcinomas vs. 165 non-mucinous carcinomas; MSI/MSS, by comparing 6 MSI vs. 73 MSS samples; carcinoma/adenoma, by comparing 22 pure adenocarcinoma samples vs. 5 pure adenomas; poor/well differentiation, by comparing 32 poorly differentiated samples vs. 19 well differentiated samples; colon/rectum, by comparing 50 samples collected in the colon vs. 19 samples collected in the rectum; Stage2/Stage1, by comparing 59 stage 2 samples vs. 32 stage 1 samples; and Stage3/Stage2, by comparing 71 stage 3 samples vs. 59 stage 2 samples. Each comparison was carried out on non-metastatic samples with known stage, histology, and collection site. For each comparison, two gene sets (up- and down-regulated) were identified: probes were selected by t-test with p-value < 0.01 and split by the sign of the fold change, and unique gene symbols were then taken from the 100 probes most differentially expressed by absolute value of fold change. The score for each pair of gene sets was computed as the mean of the probes that mapped by gene symbol to the up-regulated subset minus the mean of the probes that mapped by gene symbol to the down-regulated subset. Performance was evaluated by back substitution: the gene sets had ROC AUC > 0.7 and 1-way ANOVA p-value < 1e-6 when applied to distinguish the same samples that were used to identify them.
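The up-minus-down score and the ROC AUC check described above can be sketched as below. The probe identifiers, sample names, and expression values are hypothetical, and the AUC is computed directly as the fraction of (positive, negative) sample pairs that are ranked correctly, which is one standard way to obtain it rather than necessarily the procedure the authors used.

```python
from statistics import mean


def signature_score(sample, up_probes, down_probes):
    """Mean of the up-regulated probes minus mean of the down-regulated probes."""
    up_mean = mean(sample[p] for p in up_probes)
    down_mean = mean(sample[p] for p in down_probes)
    return up_mean - down_mean


def roc_auc(pos_scores, neg_scores):
    """Probability that a positive sample outscores a negative one (ties = 0.5)."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical toy data for a Stage3/Stage2-style comparison.
samples = {
    "stage3_a": {"probe_u1": 8.0, "probe_u2": 7.0, "probe_d1": 3.0},
    "stage3_b": {"probe_u1": 7.5, "probe_u2": 7.5, "probe_d1": 4.0},
    "stage2_a": {"probe_u1": 4.0, "probe_u2": 5.0, "probe_d1": 6.0},
    "stage2_b": {"probe_u1": 5.0, "probe_u2": 4.0, "probe_d1": 7.0},
}
up_probes, down_probes = ["probe_u1", "probe_u2"], ["probe_d1"]
scores = {name: signature_score(s, up_probes, down_probes)
          for name, s in samples.items()}
auc = roc_auc([scores["stage3_a"], scores["stage3_b"]],
              [scores["stage2_a"], scores["stage2_b"]])
```

Because the scores are evaluated on the same samples used to pick the probes (back substitution), an AUC obtained this way is an optimistic estimate of how well a signature separates the two groups.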
Scoring of signatures in the data set
Signature score for a given gene set was obtained by averaging the expression levels of the probes that mapped by the gene symbol to that gene set. MYC and RAS signatures were obtained from Nevins et al [16, 17].
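As a small illustration of "averaging the expression levels of the probes that mapped by the gene symbol to that gene set", the sketch below assumes a hypothetical probe-to-symbol annotation table; the probe identifiers and values are invented for the example.

```python
from statistics import mean


def gene_set_score(sample, probe_to_symbol, gene_set):
    """Average expression of every probe whose gene symbol is in gene_set."""
    values = [value for probe, value in sample.items()
              if probe_to_symbol.get(probe) in gene_set]
    if not values:
        raise ValueError("no probes map to the gene set")
    return mean(values)


# Hypothetical annotation: two probes map to VIM, one to CDH1.
probe_to_symbol = {"probe_a": "VIM", "probe_b": "VIM", "probe_c": "CDH1"}
sample = {"probe_a": 6.0, "probe_b": 8.0, "probe_c": 2.0}
vim_score = gene_set_score(sample, probe_to_symbol, {"VIM"})
```

Note that a gene represented by several probes contributes several values to the average, so multi-probe genes carry more weight under this scheme.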
Standard microarray data processing
The microarray data were processed with the RMA normalization method as implemented in Affymetrix Power Tools using default settings (background correction and quantile normalization), followed by a log10 transformation of the resulting probe intensities.
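The quantile normalization and log10 steps named above can be illustrated with the simplified sketch below. It is not the Affymetrix Power Tools implementation: background correction and probe-set summarization are omitted, ties are broken by position, and the intensities are hypothetical.

```python
import math


def quantile_normalize(arrays):
    """Give every array (a list of probe intensities) the same distribution.

    Each value is replaced by the mean, across arrays, of the values that
    share its rank.
    """
    n_probes = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    rank_means = [sum(s[r] for s in sorted_arrays) / len(arrays)
                  for r in range(n_probes)]
    normalized = []
    for a in arrays:
        order = sorted(range(n_probes), key=lambda i: a[i])
        out = [0.0] * n_probes
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized


def log10_transform(arrays):
    """Apply log10 to every (positive) intensity."""
    return [[math.log10(v) for v in a] for a in arrays]


# Hypothetical raw intensities for two arrays and three probes.
raw = [[10.0, 100.0, 1000.0], [2000.0, 20.0, 200.0]]
normalized = quantile_normalize(raw)
logged = log10_transform(normalized)
```

After normalization both arrays contain the same set of values (15, 150, 1500), arranged according to each array's original probe ranking.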
We further confirmed the expression of this same embedded pattern of gene expression (PC1) in the independent ExPO data set (n = 269) (Figure 1b), suggesting that EMT is a pervasive program underpinning colon cancer biology. To further clarify the EMT association with the PC1 signature, genes previously linked to the EMT program, such as VIM, FGFR, FLT1, FN1, TWIST, AXL, and TCF, were individually assessed and found to be positively correlated with PC1/EMT (Figure 2). Similarly, genes such as CDH1, CLDN9, EGFR, and MET were negatively correlated with PC1/EMT. Also shown are multi-gene signatures (black labels) such as EMT, TGF-beta, RAS, proliferation, and MYC; TGF-beta is a known driver of EMT and thus correlates with both PC1 and EMT. In contrast, RAS activation/dependency/addiction has been shown to anti-correlate with EMT. K-RAS dependent cells exhibit an epithelial morphology, expressing significant cortical CDH1 but little VIM. Conversely, RAS-independent cells express little CDH1 but significant VIM. Our results are consistent with these findings. Of interest, proliferation and one of its effectors (MYC) both anti-correlate with EMT.
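The gene-by-gene assessment described above amounts to computing a Pearson correlation between each gene's expression and the per-sample PC1 score. A minimal sketch follows; the PC1 scores and expression values are invented to show a perfectly correlated and a perfectly anti-correlated gene, and do not reproduce any value from the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-sample values for five tumors.
pc1 = [-2.0, -1.0, 0.0, 1.0, 2.0]
vim = [1.0, 2.0, 3.0, 4.0, 5.0]    # rises with PC1 (mesenchymal marker)
cdh1 = [9.0, 7.0, 5.0, 3.0, 1.0]   # falls with PC1 (epithelial marker)

r_vim = pearson_r(pc1, vim)
r_cdh1 = pearson_r(pc1, cdh1)
```

The same function applies unchanged when one of the inputs is a multi-gene signature score rather than a single gene.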
We also tested known signatures, or developed a number of other signatures within the colon cancer dataset, that did not correlate well with PC1. These included a signature predicting MSI status, a signature separating right from left colon tumors, a signature separating mucinous from non-mucinous tumors, and a proliferation signature (Figure 4a). The PC1 signature, while discriminating epithelial from mesenchymal tumors, may also be used to classify cell lines. This classification should be useful in further analysis of EMT in cancer using these cell lines as models. Understanding which cell lines best represent the epithelial vs. the mesenchymal phenotype will help determine the best models for subsequent drug intervention and target validation studies and will allow cell lines to be mapped to human tumors. For this purpose, we have classified numerous colon cell lines using the tumor-derived PC1 signature (Additional File 16).
Colon cancer has heretofore been considered a very heterogeneous disease [24, 25]. It has been difficult to identify a unifying biological theme that could be leveraged for therapeutic intervention. Previously identified prognostic signatures have required supervised learning to elucidate predictive gene sets. For the first time, our data suggest that the PC1 signature, discovered through unsupervised approaches, is a native, intrinsic subtype classifier of colon cancer that predicts recurrence, advancing stage, and poor prognosis based on the biology of EMT. The finding that the PC1 score was predictive of recurrence for both stage II and stage III colon cancer, together with its strong relationship to EMT biology, raises the possibility that this signature might be useful in these stages for discerning responsiveness to adjuvant chemotherapy. Our data suggest that this otherwise molecularly and pathologically heterogeneous disease may be resolved into two principal molecular subtypes: epithelial and mesenchymal.
The identification of the intrinsic EMT program was further supported by additional molecular studies relating global microRNA profiling data to global gene expression datasets. From this analysis, the miR-200 family and related microRNAs were identified as highly negatively correlated with PC1. This finding was significant because the miR-200 family has been closely linked to the EMT program. It has been previously demonstrated that miR-200 over-expression may inhibit ZEB1/2, which are transcriptional repressors of CDH1, thereby permitting the expression of CDH1 and of the epithelial phenotype [26, 27]. A negative correlation between miR-200 and the EMT signature, which promotes a mesenchymal phenotype, is therefore consistent with this mechanism. The relationship between miR-200 and PC1 was strong enough to be detected in a relatively small number of tumors, even when non-mirror image FFPE tissues were used instead of the original frozen specimens, suggesting the EMT program is pervasive throughout the primary tumor. In addition, miR-141, a miR-200 family member, was also identified as negatively correlated with EMT, confirming previous observations. Finally, numerous additional microRNAs were identified that have not previously been reported to be linked to EMT.
Analysis of PC1 relative to biological programs beyond EMT was also informative. Of interest, MSI tumors, which generally have a better prognosis and a right-sided predisposition, were shown to have relatively low PC1 scores. Consistent with these data, recent studies have supported the hypothesis that MSI tumors are anti-correlated with EMT. Similarly, mucinous tumors have been linked to local more than distant recurrence, so the finding that these tumors were more epithelial than mesenchymal is biologically consistent. Most interesting was that the proliferation signature, which has previously been used to identify poor prognosis breast and lung cancers, was linked to good prognosis colon tumors. This suggests that proliferation may not play a critical role in colon cancer progression, even though it is known to be important for colon epithelial biology in crypt bases. Consistent with this is the recent observation that colon metastases have a lower proliferative index than primary non-metastatic tumors. One explanation for this observation may be that metastatic lesions have undergone a mesenchymal to epithelial transition (MET). Thus, PC1, a harbinger of the transition to the mesenchymal state, is predictive of poor vs. good outcome in a number of observed clinical scenarios. These PC1 anti-correlated signatures shed further light on the biology of colon cancer.
Our data also support the concept that the mesenchymal subtype, linked to TGF-beta activation but not to RAS, MYC, MSI, or proliferation, is likely responsible for advancing stage, poor differentiation status, and distant recurrence of disease. RAS activation, linked to the epithelial phenotype driven by genes such as EGFR, appears to be anti-correlated with the PC1/EMT (mesenchymal) signature.
Analysis of other cancers such as pancreas and lung demonstrates a spectrum of EMT across diseases (colon < lung < pancreas) that may explain the differential survivability of these cancers (colon > lung > pancreas), variable recurrence rates, and responses to therapy. As we have shown, these observations can now be leveraged to classify cell lines for therapeutic intervention modeling. We anticipate the PC1 score may be useful in classifying tumors with a mesenchymal vs. epithelial phenotype that might be sensitive to new classes of drugs such as Src, Notch, and FGFR inhibitors vs. EGFR inhibitors, respectively.
Collectively, our data suggest that the "intrinsic" PC1 signature, underpinned by robust EMT biology, is highly prognostic for colon cancer recurrence, may be useful to subclassify more than one type of cancer, and may provide a means of identifying sensitive subpopulations for the next generation of novel therapeutics.
EMT: Epithelial Mesenchymal Transition
PC1: First Principal Component
MET: Mesenchymal Epithelial Transition
Supported by the National Institutes of Health, National Cancer Institute Grant CA112215 (T. J.Y.) and The Florida Department of Health Bankhead-Coley Cancer Research Program Grant: 08BR-02. We thank Magaly Mendez and Mike Gruidl for manuscript preparation.
Funding for this work was provided by Merck & Co as well as NIH Grant: CA112215.
All authors have read and approved the final manuscript.
- Hanahan D, Weinberg RA: The Hallmarks of Cancer. Cell. 2000, 100 (1): 57-70. 10.1016/S0092-8674(00)81683-9.
- Agrawal D, Chen T, Irby R, Quackenbush J, Chambers AF, Szabo M, Cantor A, Coppola D, Yeatman TJ: Osteopontin identified as lead marker of colon cancer progression, using pooled sample expression profiling. J Natl Cancer Inst. 2002, 94 (7): 513-521.
- Eschrich S, Yang I, Bloom G, Kwong KY, Boulware D, Cantor A, Coppola D, Kruhoffer M, Aaltonen L, Orntoft TF, et al: Molecular staging for survival prediction of colorectal cancer patients. J Clin Oncol. 2005, 23 (15): 3526-3535. 10.1200/JCO.2005.00.695.
- Jorissen RN, Gibbs P, Christie M, Prakash S, Lipton L, Desai J, Kerr D, Aaltonen LA, Arango D, Kruhoffer M, et al: Metastasis-Associated Gene Expression Changes Predict Poor Outcomes in Patients with Dukes Stage B and C Colorectal Cancer. Clin Cancer Res. 2009, 15 (24): 7642-7651. 10.1158/1078-0432.CCR-09-1431.
- Smith JJ, Deane NG, Wu F, Merchant NB, Zhang B, Jiang A, Lu P, Johnson JC, Schmidt C, Bailey CE, et al: Experimentally derived metastasis gene expression profile predicts recurrence and death in patients with colon cancer. Gastroenterology. 2010, 138 (3): 958-968. 10.1053/j.gastro.2009.11.005.
- Wang Y, Jatkoe T, Zhang Y, Mutch MG, Talantov D, Jiang J, McLeod HL, Atkins D: Gene expression profiles and molecular markers to predict recurrence of Dukes' B colon cancer. J Clin Oncol. 2004, 22 (9): 1564-1571. 10.1200/JCO.2004.08.186.
- Malvezzi M, Bertuccio P, Chatenoud L, Negri E, La Vecchia C, Decarli A: Cancer mortality in Italy, 2003. Tumori. 2009, 95 (6): 655-664.
- Bertucci F, Finetti P, Cervera N, Charafe-Jauffret E, Mamessier E, Adelaide J, Debono S, Houvenaeghel G, Maraninchi D, Viens P, et al: Gene expression profiling shows medullary breast cancer is a subgroup of basal breast cancers. Cancer Res. 2006, 66 (9): 4636-4644. 10.1158/0008-5472.CAN-06-0031.
- Bertucci F, Finetti P, Rougemont J, Charafe-Jauffret E, Cervera N, Tarpin C, Nguyen C, Xerri L, Houlgatte R, Jacquemier J, et al: Gene expression profiling identifies molecular subtypes of inflammatory breast cancer. Cancer Res. 2005, 65 (6): 2170-2178. 10.1158/0008-5472.CAN-04-4115.
- Bertucci F, Orsetti B, Negre V, Finetti P, Rouge C, Ahomadegbe JC, Bibeau F, Mathieu MC, Treilleux I, Jacquemier J, et al: Lobular and ductal carcinomas of the breast have distinct genomic and expression profiles. Oncogene. 2008, 27 (40): 5359-5372. 10.1038/onc.2008.158.
- Desmedt C, Haibe-Kains B, Wirapati P, Buyse M, Larsimont D, Bontempi G, Delorenzi M, Piccart M, Sotiriou C: Biological processes associated with breast cancer clinical outcome depend on the molecular subtypes. Clin Cancer Res. 2008, 14 (16): 5158-5165. 10.1158/1078-0432.CCR-07-4756.
- Desmedt C, Piette F, Loi S, Wang Y, Lallemand F, Haibe-Kains B, Viale G, Delorenzi M, Zhang Y, d'Assignies MS, et al: Strong time dependence of the 76-gene prognostic signature for node-negative breast cancer patients in the TRANSBIG multicenter independent validation series. Clin Cancer Res. 2007, 13 (11): 3207-3214. 10.1158/1078-0432.CCR-06-2765.
- Dressman HK, Hans C, Bild A, Olson JA, Rosen E, Marcom PK, Liotcheva VB, Jones EL, Vujaskovic Z, Marks J, et al: Gene expression profiles of multiple breast cancer phenotypes and response to neoadjuvant chemotherapy. Clin Cancer Res. 2006, 12 (3 Pt 1): 819-826. 10.1158/1078-0432.CCR-05-1447.
- Dyrskjot L, Zieger K, Real FX, Malats N, Carrato A, Hurst C, Kotwal S, Knowles M, Malmstrom PU, de la Torre M, et al: Gene expression signatures predict outcome in non-muscle-invasive bladder carcinoma: a multicenter validation study. Clin Cancer Res. 2007, 13 (12): 3545-3551. 10.1158/1078-0432.CCR-06-2940.
- Rosser CJ, Liu L, Sun Y, Villicana P, McCullers M, Porvasnik S, Young PR, Parker AS, Goodison S: Bladder cancer-associated gene expression signatures identified by profiling of exfoliated urothelia. Cancer Epidemiol Biomarkers Prev. 2009, 18 (2): 444-453. 10.1158/1055-9965.EPI-08-1002.
- Bild AH, Yao G, Chang JT, Wang Q, Potti A, Chasse D, Joshi MB, Harpole D, Lancaster JM, Berchuck A, et al: Oncogenic pathway signatures in human cancers as a guide to targeted therapies. Nature. 2006, 439 (7074): 353-357. 10.1038/nature04296.
- Huang E, Ishida S, Pittman J, Dressman H, Bild A, Kloos M, D'Amico M, Pestell RG, West M, Nevins JR: Gene expression phenotypic models that predict the activity of oncogenic pathways. Nat Genet. 2003, 34 (2): 226-230. 10.1038/ng1167.
- Loboda A, Nebozhyn M, Klinghoffer R, Frazier J, Chastain M, Arthur W, Roberts B, Zhang T, Chenard M, Haines B, et al: A gene expression signature of RAS pathway dependence predicts response to PI3K and RAS pathway inhibitors and expands the population of RAS pathway activated tumors. BMC Med Genomics. 3: 26. 10.1186/1755-8794-3-26.
- Singh A, Greninger P, Rhodes D, Koopman L, Violette S, Bardeesy N, Settleman J: A gene expression signature associated with "K-Ras addiction" reveals regulators of EMT and tumor cell survival. Cancer Cell. 2009, 15 (6): 489-500. 10.1016/j.ccr.2009.03.022.
- Gumireddy K, Li A, Gimotty PA, Klein-Szanto AJ, Showe LC, Katsaros D, Coukos G, Zhang L, Huang Q: KLF17 is a negative regulator of epithelial-mesenchymal transition and metastasis in breast cancer. Nat Cell Biol. 2009, 11 (11): 1297-1304. 10.1038/ncb1974.
- Lin YH, Friederichs J, Black MA, Mages J, Rosenberg R, Guilford PJ, Phillips V, Thompson-Fawcett M, Kasabov N, Toro T, et al: Multiple gene expression classifiers from different array platforms predict poor prognosis of colorectal cancer. Clin Cancer Res. 2007, 13 (2 Pt 1): 498-507. 10.1158/1078-0432.CCR-05-2734.
- Kruhoffer M, Jensen JL, Laiho P, Dyrskjot L, Salovaara R, Arango D, Birkenkamp-Demtroder K, Sorensen FB, Christensen LL, Buhl L, et al: Gene expression signatures for colorectal cancer microsatellite status and HNPCC. Br J Cancer. 2005, 92 (12): 2240-2248. 10.1038/sj.bjc.6602621.
- Enriquez JM, Diez M, Tobaruela E, Lozano O, Dominguez P, Gonzalez A, Muguerza JM, Ratia T: Clinical, histopathological, cytogenetic and prognostic differences between mucinous and nonmucinous colorectal adenocarcinomas. Rev Esp Enferm Dig. 1998, 90 (8): 563-572.
- Mojica W, Hawthorn L: Normal colon epithelium: a dataset for the analysis of gene expression and alternative splicing events in colon disease. BMC Genomics. 2010, 11: 5. 10.1186/1471-2164-11-5.
- Reid JF, Gariboldi M, Sokolova V, Capobianco P, Lampis A, Perrone F, Signoroni S, Costa A, Leo E, Pilotti S, et al: Integrative approach for prioritizing cancer genes in sporadic colon cancer. Genes Chromosomes Cancer. 2009, 48 (11): 953-962. 10.1002/gcc.20697.
- Gregory PA, Bert AG, Paterson EL, Barry SC, Tsykin A, Farshid G, Vadas MA, Khew-Goodall Y, Goodall GJ: The miR-200 family and miR-205 regulate epithelial to mesenchymal transition by targeting ZEB1 and SIP1. Nat Cell Biol. 2008, 10 (5): 593-601. 10.1038/ncb1722.
- Park SM, Gaur AB, Lengyel E, Peter ME: The miR-200 family determines the epithelial phenotype of cancer cells by targeting the E-cadherin repressors ZEB1 and ZEB2. Genes Dev. 2008, 22 (7): 894-907. 10.1101/gad.1640608.
- Pino MS, Kikuchi H, Zeng M, Herraiz MT, Sperduti I, Berger D, Park DY, Iafrate AJ, Zukerberg LR, Chung DC: Epithelial to mesenchymal transition is impaired in colon cancer cells with microsatellite instability. Gastroenterology. 2010, 138 (4): 1406-1417. 10.1053/j.gastro.2009.12.010.
- Ganepola GA, Mazziotta RM, Weeresinghe D, Corner GA, Parish CJ, Chang DH, Tebbutt NC, Murone C, Ahmed N, Augenlicht LH, et al: Gene expression profiling of primary and metastatic colon cancers identifies a reduced proliferative rate in metastatic tumors. Clinical & experimental metastasis. 2010, 27 (1): 1-9.
- Vincan E, Barker N: The upstream components of the Wnt signalling pathway in the dynamic EMT and MET associated with colorectal cancer progression. Clinical & experimental metastasis. 2008, 25 (6): 657-663.
- Edwards BK, Ward E, Kohler BA, Eheman C, Zauber AG, Anderson RN, Jemal A, Schymura MJ, Lansdorp-Vogelaar I, Seeff LC, et al: Annual report to the nation on the status of cancer, 1975-2006 featuring colorectal cancer trends and impact of interventions (risk factors, screening, and treatment) to reduce future rates. Cancer. 2010, 116 (3): 544-573. 10.1002/cncr.24760.
The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1755-8794/4/9/prepub
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.