This article has Open Peer Review reports available.
Cell cycle correlated genes dictate the prognostic power of breast cancer gene lists
© Mosley and Keri; licensee BioMed Central Ltd. 2008
Received: 11 March 2008
Accepted: 25 April 2008
Published: 25 April 2008
Numerous gene lists or "classifiers" have been derived from global gene expression data that assign breast cancers to good and poor prognosis groups. A remarkable feature of these molecular signatures is that they have few genes in common, prompting speculation that they may use distinct genes to measure the same pathophysiological process(es), such as proliferation. However, this supposition has not been rigorously tested. If gene-based classifiers function by measuring a minimal number of cellular processes, we hypothesized that the informative genes for these processes could be identified and the data sets could be adjusted for the predictive contributions of those genes. Such adjustment would then attenuate the predictive function of any signature measuring that same process.
We tested this hypothesis directly using a novel iterative-subtractive approach. We evaluated five gene expression data sets that sample a broad range of breast cancer subtypes. In all data sets, the dominant cluster capable of predicting metastasis was heavily populated by genes that fluctuate in concert with the cell cycle. When six well-characterized classifiers were examined, all contained a higher than expected proportion of genes that correlate with this cluster. Furthermore, when the data sets were globally adjusted for the cell cycle cluster, each classifier lost its ability to assign tumors to appropriate high and low risk groups. In contrast, adjusting for other predictive gene clusters did not impact their performance.
These data indicate that the discriminative ability of breast cancer classifiers is dependent upon genes that correlate with cell cycle progression.
Global gene expression profiling of breast tumors has provided considerable insight into the biological processes underlying this disease [1–3]. One application of gene expression data has been the development of prognostic "molecular signatures" or classifiers – short lists of genes that can be used to assign tumors to good or poor prognostic groups [4–11]. Some classifiers have now become commercially available diagnostic tools for breast cancer, making it imperative to understand how they function as well as how they differ from one another. A notable feature of published prognostic gene lists is that they have relatively little overlap in terms of composition. This lack of commonality has led to the proposition that prognostic lists use different genes to measure the same underlying biological process(es) [12, 13]. Evidence supporting this notion includes a study evaluating the predictive abilities of five prognostic gene lists using a common gene expression data set, which showed that four of the five classifiers examined performed comparably in predicting future metastases despite having almost no overlap in constituent genes . Where there is overlap between gene lists, the established function of many of the common genes is regulation of cell cycle progression . Thus, gene expression signatures predictive of tumor grade, which are highly enriched in cell cycle genes, tend to have the greatest overlap with other gene lists [15, 16]. It has further been demonstrated that, for some lists, the performance of the full list can be recapitulated using only the cell cycle-related genes contained within that list [15, 17]. This indicates that cell cycle genes are sufficient to create an effective classifier . In contrast to the sufficiency of known cell cycle regulators to predict outcome, the extent to which established classifiers actually depend upon cell cycle-correlated genes has not been determined.
Gene expression data contains numerous clusters, or groups, of genes whose patterns of expression are highly correlated across tumors [19, 20]. As a result, the expression of many different genes can be used as surrogate measures for a common pathological process. Thus, prognostic gene lists need only to select a single gene from a highly correlated gene cluster to capture the overall expression pattern of all genes within that cluster. If each prognostic gene list utilized a different gene from a correlated set, the lists would have no overlap in gene identity, but would perform comparably because they would effectively measure the same cellular activity. We sought to determine whether this phenomenon was occurring with prognostic gene lists for breast cancer metastases. Herein, we describe the identification of clusters of correlated genes that are associated with the risk of developing metastases. We then determined the extent to which published classifiers utilized these gene clusters to predict outcome. Using five independent data sets, we show that all published prognostic gene lists that we examined rely upon a single group of correlated genes to predict outcome. Notably, the most predictive members of this cluster were genes which showed a cell-cycle pattern of regulation.
Previously published microarray datasets
Three publicly available breast cancer gene expression data sets were analyzed in the primary analyses: the Netherlands Clinical Institute (NKI2) data, which contains clinical (including metastatic recurrence latencies) and gene expression data on 295 women [9, 21]; the Wang data set containing gene expression data on 296 women with lymph node negative disease  (GEO series GSE2034); and the KJX64 and KJ125 data sets containing data on 189 women, 64 of which were treated with tamoxifen, with primary operable invasive breast cancer  (GEO series GSE2990).
Gene expression data for the NKI2 data set were deposited as log10 expression ratios. For analyses identifying predictive clusters of correlated genes, a data set that contained no duplicated genes was created. Each probe in the data set was mapped to a gene symbol using the SOURCE database (source.stanford.edu). Gene symbols were identified for 21,220 probes. All probes that had complete data on more than 291 of the 295 subjects were used in the analysis. Missing values were imputed using the "impute" option from FastClus procedure (SAS). For genes for which there were multiple probes in the data set, the median expression value was used. The final data set contained complete data for 14,870 uniquely named genes. For all other analyses, all probes in the complete NKI2 data set were used.
For the Wang, KJX64 and KJ125 data sets, all probes (n = 22,286 in Wang and n = 22,285 in KJX64/KJ125) in each data set were used in all analyses. In the Wang data set, expression values were log2 transformed. Expression data from the KJX64 and KJ125 data sets was provided as log2 values. During data analyses, we noted that there were unexpectedly high rates of positive correlations among probes in the KJX64 and KJ125 data sets. To attenuate these correlations, expression data for each probe was normalized by dividing the values for each probe by the median expression value of all probes for a given subject.
Simulated gene expression data set
As proof of concept of our iterative-subtractive approach for identifying the determinant groups of genes that dictate prognostication in a gene expression dataset, we began with a simulated gene expression data set. This data set was comprised of 300 subjects and contained 3 sets of correlated genes, two of which were associated with outcome, and was generated as follows. First, simulated expression data for 1000 uncorrelated genes was generated for each subject using a random number generator derived from a normal distribution (mean = 0, variance = 2). Next, we generated clusters of correlated genes: a "parent" gene was randomly selected from the 1000 genes and the expression of additional genes were then computed to have expression values equaling the "parent" gene's expression value plus a random number drawn from a normal distribution of values having a mean of 0 and variances ranging from 1 to 7. The degree of correlation was determined by the magnitude of the variance. Three "parent" genes were selected to generate correlated clusters of either 120 (parent genes 1 and 2) or 240 (parent gene 3) genes. The larger cluster was generated to demonstrate that the size of the cluster does not impact the analysis. Finally, event and latency data were randomly assigned based on expression quartiles of gene1 and gene2, with increased expression of these genes being associated with an increased risk of an event. For each of these genes, the respective risk of an "event" across quartiles was randomly determined with probabilities of 0%, 5%, 20% and 30% and the latencies in the instance of an event were randomly generated to have mean values of 5, 4, 3 and 2 years. After generating the simulated data, the univariate Cox proportional hazard ratios for gene1, gene2 and gene3 were 1.4 (p < 0.0001), 1.2 (p = 0.0007) and 0.9 (p = 0.06), respectively.
Published gene lists
The published prognostic gene lists analyzed here were selected because they all effectively stratify patients into high and low risk prognostic groups using data sets comprised of both estrogen receptor (ER) negative and ER positive tumors. Using these classifiers, we assigned tumors to high and low risk groups following the authors' published methods, except where otherwise noted. When assignment of tumors to high and low risk groups involved the use of training sets, subjects used in the training sets were not included in the subsequent analyses evaluating the performance of the predictor. For each data set, these training sets were comprised of a random set of subjects, except for the NKI2 data set, where the training set for "good" prognosis subjects (subjects who did not experience metastases within 5 years) were the 30 subjects who did not experience metastases and where the conservFlag variable was either +1 or -1 . The numbers of subjects used in training sets varied, hence the statistical analyses for the selected gene lists may involve different numbers of subjects and the relative performance of a list with respect to other lists may not be directly comparable. Gene symbols, accession numbers or probe identifiers from prognostic lists were extracted from materials accompanying the original publications and mapped to probe identifiers in each expression data set using Unigene cluster IDs derived from the SOURCE database. Unless otherwise noted, each instance of a matching probe was used in the analyses. For some prognostic gene lists, there were genes that could not be mapped to a probe in a given expression data set. A list of the number of probes used to compute prognostic scores and the numbers of subjects used in training sets is summarized [see Additional file 2].
The "70-gene" prognostic score was computed as described previously [9, 11]. A 90% sensitivity (10% false positive) threshold was used to assign tumors to high and low risk groups. The "76-gene" predictor was computed as described by the authors, with one exception: a 90% sensitivity threshold was used to establish cut-off values for high and low risk tumors in the training sets rather than a 100% threshold since some tumors had outlying values in some data sets and this slightly lower threshold enhanced the performance of the classifier . Weighted hazard ratio coefficients for all analyses were computed using unadjusted data. Because the "76-gene" predictor uses separate gene lists for ER positive and negative tumors, each list of genes was analyzed separately. The "Wound Signature" prognosticator was only examined in the NKI2 data sets since computations for this prognosticator relied upon centroid values previously computed by the authors . The centroid values for "activated" (poor prognosis) tumors were obtained from the author's supplemental data files . A 90% sensitivity classifier was used to define high and low risk groups. The "Histological grade signature" was computed as described by the authors, with the exception that a 90% sensitivity threshold was used for group assignment . The probe identifiers for each of the data sets analyzed were previously mapped by the authors and were available as a supplement to the original publications. The Naderi prognostic score was computed as described by the authors, except that all probes mapped to each data set were individually used in the computations . Finally, the 21 gene "Recurrence Score" was analyzed only in ER positive tumors. The score was computed as described by the authors . In gene expression data sets where there were multiple probes for a given gene, the median value of the probes was used. Expression data for each gene analyzed was adjusted to have a mean value of 7.5 to ensure that expression values fell within the ranges reported by the authors. Since data adjustment precluded the use of fixed cut-off points to define high and low risk groups, as described by the authors, the performance of the predictor was analyzed by either ROC analysis or a 90% sensitivity classifier. Expression data from the NKI2 data were transformed to log2 values when computing the Recurrence score.
All survival analyses are based on 5-year metastatic recurrence latencies. All subjects not experiencing metastases within 5 years were censored at that time point. Cox proportional hazards regression models were used to measure the strength of the association between the covariates and recurrence (PHReg procedure). For most analyses, univariate proportional hazards ratios comparing good and poor prognostic groups are reported. For multivariable models, the most parsimonious set of covariates was determined using forward and backward selection methods on all clinical covariates contained within each dataset. In the NKI2 data set, these were determined to be age (in years), tumor size, ER status (ER positive vs. ER negative) and grade (1 vs. 2 and 1 vs. 3). In the KJX64/KJ125 data set, the covariates were tumor size (> 2 cm vs. < 2 cm) and grade (grade 2 vs. grade 1/3). ER status was the only clinical covariate contained in the Wang data set. Principal component variables were standardized to have a mean of 0 and standard deviation of 1 in proportional hazards models.
Identifying predictive clusters of correlated genes
The following approach was taken to iteratively identify clusters of positively correlated genes associated with the risk of developing metastases in each data set. This method assigns a rank to each gene probe within a data set such that a highly ranked probe has the following properties: 1) it is strongly predictive of outcome (i.e. it has a low univariate hazard ratio (HR) p-value); 2) its expression is highly positively correlated with other probes; and 3) the HR p-values of the positively correlated probes decreases in direct proportion to their correlation with the gene being ranked. The rationale for the third criteria is that many probes may be associated with outcome solely because their expression pattern follows, to various degrees, the expression pattern for a highly predictive gene.
A graphical representation of two potential outcomes of the scoring procedure is provided [see Additional file 1]. The covariance component of this score gives increased weight to probes that are positively correlated with a large number of genes that are highly predictive of survival (i.e. have small p-values). The Pearson's correlation component of this score gives increased weight to those probes for which there is a linear relationship between its correlations with all other probes and their HR p-values. In total, this score ranks each probe in the data set such that those with the lowest (i.e. most negative) values are the most predictive members of clusters of correlated genes associated with the risk of developing metastases.
After assigning scores to each probe, the top 10 ranked genes (the genes with the lowest scores) were selected as representatives of the cluster. If the expression levels for these genes was highly correlated across tumors, this would indicate that a cluster of correlated genes associated with metastases had been identified. Having identified a cluster, the "average" expression pattern of the 10 genes was measured by computing the first principal component (PC) for the 10 genes (Princomp procedure). The PC represents a new variable that is maximally correlated with the expression values of the top 10 genes. We then used the PC variable to globally adjust the data set thereby eliminating correlations between the top 10 genes and all other genes (described below). After this global adjustment, the next predictive cluster of genes could be identified by repeating this entire scoring process using the adjusted data.
Global adjustment of gene expression data
Identification of subsidiary principal components required the selective elimination of correlations between probes and a previously identified PC variable. To accomplish this, the expression data were globally adjusted for either an individual gene probe or a principal component (PC) variable by fitting a least-squares regression line (which included an intercept term) to each probe in the dataset and then computing the residuals (GLM procedure). Each probe was the dependent variable and the probe or the PC being adjusted for was the continuous independent variable. The residuals (adjusted values) for each probe represent the new expression values for that probe. This adjustment removes the variance from each probe that is linearly explained by the independent variable. The consequence of adjustment is that the adjusted value (residual) for a probe is no longer linearly correlated with the independent variable. Thus, probes that are predictive of metastases solely because their expression levels are correlated with the independent variable will no longer be predictive after adjustment. As a control for the effects of globally adjusting data, expression values were adjusted using a regression model that contained an intercept term only, but no other covariates.
All calculations were performed using SAS version 9.1 (SAS Institute, Cary, NC). All statistical tests were two-sided. All correlation values represent Pearson's correlation coefficients. Receive operator characteristic (ROC) curves were generated using the Logistic procedure and were based on 5 year metastatic recurrence rates.
In the NKI2 data, expression values for probes typically fell between the ranges of -1 and +1. In some instances, the values for some probes represented extreme outliers. Thus, observations for probes with absolute values greater than 1.9 were omitted from these analyses.
Development of a method for iteratively identifying clusters of correlated genes associated with an outcome
The multiple prognostic gene lists for breast cancer that have been described have little overlap in their gene composition. We developed a method to sequentially identify clusters of correlated genes that control the prognostic power of these gene lists with the goal of identifying the common underlying biological features of these classifiers. Specifically, we devised an approach to identify clusters of genes that are independently predictive of metastases.
We then used the PC variable to globally adjust the gene expression data. This permitted repetition of the scoring process, leading to identification of a second correlated gene group that is independently predictive of outcome. In the next iteration, the top 10 ranked genes were gene2 and genes highly correlated with it. This cluster is shown in Figure 1B. Note that the overall magnitude of the p-values is smaller for this cluster since gene2 was less strongly associated with outcome than gene1. On the next iteration, a third correlated cluster of genes was identified that was not strongly associated with outcome (Figure 1C). In sum, these analyses demonstrate that clusters of correlated genes independently predictive of outcome can be sequentially identified, and that the order in which clusters are elucidated reflects the strength of their association with the outcome.
Identification of correlated gene clusters associated with breast cancer metastases
This approach was repeated with the KJX64/KJ125 data sets, which are comprised of data from 189 women with primary operable invasive breast cancer . Similar to the NKI2 data set, the top ranked predictive cluster of probes was dominated by cell cycle genes (Figure 2B) [see Additional file 2]. Of 1,282 probes initially analyzed, 255 retained significant HR p-values (p < 0.05) after adjusting for this PC. The top ranked probes in the second cluster were predominantly related to immune function (Figure 2B) [see Additional file 2]. Only 58 genes were still associated with outcome after adjusting the data for this second gene cluster. Additional predictive gene clusters were not found among these remaining genes.
Using a third, independent data set, we found that the primary cluster in the Wang data set of 296 women with lymph node negative disease  was also dominated by a set of cell cycle related genes (Figure 2C) [see Additional file 2]. Of the 4,088 probes initially analyzed, 1,490 remained predictive after adjusting for this cluster. The next cluster consisted of relatively few correlated genes predictive of outcome (Figure 2C) [see Additional file 2], and was comprised of genes potentially related to TGF-β or activin signaling . This cluster was relatively small, and 850 genes were still predictive after adjustment. The top ranked genes in the third iteration were not consistently correlated with each other and, thus, did not define a correlated gene cluster [see Additional file 2].
In summary, the dominant predictive network in all three independent expression data sets was a cluster of correlated genes whose most predictive members were cell cycle-associated. Similar results were obtained when analyses were performed on two additional independent data sets [see Additional file 2], i.e. the primary PC was dominated by cell cycle correlated genes.
Each principal component variable is independently predictive of metastases in multivariable analyses
Multivariable Cox proportional hazards analysis for each principal component variable1.
ER positive tumors
ER negative tumors
Genes correlated with the cell cycle principal component are overrepresented on published prognostic gene lists
Few, if any, probes in the expression data sets were correlated with the secondary PC variables. The notable exception was the Naderi prognostic gene list  in the KJX64/KJ125 data set. Almost 10% (2–3 fold higher than expected) of probes within this predictor had absolute correlations greater than 0.4. In sum, these data show that most prognostic gene lists are comprised of a higher than expected number of genes whose expression is correlated with the primary PC variable which measures fluctuations in the cell cycle.
Prognostic gene lists depend upon the cell cycle principal component to identify good and poor prognosis tumors
Prognostic gene lists rely on genes correlated with the first (cell cycle) principal component variable1.
5.8 (< 0.0001)
5.7 (< 0.0001)
Wang 76-gene (ER+)
4.5 (< 0.0001)
Wang 76-gene (ER-)
4.7 (< 0.0001)
4.7 (< 0.0001)
Wang 76-gene (ER+)
Wang 76-gene (ER-)4
Wang 76-gene (ER+)
Wang 76-gene (ER-)
2 (< 0.0001)
3.1 (< 0.0001)
Collectively, these analyses revealed that prognostic gene lists rely on genes whose expression correlates with cell cycle genes to assign tumors to high and low risk groups across data sets. Further supporting this conclusion, we found that when data were adjusted for individual genes, rather than a PC variable representing a cluster of genes, the impact of such adjustment on the performance of the predictor was directly related to the magnitude of that gene's correlation with the cell cycle PC [see Additional file 3].
Herein, we described an approach to determine biological commonalities among non-overlapping gene-based classifiers that are predictive of breast cancer metastases. Although it has previously been suggested that cellular proliferation may be an underlying component of these classifiers, no studies have directly shown that this is, indeed, the case. To determine if there was a fundamental biological process that was consistently being interrogated by these prognostic lists, we developed a method to identify clusters of correlated genes that are associated with the risk of developing metastases in breast cancer. The advantage of this method is that it identifies large groups of correlated genes whose expression pattern is determined by common processes associated with an outcome. We suggest that such an approach is necessary when devising classifiers because all members of a correlated network function as a group to relay information regarding a common pathophysiological state . More importantly, any one gene from a correlated cluster can act as a sentinel or reporter for all other genes within that same cluster. As a result, numerous prognostic lists can be generated which measure the same biological process but have little to no overlap in gene content. In addition, analytic approaches which examine individual probes are apt to identify only those correlated probes that are strongly associated with outcome, but not those probes with weaker correlations since they will only be modestly predictive .
Using an iterative-subtractive approach, we consistently show that the most predictive gene cluster in five independent, publicly available breast cancer data sets is represented by genes that either directly participate in the cell cycle or whose expression is dictated by cell cycle progression. Visualization of this cluster in the NKI2 gene expression data set was particularly striking as it revealed that the prognostic capacity of virtually every gene was directly related to that gene's correlation with cell cycle genes (Figure 2A). Indeed, when data were adjusted to selectively eliminate the correlations among the genes in this cluster, very few genes remained predictive of outcome. The highly correlated nature of all of the predictive genes in this data set explains why a comparison of the performance of five prognostic gene lists showed that 4 of 5 performed comparably even though these lists had very little overlap in their constituent genes . It also reveals why using only the cell cycle correlated genes from a predictive gene list is sufficient to recapitulate the performance of the entire list [15, 17, 26] – because the other predictive genes in these lists function as weak proxies of the cell cycle genes.
A number of studies have identified cell cycle genes as being predictive of poor outcome within and across cancer types [23, 27, 28]. Cell cycle genes have also been shown to represent the molecular basis of tumor grade in breast cancer, a strong predictor of metastasis [15, 16, 29, 30]. While cell cycle regulators may play a causative role in tumor development , their upregulation also likely represents an important unifying pathway for a number of tumor initiating events. As a result, the fundamental commonality among all breast cancer subtypes that predicts outcome is an increase in proliferation. This outcome further supports the notion that breast cancer is an amalgam of very different subtypes of disease, with each subtype having a distinct group of prognostic genes. When all of these classes of tumors are combined, the major common theme is cell cycle progression. This concept is further illustrated by the inherent lack of known mammary oncogenes, such as HER2/ErbB2 , in any of the predictive clusters we identified. A similar observation has been made in other studies which have used statistical methods to define prognostic gene sets [6, 7, 11]. While expression of cell cycle regulators is upregulated in HER2 tumors as well as ER negative tumors [19, 33], HER2 and ER-α were either uncorrelated or modestly linearly correlated with the cell cycle clusters. The low correlations are likely due to the fact that these types of tumors represent a relatively small proportion of tumors in these data sets and that expression levels of these genes are uncorrelated with those for the cell cycle genes in the majority of tumor types. Thus, it is not surprising that these factors would not be the primary indicators of outcome, either by the method described herein or with other classifiers, when examining the entire set of tumors. In contrast, many of the genes negatively correlated with the cell cycle PC variable were genes that have previously been shown to define the good prognosis class of Luminal A tumors (e.g. SCUBE2, XBP1, FoxA1 etc.) [3, 20] (data not shown). The negative correlations may arise either because these genes inhibit proliferation  or because they define a class of tumors with intrinsically low rates of proliferation. In either case, from the perspective of predicting metastasis latencies, they all function to various degrees as proxies for the cell cycle.
Prognostic tools for breast cancer that utilize gene-based classifiers are currently being tested or used in clinical settings . As more of these prognostic tools become commercially available, it will be imperative to have a thorough understanding of how they may differ in order to facilitate selection of the most appropriate test. Our studies suggest that these tools, despite having little overlap in gene content, rely heavily on a single metric to identify high and low risk tumors. The common biological process that is being measured is the upregulation of genes associated with cell proliferation. Consistent with this conclusion is the fact that the greatest overlap in established classifiers is typically genes encoding cell cycle regulators . Furthermore, the cell cycle cluster of genes (PC1 for each data set) was as effective as all of the other established classifiers we examined when used to evaluate data sets other than those from which they were derived (Table 2) [see Additional file 2]. Most importantly, adjustment of the data sets for this cluster resulted in a loss of the predictive capacity for these classifiers, indicating that they require measuring fluctuations in the cell cycle correlated genes to effectively separate tumors into high and low risk groups. These findings suggest that validation of novel classifying gene lists should involve a comparison with a prognosticator comprised of a small number of cell cycle associated genes, as such a classifier may be able to capture the same amount of critical prognostic information with a smaller number of genes This study also underscores the fact that many genes are correlated with the cell cycle genes even though they may not be directly associated with this process. Thus, simply removing known cell cycle regulatory genes from a gene expression data set or prognostic classifier is insufficient to adjust for the contribution of cellular proliferation in a prognostic gene list.
We are grateful for critical and statistical input from Robert C. Elston. This work was supported by grants from the National Institutes of Health [RO1-CA090398 (to RAK) and training grant T32-GM07250 (MSTP/CASE) (to JDM)] and an Ohio Innovation Incentive Fellowship (to JDM).
- Sorlie T, Tibshirani R, Parker J, Hastie T, Marron JS, Nobel A, Deng S, Johnsen H, Pesich R, Geisler S, Demeter J, Perou CM, Lonning PE, Brown PO, Borresen-Dale AL, Botstein D: Repeated observation of breast tumor subtypes in independent gene expression data sets. Proc Natl Acad Sci U S A. 2003, 100: 8418-8423. 10.1073/pnas.0932692100.View ArticlePubMedPubMed CentralGoogle Scholar
- Rouzier R, Perou CM, Symmans WF, Ibrahim N, Cristofanilli M, Anderson K, Hess KR, Stec J, Ayers M, Wagner P, Morandi P, Fan C, Rabiul I, Ross JS, Hortobagyi GN, Pusztai L: Breast cancer molecular subtypes respond differently to preoperative chemotherapy. Clin Cancer Res. 2005, 11: 5678-5685. 10.1158/1078-0432.CCR-04-2421.View ArticlePubMedGoogle Scholar
- Sorlie T, Wang Y, Xiao C, Johnsen H, Naume B, Samaha RR, Borresen-Dale AL: Distinct molecular mechanisms underlying clinically relevant subtypes of breast cancer: gene expression analyses across three different platforms. BMC Genomics. 2006, 7: 127-10.1186/1471-2164-7-127.View ArticlePubMedPubMed CentralGoogle Scholar
- Chang HY, Nuyten DS, Sneddon JB, Hastie T, Tibshirani R, Sorlie T, Dai H, He YD, van't Veer LJ, Bartelink H, van de RM, Brown PO, van de Vijver MJ: Robustness, scalability, and integration of a wound-response gene expression signature in predicting breast cancer survival. Proc Natl Acad Sci U S A. 2005, 102: 3738-3743. 10.1073/pnas.0409462102.View ArticlePubMedPubMed CentralGoogle Scholar
- Ma XJ, Wang Z, Ryan PD, Isakoff SJ, Barmettler A, Fuller A, Muir B, Mohapatra G, Salunga R, Tuggle JT, Tran Y, Tran D, Tassin A, Amon P, Wang W, Wang W, Enright E, Stecker K, Estepa-Sabal E, Smith B, Younger J, Balis U, Michaelson J, Bhan A, Habin K, Baer TM, Brugge J, Haber DA, Erlander MG, Sgroi DC: A two-gene expression ratio predicts clinical outcome in breast cancer patients treated with tamoxifen. Cancer Cell. 2004, 5: 607-616. 10.1016/j.ccr.2004.05.015.View ArticlePubMedGoogle Scholar
- Naderi A, Teschendorff AE, Barbosa-Morais NL, Pinder SE, Green AR, Powe DG, Robertson JF, Aparicio S, Ellis IO, Brenton JD, Caldas C: A gene-expression signature to predict survival in breast cancer across independent data sets. Oncogene. 2006, 26: 1507-1516. 10.1038/sj.onc.1209920.View ArticlePubMedGoogle Scholar
- Teschendorff AE, Naderi A, Barbosa-Morais NL, Pinder SE, Ellis IO, Aparicio S, Brenton JD, Caldas C: A consensus prognostic gene expression classifier for ER positive breast cancer. Genome Biol. 2006, 7: R101-10.1186/gb-2006-7-10-r101.View ArticlePubMedPubMed CentralGoogle Scholar
- Paik S, Shak S, Tang G, Kim C, Baker J, Cronin M, Baehner FL, Walker MG, Watson D, Park T, Hiller W, Fisher ER, Wickerham DL, Bryant J, Wolmark N: A multigene assay to predict recurrence of tamoxifen-treated, node-negative breast cancer. N Engl J Med. 2004, 351: 2817-2826. 10.1056/NEJMoa041588.View ArticlePubMedGoogle Scholar
- van de Vijver MJ, He YD, van't Veer LJ, Dai H, Hart AA, Voskuil DW, Schreiber GJ, Peterse JL, Roberts C, Marton MJ, Parrish M, Atsma D, Witteveen A, Glas A, Delahaye L, van , Bartelink H, Rodenhuis S, Rutgers ET, Friend SH, Bernards R: A gene-expression signature as a predictor of survival in breast cancer. N Engl J Med. 2002, 347: 1999-2009. 10.1056/NEJMoa021967.View ArticlePubMedGoogle Scholar
- Wang Y, Klijn JG, Zhang Y, Sieuwerts AM, Look MP, Yang F, Talantov D, Timmermans M, Meijer-van Gelder ME, Yu J, Jatkoe T, Berns EM, Atkins D, Foekens JA: Gene-expression profiles to predict distant metastasis of lymph-node-negative primary breast cancer. Lancet. 2005, 365: 671-679.View ArticlePubMedGoogle Scholar
- van't Veer LJ, Dai H, van de Vijver MJ, He YD, Hart AA, Mao M, Peterse HL, van der KK, Marton MJ, Witteveen AT, Schreiber GJ, Kerkhoven RM, Roberts C, Linsley PS, Bernards R, Friend SH: Gene expression profiling predicts clinical outcome of breast cancer. Nature. 2002, 415: 530-536. 10.1038/415530a.View ArticleGoogle Scholar
- Jenssen TK, Hovig E: Gene-expression profiling in breast cancer. Lancet. 2005, 365: 634-635.View ArticlePubMedGoogle Scholar
- Massague J: Sorting out breast-cancer gene signatures. N Engl J Med. 2007, 356: 294-297. 10.1056/NEJMe068292.View ArticlePubMedGoogle Scholar
- Fan C, Oh DS, Wessels L, Weigelt B, Nuyten DS, Nobel AB, van't Veer LJ, Perou CM: Concordance among gene-expression-based predictors for breast cancer. N Engl J Med. 2006, 355: 560-569. 10.1056/NEJMoa052933.View ArticlePubMedGoogle Scholar
- Sotiriou C, Wirapati P, Loi S, Harris A, Fox S, Smeds J, Nordgren H, Farmer P, Praz V, Haibe-Kains B, Desmedt C, Larsimont D, Cardoso F, Peterse H, Nuyten D, Buyse M, van de Vijver MJ, Bergh J, Piccart M, Delorenzi M: Gene expression profiling in breast cancer: understanding the molecular basis of histologic grade to improve prognosis. J Natl Cancer Inst. 2006, 98: 262-272.View ArticlePubMedGoogle Scholar
- Ivshina AV, George J, Senko O, Mow B, Putti TC, Smeds J, Lindahl T, Pawitan Y, Hall P, Nordgren H, Wong JE, Liu ET, Bergh J, Kuznetsov VA, Miller LD: Genetic reclassification of histologic grade delineates new clinical subtypes of breast cancer. Cancer Res. 2006, 66: 10292-10301. 10.1158/0008-5472.CAN-05-4414.View ArticlePubMedGoogle Scholar
- Sotiriou C, Desmedt C: Gene expression profiling in breast cancer. Ann Oncol. 2006, 17 Suppl 10: x259-x262. 10.1093/annonc/mdl270.View ArticlePubMedGoogle Scholar
- Desmedt C, Sotiriou C: Proliferation: the most prominent predictor of clinical outcome in breast cancer. Cell Cycle. 2006, 5: 2198-2202.View ArticlePubMedGoogle Scholar
- Perou CM, Sorlie T, Eisen MB, van de RM, Jeffrey SS, Rees CA, Pollack JR, Ross DT, Johnsen H, Akslen LA, Fluge O, Pergamenschikov A, Williams C, Zhu SX, Lonning PE, Borresen-Dale AL, Brown PO, Botstein D: Molecular portraits of human breast tumours. Nature. 2000, 406: 747-752. 10.1038/35021093.View ArticlePubMedGoogle Scholar
- Sorlie T, Perou CM, Tibshirani R, Aas T, Geisler S, Johnsen H, Hastie T, Eisen MB, van de RM, Jeffrey SS, Thorsen T, Quist H, Matese JC, Brown PO, Botstein D, Eystein LP, Borresen-Dale AL: Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications. Proc Natl Acad Sci U S A. 2001, 98: 10869-10874. 10.1073/pnas.191367098.View ArticlePubMedPubMed CentralGoogle Scholar
- Rosetta Inpharmatics. 2008, [http://www.rii.com/publications/default.htm]
- Shipitsin M, Campbell LL, Argani P, Weremowicz S, Bloushtain-Qimron N, Yao J, Nikolskaya T, Serebryiskaya T, Beroukhim R, Hu M, Halushka MK, Sukumar S, Parker LM, Anderson KS, Harris LN, Garber JE, Richardson AL, Schnitt SJ, Nikolsky Y, Gelman RS, Polyak K: Molecular definition of breast tumor heterogeneity. Cancer Cell. 2007, 11: 259-273. 10.1016/j.ccr.2007.01.013.View ArticlePubMedGoogle Scholar
- Dai H, van't Veer L, Lamb J, He YD, Mao M, Fine BM, Bernards R, van , Deutsch P, Sachs A, Stoughton R, Friend S: A cell proliferation signature is a marker of extremely poor outcome in a subpopulation of breast cancer patients. Cancer Res. 2005, 65: 4059-4066. 10.1158/0008-5472.CAN-04-3953.View ArticlePubMedGoogle Scholar
- Alter O, Brown PO, Botstein D: Singular value decomposition for genome-wide expression data processing and modeling. Proc Natl Acad Sci U S A. 2000, 97: 10101-10106. 10.1073/pnas.97.18.10101.View ArticlePubMedPubMed CentralGoogle Scholar
- Nilsson R, Pena JM, Bjorkegren J, Tegner J: Detecting multivariate differentially expressed genes. BMC Bioinformatics. 2007, 8: 150-10.1186/1471-2105-8-150.View ArticlePubMedPubMed CentralGoogle Scholar
- Vuaroqueaux V, Urban P, Labuhn M, Delorenzi M, Wirapati P, Benz CC, Flury R, Dieterich H, Spyratos F, Eppenberger U, Eppenberger-Castori S: Low E2F1 transcript levels are a strong determinant of favorable breast cancer outcome. Breast Cancer Res. 2007, 9: R33-10.1186/bcr1681.View ArticlePubMedPubMed CentralGoogle Scholar
- Rosenwald A, Wright G, Chan WC, Connors JM, Campo E, Fisher RI, Gascoyne RD, Muller-Hermelink HK, Smeland EB, Giltnane JM, Hurt EM, Zhao H, Averett L, Yang L, Wilson WH, Jaffe ES, Simon R, Klausner RD, Powell J, Duffey PL, Longo DL, Greiner TC, Weisenburger DD, Sanger WG, Dave BJ, Lynch JC, Vose J, Armitage JO, Montserrat E, Lopez-Guillermo A, Grogan TM, Miller TP, LeBlanc M, Ott G, Kvaloy S, Delabie J, Holte H, Krajci P, Stokke T, Staudt LM: The use of molecular profiling to predict survival after chemotherapy for diffuse large-B-cell lymphoma. N Engl J Med. 2002, 346: 1937-1947. 10.1056/NEJMoa012914.View ArticlePubMedGoogle Scholar
- Whitfield ML, George LK, Grant GD, Perou CM: Common markers of proliferation. Nat Rev Cancer. 2006, 6: 99-106. 10.1038/nrc1802.View ArticlePubMedGoogle Scholar
- Volpi A, Bacci F, Paradiso A, Saragoni L, Scarpi E, Ricci M, Aldi M, Bianchi S, Muretto P, Nuzzo F, Simone G, Mangia A, Schittulli F, Amadori D: Prognostic relevance of histological grade and its components in node-negative breast cancer patients. Mod Pathol. 2004, 17: 1038-1044. 10.1038/modpathol.3800161.View ArticlePubMedGoogle Scholar
- Michels JJ, Marnay J, Delozier T, Denoux Y, Chasle J: Proliferative activity in primary breast carcinomas is a salient prognostic factor. Cancer. 2004, 100: 455-464. 10.1002/cncr.11916.View ArticlePubMedGoogle Scholar
- Wang TC, Cardiff RD, Zukerberg L, Lees E, Arnold A, Schmidt EV: Mammary hyperplasia and carcinoma in MMTV-cyclin D1 transgenic mice. Nature. 1994, 369: 669-671. 10.1038/369669a0.View ArticlePubMedGoogle Scholar
- Slamon DJ, Clark GM, Wong SG, Levin WJ, Ullrich A, McGuire WL: Human breast cancer: correlation of relapse and survival with amplification of the HER-2/neu oncogene. Science. 1987, 235: 177-182. 10.1126/science.3798106.View ArticlePubMedGoogle Scholar
- Hu Z, Fan C, Oh DS, Marron JS, He X, Qaqish BF, Livasy C, Carey LA, Reynolds E, Dressler L, Nobel A, Parker J, Ewend MG, Sawyer LR, Wu J, Liu Y, Nanda R, Tretiakova M, Ruiz OA, Dreher D, Palazzo JP, Perreard L, Nelson E, Mone M, Hansen H, Mullins M, Quackenbush JF, Ellis MJ, Olopade OI, Bernard PS, Perou CM: The molecular portraits of breast tumors are conserved across microarray platforms. BMC Genomics. 2006, 7: 96-10.1186/1471-2164-7-96.View ArticlePubMedPubMed CentralGoogle Scholar
- Moggs JG, Murphy TC, Lim FL, Moore DJ, Stuckey R, Antrobus K, Kimber I, Orphanides G: Anti-proliferative effect of estrogen in breast cancer cells that re-express ERalpha is mediated by aberrant regulation of cell cycle genes. J Mol Endocrinol. 2005, 34: 535-551. 10.1677/jme.1.01677.View ArticlePubMedGoogle Scholar
- Harris L, Fritsche H, Mennel R, Norton L, Ravdin P, Taube S, Somerfield MR, Hayes DF, Bast RC: American Society of Clinical Oncology 2007 update of recommendations for the use of tumor markers in breast cancer. J Clin Oncol. 2007, 25: 5287-5312. 10.1200/JCO.2007.14.2364.View ArticlePubMedGoogle Scholar
- The pre-publication history for this paper can be accessed here:http://www.biomedcentral.com/1755-8794/1/11/prepub
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.