Our conclusions are pertinent both to bioinformatics in general and to this particular breast cancer subset.
Observations of Normalization Strategies to Remove Institutional Bias in Meta-Analysis of Gene Expression Array Data
In the course of our investigation, we compared the effectiveness of normalizing data using quantile normalization, conventional median-centering, and a recently published algorithm called XPN. Although the data from the two institutions demonstrated adequate correlation after quantile normalization, results of the hierarchical clustering continued to be affected by institutional bias. This may indicate a particular sensitivity of hierarchical clustering to institutional bias.
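For readers unfamiliar with the first of these methods, the following is a minimal sketch of quantile normalization in plain Python. It illustrates the general technique only and is not the code used in this study; the function name `quantile_normalize` and the toy expression values are assumptions made for illustration.

```python
def quantile_normalize(samples):
    """Quantile-normalize expression profiles (illustrative sketch).

    `samples` is a list of equal-length lists, one list per array/sample,
    with genes in the same order in every sample. Ties within a sample are
    broken by gene order, which is a simplification.
    """
    n_genes = len(samples[0])
    if any(len(s) != n_genes for s in samples):
        raise ValueError("all samples must cover the same genes")

    # Sort each sample, then average across samples at each rank to get
    # the shared reference distribution.
    sorted_samples = [sorted(s) for s in samples]
    rank_means = [
        sum(s[rank] for s in sorted_samples) / len(samples)
        for rank in range(n_genes)
    ]

    normalized = []
    for s in samples:
        # Gene indices ordered from lowest to highest expression.
        order = sorted(range(n_genes), key=lambda i: s[i])
        out = [0.0] * n_genes
        for rank, gene_idx in enumerate(order):
            out[gene_idx] = rank_means[rank]
        normalized.append(out)
    return normalized


# Hypothetical toy data: two samples, three genes.
toy = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
toy_normalized = quantile_normalize(toy)
```

After this step every sample has an identical value distribution, yet each gene keeps its rank within its own sample. That is consistent with the observation above: matching distributions improves correlation between institutions but does not by itself remove gene-specific institutional offsets.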
Molecular Equivalence of the "ER-Subclass A" with "Molecular Apocrine" Breast Cancer
We have proposed three criteria for evaluating molecular equivalence between transcript-defined subsets identified by two or more independently conducted studies: 1) the majority of molecularly equivalent samples should cluster together and separate distinctly from the remaining samples in unsupervised clustering of the combined data; 2) there should be statistically significant overlap of gene signatures used to define the phenotype in each separate study; and 3) a classifier trained on data from one institution should successfully predict the phenotype in the other institution, and vice versa. We call upon the microarray community to consider these criteria and establish a standard protocol for establishing molecular equivalence.
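The third criterion can be illustrated with a small sketch. The code below trains a nearest-centroid classifier on one institution and scores it on the other, in both directions. The choice of nearest-centroid, the function names, the labels, and the two-gene toy profiles are all assumptions for illustration; the text above does not specify which classifier was used.

```python
def centroids(profiles, labels):
    """Mean expression profile for each class label."""
    sums, counts = {}, {}
    for profile, label in zip(profiles, labels):
        acc = sums.setdefault(label, [0.0] * len(profile))
        for i, value in enumerate(profile):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(model, profile):
    """Assign the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(center):
        return sum((a - b) ** 2 for a, b in zip(profile, center))
    return min(model, key=lambda label: sq_dist(model[label]))


def cross_accuracy(train_x, train_y, test_x, test_y):
    """Train on one institution, report accuracy on the other."""
    model = centroids(train_x, train_y)
    hits = sum(predict(model, x) == y for x, y in zip(test_x, test_y))
    return hits / len(test_y)


# Hypothetical toy data: two genes per sample, two institutions.
inst_a_x = [[5.0, 1.0], [4.5, 1.2], [1.0, 4.0], [1.2, 4.4]]
inst_a_y = ["apocrine", "apocrine", "other", "other"]
inst_b_x = [[4.8, 0.9], [0.8, 4.1]]
inst_b_y = ["apocrine", "other"]

acc_a_to_b = cross_accuracy(inst_a_x, inst_a_y, inst_b_x, inst_b_y)
acc_b_to_a = cross_accuracy(inst_b_x, inst_b_y, inst_a_x, inst_a_y)
```

Reporting both directions matters because a classifier that transfers in only one direction may reflect a property of one cohort rather than a shared phenotype.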
In the course of our evaluation, we demonstrate that two of the three criteria proposed are met even without combining and normalizing the data together: the 25-gene overlap between the signatures identified by Farmer et al. and Doane et al. is statistically significant; and the published signatures for each of these studies adequately predict the hypothesized breast cancer subset in the other index cohort. However, not only were we able to enlarge the extent of overlap in the signatures, but we found that only after appropriate normalization did the samples from the two institutions cluster together by hypothesized phenotype using hierarchical clustering.
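One common way to assess whether a signature overlap of this kind exceeds chance is a one-sided hypergeometric test. The sketch below shows that calculation; it is not necessarily the test used in this study. Only the overlap of 25 genes comes from the text. The universe size and the two signature sizes are hypothetical placeholders that would need to be replaced with the actual values.

```python
from math import comb


def overlap_p_value(universe, size_a, size_b, overlap):
    """One-sided hypergeometric tail probability P(X >= overlap).

    X is the number of genes shared with signature A when `size_b` genes
    are drawn at random, without replacement, from `universe` genes of
    which `size_a` belong to signature A.
    """
    max_shared = min(size_a, size_b)
    total = comb(universe, size_b)
    tail = sum(
        comb(size_a, k) * comb(universe - size_a, size_b - k)
        for k in range(overlap, max_shared + 1)
    )
    return tail / total


# Hypothetical inputs: only the overlap of 25 is taken from the text.
ASSUMED_UNIVERSE = 10000   # genes measured on both platforms (assumption)
ASSUMED_SIZE_A = 150       # genes in the first signature (assumption)
ASSUMED_SIZE_B = 120       # genes in the second signature (assumption)

p_value = overlap_p_value(ASSUMED_UNIVERSE, ASSUMED_SIZE_A, ASSUMED_SIZE_B, 25)
```

The result depends strongly on the choice of universe, which should be restricted to genes that could have been selected in both studies, for example probes shared by both array platforms.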
Role of AR Signaling in Molecular Apocrine Tumors
Both groups of authors suggest a role for AR signaling in this subtype of breast cancer based on comparison to data generated from cell lines. In addition, Doane et al. suggest that there is some overlap of the signatures with known ER+ genes. We chose two different network inference methods to explore causal networks in this data. LeFEminer utilizes a gene set enrichment type approach while BCRI functions as a discovery strategy supplemented by pathway information from Metacore. We selected pathways that were common to both strategies as highly supported. The AR and ER signals were the two signaling pathways that were identified by both algorithms as relevant to the molecular apocrine phenotype. Expression of the ER molecular profile in the molecular apocrine group, in spite of the fact that it is ER- by immunohistochemistry, has been described by other authors [6, 7, 31]. From a bioinformatics perspective, and since BCRI is a relatively new method of network inference, we see this result as validation of its utility in pathway discovery.
Pathways that Interact with AR in Molecular Apocrine Breast Cancer
Our analysis shows that the molecular apocrine phenotype lacks an overexpression of basal cytokeratins, which is considered to be a defining feature of basal-like breast cancer [36, 37]. Thus, we can consider molecular apocrine tumors to be a distinct subset of ER- tumors that includes both triple-negative and ER-/PR-/ErbB2+ tumors. Since we started our research, two other studies have discovered this subgroup [9, 38]. One study identified it within triple-negative tumors alone while the other found that it combines AR and ErbB2 signaling. We agree with the original authors that the molecular apocrine tumors can be either ErbB2+ or ErbB2- based on interaction studies that we will discuss below.
Our results reveal a strong interaction between the AR cluster and a cluster with several genes involved in EGFR processing. Several cell line studies have hypothesized an interaction between EGFR and both AR and ER, suggesting that together they form a complex with Src that enhances EGFR phosphorylation of tyrosine and therefore increases the effectiveness of EGF signaling [5, 39, 40]. However, this is the first study of gene expression data using cancer tissue from patients in which this interaction has been detected using data analysis methods.
A significant relationship is also revealed between the AR cluster and the ErbB2 cluster. The interaction between this cluster and the AR cluster is weaker than that between the EGFR processing cluster and the AR cluster. In the index studies of molecular apocrine tumors, approximately half of the cases were ErbB2+. This is consistent with the weaker, but still significant, interaction between AR and ErbB2 in our analysis. In addition to simple co-expression, actual cross-talk between ErbB2 and AR pathways has been suggested based on cell line studies in breast [5, 6]. These studies demonstrated an additive effect of AR inhibition in reducing ErbB2 signaling, and suggested that tumors that are AR+/ErbB2+ might need AR inhibition in addition to targeted anti-ErbB2 therapy to completely neutralize the effect of the ErbB2 signal.
In prostate cancer, cell line studies have led investigators to hypothesize that ErbB family signaling, including EGFR (ErbB1), ErbB2, and ErbB3, can activate AR and is responsible for evolution from androgen dependent to androgen independent tumor growth. Thus, at least some tumors with AR transcription profiles might require therapy with ErbB family inhibitors.
Our results with both BCRI and GS combined with RBNA also support the role of FOXA1 interacting with AR in this phenotype. FOXA1 is known to have a role in potentiating steroid receptor transcription regulation, and its association with AR by immunohistochemistry has been reported by several other investigators [42–48]. FOXA1 is a member of the AR cluster and was also directly identified by BCRI (see Additional File 1, Figure S5). Three other genes identified directly by BCRI (i.e., SPDEF, MLPH, and SERHL) are also part of the AR cluster, which further emphasizes BCRI as a valid network inference strategy.
Associations between PIK3CA mutations and AR in triple-negative tumors have been reported recently. We did not identify strong associations between a PIK3CA expressing cluster and the AR cluster. However, given that mutations in PIK3CA may not be picked up on standard gene expression platforms, this association may not be readily discovered from the data.
Clinical and Therapeutic Implications for Molecular Apocrine Breast Cancer
We propose that therapies targeting AR activity may present a rational strategy for managing these patients. The concept of introducing AR blockade as a therapeutic option for breast cancer has received more attention recently [6–9, 50–54]. Older trials of AR blockade did not select for patients with AR dependent signaling or AR expression and therefore may not have addressed the question with an optimal cohort. Based upon our interaction studies, we also recommend that any therapeutic strategies for the molecular apocrine subgroup consider combinatorial targeted therapy to include ErbB family targets, particularly EGFR targeted therapy for the entire molecular apocrine subtype and ErbB2 therapy for those tumors that overexpress ErbB2.
While there is evidence to support ER response genes in the molecular apocrine subset, anti-estrogen therapy using tamoxifen in ER- women in general has been shown to have too little benefit for clinical use. However, small benefits were reported that point to the need for more study. An important question arises: is the presence of ER signaling inferred because AR and ER share a common pathway, or is there cross-talk where AR activation stimulates the ER pathway? Our pathway analyses from BCRI that demonstrate AR and ER as related signals (Figure 6), and analysis of Cluster 16 (see Additional File 1, Figure S10), do not support a common pathway that is activated by AR and ER. While interesting, these results are not conclusive. We note that if cross-talk from activated AR signaling is the cause of the ER signal activation in ER- tumors, then AR inhibition therapy would be sufficient to interrupt this signal.
Little is known about survival in patients with molecular apocrine tumors, as they have only recently been introduced as a subtype. Farmer et al. describe poor survival in the cohort that they identified from the literature. Weigelt et al. suggest that patients with apocrine carcinomas can expect a 10-year survival rate of 35-50%, and Teschendorff et al. suggest that it has the poorest outcome of all of the ER- tumor types. Other data suggest that AR+ tumors that are otherwise triple-negative as defined by immunohistochemistry may have a better prognosis than the basal subtype of tumors. In a recent study of AR protein expression in any type of breast cancer, an improved prognosis was associated with AR expression above a certain threshold in ER+ tumors. It may be that interactions with ErbB family members modify the survival characteristics of AR+ tumors. This deserves further study.
Learning the Systems Biology of Cancer Using Network Inference Methods to Analyze Gene Expression Data
Our results support the strength of using network inference to analyze gene expression array data for oncogenic pathways and their interactions. This study demonstrates that the discovery of oncogenic pathways and their interactions does not have to rely on comparison with signatures from cell lines, but can be discovered using network inference methods. Thus our results demonstrate the rich knowledge resource within gene expression data generated from human tissues.