Gene expression meta-analysis supports existence of molecular apocrine breast cancer with a role for androgen receptor and implies interactions with ErbB family
© Sanga et al; licensee BioMed Central Ltd. 2009
Received: 18 May 2009
Accepted: 11 September 2009
Published: 11 September 2009
Pathway discovery from gene expression data can provide important insight into the relationship between signaling networks and cancer biology. Oncogenic signaling pathways are commonly inferred by comparison with signatures derived from cell lines. We use the Molecular Apocrine subtype of breast cancer to demonstrate our ability to infer pathways directly from patients' gene expression data with pattern analysis algorithms.
We combine data from two studies that propose the existence of the Molecular Apocrine phenotype. We use quantile normalization and XPN to minimize institutional bias in the data. We use hierarchical clustering, principal components analysis, and comparison of gene signatures derived from Significance Analysis of Microarrays to establish the existence of the Molecular Apocrine subtype and the equivalence of its molecular phenotype across both institutions. Statistical significance was computed using the Fasano & Franceschini test for separation of principal components and the hypergeometric probability formula for significance of overlap in gene signatures. We perform pathway analysis using LeFEminer and Backward Chaining Rule Induction to identify a signaling network that differentiates the subset. We identify a larger cohort of samples in the public domain, and use Gene Shaving and Robust Bayesian Network Analysis to detect pathways that interact with the defining signal.
We demonstrate that the two separately introduced ER- breast cancer subsets represent the same tumor type, called Molecular Apocrine breast cancer. LeFEminer and Backward Chaining Rule Induction support a role for AR signaling as a pathway that differentiates this subset from others. Gene Shaving and Robust Bayesian Network Analysis detect interactions between the AR pathway, EGFR trafficking signals, and ErbB2.
We propose criteria for meta-analysis that are able to demonstrate statistical significance in establishing molecular equivalence of subsets across institutions. Data mining strategies used here provide an alternative method to comparison with cell lines for discovering seminal pathways and interactions between signaling networks. Analysis of Molecular Apocrine breast cancer implies that therapies targeting AR might be hampered if interactions with ErbB family members are not addressed.
Gene expression array data can be mined to provide critical insight into our understanding of the relationship between signaling networks and the biology of cancer [1–3]. In addition to identifying individual pathways, recent attention has been given to "cross-talk" or interactions that cause aberrant signaling patterns in cancer [4–6]. The conventional method of identifying oncogenic pathways and their interactions has been through studying cell lines [1, 2, 7, 8]. Our goal is to be able to identify dominant pathways using data mining methods that do not require direct comparison with cell lines.
To pursue our goal we investigate a recently introduced subtype of ER- breast cancer that is hypothesized to result from AR signaling. We analyze the data using several different bioinformatics approaches to pathway discovery. We are able to detect patterns that support the same conclusions reached with comparison to cell lines data by the original authors. In addition, we introduce interactions not previously discovered in the data that have important therapeutic implications. Thus, our results contribute to both bioinformatics and to breast cancer biology.
The ER- breast cancer subtype that we study here has been termed the "molecular apocrine" subtype [8, 9] and the "ER- class A" subtype in two separate studies that proposed its existence. The studies were independently performed, but both groups hypothesized AR signaling as a defining feature of the transcript profile, leading us to question whether or not they represent the same tumor subset. One study identifies six of 16 ER- tumors as the molecular apocrine subtype and the other study identifies ten of 41 ER- tumors as the class A subtype. Since no meta-analysis of both studies has confirmed that the individual tumor clusters represent the same breast cancer subset as defined by gene expression, we start by performing a comparative study. We call this a test of "molecular equivalence," and we propose a set of criteria for establishing molecular equivalence of cancer subsets defined by gene expression data: 1) the majority of samples with the molecular phenotype should cluster together and their combined profile should be distinct from the remaining samples in unsupervised clustering of the combined data; 2) there should be significant overlap of the gene signatures used to classify the phenotype from each institution; and 3) a classifier trained on data from one institution should be able to predict the phenotype correctly in the other institution's data, and vice versa. In the process of establishing molecular equivalence, we test different methods of normalizing the data to remove institutional bias and we comment on their effectiveness.
Having established the molecular equivalence of the group, we use the Learner of Functional Enrichment algorithm (LeFEminer), which is based on gene set enrichment, and Backward Chaining Rule Induction (BCRI), which is a de novo discovery method [11–13], to identify pathways in the combined data. Both of these methods incorporate existing pathway knowledge from the literature within their methodology. Our results indicate a role for AR in this breast cancer subset. Subsequently, we use a gene expression classifier to identify more molecular apocrine data for discovery of pathway interactions. We use Gene Shaving and Robust Bayesian Network Analysis on these data because this combination facilitates discovery of interactions that have variable prevalence in the patient population [14, 15]. We demonstrate that there are highly prevalent interactions between AR signaling and members of the ErbB family. We discuss the therapeutic implications of cross-talk between AR and members of the ErbB family in molecular apocrine type breast cancer. Taken together, these results demonstrate that data mining methods can be used to generate network information directly from gene expression data.
The data used in this study were generated on Affymetrix U133A oligonucleotide microarrays and are publicly available [7, 8, 16–18]. The cohort from Farmer et al. includes 22 ER- breast carcinoma samples with six classified as molecular apocrine. The cohort from Doane et al. includes 41 ER- breast carcinoma samples with ten classified as molecular apocrine. We refer to data generated by Farmer et al. and Doane et al. as the "index cohorts." We use additional cohorts from Ivshina et al., Rouzier et al., and Sotiriou et al., which contain 59, 51, and 34 ER- breast carcinoma samples, respectively, to confirm the existence of the molecular apocrine phenotype in larger cohorts outside the index cohorts and to explore gene network interactions.
Comparison for Molecular Equivalence
We evaluate our second proposed criterion for determining molecular equivalence by using Significance Analysis of Microarrays (SAM) to identify the top 100 statistically significant probesets in each of the index cohorts (after normalization) that differentiate the hypothesized molecular apocrine phenotype from the remaining samples. The resulting gene signatures share 76 genes (see Additional file 2), while the original two studies identified 138-gene and 400-gene profiles with 25 overlapping genes. The extent of overlap for both results is statistically significant (both p < 0.0001). For comparison, 100-gene signatures derived from a median-centered dataset using manufacturer-provided probeset definitions have 25 overlapping genes, and those derived from a median-centered dataset using AffyProbeMiner probeset definitions have 33 overlapping genes (both p < 0.0001). While there is no notable difference in statistical significance, the larger number of common genes gives us more attributes with which to investigate the networks and gene interactions that define this species.
Functional Analysis of the "Molecular Apocrine" Phenotype Using LeFEminer
Top-ranking Gene Ontology categories identified by LeFEminer on the normalized index cohorts
carboxylic acid metabolism
Fatty acid metabolism
Fatty acid metabolism BioCarta
MAP00480 Glutathione metabolism GenMAPP
organic acid metabolism
aromatic compound catabolism
MAP00350 Tyrosine metabolism GenMAPP
oxidoreductase activity, acting on CH-OH group of donors
aromatic amino acid family catabolism
electron transporter activity BioCarta
Cyclic nucleotide-dependent protein kinase activity
ANDROGEN UP GENES na
Fatty Acid Synthesis BioCarta
regulation of locomotion
polypeptide N-acetylgalactosaminyltransferase activity
MAP00360 Phenylalanine metabolism GenMAPP
breast cancer estrogen signalling GEArray
MAP00512 O Glycans biosynthesis GenMAPP
cAMP-dependent protein kinase activity
regulation of cell migration
microtubule cytoskeleton organization and biogenesis
mitotic spindle checkpoint
electron transporter activity
regulation of behaviour
neuronal lineage restriction
Network Inference Analysis of "Molecular Apocrine" Phenotype Using Backward Chaining Rule Induction and MetaCore
Transcription regulation analysis by GeneGo's MetaCore
positive regulation of retinoic acid receptor signaling pathway (16.7%; 1.463e-03), negative regulation of mitosis (16.7%; 2.194e-03), epithelial cell maturation (16.7%; 2.559e-03), regulation of retinoic acid receptor signaling pathway (16.7%; 2.925e-03), melanosome localization (16.7%; 4.019e-03)
negative regulation of protein import into nucleus, translocation (16.7%; 7.318e-04), negative regulation of tyrosine phosphorylation of Stat5 protein (16.7%; 1.463e-03), regulation of protein import into nucleus, translocation (16.7%; 1.829e-03), ornithine metabolic process (16.7%; 2.559e-03), positive regulation of gluconeogenesis (16.7%; 2.559e-03)
prostate gland development (40.0%; 4.905e-06), male somatic sex determination (20.0%; 3.050e-04), somatic sex determination (20.0%; 6.099e-04), gland development (40.0%; 7.323e-04), urogenital system development (40.0%; 7.845e-04)
glucose homeostasis (66.7%; 5.376e-05), carbohydrate homeostasis (66.7%; 5.376e-05), epithelial cell maturation (33.3%; 1.280e-03), bile acid biosynthetic process (33.3%; 1.646e-03), paraxial mesoderm formation (33.3%; 1.829e-03)
neuron fate specification (66.7%; 2.343e-06), positive regulation of neuron differentiation (66.7%; 6.156e-06), neuron fate commitment (66.7%; 2.409e-05), cell fate specification (66.7%; 3.555e-05), epithelial cell differentiation (66.7%; 6.683e-05)
Persistence of Molecularly-defined Phenotype in Larger Dataset
Network Inference Analysis of "Molecular Apocrine" Phenotype Using Gene Shaving & Robust Bayesian Network Analysis
First, we perform an unsupervised gene clustering using Gene Shaving (GS), and subsequently use Robust Bayesian Network Analysis (RBNA) to discover relationships between an AR-based cluster and other gene clusters [14, 15]. Note that we do not seek support for the AR pathway as having a role in the molecular apocrine subtype in the model classified cohort because our gene classifier that predicts membership in the molecular apocrine subtype includes AR. This would have biased the network inferences toward selecting AR.
Analysis of the Interacting Gene Clusters
We submit the members of the interacting clusters to both MetaCore  and GeneCards [34, 35] to identify associated gene ontologies and known transcription regulation relationships. From GeneCards we identify multiple upregulated species associated with EGFR processing in Cluster 24, which we label as the EGFR processing cluster, although EGFR itself is not a member of the cluster. MetaCore shows that Cluster 71, which contains ErbB2, also contains other EGFR-related genes. We call this cluster the ErbB2 cluster. We also analyzed the clusters that indirectly interact with the AR cluster. Cluster 16 is interesting because although AR is not a member, MetaCore reveals a large number of genes whose transcription is regulated by AR (see Additional File 1 -Figure S10). Furthermore, MetaCore analysis of Cluster 16 suggests network relationships related to ER, p53, and Maspin (a tumor suppressor gene associated with breast, prostate, and pancreatic cancer). In addition to Cluster 16, MetaCore identifies relationships between ER and the genes in the AR and ErbB2 clusters along with clusters 56, 62, 71, 76, 80, and 92. Of interest, ErbB3 is present in cluster 62, which has an indirect link to the molecular apocrine subtype (see Additional File 1 -Figure S6).
Discussion & Conclusion
Our conclusions are pertinent to both bioinformatics in general and to this particular breast cancer subset.
Observations of Normalization Strategies to Remove Institutional Bias in Meta-Analysis of Gene Expression Array Data
In the course of our investigation, we compared the effectiveness of normalizing data using quantile normalization, conventional median-centering, and a recently published algorithm called XPN. Although the data from the two institutions demonstrated adequate correlation after quantile normalization, results of the hierarchical clustering continued to be affected by institutional bias. This may indicate a particular sensitivity of hierarchical clustering to institutional bias.
Molecular Equivalence of the "ER-Subclass A" with "Molecular Apocrine" Breast Cancer
We have proposed three criteria for evaluating molecular equivalence between transcript-defined subsets identified by two or more independently conducted studies: 1) the majority of molecularly equivalent samples should cluster together and distinctively separate from the remaining samples in unsupervised clustering of the combined data; 2) there should be statistically significant overlap of gene signatures used to define the phenotype in each separate study; and 3) a classifier trained on data from one institution should successfully predict the phenotype in the other institution, and vice versa. We call upon the microarray community to consider these criteria and establish a standard protocol for establishing molecular equivalence.
In the course of our evaluation, we demonstrate that two of the three criteria proposed are met even without combining and normalizing the data together: the 25-gene overlap between the signatures identified by Farmer et al. and Doane et al. is statistically significant; and the published signatures for each of these studies adequately predict the hypothesized breast cancer subset in the other index cohort. However, not only were we able to enlarge the extent of overlap in the signatures, but we found that only after appropriate normalization did the samples from the two institutions cluster together by hypothesized phenotype using hierarchical clustering.
Role of AR Signaling in Molecular Apocrine Tumors
Both groups of authors suggest a role for AR signaling in this subtype of breast cancer based on comparison to data generated by cell lines. In addition, Doane et al. suggest that there is some overlap of the signatures with known ER+ genes. We chose two different network inference methods to explore causal networks in this data. LeFEminer utilizes a gene set enrichment type approach while BCRI functions as a discovery strategy supplemented by pathway information from MetaCore. We selected pathways that were common to both strategies as highly supported. The AR and ER signals were the two signaling pathways that were identified by both algorithms as relevant to the molecular apocrine phenotype. Expression of the ER molecular profile in the molecular apocrine group, in spite of the fact that it is ER- by immunohistochemistry, has been described by other authors [6, 7, 31]. From a bioinformatics perspective, and since BCRI is a relatively new method of network inference, we see this result as validation of its utility in pathway discovery.
Pathways that Interact with AR in Molecular Apocrine Breast Cancer
Our analysis shows that the molecular apocrine phenotype lacks an overexpression of basal cytokeratins, which is considered to be a defining feature of basal-like breast cancer [36, 37]. Thus, we can consider molecular apocrine tumors to be a distinct subset of ER- tumors that includes both triple-negative and ER-/PR-/ErbB2+ tumors. Since we started our research, two other studies have discovered this subgroup [9, 38]. One study identified it within triple-negative tumors alone while the other identified it as combining AR and ErbB2 signaling. We agree with the original authors that the molecular apocrine tumors can be either ErbB2+ or ErbB2- based on interaction studies that we will discuss below.
Our results reveal a strong interaction between the AR cluster and a cluster with several genes involved in EGFR processing. Several cell line studies have hypothesized an interaction between EGFR and both AR and ER, suggesting that together they form a complex with Src that enhances EGFR phosphorylation of tyrosine and therefore increases the effectiveness of EGF signaling [5, 39, 40]. However, this is the first study of gene expression data using cancer tissue from patients in which this interaction has been detected using data analysis methods.
A significant relationship is also revealed between the AR cluster and the ErbB2 cluster. The interaction between this cluster and the AR cluster is weaker than the interaction between the EGFR processing cluster and the AR cluster. In the index studies of molecular apocrine tumors, approximately half of the cases were ErbB2+. This is consistent with the weaker, but still significant, interaction between AR and ErbB2 in our analysis. In addition to simple co-expression, actual cross-talk between ErbB2 and AR pathways has been suggested based on cell line studies in breast [5, 6]. These studies demonstrated an additive effect of AR inhibition in reducing ErbB2 signaling, and suggested that tumors that are AR+/ErbB2+ might need AR inhibition in addition to targeted anti-ErbB2 therapy to completely neutralize the effect of the ErbB2 signal.
In prostate cancer, cell line studies have led investigators to hypothesize that ErbB family signaling, including EGFR (ErbB1), ErbB2, and ErbB3, can activate AR and is responsible for evolution from androgen dependent to androgen independent tumor growth. Thus, at least some tumors with AR transcription profiles might require therapy with ErbB family inhibitors.
Our results with both BCRI and GS combined with RBNA also support the role of FOXA1 interacting with AR in this phenotype. FOXA1 is known to have a role in potentiating steroid receptor transcription regulation, and its association with AR by immunohistochemistry has been reported by several other investigators [42–48]. FOXA1 is a member of the AR cluster and was also directly identified by BCRI (see Additional File 1 -Figure S5). Three other genes identified directly by BCRI (i.e., SPDEF, MLPH, and SERHL) are also part of the AR cluster, which further emphasizes BCRI as a valid network inference strategy.
Associations between PIK3CA mutations and AR in triple-negative tumors have been reported recently. Strong associations between a PIK3CA expressing cluster and AR cluster were not identified. However, given that mutations in PIK3CA may not be picked up on standard gene expression platforms, this association may not be readily discovered from the data.
Clinical and Therapeutic Implications for Molecular Apocrine Breast Cancer
We propose that therapies targeting AR activity may present a rational strategy for managing these patients. The concept of introducing AR blockade as a therapeutic option for breast cancer has received more attention recently [6–9, 50–54]. Older trials of AR blockade did not select for patients with AR dependent signaling or AR expression and therefore may not have addressed the question with an optimal cohort. Based upon our interaction studies, we also recommend that any therapeutic strategies for the molecular apocrine subgroup consider combinatorial targeted therapy to include ErbB family targets, particularly EGFR targeted therapy for the entire molecular apocrine subtype and ErbB2 therapy for those tumors that overexpress ErbB2.
While there is evidence to support ER response genes in the molecular apocrine subset, anti-estrogen therapy using tamoxifen in ER- women in general has been shown to have too little benefit for clinical use. However, small benefits were reported that point to the need for more study . An important question arises - is the presence of ER signaling inferred because AR and ER share a common pathway, or is there cross-talk where AR activation stimulates the ER pathway? Our pathway analyses from BCRI that demonstrate AR and ER as related signals (Figure 6), and analysis of Cluster 16 (see Additional File 1- Figure S10), do not support a common pathway that is activated by AR and ER. While interesting, these results are not conclusive. We note that if cross-talk from activated AR signaling is the cause of the ER signal activation in ER- tumors, then AR inhibition therapy would be sufficient to interrupt this signal.
Little is known about survival with molecular apocrine tumors, as they have only recently been introduced as a subtype. Farmer et al. describe poor survival in the cohort that they identified from the literature. Weigelt et al. suggest that apocrine carcinomas are associated with a 10-year survival rate of 35-50%, and Teschendorff et al. suggest that the subtype has the poorest outcome of all of the ER- tumor types. Other data suggest that AR+ tumors that are otherwise triple-negative as defined by immunohistochemistry may have a better prognosis than the basal subtype of tumors. In a recent study of AR protein expression in any type of breast cancer, an improved prognosis was associated with AR expression above a certain threshold in ER+ tumors. It may be that interactions with ErbB family members modify the survival characteristics of AR+ tumors. This deserves further study.
Learning the Systems Biology of Cancer Using Network Inference Methods to Analyze Gene Expression Data
Our results support the strength of using network inference to analyze gene expression array data for oncogenic pathways and their interactions. This study demonstrates that the discovery of oncogenic pathways and their interactions does not have to rely on comparison with signatures from cell lines, but can be discovered using network inference methods. Thus our results demonstrate the rich knowledge resource within gene expression data generated from human tissues.
The raw CEL files from Farmer et al. are available for download at NCBI GEO Datasets under accession GSE1561. The raw CEL files from Doane et al. are available for download at the National Cancer Institute caArray database. The raw CEL files from Ivshina et al. and Sotiriou et al. are available for download at NCBI GEO Datasets under accession GSE4922 and GSE2990, respectively. The raw CEL files from Rouzier et al. are available for download at http://bioinformatics.mdanderson.org/pubdata.html.
Microarray Normalization: Removing Systematic and Institutional Bias
The Doane et al. and Farmer et al. cohorts were first quantile-normalized together using the default settings in DNA-Chip Analyzer (dChip), a software package for probe-level analysis of gene expression microarrays. This process was performed twice: the first time, the original Affymetrix-provided chip definition file (CDF) was used, and the second time, a transcript-consistent Affymetrix-formatted CDF downloaded from AffyProbeMiner was used. A recently published cross-study normalization scheme called XPN was subsequently implemented to further combine the quantile-normalized datasets into a single, unified dataset with significantly reduced systematic bias; one dataset derives from normalization with Affymetrix' CDF and a second dataset (the primary dataset used for analysis in this study) derives from normalization with AffyProbeMiner's CDF. The details regarding the normalization scheme, referred to as XPN, have been previously described. In short, the XPN algorithm is based on linking gene/sample clusters amongst given datasets. Data are scaled and shifted according to the assumption that similar gene-sets cluster together across multiple platforms. XPN has been shown to successfully remove systematic bias, while avoiding the loss of useful biological information due to data over-correction.
The other cohorts were included to investigate the persistence of the molecular trends identified in the Doane et al. and Farmer et al. datasets. All five cohorts were quantile-normalized with dChip using a transcript-consistent Affymetrix-formatted CDF provided by AffyProbeMiner. Then, XPN was used in serial increments to bring the five cohorts into uniform agreement by removing persistent systematic bias between the datasets.
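The quantile normalization step described above can be illustrated with a minimal sketch. This is not dChip's implementation and it does not cover XPN; the function name `quantile_normalize` and the tie-breaking rule are assumptions made for illustration only. Each array is forced onto a shared reference distribution, taken here as the mean of the k-th smallest values across arrays.

```python
from statistics import mean


def quantile_normalize(columns):
    """Quantile-normalize arrays given as equal-length lists (one list per array).

    Hypothetical helper for illustration; ties are broken by position,
    which is a simplification.
    """
    n = len(columns[0])
    if any(len(col) != n for col in columns):
        raise ValueError("all arrays must measure the same probesets")

    # Reference distribution: mean of the k-th smallest value across arrays.
    sorted_cols = [sorted(col) for col in columns]
    reference = [mean(col[k] for col in sorted_cols) for k in range(n)]

    normalized = []
    for col in columns:
        # Rank every probeset within its own array.
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            # Replace the value by the reference value of the same rank.
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized
```

After this step every array has an identical distribution of values, which is why residual institution-specific structure has to be handled by a separate cross-study method.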
Significance Analysis of Microarrays: Modified T-Test
Significance Analysis of Microarrays (SAM) was performed on the normalized Doane et al. and Farmer et al. data individually to identify top 100 probesets that classify between the molecular apocrine samples and the remaining samples. SAM was also performed on the combined Doane et al. and Farmer et al. subset of the cross-study normalized, five-cohort data to identify a gene signature with 0% false discovery rate for classifying molecular apocrine samples from the remaining samples, and identifying similar molecular trends in the remaining data. SAM is based on a modified T-test; details regarding the algorithm have been previously described [28, 57].
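As a rough illustration of what "modified T-test" means here, the sketch below computes a SAM-style relative difference for one probeset: the usual pooled two-sample statistic with a small constant added to the denominator. The function name `sam_d` and the way `s0` is passed in are assumptions; the published SAM procedure chooses that constant from the data and estimates the false discovery rate by permutation, neither of which is shown.

```python
from math import sqrt
from statistics import mean


def sam_d(group_a, group_b, s0=0.0):
    """Relative difference d = (mean_a - mean_b) / (s + s0) for one probeset.

    s is the pooled standard error of the mean difference; s0 is a small
    positive constant that damps probesets with very low variance.
    With s0 = 0 this reduces to the ordinary pooled t-statistic.
    """
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = mean(group_a), mean(group_b)
    # Pooled sum of squared deviations around each group mean.
    ss = sum((x - mean_a) ** 2 for x in group_a)
    ss += sum((x - mean_b) ** 2 for x in group_b)
    s = sqrt((1 / n_a + 1 / n_b) * ss / (n_a + n_b - 2))
    return (mean_a - mean_b) / (s + s0)
```

Ranking probesets by the absolute value of this statistic and keeping the first 100 would give a signature of the kind described above, under the stated simplifications.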
Hierarchical Clustering and Principal Components Analysis
Hierarchical Clustering was performed using a Pairwise-Average Linking method and Euclidean Distance as the distance measure. Both Hierarchical Clustering and Principal Components Analysis were performed on the GenePattern software package provided by the Broad Institute [20, 26]. Visualizations of the Principal Components Analysis were performed with MATLAB (Mathworks, Natick, MA).
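For readers unfamiliar with pairwise-average linkage, the following is a naive sketch of agglomerative clustering with Euclidean distance. It is not GenePattern's implementation; the function name `average_linkage` and the return format are assumptions, and the cubic-time search is only suitable for small examples.

```python
from itertools import combinations
from math import dist


def average_linkage(samples):
    """Agglomerative clustering with average linkage and Euclidean distance.

    samples: list of equal-length numeric tuples (one per tumor sample).
    Returns the merge history as (members_a, members_b, distance) tuples,
    where members are tuples of sample indices.
    """
    clusters = [(i,) for i in range(len(samples))]
    merges = []

    def avg_dist(a, b):
        # Mean of all pairwise distances between members of the two clusters.
        total = sum(dist(samples[i], samples[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        # Merge the pair of clusters with the smallest average distance.
        a, b = min(combinations(clusters, 2), key=lambda pair: avg_dist(*pair))
        merges.append((a, b, avg_dist(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges
```

The last entry of the merge history splits the data into its two top-level branches, which is the level at which institutional bias would show up as samples grouping by cohort rather than by phenotype.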
Two-Dimensional Kolmogorov-Smirnov Test
The Fasano & Franceschini statistical test, a two-dimensional adaptation of the Kolmogorov-Smirnov test, was performed on the coordinates derived from the first two principal components using an algorithm provided by Numerical Recipes in Fortran 90.
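The test statistic behind this comparison can be sketched as follows, assuming the usual two-sample formulation: each data point in turn serves as an origin, the fraction of each sample falling in the four surrounding quadrants is compared, and the largest discrepancy is recorded. The function names are hypothetical, the handling of points that lie exactly on a quadrant boundary is a simplification, and the conversion of the statistic into a p-value is deliberately omitted.

```python
def quadrant_fractions(x0, y0, points):
    """Fraction of points in each of the four quadrants around (x0, y0)."""
    counts = [0, 0, 0, 0]
    for x, y in points:
        if y > y0:
            counts[0 if x > x0 else 1] += 1
        else:
            counts[3 if x > x0 else 2] += 1
    return [c / len(points) for c in counts]


def ks2d_statistic(sample1, sample2):
    """Two-sample, two-dimensional KS-type statistic D in [0, 1].

    sample1, sample2: lists of (pc1, pc2) coordinates.
    """
    def max_diff(origins):
        best = 0.0
        for x0, y0 in origins:
            f1 = quadrant_fractions(x0, y0, sample1)
            f2 = quadrant_fractions(x0, y0, sample2)
            best = max(best, max(abs(a - b) for a, b in zip(f1, f2)))
        return best

    # Average the maxima obtained with origins drawn from each sample.
    return 0.5 * (max_diff(sample1) + max_diff(sample2))
```

Larger values of D indicate stronger separation of the two groups in the plane of the first two principal components.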
Statistical Significance of Overlap Between Gene Signatures
The probability of finding a specified number of overlapping genes between two gene signatures was calculated using the exact hypergeometric probability formula using a web-based tool at http://elegans.uky.edu/MA/progs/overlap_stats.html based on algorithms provided by Numerical Recipes in C.
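A minimal sketch of the exact hypergeometric calculation is given below. It is not the web tool's code; the function name `overlap_p_value` is an assumption, and so is the choice of gene universe, which the text does not state and which depends on the probeset definitions in use.

```python
from math import comb


def overlap_p_value(universe, size_a, size_b, overlap):
    """P(at least `overlap` shared genes) for two random signatures.

    universe: number of genes (or probesets) either signature could draw from.
    size_a, size_b: sizes of the two signatures.
    Uses exact integer arithmetic, so no approximation is involved.
    """
    upper = min(size_a, size_b)
    tail = sum(
        comb(size_a, k) * comb(universe - size_a, size_b - k)
        for k in range(overlap, upper + 1)
    )
    return tail / comb(universe, size_b)
```

As a usage example under an assumed universe of 22,283 probesets, `overlap_p_value(22283, 100, 100, 25)` evaluates the chance of two random 100-gene signatures sharing 25 or more genes, which falls far below the 0.0001 threshold quoted in the text.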
Backward Chaining Rule Induction
Backward Chaining Rule Induction (BCRI) is a supervised learning approach for identifying relationships amongst genes that can predict the molecular apocrine phenotype. In order to initialize the BCRI strategy, we use a classifier method called See5 (Rulequest, St. Ives, Australia) to build a prediction model from the normalized gene expression data for classifying the molecular apocrine phenotype from the remaining samples in the index cohorts. Successive iterations of the BCRI strategy infer gene network relationships by predicting threshold expression of genes from other genes. Further details regarding the BCRI strategy have been previously described [11–13].
To identify clusters of highly correlated genes, we used unsupervised Gene Shaving. Specifically, we used a high-performance, parallel C implementation of the method that was developed from the GeneClust software package. Gene Shaving was used independently on both unweighted data and on 127 bootstrap resamples, extracting the first 150 gene clusters in each case. In both cases, the data was first ranked within each sample. To obtain the unweighted data, the ranked data was ranked again, this time across samples within each cohort. For the bootstrap resamples, each sample within a cohort was assigned a random weight chosen from the Bayesian bootstrap distribution and weighted rankings across samples within each cohort were computed. In both cases, the rank of each sample was scaled by the number of samples in the cohort, so that for each cohort the data is in the range zero to one. Robust clusters were obtained from the combined outputs of the Gene Shave runs by selecting those genes that occur frequently together in the outputs of individual runs. We extracted the first 200 clusters with the largest number of co-clustering genes, weighted by the homogeneity of the clusters to which they belong.
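The resampling idea in this paragraph can be sketched as follows. The Bayesian bootstrap draws sample weights from a flat Dirichlet distribution, generated here by normalizing exponential draws. How the weighted ranking was defined in the original C implementation is not stated, so the cumulative-weight definition used in `scaled_weighted_ranks` is an assumption, as are both function names.

```python
import random


def bayesian_bootstrap_weights(n, rng):
    """Draw n non-negative weights summing to 1 from a flat Dirichlet."""
    draws = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(draws)
    return [d / total for d in draws]


def scaled_weighted_ranks(values, weights):
    """Rank one gene across the samples of a cohort, scaled to (0, 1].

    Each sample's rank is the cumulative weight of all samples at or below
    it (assumed definition). Equal weights of 1/n reproduce the ordinary
    rank divided by the number of samples.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    running = 0.0
    for idx in order:
        running += weights[idx]
        ranks[idx] = running
    return ranks
```

Because the ranks are scaled within each cohort, cohorts of different sizes end up on the same zero-to-one range before clustering.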
Robust Bayesian Network Analysis
The 200 robust clusters obtained by Gene Shaving were ranked by their correlation with the molecular apocrine phenotype. A cluster meta-gene score was obtained for each sample by computing the signed average mean gene. (Unlike other gene clustering methods, Gene Shaving clusters may include both correlated and anti-correlated genes.) The 26 clusters with the highest absolute Kendall Tau correlation between the cluster meta-gene scores and the molecular apocrine phenotype status were selected for network analysis.
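A minimal sketch of this ranking step follows, under two stated assumptions: the meta-gene score is taken to be the mean of member genes after flipping the sign of anti-correlated members, and the tau-b variant of Kendall's correlation is used because a binary phenotype produces many ties. The text does not specify either detail, and both function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def metagene_scores(cluster_rows, signs):
    """Signed average across the genes of one cluster, per sample.

    cluster_rows: one list of expression values per gene (same sample order).
    signs: +1 or -1 per gene; -1 marks anti-correlated members.
    """
    n_genes = len(cluster_rows)
    n_samples = len(cluster_rows[0])
    return [
        sum(sign * row[j] for sign, row in zip(signs, cluster_rows)) / n_genes
        for j in range(n_samples)
    ]


def kendall_tau_b(x, y):
    """Kendall's tau-b between two equal-length sequences (handles ties)."""
    concordant = discordant = tied_x = tied_y = 0
    for i, j in combinations(range(len(x)), 2):
        dx, dy = x[i] - x[j], y[i] - y[j]
        if dx == 0:
            tied_x += 1
        if dy == 0:
            tied_y += 1
        if dx * dy > 0:
            concordant += 1
        elif dx * dy < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    denom = sqrt((n_pairs - tied_x) * (n_pairs - tied_y))
    return (concordant - discordant) / denom if denom else 0.0
```

Clusters would then be sorted by the absolute value of `kendall_tau_b(scores, status)`, with `status` coded as 1 for molecular apocrine samples and 0 otherwise, and the top 26 retained.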
The network analysis included nodes for the 26 gene clusters most highly correlated with molecular apocrine status and a node for molecular apocrine status. The cluster meta-gene scores were each discretized to three levels: the lowest, middle, and highest thirds of the expression range for each meta-gene. Forty thousand bootstrap resamples of the discretized weights were obtained by randomly weighting each sample according to the Bayesian bootstrap distribution, and a high-scoring network was found for each resample using greedy hill-climbing with random restarts and the sparse candidate algorithm. The scoring function used was DPSM with λ = 1.
Edges that occurred frequently (in either direction) within the forty thousand best networks thus obtained were selected for the final network. Edges that occurred in at least 97.5% of the networks are drawn with a triple black line, those that occurred in at least 95% of the networks with a black line, and those that occurred in at least 85% of the networks with a dashed line. Gene clusters that are not connected by any path along such edges to the node for molecular apocrine status are not included.
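Two bookkeeping steps from this procedure lend themselves to a short sketch: discretizing meta-gene scores into thirds of their range, and turning edge frequencies across resampled networks into the three line styles. The network search itself (hill-climbing, sparse candidate, DPSM scoring) is not reproduced, and the final pruning of clusters with no path to the status node is omitted. Function names, the equal-width reading of "thirds of the expression range", and the style labels are assumptions.

```python
from collections import Counter


def discretize_thirds(scores):
    """Map scores to 0, 1, 2 for the lowest, middle, and highest third
    of the observed range (equal-width bins between min and max)."""
    lo, hi = min(scores), max(scores)
    width = (hi - lo) / 3
    if width == 0:
        return [0] * len(scores)
    # The maximum value would land in bin 3, so it is clamped to bin 2.
    return [min(int((s - lo) / width), 2) for s in scores]


def edge_styles(networks):
    """Classify edges by how often they appear across resampled networks.

    networks: list of collections of directed edges (parent, child).
    Direction is ignored, so an edge counts once per network in either
    orientation. Edges below 85% are dropped.
    """
    counts = Counter()
    for edges in networks:
        counts.update({frozenset(edge) for edge in edges})
    styles = {}
    for edge, count in counts.items():
        freq = count / len(networks)
        if freq >= 0.975:
            styles[edge] = "triple"
        elif freq >= 0.95:
            styles[edge] = "solid"
        elif freq >= 0.85:
            styles[edge] = "dashed"
    return styles
```

Counting edges without regard to direction matches the "in either direction" wording above and avoids over-interpreting arrow orientation, which a search over resampled data may not fix consistently.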
BCRI: Backward Chaining Rule Induction
EGFR: Epidermal Growth Factor Receptor
ErbB2: Human Epidermal Growth Factor Receptor 2, commonly referred to as Her2neu
PCA: Principal Components Analysis
RBNA: Robust Bayesian Network Analysis
SAM: Significance Analysis of Microarrays
SS, MEE, BMB, and VC were supported by the National Library of Medicine (R01-LM008000) to MEE. We thank Andrey Shabalin for helpful discussion regarding cross-study microarray gene expression normalization with XPN. We thank Roland Bassett, Jr. for helpful statistics advice in determining significance of gene signature overlaps. We also thank Douglas H. Fisher, a co-developer of BCRI, for helpful comments. A portion of this material was presented as a platform at the 2009 Annual Meeting of the United States and Canadian Academy of Pathology in Boston, MA ("Does Androgen Receptor Have a Role in a Subgroup of ER Negative Breast Cancer?").
We would like to dedicate this work to the memory of Vivian Cooper, a close friend and colleague of Dr. Fisher, who died of breast cancer on April 10 of this year.
- Bild AH, Yao G, Chang JT, Wang Q, Potti A, Chasse D, Joshi MB, Harpole D, Lancaster JM, Berchuck A, et al: Oncogenic pathway signatures in human cancers as a guide to targeted therapies. Nature. 2006, 439: 353-357. 10.1038/nature04296.
- Heiser LM, Wang NJ, Talcott CL, Laderoute KR, Knapp M, Guan Y, Hu Z, Ziyad S, Weber BL, Laquerre S, et al: Integrated analysis of breast cancer cell lines reveals unique signaling pathways. Genome Biol. 2009, 10: R31. 10.1186/gb-2009-10-3-r31.
- Desmedt C, Haibe-Kains B, Wirapati P, Buyse M, Larsimont D, Bontempi G, Delorenzi M, Piccart M, Sotiriou C: Biological processes associated with breast cancer clinical outcome depend on the molecular subtypes. Clin Cancer Res. 2008, 14: 5158-5165. 10.1158/1078-0432.CCR-07-4756.
- Citri A, Yarden Y: EGF-ERBB signalling: towards the systems level. Nat Rev Mol Cell Biol. 2006, 7: 505-516. 10.1038/nrm1962.
- Migliaccio A, Castoria G, Di Domenico M, Ciociola A, Lombardi M, De Falco A, Nanayakkara M, Bottero D, De Stasio R, Varricchio L, Auricchio F: Crosstalk between EGFR and extranuclear steroid receptors. Ann N Y Acad Sci. 2006, 1089: 194-200. 10.1196/annals.1386.006.
- Naderi A, Hughes-Davies L: A functionally significant cross-talk between androgen receptor and ErbB2 pathways in estrogen receptor negative breast cancer. Neoplasia. 2008, 10: 542-548.
- Doane AS, Danso M, Lal P, Donaton M, Zhang L, Hudis C, Gerald WL: An estrogen receptor-negative breast cancer subset characterized by a hormonally regulated transcriptional program and response to androgen. Oncogene. 2006, 25: 3994-4008. 10.1038/sj.onc.1209415.
- Farmer P, Bonnefoi H, Becette V, Tubiana-Hulin M, Fumoleau P, Larsimont D, Macgrogan G, Bergh J, Cameron D, Goldstein D, et al: Identification of molecular apocrine breast tumours by microarray analysis. Oncogene. 2005, 24: 4660-4671. 10.1038/sj.onc.1208561.
- Weigelt B, Horlings HM, Kreike B, Hayes MM, Hauptmann M, Wessels LFA, de Jong D, Van de Vijver MJ, Van't Veer LJ, Peterse JL: Refinement of breast cancer classification by molecular characterization of histological special types. The Journal of Pathology. 2008, 216: 141-150. 10.1002/path.2407.
- Eichler GS, Reimers M, Kane D, Weinstein JN: The LeFE algorithm: embracing the complexity of gene expression in the interpretation of microarray data. Genome Biol. 2007, 8: R187. 10.1186/gb-2007-8-9-r187.
- Edgerton ME, Fisher DH, Tang L, Frey LJ, Chen Z: Data Mining for Gene Networks Relevant to Poor Prognosis in Lung Cancer Via Backward-Chaining Rule Induction. Cancer Inform. 2007, 2: 93-114.
- Fisher D, Edgerton M, Tang L, Frey L, Chen Z: Searching for Meaningful Feature Interactions with Backward Chaining Rule Induction. Advances in Intelligent Data Analysis VI. Edited by: Famili AF, Kok JN, Pena JM, Siebes A, Feelders A. 2005, Berlin/Heidelberg: Springer, 3646: 86-96. [Kanade T, Kittler J, Kleinberg JM, Mattern F, Mitchell JC, Nierstrasz O, Noar M, Rangan CP, Steffen B, Sudan M, et al (Series Editor): Lecture Notes in Computer Science].
- Fisher DH, Edgerton ME, Chen Z, Tang L, Frey LJ: Backward Chaining Rule Induction. Intelligent Data Analysis. 2006, 10: 397-417.
- Hastie T, Tibshirani R, Eisen MB, Alizadeh A, Levy R, Staudt L, Chan WC, Botstein D, Brown P: 'Gene shaving' as a method for identifying distinct sets of genes with similar expression patterns. Genome Biol. 2000, 1: RESEARCH0003. 10.1186/gb-2000-1-2-research0003.
- Koch M, Broom BM, Subramanian D: Learning robust cell signalling models from high throughput proteomic data. Int J Bioinform Res Appl. 2009, 5: 241-253. 10.1504/IJBRA.2009.026417.
- Ivshina AV, George J, Senko O, Mow B, Putti TC, Smeds J, Lindahl T, Pawitan Y, Hall P, Nordgren H, et al: Genetic reclassification of histologic grade delineates new clinical subtypes of breast cancer. Cancer Res. 2006, 66: 10292-10301. 10.1158/0008-5472.CAN-05-4414.
- Rouzier R, Perou CM, Symmans WF, Ibrahim N, Cristofanilli M, Anderson K, Hess KR, Stec J, Ayers M, Wagner P, et al: Breast cancer molecular subtypes respond differently to preoperative chemotherapy. Clin Cancer Res. 2005, 11: 5678-5685. 10.1158/1078-0432.CCR-04-2421.
- Sotiriou C, Wirapati P, Loi S, Harris A, Fox S, Smeds J, Nordgren H, Farmer P, Praz V, Haibe-Kains B, et al: Gene expression profiling in breast cancer: understanding the molecular basis of histologic grade to improve prognosis. J Natl Cancer Inst. 2006, 98: 262-272.
- Bolstad BM, Irizarry RA, Astrand M, Speed TP: A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics. 2003, 19: 185-193. 10.1093/bioinformatics/19.2.185.
- Kuehn H, Liberzon A, Reich M, Mesirov JP: Using GenePattern for gene expression analysis. Curr Protoc Bioinformatics. 2008, Chapter 7 (Unit 7): 12.
- Shabalin AA, Tjelmeland H, Fan C, Perou CM, Nobel AB: Merging two gene-expression studies via cross-platform normalization. Bioinformatics. 2008, 24: 1154-1160. 10.1093/bioinformatics/btn083.
- Barnes M, Freudenberg J, Thompson S, Aronow B, Pavlidis P: Experimental comparison and cross-validation of the Affymetrix and Illumina gene expression analysis platforms. Nucleic Acids Res. 2005, 33: 5914-5923. 10.1093/nar/gki890.
- Carter SL, Eklund AC, Mecham BH, Kohane IS, Szallasi Z: Redefinition of Affymetrix probe sets by sequence overlap with cDNA microarray probes reduces cross-platform inconsistencies in cancer-associated gene expression measurements. BMC Bioinformatics. 2005, 6: 107. 10.1186/1471-2105-6-107.
- Dai M, Wang P, Boyd AD, Kostov G, Athey B, Jones EG, Bunney WE, Myers RM, Speed TP, Akil H, et al: Evolving gene/transcript definitions significantly alter the interpretation of GeneChip data. Nucleic Acids Res. 2005, 33: e175. 10.1093/nar/gni179.
- Liu H, Zeeberg BR, Qu G, Koru AG, Ferrucci A, Kahn A, Ryan MC, Nuhanovic A, Munson PJ, Reinhold WC, et al: AffyProbeMiner: a web resource for computing or retrieving accurately redefined Affymetrix probe sets. Bioinformatics. 2007, 23: 2385-2390. 10.1093/bioinformatics/btm360.
- Reich M, Liefeld T, Gould J, Lerner J, Tamayo P, Mesirov JP: GenePattern 2.0. Nat Genet. 2006, 38: 500-501. 10.1038/ng0506-500.
- Fasano G, Franceschini A: A multidimensional version of the Kolmogorov-Smirnov test. Mon Not R Astron Soc. 1987, 225: 155-170.
- Tusher VG, Tibshirani R, Chu G: Significance analysis of microarrays applied to the ionizing radiation response. Proc Natl Acad Sci USA. 2001, 98: 5116-5121. 10.1073/pnas.091062498.
- Nelson PS, Clegg N, Arnold H, Ferguson C, Bonham M, White J, Hood L, Lin B: The program of androgen-responsive genes in neoplastic prostate epithelium. Proc Natl Acad Sci USA. 2002, 99: 11890-11895. 10.1073/pnas.182376299.
- Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander ES, Mesirov JP: Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci USA. 2005, 102: 15545-15550. 10.1073/pnas.0506580102.
- Teschendorff AE, Naderi A, Barbosa-Morais NL, Caldas C: PACK: Profile Analysis using Clustering and Kurtosis to find molecular classifiers in cancer. Bioinformatics. 2006, 22: 2269-2275. 10.1093/bioinformatics/btl174.
- Ekins S, Nikolsky Y, Bugrim A, Kirillov E, Nikolskaya T: Pathway mapping tools for analysis of high content data. Methods Mol Biol. 2007, 356: 319-350.
- Soinov LA: Supervised classification for gene network reconstruction. Biochem Soc Trans. 2003, 31: 1497-1502. 10.1042/BST0311497.
- Rebhan M, Chalifa-Caspi V, Prilusky J, Lancet D: GeneCards: integrating information about genes, proteins and diseases. Trends Genet. 1997, 13: 163. 10.1016/S0168-9525(97)01103-7.
- Stelzer G, Harel A, Dalah A, Rosen N, Shmoish M, Iny-Stein T, Sirota A, Madi A, Safran M, Lancet D: GeneCards: One stop site for human gene research. The 5th Congress of the Federation of the Israel Societies for Experimental Biology; Eilat, Israel. 2008.
- Gusterson BA, Ross DT, Heath VJ, Stein T: Basal cytokeratins and their relationship to the cellular origin and functional classification of breast cancer. Breast Cancer Res. 2005, 7: 143-148. 10.1186/bcr1041.
- Livasy CA, Karaca G, Nanda R, Tretiakova MS, Olopade OI, Moore DT, Perou CM: Phenotypic evaluation of the basal-like subtype of invasive breast carcinoma. Mod Pathol. 2006, 19: 264-271. 10.1038/modpathol.3800528.
- Teschendorff AE, Miremadi A, Pinder SE, Ellis IO, Caldas C: An immune response gene expression module identifies a good prognosis subtype in estrogen receptor negative breast cancer. Genome Biol. 2007, 8: R157. 10.1186/gb-2007-8-8-r157.
- Migliaccio A, Castoria G, Di Domenico M, de Falco A, Bilancio A, Lombardi M, Barone MV, Ametrano D, Zannini MS, Abbondanza C, Auricchio F: Steroid-induced androgen receptor-oestradiol receptor beta-Src complex triggers prostate cancer cell proliferation. EMBO J. 2000, 19: 5406-5417. 10.1093/emboj/19.20.5406.
- Ignar-Trowbridge DM, Nelson KG, Bidwell MC, Curtis SW, Washburn TF, McLachlan JA, Korach KS: Coupling of dual signaling pathways: epidermal growth factor action involves the estrogen receptor. Proc Natl Acad Sci USA. 1992, 89: 4658-4662. 10.1073/pnas.89.10.4658.
- Stern H: EGFR Family Heterodimers in Cancer Pathogenesis and Treatment. EGFR Signaling Networks in Cancer Therapy. Edited by: Haley J, Gullick W. 2008, New York City: Humana Press, 15-30. First edition.
- Badve S, Turbin D, Thorat MA, Morimiya A, Nielsen TO, Perou CM, Dunn S, Huntsman DG, Nakshatri H: FOXA1 expression in breast cancer--correlation with luminal subtype A and survival. Clin Cancer Res. 2007, 13: 4415-4421. 10.1158/1078-0432.CCR-07-0122.
- Habashy HO, Powe DG, Rakha EA, Ball G, Paish C, Gee J, Nicholson RI, Ellis IO: Forkhead-box A1 (FOXA1) expression in breast cancer and its prognostic significance. Eur J Cancer. 2008, 44: 1541-1551. 10.1016/j.ejca.2008.04.020.
- Lacroix M, Leclercq G: About GATA3, HNF3A, and XBP1, three genes co-expressed with the oestrogen receptor-alpha gene (ESR1) in breast cancer. Mol Cell Endocrinol. 2004, 219: 1-7. 10.1016/j.mce.2004.02.021.
- Nakshatri H, Badve S: FOXA1 as a therapeutic target for breast cancer. Expert Opin Ther Targets. 2007, 11: 507-514. 10.1517/14728126.96.36.1997.
- Thorat MA, Marchio C, Morimiya A, Savage K, Nakshatri H, Reis-Filho JS, Badve S: Forkhead box A1 expression in breast cancer is associated with luminal subtype and good prognosis. J Clin Pathol. 2008, 61: 327-332. 10.1136/jcp.2007.052431.
- Tozlu S, Girault I, Vacher S, Vendrell J, Andrieu C, Spyratos F, Cohen P, Lidereau R, Bieche I: Identification of novel genes that co-cluster with estrogen receptor alpha in breast tumor biopsy specimens, using a large-scale real-time reverse transcription-PCR approach. Endocr Relat Cancer. 2006, 13: 1109-1120. 10.1677/erc.1.01120.
- Wolf I, Bose S, Williamson EA, Miller CW, Karlan BY, Koeffler HP: FOXA 1: Growth inhibitor and a favorable prognostic factor in human breast cancer. Int J Cancer. 2007, 120: 1013-1022. 10.1002/ijc.22389.
- Gonzalez-Angulo AM, Stemke-Hale K, Palla SL, Carey M, Agarwal R, Meric-Bernstam F, Traina TA, Hudis C, Hortobagyi GN, Gerald WL, et al: Androgen receptor levels and association with PIK3CA mutations and prognosis in breast cancer. Clin Cancer Res. 2009, 15: 2472-2478. 10.1158/1078-0432.CCR-08-1763.
- Moe RE, Anderson BO: Androgens and androgen receptors: a clinically neglected sector in breast cancer biology. J Surg Oncol. 2007, 95: 437-439. 10.1002/jso.20722.
- Nahleh Z: Androgen receptor as a target for the treatment of hormone receptor-negative breast cancer: an unchartered territory. Future Oncol. 2008, 4: 15-21. 10.2217/147966188.8.131.52.
- Ogawa Y, Hai E, Matsumoto K, Ikeda K, Tokunaga S, Nagahara H, Sakurai K, Inoue T, Nishiguchi Y: Androgen receptor expression in breast cancer: relationship with clinicopathological factors and biomarkers. Int J Clin Oncol. 2008, 13: 431-435. 10.1007/s10147-008-0770-6.
- Swain SM: Tamoxifen for patients with estrogen receptor-negative breast cancer. J Clin Oncol. 2001, 19: 93S-97S.
- Tan AR, Swain SM: Therapeutic strategies for triple-negative breast cancer. Cancer J. 2008, 14: 343-351. 10.1097/PPO.0b013e31818d839b.
- Perrault DJ, Logan DM, Stewart DJ, Bramwell VH, Paterson AH, Eisenhauer EA: Phase II study of flutamide in patients with metastatic breast cancer. A National Cancer Institute of Canada Clinical Trials Group study. Invest New Drugs. 1988, 6: 207-210. 10.1007/BF00175399.
- Rakha EA, El-Sayed ME, Green AR, Lee AH, Robertson JF, Ellis IO: Prognostic markers in triple-negative breast cancer. Cancer. 2007, 109: 25-32. 10.1002/cncr.22381.
- Cui X, Churchill GA: Statistical tests for differential expression in cDNA microarray experiments. Genome Biol. 2003, 4: 210. 10.1186/gb-2003-4-4-210.
- Lopes RH, Hobson PR, Reid ID: Computationally efficient algorithms for the two-dimensional Kolmogorov-Smirnov test. International Conference on Computing in High Energy and Nuclear Physics. 2008, Victoria, British Columbia, Canada: IOP Publishing, 119: 1-9.
- Press WH, Teukolsky SA, Vetterling WT, Flannery BP: Numerical Recipes in Fortran 90: The Art of Scientific Computing. 1996, Cambridge University Press, 1269-1284. 2nd edition.
- Press WH, Flannery BP, Teukolsky SA, Vetterling WT: Numerical Recipes in C: The Art of Scientific Computing. 1992, Cambridge University Press, 609-650.
- Do K-A, Broom BM, Wen S: GeneClust. The analysis of gene expression data: methods and software. Edited by: Parmigiani G, Garret ES, Irizarry RA, Zeger SL. 2003, New York: Springer, 342-361.
- Rubin DB: The bayesian bootstrap. The Annals of Statistics. 1981, 9: 130-134. 10.1214/aos/1176345338.
- Friedman N, Nachman I, Pe'er D: Learning bayesian network structure from massive datasets: the 'sparse candidate' algorithm. Proc 15th Conference on Uncertainty in Artificial Intelligence. Edited by: Dubois H, Laskey K, Kaufmann M. 1999, 206-215.
- Yang S, Chang K-C: Comparison of score metrics for Bayesian network learning. IEEE Transactions on Systems, Man, and Cybernetics: Part A: Systems and Humans. 2002, 32: 419-428. 10.1109/TSMCA.2002.803772.
- The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1755-8794/2/59/prepub