Gene expression profiling has created new possibilities for the molecular characterization of cancer. The resulting gene expression signatures have the potential to explain the genetic heterogeneity of breast cancer and to allow treatment strategies to be planned according to their probability of success in individual patients. Molecular classification is also changing the design of clinical trials. For example, TAILORx (http://www.cancer.gov/clinicaltrials/digestpage/TAILORx) and MINDACT (http://www.eortc.be/services/unit/mindact/MINDACT_websiteii.asp) are two adjuvant breast cancer treatment trials in which patients are stratified according to selected gene signatures present in their excised breast tumors. The molecular differences that underlie the phenotypes of breast cancer could reveal new therapeutic targets and influence clinical care.
To realize the full capability of microarray-based gene expression profiling, technologies are being optimized to perform reliably on FFPE specimens, currently the most common type of clinical specimen available, particularly for phase III adjuvant treatment trials. FFPE tissue is an extremely valuable resource for discovery and validation studies. While the combination of the Affymetrix GeneChip® Human X3P Array (Santa Clara, CA) and the Arcturus Paradise™ system (Mountain View, CA) has been optimized for FFPE tissue, it has been the experience of other investigators and ourselves (unpublished observations) that call rates are unacceptably low, typically less than 30%. In contrast, we observed high, sample-dependent call rates (percent of detectable genes at the p = 0.01 level), averaging > 87% and > 75% for the 1.5K and 24K panels, respectively (data not shown). Almac (Belfast, Ireland) has developed a promising technology that uses Affymetrix-based methodology and disease-specific arrays (DSAs), or transcriptome panels, with ~50,000 transcripts that can be applied to FFPE tissue [32, 33].
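The call rate quoted above (percent of detectable genes at the p = 0.01 level) can be illustrated with a minimal sketch. It assumes that a probe counts as detected when its detection p-value is strictly below the threshold; the function name `call_rate` and the example p-values are hypothetical and are not data from this study.

```python
def call_rate(detection_pvalues, alpha=0.01):
    """Return the percent of probes whose detection p-value is below alpha.

    Assumption: "detectable at the p = 0.01 level" is read as p < 0.01.
    """
    if not detection_pvalues:
        raise ValueError("no probes supplied")
    detected = sum(1 for p in detection_pvalues if p < alpha)
    return 100.0 * detected / len(detection_pvalues)


# Hypothetical detection p-values for eight probes in a single sample.
example_pvalues = [0.0001, 0.004, 0.02, 0.0005, 0.3, 0.009, 0.001, 0.0002]
example_rate = call_rate(example_pvalues)  # six of eight probes pass
```

Because the call rate is sample dependent, a per-sample value like this would typically be averaged across samples before panels are compared.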
The present report describes gene expression analyses of FFPE tissue using the DASL Assay from Illumina, which was designed specifically to profile degraded RNAs derived from FFPE tumor samples. The DASL Assay has a dynamic range of 2.5 to 3 logs and a limit of detection of 1 × 10^4 molecules, parameters comparable to those determined using standard microarray molecular profiling. Custom and commercially available gene panels have been used successfully on the DASL platform, and the resulting gene signatures have proven to have diagnostic value. A custom 512-gene panel was used to identify gene signatures that correlated with Gleason score and relapse of prostate cancer. The 502-gene Cancer Panel v1 and a custom 526-gene panel were used to identify gene expression patterns that were significantly associated with systemic progression after prostate-specific antigen recurrence in men with prostate cancer. The whole-genome 24K gene panel for use with the DASL platform recently became commercially available.
Our objective was to compare the performance of the 1.5K panel to the more recent 24K panel using the DASL platform to determine whether genes behave similarly between gene panels with different densities. The high correlations (0.815-0.997) observed between technical and extract replicates for both gene panels demonstrate that the reproducibility of results from both the 1.5K and 24K gene panels was excellent. The 24K panel showed less variation between both technical and extract replicates than the 1.5K panel. Although the variability of hybridization signal intensities might be expected to be lower for the 1.5K panel because of its higher probe density per gene, the 24K panel has a more stringent array hybridization condition than the 1.5K panel (i.e., the length of the probes is 50 nucleotides for the 24K BeadArray compared to ~22 nucleotides for the 1.5K panel). In addition, most of the genes on the 1.5K array are cancer-related and thus, in our study, were expressed at higher levels than the genes on the 24K array. Furthermore, the intensity for the 1.5K array is the sum of a dual-color assay (Cy3 + Cy5 channels), whereas the 24K assay is a single-color assay (Cy3). The hybridization conditions and washes also differ, as do the readouts (Universal Array Matrix versus whole-genome BeadChip) and, consequently, the scan settings. Lastly, it should be recognized that these technologies measure relative expression within the context of each platform.
Because only 17 probes are identical among the 498 common genes, the two platforms have mostly non-overlapping nucleotide sequences for the same transcript target. The targeted regions in the 24K assay were designed to correspond to the largely 3' biased 50 nucleotide probe sequence content of the HumanRef-8 v3 BeadChip, whereas the targeted regions of the 1.5K assay were not restricted to the 3' end of transcripts. Specific probe information can be found online at http://www.switchtoi.com/annotationfiles.ilmn. For genes with poor fold-change correlations, it is also conceivable that the probes identify splice variants of the same gene and thereby target different mRNA isoforms because of variations in probe position on the panels.
At the gene level, we observed larger median correlations between the 1.5K and 24K panels for genes that were represented by more probes. In addition, in within-platform data for the 1.5K assay, the expression profiles generated with three probes/transcript correlated well (R^2 ≈ 0.99) with those generated with four or more (up to ten) probes/transcript.
The inter-panel agreement was good for probes with sequences that matched across the 1.5K and 24K panels; correlations ranged from 0.652 to 0.899. However, probes with different sequences that mapped to the same gene showed only fair agreement across the two panels; correlations ranged from 0.485 to 0.573. This is not unexpected, as the expression level appears to be a function of the probe sequence location within the gene, such that different probe sequences may correspond to different cDNA synthesis efficiencies and different oligo hybridization efficiencies. This was particularly evident for the ERBB2 gene expression obtained from the 24K panel (Figure 3). It has been suggested that the differences in expression values between the two panels could result from non-specific hybridization in the 1.5K array (since increase in stringency in the hybridization affects the intensity of expression values) or from the increased complexity of the labeling step in the 24K array that may lead to "less" labeling and reduced hybridization. However, hybridization conditions for both platforms have been optimized for the different probe lengths (~22 vs. 50 nucleotides), minimizing non-specific/cross-hybridization. Also, the short address codes for the 1.5K array were carefully selected to have a similar overall length, GC-content, and melting temperature (Tm), whereas for the 24K array the targeted regions were somewhat constrained, having been pre-determined by the 50 nucleotide probe sequences on the whole-genome gene expression BeadChip (HumanRef-8 v3). Despite the differences in absolute intensity, the relative differences between the HER2+ and HER2- groups are conserved across both platforms and all six probes.
It is also important to note that because of differences between the two platforms [e.g., non-overlapping nucleotide sequences for the same transcript targets as well as different hybridization conditions for the 1.5K and 24K assays (as described above)], direct comparisons of the raw intensities will yield seemingly poor cross-platform correlations. However, fold-change correlations of the gene intensities between the two platforms provide a common metric for comparisons.
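The contrast between raw-intensity and fold-change correlations can be illustrated with a minimal sketch. All gene names and log2 intensities below are hypothetical and were chosen only so that probe-specific offsets differ between the two panels while the HER2+ versus HER2- shift is similar. The helper names `pearson` and `log2_fold_change` are illustrative and are not part of any Illumina software.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def log2_fold_change(pos, neg):
    """Mean log2 intensity in HER2+ minus mean in HER2- (inputs already log2)."""
    return mean(pos) - mean(neg)


# Hypothetical log2 intensities: gene -> (HER2+ samples, HER2- samples).
panel_a = {
    "GENE1": ([13.0, 13.4], [10.0, 10.2]),
    "GENE2": ([9.1, 8.9], [9.0, 9.2]),
    "GENE3": ([7.0, 7.2], [8.0, 8.4]),
    "GENE4": ([11.5, 11.7], [11.0, 11.0]),
}
# Same genes on a second panel: different probe efficiencies, similar shifts.
panel_b = {
    "GENE1": ([9.0, 9.4], [6.3, 6.5]),
    "GENE2": ([12.0, 12.2], [12.1, 12.3]),
    "GENE3": ([10.0, 10.2], [11.0, 11.0]),
    "GENE4": ([8.4, 8.6], [8.0, 8.0]),
}

genes = sorted(panel_a)
# Per-gene mean intensity over all samples on each panel.
raw_a = [mean(panel_a[g][0] + panel_a[g][1]) for g in genes]
raw_b = [mean(panel_b[g][0] + panel_b[g][1]) for g in genes]
# Per-gene log2 fold change on each panel.
fc_a = [log2_fold_change(*panel_a[g]) for g in genes]
fc_b = [log2_fold_change(*panel_b[g]) for g in genes]

raw_corr = pearson(raw_a, raw_b)  # poor: dominated by probe-specific offsets
fc_corr = pearson(fc_a, fc_b)     # high: offsets cancel within each panel
```

Each fold change is computed within a single panel, so probe-specific intensity offsets cancel before the two panels are compared.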
Both panels detected significant differential ERBB2 gene expression between HER2+ and HER2- breast tumors, and the HER2 gene was the most differentially expressed gene for both panels. These results indicate that both panels correctly classified the HER2 status of the tumors when comparing gene expression to protein expression determined by IHC (gold standard) and when considering IHC scores of 0-1+ as HER2- and IHC scores of 2-3+ as HER2+. The two tumors that had an IHC score of 2+ as defined by the 2007 ASCO/CAP guidelines were initially considered 3+ when using the FDA-approved guidelines. In addition, there were eight concordant genes across the panels that had an absolute log2 fold change > 0.5 and a p-value < 0.05 to differentiate between HER2+ and HER2- tumors. Two of these eight genes, ERBB2 and GRB7, are in the 10-gene HER2 cluster observed by Perou and Sorlie [4, 5]. We selected tumors to closely match on hormone receptor (majority are positive) and nodal status (all node positive) to maximize the difference in gene signatures largely resulting from the HER2 phenotype. We also wanted to minimize the molecular heterogeneity that can be found in HER2+ tumors, influenced by the hormone receptor status and basal-type signatures [36, 37]. Several well-known gene signatures identifying the same population of patients have very few genes in common, a feature of complex gene-expression data that contain large numbers of highly correlated variables (i.e., gene-expression measurements). Several different combinations of the correlated variables can be selected to build similarly accurate prediction models. Thus, different differential gene lists from various platforms can be considered comparable when they reveal similar biological functions.
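The selection of concordant genes described above (absolute log2 fold change > 0.5 and p-value < 0.05 on both panels) can be sketched as a simple filter. The gene names and statistics below are placeholders, not results from this study, and the requirement that the fold change have the same sign on both panels is an added assumption about what "concordant" means here.

```python
def is_hit(stats, min_abs_lfc=0.5, alpha=0.05):
    """True if (log2 fold change, p-value) passes both thresholds."""
    lfc, p_value = stats
    return abs(lfc) > min_abs_lfc and p_value < alpha


def concordant_genes(panel_a, panel_b):
    """Genes that pass the thresholds on both panels in the same direction.

    Assumption: concordance also requires the same sign of fold change.
    """
    hits = []
    for gene in sorted(panel_a.keys() & panel_b.keys()):
        lfc_a, _ = panel_a[gene]
        lfc_b, _ = panel_b[gene]
        same_direction = (lfc_a > 0) == (lfc_b > 0)
        if is_hit(panel_a[gene]) and is_hit(panel_b[gene]) and same_direction:
            hits.append(gene)
    return hits


# Placeholder statistics: gene -> (log2 fold change HER2+ vs HER2-, p-value).
stats_panel_a = {
    "GENE_A": (2.1, 0.001),
    "GENE_B": (0.8, 0.03),
    "GENE_C": (0.3, 0.01),   # fold change too small on this panel
    "GENE_D": (-0.9, 0.02),
    "GENE_E": (1.2, 0.04),   # absent from the other panel
}
stats_panel_b = {
    "GENE_A": (1.7, 0.002),
    "GENE_B": (0.6, 0.20),   # not significant on this panel
    "GENE_C": (0.7, 0.01),
    "GENE_D": (0.8, 0.01),   # opposite direction
    "GENE_F": (1.0, 0.01),   # absent from the other panel
}

shared_hits = concordant_genes(stats_panel_a, stats_panel_b)
```

In this toy input only GENE_A survives; the other genes each fail one criterion, as noted in the comments.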
As the main purpose of gene expression studies using microarrays is to reveal the underlying biological differences between groups, functional networks were generated using MetaCore. We observed that the top 52 discriminating probes from the 24K panel are enriched with genes functionally linked to MYC and TP53/ESR1 networks. Nine of the 10 genes in the HER2 gene cluster from the Perou/Sorlie dataset [4, 5] form a regulatory network also centered around TP53 and ESR1. In addition, four (ERBB2, GRB7, PERLD1, and C17ORF37) of the top five HER2 discriminating genes from the 24K panel are genes commonly amplified in the HER2 amplicon (17q12-q21) and were overexpressed in HER2+ tumors. Their expression levels were also highly correlated (r^2 = 0.806-0.912, p < 0.005). Lastly, network analyses showed that the top eight discriminating genes common to both panels are connected by the shortest path network analysis with a two-step extension. Interconnecting genes include c-Myc (MYC), TP53, and ESR1.
Thus, it appears that genes in the MYC, TP53, and ESR1 regulatory networks are important in differentiating between HER2-positive and -negative tumors. HER2 expression has been shown to be influenced by the presence of ESR1 [36, 37, 39–42]. Although we selected tumors positive for the estrogen receptor protein (ER+) by immunohistochemistry, 11 of 13 HER2 0-2+ tumors had high ESR1 expression (≥ 12), whereas only two of the seven HER2 3+ tumors had high ESR1 expression (Fisher's Exact p = 0.022). In addition, significant correlations between ESR1 gene expression and ER protein expression levels were observed for the 1.5K (r^2 = 0.71; p = 0.002) and 24K (r^2 = 0.65; p = 0.006) panels (Additional File 1, Figure S4). Overall, the network analysis demonstrated biological consistency between the gene panels. Our data are consistent with recent findings that demonstrated that highly consistent biological information can be generated from different microarray platforms. As this study was designed primarily to evaluate and compare the technical performances of the two platforms with pre-defined tumor selection (e.g., all ER+ and node-positive tumors), conclusions regarding clinically relevant information of HER2+/HER2- biology need to be further validated.
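The Fisher's exact test on ESR1 expression can be checked from the counts given above: 11 of 13 HER2 0-2+ tumors and 2 of 7 HER2 3+ tumors had high ESR1 expression. The sketch below assumes a two-sided test that sums the probabilities of all tables no more likely than the observed one. The function name is illustrative, and only the Python standard library is used.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k row-1 observations in column 1.
        return comb(row1, k) * comb(row2, col1 - k) / total

    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    p_observed = prob(a)
    # Sum over all tables that are no more likely than the observed table
    # (small relative tolerance guards against floating-point ties).
    return sum(
        prob(k)
        for k in range(k_min, k_max + 1)
        if prob(k) <= p_observed * (1 + 1e-9)
    )


# Rows: HER2 0-2+ and HER2 3+; columns: high ESR1 and not high ESR1.
# Counts derived from the text: 11 of 13 and 2 of 7 tumors with high ESR1.
p_value = fisher_exact_two_sided(11, 2, 2, 5)
```

Under this two-sided definition the p-value is 1730/77520, or about 0.022, which agrees with the value reported above.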