In the clinical sciences, systematic review is a valuable tool to synthesise high-quality empirical evidence from independent investigations in order to determine a consensus view. Such reviews, or meta-analyses, have greater statistical power to distinguish true effects from study-specific artefacts and, as such, can identify subtle effects that might be missed or deemed insignificant in smaller datasets. In the context of gene-expression analyses, meta-analysis of results from microarray studies has great potential, but also presents significant challenges due to differences between the platforms and analysis approaches employed in each study [1–5]. Direct integration of probe-level expression data from multiple studies is potentially even more powerful, but is further complicated by differences in the conditions under which each dataset was generated, such as the amplification or labelling method, the scanner used or even just the date on which the samples were processed. A recent comprehensive review found that the aims of different microarray meta-analysis studies were quite distinct: the majority combined p-values, effect sizes or ranked analyses, while only 27% (51 studies) sought to merge the data directly, and most of these used the same platform. We and others have previously demonstrated that non-trivial systematic bias or ‘batch effects’ can occur within both Affymetrix GeneChips and Illumina Beadarrays [3, 4, 6, 7], but that they can largely be removed from each with appropriate correction methods.
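To make the distinction between result-level and data-level meta-analysis concrete, the following minimal sketch illustrates one result-level approach, combining p-values with Fisher's method. It is not the method used in this study, which merges intensity data directly. The function name `fisher_combined_p` and the example p-values are hypothetical, and the studies are assumed to be independent.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent p-values for one gene using Fisher's method.

    Hypothetical helper for illustration only; assumes the studies are
    independent and each p-value lies in (0, 1].
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(p <= 0.0 or p > 1.0 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")

    k = len(p_values)
    # Fisher's statistic is X = -2 * sum(ln p_i); under the null hypothesis
    # it follows a chi-square distribution with 2k degrees of freedom.
    half_x = -sum(math.log(p) for p in p_values)

    # For even degrees of freedom the chi-square survival function has a
    # closed form: P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return math.exp(-half_x) * total


if __name__ == "__main__":
    # Made-up p-values for a single gene reported by three small studies.
    study_p = [0.04, 0.10, 0.07]
    print(fisher_combined_p(study_p))
```

Because only summary statistics are combined, this kind of approach sidesteps platform differences, but it cannot recover information that is only available when the intensity-level data are merged.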
Gene expression profiling has been applied to many areas of translational cancer research, including identification of new drug targets, monitoring response to treatment, revealing mechanisms of resistance, and predicting prognosis. Although the majority of datasets are now made publicly available, many studies are limited in size and therefore lack the statistical power to accurately reflect the general population [9, 10]. A consequence of this is that gene signatures generated from a small cohort of patients (the ‘training set’) will never perform as well in subsequent cohorts (‘test sets’), which inevitably have subtle differences in the composition of patient or tumour variables. We previously showed that combining several similar Affymetrix datasets leads to a greater overlap in differentially expressed genes and more accurate prognostic predictions. Collection of clinical material often remains the rate-limiting step, particularly with valuable ‘window-of-opportunity’ studies that utilise matched before- and after-intervention samples from the same patient [6, 11–14]. Because inter-patient variation is reduced, these studies can be highly effective for identifying consistent gene-expression changes, such as the effects of (neoadjuvant) cancer treatment.
The extensive patient and tissue diversity inherent in molecular studies of cancer, which often contributes to underpowered studies and confounding, means that it is currently not necessarily critical (or appropriate) to measure gene expression at the greatest resolution or specificity now offered by exon arrays and RNA-sequencing. Rather, it may be of greater utility to maximise the number of existing biologically independent observations by combining the growing numbers of datasets in the public repositories, instead of simply generating another small independent dataset with limited statistical power.
Previous comparisons of expression measurements derived from Affymetrix and Illumina platforms have reported that they are ‘generally consistent’, show ‘very high agreement’ or that ‘correspondence across platforms was high’. However, these studies are often based on titrated or technical replicates rather than clinical samples and have not sought to integrate the intensity-level data directly. Cross-platform analysis of microarray data has previously been shown to be possible and worthwhile, although this has normally been performed using transformed relative values, analogous to those from two-colour microarrays, which have been shown to result in fold-change compression.
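As a rough illustration of what ‘transformed relative values’ can mean in practice, the sketch below centres each gene on its own mean within each dataset, so that values are expressed relative to a dataset-specific baseline, much like the log-ratios of a two-colour array. This is an assumption about one common form of such a transformation, not a description of the procedures used in the cited studies; the function name `centre_within_dataset`, the gene symbol and all intensity values are invented for the example.

```python
from statistics import mean


def centre_within_dataset(expr):
    """Convert log2 intensities to values relative to each gene's mean.

    `expr` maps a gene identifier to a list of log2 intensities, one per
    sample, all taken from a single dataset. Hypothetical helper for
    illustration only.
    """
    centred = {}
    for gene, values in expr.items():
        gene_mean = mean(values)
        # Subtracting the within-dataset mean removes any constant,
        # platform-specific offset for this gene.
        centred[gene] = [v - gene_mean for v in values]
    return centred


if __name__ == "__main__":
    # Invented log2 intensities for one gene on two platforms; the second
    # platform reports the same pattern with a higher baseline.
    platform_a = {"GENE1": [8.0, 10.0]}
    platform_b = {"GENE1": [11.0, 13.0]}

    print(centre_within_dataset(platform_a))
    print(centre_within_dataset(platform_b))
```

After centring, both toy datasets carry the same relative values for the gene, which is what makes them comparable. The absolute intensity level is discarded, however, which is one reason why direct integration at the intensity level is treated as a separate question.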
Considering the fundamental differences in the design of the two platforms, it is not clear whether data derived from Affymetrix and Illumina microarrays can be reliably compared directly. In this study we demonstrate that it is possible to directly combine appropriate datasets at the intensity level to improve statistical power. We show that the inter-platform bias can be sufficiently reduced to expose previously obscured biological variation and that such data correction does not amplify meaningless noise in the results. Despite intrinsic differences between these technologies, suitably similar studies can be directly integrated for robust and powerful meta-analysis.