The removal of multiplicative, systematic bias allows integration of breast cancer gene expression datasets – improving meta-analysis and prediction of prognosis

Table 3 Effect of dataset composition on differential gene expression.

	SAM Common, top 1000
Uneven comparisons	Between datasets				Across datasets
	UC	MC	wMC	DWD	UC	MC	wMC	DWD
Unamplified MCF7 (3) v MCF10A (3) / Amplified MCF7 (3) v MCF10A (3)	522 (0.031) (0.032)	522 (0.031) (0.032)	-	427 (0.029) (0.028)	251 (0.023) (0.037)	594 (0.025) (0.035)	-	447 (0.023) (0.032)
Unamplified MCF7 (3) v MCF10A (3) / Amplified MCF7 (3) v MCF10A (2)	495 (0.031) (0.036)	495 (0.031) (0.036)	495 (0.031) (0.036)	469 (0.03) (0.031)	232 (0.026) (0.035)	600 (0.024) (0.037)	597 (0.026) (0.040)	550 (0.028) (0.0)
Richardson et al.: non-basal (12) v basal (12) / Farmer et al.: luminal A (12) v basal (12)	394 (0.003) (0.019)	394 (0.003) (0.019)	-	389 (0.003) (0.019)	368 (0.001) (0.019)	708 (0.047) (0.02)	-	695 (0.046) (0.014)
Richardson et al.: non-basal (7) v basal (19) / Farmer et al.: luminal A (15) v basal (14)	380 (0.019) (0.001)	380 (0.019) (0.001)	380 (0.019) (0.001)	373 (0.001) (0.017)	346 (0) (0)	725 (0.003) (0.078)	608 (0.002) (0.038)	658 (0.005) (0.021)
Richardson et al.: non-basal (3) v basal (19) / Farmer et al.: luminal A (15) v basal (3)	283 (0.1) (0.194)	283 (0.1) (0.194)	283 (0.1) (0.194)	258 (0.195) (0.099)	290 (0) (0.027)	480 (0.093) (0.9)	684 (0.001) (0.789)	506 (0.112) (0.9)

Sets of differentially expressed probesets, comparing MCF7 and MCF10A replicates or basal/basal-like and luminal/non-basal-like tumours, were identified for each experiment before and after mean batch-centering; comparisons were performed both between and across datasets. SAM Common: for each column, two different pairwise comparisons were performed using SAM, and the top 1000 probesets were identified for each comparison. The number reported is the size of the intersection of the two sets. Before: comparison was performed prior to mean batch-centering. After: comparison was performed following mean batch-centering. Values in brackets are the false discovery rates (FDRs) of each of the two top-1000 probeset lists. Weighted mean-centering results are not shown for comparisons with equal numbers of samples in each group, as the values are identical to those from mean-centering. UC = uncorrected, MC = batch mean-centered, wMC = weighted mean-centered, DWD = distance-weighted discrimination.
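The two procedures described in the footnote, batch mean-centering and the "SAM Common" overlap of two top-1000 lists, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' pipeline. Weighted mean-centering is assumed here to centre each batch on the unweighted average of its per-class means; this reading is consistent with the note that wMC matches MC when group sizes are equal, but the table does not define it. Probesets are ranked with a SAM-style relative difference d using a fixed fudge constant `s0`, whereas SAM itself estimates `s0` from the data and assigns FDRs by permutation, which is omitted. All function names (`mean_center`, `weighted_mean_center`, `sam_d`, `top_k`, `sam_common`) are hypothetical.

```python
import math
from statistics import mean


def mean_center(values, batches):
    """MC: subtract each batch's mean from one probeset's values."""
    centres = {
        b: mean(v for v, bb in zip(values, batches) if bb == b)
        for b in set(batches)
    }
    return [v - centres[b] for v, b in zip(values, batches)]


def weighted_mean_center(values, batches, classes):
    """Hypothetical wMC (assumption): centre each batch on the unweighted
    average of its per-class means, so a class's sample count does not
    shift the centre. With equal class sizes this equals mean_center()."""
    centres = {}
    for b in set(batches):
        in_batch = [(v, c) for v, bb, c in zip(values, batches, classes) if bb == b]
        class_means = [mean(v for v, c in in_batch if c == k)
                       for k in {c for _, c in in_batch}]
        centres[b] = mean(class_means)
    return [v - centres[b] for v, b in zip(values, batches)]


def sam_d(x, y, s0=0.1):
    """SAM-style relative difference d = (mean(x) - mean(y)) / (s + s0).
    s0 is a fixed constant here (assumption); SAM estimates it from the data."""
    n1, n2 = len(x), len(y)
    mx, my = mean(x), mean(y)
    ss = sum((v - mx) ** 2 for v in x) + sum((v - my) ** 2 for v in y)
    s = math.sqrt((1 / n1 + 1 / n2) * ss / (n1 + n2 - 2))
    return (mx - my) / (s + s0)


def top_k(expr, group_a, group_b, k, s0=0.1):
    """Return the k probesets with the largest |d|.
    expr maps probeset -> list of values; group_a/group_b are column indices."""
    scores = {
        probe: abs(sam_d([row[i] for i in group_a], [row[i] for i in group_b], s0))
        for probe, row in expr.items()
    }
    ranked = sorted(scores, key=lambda p: (-scores[p], p))  # ties broken by name
    return set(ranked[:k])


def sam_common(comparison_1, comparison_2, k=1000):
    """'SAM Common': size of the overlap between two top-k lists.
    Each comparison is a tuple (expr, group_a, group_b)."""
    return len(top_k(*comparison_1, k) & top_k(*comparison_2, k))


if __name__ == "__main__":
    # One batch with three samples of class x and one of class y (toy values).
    vals, batch, cls = [0, 0, 0, 6], ["A"] * 4, ["x", "x", "x", "y"]
    print(mean_center(vals, batch))                # [-1.5, -1.5, -1.5, 4.5]
    print(weighted_mean_center(vals, batch, cls))  # [-3, -3, -3, 3]
```

The toy batch shows why the two centerings diverge only for unbalanced groups: plain mean-centering is pulled towards the over-represented class, while the class-averaged centre is not. Sorting ties by probeset name keeps each top-k list deterministic, so the reported intersection size is reproducible.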

ISSN: 1755-8794