Batch effect correction for genome-wide methylation data with Illumina Infinium platform

BMC Medical Genomics

Table 1 Statistical measures of batch effects and performance evaluation of normalization and batch correction

Dataset	Statistical measure	Raw β	QNβ	Lumi	ABnorm	QNβ + EB	Lumi + EB	ABnorm + EB
2	Number (%) of CpGs associated with batch at p < 0.01	17,458 (66)	6,466 (24.4)	8,478 (32)	6,926 (26)	12	25	23
2	PCs associated with batch (% variance explained)*	1 (51.6)	1 (17.9)	1 (22.1)	1 (18.9)	None	None	None
2	Number (%) of differentially methylated CpGs between case and control at p < 0.01	345 (1.3)	759 (2.9)	714 (2.7)	763 (2.9)	1,155 (4.2)	1,146 (4.3)	1,229 (4.6)
3	Number (%) of CpGs associated with batch at p < 0.01	13,881 (50.0)	10,300 (37.3)	12,668 (46)	9,694 (35.2)	2	6	8
3	PCs associated with batch (% variance explained)*	1 (50.4)	1 (24.8)	1 (30.6)	1 (23.8)	None	None	None
3	Number (%) of differentially methylated CpGs between cancer and normal at p < 0.01	794 (2.9)	1,877 (6.8)	1,131 (4.1)	1,635 (5.9)	2,799 (10.1)	2,400 (8.7)	2,289 (8.3)

Raw β: raw average β without any correction; QNβ: quantile normalization of average β values; Lumi: two-step quantile normalization of probe signals, as implemented in the R package "lumi"; ABnorm: quantile normalization of the A and B signals separately; EB: Empirical Bayes batch correction. * Principal components (PCs) among the top 10 that are significantly associated with batch at p < 0.01, as evaluated by the Wilcoxon test, with the percentage of variance each PC explains.
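
As an illustration of the QNβ column, quantile normalization of β values can be sketched as follows. This is a minimal sketch and not the authors' implementation: the function name `quantile_normalize`, the toy β values, and the tie handling (ties broken by probe order) are assumptions made for this example. It does not cover the two-step procedure in "lumi" or the separate A and B signal normalization of ABnorm.

```python
from statistics import mean


def quantile_normalize(samples):
    """Quantile-normalize samples given as one list of beta values per sample.

    Assumption for this sketch: ties are broken by probe order rather than
    averaged, which is a simplification.
    """
    n_probes = len(samples[0])
    if any(len(s) != n_probes for s in samples):
        raise ValueError("all samples must cover the same probes")

    # Sort each sample, then average across samples at each rank to get
    # the shared reference distribution.
    sorted_cols = [sorted(s) for s in samples]
    reference = [mean(col[r] for col in sorted_cols) for r in range(n_probes)]

    normalized = []
    for s in samples:
        # order[rank] is the index of the probe holding the rank-th smallest value.
        order = sorted(range(n_probes), key=lambda i: s[i])
        out = [0.0] * n_probes
        for rank, probe in enumerate(order):
            out[probe] = reference[rank]
        normalized.append(out)
    return normalized


# Toy beta values for three hypothetical samples over four probes.
betas = [
    [0.10, 0.80, 0.45, 0.30],
    [0.20, 0.90, 0.55, 0.35],
    [0.05, 0.70, 0.50, 0.25],
]
normalized = quantile_normalize(betas)
```

After normalization every sample has the same set of values, and each probe keeps its within-sample rank. Differences in overall distribution between samples are removed, while probe-specific batch shifts can remain, which is consistent with the table showing many batch-associated CpGs after normalization alone.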

ISSN: 1755-8794