Table 1 Breast cancer datasets. Transcriptomic data from tumor and normal mammary tissue were downloaded from the Gene Expression Omnibus (GEO). All datasets were combined to compare cancer samples with control samples.

From: Considerations for feature selection using gene pairs and applications in large-scale dataset integration, novel oncogene discovery, and interpretable cancer screening

| Platform | Data type | Control | Tumor | ER+ | ER- | HER2+ | HER2- | GSE IDs |
|---|---|---|---|---|---|---|---|---|
| Affymetrix | Array | 804 | 4193 | 1391 | 732 | 449 | 797 | GSE4611, GSE7904, GSE10780, GSE10797, GSE11121, GSE15852, GSE18728^a, GSE18864^a, GSE20711^a, GSE21653^a, GSE22093^b, GSE23988^b, GSE26639^a, GSE42568^b, GSE45827, GSE48091, GSE48390^a, GSE53031^a, GSE54002, GSE65095^a, GSE78958, GSE93601^b, GSE124646, GSE129551^a, GSE131027 |
| Agilent | Array | 618 | 1295 | 701 | 259 | 178 | 743 | GSE21974^a, GSE22820, GSE35186, GSE40206^a, GSE43973, GSE49175, GSE49481^a, GSE50939, GSE52604, GSE70905^a, GSE70947^a, GSE75678^a, GSE80999^a, GSE111601 |
| Illumina | Array | 0 | 1083 | 588 | 171 | 104 | 481 | GSE20462, GSE36693, GSE37181^a, GSE45725^a, GSE46563^a, GSE60785^a, GSE103744^b, GSE111563 |
| Illumina | RNA-seq | 68 | 3779 | 3186 | 337 | 524 | 3074 | GSE47462, GSE81538^a, GSE96058^a, GSE99680^a, GSE129508^a |
| Total | | 1490 | 10,350 | 5866 | 1499 | 1255 | 5095 | 52 datasets |
^a Dataset used for classification based on ER and HER2 status.
^b Dataset used for classification based on ER status.
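As a quick consistency check, the Total row can be recomputed from the per-platform rows. The following is a minimal Python sketch. It assumes the sample counts are transcribed exactly as printed in the table. The per-platform dataset counts (25, 14, 8, 5) come from counting the GSE IDs listed in each row. The names `ROWS`, `REPORTED_TOTAL` and `column_totals` are hypothetical and used only for illustration.

```python
# Per-platform sample counts transcribed from Table 1 (hypothetical variable names).
# Column order: control, tumor, ER+, ER-, HER2+, HER2-, number of GSE datasets.
# The dataset count is obtained by counting the GSE IDs listed in each row.
COLUMNS = ("control", "tumor", "er_pos", "er_neg", "her2_pos", "her2_neg", "datasets")

ROWS = {
    ("Affymetrix", "Array"): (804, 4193, 1391, 732, 449, 797, 25),
    ("Agilent", "Array"): (618, 1295, 701, 259, 178, 743, 14),
    ("Illumina", "Array"): (0, 1083, 588, 171, 104, 481, 8),
    ("Illumina", "RNA-seq"): (68, 3779, 3186, 337, 524, 3074, 5),
}

# "Total" row as reported in the table.
REPORTED_TOTAL = (1490, 10350, 5866, 1499, 1255, 5095, 52)


def column_totals(rows):
    """Sum each column across all platform rows (zip(*...) transposes rows to columns)."""
    return tuple(sum(values) for values in zip(*rows.values()))


totals = column_totals(ROWS)
for name, computed, reported in zip(COLUMNS, totals, REPORTED_TOTAL):
    status = "ok" if computed == reported else "MISMATCH"
    print(f"{name:>9}: computed={computed:>6} reported={reported:>6} {status}")

# Share of tumor samples in the pooled cancer-vs-control comparison.
control, tumor = totals[0], totals[1]
print(f"tumor fraction: {tumor / (control + tumor):.3f}")
```

Every column of this transcription sums to the reported Total row. The last line shows how strongly the pooled cancer-versus-control comparison is weighted toward tumor samples.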