
Table 1 Breast cancer datasets. Transcriptomic data from tumor and normal mammary tissue were downloaded from GEO. All datasets were combined to compare cancer with control samples.

From: Considerations for feature selection using gene pairs and applications in large-scale dataset integration, novel oncogene discovery, and interpretable cancer screening

| Platform | Data Type | Control | Tumor | ER+ | ER- | HER2+ | HER2- | GSE ID |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Affymetrix | Array | 804 | 4193 | 1391 | 732 | 449 | 797 | GSE4611, GSE7904, GSE10780, GSE10797, GSE11121, GSE15852, GSE18728ᵃ, GSE18864ᵃ, GSE20711ᵃ, GSE21653ᵃ, GSE22093ᵇ, GSE23988ᵇ, GSE26639ᵃ, GSE42568ᵇ, GSE45827, GSE48091, GSE48390ᵃ, GSE53031ᵃ, GSE54002, GSE65095ᵃ, GSE78958, GSE93601ᵇ, GSE124646, GSE129551ᵃ, GSE131027 |
| Agilent | Array | 618 | 1295 | 701 | 259 | 178 | 743 | GSE21974ᵃ, GSE22820, GSE35186, GSE40206ᵃ, GSE43973, GSE49175, GSE49481ᵃ, GSE50939, GSE52604, GSE70905ᵃ, GSE70947ᵃ, GSE75678ᵃ, GSE80999ᵃ, GSE111601 |
| Illumina | Array | 0 | 1083 | 588 | 171 | 104 | 481 | GSE20462, GSE36693, GSE37181ᵃ, GSE45725ᵃ, GSE46563ᵃ, GSE60785ᵃ, GSE103744ᵇ, GSE111563 |
| Illumina | RNA-seq | 68 | 3779 | 3186 | 337 | 524 | 3074 | GSE47462, GSE81538ᵃ, GSE96058ᵃ, GSE99680ᵃ, GSE129508ᵃ |
|  | Total: | 1490 | 10,350 | 5866 | 1499 | 1255 | 5095 | 52 datasets |

ᵃ Dataset used for classification based on ER and HER2 status.
ᵇ Dataset used for classification based on ER status.