Prediction database creation based on the Comparative Toxicogenomics Database (CTD). A.) The CTD contained 85,937 unique chemical-gene relations spanning 4,078 chemicals and 15,461 genes. Each relation was supported by one or more citations. A hypothetical example relation, "TCDD leads to higher expression of CYP1A1 mRNA in H. sapiens, as shown in Anwar-Mohamed et al.", is shown in the right panel. B.) Creation of chemical-gene set relations. Each chemical-gene relation had a number of supporting citations, x_i. For each chemical, we constructed a gene set, or "signature", from its individual chemical-gene relations. We retained only signatures containing at least 5 genes, leaving a total of 1,338 chemical-gene sets. An example chemical-gene set is shown in the right panel of B: the genes CYP1A1, AHR, and AHR2 each have multiple citations supporting the relation (60, 40, and 9, respectively).
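
The signature construction in panel B can be sketched as follows. This is a minimal illustration rather than the pipeline actually used: the record layout `(chemical, gene, citation_count)`, the function name `build_signatures`, and the chemical "ChemicalX" are hypothetical. Only the TCDD citation counts for CYP1A1, AHR, and AHR2 come from the caption; every other value is a placeholder. Summing citation counts for repeated chemical-gene pairs is likewise an assumption.

```python
from collections import defaultdict

# Hypothetical input records: (chemical, gene, citation_count).
# Only the first three counts (60, 40, 9) come from the caption;
# the remaining rows are placeholders for illustration.
relations = [
    ("TCDD", "CYP1A1", 60),
    ("TCDD", "AHR", 40),
    ("TCDD", "AHR2", 9),
    ("TCDD", "GENE_A", 3),       # placeholder gene
    ("TCDD", "GENE_B", 2),       # placeholder gene
    ("ChemicalX", "GENE_A", 1),  # placeholder chemical
    ("ChemicalX", "GENE_C", 2),
]

MIN_GENES = 5  # signatures with fewer genes are dropped


def build_signatures(relations, min_genes=MIN_GENES):
    """Group relations by chemical and keep gene sets of sufficient size.

    Returns a dict mapping chemical -> {gene: citation count x_i}.
    """
    signatures = defaultdict(dict)
    for chemical, gene, citations in relations:
        # Assumption: counts for a repeated chemical-gene pair are summed.
        previous = signatures[chemical].get(gene, 0)
        signatures[chemical][gene] = previous + citations
    # Retain only chemicals whose gene set has at least min_genes genes.
    return {
        chemical: genes
        for chemical, genes in signatures.items()
        if len(genes) >= min_genes
    }


signatures = build_signatures(relations)
for chemical, genes in signatures.items():
    print(chemical, genes)
```

With these placeholder records, only TCDD has five genes, so it is the only signature that survives the size filter.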