- Research article
- Open Access
Identification association of drug-disease by using functional gene module for breast cancer
BMC Medical Genomicsvolume 8, Article number: S3 (2015)
In oncology drug development, it is important to develop low risk drugs efficiently. Meanwhile, computational methods have been paid more and more attention in drug discovery. However, few studies attempt to discover the mutual gene modules shared by the drug and disease association. Here we introduce a novel method to identify repositioned drug for breast cancer by integrating the breast cancer survival data with the drug sensitivity information. Among the 140 drug candidates, we are able to filter 4 FDA approved drugs and identify 2 breast cancer drugs among 4 known breast cancer therapeutic drug in total.
The goal of drug repositioning is to discover new association between indications of diseases and known drugs based on known associations . Drug repositioning provides a possible way to speed up the drug development and avoids the development cost and time consuming in the drug discovery. For example, the Pfizer's Viagra and Celgene's thalidomide are both successful examples that are used to treat new application rather than the designed indication during the development stage [2–4]. However, the mechanism of how drugs and diseases are associated in the gene regulatory network is still mysterious. Therefore, the need of developing a systematic approach to discover disease-drug association from the complicate heterogeneity data is urgent.
Roughly speaking, both human diseases and drug mode of action have a module basis on gene expression level. The rapidly developed microarray technique generates a huge amount of large-scale gene expression profiles. The gene expression data are both applied in cancer diagnosis and therapy. To identify the phenotype relevant gene modules could help us understand the mechanism of diseases and provides opportunities to develop new therapies. It is important to investigate the drug induced gene modules, and the differential expressed functional gene modules that influenced by diseases, especially the shared associated gene modules of drugs and disease phenotypes. Few analyses addressed the above questions in a systematic view. Gottlieb et al. presented a prediction method that based on the observation that similar drugs are indicated for similar diseases, and then discovers new associations by imposing the known "drug-disease" connections. Cheng et al. proposed a network-based inference method to predict new target for known drugs by using drug target bipartite network topology similarity [5, 6]. With the emerging development of the microarray technology, some researches focused on using the gene expression data as the connection between drug and disease for drug repositioning. For example, "the Connectivity Map" (CMAP) project developed a reference collection of gene expression profiles by treating cultured human cells with small molecules, and searched for the compounds that have the most negative correlation with the disease's gene expression profiles as the potential candidate [7, 8]. However, due to the limitation of the level of gene expression profiles among the above works, the molecular mechanism of diseases still remains unknown. Recently, some studies suppose that cellular components belonging to the same topological, functional or disease module have a high likelihood of being involved in similar diseases [9, 10]. Thus one can start with identifying the gene modules that associate with the topological, functional and disease. The gene modules here is defined by the group of genes that annotated by biological process or cellular component or molecular function terms in Gene Ontology [11, 12]. There are several studies proposed to characterize different disease modules. The investigation on drug action of the gene modules had shown effective results. Li et al., by using Gene Ontology, defined the Gene Ontology Module (GOM) and investigated the disease's biological mechanism, and then successfully identified several repositioned drugs by using the similarity of GOM . Xia et al. used Gene Ontology Term to define the miRNA target gene's function in various disease, and used it to predict several potential treatments . Since the prognosis studies were also sampled by high throughput gene expression profiles, in this work we present how to make use of this kind of clinical prognosis data and initiate the mutual gene modules to investigate drug repositioning for the treatment of breast cancer. The definition of the mutual gene module of disease is based on the Gene Ontology Cellular Component Terms and Gene Ontology Biological Process Term. We selected the gene modules that have significant association with tumor progression by applying t-test in multiple prognostic clinical trials. In the meanwhile, we define a sensitivity module of drugs to interrogate the mechanism of drug actions. The sensitivity module for each anti-cancer drug (in total 140) is generated by integrating the sensitivity data (e.g. GI50) from the Developmental Therapeutic Program (DTP http://dtp.nci.nih.gov/) database. Combining with the prognosis modules and drug sensitivity modules, a repositioning framework for breast cancer is built. As a result, we have identified 12 compounds related to breast cancer. Interestingly, Not only could they help in treatment, but some of them could also have caused the disease.
Material and Method
Here we proposed a systematic framework to identify the potential candidate drug for breast cancer and the associated mechanism module. The framework consists of three parts: (1) Identifying the prognosis module. The prognosis module is selected based on the statistical differences between the good outcome and bad outcome of the patients in three breast cancer clinical trials. (2) Generation of the sensitivity module network of drugs. The sensitivity module network of drugs is used to interrogate the mechanism of drug action. (3) Integration of the two results to search for the potential candidate drug. In the last step, we integrated the disease gene modules with the anti-cancer drug sensitivity module to filter the most significant potential anticancer treatment for breast cancer as shown in Figure 1.
The tumor progression associated information is collected from three breast cancer datasets (GSE2034 , GSE7390  and GSE11121 ). These data are used to select the differential expressed gene modules associated to the prognosis outcome among all the patient samples.
The drug information was downloaded from DTP database, which contains 172 anti-cancer compounds in total [18, 19]. The format of the analyzed data is shown in Table 1. After matched with the GI50 value in 60 NCI cell lines , the sensitivity geneset was generated for the remaining140 drugs.
Gene module enrichment analysis
A gene module was defined by a group of genes that sharing a similar function or regulation mechanism. Here we used the Gene Ontology as an example. The Gene Ontology was annotated into three categories: Biological Process (GOBP), Cellular Component (GOCC) and Molecular Function (GOMF). The annotated Gene Ontology Term was considered as a gene module.
To determine the universal prognosis gene modules that represent breast cancer pathologic information, gene set enrichment analysis is performed on three different breast cancer survival datasets, with the purpose of selecting the gene sets that express significantly enriched in the survived patients. First, the gene sets were adopted from GOCC and GOBP. For each breast cancer expression dataset, we have n samples. Then the Kolmogorov-Smirnov (KS) statistic is employed to test whether the expression of the gene set is enriched among the whole gene expression (N) in each sample. The KS is calculated as following formulas: 
where t is the number of the genes in the gene set, j is the jth gene according to the ranked expression in the gene set, N is the total number (13698) in the gene expression profile, V(j) is the ranked position of the jth gene, and KS score represents the enrichment score of each gene set in each sample. The result of this analysis is a matrix generated by the enrichment score of each gene module in each sample.
Identification of prognosis gene modules
For each disease dataset, we used the t-test (as shown in formula (4)) to find whether the gene set has significant differential enrichment scores between the good outcome and bad outcome. And then we selected those gene sets with significant P-value (<0.05) among three clinical trials.
where x stands for the average enrichment score of the survived patient, y stands that of the patient with bad outcome, n is the number of the survived patients, and m is the number of the rest of the patients.
Identification of drug sensitivity genes
Here we measure the drug sensitivity signature based on National Cancer Institute (NCI60) in vitro drug screen project . The subset of compounds in this study was defined in the "Standard Agent Database" from DTP website [18, 19]. The drug sensitivity genes were identified by the close correlation of the baseline expression of gene among 60 cell lines with the chemical activity of a treated drug on the same 60 cell lines. On one hand, since the NCI60 cell lines screen panel has proved to be an effective way to identify drug sensitivity specific biomarkers as the panel has already been comprehensively characterized via gene expression profiling. Therefore, the gene expression profile of NCI60 is used to measure the baseline of gene expression and represented by a matrix of gene expression profile in 60 cell lines. On the other hand, the biological response pattern is measured by GI50 value, which is the compound's concentration that causes 50% cell growth inhibition and was obtained from DTP. Since the GI50 value is represented the drug's chemo-activity, the gene-drug correlation is established based on the correlation of gene expression and drug activity across 60 cell lines. The sensitivity module is consisted of only the significant gene that has P-value<0.05 as the compound sensitivity associated gene.
Generalization of the drug-disease connection
The drug-disease link is established based on the P-value of the hyper-geometric test (5) to determine whether the gene module (from the selected prognosis gene modules) has a significant overlap with the sensitivity module of the drug.
where M is the total number of genes in the whole profiles, N is the number of the prognosis gene modules, m is the number of the sensitivity module of the drug, and n is the number of the overlap between the prognosis module and the sensitivity module.
The performance analysis of the identification of drugs
As shown in the final step of Figure 1, a ranked list of the modules and the selected prognosis modules was generated with the most significant connection to the prognosis.
The performance of the drug identification is assessed by the Effect Score which is the number of prognosis gene modules with significance divided by the half number of the identified prognosis gene modules. According to the ES, we estimated our performance by counting on the number of positive drug in the ranked drug list.
Characterization of drug sensitivity genes
To assess the potential application of this approach for identifying compounds, we generated the sensitivity module for each anti-cancer drug by applying Pearson correlation coefficient on the GI50 value of each drug with the NCI60 profiling. After this process, we gathered 140 compounds sensitivity gene sets. The genes which have significance connection between the biological response pattern (GI50) and the baseline gene expression, though may not be directly linked to the primary drug targets, they should be close to the pathway where the mechanism of action taking place. Thus these genes can indicate the key biological processes of drug efficacy and represent the sensitivity genes of a compound which can serve as an effective tool for probing the compounds mechanism of action.
Identification of prognosis gene modules of breast cancer
In order to identify the tumor progression associated gene modules, we collected three datasets of breast cancer, GSE2034 , GSE7390  and GSE11121 . Based on the enrichment score of the gene modules in each dataset, we used t-test to select the gene modules that has significant statistical different enrichment score across the patient samples of outcome. Then we selected the overlap gene modules that shown significantly associated to prognosis in all the data. The 30 gene modules were identified and listed in Table 2. As shown in Table 2, the prognosis modules are most likely related to the biologic process of cell cycle, and microtubule related biologic processes and cellular components, which was generally reported that has close association with cancer progression .
Investigation on Disease-Drug connection
In order to investigate whether a drug has strong effect on the gene function module, we used hyper geometric test on drug sensitivity module with the identified breast cancer prognostic associated gene module. The corresponding P-value of the hyper geometric test of each gene module was cut off by 0.05 as significance threshold. By ranking the number of the associated gene module, the FDA approved drugs such as Paclitaxel and Etoposide, Haloperidol, Mitoxantrone shows connections with more than half of the gene modules. Paclitaxel is a mitotic inhibitor used in cancer chemotherapy . And it is a FDA approved drug that used to treat patients with breast and several other kinds of cancers. The Paclitaxel associated gene modules also shows related significantly with mitotic cell cycle in the biological process . As shown in Figure 2 the result also shows that the microtubule of Cellular Component is sensitive to the drug action of Paclitaxel which has been validated from literature . Mitoxantrone is also a FDA approved breast cancer treatment. It is a type II topoisomerase inhibitor which is an important protein for cell mitosis . As listed in Table 3, it has 15 associated gene modules. Furthermore the cell mitosis related biological processes show significant sensitivity to Mitoxantrone as shown in Figure 3.
Etoposide is also a topoisomerase inhibitor but haven't been reported to treat breast cancer . Our result suggests that etoposide may have potential to experiment more on breast cancer treatment. Haloperidol, on the other hand, is not reported as a breast cancer treatment drug, but has pharmacological effect on increase patients' risk of breast cancer among women. As a anti-psychotics drug, it has been reported that generally links to especially women breast tumor growth . This result shows that our method could recognize the compound's hidden side effect that could not be observed during the development stage.
Several not FDA approved drugs was also found has strong association with breast cancer. For example, Ellipticine was ranked also has 18 modules, and it was also reported that can act as an inhibitor to breast cancer. However, it fails for its side effect during the development . Also, camptothecin, genistein also found to have strong pharmacological efficiency on breast cancer but failed by its side effect.
Comparison of gene signatures performance
For comparison, we used gene signature instead of gene module in our method. The prognostic gene signature was extracted from GSE2034. We chose 76 genes that are reported in Wang's work  as the most significantly related to the prognosis risk. The gene signature was identified by the same procedure as gene module identification. Then a ranked list of drugs was produced by the number of genes overlapped with drug sensitivity modules (Table 4). Since the corresponding P values are above the threshold of 0.05. The result indicates that the gene signature has the ability to predict prognosis, however it is not very stable in the identification of useful drugs for breast cancer.
Discussion and conclusion
In this study, we introduced a new pipeline to interrogate the transient cellular state under drug conditions by prognosis gene modules. Different from other methods which directly map the expression profiles of drug treatment to disease, our method was used to identify the cellular responses of drugs and gene modules that correlated with tumor progression through patients' survival information. We have identified 30 significant biological processes related to breast cancer prognosis gene modules. By integrating two results of gene modules, we provided valuable prediction information of breast cancer potential treatment. Among the drug library, we identified 12 drugs for breast cancer treatment. 4 of them are FDA approved treatment drugs, which shows the efficient of our result.
We provide a novel work to investigate on the drug sensitivity module by using the prognostic related gene module annotation and use the connection to discover new association between the drug and disease. Our work proved that the sensitivity module of drugs could be used to not only identify the drug to treat the disease, but also reveal the drug's potential side effect.
This method provides a new pipeline to identify the repositioned compound for diseases. However, there are several parts that could be improved in our future research. (1) By adopt with different cancer information, we could investigate more prognostic related gene modules. (2) The gene module could be extended according to the definitions of Protein-Protein-Interaction-Network level, or pathway information, or miRNA regulation. (3) From the disease annotated drug sensitivity module, such as the Mitoxantrone in Figure 3, it is possible to further investigation on the drug similarity. We will further improve our method and make performance comparison with other works.
Mullard A: 2011 FDA drug approvals. Nat Rev Drug Discov. 2012, 11: 914-Feb
Aronson JK: Old drugs--new uses. Br J Clin Pharmacol. 2007, 64: 5635-Nov
Loscalzo J, Kohane I, Barabasi AL: Human disease classification in the postgenomic era: a complex systems approach to human pathobiology. Mol Syst Biol. 2007, 3: 124-
Butte AJ: Medicine. The ultimate model organism. Science. 2008, 320: 3257-Apr 18
Gottlieb A, Stein GY, Ruppin E, Sharan R: PREDICT: a method for inferring novel drug indications with application to personalized medicine. Mol Syst Biol. 2011, 7: 496-
Cheng F, Liu C, Jiang J, Lu W, Li W, Liu G, Zhou W, Huang J, Tang Y: Prediction of drug-target interactions and drug repositioning via network-based inference. PLoS Comput Biol. 2012, 8: e1002503-10.1371/journal.pcbi.1002503.
Lamb J, Crawford ED, Peck D, Modell JW, Blat IC, Wrobel MJ, Lerner J, Brunet JP, Subramanian A, Ross KN, Reich M, Hieronymus H, Wei G, Armstrong SA, Haggarty SJ, Clemons PA, Wei R, Carr SA, Lander ES, Golub TR: The Connectivity Map: using gene-expression signatures to connect small molecules, genes, and disease. Science. 2006, 313: 192935-Sep 29
Shigemizu D, Hu Z, Hung JH, Huang CL, Wang Y, DeLisi C: Using functional signatures to identify repositioned drugs for breast, myelogenous leukemia and prostate cancer. PLoS Comput Biol. 2012, 8: e1002347-10.1371/journal.pcbi.1002347. Feb
Navlakha S, Kingsford C: The power of protein interaction networks for associating genes with diseases. Bioinformatics. 2010, 26: 105763-Apr 15
Lage K, Karlberg EO, Storling ZM, Olason PI, Pedersen AG, Rigina O, Hinsby AM, Tumer Z, Pociot F, Tommerup N, Moreau Y, Brunak S: A human phenome-interactome network of protein complexes implicated in genetic disorders. Nat Biotechnol. 2007, 25: 30916-Mar
Harris MA, Clark J, Ireland A, Lomax J, Ashburner M, Foulger R, Eilbeck K, Lewis S, Marshall B, Mungall C, Richter J, Rubin GM, Blake JA, Bult C, Dolan M, Drabkin H, Eppig JT, Hill DP, Ni L, Ringwald M, Balakrishnan R, Cherry JM, Christie KR, Costanzo MC, Dwight SS, Engel S, Fisk DG, Hirschman JE, Hong EL, Nash RS, Sethuraman A, Theesfeld CL, Botstein D, Dolinski K, Feierbach B, Berardini T, Mundodi S, Rhee SY, Apweiler R, Barrell D, Camon E, Dimmer E, Lee V, Chisholm R, Gaudet P, Kibbe W, Kishore R, Schwarz EM, Sternberg P, Gwinn M, Hannick L, Wortman J, Berriman M, Wood V, de la Cruz N, Tonellato P, Jaiswal P, Seigfried T, White R: The Gene Ontology (GO) database and informatics resource. Nucleic Acids Res. 2004, 32: D258-61. 10.1093/nar/gkh036. Jan 1
Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G: Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet. 2000, 25: 259-. May
Li Y, Hao P, Zheng S, Tu K, Fan H, Zhu R, Ding G, Dong C, Wang C, Li X, Thiesen HJ, Chen YE, Jiang H, Liu L: Gene expression module-based chemical function similarity search. Nucleic Acids Res. 2008, 36: e137-10.1093/nar/gkn610. Nov
Jiang W, Chen X, Liao M, Li W, Lian B, Wang L, Meng F, Liu X, Jin Y, Li X: Identification of links between small molecules and miRNAs in human cancers based on transcriptional responses. Sci Rep. 2012, 2: 2821-
Wang Y, Klijn JG, Zhang Y, Sieuwerts AM, Look MP, Yang F, Talantov D, Timmermans M, Meijer-van Gelder ME, Yu J, Jatkoe T, Berns EM, Atkins D, Foekens JA: Gene-expression profiles to predict distant metastasis of lymph-node-negative primary breast cancer. Lancet. 2005, 365 (6719): 19-25. Feb
Desmedt C, Piette F, Loi S, Wang Y, Lallemand F, Haibe-Kains B, Viale G, Delorenzi M, Zhang Y, d'Assignies MS, Bergh J, Lidereau R, Ellis P, Harris AL, Klijn JG, Foekens JA, Cardoso F, Piccart MJ, Buyse M, Sotiriou C: Strong time dependence of the 76-gene prognostic signature for node-negative breast cancer patients in the TRANSBIG multicenter independent validation series. Clin Cancer Res. 2007, 13: 320714-Jun 1
Schmidt M, Bohm D, von Torne C, Steiner E, Puhl A, Pilch H, Lehr HA, Hengstler JG, Kolbl H, Gehrmann M: The humoral immune system has a key prognostic impact in node-negative breast cancer. Cancer Res. 2008, 68: 540513-Jul 1
Shoemaker RH: The NCI60 human tumour cell line anticancer drug screen. Nat Rev Cancer. 2006, 6: 81323-Oct
Scherf U, Ross DT, Waltham M, Smith LH, Lee JK, Tanabe L, Kohn KW, Reinhold WC, Myers TG, Andrews DT, Scudiero DA, Eisen MB, Sausville EA, Pommier Y, Botstein D, Brown PO, Weinstein JN: A gene expression database for the molecular pharmacology of cancer. Nat Genet. 2000, 24: 23644-Mar
Roschke AV, Tonon G, Gehlhaus KS, McTyre N, Bussey KJ, Lababidi S, Scudiero DA, Weinstein JN, Kirsch IR: Karyotypic complexity of the NCI-60 drug-screening panel. Cancer Res. 2003, 63: 863447-Dec 15
Yvon AM, Wadsworth P, Jordan MA: Taxol suppresses dynamics of individual microtubules in living human tumor cells. Mol Biol Cell. 1999, 10: 94759-Apr
Jordan MA, Wilson L: Microtubules as a target for anticancer drugs. Nat Rev Cancer. 2004, 4: 25365-Apr
Brito DA, Yang Z, Rieder CL: Microtubules do not promote mitotic slippage when the spindle assembly checkpoint cannot be satisfied. J Cell Biol. 2008, 182: 6239-Aug 25
Bharadwaj R, Yu H: The spindle checkpoint, aneuploidy, and cancer. Oncogene. 2004, 23: 201627-Mar 15
Fox EJ: Management of worsening multiple sclerosis with mitoxantrone: a review. Clin Ther. 2006, 28: 46174-Apr
Gordaliza M, Garcia PA, del Corral JM, Castro MA, Gomez-Zurita MA: Podophyllotoxin: distribution, sources, applications and new cytotoxic derivatives. Toxicon. 2004, 44: 44159-Sep 15
Halbreich U, Shen J, Panaro V: Are chronic psychiatric patients at increased risk for developing breast cancer?. Am J Psychiatry. 1996, 153: 55960-Apr
Stiborova M, Poljakova J, Martinkova E, Borek-Dohalska L, Eckschlager T, Kizek R, Frei E: Ellipticine cytotoxicity to cancer cell lines - a comparative study. Interdiscip Toxicol. 2011, 4: 98105-Jun
Publication of this article has been funded by the grant from the National Natural Science Foundation of China (Grant 61272277, 21173092 and 31402231), the Natural Science Foundation of Hubei Province (Grant 2013CFA016) and the Fundamental Research Funds for the Central Universities (Grant 2662014BQ081). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
This article has been published as part of BMC Medical Genomics Volume 8 Supplement 2, 2015: Selected articles from the 4th Translational Bioinformatics Conference and the 8th International Conference on Systems Biology (TBC/ISB 2014). The full contents of the supplement are available online at http://www.biomedcentral.com/bmcmedgenomics/supplements/8/S2.
The authors declare that they have no competing interests.
The work presented here was carried out in collaboration between all authors. Lida Zhu and Fuxi Zhu defined the research theme. Lida Zhu designed methods and experiments, carried out the laboratory experiments, analyzed the data, interpreted the results. Lida Zhu and Fuxi Zhu co-wrote the paper.