Predicting drug-disease interactions by semi-supervised graph cut algorithm and three-layer data integration
- Guangsheng Wu^{1},
- Juan Liu^{1} and
- Caihua Wang^{1}
https://doi.org/10.1186/s12920-017-0311-0
© The Author(s) 2017
- Published: 28 December 2017
Abstract
Background
Predicting drug-disease interactions is promising for both drug repositioning and disease treatment. The discovery of novel drug-disease interactions can, on the one hand, help to find novel indications for approved drugs and, on the other hand, provide new therapeutic approaches for diseases. Recently, computational methods for finding drug-disease interactions have attracted considerable attention because of their far higher efficiency and lower cost compared with traditional wet-lab experimental methods. However, they still face several challenges, such as the organization of heterogeneous data and the performance of the prediction model.
Methods
In this work, we propose to hierarchically integrate the heterogeneous data into three layers. The drug-drug and disease-disease similarities are first calculated separately in each layer, and the similarities from the three layers are then linearly fused into comprehensive drug similarities and disease similarities, which are used to measure the similarity between two drug-disease pairs. We construct a novel weighted drug-disease pair network, in which a node is a drug-disease pair with a known or unknown treatment relation and an edge represents the node-node relation, weighted by the similarity score between the two pairs. Since similar drug-disease pairs are supposed to show similar treatment patterns, we can find the optimal graph cut of the network; a drug-disease pair with an unknown relation is then considered to have the same treatment relation as the pairs within the same cut. We therefore develop a semi-supervised graph cut algorithm, SSGC, to find the optimal graph cut, based on which we identify potential drug-disease treatment interactions.
Results
Compared with three representative network-based methods, SSGC achieves the highest performance in terms of both AUC score and the identification rate of true drug-disease pairs. Experiments with different integration strategies also demonstrate that considering several sources of data can improve the performance of the predictors. In further case studies on four diseases, the top-ranked drug-disease associations have been confirmed by KEGG, the CTD database and the literature, illustrating the usefulness of SSGC.
Conclusions
The proposed comprehensive similarity scores, derived from multiple views and multiple layers, together with the graph-cut based algorithm, can greatly improve the prediction performance for drug-disease associations.
Keywords
- Drug-disease interaction
- Integration strategy
- Similarity
- Graph cut
- Guilt-by-association
Background
On the one hand, traditional drug development is a time-consuming and costly process with a low success rate [1–3]. To speed up the process and reduce the risks and costs, drug repositioning has become a promising alternative to de novo drug discovery [1, 4, 5]. However, repositioning a drug has often been a haphazard process involving a bit of luck, for example, repositioning sildenafil (brand name: Viagra) from the treatment of angina to erectile dysfunction [6], or repositioning minoxidil from the treatment of hypertension to hair loss [7]. Thus, there is an urgent need to develop effective computational methods for drug repositioning. On the other hand, the commonly used drugs for some diseases may suffer from severe side effects or resistance; for example, L-dopa, the drug for Parkinson’s disease, has severe side effects such as dyskinesia [8]. It is therefore necessary to find better pharmacological treatments for some diseases. Predicting drug-disease interactions addresses both of these issues.
Many methods have been proposed to predict potential drug-disease relations. Some are based on gene expression profiles, under the hypothesis that if a drug and a disease have opposite expression signatures, then the drug may treat that disease [9]. For instance, Sirota et al. integrated gene expression measurements from 100 diseases and 164 drug compounds and predicted potential indications for these drugs, such as lung adenocarcinoma as a potential indication of cimetidine [10]; Jahchan et al. proposed a systematic approach to query gene expression profiles so as to identify antidepressant drugs to treat small cell lung cancer [11]. The vast amount of information on drugs and diseases in the literature and databases makes it possible to mine or infer potential associations between drugs and diseases through literature mining and semantic inference. Suppose that B is reported to be one of the characteristics of disease C in some publications, and drug A is reported to affect B in others; then there is a potential interaction between drug A and disease C [12, 13]. For example, Ahlers et al. found a potential link between antipsychotic agents and cancer based on MEDLINE citations [14]. Since high-throughput experiments have accumulated massive data on diseases and drugs, more and more methods focus on building prediction models via machine learning strategies. For example, Gottlieb et al. proposed a logistic regression based method that integrates different information on drugs and diseases [15]; Chen et al. regarded the prediction of drug-disease associations as a recommendation problem and adopted two recommendation algorithms to infer drug-disease interactions [16]; Liang et al. developed a Laplacian regularized sparse subspace learning (LRSSL) based method to predict drug-disease interactions by integrating drug chemical structure, drug target domain and target annotation information [17].
In recent years, network-based prediction, which first builds a network from the existing data and then builds the prediction model on it, has become very promising, and several methods have been proposed, such as the network-based guilt-by-association (GBA) method [4], the network-based inference (NBI) method [18], and a random walk and network propagation based algorithm [19]. Recently, Wang et al. proposed a heterogeneous graph model, HGBI, for the prediction of drug-target interactions [20], and a three-layer heterogeneous graph model (TL-HGBI) for the prediction of drug-disease interactions [21]. Even so, these methods do not yet take full advantage of the diverse information from genes, drugs, diseases, and their associations.
Methods
Data collection
We collected information on drugs, genes, diseases, and their interactions from several data sources. With these data, we attempt to determine whether a treatment relation exists for any drug-disease pair whose relation is unknown.
From DrugBank (https://www.drugbank.ca) [23], we obtained the chemical structures of 1186 drugs, 1141 genes, and 4594 drug-gene associations (the polypeptides and drugs whose targets are not in human cells are not included).
From DGIdb (http://dgidb.genome.wustl.edu) [24], MINT (http://mint.bio.uniroma2.it) [25] and UniProt (http://www.uniprot.org) [26], we have collected 6988 genes, and 42162 gene-gene associations. Among the genes, 1141 genes are associated with drugs (in DrugBank), and 700 genes are associated with diseases.
From OMIM (https://omim.org) [27] and Gottlieb’s data set [15], we downloaded 449 diseases and 700 related genes that form 1365 disease-gene associations. Furthermore, 1827 treatment relations between 302 of the 449 diseases and 551 drugs [15] were also collected.
To facilitate the data integration, we organized the heterogeneous data into three layers. The base layer provides information on drug substructures and disease phenotypes; the gene layer provides genes and gene-gene associations information; and the treatment layer provides drug-disease interactions information (left part of Fig. 1).
For convenience, we suppose there are m drugs (m=1186), n diseases (n=449), l druggable genes (l=6988), and q drug-disease pairs (q=m×n) hereinafter. Moreover, we denote the k-order identity matrix as I _{ k }, matrix element multiplication and division as ⊗ and ⊘ respectively, and the shorthand for the Euclidean norm as ∥∙∥.
Similarity calculation in the base layer
Our approach is mainly inspired by the assumption that similar drugs might treat similar diseases. Hence, similarity calculation is the key issue of our approach. Unlike other methods, we first compute drug-drug and disease-disease similarities from three different aspects, corresponding to the drug structures/disease phenotypes, the functional information of genes, and the drug-disease treatment relationships respectively; we then integrate the three similarities into a comprehensive drug (disease) similarity.
In the base layer, we calculate the drug-drug and disease-disease similarities according to drug chemical substructures and disease phenotype information, respectively.
Structural similarity between drugs
The SMILES (simplified molecular-input line-entry system) strings [28] of all drug structures are obtained from the DrugBank database, and the 2D fingerprints of the drugs are calculated from them with the Open Babel tool [29]. Using the fingerprints, we calculate the Tanimoto score (the size of the intersection divided by the size of the union) [30] and use it as the structural similarity of each drug pair. The drug-drug structural similarity matrix, denoted as S _{ bc }, is therefore an m×m symmetric matrix whose diagonal elements are all one.
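As a minimal sketch of the Tanimoto score described above, the following assumes each fingerprint is represented as the set of its "on" bit positions. The toy fingerprints, the helper names `tanimoto` and `similarity_matrix`, and the convention for two empty fingerprints are illustrative assumptions, not part of the original pipeline (which derives fingerprints with Open Babel).

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto score of two fingerprints given as sets of 'on' bit indices."""
    a, b = set(fp_a), set(fp_b)
    union = a | b
    if not union:
        # Convention chosen for this sketch: two empty fingerprints are identical.
        return 1.0
    return len(a & b) / len(union)


def similarity_matrix(fingerprints):
    """Pairwise Tanimoto scores; symmetric with ones on the diagonal."""
    m = len(fingerprints)
    return [[tanimoto(fingerprints[i], fingerprints[j]) for j in range(m)]
            for i in range(m)]


# Hypothetical toy fingerprints for three drugs.
toy_fingerprints = [{1, 4, 7, 9}, {1, 4, 8}, {2, 3}]
S_bc = similarity_matrix(toy_fingerprints)
```

Here the first two toy drugs share two of five distinct bits, so their score is 2/5, while drugs with no common bits score zero.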
Phenotype similarity between diseases
The normalized phenotype similarity scores (ranging from 0 to 1) between diseases are obtained directly from MimMiner (http://www.cmbi.ru.nl/MimMiner/suppl.html) [31], which constructs them from MeSH terms [32]. The n×n disease-disease phenotype similarity matrix, S _{ bd }, is also a symmetric matrix whose diagonal elements are all one.
Similarity calculation in the gene layer
Since diseases (drugs) associated with the same genes, or with genes in the same pathways, are likely to have similar functional mechanisms, we can measure the functional similarity of disease (drug) pairs according to the information on their associated genes.
Gene-gene association measurement
Profile similarity between drugs or diseases
Similarity calculation in the treatment layer
Note that we have collected 1186 drugs and 449 diseases in all, yet we can only calculate the similarities for 551 drugs and 302 diseases in the treatment layer according to the information from Gottlieb’s data set. Therefore, we adopt the same method as in [34] to project those drugs (diseases) that do not occur in Gottlieb’s data set into a unified network similarity space. In this way, we can obtain all drug-drug (disease-disease) similarities from \(S_{tc}^{'} \left (S_{td}^{'}\right)\). We denote the final similarity matrices in the treatment layer as S _{ tc } (m×m) and S _{ td } (n×n) respectively.
Integrating similarities from three layers
where α _{ c },β _{ c },γ _{ c }, α _{ d },β _{ d } and γ _{ d } are combination weights satisfying that α _{ c }+β _{ c }+γ _{ c }=1 and α _{ d }+β _{ d }+γ _{ d }=1.
To determine the values of α _{ c },β _{ c },γ _{ c }, α _{ d },β _{ d } and γ _{ d }, a simple way is to assign equal weights to each layer. However, this integration strategy has a weak point: the information from a layer with much smaller scores might be swamped in the integration, and vice versa. A more rational way is to make each layer have an equal contribution to the final result. In this work, we adopted the latter strategy to integrate the similarities from the three layers.
Novel weighted drug-disease pair graph
where W is the q×q weight matrix, which is symmetric with zero diagonal elements.
Among the q drug-disease pair nodes in the graph, some pairs have known treatment relationships, whereas the others are unknown and need to be predicted.
Let f=(f _{1},f _{2},⋯,f _{ s },⋯,f _{ q })^{ T }, where f _{ s }∈{0,1} indicates whether the s-th drug-disease pair (i,j) has a treatment relationship. The problem of predicting the drug-disease treatment relationships can then be addressed by determining the value of f. In this work, we treat this problem as a graph cut problem [35] and cluster all drug-disease pair nodes into two groups (treatment and non-treatment) by cutting the graph into several sub-graphs so that pairs within the same sub-graph belong to the same group.
Semi-supervised graph cut approach
Equation (4) illustrates that we only consider the prior values of unknown drug-disease pairs.
Let ∧_{ L } (Labeled) and ∧_{ U } (Unlabeled) be two q×q diagonal matrices indicating the treatment states of drug-disease pairs observed from the data set; p=(p _{1},p _{2},⋯,p _{ s },⋯,p _{ q })^{ T }(p _{ s }=P _{ ij }); y=(y _{1},y _{2},⋯,y _{ s },⋯,y _{ q })^{ T }(y _{ s }=Y _{ ij }). By construction, y is the diagonal vector of matrix ∧_{ L }; ∧_{ U }=I _{ q }−∧_{ L }; and \(\wedge _{L}^{k} = \wedge _{L}\), \(\wedge _{U}^{k} = \wedge _{U}\); ∧_{ L } y=y, ∧_{ U } p=p.
where μ and ξ are two parameters. To minimize Loss(f), f should satisfy three requirements: similar drug-disease pairs should have similar treatment relationships; the derived treatment relationships should accord with the known observed facts; and they should also tend to be consistent with the prior knowledge. In this work, we set μ>ξ>0 so that violating the observed facts receives a greater penalty than deviating from the prior knowledge. The f with the minimal Loss(f) corresponds to the optimal graph cut.
Suppose L=A−W; then L is the Laplacian matrix of G, and the normalized matrix [36] is \(\overline {L} = A^{-1/2} LA^{-1/2} = I_{q} - A^{-1/2} WA^{-1/2}\). Let S=A ^{−1/2} W A ^{−1/2}, then we have \(\overline {L} = I_{q} - S\).
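The normalization can be illustrated on a toy graph. The sketch below assumes that A is the diagonal degree matrix of W (row sums on the diagonal), which is the usual convention for a graph Laplacian but is not restated in the surviving text; the 3-node weight matrix and the function name `normalized_similarity` are made up for illustration.

```python
import numpy as np


def normalized_similarity(W):
    """Return S = A^{-1/2} W A^{-1/2}, with A assumed to be the degree matrix of W."""
    degrees = W.sum(axis=1)
    inv_sqrt = np.zeros_like(degrees, dtype=float)
    nonzero = degrees > 0
    # Isolated nodes (zero degree) keep a zero scaling factor to avoid division by zero.
    inv_sqrt[nonzero] = 1.0 / np.sqrt(degrees[nonzero])
    return W * inv_sqrt[:, None] * inv_sqrt[None, :]


# Hypothetical symmetric weight matrix with a zero diagonal.
W = np.array([[0.0, 0.5, 0.2],
              [0.5, 0.0, 0.1],
              [0.2, 0.1, 0.0]])
S = normalized_similarity(W)
L_bar = np.eye(W.shape[0]) - S  # normalized Laplacian, I_q - S
```

Scaling rows and columns by the inverse square roots of the degrees avoids forming A^{-1/2} as an explicit matrix.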
Fortunately, Eq. (9) converges when setting α=1/(1+μ), \(\widehat {y} = y+\frac {\xi }{\mu }p\) and \(f^{(0)}=\widehat {y}\) according to [37]. Loss(f) is thus expected to be minimized by repeating the iterative process until Eq. (9) converges. However, we find that the memory consumption of the iteration is too large because of the extremely large matrix S (for example, if n=10^{3} and m=10^{3}, then q=10^{6} and S has 10^{12} entries).
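The iteration can be sketched on a small dense graph. Eq. (9) is not reproduced in this text, so the code assumes it has the standard local-and-global-consistency form f^{(t+1)} = α S f^{(t)} + (1−α) ŷ, which matches the stated settings α=1/(1+μ) and f^{(0)}=ŷ. The toy weights, labels and priors are invented, and the degree-based normalization is likewise an assumption.

```python
import numpy as np


def propagate(S, y_hat, mu, tol=1e-12, max_iter=10_000):
    """Iterate f <- alpha * S f + (1 - alpha) * y_hat until the update is tiny.

    Assumed form of the update; alpha = 1 / (1 + mu) as stated in the text.
    """
    alpha = 1.0 / (1.0 + mu)
    f = y_hat.copy()
    for _ in range(max_iter):
        f_next = alpha * (S @ f) + (1.0 - alpha) * y_hat
        if np.linalg.norm(f_next - f) < tol:
            return f_next
        f = f_next
    return f


# Hypothetical 4-node pair graph: symmetric weights, zero diagonal.
W = np.array([[0.0, 0.8, 0.1, 0.0],
              [0.8, 0.0, 0.3, 0.1],
              [0.1, 0.3, 0.0, 0.6],
              [0.0, 0.1, 0.6, 0.0]])
d = W.sum(axis=1)
S = W / np.sqrt(np.outer(d, d))          # S = A^{-1/2} W A^{-1/2}

y = np.array([1.0, 0.0, 0.0, 0.0])       # node 0 is a known treatment pair
p = np.array([0.0, 0.2, 0.2, 0.2])       # made-up priors for the unlabeled pairs
mu, xi = 4.0, 0.67                       # values reported for SSGC in the Results
y_hat = y + (xi / mu) * p
f = propagate(S, y_hat, mu)
```

Under this assumed form the fixed point also has the closed form (1−α)(I−αS)^{-1}ŷ, which is a convenient check on small graphs but is exactly what becomes infeasible when S has q×q entries.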
Equation (10) implies that we can compute Sf with a space complexity of Θ(max(n ^{2},m ^{2})), rather than Θ((nm)^{2}), which allows the iteration to run on a desktop computer.
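One way such a saving can arise is sketched below. It rests on an assumption that the surviving text does not confirm: that the weight between pairs (i,j) and (k,l) factorizes as the product of a drug similarity and a disease similarity, so that the pair matrix is a Kronecker product. The sketch ignores the zero diagonal and the degree normalization, and the names `Sc`, `Sd` and `pair_matvec` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 4, 3                                   # toy numbers of drugs and diseases

# Hypothetical symmetric drug (m x m) and disease (n x n) similarity matrices.
Sc = rng.random((m, m))
Sc = (Sc + Sc.T) / 2
Sd = rng.random((n, n))
Sd = (Sd + Sd.T) / 2


def pair_matvec(Sc, Sd, f):
    """Compute kron(Sc, Sd) @ f without building the (m*n) x (m*n) matrix.

    f is indexed by pair s = i * n + j (row-major over drug i, disease j).
    """
    F = f.reshape(Sc.shape[0], Sd.shape[0])   # m x n view of the pair vector
    return (Sc @ F @ Sd.T).ravel()


f = rng.random(m * n)
fast = pair_matvec(Sc, Sd, f)
dense = np.kron(Sc, Sd) @ f                   # feasible only at toy sizes
```

Only the m×m and n×n matrices and the m×n reshaped vector are held in memory, which is in line with the stated Θ(max(n², m²)) space bound when m and n are comparable.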
To sum up, the framework to find the optimal graph cut is listed in Algorithm 1.
Results
Redundancy check of the data set
Rationality validation by guilt-by-association assumption
Table 1 GBA analysis
Entity | Base layer: avg-same | Base layer: avg-diff | Base layer: ratio | Gene layer: avg-same | Gene layer: avg-diff | Gene layer: ratio | Treatment layer: avg-same | Treatment layer: avg-diff | Treatment layer: ratio |
---|---|---|---|---|---|---|---|---|---|
Drug | 0.25 | 0.17 | 1.47 | 0.29 | 0.12 | 2.41 | 0.33 | 0.06 | 5.50 |
Disease | 0.23 | 0.10 | 2.30 | 0.40 | 0.13 | 3.08 | 0.32 | 0.05 | 6.40 |
Setting of thresholds and combinations weights
Previous studies suggest that small similarity scores are usually noise, which provides little information and sometimes even has an adverse effect on the prediction performance [20, 21]. Therefore, we chose thresholds to cut off the small similarity scores. However, counting the thresholds, there are 12 parameters in Eqs. (1) and (2) in all, which makes it impractical to search the whole parameter space for the optimal setting. For feasibility, we set the parameters based on two principles: (1) each layer has a close GBA ratio; and (2) each layer has a nearly equal contribution to the ultimate similarity matrices.
Thresholds setting based on GBA assumption
We want each layer to have a similar GBA ratio. Since the treatment layer achieves the highest GBA ratios (Table 1), we set the similarity thresholds for S _{ tc } and S _{ td } to zero and then choose the thresholds for the other two layers accordingly, so that the three layers have similar GBA ratios. As a result, the thresholds of S _{ bc },S _{ gc },S _{ bd } and S _{ gd } are set to 0.1, 0.01, 0.14 and 0.01 respectively.
Integrating weights setting based on equal contribution strategy
We want each layer to have a nearly equal contribution to the ultimate similarity matrices. After the thresholds are chosen, the averages of the matrices S _{ bc },S _{ gc },S _{ tc },S _{ bd },S _{ gd } and S _{ td } are calculated to be 0.017, 0.028, 0.057, 0.006, 0.028 and 0.038 respectively. Accordingly, we obtain the combination weights by giving each layer an equal contribution: if the average of a layer is small, we assign a large weight to enhance its final effect; conversely, if the average of a layer is large, we assign a small weight to weaken its final effect. With this strategy, we set α _{ c },β _{ c } and γ _{ c } to 0.53, 0.32 and 0.15, and α _{ d },β _{ d } and γ _{ d } to 0.72, 0.16 and 0.12 respectively.
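One plausible reading of the equal-contribution strategy is to make each weight inversely proportional to its layer's average similarity and then normalize the weights to sum to one. The exact rule is not spelled out in the text, so the sketch below is an assumption; the function name `equal_contribution_weights` is illustrative, and the inputs are the layer averages reported above.

```python
def equal_contribution_weights(layer_means):
    """Weights proportional to 1 / mean, normalized so that they sum to 1.

    With these weights, weight * mean is the same for every layer.
    """
    inverse = [1.0 / mean for mean in layer_means]
    total = sum(inverse)
    return [value / total for value in inverse]


# Reported averages for the base, gene and treatment layers.
drug_weights = equal_contribution_weights([0.017, 0.028, 0.057])
disease_weights = equal_contribution_weights([0.006, 0.028, 0.038])
```

Under this assumption the resulting weights land within about 0.01 of the reported values (0.53, 0.32, 0.15 and 0.72, 0.16, 0.12); the small gaps could come from rounding of the published averages or from a slightly different rule.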
Evaluating the performance of SSGC
Since SSGC is a network-based approach, we compared it with three network-based methods (NBI, HGBI and TL-HGBI) on Gottlieb’s data set using 10-fold cross-validation [15]. For fairness, we optimized the parameters of each method by grid search: μ=4 and ξ=0.67 for SSGC, α=0.7 for HGBI and α=0.2 for TL-HGBI.
We also investigate the number of known drug-disease pairs that are correctly retrieved among the top-ranked predictions. Figure 3 (right) shows that SSGC performs the best. For example, of the 1827 known drug-disease associations, 310 are retrieved among the top 1% of ranked predictions by SSGC, whereas only 170 and 78 are retrieved by HGBI and TL-HGBI respectively.
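The top-ranked retrieval count can be sketched as follows. The scoring data are synthetic, the function name `hits_in_top_fraction` is hypothetical, and the sketch does not reproduce the cross-validation protocol, in which the held-out known pairs would be the ones counted.

```python
def hits_in_top_fraction(scores, known_pairs, fraction=0.01):
    """Count known pairs among the top `fraction` of pairs ranked by score."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    k = max(1, int(len(ranked) * fraction))
    return sum(1 for pair in ranked[:k] if pair in known_pairs)


# Synthetic scores for 10 drugs x 20 diseases (200 pairs, no ties by construction).
scores = {(f"drug{i}", f"disease{j}"): 1.0 / (1 + i + 10 * j)
          for i in range(10) for j in range(20)}
known_pairs = {("drug0", "disease0"), ("drug5", "disease5")}

hits = hits_in_top_fraction(scores, known_pairs, fraction=0.01)
```

With 200 pairs, the top 1% holds two predictions, and only one of the two known pairs in this toy set scores highly enough to appear there.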
Investigating the integration strategy
To investigate whether our comprehensive similarity combination strategy contributes to the good performance of SSGC, we modified the compared methods so that they can adopt the same strategy. As HGBI and TL-HGBI also use drug-drug and disease-disease similarities to infer drug-disease interactions, it is easy to modify them to employ the combined comprehensive similarities as our method does. Likewise, SSGC can be configured to adopt the comprehensive similarities partly or fully. After these modifications, we can evaluate the three methods as multiple layers of data are added gradually. Because the NBI method only makes use of the topological structure of the drug-disease association network, we do not include it in this comparison.
AUC scores of different algorithms modified to integrate different layers
Integrated data | SSGC | HGBI | TL-HGBI |
---|---|---|---|
base | 0.80 | 0.78 | 0.74 |
base + gene | 0.87 | 0.85 | 0.74 |
base + gene + network | 0.93 | 0.91 | 0.75 |
base + gene + network + priori | 0.95 | 0.93 | 0.84 |
Validating the predicted drug-disease associations
Distribution of predicted values
Validation in tissue-specific expression data
The drug-disease pairs related to the same tissue
Tissue | Drug | Disease | Value |
---|---|---|---|
Pancreas | Acetylsalicylic acid (DB00945) | Diabetes Mellitus, Noninsulin-Dependent (125853) | 0.20 |
Pancreas | Acetylsalicylic acid (DB00945) | Cystic fibrosis by Pseudomonas aeruginosa (219700) | 0.32 |
Pancreas | Acetaminophen (DB00316) | Diabetes Mellitus, Noninsulin-Dependent (125853) | 0.13 |
Pancreas | Acetaminophen (DB00316) | Cystic fibrosis by Pseudomonas aeruginosa (219700) | 0.26 |
Skeletal Muscle | Acetaminophen (DB00316) | Myasthenic syndrome (601462) | 0.22 |
Skin | Lorazepam (DB00186) | Immunodysregulation, Polyendocrinopathy, And X-Linked Enteropathy (304790) | 0.17 |
Testis | Lorazepam (DB00186) | Persistent Mullerian duct syndrome, type II (261550) | 0.09 |
Testis | Alprazolam (DB00404) | Persistent Mullerian duct syndrome, type II (261550) | 0.10 |
Testis | Acetaminophen (DB00316) | Persistent Mullerian duct syndrome, type II (261550) | 0.24 |
Heart | Acetylsalicylic acid (DB00945) | Thrombosis, Susceptibility to thrombin defect; thph1 (188050) | 0.20 |
Heart | Acetaminophen (DB00316) | Thrombosis, Susceptibility to thrombin defect; thph1 (188050) | 0.33 |
Heart | Acetaminophen (DB00316) | Afibrinogenemia, congenital (202400) | 0.25 |
Heart | Acetylsalicylic acid (DB00945) | Afibrinogenemia, congenital (202400) | 0.24 |
Case studies for potential drug-disease relations
We select four diseases as case studies: Huntington disease (HD, OMIM 143100), Non-small-cell lung cancer (NSCLC, OMIM 211980), Alcohol dependence (AD, OMIM 103780) and Small-cell lung cancer (SCLC, OMIM 182280). After excluding the known approved drugs, which are also recovered in the prediction results (value > 0.8), we examine the remaining top-20 ranked predicted drugs. The investigation of the predicted drug-disease associations consists of the following three parts.
Investigation of the pathways overlapping between the disease and drugs
For a specific disease, if the pathways related to the predicted drugs overlap with those of the disease, the prediction results are more convincing. Therefore, we first extracted the disease-related genes from OMIM and the target genes of the top-20 drugs from DrugBank; we then obtained the enriched pathways of the two gene sets with DAVID [40, 41] and investigated the overlap between them.
Verification in CTD database
Table 4 The top-ranked predictions for selected diseases (verification in CTD database)
Disease | Known drugs | Part of top-ranked predictions | Direct evidence |
---|---|---|---|
HD (143100) | Baclofen (DB00181) | Clozapine (DB00363, rank:01) | |
 | Tetrabenazine (DB04844) | Olanzapine (DB00334, rank:03) | T |
 | | Aripiprazole (DB01238, rank:06) | T |
 | | Amitriptyline (DB00321, rank:10) | |
 | | Risperidone (DB00734, rank:12) | |
NSCLC (211980) | Doxorubicin (DB00997) | Carboplatin (DB00958, rank:01) | T |
 | | Adenosine triphosphate (DB00171, rank:02) | |
 | | Glutathione (DB00143, rank:05) | |
 | | Ponatinib (DB08901, rank:09) | |
 | | Sorafenib (DB00398, rank:10) | |
 | | Dasatinib (DB01254, rank:14) | |
 | | Daunorubicin (DB00694, rank:15) | |
 | | Epirubicin (DB00445, rank:16) | T |
 | | Bosutinib (DB06616, rank:18) | |
 | | Caffeine (DB00201, rank:19) | |
 | | Cisplatin (DB00515, rank:20) | T |
AD (103780) | Citalopram (DB00215) | Lorazepam (DB00186, rank:04) | T |
 | Chlordiazepoxide (DB00475) | Diazepam (DB00829, rank:10) | |
 | Acamprosate (DB00659) | Clomipramine (DB01242, rank:13) | |
 | Naltrexone (DB00704) | Flunitrazepam (DB01544, rank:14) | |
 | Disulfiram (DB00822) | Adenosine triphosphate (DB00171, rank:17) | |
 | Ondansetron (DB00904) | Trazodone (DB00656, rank:18) | |
 | | Imipramine (DB00458, rank:20) | |
SCLC (182280) | Cisplatin (DB00515) | Carboplatin (DB00958, rank:01) | T |
 | Methotrexate (DB00563) | Adenosine triphosphate (DB00171, rank:02) | |
 | Teniposide (DB00444) | Irinotecan (DB00762, rank:04) | T |
 | Etoposide (DB00773) | Glutathione (DB00143, rank:07) | |
 | Topotecan (DB01030) | Doxorubicin (DB00997, rank:09) | T |
 | | Daunorubicin (DB00694, rank:11) | |
 | | Sorafenib (DB00398, rank:13) | |
 | | Ponatinib (DB08901, rank:16) | |
 | | Epirubicin (DB00445, rank:18) | T |
As shown in Table 4, five of the predicted drugs are associated with HD, of which Olanzapine (DB00334) and Aripiprazole (DB01238) have curated associations with HD (marked with “T” in the “Direct evidence” column). Eleven drugs are associated with NSCLC, of which Carboplatin (DB00958), Epirubicin (DB00445) and Cisplatin (DB00515) have curated associations with NSCLC. Seven drugs are associated with AD, of which Lorazepam (DB00186) has a curated association with AD. Nine drugs are associated with SCLC, of which Carboplatin (DB00958), Irinotecan (DB00762), Doxorubicin (DB00997) and Epirubicin (DB00445) have curated associations with SCLC.
Verification in literature
The top-ranked predictions for selected diseases (verification in literature)
Disease | Known drugs (DrugBank IDs) | Part of top-ranked predictions |
---|---|---|
HD (143100) | Baclofen (DB00181) | Clozapine (DB00363, rank:01) |
 | Tetrabenazine (DB04844) | Olanzapine (DB00334, rank:03) |
 | | Ziprasidone (DB00246, rank:05) |
 | | Aripiprazole (DB01238, rank:06) |
 | | Quetiapine (DB01224, rank:07) |
 | | Risperidone (DB00734, rank:12) |
NSCLC (211980) | Doxorubicin (DB00997) | Carboplatin (DB00958, rank:01) |
 | | Epirubicin (DB00445, rank:16) |
 | | Cisplatin (DB00515, rank:20) |
AD (103780) | Citalopram (DB00215) | Butriptyline (DB09016, rank:03) |
 | Chlordiazepoxide (DB00475) | Lorazepam (DB00186, rank:04) |
 | Acamprosate (DB00659) | |
 | Naltrexone (DB00704) | |
 | Disulfiram (DB00822) | |
 | Ondansetron (DB00904) | |
SCLC (182280) | Cisplatin (DB00515) | Carboplatin (DB00958, rank:01) |
 | Methotrexate (DB00563) | Irinotecan (DB00762, rank:04) |
 | Teniposide (DB00444) | Doxorubicin (DB00997, rank:09) |
 | Etoposide (DB00773) | Epirubicin (DB00445, rank:18) |
 | Topotecan (DB01030) | |
Taken together, the above results demonstrate the effectiveness of our approach in discovering potential drug-disease interactions.
Discussion and conclusion
In this paper, we propose a novel method, SSGC, to uncover potential associations between drugs and diseases. The main contributions are as follows. Firstly, we have presented a hierarchical framework to integrate multiple sources of data, including information on drug substructures, disease phenotypes, gene-gene interactions, and known drug-disease treatment relationships. The integration framework can easily be extended to integrate more data. Secondly, we measured the comprehensive similarities of drugs and diseases from multiple views and multiple layers, which differs from many other methods that obtain the similarity only from the chemical structure and the disease phenotype. The base layer reflects the drug structural similarity and disease phenotype similarity, which are the original features. The gene layer reflects the functional similarities of drugs and diseases, which are calculated based on the assumption that diseases (drugs) associated with common genes or gene pathways might have analogous functional mechanisms. The treatment layer takes the known drug-disease relationships into account, which can refine the similarities of drugs and diseases. Therefore, the comprehensive similarities can improve the prediction accuracy and are easily interpretable. Thirdly, we model the prediction as a graph cut problem and develop a semi-supervised algorithm, SSGC, to solve it. The experimental results indicate that SSGC outperforms the three representative approaches it was compared with. In addition, KEGG pathway enrichment analysis and the validations via the CTD database and the literature also demonstrate that SSGC is useful for predicting potential associations between drugs and diseases. The proposed SSGC algorithm could also be used in other recommendation systems, such as recommending products to customers.
Of course, there is a long way to go in the process of drug discovery, and many other types of data (side effect data of chemicals, clinical symptoms and signs, and so on) could be used to predict drug-disease interactions. For example, Rastegar-Mojarad et al. used phenome-wide association studies (PheWAS) data and further expanded the horizon for the prediction of drug-disease interactions [59]. However, how to fuse multiple sources of data more properly and rationally, and how to develop prediction models with better performance and interpretability, remain challenging problems.
Declarations
Funding
This work was supported by the National Science Foundation of China [61272274, 60970063]; the program for New Century Excellent Talents in Universities [NCET-10-0644]; the National Science Foundation of Jiangsu Province [BK20161249].
Availability of data and materials
The data supporting the results of this research paper are included within this article.
About this supplement
This article has been published as part of BMC Medical Genomics Volume 10 Supplement 5, 2017: Selected articles from the IEEE BIBM International Conference on Bioinformatics & Biomedicine (BIBM) 2016: medical genomics. The full contents of the supplement are available online at https://bmcmedgenomics.biomedcentral.com/articles/supplements/volume-10-supplement-5.
Authors’ contributions
GSW, JL and CHW developed the methodology. GSW and CHW executed the experiments, JL provided guidance and supervision. JL, GSW and CHW wrote this paper. All authors read and approved the final manuscript.
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License(http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
Authors’ Affiliations
References
- Iorio F, Rittman T, Ge H, et al. Transcriptional data: a new gateway to drug repositioning? Drug Discov Today. 2013; 18(7):350–7.
- Booth B, Zemmel R. Prospects for productivity. Nat Rev Drug Discov. 2004; 3(5):451–6.
- Li J, Zheng S, Chen B, et al. A survey of current trends in computational drug repositioning. Brief Bioinforma. 2016; 17(1):2–12.
- Chiang AP, Butte AJ. Systematic evaluation of drug-disease relationships to identify leads for novel drug uses. Clin Pharmacol Ther. 2009; 86(5):507.
- Ashburn TT, Thor KB. Drug repositioning: identifying and developing new uses for existing drugs. Nat Rev Drug Discov. 2004; 3(8):673–83.
- Novac N. Challenges and opportunities of drug repositioning. Trends Pharmacol Sci. 2013; 34(5):267–72.
- Varothai S, Bergfeld WF. Androgenetic alopecia: an evidence-based treatment update. Am J Clin Dermatol. 2014; 15(3):217–30.
- Segura-Aguilar J, Muñoz P, Paris I. Aminochrome as new preclinical model to find new pharmacological treatment that stop the development of Parkinson’s disease. Curr Med Chem. 2016; 23(4):346–59.
- Harrison C. Drug repositioning: Genetic signatures uncover new uses. Nat Rev Drug Discov. 2011; 10(10):732–3.
- Sirota M, Dudley JT, Kim J, et al. Discovery and preclinical validation of drug indications using compendia of public gene expression data. Sci Transl Med. 2011; 3(96):96ra77.
- Jahchan NS, Dudley JT, Mazur PK, et al. A drug repositioning approach identifies tricyclic antidepressants as inhibitors of small cell lung cancer and other neuroendocrine tumors. Cancer Discov. 2013; 3(12):1364–77.
- Andronis C, Sharma A, Virvilis V, et al. Literature mining, ontologies and information visualization for drug repurposing. Brief Bioinforma. 2011; 12(4):357–68.
- Weeber M, Kors JA, Mons B. Online tools to support literature-based discovery in the life sciences. Brief Bioinforma. 2005; 6(3):277–86.
- Ahlers CB, Hristovski D, Kilicoglu H, et al. Using the literature-based discovery paradigm to investigate drug mechanisms. In: AMIA: 2007.
- Gottlieb A, Stein GY, Ruppin E, et al. PREDICT: a method for inferring novel drug indications with application to personalized medicine. Mol Syst Biol. 2011; 7(1):496.
- Chen H, Zhang H, Zhang Z, et al. Network-based inference methods for drug repositioning. Comput Math Methods Med. 2015; 2015(2015):130620.
- Liang X, Zhang P, Yan L, et al. LRSSL: predict and interpret drug-disease associations based on data integration using sparse subspace learning. Bioinformatics (Oxford, England). 2017; 33(8):1187–1196.
- Cheng F, Liu C, Jiang J, et al. Prediction of drug-target interactions and drug repositioning via network-based inference. PLoS Comput Biol. 2012; 8(5):1002503.
- Emig D, Ivliev A, Pustovalova O, et al. Drug target prediction and repositioning using an integrated network-based approach. PLoS ONE. 2013; 8(4):60618.
- Wang W, Yang S, Li J. Drug target predictions based on heterogeneous graph inference. Biocomputing. 2013; 2013:53–64.
- Wang W, Yang S, Zhang X, et al. Drug repositioning by integrating target information through a heterogeneous network model. Bioinformatics. 2014; 30(20):2923–30.
- Wen Z, Chen Y, Feng L, Fei L, Gang T, Li X. Predicting potential drug-drug interactions by integrating chemical, biological, phenotypic and network data. BMC Bioinformatics. 2017; 18(1):18.
- Knox C, Law V, Jewison T, et al. DrugBank 3.0: a comprehensive resource for ‘omics’ research on drugs. Nucleic Acids Res. 2011; 39(suppl 1):1035–41.
- Griffith M, Griffith OL, Coffman AC, et al. DGIdb - mining the druggable genome for personalized medicine. Nat Methods. 2013; 10:1209–10.
- Licata L, Briganti L, Peluso D, et al. MINT, the molecular interaction database: 2012 update. Nucleic Acids Res. 2012; 40(D1):857–61.
- Consortium U, et al. UniProt: the universal protein knowledgebase. Nucleic Acids Res. 2017; 45(D1):158–69.
- Hamosh A, Scott AF, Amberger JS, et al. Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders. Nucleic Acids Res. 2005; 33(suppl 1):514–7.
- Weininger D. Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. J Chem Inf Comput Sci. 1988; 28(1):31–6.View ArticleGoogle Scholar
- O’Boyle NM, Banck M, James CA, Morley C, Vandermeersch T, Hutchison GR. Open babel: An open chemical toolbox. J Cheminformatics. 2011; 3(1):33.View ArticleGoogle Scholar
- Tanimoto TT. Elementary Mathematical Theory of Classification and Prediction. Armonk: Int Bus Machines Corp; 1958.Google Scholar
- Van Driel MA, Bruggeman J, Vriend G, et al.A text-mining analysis of the human phenome. European J Human Genet. 2006; 14(5):535–42.View ArticleGoogle Scholar
- Lipscomb CE. Medical subject headings (mesh). Bull Med Libr Assoc. 2000; 88(3):265.PubMedPubMed CentralGoogle Scholar
- Perlman L, Gottlieb A, Atias N, et al.Combining drug and gene similarity measures for drug-target elucidation. J Comput Biol. 2011; 18(2):133–45.View ArticlePubMedGoogle Scholar
- Yamanishi Y, Araki M, Gutteridge A, et al.Prediction of drug–target interaction networks from the integration of chemical and genomic spaces. Bioinformatics. 2008; 24(13):232–40.View ArticleGoogle Scholar
- Wu G, Liu J, Wang C. Semi-supervised graph cut algorithm for drug repositioning by integrating drug, disease and genomic associations. In: 2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM 2016). Institute of Electrical and Electronics Engineers Inc.2016;2016:223–8.Google Scholar
- Von Luxburg U. A tutorial on spectral clustering. Stat Comput. 2007; 17(4):395–416.View ArticleGoogle Scholar
- Zhou D, Bousquet O, Lal TN, et al.Learning with local and global consistency. Adv Neural Informa Proc Syst. 2004; 16(16):321–8.Google Scholar
- Lage K, Hansen NT, Karlberg EO, et al.A large-scale analysis of tissue-specific pathology and gene expression of human disease genes and complexes. Proc Natl Acad Sci. 2008; 105(52):20870–5.View ArticlePubMedPubMed CentralGoogle Scholar
- Su AI, Wiltshire T, Batalov S, et al.A gene atlas of the mouse and human protein-encoding transcriptomes. Proc Natl Acad Sci USA. 2004; 101(16):6062–7.View ArticlePubMedPubMed CentralGoogle Scholar
- Huang DW, Sherman BT, Lempicki RA. Systematic and integrative analysis of large gene lists using david bioinformatics resources. Nat Protoc. 2009; 4(1):44–57.View ArticleGoogle Scholar
- Huang DW, Sherman BT, Lempicki RA. Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res. 2009; 37(1):1–13.View ArticleGoogle Scholar
- Davis AP, Grondin CJ, Johnson RJ, et al.The comparative toxicogenomics database: update 2017. Nucleic Acids Res. 2017; 45(D1):972–8.View ArticleGoogle Scholar
- Alpay M, Koroshetz WJ. Quetiapine in the treatment of behavioral disturbances in patients with huntington’s disease. Psychosom. 2006; 47(1):70–2.View ArticleGoogle Scholar
- Bonelli RM, Mayr BM, Niederwieser G, et al.Ziprasidone in huntington’s disease: the first case reports. J Psychopharmacol. 2003; 17(4):459–60.View ArticlePubMedGoogle Scholar
- Duff K, Beglinger LJ, O’Rourke ME, et al.Risperidone and the treatment of psychiatric, motor, and cognitive symptoms in huntington’s disease. Ann Clin Psychiatry. 2008; 20(1):1–3.View ArticlePubMedPubMed CentralGoogle Scholar
- Brusa L, Orlacchio A, Moschella V, et al.Treatment of the symptoms of huntington’s disease: preliminary results comparing aripiprazole and tetrabenazine. Mov Disord. 2009; 24(1):126–9.View ArticlePubMedGoogle Scholar
- Paleacu D, Anca M, Giladi N. Olanzapine in huntington’s disease. Acta Neurol Scand. 2002; 105(6):441–4.View ArticlePubMedGoogle Scholar
- Van Vugt J, Siesling S, Vergeer M, et al.Clozapine versus placebo in huntington’s disease: a double blind randomised comparative study. J Neurol Neurosurg Psychiatr. 1997; 63(1):35–9.View ArticleGoogle Scholar
- Ardizzoni A, Boni L, Tiseo M, et al.Cisplatin-versus carboplatin-based chemotherapy in first-line treatment of advanced non–small-cell lung cancer: an individual patient data meta-analysis. J Natl Cancer Inst. 2007; 99(11):847–57.View ArticlePubMedGoogle Scholar
- Martoni A, Melotti B, Guaraldi M, Pannuti F. Activity of high-dose epirubicin in advanced non-small cell lung cancer. European J Cancer Clin Oncol. 1991; 27(10):1231–4.View ArticleGoogle Scholar
- Dziadziuszko R, Ardizzoni A, Postmus P, et al.Temozolomide in patients with advanced non-small cell lung cancer with and without brain metastases: a phase ii study of the eortc lung cancer group (08965). European J Cancer. 2003; 39(9):1271–6.View ArticleGoogle Scholar
- Pani PP, Trogu E, Amato L, Davoli M. Antidepressants for the treatment of depression in alcohol dependent individuals. The Cochrane Library. 2010.Google Scholar
- ClinicalTrials.gov. Disulfiram combined with lorazepam for treatment of patients with alcohol dependence and primary or secondary anxiety disorder. Technical report, ClinicalTrials.gov (NCT number:NCT00721526). 2012.Google Scholar
- Clinicaltrials.gov. Temozolomide for relapsed sensitive or refractory small cell lung cancer. Technical report, ClinicalTrials.gov (NCT number: NCT00740636). 2012.Google Scholar
- Rustin G, Shreeves G, Nathan P, et al.A phase ib trial of ca4p (combretastatin a-4 phosphate), carboplatin, and paclitaxel in patients with advanced cancer. British J Cancer. 2010; 102(9):1355–60.View ArticleGoogle Scholar
- Tadokoro J-i, Kakihata K, Shimazaki M, et al.Post-marketing surveillance (pms) of all patients treated with irinotecan in japan: clinical experience and adr profile of 13 935 patients. Jpn J Clin Oncol. 2011; 41(9):1101–11.View ArticlePubMedGoogle Scholar
- Yamashita JI, Ogawa M, Shirakusa T. Plasma endothelin-1 as a marker for doxorubicin cardiotoxicity. Int J cancer. 1995; 62(5):542–7.View ArticlePubMedGoogle Scholar
- Gridelli C, D’Aprile M, Curcio C, et al.Carboplatin plus epirubicin plus vp-16, concurrent ‘split course’ radiotherapy and adjuvant surgery for limited small cell lung cancer. Lung Cancer. 1994; 11(1–2):83–91.View ArticlePubMedGoogle Scholar
- Rastegar-Mojarad M, Ye Z, Kolesar JM, et al.Opportunities for drug repositioning from phenome-wide association studies. Nat Biotechnol. 2015; 33(4):342–5.View ArticlePubMedGoogle Scholar