Inferring miRNA-disease associations using collaborative filtering and resource allocation on a tripartite graph
BMC Medical Genomics volume 14, Article number: 225 (2021)
Abstract
Background
Developing efficient computational methods to infer potential miRNA-disease associations is urgently needed and has attracted many computer scientists in recent years, because miRNAs are involved in many important biological processes and verifying miRNA-disease associations through biological experiments is expensive and time-consuming.
Methods
In this paper, we propose a new method to infer miRNA-disease associations using collaborative filtering and resource allocation algorithms on a miRNA-disease-lncRNA tripartite graph. It combines the collaborative filtering algorithm of the CFNBC model, which addresses the problem of imbalanced data, with the association prediction approach of the TPGLDA model, which builds on multiple types of known associations among multiple objects.
Results
The experimental results showed that our proposed method achieved a reliable performance, with area under the ROC curve (AUC) and area under the precision-recall curve (AUPR) values of 0.9788 and 0.9373, respectively, under five-fold cross-validation. It outperformed previous methods such as DCSMDA and TPGLDA. Furthermore, it derived 8, 19 and 14 new miRNA-disease associations among the top 40 predicted associations in the case studies of Prostatic Neoplasms, Heart Failure, and Glioma, respectively. All of these newly predicted associations have been confirmed by recent literature. In addition, it could discover new associations for new diseases (or miRNAs) without any known associations, as demonstrated in the case study of open-angle glaucoma.
Conclusion
Because it reliably infers new associations between miRNAs and diseases and can also discover associations for new diseases (or miRNAs) without any known associations, our proposed method can be considered a powerful tool for inferring miRNA-disease associations.
Background
MicroRNA (miRNA) is a small RNA of about 22–26 nucleotides that belongs to the non-coding RNA class [1]. Recent research has shown that miRNAs are involved in many crucial biological processes such as cell differentiation, proliferation, signal transduction, and viral infection [2]. Identifying miRNA-disease associations could not only help us understand disease mechanisms at the miRNA level but also facilitate the detection of disease biomarkers and the discovery of drugs for disease diagnosis, treatment, prognosis, and prevention. It has been confirmed that dysregulation of miRNAs is associated with the development and progression of various complex human diseases [3,4,5,6]. To date, only a few miRNA-disease associations are known in comparison with the number of newly discovered miRNAs. It is also expensive and time-consuming to verify miRNA-disease associations through biological experiments. Therefore, developing effective computational methods to predict potential miRNA-disease associations is urgently needed and has attracted many computer scientists in recent years [7].
Recently, various computational methods for predicting potential miRNA-disease associations have been developed. For example, Liu et al. [8] proposed the PBMDA prediction model, which integrated known human miRNA-disease associations, miRNA functional similarity, disease semantic similarity and Gaussian interaction profile kernel similarity for miRNAs and diseases. They constructed a heterogeneous graph and adopted a depth-first search algorithm to identify probable miRNA-disease associations. Chen et al. [9] presented a model called Graphlet Interaction for miRNA-Disease Association prediction (GIMDA), which predicts miRNA-disease associations by measuring the graphlet interaction among miRNAs and among diseases. A graphlet is a type of subgraph with a few connections in a large network. GIMDA performed well but was significantly time-consuming. Liang et al. [10] proposed a miRNA-disease association prediction method based on adaptive multi-view multi-label learning (AMVML). It learned a new affinity graph for miRNAs and diseases from multiple data sources. However, the integration of unreliable similarity matrices might weaken its overall prediction accuracy. The above-mentioned methods rely strongly on known human miRNA-disease associations. Most existing methods need similarity matrices such as the disease semantic similarity matrix and the miRNA functional similarity matrix, but these are not directly related to miRNA-disease associations [11]. In addition, they have to deal with sparse similarity matrices, which affects prediction accuracy [12]. Another problem is that miRNA-target interactions usually have high false-positive and false-negative rates [9, 13].
In fact, diseases are caused by the disturbance of a complex of multiple interacting biomolecules rather than by the abnormality of a single biomolecule. The functionally dependent molecular components in human cells form a complex biological network, in which lncRNAs and proteins are important parts of human tissues and cells. For this reason, some recent computational methods use multiple types of known associations or interactions among multiple objects to predict potential miRNA-disease associations. For example, Zhao et al. [7] developed a computational method based on a distance correlation set to predict miRNA-disease associations (DCSMDA) by integrating known lncRNA-disease associations, known miRNA-lncRNA associations, disease semantic similarity, and various lncRNA and disease similarity measures. DCSMDA did not require known miRNA-disease associations, but it required the calculation of various similarity matrices and its performance depended on a pre-given threshold parameter. Mørk et al. [14] relied on known miRNA–protein associations and known protein–disease associations to infer miRNA–disease associations. Sumathipala and Weiss [15] integrated miRNA-gene, protein–protein, and gene-disease network information into a multi-level complex network to predict and prioritize biologically relevant miRNAs for diseases. Ji et al. [16] constructed a heterogeneous information network by integrating the known associations among lncRNAs, drugs, proteins, diseases, and miRNAs. They then employed a network embedding method, which learns graph representations with global structural information, to predict miRNA-disease associations. In general, computational methods that predict miRNA-disease associations from multiple types of known associations among multiple objects are helpful for improving prediction accuracy.
However, the number of known associations among biological objects is very limited in comparison with the number of objects of each type. Therefore, these models also have to deal with the data sparsity problem.
In recent years, a variety of recommender systems based on collaborative filtering have been developed to increase the reliability of association prediction. These methods rely on prior actions to predict user-item relationships and thereby address the problem of scarce known associations among different objects [17, 18]. To date, recommender algorithms have been incorporated into several computational prediction models to identify potential disease-related biological objects. For example, Yu et al. [19] proposed a collaborative filtering model for lncRNA-disease association prediction based on the Naïve Bayesian classifier. Shen et al. [2] predicted miRNA-disease associations with a Collaborative Matrix Factorization model, which was biased towards miRNAs with more known associated diseases. Li et al. [11] presented a collaborative filtering-based miRNA-disease association prediction model (CFMDA). CFMDA was straightforward and robust, considering a minimal amount of related information and defining no tunable parameters. However, CFMDA's prediction performance was limited because it relies only on miRNA-disease associations to make predictions.
To address the data sparsity problem and to take advantage of integrating multiple types of known associations among multiple objects, in this paper we propose a new method to infer miRNA-disease associations using collaborative filtering and resource allocation algorithms on a tripartite graph. Our method combines the collaborative filtering algorithm of the CFNBC model introduced by Yu et al. [19], which addresses the problem of imbalanced data, with the association prediction approach based on multiple types of known associations among multiple objects presented in the TPGLDA model introduced by Ding et al. [20] and in the model from our former study [21]. Firstly, we constructed a tripartite graph based on the known miRNA-disease associations, the known lncRNA-disease associations, and the known miRNA-lncRNA interactions. Secondly, we used a collaborative filtering algorithm to recommend miRNAs for lncRNAs and diseases, respectively. Next, we employed a resource allocation algorithm to infer miRNA-disease associations. Finally, we ranked all candidate miRNAs for each disease in descending order to suggest miRNA-disease associations for future experimental verification. Our method achieved a reliable prediction performance under five-fold cross-validation, with an average area under the ROC curve (AUC) of 0.9788 and an average area under the precision-recall curve (AUPR) of 0.9373. It outperformed several previous methods such as DCSMDA [7] and TPGLDA [20].
Methods
Materials
In this paper, we used the datasets from the study of Zhao et al. [7]. We downloaded and used Additional files 1, 2, 3, 4, and 5 from that study. These datasets contain 190 diseases, 111 lncRNAs and 264 miRNAs, as described below:
Known lncRNA-miRNA associations
The known lncRNA-miRNA associations were collected in February 2017 from starBase v2.0 [22], which provided the most comprehensive experimentally confirmed lncRNA-miRNA interactions based on large-scale CLIP-Seq data. After eliminating duplicate values and erroneous data and removing lncRNAs not included in the DS2 dataset, we obtained the DS1 dataset, which contains 1880 known lncRNA-miRNA associations.
Known lncRNA-disease associations
The known lncRNA-disease associations were collected from 8842 known disease-lncRNA associations in the MNDR database [23] and 2934 known disease-lncRNA associations in the LncRNADisease database [24]. Because the disease names came from two different databases, diseases without any MeSH descriptors were eliminated and diseases with the same MeSH descriptors were merged. After also removing the lncRNAs not included in the lncRNA-miRNA dataset (DS1), 936 known associations between diseases and lncRNAs (DS2) remained.
Known disease-miRNA associations
The known human miRNA-disease associations were downloaded from the HMDD V2.0 database [25]. This dataset (DS3) contains 3252 high-quality miRNA-disease associations after we eliminated duplicate associations and miRNA-disease associations involving diseases or lncRNAs not contained in the DS1 or DS2 datasets.
Method overview
In this paper, we propose a new method to infer miRNA-disease associations. The flowchart of the proposed method is illustrated in Fig. 1. The method contains four main stages. In the first stage, we construct a tripartite graph G^{0} based on known miRNA-disease associations, known lncRNA-disease associations, and known miRNA-lncRNA interactions. The tripartite graph G^{0} is represented by three adjacency matrices, A^{0}_{MD}, A^{0}_{ML} and A^{0}_{DL}, where A^{0}_{MD} is the adjacency matrix between miRNAs and diseases, A^{0}_{ML} is the adjacency matrix between miRNAs and lncRNAs, and A^{0}_{DL} is the adjacency matrix between diseases and lncRNAs. In the second stage, to address the imbalanced data problem, we apply a collaborative filtering algorithm to G^{0} to obtain a tripartite graph G^{u}. G^{u} is represented by three adjacency matrices, A^{u}_{MD}, A^{u}_{ML} and A^{0}_{DL}, where A^{u}_{MD} and A^{u}_{ML} are obtained by updating A^{0}_{MD} and A^{0}_{ML} with the collaborative filtering algorithm. In the third stage, a resource allocation algorithm on G^{u} calculates the final resource score (Rscore_final) of the miRNA candidates for each disease. In the final stage, we rank the Rscore_final values of all miRNA candidates for each disease in descending order, so that a candidate with a greater Rscore_final is more likely to be verified in the future.
Construction of a tripartite graph G^{0}
Inspired by previous studies [19, 20] that inferred lncRNA-disease associations using a tripartite graph, we first construct a miRNA-disease-lncRNA tripartite graph G^{0} as follows:
Construction of known miRNA-disease association graph
Let M = {m_{k}; k = 1,…, n_{m}} denote the set of miRNAs and D = {d_{j}; j = 1,…, n_{d}} denote the set of diseases, where n_{m} and n_{d} are the numbers of miRNAs and diseases, respectively. We build an MD^{0} graph based on the known miRNA-disease associations. The MD^{0} graph is represented by a matrix A^{0}_{MD}, the adjacency matrix of known miRNA-disease associations. The entry A^{0}_{MD}(m_{k}, d_{j}) is the element in the kth row and jth column of A^{0}_{MD}, and A^{0}_{MD}(m_{k}, d_{j}) = 1 if miRNA m_{k} is associated with disease d_{j}; otherwise, A^{0}_{MD}(m_{k}, d_{j}) = 0.
Construction of known miRNA-lncRNA interaction graph
In the same way, let M = {m_{k}; k = 1,…, n_{m}} denote the set of miRNAs and L = {l_{i}; i = 1,…, n_{l}} denote the set of lncRNAs, where n_{m} and n_{l} are the numbers of miRNAs and lncRNAs, respectively. We obtain the ML^{0} graph, built on known miRNA-lncRNA interactions, and its adjacency matrix A^{0}_{ML}. The entry A^{0}_{ML}(m_{k}, l_{i}) is the element in the kth row and ith column of A^{0}_{ML}, and A^{0}_{ML}(m_{k}, l_{i}) = 1 if miRNA m_{k} interacts with lncRNA l_{i}; otherwise, A^{0}_{ML}(m_{k}, l_{i}) = 0.
Construction of known disease-lncRNA association graph
Similarly, let D = {d_{j}; j = 1,…, n_{d}} denote the set of diseases and L = {l_{i}; i = 1,…, n_{l}} denote the set of lncRNAs, where n_{d} and n_{l} are the numbers of diseases and lncRNAs, respectively. We obtain the DL^{0} graph, built on known disease-lncRNA associations, and its adjacency matrix A^{0}_{DL}. The entry A^{0}_{DL}(d_{j}, l_{i}) is the element in the jth row and ith column of A^{0}_{DL}, and A^{0}_{DL}(d_{j}, l_{i}) = 1 if disease d_{j} is associated with lncRNA l_{i}; otherwise, A^{0}_{DL}(d_{j}, l_{i}) = 0.
Construction of a tripartite graph G^{0}
By integrating the three graphs MD^{0}, ML^{0} and DL^{0}, we obtain a tripartite graph G^{0}. As mentioned before, G^{0} is represented by three adjacency matrices: A^{0}_{MD}, A^{0}_{ML} and A^{0}_{DL}.
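The construction above can be sketched in a few lines of Python. This is only an illustration on made-up toy data: the function name `build_adjacency` and the identifiers `m1`, `d1`, `l1`, and so on are hypothetical and do not come from the datasets DS1–DS3.

```python
def build_adjacency(edges, row_ids, col_ids):
    """Return a 0/1 adjacency matrix (list of lists).

    Rows follow row_ids, columns follow col_ids; an entry is 1 when the
    (row, column) pair appears in edges and 0 otherwise.
    """
    row_index = {r: i for i, r in enumerate(row_ids)}
    col_index = {c: j for j, c in enumerate(col_ids)}
    matrix = [[0] * len(col_ids) for _ in row_ids]
    for r, c in edges:
        matrix[row_index[r]][col_index[c]] = 1
    return matrix


# Hypothetical toy data (not taken from the paper's datasets).
mirnas = ["m1", "m2", "m3"]
diseases = ["d1", "d2"]
lncrnas = ["l1", "l2"]
md_edges = [("m1", "d1"), ("m2", "d1"), ("m3", "d2")]
ml_edges = [("m1", "l1"), ("m2", "l1"), ("m2", "l2")]
dl_edges = [("d1", "l1"), ("d2", "l2")]

# The three matrices that together represent the tripartite graph G^0.
A0_MD = build_adjacency(md_edges, mirnas, diseases)   # n_m x n_d
A0_ML = build_adjacency(ml_edges, mirnas, lncrnas)    # n_m x n_l
A0_DL = build_adjacency(dl_edges, diseases, lncrnas)  # n_d x n_l
```

Storing the graph as three separate matrices, rather than one large adjacency matrix, mirrors the representation used in the text and keeps the miRNA rows aligned across A^{0}_{MD} and A^{0}_{ML}.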
Construction of a tripartite graph G^{u}
In the tripartite graph G^{0}, the numbers of known associations between miRNAs and diseases and between miRNAs and lncRNAs are small. Consequently, for any given lncRNA node l_{i} and disease node d_{j}, the number of miRNA nodes associated with both l_{i} and d_{j} will be very small. To improve this, we use a collaborative filtering algorithm to recommend suitable miRNA nodes to the corresponding lncRNA nodes and disease nodes, respectively. Since a recommender system may involve various input data including users and items [18], in our proposed method we take lncRNAs and diseases as users and miRNAs as items. From the two adjacency matrices A^{0}_{ML} and A^{0}_{MD} obtained above, we construct another adjacency matrix A^{0}_{MLD} = [A^{0}_{ML}, A^{0}_{MD}] by splicing A^{0}_{ML} and A^{0}_{MD} together, which is possible because both matrices have the same number of rows. Each row vector of A^{0}_{MLD} consists of the corresponding row vectors of A^{0}_{ML} and A^{0}_{MD}, while each column vector of A^{0}_{MLD} is the same as a column vector of A^{0}_{ML} or A^{0}_{MD}.
On the basis of A^{0}_{MLD} and the tripartite graph G^{0}, we obtain a co-occurrence matrix R^{m × m}, in which the entry R(m_{k}, m_{r}) is the element in the kth row and rth column of R^{m × m}, and R(m_{k}, m_{r}) = 1 if and only if miRNA m_{k} and miRNA m_{r} have at least one common neighboring node in G^{0}; otherwise R(m_{k}, m_{r}) = 0. The common neighboring node can be an lncRNA or a disease in G^{0}. A similarity matrix R^{nor} can then be calculated by normalizing R^{m × m} as in the following equation:
where k and r are miRNA indices. \(\left|N\left({m}_{k}\right)\right|\) indicates the number of known lncRNAs and diseases associated with m_{k} in G^{0}, that is, the number of elements equal to 1 in the kth row of A^{0}_{MLD}. \(\left|N\left({m}_{r}\right)\right|\) indicates the number of known lncRNAs and diseases associated with m_{r} in G^{0}, that is, the number of elements equal to 1 in the rth row of A^{0}_{MLD}. |N(m_{k}) ∩ N(m_{r})| indicates the number of known lncRNAs and diseases associated with both miRNA m_{k} and miRNA m_{r} in G^{0}.
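The normalization equation itself does not appear in this copy of the text. As a hedged illustration, the sketch below assumes the cosine-style form commonly used in item-based collaborative filtering, R^{nor}(m_k, m_r) = |N(m_k) ∩ N(m_r)| / sqrt(|N(m_k)| · |N(m_r)|), which uses exactly the three quantities defined above. The assumed formula, the function names and the toy matrices are illustrative and are not confirmed by the source.

```python
import math


def splice(a_ml, a_md):
    """Concatenate A^0_ML and A^0_MD row by row to form A^0_MLD."""
    return [row_l + row_d for row_l, row_d in zip(a_ml, a_md)]


def normalized_similarity(a_mld):
    """Cosine-style miRNA-miRNA similarity (ASSUMED form of R^nor).

    N(m_k) is the set of column indices equal to 1 in row k of A^0_MLD.
    Pairs with no common neighbor keep similarity 0, matching the
    co-occurrence matrix R described in the text.
    """
    n = len(a_mld)
    neighbors = [{j for j, v in enumerate(row) if v == 1} for row in a_mld]
    r_nor = [[0.0] * n for _ in range(n)]
    for k in range(n):
        for r in range(n):
            common = len(neighbors[k] & neighbors[r])
            if common > 0:
                denom = math.sqrt(len(neighbors[k]) * len(neighbors[r]))
                r_nor[k][r] = common / denom
    return r_nor


# Hypothetical toy matrices: 3 miRNAs, 2 lncRNAs, 2 diseases.
A0_ML = [[1, 0], [1, 1], [0, 0]]
A0_MD = [[1, 0], [1, 0], [0, 1]]
A0_MLD = splice(A0_ML, A0_MD)
R_nor = normalized_similarity(A0_MLD)
```

In this toy example, the first two miRNAs share one lncRNA and one disease, so their similarity is 2 / sqrt(2 · 3), while the third miRNA shares no neighbor with either and gets similarity 0. Under this assumed form, the diagonal entry equals 1 for any miRNA with at least one neighbor; the source does not say how the diagonal is treated.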
Based on the similarity matrix R^{nor} and the adjacency matrix A^{0}_{MLD}, we calculate a new recommender matrix A^{u}_{MLD} as follows:
Specifically, for a particular lncRNA l_{i} or disease d_{j} in G^{0}, if there is a miRNA m_{k} satisfying A^{0}_{MLD}(m_{k}, l_{i}) = 1 or A^{0}_{MLD}(m_{k}, d_{j}) = 1 in A^{0}_{MLD}, then we first calculate the sum of all elements in the ith or jth column of A^{u}_{MLD}, respectively, and from it the average value P. Next, if the ith or jth column of A^{u}_{MLD} contains a miRNA \({m}_{\theta }\) which satisfies A^{u}_{MLD}(\({m}_{\theta }\), l_{i}) > P or A^{u}_{MLD}(\({m}_{\theta }\), d_{j}) > P, then we recommend miRNA \({m}_{\theta }\) for lncRNA l_{i} or disease d_{j}, respectively. We also add a new edge between \({m}_{\theta }\) and l_{i}, or between \({m}_{\theta }\) and d_{j}, to the tripartite graph G^{0}.
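A minimal sketch of this recommendation step follows. It rests on two assumptions that the surviving text does not confirm: that the recommender matrix is the product A^{u}_{MLD} = R^{nor} × A^{0}_{MLD}, and that P is the column sum divided by the number of miRNAs. The function name `recommend` and the toy values are hypothetical.

```python
def recommend(r_nor, a0):
    """Return an updated 0/1 matrix with recommended edges added.

    r_nor: miRNA-miRNA similarity matrix (n_m x n_m).
    a0:    known adjacency matrix, miRNAs in rows, users (lncRNAs or
           diseases) in columns.
    ASSUMPTIONS: scores = r_nor x a0, and the threshold P is the mean
    of each score column over all miRNAs.
    """
    n_m, n_c = len(a0), len(a0[0])
    scores = [
        [sum(r_nor[k][t] * a0[t][c] for t in range(n_m)) for c in range(n_c)]
        for k in range(n_m)
    ]
    updated = [row[:] for row in a0]
    for c in range(n_c):
        # Only columns with at least one known miRNA are processed,
        # following the condition stated in the text.
        if not any(a0[k][c] == 1 for k in range(n_m)):
            continue
        p = sum(scores[k][c] for k in range(n_m)) / n_m
        for k in range(n_m):
            if scores[k][c] > p:
                updated[k][c] = 1  # add a recommended edge
    return updated


# Hypothetical similarity matrix and known associations.
R_nor = [[1.0, 0.8, 0.0],
         [0.8, 1.0, 0.0],
         [0.0, 0.0, 1.0]]
A0 = [[1, 0],
      [0, 0],
      [0, 1]]
A_u = recommend(R_nor, A0)
```

Here the second miRNA is similar to the first, which is known to be associated with the first column, so its score (0.8) exceeds the column average (0.6) and a new edge is added; the third miRNA gains nothing in that column.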
Finally, we obtain a tripartite graph G^{u}. The tripartite graph G^{u} contains three graphs, MD^{update}, ML^{update} and DL^{0}, and can be represented by three adjacency matrices: A^{u}_{MD}, A^{u}_{ML} and A^{0}_{DL}. MD^{update} is the graph obtained from MD^{0} after adding new edges between recommended miRNAs and diseases. ML^{update} is the graph obtained from ML^{0} after adding new edges between recommended miRNAs and lncRNAs. A^{u}_{MD} is the adjacency matrix of the MD^{update} graph. It contains 10,310 known and recommended associations and 39,850 remaining unknown associations. A^{u}_{ML} is the adjacency matrix of the ML^{update} graph.
Employing resource allocation process on the tripartite graph G^{u} to infer miRNA-disease associations
To infer miRNA-disease associations, we employ the resource allocation algorithm on the tripartite graph G^{u} as described in the following steps:
Step 1: Calculating resource allocation between miRNAs and diseases
For a specific miRNA m_{k}, we define the initial resources located on disease d_{j} as:
where n_{d} is the number of diseases.
Then we calculate the resource moved back from D to M by using a weight matrix W = {w_{kt}}_{n_{m} × n_{m}} to indicate the resource allocation process between miRNAs and diseases as follows:
where \({w}_{kt}\) is the contribution resource moved from tth node to kth node in M, and it can be understood as the similarity between miRNA m_{k} and miRNA m_{t} in MD^{update} graph. \(\mathit{deg}{A}_{MD}^{u}\left({m}_{k}\right)\) is the degree of miRNA m_{k} in MD^{update} graph and it represents the number of associated diseases for miRNA m_{k}. Similarly, \(\mathit{deg}{A}_{MD}^{u}\left({d}_{j}\right)\) is the degree of disease d_{j} in MD^{update} graph and it represents the number of associated miRNAs for disease d_{j}.
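Since the weight equation is not present in this copy, the following is only a sketch under an explicit assumption: it uses a bipartite-projection form, w_{kt} = (1 / deg(m_k)) · Σ_j A(m_k, d_j) · A(m_t, d_j) / deg(d_j), normalized by the two degrees named in the text. Other resource allocation variants normalize by the degree of the source node m_t instead, so the exact form should be checked against the original equation. The function name and toy matrix are hypothetical.

```python
def weight_matrix(a_md):
    """miRNA-miRNA weight matrix from a miRNA x disease 0/1 matrix.

    ASSUMED form:
        w[k][t] = (1 / deg(m_k)) * sum_j a[k][j] * a[t][j] / deg(d_j)
    where deg(m_k) is the row sum and deg(d_j) is the column sum.
    """
    n_m, n_d = len(a_md), len(a_md[0])
    deg_m = [sum(row) for row in a_md]
    deg_d = [sum(a_md[k][j] for k in range(n_m)) for j in range(n_d)]
    w = [[0.0] * n_m for _ in range(n_m)]
    for k in range(n_m):
        if deg_m[k] == 0:
            continue  # isolated miRNA: no shared resource
        for t in range(n_m):
            shared = sum(
                a_md[k][j] * a_md[t][j] / deg_d[j]
                for j in range(n_d)
                if deg_d[j] > 0
            )
            w[k][t] = shared / deg_m[k]
    return w


# Hypothetical toy matrix: 3 miRNAs x 2 diseases.
A_MD = [[1, 1],
        [1, 0],
        [0, 1]]
W = weight_matrix(A_MD)
```

With this normalization each row of W for a non-isolated miRNA sums to 1, which makes w_{kt} readable as the share of similarity that m_k assigns to m_t through their common diseases.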
Following the previous study [20], we also modify the resource allocation algorithm by considering the level of consistency between the contributions of resource transferred in both directions. This captures the impact of co-selection (m_{k}, m_{t}) between the contribution of resource from m_{k} to m_{t} and the contribution of resource from m_{t} to m_{k}. A consistency-based resource allocation that yields the final miRNA-disease weight matrix W’ = {w’_{kt}} can be defined as in the following equation:
By combining the final miRNA-disease weight matrix W’ with the adjacency matrix A^{u}_{MD}, we define a final resource Rscore_ondisease_1 located on D as follows:
Step 2: Calculating resource allocation between diseases and lncRNAs
Following the resource allocation between genes and diseases in TPGLDA [20], the same initial resources located on M nodes are allocated from nodes in M to nodes in D and then moved back, and the final resource matrix Rscore_ondisease_2 located on D nodes is given by:
where \(\mathrm{deg}{A}_{DL}^{0}\left({l}_{i}\right)={\sum }_{j=1}^{{n}_{d}}{A}_{DL}^{0}({d}_{j}, {l}_{i})\) is the number of related diseases for lncRNA l_{i} or the degree of lncRNA l_{i} in DL^{0} graph. \(\mathrm{deg}{A}_{DL}^{0}\left({d}_{j}\right)\)=\({\sum }_{i=1}^{{n}_{l}}{A}_{DL}^{0}({d}_{j}, {l}_{i})\) is the number of related lncRNAs for disease d_{j} or the degree of disease d_{j} in DL^{0} graph.
Step 3: Calculating the final resource score Rscore_final to infer the potential disease-related miRNAs
We calculate the final resource score Rscore_final, which is used to measure latent disease-related miRNAs, as follows:
where γ is a tunable parameter with value in [0, 1]. Our model achieves the best prediction performance when γ = 0.9.
Ranking all candidate miRNAs’ Rscores for each disease in descending order
Finally, we sort the Rscore_final values of all candidate miRNAs for each disease in descending order, so that a candidate with a higher score is more likely to be verified in the future.
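Step 3 and the ranking step can be sketched as below. The combining equation is missing from this copy, so the sketch assumes a convex combination, Rscore_final = γ · Rscore_ondisease_1 + (1 − γ) · Rscore_ondisease_2; which of the two scores is weighted by γ is an assumption, as are the function names and toy scores.

```python
def final_scores(r1, r2, gamma=0.9):
    """Combine two miRNA x disease score matrices.

    ASSUMED form: gamma * r1 + (1 - gamma) * r2, with gamma in [0, 1].
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    return [
        [gamma * a + (1.0 - gamma) * b for a, b in zip(row1, row2)]
        for row1, row2 in zip(r1, r2)
    ]


def rank_candidates(scores, mirnas, disease_index, known=None):
    """Return candidate miRNAs for one disease, highest score first.

    known: optional set of miRNAs already associated with the disease,
    which are excluded from the candidate list.
    """
    known = known or set()
    candidates = [
        (scores[k][disease_index], m)
        for k, m in enumerate(mirnas)
        if m not in known
    ]
    candidates.sort(key=lambda pair: pair[0], reverse=True)
    return [m for _, m in candidates]


# Hypothetical scores for three miRNAs and a single disease.
mirnas = ["m1", "m2", "m3"]
R1 = [[0.2], [0.8], [0.5]]
R2 = [[0.9], [0.1], [0.5]]
R_final = final_scores(R1, R2, gamma=0.9)
ranking = rank_candidates(R_final, mirnas, disease_index=0)
```

With γ = 0.9 the first score dominates, so `m2` is ranked first even though `m1` has the highest value in the second matrix.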
Results
Performance measures
To evaluate the performance of our method in inferring miRNA-disease associations, we performed five-fold cross-validation experiments and evaluated the area under the ROC curve (AUC) and the area under the precision-recall curve (AUPR), as described in the following sections:
Evaluating the AUC under five-fold cross-validation
After applying a collaborative filtering algorithm to the tripartite graph G^{0}, we obtained a tripartite graph G^{u} which contained three subgraphs: the MD^{update} graph, the ML^{update} graph and the DL^{0} graph. By employing the resource allocation algorithm on the tripartite graph G^{u}, we predicted potential miRNA-disease associations. To evaluate our model in terms of AUC [26], we compared the inferred miRNA-disease associations in the Rscore_final matrix with the adjacency matrix A^{u}_{MD} of the MD^{update} graph. In the MD^{update} graph, we considered the 10,310 known and recommended associations as positive samples and the 39,850 remaining unknown associations as negative samples. We then randomly divided all positive and negative samples into 5 equal parts to perform five-fold cross-validation. In each run, we used 4 parts of the positive and negative samples for training and the remaining part for testing, and the model was trained to recalculate Rscore_final. We computed the false positive rate (FPR) and true positive rate (TPR) with different γ values, where FPR is the proportion of negative samples that are incorrectly predicted as positive among all negative samples, and TPR is the proportion of positive samples that are correctly predicted as positive among all positive samples. The FPR and TPR are calculated by the following equations:
\[ \mathrm{FPR} = \frac{FP}{FP + TN}, \qquad \mathrm{TPR} = \frac{TP}{TP + FN} \]
where TP (true positive) means that a positive sample is correctly predicted as a positive sample; FN (false negative) means that a positive sample is incorrectly predicted as a negative sample; FP (false positive) means that a negative sample is incorrectly predicted as a positive sample; and TN (true negative) means that a negative sample is correctly predicted as a negative sample. We use TPR as the vertical axis and FPR as the horizontal axis to draw the receiver operating characteristic (ROC) curve [32], and the AUC value of our model reaches 0.9788 with γ = 0.9 after repeating the five-fold cross-validation experiment 10 times. Figure 2 shows the ROC curve with γ = 0.9 for one experimental run.
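As an illustration of how a ROC curve and its AUC follow from these definitions, the sketch below sweeps a decision threshold over predicted scores and integrates with the trapezoidal rule. The scores and labels are made up, the function names are hypothetical, and this is not the authors' evaluation code.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) points obtained by lowering the threshold.

    labels: 1 for positive samples, 0 for negative samples.
    Samples with tied scores are processed together so that the curve
    does not depend on their order.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(order):
        current = scores[order[i]]
        while i < len(order) and scores[order[i]] == current:
            if labels[order[i]] == 1:
                tp += 1  # positive sample predicted as positive
            else:
                fp += 1  # negative sample predicted as positive
            i += 1
        # FPR = FP / (FP + TN), TPR = TP / (TP + FN)
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under a piecewise-linear curve (trapezoidal rule)."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2.0
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Hypothetical predicted scores and ground-truth labels.
scores = [0.9, 0.8, 0.7, 0.6]
labels = [1, 0, 1, 0]
roc = roc_points(scores, labels)
roc_auc = auc(roc)
```

In this toy case three of the four positive-negative pairs are ordered correctly, which matches the computed area of 0.75.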
Evaluating the AUPR under five-fold cross-validation
As previously mentioned, the data used to evaluate our model are not balanced. Therefore, we also draw the precision-recall curve and calculate the AUPR to evaluate prediction performance [27]. Precision is the percentage of correctly predicted positive samples among all predicted positive samples, and Recall is the percentage of correctly predicted positive samples among all real positive samples. We calculate Precision and Recall as follows:
\[ \mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN} \]
After repeating the five-fold cross-validation experiment 10 times, our model achieves its best AUPR value of 0.9373 with γ = 0.9. Figure 3 shows the precision-recall curve with γ = 0.9 for one experimental run.
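A precision-recall curve can be traced in the same threshold-sweeping way. The sketch below uses made-up scores and a step-wise (average-precision style) area, which is one common convention; the source does not state which integration rule was used, and the function names are hypothetical.

```python
def pr_points(scores, labels):
    """Return (recall, precision) points obtained by lowering the threshold.

    labels: 1 for positive samples, 0 for negative samples.
    """
    pos = sum(labels)
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    points = []
    tp = fp = 0
    i = 0
    while i < len(order):
        current = scores[order[i]]
        while i < len(order) and scores[order[i]] == current:
            if labels[order[i]] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        # Recall = TP / (TP + FN), Precision = TP / (TP + FP)
        points.append((tp / pos, tp / (tp + fp)))
    return points


def aupr(points):
    """Step-wise area under the precision-recall curve (ASSUMED rule)."""
    area = 0.0
    previous_recall = 0.0
    for recall, precision in points:
        area += (recall - previous_recall) * precision
        previous_recall = recall
    return area


# Hypothetical predicted scores and ground-truth labels.
scores = [0.9, 0.8, 0.7, 0.6]
labels = [1, 0, 1, 0]
pr = pr_points(scores, labels)
pr_area = aupr(pr)
```

Unlike the ROC curve, the precision values depend directly on the ratio of positive to negative samples, which is why AUPR is the more informative measure when the data are imbalanced.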
Performance comparison with other related models
To demonstrate the advantage of our model, we compare its performance with that of the DCSMDA method proposed by Zhao et al. [7]. We also implemented miRNA-disease association prediction by applying the resource allocation process introduced in [20] without the collaborative filtering algorithm. The performances of these methods are shown in Table 1.
As can be seen, our proposed method achieves better AUC and AUPR values than both DCSMDA and the TPGLDA-based prediction of miRNA-disease associations. Because of the data sparsity problem, the AUC value is usually high. In our proposed method, however, the collaborative filtering algorithm increases the density of miRNA-disease associations, so the updated adjacency matrix A^{u}_{MD} becomes more balanced. As a result, the AUPR value (0.9373) is significantly improved compared with the AUPR value (0.7421) obtained when applying the TPGLDA model to predict miRNA-disease associations without collaborative filtering. This demonstrates that our model achieves a more reliable performance than the previous methods.
Case studies
In addition to the five-fold cross-validation experiments, we also carried out case studies with our proposed model by sorting all candidate miRNAs for each disease. These predictions can be used for further validation. Consistent with the previous study [20], all known and recommended miRNA-disease associations are used as training samples, and the Rscore_final for each potential miRNA-disease association is then calculated. A higher Rscore_final value indicates a more likely miRNA-disease association. Specifically, case studies on Prostatic Neoplasms, Heart Failure, Glioma and Open-angle Glaucoma were conducted to show the ability of our model to identify new disease-associated miRNAs.
Prostatic neoplasms, also known as prostate cancer, is the second-most prevalent type of cancer and the fifth-leading cause of cancer-related death in men [28]. miRNAs have been shown to play an important role in predicting the prognosis of prostate cancer. To date, a variety of miRNAs have been reported to be associated with Prostatic Neoplasms/prostate cancer. For example, a target gene of miR-653-5p represses the proliferation and invasion of prostate cancer cells [29]. The dual action of miR-125b as a tumor suppressor and oncomiR-22 promotes prostate cancer tumorigenesis [30]. As shown in Table 2, there are 8 new miRNA-disease associations among the top 40 miRNAs predicted by our proposed method. All 8 new miRNA-disease associations were confirmed by recent literature.
Heart failure (HF), also known as congestive heart failure (CHF) and congestive cardiac failure (CCF), occurs when the heart is unable to pump sufficiently to maintain blood flow to meet the body's needs. It is a widely prevalent syndrome imposing a significant burden of morbidity and mortality worldwide [31]. Unravelling the functional relevance of miRNAs within pathogenic pathways is a major challenge in cardiovascular research. Recently, numerous miRNAs have been reported to be associated with heart failure. For instance, plasma miR-126 levels are up-regulated in HF patients [32]. MicroRNA-34 family members (miR-34a, -34b, and -34c) are up-regulated in the heart in response to stress [33]. Local microRNA-133a down-regulation is associated with hypertrophy in the dyssynchronous heart [34]. Table 3 shows the top 40 heart failure-related miRNAs predicted by our proposed method. As can be seen, it contains 19 new miRNAs associated with heart failure. All of these predicted associations were confirmed by the literature.
Glioma is the most common central nervous system tumor and is associated with poor prognosis. Identifying effective diagnostic biomarkers for glioma is particularly important for optimizing treatment [35]. Many studies have shown that some miRNAs are correlated with the diagnosis and prognosis of gliomas. For example, miR-34a acts as a tumor suppressor by targeting many oncogenes related to proliferation, apoptosis, and invasion of gliomas [36]. MicroRNA (miR)-125b regulates cell growth and invasion in pediatric low-grade glioma [37]. MicroRNA-21 promotes migration and invasion of glioma cells via activation of Sox2 and β-catenin signaling [38]. Therefore, we chose glioma as a case study to demonstrate our model’s ability to predict associations between miRNAs and diseases. Table 4 lists the top 40 glioma-associated miRNAs inferred by our model. As illustrated, our proposed method uncovers 14 new miRNAs associated with glioma, all of which have been validated by the literature.
Glaucoma is the second leading cause of blindness in the United States of America [39]. The most common types of open-angle glaucoma (OAG) are primary open-angle glaucoma (POAG) and exfoliation glaucoma (XFG) [40]. Recent studies have shown that miRNAs may play a role in pathways implicated in glaucoma and act as biomarkers for disease pathogenesis [41]. In this paper, open-angle glaucoma is treated as an isolated disease because it is not associated with any miRNAs in the datasets used. Nevertheless, our proposed method can discover new associations for new diseases (or miRNAs) that have no known associations. As illustrated in Table 5, 11 of the top 20 open-angle glaucoma-related miRNAs predicted by our proposed method have been confirmed by recent literature.
Discussions
Although our proposed method achieved a reliable performance, it still has some limitations which require further research. Firstly, our method works on an unweighted tripartite graph, so it may be improved by weighting the known lncRNA-disease associations, known miRNA-disease associations, and verified lncRNA-miRNA interactions. Secondly, the resource allocation algorithm could be enhanced to integrate the updated lncRNA-miRNA interactions into the resource allocation process. Finally, the latest datasets should be collected to update our dataset library (Additional files 1, 2, 3, 4, 5).
Conclusion
In this paper, we proposed a new method to infer miRNA-disease associations using collaborative filtering and resource allocation on a miRNA-disease-lncRNA tripartite graph. Our proposed method improves prediction accuracy, addresses the data sparsity problem, and avoids relying on information that is subjective or not directly related to association prediction. The experimental results show that our method achieves a reliable performance, with AUC and AUPR values of 0.9788 and 0.9373, respectively, which is better than several of the previously mentioned methods. It demonstrates the ability to infer new associations between miRNAs and diseases, as indicated in the case studies of Prostatic Neoplasms, Heart Failure, and Glioma. In addition, it can discover new associations for new diseases (or miRNAs) without any known associations, as indicated in the case study of open-angle glaucoma. These results suggest that our method can be considered a powerful tool for predicting miRNA-disease associations.
Availability of data and materials
The datasets used in our research were collected from Zhao et al.'s study [7], https://doi.org/10.1186/s12859-018-2146-x.
Abbreviations
AUC: Area under ROC curve
AUPR: Area under precision-recall curve
FN: False negative
FP: False positive
FPR: False positive rate
TP: True positive
TPR: True positive rate
lncRNA: Long non-coding RNA
miRNA: Micro RNA
OAG: Open-angle glaucoma
POAG: Primary open-angle glaucoma
References
Ambros V. The functions of animal microRNAs. Nature. 2004;431(7006):350–5.
Shen Z, Zhang YH, Han K, Nandi AK, Honig B, Huang DS. MiRNA-disease association prediction with collaborative matrix factorization. Complexity. 2017;2017.
Chen X, Xie D, Zhao Q, You ZH. MicroRNAs and complex diseases: from experimental results to computational models. Brief Bioinform. 2019;20(2):515–39.
Giannopoulou E, Alves P, Tewari AK, Gerstein MB. Epigenetic repression of miR-31 disrupts androgen receptor homeostasis and contributes to prostate cancer progression. Cancer Res. 2014;73(3):1232–44.
Masson S, Batkai S, Beermann J, Bär C, Pfanne A, Thum S, et al. Circulating microRNA-132 levels improve risk prediction for heart failure hospitalization in patients with chronic heart failure. Eur J Heart Fail. 2018;20(1):78–85.
Shi L, Cheng Z, Zhang J, Li R, Zhao P, Fu Z, et al. Hsa-Mir-181a and Hsa-Mir-181B function as tumor suppressors in human glioma cells. Brain Res. 2008;1236:185–93.
Zhao H, Kuang L, Wang L, Ping P, Xuan Z, Pei T, et al. Prediction of microRNA-disease associations based on distance correlation set. BMC Bioinform. 2018;19(1):1–14.
Liu Y, Zeng X, He Z, Zou Q. Inferring MicroRNA-disease associations by random walk on a heterogeneous network with multiple data sources. IEEE/ACM Trans Comput Biol Bioinform. 2014;14(4):905–15.
Chen X, Guan NN, Li JQ, Yan GY. GIMDA: graphlet interaction-based MiRNA-disease association prediction. J Cell Mol Med. 2018;22(3):1548–61.
Liang C, Yu S, Luo J. Adaptive multi-view multi-label learning for identifying disease-associated candidate miRNAs. PLoS Comput Biol. 2019;15(4):1–18.
Li ZS, Liu B, Yan C. CFMDA: collaborative filtering-based MiRNA-disease association prediction. Multimed Tools Appl. 2019;78(1):605–18.
Yu SP, Liang C, Xiao Q, Li GH, Ding PJ, Luo JW. MCLPMDA: a novel method for miRNA-disease association prediction based on matrix completion and label propagation. J Cell Mol Med. 2019;23(2):1427–38.
Chen X, Niu YW, Wang GH, Yan GY. MKRMDA: multiple kernel learning-based Kronecker regularized least squares for MiRNA-disease association prediction. J Transl Med. 2017;15(1):1–14. https://doi.org/10.1186/s12967-017-1340-3.
Mørk S, Pletscher-Frankild S, Caro AP, Gorodkin J, Jensen LJ. Protein-driven inference of miRNA-disease associations. Bioinformatics. 2014;30(3):392–7.
Sumathipala M, Weiss ST. Predicting miRNA-based disease-disease relationships through network diffusion on multi-omics biological data. Sci Rep. 2020;10(1):1–12. https://doi.org/10.1038/s41598-020-65633-6.
Ji BY, You ZH, Cheng L, Zhou JR, Alghazzawi D, Li LP. Predicting miRNA-disease association from heterogeneous information network with GraRep embedding model. Sci Rep. 2020;10(1):1–12.
Sarwar B, Karypis G, Konstan J. Item-based collaborative filtering recommendation algorithms. In: WWW '01 Proceedings of the 10th international conference on world wide web. 2001;285–295.
Liu NN, He L, Zhao M. Social temporal collaborative ranking for context aware movie recommendation. ACM Trans Intell Syst Technol. 2013;4(1).
Yu J, Xuan Z, Feng X, Zou Q, Wang L. A novel collaborative filtering model for LncRNA-disease association prediction based on the Naïve Bayesian classifier. BMC Bioinform. 2019;20(1):1–13.
Ding L, Wang M, Sun D, Li A. TPGLDA: novel prediction of associations between lncRNAs and diseases via lncRNA-disease-gene tripartite graph. Sci Rep. 2018;8(1):1–11.
Nguyen VT, Le TTK, Tran DH. A new method on lncRNA-disease-miRNA tripartite graph to predict lncRNA-disease associations. In: KSE2020 [Internet]. IEEE; 2020. p. 287–93. https://ieeexplore.ieee.org/document/9287563
Li J, Liu S, Zhou H, Qu L, Yang J. starBase v2.0: decoding miRNA-ceRNA, miRNA-ncRNA and protein-RNA interaction networks from large-scale CLIP-Seq data. Nucl Acids Res. 2014;42(December 2013):92–7. https://doi.org/10.1093/nar/gkt1248
Cui T, Zhang L, Huang Y, Yi Y, Tan P, Zhao Y, et al. MNDR v2.0: an updated resource of ncRNA-disease associations in mammals. Nucl Acids Res. 2018;46(D1):D371–4. https://doi.org/10.1093/nar/gkx1025
Bao Z, Yang Z, Huang Z, Zhou Y, Cui Q, Dong D. LncRNADisease 2.0: an updated database of long non-coding RNA-associated diseases. Nucl Acids Res. 2019;47(D1):D1034–7. https://doi.org/10.1093/nar/gky905
Li Y, Qiu C, Tu J, Geng B, Yang J, Jiang T. HMDD v2.0: a database for experimentally supported human microRNA and disease associations. Nucl Acids Res. 2014;42(November 2013):1070–4. https://doi.org/10.1093/nar/gkt1023
Hajian-Tilaki K, et al. Receiver operating characteristic (ROC) curve analysis for medical diagnostic test evaluation. Casp J Intern Med. 2013;4(2):627–35.
Saito T, Rehmsmeier M. The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. PLoS ONE. 2015;1–21.
McGuire S. World Cancer Report 2014. Geneva, Switzerland: World Health Organization, International Agency for Research on Cancer, WHO Press, 2015. Adv Nutr. 2016;7(2):418–9.
Fu Q, Sun Z, Yang F, Mao T, Gao Y, Wang H. SOX30, a target gene of miR-653-5p, represses the proliferation and invasion of prostate cancer cells through inhibition of Wnt/β-catenin signaling. Cell Mol Biol Lett. 2019;24(1):1–13.
Budd WT, Seashols-Williams SJ, Clark GC, Weaver D, Calvert V, Petricoin E, et al. Dual action of miR-125b as a tumor suppressor and OncomiR-22 promotes prostate cancer tumorigenesis. PLoS ONE. 2015;10(11):1–21.
Chen Y, Wang J, Sing K, Lee L, Wah O, Mark A. The association of heart failure-related microRNAs with neurohormonal signaling. BBA Mol Basis Dis. 2017;1863(8):2031–40. https://doi.org/10.1016/j.bbadis.2016.12.019.
Wei XJ, Han M, Yang FY, Wei GC, Liang ZG, Yao H, et al. Biological significance of miR-126 expression in atrial fibrillation and heart failure. Braz J Med Biol Res. 2015;48:983–9.
Bernardo BC, Gao XM, Winbanks CE, Boey EJH, Tham YK, Kiriazis H, et al. Therapeutic inhibition of the miR-34 family attenuates pathological cardiac remodeling and improves heart function. Proc Natl Acad Sci U S A. 2012;109(43):17615–20.
van Middendorp LB, Kuiper M, Munts C, Wouters P, Maessen JG, van Nieuwenhoven FA, et al. Local microRNA-133a downregulation is associated with hypertrophy in the dyssynchronous heart. ESC Hear Fail. 2017;4(3):241–51.
Zhou Q, Liu J, Quan J, Liu W, Tan H, Li W. MicroRNAs as potential biomarkers for the diagnosis of glioma: a systematic review and meta-analysis. Cancer Sci. 2018;109(9):2651–9.
Vaitkiene P, Pranckeviciene A, Stakaitis R, Steponaitis G, Tamasauskas A, Bunevicius A. Association of miR-34a expression with quality of life of glioblastoma patients: a prospective study. Cancers (Basel). 2019;11(3):1–11.
Yuan M, Da Silva ACAL, Arnold A, Okeke L, Ames H, Correa-Cerro LS, et al. MicroRNA (miR) 125b regulates cell growth and invasion in pediatric low grade glioma. Sci Rep. 2018;8(1):1–14.
Luo G, Luo W, Sun X, Lin J, Wang M, Zhang Y, et al. MicroRNA-21 promotes migration and invasion of glioma cells via activation of Sox2 and β-catenin signaling. Mol Med Rep. 2017;15(1):187–93.
Shaikh Y, Yu F, Coleman AL. Burden of undetected and untreated glaucoma in the United States. Am J Ophthalmol. 2014;158(6):1121–9. https://doi.org/10.1016/j.ajo.2014.08.023.
Drewry MD, Challa P, Kuchtey JG, Navarro I, Helwa I, Hu Y, et al. Differentially expressed microRNAs in the aqueous humor of patients with exfoliation glaucoma or primary open-angle glaucoma. Hum Mol Genet. 2018;27(7):1263–75.
Hindle AG, Thoonen R, Jasien JV, Grange RMH, Amin K, Wise J, et al. Identification of candidate miRNA biomarkers for glaucoma. Investig Ophthalmol Vis Sci. 2019;60(1):134–46.
Qin W, Xie W, Yang X, Yang K, Zhou Q, Meng C. Downregulation of miR-34a promotes the cell proliferation and inhibits apoptosis in glaucoma. Int J Clin Exp Pathol. 2016;9(2):1368–75.
Acknowledgements
Not applicable.
About this supplement
This article has been published as part of BMC Medical Genomics Volume 14 Supplement 3 2021: Selected articles from the 19th Asia Pacific Bioinformatics Conference (APBC 2021): medical genomics. The full contents of the supplement are available at https://bmcmedgenomics.biomedcentral.com/articles/supplements/volume-14-supplement-3.
Funding
Publication costs are funded by the Vietnam Ministry of Education and Training, project No. B2021SPH01. The funders did not play any role in the design of the study, the collection, analysis, and interpretation of data, or in writing of the manuscript.
Author information
Contributions
VTN, TTKL, DHT, TQVN conceived and designed the study; VTN, DHT performed computational analyses; VTN, TTKL collected data and performed experiments. VTN wrote the first draft of the manuscript. All authors contributed to writing the paper, read, and approved the final manuscript.
Ethics declarations
Ethics approval and consent to participate
Not applicable. The study does not involve human subjects and uses only public data.
Consent for publication
Not applicable.
Competing interests
The authors declare that there is no competing interest in relation to the publication of this article.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Additional file 1: For known lncRNA-disease associations.
Additional file 2: For known lncRNA-disease associations.
Additional file 3: For known lncRNA-disease associations.
Additional file 4: For known lncRNA-miRNA associations.
Additional file 5: For known disease-miRNA associations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Nguyen, V.T., Le, T.T.K., Nguyen, T.Q.V. et al. Inferring miRNA-disease associations using collaborative filtering and resource allocation on a tripartite graph. BMC Med Genomics 14 (Suppl 3), 225 (2021). https://doi.org/10.1186/s12920-021-01078-8
DOI: https://doi.org/10.1186/s12920-021-01078-8