Data sources
The benchmark dataset (see Additional file 3) used in this manuscript was downloaded from [21, 26, 27]. A brief description of each source follows.
The miRNA-miRNA functional similarity data
The miRNA-miRNA functional similarity scores were downloaded from http://cmbi.bjmu.edu.cn/misim/ [26]. In this dataset, a functional similarity score for each miRNA pair is calculated based on the observation that genes with similar functions are often associated with similar diseases. The miRNA functional similarity scores have been successfully used to infer novel human miRNA-disease associations in [22].
The disease phenotype similarity data
We downloaded the disease phenotype similarity scores from MimMiner [27], developed by van Driel et al., who computed a similarity score for each phenotype pair by text mining the phenotype descriptions in the Online Mendelian Inheritance in Man (OMIM) database [28]. The phenotypic similarity scores have been successfully used to predict or prioritize disease-related protein-coding genes [29, 30].
The human miRNA-disease association data
We downloaded the 270 known, experimentally verified miRNA-disease associations provided in [21]. Nineteen of the miRNAs could not be found in [26]. After removing the associations involving these 19 miRNAs, we obtained 242 verified miRNA-disease associations covering 99 miRNAs and 51 disease phenotypes.
Method description
We denote the miRNA set as $M = \{m_1, m_2, \ldots, m_n\}$ and the phenotype set as $P = \{p_1, p_2, \ldots, p_m\}$. The miRNA-disease associations can then be described as a bipartite MP graph $G(M, P, E)$, where $E = \{e_{ij} : m_i \in M, p_j \in P\}$. A link is drawn between $m_i$ and $p_j$ when the miRNA $m_i$ is associated with the phenotype $p_j$. The MP bipartite network can be represented by an $n \times m$ adjacency matrix $\{a_{ij}\}$, where $a_{ij} = 1$ if $m_i$ and $p_j$ are linked; all other miRNA-disease pairs are unknown and are labeled 0, marking them as the pairs to be predicted. We define $M$ ($n \times n$), $P$ ($m \times m$), and $a$ ($n \times m$) as the adjacency matrices of the miRNA functional similarity network, the disease phenotype similarity network, and the miRNA-disease association network, respectively.
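The construction of the association matrix can be sketched in a few lines of Python. This is a minimal illustration only: the function name `build_adjacency` and the toy miRNA and disease names are hypothetical and are not taken from the benchmark dataset.

```python
def build_adjacency(mirnas, phenotypes, associations):
    """Return the n x m 0/1 matrix a, with a[i][j] = 1 for each known pair."""
    row = {name: i for i, name in enumerate(mirnas)}
    col = {name: j for j, name in enumerate(phenotypes)}
    # Every pair starts at 0 (unknown, to be predicted).
    a = [[0] * len(phenotypes) for _ in mirnas]
    for mirna, phenotype in associations:
        a[row[mirna]][col[phenotype]] = 1
    return a


# Hypothetical toy data: 3 miRNAs, 2 phenotypes, 2 known associations.
mirnas = ["miR-A", "miR-B", "miR-C"]
phenotypes = ["disease-X", "disease-Y"]
known = [("miR-A", "disease-X"), ("miR-C", "disease-Y")]
a = build_adjacency(mirnas, phenotypes, known)
```

Rows index miRNAs and columns index phenotypes, matching the $n \times m$ convention used in the text.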
MicroRNA-based similarity inference (MBSI)
The basic idea of this method is: if a miRNA is associated with a disease, then other miRNAs similar to the miRNA will be recommended to be associated with the disease. For an MP pair $m_i$-$p_j$, a linkage between $m_i$ and $p_j$ is determined by the following predicted score:
(1)
where $S(m_i, m_l)$ is the miRNA functional similarity value between miRNAs $m_i$ and $m_l$.
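As an illustration of the MBSI idea, the sketch below scores a pair $(m_i, p_j)$ as a similarity-weighted average of the known associations of the other miRNAs, i.e. $\sum_{l \neq i} S(m_i, m_l)\, a_{lj} / \sum_{l \neq i} S(m_i, m_l)$. This normalized form is an assumption made for the example and may differ from the exact expression in Equation (1); the function name `mbsi_score` and the toy matrices are hypothetical.

```python
def mbsi_score(S, a, i, j):
    """Similarity-weighted vote of the other miRNAs for the pair (m_i, p_j).

    S is the n x n miRNA similarity matrix, a the n x m association matrix.
    ASSUMPTION: a normalized weighted average that skips l == i.
    """
    num = 0.0
    den = 0.0
    for l in range(len(S)):
        if l == i:
            continue  # a miRNA does not vote for itself
        num += S[i][l] * a[l][j]
        den += S[i][l]
    return num / den if den > 0 else 0.0


# Hypothetical toy data: 3 miRNAs, 2 phenotypes.
S = [[1.0, 0.8, 0.2],
     [0.8, 1.0, 0.5],
     [0.2, 0.5, 1.0]]
a = [[1, 0],
     [0, 0],
     [0, 1]]
score = mbsi_score(S, a, 1, 0)  # second miRNA against first phenotype
```

In this toy case the second miRNA has no known associations, so its score for the first phenotype comes entirely from its similarity to the first miRNA.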
Phenotype-based similarity inference (PBSI)
The basic idea of this method is: if a miRNA is associated with a disease, then the miRNA will be recommended to be associated with other similar diseases. For an MP pair $m_i$-$p_j$, a linkage between $m_i$ and $p_j$ is determined by the following predicted score:
(2)
where $S(p_j, p_l)$ is the disease phenotype similarity value between diseases $p_j$ and $p_l$.
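The PBSI idea can be sketched in the same way, now averaging over the other phenotypes: $\sum_{l \neq j} S(p_j, p_l)\, a_{il} / \sum_{l \neq j} S(p_j, p_l)$. As before, this normalized form is an assumption made for illustration and may not match Equation (2) exactly; `pbsi_score` and the toy data are hypothetical.

```python
def pbsi_score(S_p, a, i, j):
    """Similarity-weighted vote of the other phenotypes for the pair (m_i, p_j).

    S_p is the m x m phenotype similarity matrix, a the n x m association matrix.
    ASSUMPTION: a normalized weighted average that skips l == j.
    """
    num = 0.0
    den = 0.0
    for l in range(len(S_p)):
        if l == j:
            continue  # a phenotype does not vote for itself
        num += S_p[j][l] * a[i][l]
        den += S_p[j][l]
    return num / den if den > 0 else 0.0


# Hypothetical toy data: 2 miRNAs, 3 phenotypes.
S_p = [[1.0, 0.6, 0.1],
       [0.6, 1.0, 0.3],
       [0.1, 0.3, 1.0]]
a = [[1, 0, 0],
     [0, 0, 1]]
score = pbsi_score(S_p, a, 0, 1)  # first miRNA against second phenotype
```

Here the first miRNA is known to be associated only with the first phenotype, so the second phenotype inherits a score in proportion to its similarity to the first.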
Network-consistency-based inference (NetCBI)
The basic idea of network consistency is that, if miRNAs are ranked by their relevance to a query miRNA, and phenotypes are ranked by their relevance to the hidden target phenotype of the query miRNA, the top-ranked miRNAs and the top-ranked disease phenotypes should be highly connected by known associations. Unlike the two inference methods above, NetCBI integrates the miRNA-miRNA functional similarity network data and the disease phenotype similarity network data. The idea of network consistency has been successfully used to predict gene-phenotype associations in [24], and the theoretical foundation of the algorithm can be traced back to [25]. We formulate miRNA-disease association discovery as a graph query problem. The query miRNA is represented by a binary vector $m = [m_1, m_2, \ldots, m_n]^T$ denoting the miRNA membership against the miRNA set, i.e. $m_i = 1$ if miRNA $i$ is the query miRNA and $m_i = 0$ otherwise. Similarly, the list of target phenotypes is given by another binary vector $p = [p_1, p_2, \ldots, p_m]^T$, where phenotype $j$ is a target phenotype if $p_j = 1$.
To make full use of global network similarity information, we compute the global relevance score between the query miRNA m and all the miRNAs based on the graph Laplacian of the miRNA functional similarity network M(n*n). We first normalize M as , where i is the column number of M. A vector of graph Laplacian scores is derived from:
(3)
In Equation (3), the first term is a smoothness penalty, which forces connected miRNAs to receive similar scores, and the second term ensures consistency with the query miRNA. The parameter α ∈ (0, 1) balances the contributions of the two terms. The closed-form solution to Equation (3) is
(4)
Similarly, graph Laplacian scores can be derived to measure the relevance between the phenotypes and the target phenotype p, with the closed-form solution
(5)
where $\bar{P}$ is the normalized P and the parameter β ∈ (0, 1).
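The global relevance scores can be illustrated with a small dependency-free sketch. Two assumptions are made here and should be read as such: the similarity matrix is normalized symmetrically as $D^{-1/2} W D^{-1/2}$ (with $D$ the diagonal matrix of row sums), and the scores are the fixed point of $s = \alpha \bar{W} s + (1 - \alpha) q$, which equals $(1 - \alpha)(I - \alpha \bar{W})^{-1} q$ in closed form. This is the form commonly used in graph-based ranking; the normalization and equations actually used may differ. The function names `normalize` and `laplacian_scores` and the toy matrix are hypothetical.

```python
import math


def normalize(W):
    """Symmetric normalization D^{-1/2} W D^{-1/2} (ASSUMED form)."""
    d = [sum(row) for row in W]
    n = len(W)
    return [[W[i][j] / math.sqrt(d[i] * d[j]) if d[i] > 0 and d[j] > 0 else 0.0
             for j in range(n)] for i in range(n)]


def laplacian_scores(W, query, alpha=0.5, tol=1e-12, max_iter=10000):
    """Iterate s <- alpha * W_bar s + (1 - alpha) * query until it stabilizes.

    The iteration converges to (1 - alpha) * (I - alpha * W_bar)^{-1} query
    for alpha in (0, 1), so no explicit matrix inverse is needed.
    """
    W_bar = normalize(W)
    n = len(W)
    s = [float(x) for x in query]
    for _ in range(max_iter):
        new = [alpha * sum(W_bar[i][k] * s[k] for k in range(n))
               + (1 - alpha) * query[i] for i in range(n)]
        if max(abs(x - y) for x, y in zip(new, s)) < tol:
            return new
        s = new
    return s


# Hypothetical toy similarity network of 3 miRNAs; the first is the query.
W = [[1.0, 0.8, 0.2],
     [0.8, 1.0, 0.5],
     [0.2, 0.5, 1.0]]
query = [1, 0, 0]
scores = laplacian_scores(W, query, alpha=0.5)
```

The same function applies unchanged to the phenotype similarity network, with β in place of α and a phenotype indicator vector as the query. Iteration is used instead of a matrix inverse only to keep the example free of external libraries.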
Our method uses network consistency to measure whether the query miRNA m and a target phenotype p show coherent association with the known miRNA-phenotype associations. Specifically, given the graph Laplacian scores $\tilde{m}$, which rank the miRNAs by their relevance to the query miRNA m, and the graph Laplacian scores $\tilde{p}$, which rank the phenotypes by their relevance to the hidden target phenotype p, NetCBI measures whether the associations given by a connect miRNAs and phenotypes with similar scores in $\tilde{m}$ and $\tilde{p}$. We go through each phenotype in turn and compute a Pearson correlation coefficient score against the query miRNA m for each case:
(6)
Finally, the phenotype(s) with the highest score(s) are chosen as the target phenotype(s).
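The final scoring step can be sketched as follows. One plausible reading, assumed here, is that the miRNA relevance scores are projected onto the phenotypes through the association matrix ($a^T$ times the miRNA scores) and then correlated with the phenotype relevance scores; the exact arguments of the correlation in Equation (6) may differ. The helper names `pearson` and `netcbi_score` and all numbers are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sx = math.sqrt(sum((u - mx) ** 2 for u in x))
    sy = math.sqrt(sum((v - my) ** 2 for v in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def netcbi_score(m_scores, p_scores, a):
    """ASSUMED form: correlate p_scores with a^T m_scores."""
    projected = [sum(a[i][j] * m_scores[i] for i in range(len(a)))
                 for j in range(len(p_scores))]
    return pearson(projected, p_scores)


# Hypothetical relevance scores for 3 miRNAs and 3 phenotypes.
m_scores = [0.7, 0.2, 0.1]
a = [[1, 0, 0],
     [1, 1, 0],
     [0, 0, 1]]
p_scores_first = [0.80, 0.15, 0.05]  # scores when the first phenotype is the target
p_scores_third = [0.05, 0.15, 0.80]  # scores when the third phenotype is the target
score_first = netcbi_score(m_scores, p_scores_first, a)
score_third = netcbi_score(m_scores, p_scores_third, a)
```

In this toy case the query miRNA's neighborhood is linked mainly to the first phenotype, so the first candidate receives the higher score and would be ranked first.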