MiRNA-disease interaction prediction based on kernel neighborhood similarity and multi-network bidirectional propagation

Background Studies have shown that miRNAs are functionally associated with the development of many human diseases, but the roles of miRNAs in diseases and their underlying molecular mechanisms are not yet fully understood. Research on miRNA-disease interactions has therefore received increasing attention. Compared with complex and costly biological experiments, computational methods can predict potential miRNA-disease interactions rapidly and efficiently, and can serve as a useful supplement to experimental methods. Results In this paper, we propose a novel computational model based on kernel neighborhood similarity and multi-network bidirectional propagation (KNMBP) for miRNA-disease interaction prediction, especially for new miRNAs and new diseases. First, we integrate multiple data sources of diseases and miRNAs, respectively, to construct a novel disease semantic similarity network and a novel miRNA functional similarity network. Second, based on the modified miRNA-disease interactions, we use the kernel neighborhood similarity algorithm to calculate the disease kernel neighborhood similarity and the miRNA kernel neighborhood similarity. Finally, we use a bidirectional propagation algorithm to predict miRNA-disease interaction scores based on the integrated disease similarity network and miRNA similarity network. Under 5-fold cross validation, KNMBP achieves an AUC of 0.93126 for all interactions on the commonly used dataset, and the AUC values for all interactions, for all miRNAs and for all diseases are 0.93795, 0.86363 and 0.86937 on another dataset extracted by ourselves, which are higher than those of other state-of-the-art methods. In addition, our model has good parameter robustness. The case study further demonstrates the predictive performance of the model for novel miRNA-disease interactions.
Conclusions Our KNMBP algorithm efficiently integrates multiple omics data from miRNAs and diseases to stably and efficiently predict potential miRNA-disease interactions. It is anticipated that KNMBP would be a useful tool in biomedical research.


Background
MicroRNAs (miRNAs) are a category of single-stranded small non-coding RNAs (~22 nt) that play important roles in gene regulation via interference at the post-transcriptional level [1,2]. In the past decades, miRNAs have been found in eukaryotes and viruses as well as prokaryotes [3]. Previous research has shown that miRNAs are related to several human diseases, such as cancer, Alzheimer's disease and diabetes mellitus [4][5][6]. For example, miR-375 was found to be significant in the growth of pancreatic islets and their response to metabolic stress [7]. miR-21 negatively regulates Pdcd4, which can suppress TPA-induced neoplastic transformation [8]. miRNA-200 was detected in the metastasis of gastric adenocarcinoma cells [9]. miR-146a is a tumor suppressor that inhibits NF-κB activity, which is related to the promotion and suppression of tumor growth [10].
Wang et al. [11] constructed a directed acyclic graph (DAG) to describe each disease based on its MeSH descriptors. They then calculated disease semantic similarity from the DAGs and combined it with the known miRNA-disease interactions to construct miRNA functional similarity, which was also used to preliminarily infer new potential functions or related diseases of miRNAs. Xu et al. [12] proposed a support vector machine (SVM) to predict interactions between miRNAs and tumors. However, because current databases rarely provide lists of non-cancer miRNAs, negative samples are lacking, and such a supervised learning model is not well suited to large-scale disease-miRNA interaction prediction.
The miRNA-disease interaction prediction problem can be regarded as a classification problem that lacks negative samples. Accordingly, a large number of network-based semi-supervised methods have been proposed, most of which rest on the assumption that similar miRNAs (diseases) are more likely to interact with the same disease (miRNA). Chen et al. [13] adopted random walk with restart (RWRMDA) to predict potential miRNA-disease interactions: the walk restarts from the known miRNA-disease interaction network and proceeds on the miRNA functional similarity network. Since the restart operator of RWRMDA is based on the known miRNA-disease interaction network, this method does not apply to new diseases that are not associated with any miRNA. The regularized least squares algorithm (RLSMDA) was also proposed by Chen et al. [14] in 2015 to predict miRNA-disease interactions. It uses both the disease semantic similarity and the miRNA functional similarity to calculate miRNA-disease interaction scores, and takes the weighted linear combination of the two scores as the final result. By predicting with the disease similarity network and the miRNA similarity network simultaneously, the method improves prediction accuracy and enhances the predictive power of the model to some extent. However, the model is highly dependent on its parameters, and the difficulty of setting appropriate parameters is its main defect. Subsequently, in 2018, Chen et al. [15] released a graph regression model that predicts miRNA-disease interactions by using singular value decomposition (SVD) to decompose the interaction matrix, the disease similarity matrix and the miRNA similarity matrix, and then using partial least squares (PLS) to perform graph regression in the interaction space, the miRNA similarity space and the disease similarity space.
SVD decomposition and PLS regression can eliminate noise to a certain extent, but it also causes information loss, which leads to the reduction of model accuracy. Recently, Chen et al. proposed two novel models: the hierarchical clustering recommendation algorithm [16] (BNPMDA) and the low rank matrix decomposition [17] (IMCMDA) algorithm to predict potential miRNA-disease interactions. Both models have the advantage of fewer parameters, but the former uses only known miRNA-disease interaction networks for inference, so it cannot predict new miRNAs and new diseases, and the latter leads to a reduction in prediction accuracy due to matrix decomposition. The miRNA functional similarity used in the above algorithms is based on the method of Wang et al. [11], which depends on the known miRNA-disease interactions, so these models cannot predict new miRNAs.
Luo et al. [18] proposed a Kronecker regularized least squares method, which calculates miRNA functional similarity from the miRNA-gene interaction network and the gene weight network and combines it with disease semantic similarity to predict potential miRNA-disease interactions. The model enhances the predictive power for new miRNAs by integrating heterogeneous omics data of miRNAs, but it is highly dependent on the weight coefficients of the different similarity measurements, which greatly limits its generalization and practical applicability. Xiao et al. [19] constructed a graph regularized non-negative matrix factorization method, which decomposes the modified known miRNA-disease interaction network and uses miRNA functional similarity and disease semantic similarity to construct regularization operators for prediction. The model can predict new miRNAs and new diseases, but its larger number of parameters and stronger parameter dependence also reduce its performance. Both of these models use information outside the miRNA-disease interaction dataset to construct miRNA functional similarity, which enhances their ability to predict new miRNAs. However, they only use MeSH descriptors to describe disease similarity, resulting in a sparse disease similarity network, which limits the predictive performance of the models.
Here, we propose a new framework, kernel neighborhood similarity and multi-network bidirectional propagation (KNMBP), which uses multiple omics data to infer unknown miRNA-disease interactions. KNMBP uses disease-gene interactions, disease-biological process interactions and disease semantic information to construct a novel disease semantic similarity network, and uses miRNA-target interactions and gene weight networks to construct a novel miRNA functional similarity network. Unlike previous methods, the miRNA functional similarity and disease semantic similarity calculated in this paper do not use the known miRNA-disease interactions, but instead extract more feature information about miRNAs and diseases from other up-to-date datasets, which greatly expands our ability to predict new miRNAs and diseases. Accumulated research [15,20] shows that the known miRNA-disease interaction network also contains important feature information about miRNAs and diseases, and that reasonable use of this information can enhance the prediction ability of a model. With these considerations in mind, we use the kernel-based neighborhood similarity algorithm on the modified miRNA-disease interactions to calculate the disease kernel neighborhood similarity and the miRNA kernel neighborhood similarity. Finally, based on the integrated miRNA (disease) similarity network, we construct a bidirectional propagation model to predict potential miRNA-disease interaction scores. The experimental results show that KNMBP not only predicts new interactions, new miRNAs and new diseases well, but also has the advantage of parameter robustness.

Methods overview
To predict unknown miRNA-disease interactions, we propose a new KNMBP model with five steps, as shown in Fig. 1. First, we calculate miRNA functional similarity and disease semantic similarity using multiple omics data other than miRNA-disease interaction information (step 1 of Fig. 1). Second, based on the modified known miRNA-disease interaction network, we use the kernel-based neighborhood similarity model (KSNS) to calculate the disease kernel neighborhood similarity and the miRNA kernel neighborhood similarity (steps 2 and 3 of Fig. 1). Finally, based on the integrated miRNA (disease) similarity network obtained with the diffusion component analysis method clusDCA, we propose a bidirectional propagation algorithm to predict unknown miRNA-disease interaction scores (steps 4 and 5 of Fig. 1).

Dataset collection
In order to fairly compare the performance of the model, we used two benchmark datasets to conduct experiments.
For benchmark dataset I, we used the miRNA-disease interaction prediction dataset established by Chen et al. [16,17]. Dataset I consists of three parts. First, 5430 interactions between 383 diseases and 495 miRNAs were extracted from HMDD v2.0 [21]. Second, based on the Medical Subject Headings (MeSH) descriptors of the U.S. National Library of Medicine, two semantic similarity matrices of diseases were established by Wang et al. [11] and Xuan et al. [22], respectively. Third, the functional similarity matrix of miRNAs was established by Lu et al. [23]. All these data can be downloaded from https://github.com/IMCMDAsourcecode/IMCMDA. However, Dataset I is based on an old database version (HMDD v2.0), its disease semantic similarity is very sparse, and its miRNA functional similarity depends on the known miRNA-disease interactions. Therefore, we extracted information about miRNAs and diseases from several up-to-date databases and built benchmark dataset II. We describe the construction of Dataset II in three steps.
Fig. 1 The flow diagram of the KNMBP model. In Step 1 and Step 2, the red box indicates a disease, the grass green triangle indicates a gene, the circle indicates a miRNA, and the pentagon indicates the biological process corresponding to the disease; $SF_m$ and $SS_d$ represent the improved miRNA functional similarity and disease semantic similarity, respectively; WKNNP represents the weighted k-neighborhood profile algorithm used to preprocess the interaction matrix. In Step 3, $SI_m$ and $SI_d$ represent the miRNA kernel neighborhood similarity and the disease kernel neighborhood similarity, respectively. In Step 4, clusDCA represents the network fusion algorithm based on diffusion component analysis
First, we extracted information about diseases. The Comparative Toxicogenomics Database (CTD) is an important database for disease research that provides a wealth of information on interactions between diseases and chemicals, gene products, phenotypes and the environment [24]. Disease items in CTD are described by MeSH IDs. MeSH is a hierarchical vocabulary that provides a strict classification system for studying the relationships among diseases, and the relationships between any diseases can be illustrated by a directed acyclic graph (DAG). For example, the MeSH ID of the disease "Deletion Syndrome (Partial)" is "MeSH:C538288" in CTD; its parent diseases are "Chromosome Deletion" and "Chromosome Disorders", whose MeSH IDs are "MeSH:D002872" and "MeSH:D025063", respectively. To obtain a detailed description of the diseases, we downloaded 12,988 diseases, including their names, their multiple ID representations, and information about their parent nodes. Furthermore, we downloaded gene-disease interactions, comprising 25,114,553 interactions between 46,045 genes and 7163 diseases. We also downloaded disease-GO biological process interactions, comprising 1,727,119 interactions between 13,126 GO terms and 7116 diseases.
Second, we extracted information about miRNAs. To describe the relationships between miRNAs accurately, we extracted miRNA interaction information that is as complete as possible from multiple up-to-date databases. We obtained miRNA-gene interaction information from experimentally verified databases, including TarBase (version 8.0) [25], miRTarBase (version 7.0) [26], miRNAMAP (version 2.0) [27] and miRecord (version 4) [28]. DIANA-TarBase v8 is a reference database for indexing experimentally supported microRNA targets and has supported the field of non-coding RNA for more than a decade [25]. We downloaded 927,119 miRNA-gene interactions from this database; after removing non-human genes and converting the gene IDs into Entrez Gene identifiers, a total of 423,392 interactions between 18,345 genes and 1084 miRNAs were retained. Meanwhile, we performed ID conversion of the genes in the miRTarBase database, deleted the null miRNAs and target genes, and finally obtained 381,088 interactions between 2599 miRNAs and 15,064 genes. Similarly, we extracted 83,071 interactions between 1135 target genes and 471 miRNAs from miRNAMAP, and obtained 1269 interactions between 767 target genes and 203 miRNAs from miRecord. Based on miRBase [29], all of the above miRNAs were converted to version v22 using the R package 'miRBaseConverter', and null and duplicate miRNAs were deleted. After integration, a total of 588,134 interactions between 2814 miRNAs and 18,468 genes were obtained. In addition, Lee et al. [30] integrated 21 omics datasets from multiple organisms with a modified Bayesian approach and used log-likelihood scores to measure the probability that two genes share a true functional link. To build the gene similarity network, we downloaded the human weighted gene network from the HumanNet database, which contains the log-likelihood scores of 476,399 interactions among 16,243 genes.
Third, we extracted interaction information between miRNAs and diseases. The Human microRNA Disease Database (HMDD) collects a large number of human miRNA-disease interactions from genetics, epigenetics, circulating miRNA and miRNA-target interactions, and provides detailed annotation of miRNA-disease interactions [21]. On June 28, 2018, HMDD (version 3.0) [31] was released, which contains 200.2% as many human miRNA-disease interactions as the previous version and provides more evidence for classifying them. We extracted the disease information with MeSH IDs or OMIM IDs from HMDD v3.0, removed duplicate miRNA-disease interactions, and obtained 14,457 interactions between 1045 miRNAs and 627 diseases. To ensure that all miRNA similarities and all disease similarities can be calculated, we deleted the diseases and miRNAs that do not appear in the above two datasets, and finally obtained 10,561 interactions between 574 miRNAs and 579 diseases. The details of the two benchmark datasets are shown in Additional file 1.

Construction of disease semantic similarity network
In fact, most methods use MeSH descriptors to construct a directed acyclic graph for each disease and use the information shared between the graphs of different diseases to describe disease similarity, which leads to a sparse similarity network [16,17]. To construct a more reasonable disease semantic similarity, we make full use of various omics data. Protein-encoding genes can affect the pathogenesis of a disease to some extent [32], so disease-gene interactions also imply some features of the disease. Similarly, the gene ontology biological processes of a disease reflect some of its characteristics. In this paper, we combine the disease-gene interaction dataset (D-G), the disease-GO biological process interaction dataset (D-GO) and the MeSH descriptors of the diseases, and use the MultiSourcDSim model proposed by Lei et al. [33] to calculate the disease semantic similarity.
Based on the MeSH descriptors, a directed acyclic graph (DAG) can be used to describe the semantic relationships between diseases. Each disease $d$ in the DAG is characterized by its set of direct child nodes and its set of direct parent nodes, and $A(d)$ represents the set of all ancestor nodes of disease $d$.
First, combining a disease interaction dataset $c$ (D-G or D-GO) with the DAG, the frequency $FT_c(d)$ of any disease $d$ in the DAG can be calculated as
$$FT_c(d) = f_c(d) + \sum_{t \in \text{children}(d)} FT_c(t) \quad (1)$$
where $f_c(d)$ represents the frequency of $d$ in the interaction dataset $c$ and the sum runs over the direct child nodes of $d$. That is, the occurrence frequency of $d$ in the DAG equals the sum of the occurrence frequencies of all its direct child nodes plus its own frequency in the interaction dataset. The frequency of disease occurrence is then normalized as
$$PT_c(d) = \frac{FT_c(d)}{FT_c(\text{root})} \quad (2)$$
where $FT_c(\text{root})$ represents the occurrence frequency of the root node in the DAG. According to Eqs. 1 and 2, $0 \le PT_c(d) \le 1$. Following the principle that the more information two diseases share, the higher their similarity, the disease similarity $S_c(d_1, d_2)$ is then obtained (Eq. 3) from $COM(d_1, d_2)$, the set of minimum common ancestors of diseases $d_1$ and $d_2$, and it is easy to see that $0 \le S_c(d_1, d_2) \le 1$. From D-G and D-GO we thus obtain two disease similarity networks $\{S_c, c = 1, 2\}$. After that, clusDCA [34] was used to integrate the disease similarity networks, and the integrated semantic similarity network $SS_d$ was finally obtained.
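The frequency propagation and normalization steps described above can be illustrated with a short sketch. It is not the MultiSourcDSim implementation: the toy DAG, the counts, and the function names `propagate_frequency` and `normalize` are hypothetical and serve only to make the recursion concrete.

```python
from functools import lru_cache


def propagate_frequency(children, own_freq):
    """Return FT(d) = own_freq[d] + sum of FT over the direct children of d."""

    @lru_cache(maxsize=None)
    def ft(d):
        return own_freq.get(d, 0) + sum(ft(c) for c in children.get(d, ()))

    return {d: ft(d) for d in children}


def normalize(ft, root):
    """Divide every propagated frequency by the frequency of the root node."""
    return {d: value / ft[root] for d, value in ft.items()}


# Hypothetical toy DAG: root -> {A, B}, A -> {C}, B -> {C}
children = {"root": ("A", "B"), "A": ("C",), "B": ("C",), "C": ()}
# Hypothetical number of interactions recorded for each disease in one dataset
own_freq = {"root": 0, "A": 2, "B": 1, "C": 3}

FT = propagate_frequency(children, own_freq)
PT = normalize(FT, "root")
```

Because a node with several parents contributes to each of them, the root always has the largest propagated frequency, which is why every normalized value stays in [0, 1].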

Construction of miRNA functional similarity network
To remove the dependence of miRNA functional similarity on the known miRNA-disease interaction network, so that the algorithm can predict miRNAs that are not associated with any disease, we calculate the miRNA functional similarity following the methods of Luo [18] and Xiao [19]. Specifically, we use the miRNA-target gene interaction network and the gene similarity network to calculate miRNA similarity.
First, we normalized and symmetrized the log-likelihood scores between genes downloaded from HumanNet to obtain the gene similarity $S_g(g_i, g_j)$ between gene $g_i$ and gene $g_j$, where $LLS(i, j)$ represents the log-likelihood score between gene $g_i$ and gene $g_j$ and $MAX_{LLS}$ represents the maximum log-likelihood score. On this basis, we define the similarity $S_g(g_i, G)$ between any gene $g_i$ and any gene set $G$. Then, we obtain the functional similarity $SF_m(m_i, m_j)$ between miRNA $m_i$ and miRNA $m_j$, where $G_i$ represents the set of genes associated with $m_i$ and $|G_i|$ represents the number of genes in $G_i$.
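As a rough illustration of how such a similarity can be computed, the sketch below uses a best-match-average scheme. The exact formulas are those of Luo [18] and Xiao [19] and are not reproduced here; the sketch assumes that gene similarity is the log-likelihood score divided by the maximum score (with self-similarity 1), that the similarity between a gene and a gene set is the best pairwise match, and that two miRNAs are compared by averaging these best matches over both target sets. All function names and toy scores are hypothetical.

```python
def gene_similarity(lls, max_lls):
    """Build a symmetric gene similarity lookup from log-likelihood scores.

    Assumption: scores are scaled by the maximum score, and a gene is
    fully similar (1.0) to itself. Unlisted pairs get similarity 0.
    """
    sim = {}
    for (gi, gj), score in lls.items():
        sim[(gi, gj)] = sim[(gj, gi)] = score / max_lls

    def s_g(gi, gj):
        return 1.0 if gi == gj else sim.get((gi, gj), 0.0)

    return s_g


def gene_to_set(s_g, gene, gene_set):
    """Similarity between one gene and a gene set (assumed: best match)."""
    return max((s_g(gene, other) for other in gene_set), default=0.0)


def mirna_similarity(s_g, targets_i, targets_j):
    """Best-match average over the target gene sets of two miRNAs."""
    if not targets_i or not targets_j:
        return 0.0
    total = sum(gene_to_set(s_g, g, targets_j) for g in targets_i)
    total += sum(gene_to_set(s_g, g, targets_i) for g in targets_j)
    return total / (len(targets_i) + len(targets_j))


# Hypothetical toy scores between three genes
lls = {("g1", "g2"): 2.0, ("g2", "g3"): 4.0}
s_g = gene_similarity(lls, max_lls=4.0)
sf = mirna_similarity(s_g, {"g1", "g2"}, {"g2", "g3"})
```

The point of this construction is that it relies only on target genes and the gene network, so a miRNA with no known disease association still receives meaningful similarities.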

Kernel-based neighborhood similarity
Reasonable use of known miRNA-disease interaction information can greatly improve the performance of a model [17,18]. In this paper, based on the known miRNA-disease interactions, we use the kernel-based neighborhood similarity (KSNS) [35] to calculate the miRNA (disease) kernel neighborhood similarity. KSNS not only makes comprehensive use of the distance similarity and structural similarity of samples, but also fully exploits the nonlinear structural similarity between samples, and it has achieved good results in lncRNA-protein interaction prediction. In addition, to overcome the sparsity of the interaction matrix, Xiao et al. [19] proposed a weighted k-neighborhood profile (WKNNP) algorithm to preprocess the interaction matrix, which also achieved good results. Based on these two points, we first use WKNNP to preprocess the known interaction matrix, and then use KSNS to calculate the kernel neighborhood similarity of miRNAs (diseases).
Let the matrix $X$ with $NM$ rows and $ND$ columns represent the miRNA-disease interaction matrix. Then $X$ can be expressed as $X = [M_1; M_2; \cdots; M_{NM}] = [D_1, D_2, \cdots, D_{ND}]$, where $M_i$, the $i$th row vector of $X$, can be regarded as the interaction profile of miRNA $m_i$, and $D_j$, the $j$th column vector of $X$, can be regarded as the interaction profile of disease $d_j$.
According to the WKNNP algorithm, we use the K nearest neighbors of $m_i$ to enrich the interaction profile $M_i$, which yields the modified interaction profile $\hat{M}_i$ of $m_i$. Similarly, we obtain the disease modified interaction profile matrix $\hat{X}_d = [\hat{D}_1, \hat{D}_2, \cdots, \hat{D}_{ND}]$. The miRNA-side and disease-side profiles are finally combined into the modified interaction profile matrix $\hat{X}$.
Based on $\hat{X}$, we use KSNS to calculate the miRNA (disease) kernel neighborhood similarity. First, we construct the K-neighbor discriminant matrix of miRNAs from the miRNA functional similarity, where $N(m_i)$ represents the set of the $NK$ nearest miRNAs of $m_i$, $NK = \lfloor PN \times N \rfloor$, $PN$ denotes the neighbor proportion parameter, $N$ is the total number of samples, and $\lfloor \cdot \rfloor$ means rounding down. The weight matrix $W$ of miRNAs is then obtained by solving a constrained optimization problem, in which $\Phi(\cdot)$ denotes the kernel function, $\|\cdot\|_F$ represents the Frobenius norm, $\odot$ is element-wise multiplication, $\mu_1$ is the non-neighborhood control parameter, $\mu_2$ is the similarity regularization parameter, and $e = (1, 1, \ldots, 1)^T$. The first constraint requires the reconstruction weights of each sample to sum to 1, the second requires all elements of $W$ to be non-negative, and the third sets the self-similarity of each miRNA to 0. Using the Lagrange multiplier method and the Karush-Kuhn-Tucker (KKT) conditions, we derive an iterative update formula for $W$ that involves $k(X, X)$, the kernel matrix of $X$. In this paper, we select the Gaussian kernel function, where $k(x_i, x_j)$ is the kernel value of any two samples $x_i$ and $x_j$ and is controlled by a regularized bandwidth parameter. After that, we applied several normalization operations to the weight matrix $W$, using the diagonal matrix $D = \mathrm{diag}(d_1, d_2, \ldots, d_{NM})$, to obtain the miRNA kernel neighborhood similarity matrix $SI_m$. Similarly, we obtain the disease kernel neighborhood similarity $SI_d$.
Then clusDCA [34] was used to integrate the miRNA functional similarity $SF_m$ (disease semantic similarity $SS_d$) and the kernel neighborhood similarity $SI_m$ (kernel neighborhood similarity $SI_d$) to obtain the final miRNA similarity matrix $S_m$ (disease similarity matrix $S_d$).
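Two building blocks of this step, the K-neighbor discriminant matrix and the Gaussian kernel matrix, can be sketched as follows. This is an illustration under assumptions rather than the KSNS implementation: the discriminant matrix is taken to be a 0/1 indicator of the $NK$ most similar samples, the kernel is parameterized as exp(-gamma * squared distance) with a hypothetical bandwidth argument `gamma`, and the iterative solution for $W$ is not shown.

```python
import math


def neighbor_indicator(similarity, pn):
    """C[i][j] = 1 if j is among the NK most similar samples to i, else 0.

    NK = floor(pn * N), where pn is the neighbor proportion parameter.
    A sample is never counted as its own neighbor.
    """
    n = len(similarity)
    nk = math.floor(pn * n)
    C = [[0] * n for _ in range(n)]
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: similarity[i][j], reverse=True)
        for j in others[:nk]:
            C[i][j] = 1
    return C


def gaussian_kernel(X, gamma):
    """K[i][j] = exp(-gamma * ||x_i - x_j||^2) for all pairs of rows of X."""
    n = len(X)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(X[i], X[j]))
            K[i][j] = math.exp(-gamma * sq_dist)
    return K


# Hypothetical functional similarity between four miRNAs
sim = [[1.0, 0.9, 0.2, 0.1],
       [0.9, 1.0, 0.3, 0.4],
       [0.2, 0.3, 1.0, 0.8],
       [0.1, 0.4, 0.8, 1.0]]
C = neighbor_indicator(sim, pn=0.5)

# Hypothetical (modified) interaction profiles of three miRNAs
profiles = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
K = gaussian_kernel(profiles, gamma=0.5)
```

Neighbors are selected from the functional similarity, while the kernel is evaluated on the interaction profiles, which is how the two sources of information enter the weight matrix.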

Bidirectional propagation algorithm
Based on miRNA similarity, disease similarity and known miRNA-disease interaction information, we proposed a bidirectional propagation algorithm to predict the miRNA-disease interaction score.
Let $F \in R^{NM \times ND}$ be the miRNA-disease interaction score matrix. Then $F$ can be decomposed as $F = [FM_1^T; FM_2^T; \cdots; FM_{NM}^T] = [FD_1, FD_2, \cdots, FD_{ND}]$, where $FM_i^T$ represents the predicted interaction scores of miRNA $m_i$ with all diseases, and $FD_j$ denotes the predicted interaction scores of disease $d_j$ with all miRNAs. Based on the hypothesis that miRNAs with higher similarity are more likely to interact with the same disease, we obtain the objective function of Eq. 16, in which $\|F - Y\|_F^2$ represents the overall prediction error, which is required to be as small as possible, and $\lambda_m$ and $\lambda_d$ are the Laplacian regularization parameters of miRNAs and diseases, respectively. The objective is minimized with a gradient algorithm based on the derivative of Eq. 16 with respect to $F$. To speed up the optimization, we use the AdaGrad algorithm [34] to adaptively choose the gradient step size. The details of the optimization algorithm for the proposed bidirectional propagation model are described in Algorithm 1.
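The optimization can be sketched as follows. Since Eq. 16 is not reproduced here, the sketch assumes a standard Laplacian-regularized form, $\|F - Y\|_F^2 + \lambda_m \mathrm{tr}(F^T L_m F) + \lambda_d \mathrm{tr}(F L_d F^T)$, with unnormalized Laplacians $L = D - S$; the function names, the toy matrices and the hyperparameter values are hypothetical, and this is not Algorithm 1 itself.

```python
import math


def laplacian(S):
    """Unnormalized graph Laplacian L = D - S of a symmetric similarity matrix."""
    n = len(S)
    return [[(sum(S[i]) if i == j else 0.0) - S[i][j] for j in range(n)]
            for i in range(n)]


def matmul(A, B):
    """Plain matrix product of two lists of lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def objective(F, Y, Lm, Ld, lam_m, lam_d):
    """Assumed objective: fit error plus miRNA-side and disease-side smoothness."""
    fit = sum((f - y) ** 2 for rf, ry in zip(F, Y) for f, y in zip(rf, ry))
    LmF, FLd = matmul(Lm, F), matmul(F, Ld)
    smooth_m = sum(f * g for rf, rg in zip(F, LmF) for f, g in zip(rf, rg))
    smooth_d = sum(f * g for rf, rg in zip(F, FLd) for f, g in zip(rf, rg))
    return fit + lam_m * smooth_m + lam_d * smooth_d


def propagate(Y, Sm, Sd, lam_m=0.1, lam_d=0.1, lr=0.1, steps=200, eps=1e-8):
    """Minimize the assumed objective with AdaGrad, starting from F = Y."""
    Lm, Ld = laplacian(Sm), laplacian(Sd)
    F = [row[:] for row in Y]
    G = [[0.0] * len(Y[0]) for _ in Y]  # accumulated squared gradients
    for _ in range(steps):
        LmF, FLd = matmul(Lm, F), matmul(F, Ld)
        for i in range(len(F)):
            for j in range(len(F[0])):
                grad = (2 * (F[i][j] - Y[i][j])
                        + 2 * lam_m * LmF[i][j] + 2 * lam_d * FLd[i][j])
                G[i][j] += grad * grad
                F[i][j] -= lr * grad / (math.sqrt(G[i][j]) + eps)
    return F


# Hypothetical toy problem: 3 miRNAs x 2 diseases; miRNA 1 has no known interaction
Y = [[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]]
Sm = [[0.0, 0.8, 0.1], [0.8, 0.0, 0.2], [0.1, 0.2, 0.0]]
Sd = [[0.0, 0.5], [0.5, 0.0]]
F = propagate(Y, Sm, Sd)
```

In this toy example the miRNA without known interactions receives its larger score for the disease linked to its most similar miRNA, which is the behavior the propagation is meant to produce for new miRNAs.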

Experimental settings
To evaluate the performance of the KNMBP algorithm fairly, we performed 5-fold cross-validation (CV) on Dataset I and Dataset II, and compared KNMBP with the following methods: IMCMDA [17], BNPMDA [16], RLSMDA [14], KRLSM [18] and RWRMDA [13]. Specifically, for each method, we performed CV four times, each time with a different random seed, and took the mean of the AUC values under the different seeds as the final AUC value of the method. The miRNA-disease interaction matrix $Y \in R^{NM \times ND}$ has $NM$ rows for miRNAs and $ND$ columns for diseases. We carried out three types of CV as follows [36]: (1) $CV_a$: CV on all miRNA-disease pairs. To ensure that the known interactions were evenly distributed, we randomly divided the known and unknown interactions into five equal parts; each part was selected as the test set in turn, and the associations it contained were removed to form the training set. (2) $CV_m$: CV on miRNAs (row vectors in $Y$). All miRNAs were randomly divided into five equal parts; each part was selected as the test set in turn, and its associations were removed to form the training set. (3) $CV_d$: CV on diseases (column vectors in $Y$). All diseases were randomly divided into five equal parts; each part was selected as the test set in turn, and its associations were removed to form the training set.
In each cross-validation experiment, under $CV_a$, 80% of the elements of $Y$ are used as the training set and the remaining 20% as the test set; under $CV_m$, 80% of the rows of $Y$ are used as the training set and the remaining 20% as the test set; under $CV_d$, 80% of the columns of $Y$ are used as the training set and the remaining 20% as the test set. In Dataset I, since the disease semantic similarity matrix is sparse and the miRNA functional similarity relies on known miRNA-disease interactions, most methods only perform the $CV_a$ experiment. Therefore, we only perform $CV_a$ on Dataset I, and perform all three types of CV on Dataset II.
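The three splitting schemes can be sketched as follows. This is a minimal illustration, assuming that held-out cells are simply set to 0 in the training matrix; the function names `five_fold`, `mask_training` and `cv_folds` and the mode labels are hypothetical.

```python
import random


def five_fold(items, seed, k=5):
    """Shuffle items with the given seed and deal them into k folds."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def mask_training(Y, test_cells):
    """Copy Y and set the held-out cells to 0 so they are hidden in training."""
    train = [row[:] for row in Y]
    for i, j in test_cells:
        train[i][j] = 0
    return train


def cv_folds(Y, mode, seed=0):
    """Yield (train_matrix, test_cells) for 'pairs' (CV_a), 'rows' (CV_m) or 'cols' (CV_d)."""
    n_rows, n_cols = len(Y), len(Y[0])
    cells = [(i, j) for i in range(n_rows) for j in range(n_cols)]
    if mode == "pairs":
        # Known and unknown pairs are split separately so that the known
        # interactions are spread evenly over the folds.
        known = five_fold([c for c in cells if Y[c[0]][c[1]] == 1], seed)
        unknown = five_fold([c for c in cells if Y[c[0]][c[1]] == 0], seed)
        folds = [a + b for a, b in zip(known, unknown)]
    elif mode == "rows":
        folds = [[(i, j) for i in rows for j in range(n_cols)]
                 for rows in five_fold(range(n_rows), seed)]
    elif mode == "cols":
        folds = [[(i, j) for j in cols for i in range(n_rows)]
                 for cols in five_fold(range(n_cols), seed)]
    else:
        raise ValueError(f"unknown mode: {mode}")
    for test_cells in folds:
        yield mask_training(Y, test_cells), test_cells


# Hypothetical 5 x 5 interaction matrix
Y = [[1, 0, 0, 1, 0],
     [0, 1, 0, 0, 0],
     [0, 0, 1, 0, 1],
     [1, 0, 0, 0, 0],
     [0, 0, 0, 1, 0]]
row_folds = list(cv_folds(Y, "rows", seed=1))
```

Repeating the procedure with different `seed` values and averaging the resulting AUC values corresponds to the repeated CV runs described above.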

Cross validation
For each CV, we calculated the predicted interaction scores of the test set with the above six methods, and normalized all the predicted interaction scores as
$$PS'(i, j) = \frac{PS(i, j) - minPS}{maxPS - minPS}$$
where $PS(i, j)$ represents the predicted interaction score of miRNA $m_i$ and disease $d_j$, $minPS$ represents the minimum value of $PS$, and $maxPS$ represents the maximum value of $PS$. Then, the interval [0,1] is divided into 1000 equal parts, each division point is selected as a threshold in turn, and the True Positive Rate (TPR, sensitivity) and False Positive Rate (FPR, 1 − specificity) are calculated under each threshold. After that, we calculate the mean TPR and FPR for each threshold under CV and draw the corresponding ROC curve. Figure 2 shows the optimal AUC and the corresponding ROC curve of each model under CV. The optimal parameters of KNMBP and the corresponding AUC values are shown in Additional file 2.
In the above experiment, $CV_a$ tests the predictive performance of the model for new interactions, and $CV_m$ and $CV_d$ test the predictive performance for new miRNAs and new diseases, respectively. As Fig. 2 shows, our method (KNMBP) achieves the best prediction results. Specifically, on Dataset I, the AUC value of KNMBP for $CV_a$ reaches 0.93126, which is 9.67, 5.69, 11.57, 3.41, and 10.31% higher than RWRMDA, RLSMDA, BNPMDA, KRLSM, and IMCMDA, respectively. On Dataset II, the AUC value of KNMBP for $CV_a$ reaches 0.93795, which is 7.97, 3.58, 13.68, 5.31 and 16.49% higher than the other five methods, respectively. Since BNPMDA, which is based on a binary recommendation algorithm, needs known miRNA-disease interactions to perform resource allocation, it cannot predict new miRNAs or new diseases [16]. RWRMDA, which performs a random walk with restart on the miRNA similarity network, is also not suitable for predicting new diseases [13]. Therefore, RLSMDA, KRLSM and IMCMDA were selected as comparison algorithms under $CV_d$, where the AUC value of KNMBP reaches 0.86363, which is 7.66, 25.577 and 12.93% higher than the other three methods (RLSMDA, KRLSM, IMCMDA). For $CV_m$, the AUC of KNMBP reaches 0.86937, which is 0.62, 0.67, 11.09, 5.31 and 12.68% higher than the other four methods (RWRMDA, RLSMDA, KRLSM, IMCMDA), respectively.
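The evaluation procedure for a single fold can be sketched as follows, assuming min-max normalization of the scores and trapezoidal integration of the ROC curve; the function names are hypothetical, and averaging TPR and FPR across folds is left out for brevity.

```python
def min_max(scores):
    """Rescale scores to [0, 1]; assumes the scores are not all equal."""
    lo, hi = min(scores), max(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def roc_auc(scores, labels, n_thresholds=1000):
    """ROC points over evenly spaced thresholds in [0, 1] and trapezoidal AUC.

    labels are 1 for known interactions and 0 otherwise; both classes
    must be present.
    """
    scaled = min_max(scores)
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]  # anchor the curve at the origin
    for t in range(n_thresholds + 1):
        thr = t / n_thresholds
        tp = sum(1 for s, y in zip(scaled, labels) if s >= thr and y == 1)
        fp = sum(1 for s, y in zip(scaled, labels) if s >= thr and y == 0)
        points.append((fp / neg, tp / pos))  # (FPR, TPR)
    points.sort()
    auc = sum((x2 - x1) * (y1 + y2) / 2
              for (x1, y1), (x2, y2) in zip(points, points[1:]))
    return points, auc


# Hypothetical scores: the two true interactions are ranked first
_, auc_perfect = roc_auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])
# Hypothetical scores: the two true interactions are ranked last
_, auc_worst = roc_auc([0.1, 0.2, 0.8, 0.9], [1, 1, 0, 0])
```

With a fixed grid of thresholds, the TPR and FPR values of different folds are aligned point by point, which is what makes averaging the curves across folds straightforward.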

Parametric sensitivity analysis
In machine learning, the optimal parameter combination may differ greatly between experimental scenarios, and parameter selection may have a large impact on the performance of a model, so sensitivity analysis of the parameters is often very important. In this section, we focus on the influence of four parameters on the prediction performance of the model: the neighbor proportion parameter $PN$, the Laplacian regularization parameter $\lambda = \lambda_m = \lambda_d$, the non-neighborhood control parameter $\mu_1$ and the similarity regularization parameter $\mu_2$. Let $F_{cv=c}(PN = i, \lambda = j, \mu_1 = s, \mu_2 = t)$ represent the AUC value of the KNMBP algorithm when $cv = c$, $c \in \{1, 2, 3, 4\}$ is performed and the parameters are set to $PN = i$, $\lambda = j$, $\mu_1 = s$, $\mu_2 = t$. To facilitate the visualization of the results, for each type of CV we combined the four parameters in pairs and analyzed the influence of each pair on the predicted results of the model. First, we consider the influence of the neighbor proportion parameter $PN$ and the Laplacian regularization parameter $\lambda$. When $PN = i$, $\lambda = j$, and the other two parameters change arbitrarily, we calculate the maximum AUC value of KNMBP ($maxAUC^c_{i,j}$), the average AUC value ($meanAUC^c_{i,j}$) and the minimum AUC value ($minAUC^c_{i,j}$) as
$$maxAUC^c_{i,j} = \max_{\mu_1 \in \forall,\, \mu_2 \in \forall} F_{cv=c}(PN = i, \lambda = j, \mu_1, \mu_2)$$
$$meanAUC^c_{i,j} = \mathop{\mathrm{mean}}_{\mu_1 \in \forall,\, \mu_2 \in \forall} F_{cv=c}(PN = i, \lambda = j, \mu_1, \mu_2)$$
$$minAUC^c_{i,j} = \min_{\mu_1 \in \forall,\, \mu_2 \in \forall} F_{cv=c}(PN = i, \lambda = j, \mu_1, \mu_2)$$
where $\mu_1 \in \forall$ and $\mu_2 \in \forall$ represent arbitrary values of the parameters $\mu_1$ and $\mu_2$ within their ranges. […] these two parameters have little influence on the prediction performance of the model. According to Fig. 3 and Additional file 3, when the parameters of the model change within a certain range, KNMBP can always achieve good prediction performance, indicating that our algorithm has strong parameter robustness.
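The pairwise summary can be sketched as a simple grouping over a parameter grid. The AUC values below are placeholders chosen for illustration, not results from this study, and the function name `summarize` is hypothetical.

```python
from collections import defaultdict
from statistics import mean


def summarize(auc_by_params):
    """Group AUC values by (PN, lambda) and return (max, mean, min) per group.

    Keys of auc_by_params are (PN, lambda, mu1, mu2) tuples; mu1 and mu2
    are the parameters allowed to vary freely within each group.
    """
    grouped = defaultdict(list)
    for (pn, lam, _mu1, _mu2), auc in auc_by_params.items():
        grouped[(pn, lam)].append(auc)
    return {key: (max(vals), mean(vals), min(vals)) for key, vals in grouped.items()}


# Placeholder AUC values for one type of CV (illustrative only)
auc_by_params = {
    (0.1, 0.5, 1.0, 1.0): 0.90,
    (0.1, 0.5, 1.0, 2.0): 0.92,
    (0.2, 0.5, 1.0, 1.0): 0.88,
}
summary = summarize(auc_by_params)
```

A small gap between the maximum and minimum within a group indicates that the two free parameters have little effect at that setting of the fixed pair.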

Case study
To further demonstrate the predictive performance of KNMBP algorithm for novel miRNA-disease interac- The KNMBP was used for prediction, and the candidate miRNAs of 166 diseases ranked according to their predicted scores were provided in Additional file 4. Figure 4 shows the confirmed ratio of candidate miRNAs for 11 diseases under different thresholds. For example, the top 10 predicted scores of candidate miRNAs for Bladder Neoplasms are all confirmed in HMDD v3.0. Twentyseven of the top 30 predicted scores were confirmed in HMDD v3.0. As can be seen from Fig. 4, most of the top candidate miRNAs for these diseases can be confirmed in the latest version. In addition, in order to further test the validity of the predicted results, we divided the candidate miRNAs for each disease into two groups according to the predicted scores, called Top group and Bottom group respectively [19], with 20 candidate miRNAs in each group, and then used fisher's exact test to evaluate the statistical differences between the two groups. Figure 5 shows the proportion of confirmed candidate miRNAs in the Top group and Bottom group of four diseases and the significance level p by fisher's exact test. For example, 18 of the candidate miRNAs in Colon Neoplasms's Top group were confirmed (proportion of 0.9), and 2 of the Bottom group were confirmed (proportion of 0.1), with a p value of 5.2959 × 10 −7 . This suggests that the candidate miRNAs of Colon Neoplasms in the Top group are more likely to be confirmed than that in the Bottom group. Meanwhile, the p values were 1.4509 × 10 −11 , 3.5997 × 10 −4 , 2.4436 × 10 −4 for Bladder Neoplasms, Glioma, Ovarian Neoplasms, respectively. The test results verified that the number of confirmed miRNAs in the Top group were significantly higher than that in the Bottom group, which further demonstrated the high efficiency of KNMBP algorithm in predicting new miRNA-disease interactions.
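The Top-versus-Bottom comparison can be checked with a small two-sided Fisher's exact test written with the standard library. This is a sketch, not the code used in the study; the function name is hypothetical, and the two-sided p value is taken as the total probability of all tables that are no more likely than the observed one.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are the two groups; columns are confirmed / not confirmed.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def ways(k):
        # Number of tables with k confirmed candidates in the first group
        return comb(row1, k) * comb(row2, col1 - k)

    observed = ways(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    extreme = sum(ways(k) for k in range(lo, hi + 1) if ways(k) <= observed)
    return extreme / total


# Colon Neoplasms: 18 of 20 confirmed in the Top group, 2 of 20 in the Bottom group
p_colon = fisher_exact_two_sided(18, 2, 2, 18)
```

For this table the function returns a p value of about 5.296 × 10^−7, in line with the value reported for Colon Neoplasms. Working with integer table counts avoids floating-point ties when deciding which tables are at least as extreme as the observed one.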
Additional file 5 lists the top 10 candidate miRNAs for these four diseases and their confirmation status in HMDD v3.0 [31], miRCancer [37] and dbDEMC 2.0 [38]. Specifically, for Bladder Neoplasms and Colon Neoplasms, the top 10 candidate miRNAs were all confirmed [...] HMDD v3.0 and one was confirmed in miRCancer; for Ovarian Neoplasms, 9 were confirmed in HMDD v3.0 and one was confirmed in dbDEMC 2.0. Finally, all the interactions in Dataset II, extracted from the latest version of the database, were used as the training set, and the candidate miRNAs of 579 diseases predicted by the KNMBP algorithm were sorted according to their scores, as shown in Additional file 6.
Fig. 4 Proportion of candidate miRNAs that have been confirmed to be associated with the disease, for different thresholds

Discussion
The KNMBP model proposed in this paper not only achieves high performance in predicting unknown miRNA-disease interactions, but can also efficiently make predictions for a new miRNA (disease) that is not associated with any known disease (miRNA). To evaluate the model fairly, we compared it with several state-of-the-art models under 5-fold cross validation (CV) on the commonly used dataset (Dataset I) and on the dataset extracted by ourselves (Dataset II). On Dataset I, the AUC value of KNMBP reached 0.93126 when CV was performed on interactions. On Dataset II, the AUC values of KNMBP reached 0.93795, 0.86937 and 0.86363 when CV was performed on interactions, on miRNAs and on diseases, respectively. In every case, our method outperformed the other methods. To evaluate the predictive performance of KNMBP for new miRNA-disease interactions, we extracted the training data from the old version of the database and tested the predicted results against the new version. Statistical results for 11 diseases confirmed that most of the top candidate miRNAs could be confirmed in the new version of the dataset. We divided the candidate miRNAs of four common tumors into a Top group and a Bottom group according to the predicted scores. The results of Fisher's exact test further confirmed that the number of confirmed miRNAs in the Top group was significantly higher than that in the Bottom group. In addition, the results of the parameter sensitivity analysis show that the KNMBP algorithm is robust when the parameters vary over a wide range. The higher performance of the KNMBP algorithm is mainly due to the following aspects. First, we constructed a more reasonable disease semantic similarity network and miRNA functional similarity network.
Specifically, instead of using the Directed Acyclic Graph (DAG) alone to describe disease similarity, we combined gene-disease interactions, disease-GO biological process interactions and the MeSH descriptors to calculate disease similarity, mining the similarity information between diseases more fully to obtain a denser and more accurate disease similarity network. In addition, previous methods for constructing the miRNA functional similarity network mostly rely on known miRNA-disease interactions and therefore cannot make predictions for new miRNAs. In this paper, the miRNA functional similarity is calculated by integrating the miRNA-target gene interaction network and the gene weight network, which avoids dependence on known miRNA-disease interactions and makes prediction for new miRNAs possible. Secondly, to overcome the sparseness of the miRNA-disease interaction network and fully exploit the miRNA (disease) feature information, we used weighted K neighborhood profiles to apply a weighted correction to the sparse interaction network, taking advantage of neighborhood information to reduce the sparsity of the interaction network. Meanwhile, we used KSNS to calculate the miRNA (disease) kernel neighborhood similarity. Unlike Gaussian function similarity and linear neighborhood similarity [20], KSNS not only makes full use of non-neighborhood information, but also captures the nonlinear structural similarity between samples, considering both the distance similarity and the structural similarity of samples. Thirdly, we used diffusion component analysis to integrate the heterogeneous omics data of disease similarity and miRNA similarity, respectively. The fused miRNA (disease) similarity network can not only effectively utilize the feature information among the known interactions, but also reflect the new similarity information obtained from other omics data.
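The neighborhood-based correction of the sparse interaction matrix mentioned in the second point can be illustrated with a minimal Python sketch. It follows one common formulation of weighted neighbor profiles and is not taken from the KNMBP implementation: the function names, the decay factor `eta`, the fixed neighbor count `k` (the paper instead uses a neighbor proportion PN) and the toy matrices are all assumptions made for illustration.

```python
def neighbor_profiles(Y, S, k=2, eta=0.5):
    """Rebuild each row of Y from the rows of its k most similar entities."""
    n = len(Y)
    out = []
    for i in range(n):
        # k nearest neighbors of entity i by similarity, excluding i itself
        nbrs = sorted((j for j in range(n) if j != i),
                      key=lambda j: S[i][j], reverse=True)[:k]
        # the neighbor at rank r is weighted by eta**r times its similarity
        weights = [(eta ** r) * S[i][j] for r, j in enumerate(nbrs)]
        norm = sum(S[i][j] for j in nbrs) or 1.0
        out.append([sum(w * Y[j][col] for w, j in zip(weights, nbrs)) / norm
                    for col in range(len(Y[0]))])
    return out


def transpose(M):
    return [list(row) for row in zip(*M)]


def correct_interactions(Y, S_mirna, S_disease, k=2, eta=0.5):
    """Average miRNA-side and disease-side profiles; never lower a known entry."""
    by_mirna = neighbor_profiles(Y, S_mirna, k, eta)
    by_disease = transpose(neighbor_profiles(transpose(Y), S_disease, k, eta))
    return [[max(Y[i][j], (by_mirna[i][j] + by_disease[i][j]) / 2)
             for j in range(len(Y[0]))] for i in range(len(Y))]


# Toy data (hypothetical): 3 miRNAs x 2 diseases; the third miRNA is "new".
Y = [[1, 0], [1, 1], [0, 0]]
S_mirna = [[1.0, 0.8, 0.6], [0.8, 1.0, 0.4], [0.6, 0.4, 1.0]]
S_disease = [[1.0, 0.5], [0.5, 1.0]]
corrected = correct_interactions(Y, S_mirna, S_disease)
print(corrected[2])  # the all-zero row of the new miRNA now carries soft scores
```

In this toy case the new miRNA, which had no known interactions, inherits non-zero scores from its most similar miRNAs, which is how a correction of this kind reduces sparsity before similarity calculation and propagation.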
Fourthly, the bidirectional propagation algorithm simultaneously spreads the known miRNA-disease interactions over both the disease similarity network and the miRNA similarity network, making full use of the global network information of miRNAs and diseases. Although KNMBP efficiently predicted unknown miRNA-disease interactions, it has some limitations. First, we built the disease semantic similarity network and the miRNA functional similarity network from other recent data resources, but these similarity networks may contain noise and errors. Secondly, our evaluation is based on the known miRNA-disease interactions, which may be incomplete. Although the number of known miRNA-disease interactions has grown considerably in recent years, they still account for a very low proportion of all possible miRNA-disease pairs, which introduces some error into the evaluation of our prediction results.

Conclusion
Studies of potential miRNA-disease interactions can help researchers understand the pathogenesis of diseases and design reasonable treatment schemes. In this paper, we proposed a new computational model (KNMBP) to predict potential miRNA-disease interactions. Compared with other state-of-the-art methods, KNMBP not only has higher prediction accuracy for unknown miRNA-disease interactions, but can also effectively find potential interactions for a new disease (or miRNA) without any known related miRNA (or disease). Furthermore, the proposed model is not sensitive to its parameter settings. These results indicate that our algorithm can integrate multiple omics data of miRNAs and diseases, and has wide application prospects in miRNA and disease research.