Table 2 Performance of a selection of drug-disease similarity scores.

From: Compensating for literature annotation bias when predicting novel drug-disease relationships through Medical Subject Heading Over-representation Profile (MeSHOP) similarity

| Scoring Method | Direct Connection Validation AUC | CTD Validation AUC | PREDICT Validation AUC |
| --- | --- | --- | --- |
| Corrected drug-disease p-value | 0.65 | 0.76 | 0.66 |
| Cosine distance tf-idf | 0.88 | 0.91 | **0.87** |
| Cosine distance of p-values | 0.64 | 0.70 | 0.52 |
| Cosine distance of term fractions | 0.78 | 0.83 | 0.80 |
| Sum of the log of combined p-values | 0.92 | **0.93** | 0.80 |
| Sum of the differences of log p values | 0.89 | 0.86 | 0.58 |
| L2 of log-p of intersecting terms | **0.95** | 0.92 | 0.66 |
| L2 of term fractions of intersecting terms only | 0.64 | 0.55 | 0.57 |
| L2 of log of p-values | 0.88 | 0.84 | 0.57 |
| L2 of p-values | 0.87 | 0.82 | 0.56 |
| L2 of term fractions P(s < S) | 0.85 | 0.90 | 0.78 |
| L2 of term frequency | 0.87 | 0.83 | 0.62 |
| Total number of terms | 0.90 | 0.87 | 0.62 |
| Number of Intersecting Terms | 0.91 | 0.91 | 0.63 |
| Number of Drug Terms | 0.80 | 0.83 | 0.58 |
| Number of Disease Terms | 0.84 | 0.83 | 0.60 |

  1. Performance was validated using novel direct drug-disease co-occurrences from MEDLINE and novel drug-disease relationships from the CTD. Top scores for each validation set are presented in boldface type.
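
Each cell in the table is an area under the ROC curve (AUC) for one scoring method against one validation set. As a rough illustration of what such a number measures, the sketch below computes AUC as the probability that a randomly chosen true drug-disease pair is ranked above a randomly chosen false pair, with ties counted as half. This is one common formulation, not necessarily the procedure the authors used. The function name `roc_auc` and the toy scores and labels are hypothetical. For distance-based scores, where smaller means more similar, the scores would need to be negated first.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC via pairwise comparison (Mann-Whitney formulation).

    scores: similarity score per drug-disease pair (higher = more likely related).
    labels: 1 if the pair is a validated relationship, 0 otherwise.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative pair")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not taken from the paper.
toy_scores = [0.9, 0.8, 0.35, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 0, 1, 0, 0]
toy_auc = roc_auc(toy_scores, toy_labels)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives a scale for reading the values in the table.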