
Table 1 Explanation of the scoring functions evaluated.

From: Compensating for literature annotation bias when predicting novel drug-disease relationships through Medical Subject Heading Over-representation Profile (MeSHOP) similarity

Scoring Method

Description

Cosine Distance of Term Frequency-Inverse Document Frequency

$\dfrac{\sum_{j \in M} c_i(j)\, d_i(j)}{\sqrt{\sum_{j \in M} c_i(j)^2}\,\sqrt{\sum_{j \in M} d_i(j)^2}}$

Cosine Distance of p-values

$\dfrac{\sum_{i \in M} c_p(i)\, d_p(i)}{\sqrt{\sum_{i \in M} c_p(i)^2}\,\sqrt{\sum_{i \in M} d_p(i)^2}}$

Cosine Distance of term fractions

$\dfrac{\sum_{i \in M} c_f(i)\, d_f(i)}{\sqrt{\sum_{i \in M} c_f(i)^2}\,\sqrt{\sum_{i \in M} d_f(i)^2}}$

Sum of the log of combined p-values

$\sum_{i \in M} \log\bigl(c_p(i) + d_p(i) - c_p(i)\, d_p(i)\bigr)$

Sum of the differences of log p values

$\sum_{i \in M} \log \dfrac{c_p(i)}{d_p(i)} = \sum_{i \in M} \bigl(\log c_p(i) - \log d_p(i)\bigr)$

L2 of log-p of overlapping terms only

$\sqrt{\sum_{i \in (C \cap D)} \bigl(\log c_p(i) - \log d_p(i)\bigr)^2}$

L2 of term fractions of overlapping terms only

$\sqrt{\sum_{i \in (C \cap D)} \bigl(c_f(i) - d_f(i)\bigr)^2}$

L2 of log of p-values

$\sqrt{\sum_{i \in M} \left(\log \dfrac{c_p(i)}{d_p(i)}\right)^2} = \sqrt{\sum_{i \in M} \bigl(\log c_p(i) - \log d_p(i)\bigr)^2}$

L2 of p-values

$\sqrt{\sum_{i \in M} \bigl(c_p(i) - d_p(i)\bigr)^2}$

L2 of term fractions

$\sqrt{\sum_{i \in M} \bigl(c_f(i) - d_f(i)\bigr)^2}$

L2 of term frequency

$\sqrt{\sum_{i \in M} \bigl(c(i) - d(i)\bigr)^2}$

Term Coverage

$|C \cup D|$

Term Overlap

$|C \cap D|$

Number of Drug MeSH Terms

$|C|$

Number of Disease MeSH Terms

$|D|$

  1. $M$ refers to the set of all MeSH terms; $C$ and $D$ refer to the sets of MeSH terms in the drug and disease profiles, respectively. $c(i)$, $c_f(i)$, $c_p(i)$ and $c_i(i)$ refer to the frequency, term fraction, hypergeometric p-value and term frequency-inverse document frequency of MeSH term $i$ in the drug profile. $d(i)$, $d_f(i)$, $d_p(i)$ and $d_i(i)$ refer to the corresponding frequency, term fraction, hypergeometric p-value and term frequency-inverse document frequency of MeSH term $i$ in the disease profile.
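
To make the scoring functions in the table concrete, here is a minimal Python sketch of a few of them. It is an illustration under stated assumptions, not the authors' implementation: each profile is assumed to be a dictionary mapping a MeSH term to a value, terms missing from a profile are assumed to contribute 0 for frequencies, term fractions and TF-IDF values and 1 for p-values, and Term Coverage is assumed to be the size of the union of the two term sets. All function names are hypothetical.

```python
import math

def cosine_score(c, d):
    """Cosine of two sparse profiles (assumption: absent terms count as 0)."""
    dot = sum(v * d[t] for t, v in c.items() if t in d)
    norm_c = math.sqrt(sum(v * v for v in c.values()))
    norm_d = math.sqrt(sum(v * v for v in d.values()))
    if norm_c == 0 or norm_d == 0:
        return 0.0  # assumption: an empty profile scores 0
    return dot / (norm_c * norm_d)

def l2_score(c, d, overlap_only=False, missing=0.0):
    """L2 distance over all terms, or over shared terms only.

    Terms in neither profile contribute 0, so iterating over the union
    of the two term sets is equivalent to summing over all of M.
    """
    terms = set(c) & set(d) if overlap_only else set(c) | set(d)
    return math.sqrt(sum((c.get(t, missing) - d.get(t, missing)) ** 2
                         for t in terms))

def sum_log_combined_p(cp, dp, missing=1.0):
    """Sum of log(c_p + d_p - c_p * d_p); assumes every p-value is > 0.

    Assumption: a term absent from a profile has p-value 1, so terms in
    neither profile contribute log(1) = 0 and can be skipped.
    """
    total = 0.0
    for t in set(cp) | set(dp):
        a, b = cp.get(t, missing), dp.get(t, missing)
        total += math.log(a + b - a * b)
    return total

def term_overlap(c, d):
    """Number of MeSH terms shared by both profiles."""
    return len(set(c) & set(d))

def term_coverage(c, d):
    """Number of MeSH terms in either profile (assumed to be the union)."""
    return len(set(c) | set(d))

# Toy term-fraction and p-value profiles, invented for illustration only.
drug_f = {"Hypertension": 0.5, "Kidney": 0.5}
disease_f = {"Hypertension": 0.5, "Heart": 0.5}
drug_p = {"Hypertension": 0.1}
disease_p = {"Hypertension": 0.1, "Heart": 0.5}

scores = {
    "cosine_fraction": cosine_score(drug_f, disease_f),
    "l2_fraction": l2_score(drug_f, disease_f),
    "l2_fraction_overlap": l2_score(drug_f, disease_f, overlap_only=True),
    "sum_log_combined_p": sum_log_combined_p(drug_p, disease_p),
    "term_overlap": term_overlap(drug_f, disease_f),
    "term_coverage": term_coverage(drug_f, disease_f),
}
```

The same `l2_score` helper covers the p-value and log-p variants if the caller passes p-value (or log-transformed p-value) profiles and sets `missing` accordingly, which is why the default for missing terms is exposed as a parameter rather than fixed.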