In this report, we introduce a new literature-based procedure for the analysis of drug-disease similarity with a focus on the identification of candidates for drug-repositioning. Using MeSH Over-representation Profiles (MeSHOPs) as quantitative representatives for biological entities, we seek to identify drugs and diseases with similar annotation under the expectation that such similarity may be suggestive of potential for repositioning. Drug-disease MeSHOP similarity scores, using a panel of metrics, are found to be strongly influenced by the level of annotation of drugs and diseases. The most heavily studied diseases and drugs are disproportionately emphasized by the comparison scores. A new corrected scoring procedure is introduced to account for the background expectation of similarity scores for comparably annotated drugs and diseases. The new procedure is demonstrated to account for the bias. Application of the MeSHOP similarity scoring procedure reveals a set of candidate drugs for future repositioning research.
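As an illustration of the kind of correction described above, the following minimal sketch standardizes each raw similarity score against the scores of comparably annotated drug-disease pairs. It is not the procedure used in this report: the binning by order of magnitude of article counts, the z-score form of the correction, and the names `annotation_bin` and `corrected_scores` are all assumptions made for the example.

```python
import math
from collections import defaultdict
from statistics import mean, pstdev


def annotation_bin(n_articles):
    """Bucket an entity by the order of magnitude of its article count (assumed binning)."""
    return int(math.log10(max(n_articles, 1)))


def corrected_scores(pairs):
    """Standardize raw scores against comparably annotated pairs.

    pairs: list of (drug, disease, raw_score, drug_articles, disease_articles).
    Returns {(drug, disease): z}, where z is the raw score expressed in standard
    deviations from the mean of its (drug bin, disease bin) cell.
    """
    # Collect the background distribution of raw scores for each annotation cell.
    cells = defaultdict(list)
    for _, _, score, n_drug, n_disease in pairs:
        cells[(annotation_bin(n_drug), annotation_bin(n_disease))].append(score)
    stats = {cell: (mean(scores), pstdev(scores)) for cell, scores in cells.items()}

    corrected = {}
    for drug, disease, score, n_drug, n_disease in pairs:
        mu, sigma = stats[(annotation_bin(n_drug), annotation_bin(n_disease))]
        # A cell with no spread carries no information; report zero in that case.
        corrected[(drug, disease)] = (score - mu) / sigma if sigma > 0 else 0.0
    return corrected


# Hypothetical data: two heavily annotated pairs and two sparsely annotated pairs.
example_pairs = [
    ("drugA", "diseaseA", 0.9, 5000, 8000),
    ("drugB", "diseaseB", 0.7, 2000, 3000),
    ("drugC", "diseaseC", 0.3, 20, 50),
    ("drugD", "diseaseD", 0.1, 30, 40),
]
example_corrected = corrected_scores(example_pairs)
```

In this toy data the sparsely annotated pair with a raw score of 0.3 stands as far above its own background as the heavily annotated pair with a raw score of 0.9, which is the behavior a background correction is meant to produce.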
The assessment of drug repositioning candidate predictions is necessarily problematic. Given the expense of validating drug efficacy, there is no reference collection against which to measure performance. In this report we elected to use two reference approaches. First, we predicted future co-occurrence in the research literature. This measure is indirect, as co-occurrence does not necessarily reflect a functional tie between the drug and disease. Furthermore, this measure is particularly susceptible to annotation influence: well-studied drugs and diseases have a higher rate of future publications and are thus more likely to be linked. The second reference collection was extracted from the CTD, which records bona fide drug-disease links. The performance measurements reflect a similar literature bias in the CTD results, which may reflect a tendency for well-studied drugs to be tested for utility in the therapy of well-studied diseases.
Within this report, we observe that the MeSHOP comparisons perform better than simple annotation measures, which indicates that the similarity assessment has value. Furthermore, we were able to identify and correct for the annotation bias influence on the analysis. It is our hope that future annotation-based similarity measures will be evaluated for the biases we observe here.
The source of the annotation biases identified in the validation sets may lie in methodological bias or be intrinsic to the nature of drug-disease relationships. The case for methodological bias rests on the relationship between the existence of experimental protocols and the publication of related research. The study of a disease benefits from the availability of appropriate animal models, families with a history of the condition, large-scale association studies, and an accurate protocol to diagnose the condition. The rarity and severity of the disease also affect the degree of research interest. Likewise, the study of drugs benefits from animal models, bioassays to detect the compound, the ease of generating the compound, and the ability to deliver an appropriate dosage of the compound to the targets of interest. Other factors shaping research directions are the availability of funding and the tendency of existing lab personnel to focus on more popular directions of research.
However, the bias may also be intrinsic to the nature of the disease or of the drug. Gillis and Pavlidis have previously observed that multifunctional genes are a strong driver in gene function prediction. They identify gene multifunctionality through protein interaction and co-expression datasets, which encompass previous definitions of the "hub-ness" of a particular gene. A drug may have a more global effectiveness, due to targeting these multifunctional genes or their pathways, and thereby be involved in more drug-disease associations. Similarly, there may be diseases that are involved in key processes, and therefore are the target of many potential drugs. Whether the biases are intrinsic to the biology of drugs and diseases, primarily introduced by human factors in the research, or some combination of these will ultimately be revealed by the results of future research. As our knowledge of the nature of drugs and diseases increases and matures, the human elements and methodological biases will become less significant, leaving us to identify the degree to which this bias is due to the biological mechanism and nature of the drugs and diseases.
The underlying principle motivating the comparison approach to reveal novel drug repositioning candidates is that there will be shared characteristics of the drug actions and disease properties. While the current approach utilizes universal comparisons across all MeSH terms, it may be beneficial to restrict the analysis to a subset of more relevant MeSH terms. Development of a procedure to restrict the terms (the features) of MeSHOPs may allow for more specific drug repositioning candidates to emerge in the future.
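The effect of restricting the feature set can be sketched as follows. The example assumes, purely for illustration, that a MeSHOP is held as a mapping from MeSH term to a numeric score and that cosine similarity is the comparison metric. The function names, the term scores, and the choice of allowed terms are hypothetical and are not taken from this report.

```python
import math


def restrict(profile, allowed_terms):
    """Keep only the MeSH terms (features) that belong to the chosen subset."""
    return {term: value for term, value in profile.items() if term in allowed_terms}


def cosine(a, b):
    """Cosine similarity between two sparse term-to-score profiles."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical profiles in which a very general term dominates both entities.
drug_profile = {"Humans": 5.0, "Inflammation": 2.0, "Liver": 1.0}
disease_profile = {"Humans": 5.0, "Inflammation": 2.0, "Kidney": 1.0}

# Hypothetical subset of more relevant terms that excludes the general term.
relevant_terms = {"Inflammation", "Liver", "Kidney"}

full_similarity = cosine(drug_profile, disease_profile)
restricted_similarity = cosine(
    restrict(drug_profile, relevant_terms),
    restrict(disease_profile, relevant_terms),
)
```

Here the unrestricted similarity is driven mostly by the shared general term, while the restricted comparison depends only on the more specific terms that remain.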
MeSH provides a wide spectrum of medically relevant topics; however, some applications may be better served by a vocabulary with more specific terms in the field of interest. For example, there are only eight terms in MeSH (Akathisia, Drug-Induced; Drug Eruptions; Drug Toxicity; Dyskinesia, Drug-Induced; Epidermal Necrolysis, Toxic; Erythema Nodosum; Serotonin Syndrome; Serum Sickness) relating directly to adverse drug events. Instead, adverse events are largely captured through subheadings: "adverse effects", "poisoning", "toxicity" and "contraindications" can occur with drug terms, and "chemically induced" and "complications" can occur with adverse outcomes. Expanding the analysis to look specifically for these subheading modifiers could allow us to extract a subset of articles directly relevant to adverse drug reactions for MeSHOP analysis. Alternatively, another source linking side effects to articles could be employed to supplement our existing analysis with side-effect data.
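The subheading-based selection could be sketched as below. The subheading names are those listed above. The article representation (a mapping from an article identifier to a list of descriptor and subheading pairs), the function names, and the example records are assumptions made for illustration. Accepting an article when either kind of subheading is present is one possible rule among several.

```python
# Subheadings named in the text; the grouping into two sets follows that description.
DRUG_SUBHEADINGS = {"adverse effects", "poisoning", "toxicity", "contraindications"}
OUTCOME_SUBHEADINGS = {"chemically induced", "complications"}


def is_adverse_reaction_article(headings):
    """headings: iterable of (descriptor, subheading) pairs; subheading may be None."""
    subheadings = {sub for _, sub in headings if sub}
    return bool(subheadings & (DRUG_SUBHEADINGS | OUTCOME_SUBHEADINGS))


def select_adverse_reaction_articles(articles):
    """Return the identifiers of articles carrying at least one relevant subheading."""
    return [
        article_id
        for article_id, headings in articles.items()
        if is_adverse_reaction_article(headings)
    ]


# Hypothetical annotation records keyed by placeholder article identifiers.
example_articles = {
    "article-1": [("Acetaminophen", "adverse effects"), ("Liver Failure", "chemically induced")],
    "article-2": [("Acetaminophen", "therapeutic use"), ("Pain", "drug therapy")],
    "article-3": [("Humans", None), ("Diabetes Mellitus", "complications")],
}
selected = select_adverse_reaction_articles(example_articles)
```

A stricter variant could require a drug-side subheading and an outcome-side subheading on the same article, at the cost of a smaller article subset.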
CitationRank was used to highlight genes involved in adverse drug reactions by analyzing the co-occurrence of genes in articles relating to an adverse drug reaction. Examining the comprehensive network of MeSHOP similarity between genes, drugs and diseases would allow a similar network-style analysis that also incorporates information about the gene entities.
Rather than predicting drug-disease associations directly, another application of the method could be to highlight potential links between drugs and mechanisms of action. Drug therapies can be effective even when the understanding of the underlying mechanism of action is incomplete. These predicted drug-mechanism links could also be related back to relevant diseases, indirectly helping to generate hypotheses about the biology of a disease and effective mechanisms for treatment.