
Identifying enriched drug fragments as possible candidates for metabolic engineering



Fragment-based approaches have become an important component of the drug discovery process. At the same time, pharmaceutical chemists are increasingly turning to the natural world and its extremely large and diverse collection of natural compounds to discover new leads that can potentially be turned into drugs. In this study we introduce and discuss a computational pipeline to automatically extract statistically overrepresented chemical fragments in therapeutic classes, and search for similar fragments in a large database of natural products. By systematically identifying enriched fragments in therapeutic groups, we are able to extract and focus on a few fragments that are likely to be active or structurally important.


We show that several therapeutic classes (including antibacterial, antineoplastic, and drugs active on the cardiovascular system, among others) have enriched fragments that are also found in many natural compounds. Further, our method is able to detect fragments shared by a drug and a natural product even when the global similarity between the two molecules is generally low.


A further development of this computational pipeline is to help predict putative therapeutic activities of natural compounds, and to help identify novel leads for drug discovery.


A crucial factor for realizing the promises of precision medicine is the availability of novel and safe drugs to modulate the increasing number of targets that are being identified. Of all the medical branches, oncology is poised to be among those that could benefit the most from a new array of therapeutics [1].

Despite substantial progress in understanding the molecular basis of human cancers, there is still a pressing need for more effective, rational and personalized treatments. A few drugs for specific cancer types have achieved a good degree of selectivity with relatively low toxicity, but for the vast majority of human cancers, standard chemotherapy regimens (with their related toxicity) remain the only viable option. However, the situation is rapidly changing.

Breakthroughs in cancer genomics are now leading to the identification of new actionable targets [2], opening up unprecedented opportunities for personalized treatment. As a result of our improved understanding of cancer biology, with some notable exceptions the search for “silver bullet” therapies has now largely been replaced by a quest for novel targets that can be simultaneously modulated by combinatorial therapies [1, 3], akin to what has been accomplished for the treatment of HIV infections [4]. As a result, a vast number of suitable new drugs will soon be required to modulate a large array of cancer targets. Another area where the availability of new effective drugs is becoming a pressing need is the treatment of infectious diseases, as antibiotic-resistant bacteria are becoming more widespread and are a cause for serious concern [5].

Natural product-derived structures play a significant role in the discovery of novel pharmaceutical agents and bioactive molecules. The anti-diabetic activity of lupins has been attributed to quinolizidine alkaloids [6], and a review of the literature shows many such examples of natural products as sources of new drugs [7], including Paclitaxel, which is one of the most widely prescribed anticancer drugs on the market. Most natural products are biologically active and have favorable absorption, distribution, metabolism, excretion and toxicology properties. Plants are often the predominant source for the discovery of natural products due to the relative ease of access. However, more recently microbial as well as marine sources have been identified as alternative resources, particularly for antibiotics [8]. Several databases of natural products have been published and reviewed [9–12].

Although many pharmaceutical companies emphasize high throughput (HTP) screening of combinatorial libraries, natural products continue to provide enormous structural and chemical diversity to guide the careful design of drug-like leads. More importantly, the products of HTP screening often do not interact well with biomolecules and induce unexpected and possibly severe side effects. Therefore, over the years (since 1980) only 2 drugs obtained through the HTP screening have been approved by the FDA, while over 85 drugs are either natural products-based or compounds derived from them [13, 14]. In the past decade, several databases focusing on the collection of medically important natural products and medicinal herbs have been established [10, 11] and the use of computer aided drug design including virtual screening of large databases has become an important part of the drug discovery process [15].

Pharmaceutically relevant natural products are of low molecular weight and often restricted to special plant families. While these compounds are not important for the primary metabolism of the plants, they are found to be important for their survival in a given environment. Therefore, medicinally important plants are often collected from the wild or their natural habitat and are more likely to be endangered because of severe over-collection [16]. Unfortunately, we still have limited knowledge about plant secondary metabolism, its regulation, the molecular mechanisms concerning gene expression, and the rate-limiting enzymes found within the diverse network of biosynthetic pathways in living organisms.

Obtaining a drug completely from a plant source is often a difficult process, as the yield from the natural source may be small and the extraction process can be complex [17]. The solution to these problems is partial chemical synthesis or semi synthesis, where the aim is to extract only a biosynthetic intermediate or a bioactive fragment of the lead compound, which can then be developed into a drug using conventional synthesis [18]. This approach has several advantages: first, a biochemical intermediate can be more easily extracted with higher yield than the final product; second, it may be possible to synthesize analogues of the final product [19, 20].

The literature provides numerous examples of chemical fragments originating from natural sources which have been used for pharmaceutical purposes. For example, according to Lahlou et al. [21] the widely prescribed anticancer drug paclitaxel was manufactured by extracting 10-deacetylbaccatin from the needles of the yew tree, followed by a four-stage synthesis [21, 22]. Another example is provided by therapeutic drug fragments in plants with anti-fertility properties, which can be used as intermediates in the synthesis of contraceptive drugs from the natural source [23].

With the development of the fragment-based method described in this study, it is now possible to determine potentially important structures in natural products in silico. These structures may be investigated further to determine their pharmaceutical value as lead or intermediate compounds, and could potentially be produced by cells cultivated in vitro using plant biotechnology methods. To the best of our knowledge, this is the first time that a fragment-based approach using enrichment analysis has been applied to identify potentially important chemical fragments in natural products.


Obtaining and representing drugs and natural products

The DrugBank database [24] (version 4.1) was used to obtain information on drugs that were approved for therapeutic use in at least one country. The initial set of drugs contained 1554 molecules. Natural products were obtained from the SuperNatural II database [12], containing 325,508 molecules.

Drugs and natural products were represented using the SMILES system [25], a widely used notation that makes it possible to encode chemicals as ASCII strings. SMILES strings for drugs and natural products were directly obtained from the DrugBank and SuperNatural II databases, respectively.

Fragmenting the molecules

Both drugs and natural products were fragmented with the fragment program, part of the molBLOCKS suite [26], which breaks molecules along chemically important bonds and returns the corresponding fragments (or putative building blocks). The list of chemical bonds that were used by the program to fragment the molecules is shown in Fig. 2, and is based on Lewell et al. [27]. The minimum size for a fragment was set to four atoms, and the fragmentation was carried out with the “extensive” flag turned on, which yields all possible fragments that can be generated given the list of chemical bonds of interest [26].

Notably, the fragmentation rules are encoded as SMARTS (SMiles ARbitrary Target Specification), an extension to the SMILES notation created by Daylight Chemical Information Systems, Inc. and widely used in computational chemistry. With SMARTS, the particular bonds that are to be cleaved are encoded as patterns (much like regular expressions for text), making it straightforward to add other cleavable bonds to the fragmentation rules.

Clustering fragments

Drug fragments obtained as described above were clustered with the analyze [26] program using standard parameters. In order to compute the fragment similarity for clustering, the program converts each fragment to a fingerprint representation based on linear segments of up to 7 atoms in length (FP2 fingerprints [28]). The fingerprints are stored as bit arrays, where the presence or absence of a particular linear segment is represented by a 1 or 0, respectively. The FP2 fingerprint representation is obtained via the Open Babel library. Then, the Tanimoto coefficient T_s between two fragments x and y is computed as:

$$ T_{s} = \frac{{\sum\nolimits}_{i} X_{i} \land Y_{i}}{{\sum\nolimits}_{i} X_{i} \lor Y_{i}} $$

where X and Y are the bit array representations of the linear segments found in fragments x and y, respectively, and ∧ and ∨ are the bitwise AND and OR operators.
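As a minimal sketch of the Tanimoto formula above, the following Python snippet stores each fingerprint as a plain integer used as a bit array. The function name `tanimoto`, the toy 8-bit values, and the convention of returning 0.0 for two empty fingerprints are assumptions made for illustration; real FP2 fingerprints would come from Open Babel and are much longer.

```python
def tanimoto(x_bits: int, y_bits: int) -> float:
    """Tanimoto coefficient between two fingerprints stored as Python ints."""
    shared = bin(x_bits & y_bits).count("1")  # bits set in both fingerprints
    total = bin(x_bits | y_bits).count("1")   # bits set in at least one
    # Assumed convention: two empty fingerprints have similarity 0.0
    return shared / total if total else 0.0


# Toy 8-bit fingerprints (illustrative values, not real FP2 output)
fp_x = 0b10110100
fp_y = 0b10010110
print(tanimoto(fp_x, fp_y))  # 3 shared bits out of 5 set bits overall
```

Identical fingerprints give a coefficient of 1, and fingerprints with no bits in common give 0.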

The analyze program computes pairwise similarities between fragments and converts them to a graph representation, where an edge between fragments indicates a pairwise Tanimoto greater than the chosen threshold, which was set to 0.7 in this study. Subsequently, the program extracts the connected components of the graph, and selects the representative element for each cluster as the fragment with the highest average similarity against all the other fragments in the cluster.
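The clustering step described above can be sketched in a few lines of Python. This is not the code of the analyze program; it is an illustrative reimplementation under stated assumptions: fingerprints are given as sets of on-bit indices, the names `cluster_fragments` and `mean_similarity` are hypothetical, and a singleton cluster is its own representative.

```python
from itertools import combinations


def cluster_fragments(fingerprints, threshold=0.7):
    """Group fragments into connected components of the similarity graph.

    fingerprints: dict mapping fragment name -> set of on-bit indices.
    Returns a list of (representative, sorted_members) tuples.
    """
    def tanimoto(a, b):
        union = len(a | b)
        return len(a & b) / union if union else 0.0

    names = list(fingerprints)
    sim = {}
    neighbours = {name: set() for name in names}
    for a, b in combinations(names, 2):
        score = tanimoto(fingerprints[a], fingerprints[b])
        sim[(a, b)] = sim[(b, a)] = score
        if score > threshold:  # edge only if similarity exceeds the threshold
            neighbours[a].add(b)
            neighbours[b].add(a)

    clusters, seen = [], set()
    for start in names:
        if start in seen:
            continue
        # Depth-first search collects one connected component
        stack, members = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            members.append(node)
            for nxt in neighbours[node] - seen:
                seen.add(nxt)
                stack.append(nxt)

        def mean_similarity(m):
            others = [o for o in members if o != m]
            if not others:  # singleton cluster
                return 1.0
            return sum(sim[(m, o)] for o in others) / len(others)

        representative = max(sorted(members), key=mean_similarity)
        clusters.append((representative, sorted(members)))
    return clusters


# Toy fingerprints (hypothetical on-bit indices)
toy = {"A": {1, 2, 3, 4}, "B": {1, 2, 3, 4, 5},
       "C": {1, 2, 3, 4, 5, 6}, "D": {10, 11}}
clusters = cluster_fragments(toy)
print(clusters)
```

In the toy data, A and C have a similarity of about 0.67, below the threshold, yet they land in the same cluster because both are linked to B. This is a property of connected-component clustering that is worth keeping in mind when choosing the threshold.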

Extracting enriched fragments for each ATC code

In order to assign functional categories to drugs, we used the Anatomical Therapeutic Chemical (ATC) classification system, a widely used nomenclature that organizes drugs according to the organ or system which they modulate and their therapeutic properties. The ATC code system is hierarchically organized into five levels of increasing specificity. We considered the second level, which describes the therapeutic main groups. We note that a single drug can be annotated with multiple ATC codes if it has multiple therapeutic indications. For this study, to get meaningful statistics we selected all the ATC codes that annotated at least 10 distinct drugs.
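The grouping step can be illustrated with a short Python sketch. It relies on the fact that the second ATC level corresponds to the first three characters of a full code (for example, "L01"). The function name `group_by_atc_level2`, the drug identifiers, and the lowered `min_drugs` value in the toy call are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def group_by_atc_level2(drug_to_codes, min_drugs=10):
    """Group drugs by second-level ATC code and keep sufficiently large groups.

    drug_to_codes: dict mapping a drug identifier -> iterable of ATC codes.
    A drug with several codes is counted in every matching group.
    """
    groups = defaultdict(set)
    for drug, codes in drug_to_codes.items():
        for code in codes:
            groups[code[:3]].add(drug)  # first three characters = level 2
    return {g: members for g, members in groups.items()
            if len(members) >= min_drugs}


# Toy annotation table (hypothetical drug identifiers)
toy_annotations = {
    "drug_1": ["L01XE01"],
    "drug_2": ["L01BA01", "L04AX03"],
    "drug_3": ["C03CA01"],
}
groups = group_by_atc_level2(toy_annotations, min_drugs=2)
print(groups)
```

Using sets for the group members ensures that a drug annotated with two codes from the same second-level group is counted only once, which matches the "distinct drugs" criterion.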

Enrichment analysis was carried out in order to identify the specific fragments (or clusters of fragments) that appear in a set of molecules more frequently than expected by chance, given a background distribution. In this study the background was represented by the union of all approved drugs.

The analyze program uses the hypergeometric distribution to model the probability of obtaining, by chance alone, a number of fragments (or clusters of fragments) equal to or greater than the observed number:

$$ P(X \ge k) = \sum\limits_{x=k}^{K}\frac{\binom{K}{x} \binom{N-K}{n-x}}{\binom{N}{n}} $$

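where N is the total number of fragments in the background; K is the number of fragments of the given type in the background; n is the total number of fragments in the main set; k is the observed number of fragments of the given type in the main set; and x is the summation index, which runs over all counts at least as large as k.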
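For readers who want to check the formula numerically, here is a minimal Python sketch of the upper-tail hypergeometric probability using only the standard library. The function name `hypergeom_tail` and the toy numbers are assumptions for illustration and do not come from the analyze program.

```python
from math import comb


def hypergeom_tail(k, N, K, n):
    """P(X >= k) for a hypergeometric variable.

    N: total number of fragments in the background
    K: fragments of the given type in the background
    n: fragments in the main set
    k: observed fragments of the given type in the main set
    """
    upper = min(K, n)  # terms beyond this are zero
    numerator = sum(comb(K, x) * comb(N - K, n - x)
                    for x in range(k, upper + 1))
    return numerator / comb(N, n)


# Toy numbers: 10 fragments overall, 3 of the given type,
# 4 in the main set, 2 of which are of the given type.
p_value = hypergeom_tail(k=2, N=10, K=3, n=4)
print(p_value)  # (3*21 + 1*7) / 210 = 1/3
```

Stopping the sum at min(K, n) is equivalent to summing up to K, because the binomial coefficients vanish for larger x.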

The program returns both uncorrected p-values and False Discovery Rate (FDR) corrected p-values, obtained with the procedure of Benjamini-Hochberg [29]. In this study we selected fragments that were enriched with an FDR <0.05.
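The Benjamini-Hochberg correction can be sketched as follows. This is an illustrative standalone implementation, not the one in the analyze program; the function name `benjamini_hochberg` and the example p-values are made up for the demonstration.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Illustrative p-values for four fragments
raw = [0.01, 0.04, 0.03, 0.20]
fdr = benjamini_hochberg(raw)
selected = [i for i, q in enumerate(fdr) if q < 0.05]
print(fdr, selected)
```

In this toy example only the first fragment passes the FDR <0.05 cutoff, even though three raw p-values are below 0.05, which shows why the correction matters when many fragments are tested at once.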

Comparing enriched fragments in the drug dataset against fragments from natural compounds

The final step of the pipeline involves the comparison between enriched fragments from the drug dataset against fragments obtained from the natural compounds set. In order to calculate the pairwise similarity between each of the enriched drug fragments and each of the fragments from natural compounds we used the Tanimoto coefficient (see Eq. 1). To carry out the calculations we wrote an in-house program that uses the Python API [30] of the OpenBabel library [28], and retained the drug fragment–natural product fragment pairs that had a Tanimoto similarity >0.9.
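The comparison step amounts to an all-against-all similarity search with a cutoff. The sketch below shows the idea in plain Python; it is not the in-house program, and it substitutes sets of on-bit indices for Open Babel fingerprints so that it runs without external libraries. The function name `matching_pairs` and the fragment labels are hypothetical.

```python
def matching_pairs(drug_frags, np_frags, threshold=0.9):
    """Return (drug_fragment, natural_fragment, score) for pairs above threshold.

    Both arguments map a fragment label -> set of on-bit indices.
    """
    hits = []
    for d_name, d_fp in drug_frags.items():
        for n_name, n_fp in np_frags.items():
            union = len(d_fp | n_fp)
            score = len(d_fp & n_fp) / union if union else 0.0
            if score > threshold:  # strict inequality, as in the text
                hits.append((d_name, n_name, score))
    return hits


# Toy fingerprints (hypothetical labels and bit positions)
drug_frags = {"d1": set(range(1, 11))}
np_frags = {
    "n1": set(range(1, 11)),   # identical to d1
    "n2": set(range(1, 10)),   # similarity exactly 0.9, not retained
    "n3": set(range(1, 12)),   # similarity 10/11
}
hits = matching_pairs(drug_frags, np_frags)
print(hits)
```

The nested loop is quadratic in the number of fragments, which is consistent with this step dominating the running time of the pipeline.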

Computational requirements for enrichment analysis and fragment comparison

The most time-consuming step of the pipeline is represented by the pairwise fragment comparison, which took approximately 12 h on a 24-core machine. Fragmentation of the 325,509 molecules found in the SuperNatural II database took approximately eight hours on a 24-core machine, bringing the entire analysis to roughly 20 h.

Biosynthesis pathway annotation

We used an online SMILES converter program to convert all chemical structures from SMILES format to MDL mol structural files, to be used later as input for the pathway prediction algorithms. ChemSpider [31], a free chemical structure database, was used to retrieve the IUPAC names and the chemical information for the enriched fragments.

The first step in pathway annotation was to determine the natural source and a possible biosynthesis pathway for the enriched fragments obtained from our pipeline. We used the Retropath webserver [22] and submitted each enriched fragment as an input query in MDL mol structural format. The output from the Retropath webserver consisted of a feasible biosynthetic pathway from the natural source, including the names of the enzymes catalyzing the reactions.

The next step in pathway annotation was to determine the synthesis pathway from fragment to drug compound. To accomplish this task, we used the Pathpred webserver, which predicts the synthesis pathway given the substrates and the final product. The Pathpred webserver is linked to the KEGG database, and the user can input a query compound in the MDL mol file format, in the SMILES representation, or using the KEGG compound/drug identifier. The enriched fragments with a known biosynthesis pathway (obtained from Retropath) were given to Pathpred as the initial substrate. The drug compounds associated with each enriched fragment by the enrichment pipeline were given as the final product in order to obtain possible synthetic routes between the fragment and the drug.


A computational pipeline to systematically compare functionally relevant drug fragments and natural products

We set out to systematically compare approved drugs obtained from the DrugBank database [24] against a large collection of natural products, assembled in the SuperNatural II database [12]. The novelty of our approach consists first in extracting the fragments that are statistically overrepresented in each pharmacological category, and then in comparing those fragments against the ones derived from the natural compounds.

The rationale behind this approach is twofold. On the one hand, chemical fragments capture important properties of the full molecules, and on the other hand they may be shared by otherwise globally dissimilar molecules, which might go undetected when using a global similarity measure. The pipeline is briefly outlined in Fig. 1, which shows the main steps of the procedure. More details are found in the Materials and Methods section of the paper.

Fig. 1

Simplified overview of the pipeline. Each approved drug (obtained from DrugBank [24]) is assigned a therapeutic class using the ATC nomenclature. The drugs are then broken down into fragments using the molBLOCKS software [26], and enrichment analysis is performed on each therapeutic class to identify statistically overrepresented fragments (FDR <0.05). Each overrepresented fragment is then compared against similarly obtained fragments from a database of natural compounds [12] (see Materials and Methods for further details)

In order to fragment the molecules we used the molBLOCKS suite [26] with the RECAP rules [27] (Fig. 2), which allow us to break small molecules apart along chemically important bonds. Notably, in several cases no fragmentation rule applies to a small molecule, which is then left as it is and treated as a single fragment. In our initial dataset of 1,543 approved drugs we were able to fragment 949 (62 %) of the drugs. The remaining ones, to which no RECAP fragmentation rule applies, were each treated as one fragment. In the case of natural products, the fragmentation rules applied to 174,156 molecules (54 %), and the remaining molecules were each treated as one fragment.

Subsequently, we grouped drugs by Anatomical Therapeutic Chemical (ATC) code, which gives the therapeutic group of a drug (e.g., “L01” stands for “Antineoplastic Agents”, “C03” for “Diuretics”, etc.). Membership of a drug in several ATC groups was allowed if the drug was annotated in DrugBank with multiple ATC codes. We ended up with 40 ATC groups, each containing at least 10 distinct drugs. For each ATC group, we performed clustering of the fragments followed by enrichment analysis with the molBLOCKS suite, extracting statistically overrepresented fragments for each group, with an FDR <0.05. The total number of enriched fragments across all therapeutic groups was 141.

In the last step of the pipeline, we systematically compared the enriched fragments from each ATC group against the fragments obtained from the natural compounds, and retained for further analysis all the pairs that had a Tanimoto similarity >0.9.

Fig. 2

RECAP rules used to fragment drugs and natural products. The eleven types of chemical bonds depicted above (green dashed lines) indicate the potential sites that can be broken in the small molecules, resulting in smaller fragments. These 11 fragmentation rules were derived from Lewell et al. [27], and capture chemically relevant synthetic reactions that combine building blocks into more complex molecules

Drugs and natural compounds are related at the fragment level in specific therapeutic groups

We considered the number of fragments for each therapeutic group with at least one matching fragment in the natural products dataset, obtaining the distribution shown in Fig. 3a. The top-ranking group was represented by the antibacterial drugs, followed by drugs active on the cardiovascular system, antiviral drugs, antineoplastic drugs, and anti-inflammatory drugs. The prominence of antibacterial drugs in this list is consistent with the importance that natural products have had in the development of antibiotics [32].

Fig. 3

Distribution of enriched fragments and matching natural compounds per ATC code. Panel (a) shows the number of drug fragments that are enriched in given therapeutic categories (ATC codes) that have at least one matching fragment in the set of natural compounds. Panel (b) shows the total number of natural compounds whose fragments match one or more of the enriched drug fragments in each therapeutic category

An alternative way of analyzing these data is to consider the number of natural products whose fragments match at least one of the fragments in each therapeutic group. The results are shown in Fig. 3b. By this measure, the therapeutic group with the largest number of natural products is the anti-inflammatory class, closely followed by diuretic drugs, muscle relaxants, and corticosteroids.

Case studies

In Fig. 4 we show two examples of fragments shared by a drug and a natural product in the context of low global similarity. One of the advantages of our fragment-based approach is the automatic identification of common and chemically important building blocks among molecules that may be globally dissimilar.

Fig. 4

Examples of fragments shared by natural compounds and drugs in the absence of high global similarity. The two examples shown here illustrate how a fragment-based approach can automatically detect commonalities between molecules that are globally different. Panel (a) shows a tetracyclic fragment present both in a natural compound and in an anti-cancer agent (Paclitaxel). In spite of the common core shared by the two molecules, the Tanimoto similarity between the drug and the natural compound is relatively low (0.56). In panel (b), the beta-lactam ring is detected (with a small variation) in both an approved antibiotic (tazobactam) and a natural compound (SN00240101). However, the Tanimoto similarity between the natural compound and tazobactam is low (0.49)

A proof of concept is given by the anticancer drug Paclitaxel and the natural product SN00162945 (Fig. 4a), which share a tetracyclic core but have different substituents. In fact, Paclitaxel itself was first isolated from the bark of a yew, and belongs to the taxane family, whose members all share the core fragment shown in the figure (or a closely related variation). However, because of the different substituents in the two molecules, the Tanimoto coefficient between Paclitaxel and SN00162945 turns out to be only 0.56.

Another example that showcases the power of using fragments is shown in Fig. 4b. The antimicrobial Tazobactam contains a β-lactam ring, which is the building block of a highly important group of widely prescribed antibiotics, including penicillins, cephalosporins and carbapenems, and which occurs in several natural compounds. As in the example of Fig. 4a, Tazobactam has a low Tanimoto similarity (0.49) to the natural product SN00240101, in spite of the fact that both contain the β-lactam ring.

Pairwise global similarity of drugs and natural products that share an enriched fragment

The case studies discussed above suggest that the proposed fragment-based approach can capture local similarities between molecules that are otherwise globally different. In order to test this hypothesis systematically, we set out to compare the global similarity between drugs and natural products sharing an enriched fragment. We systematically computed the pairwise Tanimoto similarity between drugs and natural products that had at least one enriched fragment in common (defined as a Tanimoto similarity >0.90 between the enriched fragment and the natural product fragment), obtaining the distribution shown in Fig. 5 and containing 320,134 comparisons. The median of the distribution (indicated by a red line in Fig. 5) is 0.204, confirming that the shared enriched fragments often occur in globally different molecules.

Fig. 5

Distribution of pairwise Tanimoto similarity between drugs and natural products that share an enriched fragment. The distribution was obtained from 320,134 pairs of drugs/natural products. The red line indicates where the median of the distribution falls

Biosynthetic pathway analysis

The computational fragmentation process yields fragments that are chemically plausible, but there is no guarantee that they exist in biological pathways as standalone molecules. To address whether some of the fragments could in fact be identified in biosynthetic pathways, we processed all 112 enriched fragments coming from approved drugs using the Retropath webserver. Retropath returned a total of nine fragments with a known biosynthetic pathway in a plant or microbial organism. The enriched fragments with a known biosynthetic pathway are shown in Table 1, which also provides the E.C. number, the IUPAC name of the enzyme that catalyzes the biosynthetic reaction leading to the fragment, and the source organism. Seven of the nine fragments are found in plant sources, and the remaining two come from the fungal kingdom (Rhodotorula glutinis and Acremonium chrysogenum).

Table 1 Enriched fragments with SMILES code, IUPAC name, chemical structure, and source (organism)

Case studies

We next addressed whether it is possible to identify biosynthetic pathways in plants and microorganisms that can potentially turn an enriched fragment into a known drug. As discussed in the Methods section, we used the Pathpred webserver to extract known biosynthetic pathways from an enriched fragment to a drug product. For five fragments (caffeine, 5-aminopentanal, glycine, styrene, and thymine) we could identify a biosynthetic pathway leading to the fragment and also biosynthetic pathways leading to drugs. Two case studies are discussed below.

Caffeine biosynthesis. One of the enriched fragments for which Retropath could return a biosynthetic pathway was 1,3,7-trimethylxanthine, commonly known as caffeine, which is found in coffee plants (Coffea arabica) and in the young leaves of the tea plant (Camellia sinensis, Table 1). Caffeine synthesis begins in these plants with xanthosine as the initial substrate, which is converted into 7-methylxanthosine, followed by a second methylation step which leads to the formation of theobromine. The final product is synthesized by the enzymes theobromine synthase and caffeine synthase, which convert 7-methylxanthine to theobromine and theobromine to caffeine, respectively (Fig. 6). Our fragment enrichment pipeline identifies caffeine as significantly enriched in drugs active on the respiratory system, including theophylline (DrugBank ID: DB00277), a drug which is used in the acute treatment of asthma. Pathpred provided the synthesis route between the caffeine fragment and theophylline. It also provided the synthesis pathway for other derivatives (1-methyluric acid and xanthine).

Fig. 6

Biosynthetic pathway involving caffeine. Information about the biosynthetic pathway of the enriched fragment caffeine in C. irrawadiensis was obtained from Retropath (top). The biosynthetic pathways from caffeine to xanthine, theophylline, and 1-methyluric acid were obtained from Pathpred (bottom)

Styrene biosynthesis. For some enriched fragments we found a biosynthetic pathway in microorganisms. For example, the styrene fragment was found to have a biosynthesis pathway in S. cerevisiae (budding yeast). The enzyme ferulic acid decarboxylase (EC 4.1.1.M2) catalyzes the production of styrene from the substrate trans-cinnamate (Fig. 7). Using styrene as substrate in Pathpred we identified biosynthetic pathways for two drugs: Eugenol (DB09086) and Coumarin (DB00682). The enzyme trans-cinnamate 4-monooxygenase converts styrene to p-anol. Two different enzymes act on p-anol: p-coumarate 3-monooxygenase, which catalyzes the production of coumarin, and ferulate 5-hydroxylase, which converts p-anol to 4-[(1E)-propen-1-yl]-1,2-benzenediol. Finally, caffeic acid 3-O-methyltransferase acts on 4-[(1E)-propen-1-yl]-1,2-benzenediol to give eugenol (Fig. 7). Eugenol has analgesic, local anesthetic, anti-inflammatory and antibacterial effects, and is widely used in dental care practice [33]. It also prevents oxidative changes in membranes and acts as an antioxidant [34]. The other product (coumarin) is used as an anticoagulant. It also has antifungal and anti-tumor activities.

Fig. 7

Biosynthetic pathways involving styrene. Information about the biosynthetic pathway of the enriched fragment styrene in S. cerevisiae was obtained from Retropath (top). The biosynthetic pathways from styrene to coumarin and eugenol were obtained from Pathpred (bottom)

Discussion and conclusion

The natural world as a source of highly diverse and complex chemicals has always been of value to synthetic chemists, and is becoming even more relevant today, given the output slump of the pharmaceutical industry. The pipeline introduced here makes it possible to automatically detect relationships between small molecules using a fragment-based approach. This choice is motivated by the fact that natural products are often assembled from independent building blocks via a chain of enzymatic reactions, a process somewhat similar to common practice in synthetic chemistry.

By first extracting statistically overrepresented fragments for each therapeutic class we reduced the complexity of the approved drugs to a handful of chemical fragments that are likely to be responsible (at least in part) for the pharmaceutical activity of the given drugs, or are important as chemical scaffolds. Comparing these fragments against the fragments obtained from a large library of natural products allowed us to establish potential relationships between drugs and natural products even in the absence of high global similarity between the molecules. As an analogy, we could compare this fragment-based approach to a local sequence alignment procedure, which can identify highly similar protein domains among globally different protein sequences.

As a note of caution, we should mention that the choice of the Tanimoto similarity thresholds and the stringency of the fragment clustering step affect the final results: more or fewer matching fragments will be found depending on how stringently the similarity parameters are set. Unfortunately, there are no hard and fast rules to guide the user in the choice of parameters. However, as is often the case in bioinformatics applications, the results should be interpreted as a guide to help design further experiments or perform more thorough literature searches. In this context, our pipeline could be used to ask whether a natural product that happens to share a fragment with an antihypertensive drug does in fact have pressure-lowering activity. Alternatively, the pipeline could be used to investigate whether a natural product shows potential as a lead compound for a given therapeutic indication.

In the future we plan to extend our pipeline by mining databases to automatically collect biosynthetic pathway information, and do more extensive analyses on the sources of natural compounds. Although this may not be possible for all compounds, databases like the “Universal Natural Product Database” [11] (contained in SuperNatural II) do include source information for several compounds. Combined with metabolic information on plant and microbial pathways, this will yield a better understanding of natural product synthesis. As shown by a pioneering study by Runguphan et al. [35], this could eventually lead to co-opting natural systems for engineering better drugs.


  1. Hanahan D. Rethinking the war on cancer. The Lancet. 2014; 383:558–63. doi:10.1016/S0140-6736(13)62226-6.

  2. Vogelstein B, Papadopoulos N, Velculescu VE, Zhou S, Diaz L A J, Kinzler KW. Cancer genome landscapes. Science. 2013; 339(6127):1546–1558.

  3. Al-Lazikani B, Banerji U, Workman P. Combinatorial drug therapy for cancer in the post-genomic era. Nature Biotechnol. 2012; 30:679–92. doi:10.1038/nbt.2284.

  4. Arts EJ, Hazuda DJ. HIV-1 antiretroviral drug therapy. Cold Spring Harb Perspect Med 2. 2012. doi:10.1101/cshperspect.a007161.

  5. Blair JMA, Webber MA, Baylay AJ, Ogbolu DO, Piddock LJV. Molecular mechanisms of antibiotic resistance. Nature Rev Microbiol. 2015; 13(1):42–51. doi:10.1038/nrmicro3380.

  6. Brunmair B, Lehner Z, Stadlbauer K, Adorjan I, Frobel K, Scherer T, Luger A, Bauer L, Fürnsinn C. 55p0110, a novel synthetic compound developed from a plant derived backbone structure, shows promising anti-hyperglycaemic activity in mice. PLoS ONE. 2015; 10(5):e0126847.

  7. Newman DJ, Cragg GM. Natural products as sources of new drugs over the 30 years from 1981 to 2010. J Nat Prod. 2012; 75(3):311–35. doi:10.1021/np200906s.

  8. Cragg GM, Newman DJ. Natural products: A continuing source of novel drug leads. Biochimica et Biophysica Acta. 2013; 1830(6):3670–3695. doi:10.1016/j.bbagen.2013.02.008.

  9. Chen CYC. TCM Database@Taiwan: The world’s largest traditional Chinese medicine database for drug screening in silico. PLoS ONE. 2011; 6(1). doi:10.1371/journal.pone.0015939.

  10. Yongye AB, Waddell J, Medina-Franco JL. Molecular scaffold analysis of natural products databases in the public domain. Chem Biol Drug Des. 2012; 80(5):717–24.

  11. Gu J, Gui Y, Chen L, Yuan G, Lu HZ, Xu X. Use of Natural Products as Chemical Library for Drug Discovery and Network Pharmacology. PLoS ONE. 2013; 8(4). doi:10.1371/journal.pone.0062839.

  12. Banerjee P, Erehman J, Gohlke BO, Wilhelm T, Preissner R, Dunkel M. Super Natural II–a database of natural products. Nucleic Acids Res. 2015; 43(Database issue):935–9. doi:10.1093/nar/gku886.

  13. Chin YW, Balunas MJ, Chai HB, Kinghorn AD. Drug discovery from natural sources. AAPS J. 2006; 8(2):239–53.

  14. Ntie-Kang F, Mbah JA, Mbaze LM, Lifongo LL, Scharfe M, Hanna JN, Cho-Ngwa F, Onguéné PA, Owono LCO, Megnassan E, et al. CamMedNP: Building the Cameroonian 3D structural natural products database for virtual screening. BMC Complement Alternat Med. 2013; 13(1):88.

  15. Koehn FE, Carter GT. The evolving role of natural products in drug discovery. Nat Rev Drug Discov. 2005; 4(3):206–20.

  16. Sarasan V, Kite GC, Sileshi GW, Stevenson PC. Applications of phytochemical and in vitro techniques for reducing over-harvesting of medicinal and pesticidal plants and generating income for the rural poor. Plant Cell Rep. 2011; 30(7):1163–1172.

  17. Beutler JA. Natural products as a foundation for drug discovery. Curr Protoc Pharmacol. 2009; Suppl 46. doi:10.1002/0471141755.ph0911s46.

  18. Abel U, Koch C, Speitling M, Hansske FG. Modern methods to produce natural-product libraries. Curr Opin Chem Biol. 2002; 6(4):453–8. doi:10.1016/S1367-5931(02)00338-1.

  19. Hajduk PJ, Greer J. A decade of fragment-based drug design: strategic advances and lessons learned. Nature Rev Drug Disc. 2007; 6(3):211–9. doi:10.1038/nrd2220.

  20. Kupchan SM. Drugs from Natural Products–Plant Sources: American Chemical Society; 2009, pp. 1–13. Chap. 2. doi:10.1021/ba-1971-0108.ch001.

  21. Lahlou M. The success of natural products in drug discovery. Pharmacol Pharm. 2013; 4(June):17–31. doi:10.4236/pp.2013.43A003.

  22. Stierle A, Strobel G, Stierle D. Taxol and taxane production by Taxomyces andreanae, an endophytic fungus of Pacific yew. Science. 1993; 260(5105):214–6. doi:10.1126/science.8097061.

  23. Cambie RC, Brewis A. Anti-fertility Plants of the Pacific. Collingwood, VIC 3066, Australia: CSIRO Publishing; 1997, p. 184.

  24. Wishart DS, Knox C, Guo AC, Shrivastava S, Hassanali M, Stothard P, Chang Z, Woolsey J. DrugBank: a comprehensive resource for in silico drug discovery and exploration. Nucleic Acids Res. 2006; 34(Database issue):668–72.

  25. Weininger D. SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J Chem Inf Comput Sci. 1988; 28(1):31–6.

  26. Ghersi D, Singh M. molBLOCKS: Decomposing small molecule sets and uncovering enriched fragments. Bioinformatics. 2014; 30(14):2081–2083. doi:10.1093/bioinformatics/btu173.

  27. Lewell XQ, Judd DB, Watson SP, Hann MM. RECAP–retrosynthetic combinatorial analysis procedure: a powerful new technique for identifying privileged molecular fragments with useful applications in combinatorial chemistry. J Chem Inf Comput Sci. 1998; 38(3):511–522.

  28. O’Boyle NM, Banck M, James CA, Morley C, Vandermeersch T, Hutchison GR. Open Babel: An open chemical toolbox. J Cheminform. 2011; 3:33.

  29. Benjamini Y, Hochberg Y. Controlling the false discovery rate: A practical and powerful approach to multiple testing. J Roy Stat Soc Ser B (Methodological). 1995; 57:289–300.

  30. O’Boyle NM, Morley C, Hutchison GR. Pybel: a Python wrapper for the OpenBabel cheminformatics toolkit. Chem Central J. 2008; 2:5. doi:10.1186/1752-153X-2-5.

  31. Pence HE, Williams A. ChemSpider: An online chemical information resource. J Chem Educ. 2010; 87(11):1123–1124. doi:10.1021/ed100697w.

  32. Brown DG, Lister T, May-Dracka TL. New natural products as new leads for antibacterial drug discovery. Bioorg Med Chem Lett. 2014; 24(2):413–8. doi:10.1016/j.bmcl.2013.12.059.

  33. Pramod K, Aji Alex MR, Singh M, Dang S, Ansari SH, Ali J. Eugenol nanocapsule for enhanced therapeutic activity against periodontal infections. J Drug Target. 2016; 24(1):24–33. doi:10.3109/1061186X.2015.1052071.

  34. Salleh WMNH, Ahmad F, Yen KH. Antioxidant and Anticholinesterase Activities of Essential Oils of Cinnamomum griffithii and C. macrocarpum. Nat Prod Commun. 2015; 10(8):1465–1468.

  35. Runguphan W, Qu X, O’Connor SE. Integrating carbon-halogen bond formation into medicinal plant metabolism. Nature. 2010; 468(7322):461–4. doi:10.1038/nature09524.


The authors would like to thank the Bioinformatics group at UNO for useful discussions. This work was partly funded by a Nebraska Research Initiative grant to DG.


Publication costs were funded through a Nebraska Research Initiative grant to DG.

This article has been published as part of BMC Medical Genomics Vol 9 Suppl 2 2016: Selected articles from the IEEE International Conference on Bioinformatics and Biomedicine 2015: medical genomics. The full contents of the supplement are available online at

Availability of data

The molBLOCKS software [26] is freely available. All data and scripts are available upon request.

Authors’ contributions

DG and DB conceived the study, analyzed the data, and wrote the manuscript. IT provided support with the analysis of the results. SS and KK performed the experiments. All authors read and approved the final manuscript.

Competing interests

The authors declare that they have no competing interests.

Consent for publication

Not applicable.

Ethics approval and consent to participate

Not applicable.

Author information


Corresponding author

Correspondence to Dario Ghersi.

Rights and permissions

Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated.

About this article

Cite this article

Sharma, S., Karri, K., Thapa, I. et al. Identifying enriched drug fragments as possible candidates for metabolic engineering. BMC Med Genomics 9 (Suppl 2), 46 (2016).
