Multiple target drug cocktail design for attacking the core network markers of four cancers using ligand-based and structure-based virtual screening methods

Wong, Yung-Hao; Lin, Chih-Lung; Chen, Ting-Shou; Chen, Chien-An; Jiang, Pei-Shin; Lai, Yi-Hua; Chu, Lichieh Julie; Li, Cheng-Wei; Chen, Jeremy JW; Chen, Bor-Sen

doi:10.1186/1755-8794-8-S4-S4

Volume 8 Supplement 4

Joint 26th Genome Informatics Workshop and 14th International Conference on Bioinformatics: Medical Genomics

Research
Open access
Published: 09 December 2015

Multiple target drug cocktail design for attacking the core network markers of four cancers using ligand-based and structure-based virtual screening methods

Yung-Hao Wong^1,3,
Chih-Lung Lin²,
Ting-Shou Chen²,
Chien-An Chen²,
Pei-Shin Jiang²,
Yi-Hua Lai³,
Lichieh Julie Chu⁴,
Cheng-Wei Li¹,
Jeremy JW Chen³ &
…
Bor-Sen Chen¹

BMC Medical Genomics volume 8, Article number: S4 (2015) Cite this article

2941 Accesses
13 Citations
1 Altmetric
Metrics details

Abstract

Background

Computer-aided drug design has a long history of being applied to discover new molecules to treat various cancers, but it has always been focused on single targets. The development of systems biology has let scientists reveal more hidden mechanisms of cancers, but attempts to apply systems biology to cancer therapies remain at preliminary stages. Our lab has successfully developed various systems biology models for several cancers. Based on these achievements, we present the first attempt to combine multiple-target therapy with systems biology.

Methods

In our previous study, we identified 28 significant proteins--i.e., common core network markers--of four types of cancers as house-keeping proteins of these cancers. In this study, we ranked these proteins by summing their carcinogenesis relevance values (CRVs) across the four cancers, and then performed docking and pharmacophore modeling to do virtual screening on the NCI database for anti-cancer drugs. We also performed pathway analysis on these proteins using Panther and MetaCore to reveal more mechanisms of these cancer house-keeping proteins.

Results

We designed several approaches to discover targets for multiple-target cocktail therapies. In the first one, we identified the top 20 drugs for each of the 28 cancer house-keeping proteins, and analyzed the docking pose to further understand the interaction mechanisms of these drugs. After screening for duplicates, we found that 13 of these drugs could target 11 proteins simultaneously. In the second approach, we chose the top 5 proteins with the highest summed CRVs and used them as the drug targets. We built a pharmacophore and applied it to do virtual screening against the Life-Chemical library for anti-cancer drugs. Based on these results, wet-lab bio-scientists could freely investigate combinations of these drugs for multiple-target therapy for cancers, in contrast to the traditional single target therapy.

Conclusions

Combination of systems biology with computer-aided drug design could help us develop novel drug cocktails with multiple targets. We believe this will enhance the efficiency of therapeutic practice and lead to new directions for cancer therapy.

Background

Cancer is the leading cause of human death worldwide. It is a complex set of diseases, and people have tried to reveal its underlying mechanisms to guide the development of novel therapy strategies. In the last two decades, cancer researchers have generated an abundance of knowledge about cancer, and revealed the etiology of various cancers at DNA, RNA, and protein levels [1]. Weinberg et al. summarized the first and next generation of cancer hallmarks to expand the current understanding of the basic mechanisms of cancer [2, 3]. Recently, due to the scale up in high throughput data, availability of integrated OMICS data, and various advanced statistical analysis methods, many novel systems biology approaches have been employed to reveal the deeper underlying systematic mechanisms of various cancers [4–6].

Traditional computer-aided drug design (CADD) focuses on a single target for therapy, such as Src, FAK, and EGFR in the case of cancer [7, 8]. Researchers have used virtual screening with de novo methods to develop small molecules that in most cases inhibit these targets (although sometimes they are agonists), and accordingly reduce the expression of these proteins to kill cancer cells. CADD has a long history and many successful examples. CADD methods can be divided into structure-based and ligand-based methods [9]. Methods in the former category analyze both the structures of the target protein and the small molecule inhibitors to design drugs: examples include the docking method and molecular dynamics simulations. On the other hand, methods in the latter category use only the structures of the small molecule inhibitors (drugs) to do statistical calculations to determine the relationship between a drug's IC₅₀ and its corresponding molecular properties: examples include HypoGen pharmacophore modeling, COMFA (and COMSIA) [10], and many other machine learning and regression methods [11, 12].

One of the main differences between Western and Chinese medical philosophy is that the former focuses on single targets, while the latter focuses on multiple targets simultaneously [13]. Systems biology reconstructs the regulatory relationships within genetic, metabolic, and protein-protein interaction networks. These biological networks are highly complex, so robustness and sensitivity are their key system-level features [14, 15]. The intertwined nature of these networks shows us that inhibiting a single protein directly is not the only way to depress its expression. Systems biology will helps us to identify several protein targets to be inhibited simultaneously: due to their network behaviour, this multi-target approach will produce the same or better effect, than focusing on a single target protein. Also, recent research has demonstrated that to inhibit protein-protein interactions (PPI) is another novel anti-cancer strategy [16]. Inspired by the above ideas, based on the result of our previous systems biology studies [17], we have developed a novel multi-target cocktail therapy to focus on common core network markers of four different cancers. Our strategy is different from Dr. Ho's famous cocktail therapy for AIDS [18], which is not targeted therapy. Our method is to apply traditional CADD methods simultaneously to multiple drug targets. We regard this as a great advance in novel anti-cancer strategies.

Theoretical biological background: Recently, PPI-based analysis seems to have become a novel strategy or cancer target drug therapy and the development of precision medicine [16, 19]. Unlike traditional target drug design, which focuses on the inhibition or activation of a single target protein, usually a receptor or enzyme, PPI-based drug design involves inhibition of PPIs interface that mediate many important biological processes by small molecule; it is a novel and creative approach to drug discovery, especially for anti-cancer. Many clinical and elementary biological researches have concluded that the identification of PPI nodes and hubs that are significant for cell transformation functions in cancer. These PPIs related to cancers have become important targets for cancer therapy. Progress on technologies in the identification of PPI modulators and the clinical validation of the PPI pairs has made anti-cancer therapeutics by interfering with PPIs a reality [16, 19].

So, to identify PPI interface and PPI targets are regarded as the future topics for next generation anticancer strategies. Nevertheless, the cancer PPI networks are always highly complex and differ between cancer subtypes. Hence, we put concentration on common PPI network markers and their PPIs with a critical carcinogenesis relevance value (CRV), which is an estimate of the PPI evolution during the carcinogenesis process, to focus on the conservation of house-keeping proteins and their PPI interface characteristics as important targets shared by different cancers. This allows us to not only find out the crucial common pathways of different cancers in carcinogenesis, but also discover novel PPI targets for cancer therapy. Specific network markers can be regarded to represent specific PPI targets for each cancer. To target PPIs in both common core network markers and specific network markers simultaneously may provide new directions for anticancer therapy strategies.

Theoretical, mathematical, and statistical method: The traditional methods to find hubs or driven proteins in the gene regulatory or PPI networks of cancers is different from our method. In our previous study [17], we compared between the networks of each cancer and its corresponding normal PPI, which were obtained by the AIC (akaike information criterion) order detection and Student t-test methods from microarray expression data of patients and normal people, respectively, to get the PPI differential network in order to reveal PPI alternations during the tumorigenesis process. And then, we developed a carcinogenesis relevance value (CRV) for each protein in the PPI differential network based on the total alternations of PPI interaction abilities with other proteins to approve the critical PPI changes during tumorigenesis process. In the end, we obtained the core and specific network markers by using the intersection and difference of these 4 cancers with top CRVs. These novel core and specific network markers could provide possible PPI targets for small-molecule drugs to interfere and then destroy tumors. Calculations and estimations were using real microarray expression data. The maximum likelihood parameter estimation method and AIC model order selection method are well-known and widely used system identification methods from experimental data. During these estimation and learning processes, the PPI interaction mathematical model can derive the most probable PPI network for cancer and normal patients from large amount of microarray data and big databases to interpret the hidden biological mechanisms. The above paragraphs are adapted from our previous study [17] to make this paper be a complete paper.

In our previous study, we analyzed various cancers--specifically, bladder, colorectal, liver, and lung cancers--through regression modeling, microarray data, maximum likelihood parameter estimations, and big database mining. Based on known PPI information and gene expression data from normal and patient samples, we built a cancer PPI network [17]. Here, we focus on not only the PPI networks of single cancer types but also on common core network markers of four different cancers through the intersection of their respective PPI networks. Cancer is a complex disease, and so we tried to reveal house-keeping proteins significant in different cancers, i.e. the common core network markers shared by the different cancers. As the first trial to develop multiple-target therapy, cancer house-keeping proteins (common core network markers) may be a good choice. There are two main parts of this research. One is to find the core network markers of four cancers by systems biology approach, and the other is to attach these network markers by ligand-based and structure-based CADD (computer-aided drug design) methods. The first part has been published on Journal of Theoretical Biology [17]. The whole work of this research is the Part-II, i.e., using the CADD methods to attach the network biomarkers. To make this paper complete, so we described how to get these network biomarkers in our previous research [17].

Materials and methods

Identification of the core network markers of four cancers (28 proteins) - A brief review of our previous methods

In our previous study [17], we used the systems biology approach to study network markers of various cancers. Firstly, microarray data, PPI databases and PPI interaction models were employed to construct the PPI networks of normal and cancer cells by the maximum likelihood parameter estimation method (see Additional file 1). The AIC system order detection method (Additional file 2) was employed to prune false-positive PPIs to obtain real PPI networks of normal and cancer cells: in other words, we used the reverse engineering method to construct the PPI networks of normal and cancer cells. Then, the differential PPI network--obtained by contrasting the cancer PPI network and normal PPI network--was used to investigate PPI variations of each protein in the differential PPI network due to carcinogenesis. Finally, the carcinogenesis value (CRV) based on PPI variations was proposed to evaluate the significance of each protein for carcinogenesis in the differential PPI network. Proteins with a significant CRV (p-value<0.01) were considered to be significant for the progress of the cancer. The complete mathematical model is described as follows.

After organizing the cancer microarray data and PPI data, we used a PPI model, the maximum likelihood parameter estimation method and a model order detection method together to prune each candidate PPIN by the corresponding microarray data to approach the actual PPIN of each cancer. Here, the PPIs of a target protein i in the candidate PPIN can be depicted by the following protein association model:

x_{i} [n] = \sum_{j = 1}^{M_{i}} α_{i j} x_{j} [n] + ω_{i} [n]

(1)

where $x_{i} [n]$ represents the expression level of the target protein i for the sample n; $x_{j} [n]$ represents the expression level of the j-th protein interacting with the target protein i for the sample n; $α_{i j}$ denotes the association ability between the target protein i and its j-th interactive protein; $M_{i}$ represents the number of proteins interacting with the target protein i; and $ω_{i} [n]$ represents the stochastic noise due to other factors or model uncertainty. The biological meaning of equation (1) is that the expression level of the target protein i is associated with the expression levels of the proteins interacting with it. Consequently, a protein association (interaction) model for each protein in the protein pool can be built.

After constructing equation (1) for the PPI model of each protein in the candidate PPIN, we used the maximum likelihood estimation method [20] to identify the association parameters in (1) by microarray data as follows (see Additional file 1):

x_{i} (n) = \sum_{j = 1}^{M_{i}} {\hat{α}}_{i j} x_{j} (n) + w_{i} (n)

(2)

where ${\hat{α}}_{i j}$ is identified by the maximum likelihood estimation method.

Once the association parameters for all proteins in the candidate PPI network were identified for each protein, the true protein associations were determined by pruning the false positive PPIs. Akaike Information Criterion (AIC) [20] and a Student's t-test [21] were employed to achieve model order selection for the pruning of false positive protein associations in ${\hat{α}}_{i j}$ (see Additional file 2).

After the AIC order detection and use of the Student's t-test to determine p-values of ${\hat{α}}_{i j}$ , the false positive PPIs ${\hat{α}}_{i j}$ in (2) were pruned away and only significant PPIs were refined as follows:

x_{i} (n) = \sum_{j = 1}^{{M_{i}}^{'}} {\hat{α}}_{i j} x_{j} (n) + {w_{i}}^{'} (n), i = 1, 2 . . . . . . M

(3)

where ${M_{i}}^{'} \leq M_{i}$ denotes the number of true PPIs, with the target protein i, i.e., a number of $M_{i} - {M_{i}}^{'}$ (or false positives) are pruned in the PPIs of target protein i. One protein by one protein (i.e., $i = 1, 2, . . ., M$ for all proteins in refined PPIN in (3)) results in refined PPIN

X (n) = A X (n) + w (n)

(4)

where $X (n) = [\begin{gathered} x_{1} (n) \\ x_{2} (n) \\ ⋮ \\ x_{M} (n) \end{gathered}]$ , $A = [\begin{matrix} {\hat{α}}_{11} & \dots & {\hat{α}}_{1 M} \\ ⋮ & ⋱ & ⋮ \\ {\hat{α}}_{M 1} & \dots & {\hat{α}}_{M M} \end{matrix}]$ , $w (n) = [\begin{gathered} {w_{1}}^{'} (n) \\ {w_{2}}^{'} (n) \\ ⋮ \\ {w_{M}}^{'} (n) \end{gathered}]$

where the interaction matrix A denotes the PPIs.

If there is no PPI between protein i and protein j or it is pruned by AIC order detection due to the false positive PPIs in the refined PPIN, then ${\hat{α}}_{i j} = 0$ . In general, ${\hat{α}}_{i j} = {\hat{α}}_{j i}$ , but if this is not the case, the larger one was chosen as ${\hat{α}}_{i j} = {\hat{α}}_{j i}$ to avoid the situation ${\hat{α}}_{i j} \neq {\hat{α}}_{j i}$ . The above PPIN construction method was employed to construct the refined PPINs for cancer and non-cancer cells of bladder, colorectal, liver, and lung cancer, respectively.

The interaction matrices A of refined PPINs in (4) for cancer and non-cancer cells of the four cancers were constructed, respectively, as follows:

$A_{C}^{k} = [\begin{matrix} {\hat{α}}_{11, C}^{k} & \dots & {\hat{α}}_{1 M, C}^{k} \\ ⋮ & ⋱ & ⋮ \\ {\hat{α}}_{M 1, C}^{k} & \dots & {\hat{α}}_{M M, C}^{k} \end{matrix}]$ , $A_{N}^{k} = [\begin{matrix} {\hat{α}}_{11, N}^{k} & \dots & {\hat{α}}_{1 M, N}^{k} \\ ⋮ & ⋱ & ⋮ \\ {\hat{α}}_{M 1, N}^{k} & \dots & {\hat{α}}_{M M, N}^{k} \end{matrix}]$ (5)

where k = bladder cancer, colorectal cancer, liver cancer, and lung cancer; $A_{C}^{k}$ and $A_{N}^{k}$ denote the interaction matrices of refined PPIN of the k-th cancer and non-cancer, respectively; M is the number of proteins in the refined PPIN. Therefore, the protein association model for CPPIN and NPPIN in the k-th cancer and non-cancer can be represented by the following equations according to (4) and (5):

\begin{gathered} x_{C}^{k} (n) = A_{C}^{k} x_{C} (n) + w_{C}^{k} (n) \\ x_{N}^{k} (n) = A_{N}^{k} x_{N} (n) + w_{N}^{k} (n) \end{gathered}

(6)

where k = bladder cancer, colorectal cancer, liver cancer, and lung cancer;

$x_{C}^{k} (n) = {[x_{1 C}^{k} x_{2 C}^{k} \dots x_{M C}^{k}]}^{T}$ and $x_{N}^{k} (n) = {[x_{1 N}^{k} x_{2 N}^{k} \dots x_{M N}^{k}]}^{T}$ denote the vectors of expression levels; $w_{C}^{K} (n)$ and $w_{N}^{K} (n)$ indicate the noise vectors of PPINs in the k-th cancer and non-cancer cells, respectively.

The different matrix $A_{C}^{k} - A_{N}^{k}$ of differential PPI network between CPPIN and NPPIN in the k-th cancer is defined as:

D^{k} = [\begin{matrix} d_{11}^{k} & \dots & d_{1 M}^{k} \\ ⋮ & ⋱ & ⋮ \\ d_{M 1}^{k} & \dots & d_{M M}^{k} \end{matrix}] = [\begin{matrix} {\hat{α}}^{k}_{11, C} - {\hat{α}}^{k}_{11, N} & \dots & {\hat{α}}^{k}_{1 M, C} - {\hat{α}}^{k}_{1 M, N} \\ ⋮ & ⋱ & ⋮ \\ {\hat{α}}^{k}_{M 1, C} - {\hat{α}}^{k}_{M 1, N} & \dots & {\hat{α}}_{M M, C}^{k} - {\hat{α}}_{M M, N}^{k} \end{matrix}]

(7)

where k = bladder cancer, colorectal cancer, liver cancer, and lung cancer; $d_{i j}^{k}$ denotes the PPI variation between the i-th protein and the j-th protein of differential PPI network by comparing CPPIN with NPPIN in the k-th type of cancer; the matrix $D^{k}$ indicates the difference in network structure between CPPIN and NPPIN in the k-th type of cancer. In order to investigate carcinogenesis from the difference matrix $D^{k}$ between CPPIN and NPPIN of the k-th cancer in (7), a score named the carcinogenesis relevance value (CRV) was presented to quantify the correlation of PPI variations of each protein in $D^{k}$ with the significance of carcinogenesis as follows [22]:

C R V^{k} = [\begin{matrix} C R V_{1}^{k} \\ ⋮ \\ C R V_{i}^{k} \\ ⋮ \\ C R V_{M}^{k} \end{matrix}]

(8)

where $C R V_{i}^{k} = \sum_{j = 1}^{M} |d_{i j}^{k}|$ , and k = bladder cancer, colorectal cancer, liver cancer, and lung cancer.

The $C R V_{i}^{k}$ in (8) quantifies the differential extent of PPI variations of the $i$ -th protein (the absolute sum of the i-th row of $D^{k}$ in (7)) and $C R V_{}^{k}$ can differentiate CPPIN from NPPIN in the k-th cancer. In other words, the $C R V_{i}^{k}$ in (8) could calculate the total PPI variations of the i-th protein by comparing the network structure differences between the cancer and non-cancer networks, which can be used to check which proteins are involved with the k-th cancer.

In order to investigate which proteins are more likely involved in the k-th cancer, we needed to calculate the corresponding empirical p-value to determine the statistical significance of $C R V_{i}^{k}$ . To determine the observed p-value of each $C R V_{i}^{k}$ , we repeatedly permuted the network structure of the candidate PPIN of the k-th type of cancer as a random network of the k-th cancer. Each protein in the random network of the k-th type of cancer will have its own CRV to generate a distribution of $C R V_{i}^{k}$ for k = bladder, colorectal, liver, and lung cancer. Although there was random disarrangement of the network structure, the linkages of each protein were maintained, i.e., the proteins with which a particular protein interacted were permuted without changing the total number of protein interactions. This procedure was repeated 100,000 times and the corresponding p-value was calculated as the fraction of random network structure in which $C R V_{i}^{k}$ is at least as large as the CRV of the real network structure. According to the distributions of $C R V_{i}^{k}$ of random networks, the $C R V_{i}^{k}$ in (8) with a p-value of less than or equal to 0.01 was regarded as a significant CRV and the corresponding protein was determined to be a significant protein in the carcinogenesis of the k-th cancer: a protein with a p-value > 0.01 was removed from the list of significant proteins in carcinogenesis (in other words, if the p-value of $C R V_{i}^{k}$ > 0.01, then the i-th protein was removed from $C R V_{}^{k}$ in (8) and the remainders in $C R V_{}^{k}$ with p-values of CRVs less than 0.01 were significant proteins of the k-th cancer). Based on the p-value of the CRVs for all proteins ( $i = 1, 2, . . ., M$ ) and the four types of cancer (k = bladder, colorectal, liver, and lung cancer), we generated four lists of significant proteins (Additional file 3: Table S1) for the cancers according to the CRV and the statistical assessment of each significant protein in $C R V_{}^{k}$ in (8). As shown in Table S1, we found 107 significant proteins in bladder cancer, 110 significant proteins in liver cancer, 60 significant proteins in colorectal cancer, and 86 proteins in lung cancer. These proteins have significant PPI changes between the CPPIN and NPPIN in the carcinogenic process for their corresponding cancer and we suspect that they may play important roles in carcinogenesis, warranting further investigation.

The intersection of these significant proteins in the four cancers and their PPIs is known as the core network markers, while the differences of these significant proteins are the unique significant proteins of each cancer and their PPIs in each of the cancers are known as the specific network markers for each cancer. We found that there were 28 significant proteins that could be classified as a core network marker and 26, 4, 24, and 13 significant proteins that were specific network markers of bladder, colorectal, liver, and lung cancer, respectively. The core network and specific network markers for the cancers are described in our previous paper [17]. This insight into the carcinogenic mechanisms of common core and specific network markers in different cancers will be discussed in detail in the following section.

The 28 significant proteins in Figure 1(a) (see also Table 1) are significant proteins shared by the four cancers, and these proteins and their PPIs form the common core network markers of these four cancers. The significant proteins outside of these 28 are specific network markers, distinct for each cancer. Finally, based on these common core network markers and specific network markers, we investigated the mechanisms behind the carcinogenesis process with the help of databases (for example, GO database [23], DAVID database [24, 25], and KEGG pathway database [26, 27]) to find multiple network targets for cancer therapy. Unlike conventional theoretical methods that generate a single mathematical model for a cancer network for detailed theoretical analysis, ours is a systems biology approach to cancer network markers based on real microarray data through reverse engineering, theoretical statistical methods, and data mining in combination with big databases. These features made our method novel and helped produce the significant findings of our paper. The above paragraphs are adapted from our previous study [17] to make this paper to be a complete paper.

Table 1 Structure information of 28 core proteins.

Full size table

Several scenarios to break down the cancer core network

In this study, we used several different approaches to break down the cancer core network. This is the key feature of this research, which confers it novelty. In contrast to traditional single target therapy, we used multiple-target cocktail therapy to attack common core network markers, i.e., to inhibit several key proteins simultaneously instead of inhibiting only one single target. We assume that precisely targeting the core network would reduce the activity of the cancer. Based on this assumption, we believe that attacking core network markers simultaneously will be more efficient than attacking only one target. The first important problem is how to choose the group of key proteins to target. The first two novel approaches below to break down the cancer core network are based on the systems biology approach employed in our previous study [17]. We developed the third approach by performing pathway analysis using Panther [28] and MetaCore (GeneGo Inc.) [29]. In addition, we used a 'confusion pharmacophore model' to develop the fourth novel trial approach.

In the first approach, we chose the compounds with highest docking scores for each of the 28 significant proteins. Then we removed the redundant compounds, and analyzed which compounds can target more than two proteins simultaneously. The basic strategy is to minimize the number of compounds necessary to target in order to break down the core network. Since there are 28 proteins in this core network, even choosing only one ligand to inhibit each of these 28 proteins would necessitate 28 ligands, which may be unsuitable for wet-lab validation.

In the second approach, we used the summation of the CRVs (Table 2 and Table 3) as the criterion to choose the key proteins to inhibit. In our previous study, we showed that CRV quantifies the extent of a protein's association with other proteins, and thus it is optimal to target proteins with the highest summation of CRVs. This strategy depends on the cancer being targeted; in the present study, we focused on the common core network markers of the four cancers, so we chose the highest summation of CRVs across the four cancers.

Table 2 (a): CRVs of the 28 core proteins of each of the 4 cancers [17].

Full size table

Table 3 (b) Top 5 ligands for the top 5 CRV proteins.

Full size table

In the third approach, we employed many new and valuable pathway analyses. Wet-lab cancer researchers can choose the pathway they want to focus on as the therapy target. There are many different combinations dependent on different conditions of cancers, so we do not list all the possibilities. Here, we demonstrate a single example, and others can develop their own scenarios following this example with the help of our docking results. (Additional file 4: Table S2)

The fourth novel approach is to develop confusion and individual pharmacophore models of core network markers (28 proteins) to do virtual screening using the compounds stored in the Life Chemical database.

We highlight the different approaches in this section, because they form the crux of this study. Please see Figure (1b). In the following sections, we describe the related methods necessary for each of the approaches.

More pathway and functional analyses

A. Panther

In our previous studies, we performed pathway analysis using the DAVID database. Here, we expand on the previous analysis by performing more pathway and functional analyses in order to develop a more efficient strategy for multiple target therapy. This approach will allow us to accumulate more information on possible therapy strategies. The 28 proteins comprising the cancer core network markers were analyzed for their molecular functions, molecular processes, and subcellular localizations by the PANTHER (Protein ANalysis THrough Evolutionary Relationships) classification system [28]. PANTHER was designed to classify proteins (and their genes) in order to facilitate high-throughput analysis. It has a friendly user interface, you only need to input the gene list and set up the parameters, you will get the results you want. Please see Additional file 5: Figure S5.1 and the PANTHER website [28]

B. MetaCore

MetaCore includes a manually annotated database of gene interactions and metabolic reactions obtained from the scientific literature, including the most newly updated ones. The enrichment analysis of the biological process was based on the hypergeometric distribution algorithm and relevant pathway maps. The mathematical foundation of MetaCore is shown in supplementary materials.

More pathway and functional analyses are fundamental to learning more about the hidden mechanisms of cancer networks. So, to interpret the results for the third approach in a meaningful way, this topic must be explained and described beforehand. The novelty of this paper is in its combination of systems biology with computer-aided drug design. Our research opens many new directions for multiple-target drug design; we describe only a portion of our results in this paper, for illustrative purposes. Our results clearly show that our approaches show great promise for future research to target special pathways through the design of drugs having multiple targets.

Metacore is also user-friendly software. The various manuals and training materials can be download from public website [30, 31], and due to the copyright concern, we do not copy too many material here. Please follow the instructions in these manuals, and you can very well analysis. The mathematical foundation of Metacore is shown in Additional file 6. Of course, the more deeply you understand the underlying statistics meaning, you can do better analysis.

Protein and ligand structures

The following section illustrates the computer-aided drug design (CADD) strategy applied to target the 28 common core network proteins. The first through third approaches need both the 3D structures of proteins and ligands, while the fourth pharmacophore approach only needs the structures of the ligands. The first thing we need is thus the 3D structures of the proteins. At this stage, 24 of them are available in the well-known PDB database, and we can download them directly, while there are four proteins (CEBPB, IRAK1, KIAA0101 and PSMA3) whose 3D structures have not been solved at this stage (Table 1). We used the NCI (46872 ligands) and Life Chemical (31742 ligands) drug libraries with the filter "anti-cancer". The Developmental Therapeutics Program NCI/NIH offers the NCI-60 cell line screening. The users can download the 3D ligands (drugs) structures from the website, and after the virtual screening, you can request them send you these drugs for free. Life Chemical is a commercial company, you have to pay to get these drugs [32]. All drug structures were prepared and minimized by Discovery Studio 3.5 (DS3.5). People seldom do the virtual screening on multiple targets since it is computationally intense. We performed this work on an IBM server with 160 cores and 1 TB of memory.

Homology model and binding site prediction

Proteins do not always have 3D structures available in the PDB database. Since only amino acid sequence data without corresponding structures were available for four of the proteins targeted in our study (CEBPB, IRAK1, KIAA0101 and PSMA3) we used homology modeling to predict the structures for these proteins. Among many famous software and webservers, the I-Tasser webserver developed by Zhang et al. [33] is the most well-known homology modeling webserver. We used this server to perform homology modeling of our proteins of interest that lacked available 3D structures. For some well-studied protein targets, structures with embedded ligands are available, so the exact binding site for the docking experiments is known. However, binding site information was unavailable for most proteins, so we used the COACH webserver [34], also developed by Zhang et al., to predict the binding sites. The detailed binding site information is shown in Additional File 7. I-TASSER and COACH are both user friendly webserver. You only need to input the sequence of protein residue into the I-TASSER, and set up the parameters, it will give you the predicted 3D protein structures. Users input the 3D protein structures into the COACH and set up the parameters, and then it will give you the predicted binding sites (Additional file 5: Figure S5.2, S5.3).

Docking

After prepared the 3D structures of proteins and ligands, and find out the binding site, we can perform docking for virtual screening. We used DS 3.5 to perform docking simulations, and then ranked the docking results based on LibDock score. We chose the top 20 compounds for further analysis, such as the construction of pharmacophore models. The DS 3.5 is also user friendly software, users can download the manual from website, please see the example for parameter setting of docking. Please see Additional file 5: Figure S5.4

Pharmacophore Model

A. HypoGen method and virtual screening

There are three stages in the generation of the HypoGen model: constructive phase, subtractive phase and optimization phase. (1) Constructive phase: Active ligands within a given range of the maximum activity are chosen initially. The two most active ligands are used to enumerate the pharmacophores. (2) Subtractive phase: After the constructive phase, a database with a large number of pharmacophore structures is generated. The purpose of the subtractive phase is to identify the inactive pharmacophores and eliminate them from the database. (3) Optimization phase: The simulated annealing method is used to modify the scoring function of each hypothesis to be tested. After optimization, the HypoGen method generates the top 10 hypotheses. This study is our first attempt to combine systems biology with CADD. Pharmacophore modeling is one of the most powerful approaches in CADD. Here, we developed a novel prototype pharmacophore model, which is different from the traditional one. As mentioned above, this is a novel idea, and the pharmacophore model is just a prototype: it will require further modification in future studies. In traditional pharmacophore modeling, the IC₅₀ value for each drug under investigation is necessary, but our model does not require them. Instead, we linearly transformed LibDock scores to generate putative IC₅₀ values, and used these values to build the pharmacophore. We combined the top three ligands for each of the 28 proteins to make a ligand pool, and then used this ligand pool to construct a common pharmacophore; this is qualitatively different from building 28 individual pharmacophores. We build the Hypogen model by the DS 3.5, and it is also user friendly. However, to build a correct Hypogen model is a time consuming work, you need to try the different combination of compounds. For a skill expert with normal computational resource, it often needs at least one month to build the correct model. We show the user interface for detailed parameter setting for your reference. Please refer to Additional file 5 : Figure S5.5.

B. PharmaGist

PharmaGist is another method to construct ligand-based pharmacophores. Because it does not require IC₅₀ values of ligands [35], it is easier to construct pharmacophores by PharmaGist than by the HypoGen model. For each protein, we constructed one PharmaGist model: these models can be used to do virtual screening in the future. PharmaGist is also a user friendly webserver. Please see Additional file 5: Figure S5.6.

Cocktail multiple-target strategy: A novel model combined with systems biology and CADD

In summary, we used systems biology to construct the common core network marker of four different cancers, which contains 28 proteins as the house-keeping proteins shared by the different cancers. Then we used the docking and pharmacophore methods to perform virtual screening on these 28 core proteins and get the top compounds for each protein. In contrast to traditional single target methods, we suggest using a combination of the top compounds to perform cell proliferation experiments. We have provided an example for each of our first three approaches. Our research provides a novel direction for target therapy for cancers. Wet-lab biomedical scientists can combine the top ligands for each of the 28 core proteins based on their experimental conditions and the pathways that they want to focus on. As an early stage project, we have taken a conservative approach, only using the NCI anti-cancer drug library. As these drugs have proven anti-cancer activity, our cocktail of multiple drugs can be expected to slightly or moderately enhance the therapeutic effect under the right wet-lab conditions. Collaboration with and input from wet-lab experts would permit the modification and optimization of our model, and the scope could expand to include other drug libraries in the virtual screening.

Biological Experimental Validations

The detailed protocol for the biological experiment in this Part, please refer to our previous work [36]. And the results please refer to Additional file 8. Recently, our team members who are part of wet-lab experiments have found that: if we used more than one drug to attack more than one target at the same time, it will decrease the IC₅₀ of the drug. Our team has shown that if we used Gefitinib (Iressa) and L4 (one drug from the LOPAC drug library) to inhibit Src and EGFR at the same time, we achieve lower cell viability.

Results and Discussion

Review of the results of our previous methods

Our previous study identified 28 core proteins in the common core network marker for four cancers [17]. These are the proteins intersecting between the four cancers' PPI networks with the top CRVs. Figure 1(a) shows the PPI information of each common core network marker. As stated previously, these 28 core proteins could be responsible for the house-keeping mechanisms of these four cancers, so using the minimum possible number of ligands to target the maximum possible number of the 28 proteins could be the most efficient strategy to attack the cancers.

Several approaches to break the cancer core network marker

a. We chose the top 20 compounds with the highest docking scores for each of the 28 significant proteins. Of a total of 560 ligands (Additional file 4: Table S2), we found there were 13 ligands that target 2 or 3 proteins simultaneously (Table 4(a)) after redundancy analysis. These 13 ligands combined 11 target proteins (Table 5new 3(b)). We used the 13 ligands as the minimum package for breaking the core network by inhibiting these 11 proteins. While this does not target all 28 proteins, the main purpose here was to minimize the number of ligands at this first stage trial. (As we have said, this is the first trial of our primary model.) Of course, according to our docking results of these 28 proteins, there are thousands of possible drug combinations that can be used based on different analysis methods as per a given researcher's scope. It is quite possible to develop combinations to target all the 28 proteins based on our docking results.

Table 4 NCI drugs target more than one protein.

Full size table

Table 5 The proposed 13 drugs totally target the following 11 proteins

Full size table

b. In our previous research, we observed the CRVs of the 28 core proteins for each of the four cancers (Table 2). In this study, we summed up the 4 CRVs for each of the 28 proteins. After ranking these summations, we listed the top 5 CRV summations (Table 2), and suggested them to be the therapeutic targets for breaking cancer core network marker. According to our previous systems biology analysis, we believe that inhibiting proteins with the highest CRVs is a highly efficient way to attack cancers. As the first trial, our combination of proteins with the top 5 CRVs should simply be taken as a proof of concept: one can develop combinations according to one's particular needs based on our CRVs.

c. We have also listed pathway analyses using Panther and MetaCore on the core network proteins. Other scientists can use this information to choose the pathways that they want to target, and then choose the best combination of drugs from our virtual screening results. For instance, from the pathway analysis results using Panther (Table 6), researchers interested in targeting the p53 pathway can choose between EP300, CREBBP, HDAC1, HDA2, TRAF2, CDK2, MDM2, and TP53 as the therapy targets. For the above scenario, researchers can choose the top drugs from Additional file 4: Table S2. MetaCore gives us more information for the purpose of target therapy. The top three modules mapped to our 28 core proteins are listed in Table 7. Taking the first map/module (DNA damage_ATM/ATR regulation of G1/S checkpoint) as an example, we see CDK2, p53, Ubiquitin, BRCA1, c-Myc, PCNA, and MDM2 are related to this module. As above, we can use the drugs in Additional file 4 : Table S2 to choose the optimal combination of drugs to attack this module.

Table 6 the pathway analysis results of Panther.

Full size table

Table 7 Top three modules/maps given by Metacore.

Full size table

More pathway and functional analysis

A. Panther

The results of Panther are shown in Figure 2 and listed in Table 6. We see the 28 core network markers hit many important cancer-related pathways, such as p53, Wnt signaling, p53 feedback loops 2, apoptosis, etc. As we have described above, this study is a prototype model. Inhibition of the right proteins hitting the key pathways is an important strategic consideration in real clinical situations. Our results offer another reference for doctors to design the best combination of multiple inhibitors. In the future, using clinical data from doctors will help us perform deeper analysis. These preliminary results also could help us exclude pathways unrelated to cancer at the first stage, such as those related to Huntington's disease, Alzheimer's disease, etc.

B. MetaCore

The results of MetaCore analysis are described below. We have described the top three maps/modules (Table 7).

(i) DNA damage_ATM/ATR regulation of G1/S checkpoint (Figure 3): It is the highest scoring map (i.e., the map with the lowest p value). ATM/ATR regulates both the checkpoints of the G1/S and S/G2. DNA damage checkpoints pathways arrest or delay the progression of cell cycle in response to the DNA damage. Eukaryotic cell cycle have four phases, G1 (G indicating gap), S (Synthesis), G2 (Gap 2), and M (Mitosis), and one outside, G0 (Gap 0) [37]. When DNA damage occurs, the G1/S checkpoints will inhibit the initiation of replication to prevent cells from entering the S phase. They are related to two pathways of signal transduction, to initiate and maintain the G1/S arrest, respectively [37]. Jiri Bartek et al. discussed "The DNA damage response in tumorigenesis and the treatment of cancer" [38]. [The above description is directly cited from the Metacore document.]

(ii) Transcription_P53 signaling pathway (Figure 4): It is the second highest scoring map (i.e., the map with the second lowest p-value). The Tumor protein p53, also known as p53 ortransformation-related protein 53 (TRP53), acts a significant role in shielding the genome integrity. While being activated, p53 will bind to the enhancer/promoter regions of downstream target genes. And then it regulates the transcription of these genes, through which it initiates cellular processes that responsible for lots of its tumor-suppressor related functions [39]. It is not surprising that core network of 4 cancers are related to the p53 signaling pathway. [The above description is directly cited from the Metacore document.]

(iii) DNA damage_Brca1 as a transcription regulator (Figure 5): It is the third highest scoring map (i.e., the map with the third lowest p-value). Activation of breast cancer susceptibility gene 1 (BRCA1) by DNA damage occurs via activating the ataxia telangiectasia mutated serine/threonine protein kinase (ATM) [40] or serine/threonine-protein kinase ATR (ataxia telangiectasia and Rad3 related protein kinase) [41]. These protein kinases can either directly or indirectly phosphorylate BRCA1 (by cell cycle checkpoint kinase 2 (Chk2) [42]). BRCA1 acts a significant role in the DNA repair process by expediting cellular response upon DNA repair. There are numerous DNA repair pathways (see map 427 Role of BRCA1 and BRCA2 in DNA repair) [43]. We know that DNA repair is highly related to various cancers. [The above description is directly cited from the Metacore document.]

* The highest three scoring map (i.e., the map with the three lowest p-value) is based on the enrichment distribution sorted by the 'Statistically significant maps' set. Experimental data from all files is linked to and visualized on the maps as thermometer-like figures. Up-ward thermometers are red and indicate up-regulated signals, and down-ward (blue) ones indicate down-regulated expression levels of the genes. [The above description is directly cited from the Metacore document.]

Statistical analysis results of the three maps are shown in Table 7. These results also help us choose more suitable targets and exclude less suitable ones.