Integrative network analysis identifies potential targets and drugs for ovarian cancer

Background: Although ovarian cancer accounts for only 2.5% of all cancers in women, its death rate is high: it is the fifth leading cause of cancer death in women (5% of all cancer deaths). The 5-year survival rate of ovarian cancer is less than 50%. The oncogenic molecular signaling of ovarian cancer is complicated and remains unclear, and effective targeted therapies for ovarian cancer are lacking.
Methods: In this study, we investigate the activated signaling pathways of individual ovarian cancer patients and sub-groups, and identify potential targets and drugs that can disrupt these activated signaling pathways. Specifically, we first identify the up-regulated genes of individual cancer patients using Markov chain Monte Carlo (MCMC), and then identify the potentially activated transcription factors (TFs). After dividing the ovarian cancer patients into several sub-groups sharing common activated TFs using the K-modes method, we uncover the up-stream signaling pathways of the activated TFs in each sub-group. Finally, we map all FDA-approved drugs that target the up-stream signaling.
Results: The 427 ovarian cancer samples were divided into 3 sub-groups (with 100, 172, and 155 samples, respectively) based on the activated TFs (with 14, 25, and 26 activated TFs, respectively). Multiple up-stream signaling pathways, e.g., MYC, WNT, PDGFRA (RTK), PI3K, AKT, TP53, and MTOR, were found to activate the discovered TFs. In addition, 66 FDA-approved drugs were identified that target the uncovered core signaling pathways, 44 of which have been reported in ovarian cancer studies. The diversity and heterogeneity of this signaling offer potential therapeutic targets for drug combination discovery.
Conclusions: The proposed integrative network analysis could uncover potential core signaling pathways, targets, and drugs for ovarian cancer treatment.


Background
In the United States, ovarian cancer is the fifth leading cause of cancer-related death in women [1]: it accounts for 2.5% of all cancers in women but 5% of all cancer deaths in women [2]. In 2018, there were about 22,000 new cases of ovarian cancer and 14,000 deaths [2]. The high death rate (5-year survival rate < 50%) is mainly due to late diagnosis and aggressive high-grade serous carcinoma [2,3]. Platinum-based chemotherapy after surgical debulking is the standard treatment for ovarian cancer [4]. However, the recurrence rate is high, and recurrent tumors are often platinum resistant [4][5][6], with complicated mechanisms of platinum resistance [7]. Although a few targeted therapies, e.g., VEGF, PARP, and EGFR inhibitors, are being evaluated in clinical trials [4], some of them have not been very successful [4]. Therefore, novel targeted therapies and synergistic drug combinations are needed for ovarian cancer.
On the other hand, comprehensive multi-omics data of ovarian cancer patients have been profiled and analyzed [1,8]. A set of genetic biomarkers, e.g., TP53, NOTCH, and FOXM1, has been identified via association analyses [1]. In addition, a few dysfunctional signaling pathways, e.g., MYC, TP53, and PI3K/RAS, were identified in ovarian cancer by mapping multi-omics data (differentially expressed genes, mutations, copy number variation, and methylation data) to curated signaling pathways [8]. However, the functional consequences of these biomarkers and the cross-talk among these complicated signaling pathways in ovarian cancer remain unclear. Discovering effective drugs and synergistic drug combinations [9][10][11][12] for ovarian cancer based on this valuable knowledge and these multi-omics data remains a challenge.
In this study, we aim to systematically investigate potentially activated core signaling pathways in ovarian cancer sub-groups by uncovering the up-stream signaling pathways of activated transcription factors (TFs), and to identify all available FDA-approved drugs that target this up-stream signaling and these TFs. Combinations of these drugs have the potential to be synergistic with standard platinum chemotherapy by disrupting multiple up-stream signaling pathways and their cross-talk. This study will provide a useful reference resource for repositioning effective drugs and drug combinations for ovarian cancer. The rest of the paper is organized as follows. The details of the datasets and methods are provided in Section 2. The analysis results are presented in Section 3, followed by a summary in Section 4.

KEGG signaling pathways and regulatory network
To obtain KEGG signaling pathways, the "Pathview" R package [15] was employed to download the KGML files of the signaling pathways. The "KEGGgraph" R package was then used to extract the nodes and edges of the KEGG signaling pathways from the KGML files [16]. In total, 282 signaling pathways were collected from seven categories: metabolism, genetic information processing, environmental information processing, cellular processes, organismal systems, human diseases, and drug development. The TF-target regulatory network was downloaded from the supplemental material of reference [17], which was derived from the TF binding site predictions for all target genes from TRANSFAC (v7.4) [18]. In summary, the TF-target regulatory network consists of 230 TFs, 12,733 target genes, and 79,100 TF-target interactions.

Drug combination screening data in NCI ALMANAC
This dataset includes screening results of pairwise combinations of 104 FDA-approved anticancer drugs on the NCI-60 cancer cell lines (59 cancer cell lines with detailed genomics profiles) [19]. Specifically, about 5232 pairwise drug combinations were evaluated in each cancer cell line. Each drug combination was tested at either 9 or 15 dose points, for a total of 2,809,671 dose-specific combinations. The detailed definition of the synergistic drug combination score is given in reference [19].

Selection of up-regulated genes for each sample
In this study, GTEx normal ovarian tissue samples were used as normal controls against the ovarian cancer tumor samples from TCGA. Selecting genes by simple fold change and p-value <= 0.05 (t-test) results in too many up-regulated genes. The Maximum Likelihood Estimate (MLE) method (see Fig. 1, red probability density function (PDF) curve) also generated too many up-regulated genes. Thus, we employ a Markov chain Monte Carlo (MCMC) model to simulate the distribution of the expression of a given gene based on the normal tissues. Let x and D denote the expression of a given gene and the normal tissue data, respectively.
We use conjugate priors for $\mu$ and $\sigma^2$, namely the Normal and Inverse Gamma distributions: $\mu \sim N(w_0, v_0)$, $\sigma^2 \sim IG(a_0, b_0)$. Fully uninformative priors correspond to $w_0 = 0$, $v_0 = +\infty$, $a_0 = 0$, $b_0 = 0$. Since eq. (1) is hard to calculate, we use MCMC to simulate the distribution. The Python package "PyMC3" [20] was employed to conduct the analysis, with the uninformative priors approximated by $w_0 = 0$, $v_0 = 10^{4}$, $a_0 = 10^{-3}$, $b_0 = 10^{-3}$. The MCMC model performs better than MLE (see the green PDF curve in Fig. 1), but it still selects too many up-regulated genes. To further reduce the number of up-regulated genes, we empirically simulate the PDF of the random variable y = 2x, and use the PDF of y to calculate the p-value of a given gene's expression in the ovarian cancer samples. Specifically, we selected the up-regulated genes for each tumor sample with fold change >= 2 and p-value <= 0.05 (calculated from the PDF of y). We take the gene "CENPH" as an example to illustrate this analysis (see Fig. 1). The PDF generated by the MCMC model is more robust than the one generated by MLE. The yellow point is the threshold, and the area under the blue curve to the right of the yellow point is about 0.05 (the p-value calculation).
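As a rough illustration of the sampling step described above, the following sketch draws from the posterior predictive distribution of one gene's normal-tissue expression and then computes a p-value from y = 2x. It rests on several assumptions that are not stated in the text: that eq. (1) is the posterior predictive density $p(x \mid D) = \iint p(x \mid \mu, \sigma^2)\, p(\mu, \sigma^2 \mid D)\, d\mu\, d\sigma^2$ with a Normal likelihood, that $v_0$ is a variance, and that the p-value is the upper tail $P(y \ge \text{tumor value})$. It also swaps PyMC3 for a small hand-written Gibbs sampler built on the Python standard library, so it is not the authors' implementation; all function names and expression values are hypothetical.

```python
import random
import statistics


def posterior_predictive_draws(normal_expr, n_draws=4000, burn_in=500,
                               w0=0.0, v0=1e4, a0=1e-3, b0=1e-3, seed=0):
    """Gibbs sampler for x ~ N(mu, sigma2), mu ~ N(w0, v0), sigma2 ~ IG(a0, b0).

    Returns posterior predictive draws of x given the normal-tissue data.
    Assumes the data are not all identical (non-zero sample variance).
    """
    rng = random.Random(seed)
    n = len(normal_expr)
    total = sum(normal_expr)
    sigma2 = statistics.variance(normal_expr)  # starting value for sigma2
    draws = []
    for step in range(burn_in + n_draws):
        # mu | sigma2, D is Normal (conjugate update of the Normal prior)
        v_n = 1.0 / (1.0 / v0 + n / sigma2)
        m_n = v_n * (w0 / v0 + total / sigma2)
        mu = rng.gauss(m_n, v_n ** 0.5)
        # sigma2 | mu, D is Inverse Gamma; sample it as 1 / Gamma(shape, scale)
        a_n = a0 + n / 2.0
        b_n = b0 + 0.5 * sum((x - mu) ** 2 for x in normal_expr)
        sigma2 = 1.0 / rng.gammavariate(a_n, 1.0 / b_n)
        if step >= burn_in:
            draws.append(rng.gauss(mu, sigma2 ** 0.5))
    return draws


def upregulation_p_value(tumor_value, draws):
    """Empirical P(y >= tumor_value) for y = 2x (assumed upper-tail p-value)."""
    doubled = [2.0 * x for x in draws]
    return sum(y >= tumor_value for y in doubled) / len(doubled)


def is_upregulated(tumor_value, normal_expr, draws, fc=2.0, alpha=0.05):
    """Apply both filters from the text: fold change >= 2 and p-value <= 0.05."""
    fold_change = tumor_value / statistics.mean(normal_expr)
    return fold_change >= fc and upregulation_p_value(tumor_value, draws) <= alpha


# Toy (made-up) normal-tissue expression values for a single gene.
normal = [5.1, 4.8, 5.3, 5.0, 4.9, 5.2, 5.1, 4.7]
draws = posterior_predictive_draws(normal)
print(is_upregulated(14.0, normal, draws), is_upregulated(9.0, normal, draws))
```

Using y = 2x ties the p-value to the fold-change cut-off: a tumor value is only significant if it is extreme relative to twice the plausible normal expression, which is stricter than testing against x itself.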

Identification of activated TFs for individual ovarian cancer patients
Fisher's exact test (based on the hypergeometric distribution) was used to identify the activated TFs. For each TF, the number of up-regulated target genes relative to the number of all its target genes was compared with the number of all up-regulated genes relative to the number of all genes tested. A p-value threshold of 0.05 was used to select the activated TFs.
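The enrichment test above can be sketched as a hypergeometric upper-tail probability, which is what a one-sided Fisher's exact test computes. The one-sided (over-representation) form is an assumption, as the text does not say which alternative was used; the function name and the toy counts are hypothetical.

```python
from math import comb


def tf_enrichment_p_value(n_genes, n_up, n_targets, n_up_targets):
    """P(X >= n_up_targets) when n_targets genes are drawn from n_genes,
    of which n_up are up-regulated (hypergeometric upper tail)."""
    upper = min(n_up, n_targets)
    denom = comb(n_genes, n_targets)
    tail = sum(
        comb(n_up, k) * comb(n_genes - n_up, n_targets - k)
        for k in range(n_up_targets, upper + 1)
    )
    return tail / denom


# Toy example: 20 genes tested, 5 up-regulated; a TF has 4 targets, 3 of them up.
p = tf_enrichment_p_value(n_genes=20, n_up=5, n_targets=4, n_up_targets=3)
print(p, p <= 0.05)
```

With the counts of the TF-target network described earlier (230 TFs), the same function would be called once per TF and per sample; no multiple-testing correction is shown here because the text does not mention one.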

Sub-grouping analysis using activated TFs
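We clustered the 427 ovarian cancer samples using the identified activated TFs. The TF p-values were first converted to binary (0/1) values using 0.05 as the threshold, so that each sample is represented by a binary vector over the TFs. Because these data are categorical, we used the k-modes method [21] for the sub-grouping analysis.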
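A minimal sketch of this sub-grouping step is given below: p-values are binarized at 0.05 and the resulting 0/1 vectors are clustered by k-modes (Hamming distance, per-column mode as the cluster center). The initialization from the first k distinct rows, the `<=` comparison at the threshold, and all names and toy data are assumptions made for illustration; the authors' k-modes implementation and its initialization are not described in the text.

```python
from collections import Counter


def binarize(p_values, threshold=0.05):
    """Map TF p-values to 1 (activated) or 0 (not activated)."""
    return tuple(1 if p <= threshold else 0 for p in p_values)


def hamming(a, b):
    """Number of positions at which two categorical vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def k_modes(rows, k, max_iter=100):
    """Plain k-modes: assign rows to the nearest mode, then recompute modes."""
    modes = []
    for row in rows:  # deterministic init: first k distinct rows
        if tuple(row) not in modes:
            modes.append(tuple(row))
        if len(modes) == k:
            break
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(modes)), key=lambda c: hamming(row, modes[c]))
            for row in rows
        ]
        if new_labels == labels:  # converged: assignments unchanged
            break
        labels = new_labels
        for c in range(len(modes)):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:  # keep the old mode if a cluster becomes empty
                modes[c] = tuple(
                    Counter(col).most_common(1)[0][0] for col in zip(*members)
                )
    return labels, modes


# Toy data: 6 samples x 4 TFs, already binarized.
samples = [(1, 1, 0, 0), (1, 1, 0, 0), (1, 0, 0, 0),
           (0, 0, 1, 1), (0, 0, 1, 1), (0, 1, 1, 1)]
labels, modes = k_modes(samples, k=2)
print(labels, modes)
```

Each returned mode plays the role of a sub-group center: the TFs set to 1 in the mode are the ones treated as activated for that sub-group.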

Uncovering up-stream signaling of activated TFs
All 282 signaling pathways from KEGG were investigated. Using the Python package NetworkX, we extracted all up-stream signaling cascades running from the starting genes of each signaling pathway to the given activated TFs. We then scored each signaling cascade using the average probability of its genes (obtained from the MCMC analysis). To control the size of the up-stream signaling network, only the top 3 signaling cascades were kept.
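The cascade extraction above can be sketched as follows: find the starting genes (nodes with no incoming edge), enumerate simple paths from each of them to a TF, score each path by the mean per-gene score, and keep the top 3. The text states that NetworkX was used; this sketch substitutes a standard-library depth-first search so that it runs without dependencies. The gene names and scores are placeholders, and ranking higher averages first is an assumption, since the text does not define the direction of the "probability".

```python
def start_nodes(edges):
    """Genes that appear as a source but never as a target."""
    targets = {v for _, v in edges}
    return sorted({u for u, _ in edges} - targets)


def simple_paths(adj, source, target, path=None):
    """Yield every cycle-free path from source to target (depth-first)."""
    path = (path or []) + [source]
    if source == target:
        yield path
        return
    for nxt in adj.get(source, ()):
        if nxt not in path:
            yield from simple_paths(adj, nxt, target, path)


def top_cascades(edges, tf, gene_score, top_n=3):
    """Score each start-to-TF cascade by its mean gene score; keep the best."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
    scored = []
    for start in start_nodes(edges):
        for path in simple_paths(adj, start, tf):
            mean_score = sum(gene_score.get(g, 0.0) for g in path) / len(path)
            scored.append((mean_score, path))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_n]


# Hypothetical pathway: receptors R1-R3, kinases K1-K2, one transcription factor.
edges = [("R1", "K1"), ("K1", "TF"), ("R2", "K1"),
         ("R2", "K2"), ("K2", "TF"), ("R3", "K2")]
scores = {"R1": 0.9, "R2": 0.5, "R3": 0.1, "K1": 0.8, "K2": 0.2, "TF": 1.0}
for mean_score, path in top_cascades(edges, "TF", scores):
    print(round(mean_score, 3), " -> ".join(path))
```

Enumerating all simple paths can grow exponentially in dense pathways, which is one practical reason for keeping only the top-scoring cascades.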

Target importance scoring
The impact analysis (IA) evaluates both the topology and dynamics of a signaling pathway by considering the gene expression changes, the direction and type of each signaling interaction, and the position and role of every gene in a pathway. A perturbation factor for each gene, $PF(g_i)$, is calculated using the impact analysis method [22], as follows:

$$PF(g_i) = \Delta E(g_i) + \sum_{j=1}^{n} \beta_{ij} \frac{PF(g_j)}{N_{ds}(g_j)}$$

The term $\Delta E(g_i)$ represents the signed normalized measured gene expression change of gene $g_i$. The second term is the sum of the perturbation factors of the direct upstream genes $g_j$ of the target gene $g_i$, each normalized by the number of downstream genes of that upstream gene, $N_{ds}(g_j)$. The value of $\beta_{ij}$ quantifies the strength of the interaction between genes $g_j$ and $g_i$. We use the probability density of gene expression instead of the gene expression itself, which should be more accurate because the standard deviation of expression differs between genes.
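To make the recursion concrete, the sketch below computes perturbation factors as described above: each gene's own change plus the $\beta$-weighted perturbation factors of its direct upstream genes, each divided by that upstream gene's number of downstream genes. It assumes the pathway is acyclic so that a memoized recursion terminates (pathways with feedback loops would require solving a linear system instead). The gene names, $\beta$ values, and $\Delta E$ values are made up.

```python
def perturbation_factors(delta_e, edges):
    """Compute PF for every gene of an acyclic pathway.

    delta_e: dict gene -> signed expression change (0.0 if missing).
    edges:   list of (upstream, downstream, beta) interactions.
    """
    upstream = {}  # gene -> list of (upstream gene, beta)
    n_ds = {}      # gene -> number of direct downstream genes
    genes = set(delta_e)
    for u, d, beta in edges:
        upstream.setdefault(d, []).append((u, beta))
        n_ds[u] = n_ds.get(u, 0) + 1
        genes.update((u, d))

    pf = {}

    def visit(gene):
        # Memoized recursion: upstream genes are resolved first.
        if gene not in pf:
            inherited = sum(
                beta * visit(u) / n_ds[u] for u, beta in upstream.get(gene, [])
            )
            pf[gene] = delta_e.get(gene, 0.0) + inherited
        return pf[gene]

    for gene in sorted(genes):
        visit(gene)
    return pf


# Hypothetical pathway: A activates B (beta = 1) and inhibits C (beta = -1);
# both B and C activate D.
edges = [("A", "B", 1.0), ("A", "C", -1.0), ("B", "D", 1.0), ("C", "D", 1.0)]
delta_e = {"A": 2.0, "B": 1.0, "C": 0.0, "D": 0.5}
print(perturbation_factors(delta_e, edges))
```

In this toy pathway, A's change of 2.0 is split across its two downstream genes, so B inherits +1.0 and C inherits -1.0, and D accumulates the perturbation of both.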

Ovarian cancer samples were clustered into 3 groups based on activated TFs
Using the K-modes method, the 427 ovarian cancer samples were classified into 3 sub-groups (with 100, 172, and 155 samples, respectively) based on the activated TFs. Each sub-group has a center sample, which we use to characterize the sub-group. In other words, the activated TFs in the center sample were used as the activated TFs for that sub-group.
For visualization purposes, principal component analysis (PCA) was employed to reduce the 230 TFs to 2 dimensions (see Fig. 2). In one sub-group (Group 1), 14 TFs were activated, including ELK1, FOXF2, NRF1, ETS2, NF.muE1, ADD1, TBP, SP1, GABP, E4F1, and TELO2.

In addition, MTOR activates TP53 through the cellular senescence pathway, while TP53 inhibits MTOR through IGF1/MTOR. Since TP53 is the most frequently altered gene in ovarian cancer, the signaling loop between TP53 and MTOR might be a potential target of novel synergistic drug combinations.

To investigate drugs that can potentially perturb these up-stream signaling networks, we mapped the FDA-approved drugs onto the signaling networks (see Figs. 3, 4 and 5). The target information was obtained from DrugBank (version 5.0.11) [23]. In total, 66 drugs (red nodes in Figs. 3, 4 and 5) were selected, acting on different targets. Through a literature search, 44 of these drugs were found to have been reported for the treatment of ovarian cancer (see Table 1).
In addition to these single drugs, we investigated effective combinations of drugs from our list that were validated in the drug combination screening on the ovarian cancer cell lines of the NCI-60 panel (synergy is defined as a score higher than 8) (see Table 2). Moreover, we found that the top 10 drug targets of the synergistic drug combinations are EGFR, TUBB1, TUBA4A, TUBB, TOP2B, MTOR, TUBB3, CYP19A1, ESR1, and BCL2. TUBB1, TUBA4A, TUBB, TUBB3, and TOP2B are related to cell proliferation. CYP19A1 and ESR1 are related to estrogen. BCL2 is a member of the Bcl-2 family of regulator proteins that regulate cell death. EGFR and MTOR are in the PI3K-AKT pathway, and EGFR is one of the up-stream regulators of MTOR signaling. Combining MTOR inhibitors with inhibitors of EGFR, RTK, or PI3K signaling might act synergistically to inhibit ovarian cancer development.

Figure legend: green, blue, yellow, and red nodes represent signaling starting genes, signal transduction genes, TFs, and drugs, respectively.
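The target ranking described above can be sketched as a filter-and-count step: keep the combinations whose synergy score exceeds 8, then count how often each drug target occurs among the drugs in those combinations. The counting rule (one count per target per drug appearance) is an assumption, as the text does not state how the top 10 targets were ranked. The drug names, drug-target assignments, and scores below are placeholders, not values from NCI ALMANAC or DrugBank.

```python
from collections import Counter

SYNERGY_THRESHOLD = 8  # threshold stated in the text


def top_targets(combos, drug_targets, threshold=SYNERGY_THRESHOLD, top_n=10):
    """Rank targets by how often they occur in synergistic combinations.

    combos:       list of (drug_a, drug_b, synergy_score).
    drug_targets: dict drug -> list of target gene symbols.
    """
    counts = Counter()
    for drug_a, drug_b, score in combos:
        if score > threshold:  # "higher than 8" is read as strictly greater
            for drug in (drug_a, drug_b):
                counts.update(drug_targets.get(drug, ()))
    return counts.most_common(top_n)


# Placeholder inputs for illustration only.
drug_targets = {
    "drug_A": ["EGFR"],
    "drug_B": ["MTOR"],
    "drug_C": ["TUBB", "TUBB3"],
}
combos = [
    ("drug_A", "drug_B", 12.0),
    ("drug_A", "drug_C", 9.5),
    ("drug_B", "drug_C", 3.0),  # below threshold, ignored
]
print(top_targets(combos, drug_targets))
```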

Discussion
Ovarian cancer is the fifth leading cause of cancer-related death among women, and the 5-year survival rate is less than 50%. Although a set of biomarkers and signaling pathways have been identified as associated with ovarian cancer, the functional consequences of these biomarkers and signaling pathways remain unclear. Moreover, there is a lack of effective targeted therapies for ovarian cancer, especially for platinum-resistant ovarian cancer. In this study, we analyzed the gene expression data of ovarian cancer samples and normal ovarian tissues via network analysis. We aimed to systematically explore the activated signaling pathways of individual ovarian cancer patients and sub-groups, and to identify potential targets and drugs that can disrupt the core signaling pathways. The study still has several limitations. First, in addition to gene expression, mutation, methylation, and copy number variation data should be integrated in the network analysis to uncover the TFs and up-stream signaling. Second, the signaling cross-talk among these up-stream pathways, which might be responsible for drug resistance, was not investigated. In the future, we will also investigate the signaling network and TFs of platinum-resistant ovarian cancer samples, and apply network-based drug repositioning approaches [66,67] to reposition drugs [68,69] and drug combinations [70] for ovarian cancer treatment.

Table 1 (partial). Drugs, their targets, and related reports in ovarian cancer.
Drug | Target | Report
Epinephrine | TNF |
Erlotinib | EGFR | Erlotinib or gefitinib for the treatment of relapsed platinum pretreated non-small cell lung cancer and ovarian cancer: a systematic review [35].
Etanercept | TNF | Study of etanercept, a tumor necrosis factor-alpha inhibitor, in recurrent ovarian cancer [36].
Everolimus | MTOR | Effective use of everolimus as salvage chemotherapy for ovarian clear cell carcinoma: a case report [37].
Infliximab | TNF | Infliximab, a humanised anti-TNF-a monoclonal antibody, exhibits biological activity in the ovarian tumor microenvironment in patients [40].
Isopropyl alcohol | TNF |
Lapatinib | ERBB2, EGFR | A phase II evaluation of lapatinib in the treatment of persistent or recurrent epithelial ovarian or primary peritoneal carcinoma: a gynecologic oncology group study [32].

Conclusions
The purpose of this study is to systematically uncover potentially activated core signaling pathways in ovarian cancer using integrative network analysis. We identified about 37 activated TFs from three sub-groups of ovarian cancer, as well as a set of up-stream signaling pathways linked to these TFs, e.g., the WNT, TP53, MYC, AKT, RAS, mTOR, and PDGFRA signaling pathways. In addition, 66 FDA-approved drugs were identified that target the uncovered core signaling pathways, 44 of which have been reported in ovarian cancer studies. Combinations of these drugs could potentially act synergistically to disrupt the cross-talk of multiple activated signaling pathways and TFs for better ovarian cancer therapy. The uncovered signaling networks, TFs, and drugs can be used as reference resources to support biomedical studies in ovarian cancer.