Case-oriented pathways analysis in pancreatic adenocarcinoma using data from a sleeping beauty transposon mutagenesis screen

Background: Mutation studies of pancreatic ductal adenocarcinoma (PDA) have revealed complicated, heterogeneous genomic landscapes of the disease. These studies cataloged a number of genes mutated at high frequencies, but also reported a very large number of genes mutated in lower percentages of tumors. Taking advantage of a well-established forward genetic screening technique based on the Sleeping Beauty (SB) transposon, several studies produced PDA and discovered a number of common insertion sites (CIS) and associated genes that are recurrently mutated at high frequencies. As with human mutation studies, a very large number of genes were found to be altered by transposon insertion at low frequencies. These low-frequency CIS-associated genes may be very valuable to consider for their roles in cancer, since collectively they might emerge from a core group of genetic pathways.

Result: In this paper, we determined whether the genetic mutations in SB-accelerated PDA occur within a collated group of biological processes defined as gene sets. The approach considered genes mutated at both high and lower frequencies. We implemented a case-oriented gene set enrichment analysis (CO-GSEA) on SB-altered genes in PDA. Compared to traditional GSEA, CO-GSEA enables us to consider the individual characteristics of the mutation profile of each PDA tumor. We identified genetic pathways with higher numbers of genetic mutations than expected by chance. We also present the correlations between these significantly enriched genetic pathways, and their associations with CIS genes.

Conclusion: These data suggest that certain pathway alterations cooperate in PDA development.

Electronic supplementary material: The online version of this article (doi:10.1186/s12920-016-0176-7) contains supplementary material, which is available to authorized users.

such as brain tumors, sarcomas, hematopoietic malignancies, and carcinomas [2,3] via insertional mutagenesis. A large number of loci recurrently mutated by insertion of SB transposons, called common insertion sites (CIS), have been identified [4]. The general impression from these studies is one of tremendous genetic complexity.
Recent large-scale analyses of human cancer genomes mirror these results in general. Most types of human cancer harbor a small number of genes that are altered in a high percentage of cases, so-called "mountains", and a large number of genes altered in a low percentage of cases, so-called "hills". In addition, two patients diagnosed with the same type of cancer often show distinct genetic alterations; however, the disrupted pathways tend to be similar among patients [5].
Conventional pathway analysis approaches usually obtain gene-based scores by summarizing data across tumor cases, then calculating pathway statistics using the scores of the genes in the pathway. However, these approaches could potentially lose information regarding whether multiple mutations in a pathway are from a single patient or multiple cases with a single mutation at various genes in the pathway. In contrast, case-oriented gene set analysis (CO-GSEA) can consider the two situations differently and hence can incorporate heterogeneity of each tumor case into the analysis. This approach provides a case-based score for each pathway and further enhances the study of correlation of mutation events between pathways, as well as between genes and pathways. It has recently been applied to the analysis of human tumors [6].
Pancreatic ductal adenocarcinoma (PDA) is the fourth leading cause of death due to cancer, with a case-fatality rate of over 98 %. The crucial molecular events required for progression from a pre-invasive and non-life-threatening state to an invasive and metastatic lethal condition are not well understood. We previously reported the results of an SB transposon-based forward genetic screen for drivers of PDA in mice expressing the Kras G12D oncogene in epithelial cells of the pancreas [7]. Our screen revealed new candidate genes for PDA and confirmed the importance of many genes and pathways previously implicated in human PDA. The most commonly mutated gene was the X chromosome-linked deubiquitinase Usp9x, which was inactivated in over 50 % of the tumors. In addition, several hundred candidate PDA genes were identified as CIS in this screen.
In this paper, we report analyses intended to determine whether a core group of biological processes or pathways is populated by genes from CIS. First, we applied a less stringent criterion in order to consider CIS-associated genes mutated both at high frequencies (mountains) and at lower frequencies (hills). Second, we determined whether nonrandom associations exist between alterations of genes in certain pathways or biological processes by analyzing CIS from individual tumors.

Certain pathways are enriched in CIS-associated genes
We collected insertional mutagenesis data from tumor samples of 146 Kras LSL-G12D; Pdx1-cre; T2/Onc; Rosa26-LSL-SB13 mice. To determine whether a core group of pathways was enriched with more CIS-associated genes than reported previously [7], we analyzed 968 CIS with uncorrected p value < 10^−4 from TapDance. Among these, 239 genes were mapped and grouped into 281 KEGG curated pathway categories. After excluding pathways with fewer than 6 genes, 272 KEGG pathways remained for the following analysis.
Using the CO-GSEA described in the Methods section, we found 95 KEGG pathways that are enriched with CIS-associated genes at a permutation p value < 10^−7; these are listed in Table 1 (more details about the disrupted genes in each pathway can be found in Additional file 1). In Table 1, "# of genes" records the number of genes defined in the pathway by KEGG; "# of CIS" (third column) reports the number of CIS genes in the pathway; and "# of mutated cases" (fourth column) records the number of cases in which the pathway was disrupted. A histogram of the sizes of the KEGG pathways is shown in Additional file 2. In Figs. 1 and 2, we plotted the KEGG diagrams of two pathways that are enriched in CIS-associated genes.
The genetic screen was designed to discover genes that, when altered, would accelerate PDA in pancreatic ductal epithelial cells expressing an activated form of the Kras oncogene, Kras G12D. As such, it was not surprising that many of the KEGG pathways with the strongest statistical support for CIS-associated gene enrichment were cancer-associated pathways. As expected, we recovered some of the same pathways that were previously reported [7,8]. An informal prior analysis [7] suggested that TGFβ signaling was enriched in CIS-associated genes, and indeed we found that this KEGG pathway is enriched. Similarly, the Rb1/p16Ink4a pathway was suggested to be recurrently altered by CIS-associated genes [7]. Indeed, we found that the KEGG pathway CELL CYCLE was enriched in CIS-associated genes. Many other cancer-associated pathways were enriched in CIS-associated genes, including the RAS, PI3K-AKT, HIPPO, VEGF, HEDGEHOG, MAPK, FOXO1, and MTOR pathways. Moreover, the human disease KEGG pathway PANCREATIC CANCER and several other human cancer pathways were enriched in CIS-associated genes.
In addition to these expected KEGG pathways, many enriched pathways involve metabolism and have not been strongly linked to pancreatic cancer development or cancer development in general. However, recent studies revealed evidence of metabolic reprogramming to sustain tumor survival in KRAS-mutated PDA tumors [9]. For example, KRAS-dependent tumor cells compensate for energy loss by increasing glycolysis and amino acid and lipid biosynthesis [10]. In particular, TERPENOID BIOSYNTHESIS, LYSINE DEGRADATION and the SULFUR RELAY SYSTEM are significantly altered in the SB-accelerated tumor models. To date, KRAS remains a poorly druggable target; hence, targeting the downstream metabolic regulation could be an effective alternative for inhibiting tumor growth.
Several organismal systems KEGG pathways were also enriched in CIS-associated genes despite not being strongly linked to pancreatic cancer development. These include OXYTOCIN SIGNALING, CHOLINERGIC SYNAPSE, and MELANOGENESIS. Our recent work helped show that the AXON GUIDANCE pathway is enriched for CIS-associated genes, a result which led to the discovery that these genes and the pathways they participate in are altered in human PDA [8]. This result was reproduced in the current analysis. Thus, it is clear that the broadened definition of CIS allows for the identification of many known and novel candidate cancer pathways. These data suggest many new hypotheses to be tested in PDA development.

Analysis of individual tumors reveals significant co-altered pathways
We and others have published results of SB screens in which we found that individual CIS tended to be co-mutated by transposon insertion more than expected by chance (e.g. [11]). We wondered whether an analysis of individual tumors would reveal that specific pathways are co-altered in this same manner. Figure 3 shows a heat map of the adjusted correlation between pairs of pathways that are co-altered by transposon insertions within or near genes in those pathways. We observed two major clusters of strongly co-altered pathways. Within these clusters, certain specific pathways show strong associations, being altered by transposon insertion in the same tumors more often than would be expected by chance. These data provide the basis for developing specific hypotheses about pathways that interact to cause cancer. For instance, alteration of one pathway may allow the other pathway to exert its full oncogenic effects. A careful analysis of some of the associations reveals pairs of pathways that might be predicted to interact based on what is already known about their functions and regulation. For example, block 1, labeled in Fig. 3, contains strong associations between the ubiquitin processing pathway and several pathways including ErbB, Insulin and mTOR signaling. It is known that cell signaling pathways that transmit signals from the extracellular space into the cell cytoplasm and nucleus are regulated by the abundance and stability of certain proteins. In many cases, the stability of these proteins is regulated by ubiquitination and degradation by the proteasome. Well-known examples include the NFκB and Wnt/β-catenin signaling pathways. Published work shows that members of the ErbB family of receptors are downregulated by ubiquitination involving the E3 ubiquitin ligase Cbl [12]. Ubiquitination also regulates Akt-mTOR signaling in multiple myeloma [13], and Akt-mTOR is activated by insulin signaling [14].
Block 2, labeled in Fig. 3, contains several other intriguing pathway-pathway associations. For example, we see a strong association between cell cycle control and miRNAs known to be involved in cancer. Indeed, there are several well-studied examples of miRNAs that regulate the mRNA transcripts of cell cycle regulators such as MYC [15], RB1 [16] and CCND1 [17]. Also in block 2, we see evidence for co-dysregulation of the TGFβ and MAPK pathways. Abundant evidence for crosstalk between these pathways has been published [18,19]. Thus, it is entirely plausible that co-alteration of these pathways is specifically selected for during PDA progression. Specific hypotheses can be, or have been, tested in the laboratory. For example, MAPK activation, via expression of the Kras G12D oncogene, cooperates strongly with Smad4 inactivation, which alters or inactivates TGFβ signaling, in a mouse model of PDA [20]. This functionally confirms the observation from the analyses done here. We can thus predict that many other pathway-pathway associations observed in Fig. 3 can be functionally validated. More speculative, but of tremendous therapeutic significance, is the idea that targeting one pathway of a pathway-pathway pair observed in Fig. 3 would alter the ability of the second pathway to exert its oncogenic effects. Indeed, co-targeting both pathways of such a pair may be the most effective way to treat individual cases of PDA. These

Association of CIS-associated genes and enriched pathways
Several of the most commonly altered genes in the PDA screen (i.e. the top-ranked CIS-associated genes) have little published functional data. We speculate that by finding which pathways they most often interact with, something could be learned about their function in general and in PDA development. The associations between the top-ranked CIS and enriched pathways are shown in Fig. 4. In Fig. 4, several CIS-associated genes such as Stag2, Arhgap5, Usp9x, Magi1 and Arid1a have fewer connections to enriched pathways than other CIS-associated genes. In Additional file 3, we listed these connections and the corresponding estimates from the regression model, p values and FDR. For example, in Additional file 1, Usp9x is associated with the PI3K-AKT signaling, DOPAMINERGIC SYNAPSE, HIPPO signaling, and TIGHT JUNCTION pathways. Thus, it seems likely that Usp9x mutation or downregulation has to cooperate with alterations in these other pathways in order for PDA to develop. The CIS-associated genes that also demonstrated association with these Usp9x-associated pathways are Gsk3b, Ctnna1, Mll5, Pten, Arfip1 and Magi1.

Fig. 2 Frequently mutated genes in ErbB signaling pathway. Darker color indicates higher mutation frequencies in mice
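The regression estimates mentioned above come from quasi-Poisson models with a single binary predictor (see Methods). As a minimal sketch, and not the authors' code, the fit for one gene-pathway pair can be written in closed form: with only an intercept and a 0/1 covariate, the fitted means are the two group means, and the dispersion is the Pearson chi-square divided by the residual degrees of freedom. The function name, variable names, and toy counts below are illustrative assumptions, and the sketch assumes both groups are non-empty with non-zero total counts.

```python
from math import log, sqrt

def quasi_poisson_binary(counts, mutated):
    """Quasi-Poisson fit of pathway mutation counts on a 0/1 gene status.

    Returns (log rate ratio, dispersion estimate, standard error).
    Hypothetical helper for illustration only.
    """
    y0 = [y for y, m in zip(counts, mutated) if m == 0]
    y1 = [y for y, m in zip(counts, mutated) if m == 1]
    mu0, mu1 = sum(y0) / len(y0), sum(y1) / len(y1)
    beta = log(mu1 / mu0)  # coefficient of the binary predictor
    # Pearson chi-square over residual degrees of freedom (n - 2 parameters).
    chi2 = sum((y - mu0) ** 2 / mu0 for y in y0) + sum((y - mu1) ** 2 / mu1 for y in y1)
    dispersion = chi2 / (len(counts) - 2)
    # Poisson variance of beta is 1/sum(y0) + 1/sum(y1); scale by the dispersion.
    se = sqrt(dispersion * (1 / sum(y0) + 1 / sum(y1)))
    return beta, dispersion, se

# Toy data: pathway mutation counts in 10 tumors; the gene is mutated in the last 4.
counts = [0, 1, 1, 2, 0, 2, 3, 4, 2, 5]
mutated = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
beta, dispersion, se = quasi_poisson_binary(counts, mutated)
print(round(beta, 4), round(dispersion, 4), round(se, 4))
```

A positive coefficient indicates higher pathway mutation counts in tumors carrying the gene mutation. In a full analysis, the resulting p values across all gene-pathway pairs would still need a multiple-testing adjustment such as FDR.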

Conclusion
In this work, we demonstrate the non-random enrichment of CIS-associated genes from a transposon-based screen for PDA into certain KEGG signaling pathways, disease states and biological processes.

Methods
To assess whether a pathway harbors more CIS-associated genes than expected by chance, we used the CO-GSEA approach. For each tumor sample, we considered a pathway altered (coded 1) if at least one gene in the pathway was mutated, and unaltered (coded 0) otherwise. The score for each pathway was calculated as the number of tumors in which the pathway was altered. We assessed whether the score of a pathway was statistically significant through random permutation. For example, if a mouse tumor contained 100 mutations, we randomly assigned the 100 mutations to 100 different genes. A score for each pathway can then be obtained by counting the number of altered tumor samples after the permutation. We repeated the permutation 10^7 times to obtain the distribution of the score under the null for each pathway and calculated a p value based on the permuted null distribution. A similar approach was applied to the mutation analysis of human tumor samples in [6].
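The scoring and permutation steps described above can be sketched as follows. This is an illustrative reading of the procedure rather than the authors' implementation: the function names, the toy gene universe, the small number of permutations, the fixed random seed, and the add-one correction on the p value are all assumptions made for the example.

```python
import random

def pathway_score(tumor_mutations, pathway):
    """Count tumors with at least one mutated gene in the pathway."""
    return sum(1 for genes in tumor_mutations if pathway & genes)

def permutation_p_value(tumor_mutations, pathway, all_genes, n_perm=1000, seed=0):
    """One-sided permutation p value for the observed pathway score."""
    rng = random.Random(seed)
    gene_list = sorted(all_genes)
    observed = pathway_score(tumor_mutations, pathway)
    hits = 0
    for _ in range(n_perm):
        # Reassign each tumor's mutations to the same number of distinct random genes.
        permuted = [set(rng.sample(gene_list, len(genes))) for genes in tumor_mutations]
        if pathway_score(permuted, pathway) >= observed:
            hits += 1
    # Add-one correction (an assumption here) avoids reporting a p value of zero.
    return observed, (hits + 1) / (n_perm + 1)

# Toy data: 200 genes, a 5-gene pathway, 20 tumors with 3 mutations each.
all_genes = {f"g{i}" for i in range(200)}
pathway = {"g0", "g1", "g2", "g3", "g4"}
tumors = [{"g0", f"g{10 + i}", f"g{50 + i}"} for i in range(12)]
tumors += [{f"g{100 + i}", f"g{120 + i}", f"g{140 + i}"} for i in range(8)]

observed, p_value = permutation_p_value(tumors, pathway, all_genes)
print(observed, p_value)
```

Each tumor keeps its own mutation count in every permutation, which is what lets the null distribution reflect per-tumor mutation burden instead of a pooled gene-level rate.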
The ability to detect a significant pathway using the CO-GSEA approach depends on the background mutation rate and the size of the pathway under consideration. The expected score of a given pathway under random permutation can be described as a function of the following quantities: the total number of cases; G, the number of genes considered in the pathway analysis; n_i, the number of events in sample i; and P_s, the number of genes in the pathway [6].
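One expression consistent with the permutation scheme described in this section can be derived directly; it is offered here as a sketch and may differ in form from the equation given in [6]. The symbols N (total number of cases) and S (pathway score) are introduced here for illustration. Assuming the n_i events of sample i are assigned uniformly at random to n_i distinct genes among the G genes, the probability that none of them falls in a pathway of P_s genes is a ratio of binomial coefficients, so the expected score is:

```latex
\mathbb{E}[S] \;=\; \sum_{i=1}^{N} \left[\, 1 - \frac{\binom{G - P_s}{n_i}}{\binom{G}{n_i}} \,\right]
```

Each term is the chance that sample i hits the pathway at least once, so the expected score grows with both the pathway size P_s and the per-sample mutation burden n_i.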

Analysis of co-altered pathways
To investigate whether a pair of pathways was co-altered in a significant manner, we removed the CIS-associated genes present in both pathways and, for each sample i, calculated the mutation frequency in each pathway using the remaining non-overlapping CIS as:

$$\frac{\text{\# of mutations in sample } i \text{ in the pathway}}{\text{\# of non-overlapping CIS in the pathway}}$$

For each pair of pathways, the Pearson correlation of these per-sample values was calculated to characterize the correlation between the pathways based on their non-overlapping CIS.
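The calculation in this section can be sketched as below. It is a minimal illustration under stated assumptions: the function names and toy gene sets are hypothetical, each pathway is assumed to retain at least one gene after the shared genes are removed, and the frequencies are assumed to vary across samples (otherwise the correlation is undefined). Any further adjustment applied to the correlations shown in Fig. 3 is not reproduced here.

```python
from math import sqrt

def pathway_frequencies(tumor_mutations, pathway_a, pathway_b):
    """Per-sample mutation frequency in each pathway, using non-overlapping genes only."""
    shared = pathway_a & pathway_b
    only_a, only_b = pathway_a - shared, pathway_b - shared
    freq_a = [len(genes & only_a) / len(only_a) for genes in tumor_mutations]
    freq_b = [len(genes & only_b) / len(only_b) for genes in tumor_mutations]
    return freq_a, freq_b

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Toy data: gene "S" belongs to both pathways and is therefore ignored.
pathway_a = {"A1", "A2", "S"}
pathway_b = {"B1", "B2", "S"}
tumors = [{"A1", "B1", "S"}, {"A1", "A2", "B1", "B2"}, {"S"}, {"A2"}, set()]

freq_a, freq_b = pathway_frequencies(tumors, pathway_a, pathway_b)
r = pearson(freq_a, freq_b)
print(freq_a, freq_b, round(r, 4))
```

Dropping the shared genes first prevents two pathways from appearing correlated simply because they contain the same mutated gene.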

Association between top CIS-associated genes and enriched pathways
Among the top 20 CIS-associated genes previously reported [7], 12 do not map to any KEGG pathway. We conducted association analysis between the top 20 CIS-associated genes and the enriched KEGG pathways using quasi-Poisson regression models, which allow for overdispersion. For each CIS-associated gene, we examined whether the mutation status of the gene (coded 1 if mutated; 0 otherwise) is associated with higher mutation counts (the number of altered CIS-associated genes) for the pathway under consideration. We reported the CIS-pathway associations with FDR < 0.001.

Competing interests
DL is a shareholder in and consultant for NeoClone Biotechnology, Inc. and Discovery Genomics, Inc., and received grant support from Genentech, Inc. Y-Y H, TKS, and RL declare that they have no competing interests.