Genome-wide analysis reveals the association between alternative splicing and DNA methylation across human solid tumors

Background Dysregulation of alternative splicing (AS) is a critical signature of cancer. However, the regulatory mechanisms of cancer-specific AS events, especially the impact of DNA methylation, are poorly understood. Methods By using The Cancer Genome Atlas (TCGA) SpliceSeq and TCGA data for ten solid tumor types, association analysis was performed to characterize the potential link between cancer-specific AS and DNA methylation. Functional and pathway enrichment analyses were performed, and the protein-protein interaction (PPI) network was constructed with the String website. The prognostic analysis was carried out with multivariate Cox regressions models. Results 15,818 AS events in 3955 annotated genes were identified across ten solid tumor types. The different DNA methylation patterns between tumor and normal tissues at the corresponding alternative spliced exon boundaries were shown, and 51.3% of CpG sites (CpGs) revealed hypomethylated in tumors. Notably, 607 CpGs were found to be highly correlated with 369 cancer-specific AS events after permutation tests. Among them, the hypomethylated CpGs account for 52.7%, and the number of down-regulated exons was 173. Furthermore, we found 38 AS events in 35 genes could serve as new molecular biomarkers to predict patient survival. Conclusions Our study described the relationship between DNA methylation and AS events across ten human solid tumor types and provided new insights into intragenic DNA methylation and exon usage during the AS process.


Background
Alternative splicing (AS) is one of the conserved biological processes diversifying the transcriptome and proteome [1]. Pre-mRNA splicing is a key step of gene expression in which introns within nascent RNA are removed, and exons are ligated to form mature mRNA [2]. It is estimated that more than 90% of human genes with multiple exons undergo AS during pre-mRNA maturation, highlighting the importance of AS in determining gene function [3,4]. Recently, AS has been implicated as an important signature to understand tumorigenesis, cancer progression, and resistance to therapy [5]. Several lines of evidence have shown that the disruption of AS is frequently associated with the inactivation of tumor suppressors and activation of oncogenes [6,7]. For example, Bcl-x, an apoptosis regulator, is found to be switched from its pro-apoptotic into antiapoptotic splicing isoforms in a number of cancer types [8,9]. Another well-characterized example is CD44, whose different splicing isoforms have been associated with tumor evasion and metastasis [10,11]. Moreover, Kahles and colleagues have recently provided a comprehensive landscape of AS events across different tumor types [12]. Other studies have also reported the prognostic value of AS events in multiple cancer types, such as ovarian cancer [13], breast cancer [14], glioblastma [15] and gastrointestinal adenocarcinomas [16], suggesting a predominant role of splicing dysregulation in cancers.
The splicing reaction is generally regulated by cis-elements within the pre-mRNA and trans-acting splicing factors that bind to these cis-elements [17]. Alterations of splicing factors, mutations in spliceosomal proteins, and regulation exerted by pre-mRNA cis-elements can all contribute to the splicing alterations in cancers [18][19][20]. Additionally, epigenetic modification has been recently proposed as another regulator of alternative pre-mRNA splicing patterns [11,21,22]. The genome-wide mapping analysis has unveiled that DNA methylation was a strong marker for exon boundaries and suggested the possible role of DNA methylation in exon definition and splicing regulation [23,24]. Anastasiadou et al. have reported that the increased CpG methylation was frequent in alternatively spliced sites [25]. In support, by studying wild-type and methylation deficient embryonic stem cells, Yearim et al. have found that DNA methylation could affect the splicing of more than 20% of alternative exons [26].
The global change of splicing in tumor tissues in comparison to their normal tissue counterparts is increasingly appreciated. However, the regulatory mechanisms underlying cancer-specific AS events, especially the influence of DNA methylation on AS, remain poorly understood. To figure out the association between AS events and DNA methylation of alternatively spliced exons, we first compared the transcriptome-wide splicing between tumor and matched normal tissues across ten solid tumor types. Further, we compared the DNA methylation of CpG sites (CpGs) at the boundaries of alternatively spliced exons. Finally, we utilized correlation analysis and permutation test to assess whether there is an association between DNA methylation and cancer-specific AS events. By these means, we expect to explore the role of methylation in exon usage during transcriptional process and carcinogenesis.

Overall workflow
The overall workflow of the present study is shown in Fig. 1. Using TCGA SpliceSeq dataset for ten solid tumor types, AS events were analyzed in 580 paired tumor-normal tissues. Then, the methylation levels of CpGs at the boundaries of alternatively spliced exons were compared between 641 normal and tumor samples from TCGA. Subsequently, we correlated cancer-specific AS events with methylation features and observed a significant association between them. To further understand the function and mechanism of the genes containing methylation-associated cancer-specific AS events, functional enrichment, and network analyses were applied. Finally, we found that a set of cancer-specific AS events could serve as reliable prognostic biomarkers for cancers.

Data acquisition
Data on transcriptome alterations were downloaded from TCGA SpliceSeq portal (http://bioinformatics.mdanderson. org/TCGASpliceSeq/). TCGA SpliceSeq is a web-based bioinformatics resource containing the mRNA splicing patterns of 33 types of TCGA cancers and adjacent normal samples. SpliceSeq can be used to calculate the "percent spliced in" (PSI) value for each splicing event in each cancer sample [3], which provides a clear view of the splice junction and the proportion of exons included in different samples. PSI values were utilized to quantify seven types of AS events: exon skip (ES), alternate donor site (AD), alternate acceptor site (AA), retained intron (RI), mutually exclusive exons (ME), alternate terminator (AT) and alternate promoter (AP) [27]. In the present study, AS events (379,749 in total) that have PSI values in more than 75% of samples were included.
Data of somatic mutations, copy number variations, mRNA expression, and DNA methylations were downloaded from publicly released TCGA level 3 data (processed data from USCS Cancer Genome Browser). In this study, ten types of cancers were included: breast invasive carcinoma (BRCA), colon adenocarcinoma (COAD), head and neck squamous cell carcinoma (HNSC), kidney renal clear cell carcinoma (KIRC), kidney renal papillary carcinoma (KIRP), liver hepatocellular carcinoma (LIHC), lung adenocarcinoma (LUAD), lung squamous cell carcinoma (LUSC), prostate adenocarcinoma (PRAD) and thyroid carcinoma (THCA). Illumina Human Methylation 450 BeadChip (450 K array) was used for DNA methylation data. Patient clinical information was also downloaded from TCGA dataset. The sample size of each cancer is summarized in Additional file 1: Table S1.
Gene coordinates and RefSeq annotations were obtained from UCSC (Jul 2013 release, hg 19) [28]. All the features of exons and introns referred to RefSeq genes. The 'exon boundary' was determined as 200 nucleotides in size for intronic regions and 39 nucleotides for exonic regions [29].

Identification of cancer-specific AS events and methylated CpGs
Wilcoxon paired test was applied to compare the PSI values between tumor tissues and matched adjacent normal tissues. In each cancer type, we identified cancerspecific AS events that satisfied the following criteria: (1) the distributions of PSI values were significantly different between tumor and normal tissues (FDR < 0.05); (2) the change of PSI values (|△PSI|) between tumor and normal tissues was greater than 0.1. Differentially methylated CpGs were detected by applying a paired t-test, setting the absolute average methylation change (|△meth|) ≥ 0.1 and FDR < 0.05 as selection criteria. All statistical analyses were performed using R software (version 3.6.0).

Correlation analysis and permutation test
Patients with both splicing information and methylation data were used for correlation analysis. Covariance analysis model, after adjusting for copy number variants and somatic mutation in tumor, as well as Spearman correlation analysis in normal tissues, were performed to investigate the correlations between PSI and DNA methylation respectively. In order to infer their statistical significance, we further carried out the permutation test. The number of permutations was 1000, and we considered the permutation P < 0.05 was statistically significant. In the present study, the aovperm() function in R package "permuco" and spear-man_test() function in package "coin" were used for permutation tests.

Functional enrichment analysis
Gene Ontology (GO) enrichment and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses were conducted on genes with methylation-associated cancerspecific AS events using "ClusterProfiler" package in R software [30]. GO and KEGG enrichment analyses were based on the threshold of FDR < 0.05.

Protein-protein interaction analysis
Genes with significantly methylation-associated AS events were analyzed through the protein-protein interaction (PPI) Fig. 1 Flowchart of the present study network. The PPI analysis was constructed with the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING, https://string-db.org/cgi/input.pl) [31]. The confidence score ≥ 0.4 and the maximum number of interactors = 0 were set as the selection criteria.

Survival analysis
To assess the prognostic value of cancer-specific AS events and methylated CpGs, we performed survival analysis by using the "survival" package in R. The Cox proportional hazard model was utilized to test the interactions between different variables and overall survival in multivariate analysis by adjusting for age, gender, TNM stage and adjuvant therapy (including chemotherapy and radiotherapy). P < 0.05 was considered as the threshold for significance.

Identification of cancer-specific AS events in common solid tumors
To systematically characterize abnormal AS events in common solid tumors, we compared PSI values between tumors and matched adjacent normal tissues. We found that 15,818 AS events in 3955 annotated genes were significantly altered in cancer vs. normal tissues. Splicing alterations were most abundant in KIRC and LUAD, while least in COAD (Additional file 1: Table S2). For KIRC, a total of 2678 differential AS events in 1550 genes were identified, including 81 AAs in 74 genes, 91 ADs in 83 genes, 1190 APs in 643 genes, 535 ATs in 286 genes, 611 ESs in 474 genes, 12 MEs 12 genes, and 142 RIs in 137 genes. For LUAD, a total of 2212 differential AS events in 1369 genes were identified (Fig. 2a).
For the seven AS events types, we found that AP and ES took up 44.0 and 21.2% of the AS events, respectively, followed by AT (19.9%). In addition, we observed that the numbers of AS events with increased vs. decreased PSI values were similar, with the exception of RI and AD events. (Fig. 2b).
Notably, one gene could possess more than one cancer-specific AS events. The numbers of genes that possess differential AS events in varied cancer types are depicted in Fig. 2c.

Identification of differentially methylated CpGs at exon boundaries
To examine the differences of DNA methylation, we retrieved methylation data of CpGs at exon boundaries from TCGA data portal. After comparing between tumor and matched normal tissues, we identified a large amount of CpGs with differential methylation levels ( Fig. 3a). Totally, 1180 CpGs were detected, among which KIRC displayed the highest number of CpGs, including 5 CpGs in AA, 3 CpGs in AD, 186 CpGs in AP, 23 CpGs in AT, 15 CpGs in ES and 18 CpGs in RI (Additional file 1: Table S3). We also observed that the proportions of hypomethylated CpGs in different solid tumors were similar, with the exception of COAD (32.4%) and PRAD (10.9%) (Fig. 3b). When comparing the differentially methylated CpGs in varied types of AS events, 833 CpGs were identified in AP type, of which 48.6% were hypomethylated. The proportions of hypomethylated CpGs in other types of AS were similar, ranging from 43.9% in ES to 66.7% in AD (Fig. 3c). Taken together, hypomethylation was a more frequently observed form of differential methylation in tumor tissues, compared to hypermethylation.

Correlation between cancer-specific AS events and DNA methylation
To explore possible regulatory mechanisms of cancerspecific AS, we further applied correlation analysis between the PSI values and methylation levels of CpGs at alternatively spliced exon boundaries. A significant association between PSI and DNA methylation was observed from the covariance analysis model after adjusting for covariates, including copy number variants and somatic mutation. After performing the permutation test, 891 CpGs were found to be highly correlated with 483 cancer-specific AS events (permuted P < 0.05) (Additional file 1: Table S4). To better determine whether the changes in AS resulted from changed methylation levels or vice versa, we also performed association analysis in the normal tissues. Results showed that the methylation levels of 607 CpGs were not significantly correlated to PSI in the normal tissues after permutation tests (permuted P < 0.05) (Additional file 1: Table  S4). The number of PSI-associated CpGs is presented in Fig. 4a. Among them, 320 CpGs were hypomethylated, and 287 were hypermethylated in tumor tissues. Expect for COAD, LIHC, LUAD, LUSC, and PRAD, the hypomethylated CpGs account for more than half of AS-associated CpGs in cancers (ranging from 54.5 to 95.7%)(Additional file 2: Figure S1-A). Among the exons, 173 exons were down-regulated (ranging from 35.7% in THCA to57.6% in LUSC) in the tumor tissues (Additional file 2: Figure S1-B).
Though the regions of splicing-related CpGs showed differently across varied chromosomes and regions (Fig.  4b), some common features in different tumor types were sill discovered. The common AS events having similar relationships with CpGs, which changed in at least three cancer types are shown in Table 1. This result suggested that there were indeed a significant number of AS-associated CpGs shared by multiple cancer types. Taking ZNF577 and cg11269599 as examples, we found that the exon 1.2 and exon 1.3 of ZNF577 showed increased PSI values in five cancer types when comparing tumors to the cognate normal tissues. The CpG site, cg11269599, also showed significantly hypermethylated in HNSC, KIRC, KIRP, LUSC, and PRAD. The correlations between both aforementioned were also observed consistently in the five cancer types (Fig. 4c). The preference of a specific splice variant by several cancer types could help to confirm the link between AS and DNA methylation [32].

Gene functional enrichment analysis and network construction
We further conducted GO analysis on genes containing methylation-related cancer-specific AS events for different cancer types. Results showed that these genes were enriched in different GO terms across different cancers. In COAD, high enrichment of genes was associated with cellular components, including sarcolemma and basal part. In THCA, genes were mainly associated with molecular functions, including hormone binding and acyl-CoA ligase activity (Additional file 1: Table S5). KEGG pathway analysis was also performed. Only the pathway "Glycerophospholipid metabolism" was found significantly enriched by genes in LIHC (Additional file 1: Table S6). These results suggested that the activity of these genes might be in an independent manner in different cancer types.
Moreover, we constructed a PPI network using the STRING database. In the PPI network, NCK1 and DOK1 were the top hub genes in the HNSC network. PIK3R1 PRKACA and LCK were the top three hub genes in the LUSC network. However, in other cancers, no PPI networks were constructed with statistical significance (Additional file 3: Figure S2).

Survival analysis of AS events and CpGs
We then explored whether the associated AS events and CpGs could be effectively used as a prognostic signature for each cancer. Cox regression analysis was performed adjusting for age, gender, TNM stage and adjuvant therapy, and results showed that 10.3% (38/369) AS events could significantly distinguish patients with longer versus shorter survival prognoses ( Table 2). Because splicing of introns could be regulated under a co-transcriptional mechanism [33], an alteration in gene expression may affect AS of corresponding genes. Thus, we examined mRNA expression of genes   containing cancer-specific AS events and found that the majority of them (85.71%, 30/35) were not significantly associated with survival, suggesting that these AS events are partially influenced by their gene expression (Table 2). Additionally, we evaluated DNA methylation of CpGs at the survival-related AS exons boundaries. 14 (20.6%) CpGs in 8 (21.1%) AS exons were found significantly associated with the overall survival of cancer patients. (Table 2).

Discussion
Collectively, we performed a comprehensive analysis of the relationship between cancer-specific AS events and DNA methylation. We found that approximately half of the AS events were AP and ES types in human solid tumors. The boundaries of alternatively spliced exons were more likely to be hypomethylated in tumor tissues compared with adjacent normal tissues. Association analyses and permutation tests revealed that cancer-specific AS events were significantly correlated with DNA methylation. Our research provided a novel perspective of the regulatory mechanism of cancer-specific AS.
It is well accepted that AS plays an important role in multiple cellular processes and development programs, as well as contributing to tumorigenesis [34,35]. Nonetheless, to our knowledge, few studies have investigated the role of DNA methylation as a regulator of the cancer-specific AS. Recently, Laurent et al. have demonstrated that methylation differences were more prominent at the exon-intron boundaries [36]. Other studies also found that DNA methylation links with the inclusion rate of alternative exons [26,37]. These findings supported the hypothesis that AS could be functional by cis-regulation of DNA methylation. With the advantage of high-throughput data, TCGA data portal provides opportunities for the integration analyses of multi-omics data. TCGA Splice Seq, web-based bioinformatics, providing a clear view of the mRNA splicing patterns of 33 tumor types, across a dataset of more than 10,000 TCGA samples. Ryan et al. identified and calculated each potential splicing event across 33 types of cancer to establish TCGA SpliceSeq database, while did not evaluate the potential mechanism and clinical usage of AS events [27]. In the present study, we integrated AS events from SpliceSeq and TCGA data together to comprehensively explore potential regulatory mechanisms of DNA methylation for cancer-specific AS. We performed an association analysis between cancer-specific AS events and CpGs at their exon boundaries. The parameter selected in this study was based on previous reports [38][39][40], especially Castle's study, in which they examined the exon neighborhoods in the size of 200 nucleotides for intronic regions and 39 nucleotides for exonic regions to identify splicing cis-regulatory elements in sequences [29].
In the present study, to ensure the AS difference resulted from DNA methylation, we excluded CpGs showing evidence of associations with AS in normal tissues. By using such analysis, we identified a number of PSI-related CpGs, whose locations appeared differently across varied chromosomes and regions in different cancers. Moreover, the higher proportion of decreased methylation levels of AS-associated CpGs were observed in the majority of solid cancer types, except for COAD, LIHC, LUAD, LUSC, and PRAD. We speculated that this was due to the heterogeneity among tumor samples. It is well known that the splicing process is predominantly tissue-and cancer-specific [12,41]. Meanwhile, DNA methylation is also demonstrated to be varied across tumor types [42]. It is worth to note that several cases showing the same pattern of changes were identified across different cancer types. These signatures were potential candidates for actively selected variants, which might drive tumorigenesis. For example, we found the consistent changes of AS event and cg11269599 in gene ZNF577 (zinc finger protein 577) across several tumor types. Zinc finger proteins are commonly involved in transcriptional regulation of genes, but the mechanism of how DNA methylation affects AS of ZNF577 has not been clarified and needs to be determined in further studies [43].
Recently, several studies have given insight into the regulation of DNA methylation in AS. It has been suggested that DNA methylation at exonic regions could affect the binding of methyl-sensitive DNA-binding proteins, such as MeCp2, CTCF, and HP1 [37,[44][45][46]. MeCp2 was found to be enriched in spliced exons, which could facilitate the recruitment of histone deacetylase, thus causing Pol II pause and exon inclusion [46]. In 2011, Shukla et al. found the DNA methylation-mediated splicing of CD45 pre-mRNA, which inhibited CTCF binding [37]. HP1 could bring several splicing factors to methylated exons [26]. In addition, DNA methylation had also been reported to determine nucleosome positioning and affected the transcriptional elongation rate [47].
We also investigated the prognostic values of the cancerspecific AS events by conducting Cox regression. We found 38 methylation-related cancer-specific AS events could significantly distinguish patients with longer versus shorter survival prognoses. It should be noted that these identified signatures were overrepresented in renal carcinoma. Renal carcinoma has various histological and molecular subtypes and is characterized by poor prognosis and high recurrence. Until now, several molecular biomarkers have been investigated for renal carcinoma, while none of them have been used in clinical practice [48,49]. Therefore, the identification of novel and effective prognostic biomarkers is important for patients suffering from renal carcinoma. The signatures identified in the present study in renal cancers were not well reported previously and could provide the basis for further study into the pathomechanisms and serve as potential therapeutic targets. Additionally, 14 CpGs at the exon boundaries of 8 survival-related AS also showed significant association with overall survival, even after adjusting for age, gender, TNM stage, and adjuvant therapy.
These results led us to speculate that the identified CpGs might be the specific cancer drivers, and serve as prognostic biomarkers for cancers. By the integrated analyses of multi-dimensional data, our study revealed a deeper understanding of cancer-specific AS events and their relationships with DNA methylation across different solid tumor types. Another strength of our study was the pan-cancer analysis, which could reveal similarities and specificities across different cancer types, as well as enabled the elimination of false-positive and falsenegative calls made in several single-tumor-type projects [50]. Finally, we provided novel mechanisms and therapeutic perspectives for either one or several cancer types. The present study has several limitations. First, our study investigated the methylation-splicing features across multiple cancer types instead of the difference within cancer. In addition, we only studied the differentially methylated CpGs between tumor and normal tissues, which could miss the CpGs with hyper-or hypo-methylation in tumors that were associated with either down-or up-regulation of PSI. Finally, our research was based on the public data which lacked several important clinical features, such as a history of drug use, and has not been studied in survival analysis. Therefore, further research is needed to clarify and demonstrate the nature of the regulation of methylation in AS.

Conclusion
Overall, our analyses identified a significant association between DNA methylation and cancer-specific AS events. The present study contributes to the understanding of the role of methylation in exon usage during transcriptional process and carcinogenesis. Finally, this relatively small set of AS events and CpGs will facilitate the discovery of critical regulators, which are responsible for splicing dysregulation in cancers and thus can be used as new therapeutic biomarkers.