CAS-viewer: web-based tool for splicing-guided integrative analysis of multi-omics cancer data
- Seonggyun Han†1,
- Dongwook Kim†1,
- Youngjun Kim†2,
- Kanghoon Choi1,
- Jason E. Miller3,
- Dokyoon Kim3, 4 and
- Younghee Lee1
© The Author(s). 2018
Published: 20 April 2018
The Cancer Genome Atlas (TCGA) project is a public resource that provides transcriptomic, DNA sequence, methylation, and clinical data for 33 cancer types. Transforming the large and highly complex TCGA cancer genome data into integrated knowledge can help promote cancer research. Alternative splicing (AS) is a key regulatory mechanism of genes in human cancer development and in the interaction with epigenetic factors. Therefore, AS-guided integration of existing TCGA data sets will make it easier to gain insight into the genetic architecture of cancer risk and related outcomes. Tools already exist for analyzing and visualizing alternative mRNA splicing patterns in large-scale RNA-seq experiments. However, these web-based tools are limited to analyzing one type of TCGA data at a time, such as transcriptomic information alone.
We implemented CAS-viewer (integrative analysis of Cancer genome data based on Alternative Splicing), a web-based tool leveraging multi-cancer omics data from TCGA. It illustrates alternative mRNA splicing patterns along with methylation, miRNAs, and SNPs, and then provides an analysis tool to link differential transcript expression ratios to methylation, miRNA, and splicing regulatory elements for 33 cancer types. Moreover, one can analyze AS patterns with clinical data to identify potential transcripts associated with different survival outcomes for each cancer.
CAS-viewer is a web-based application for transcript isoform-driven integration of multi-omics data in multiple cancer types and will aid in the visualization and possible discovery of biomarkers for cancer by integrating multi-omics data from TCGA.
Alternative splicing (AS) is important to our understanding of cancer biology. There are multiple mechanisms by which AS plays a role in cancer; for instance, cancer-specific transcript isoforms can be generated or the ratio between mRNA isoforms can be disrupted. Epigenetic factors such as DNA methylation and miRNAs are not only distinct molecular markers of various cancers but are also mechanistically linked to the splicing mechanism [4, 5]. Therefore, in order to comprehensively understand the basic principles of mRNA expression patterns in cancer, it will be important to investigate AS with respect to epigenetic factors.
The Cancer Genome Atlas (TCGA) is a comprehensive resource for cancer genomic studies and precision medicine. TCGA has produced multiple layers of omics data, including transcriptome-wide expression, genetic variants, DNA methylation, miRNA, and clinical information for 33 cancer types. There are existing web resources that allow one to explore, analyze, and visualize TCGA data, including cBioPortal, FireBrowse, Vials, and SpliceSeq. Most of these existing resources can be used to explore alternative mRNA splicing patterns, but they are limited to transcriptome-based visualization alone. MEXPRESS is a well-designed web-based tool for easy visualization and analysis of multi-layer omics data (TCGA expression, DNA methylation, and clinical data) but lacks the ability to explore alternative mRNA splicing patterns.
In this study, we implemented CAS-viewer, offering a set of AS-guided analysis tools for transcripts, miRNAs, DNA methylation, and clinical data from TCGA and SNPs. CAS-viewer has several important features. CAS-viewer allows users to analyze how different isoforms are associated with DNA methylation in exonic and intronic regions and miRNAs in the 3’ UTR. CAS-viewer also correlates the expression ratio with clinical data of interest. It is easy to navigate AS isoforms by using an intron scaling bar, allowing easy conversion between genomic and transcript views. CAS-viewer also provides functional annotations of SNPs by summarizing their co-occurrence with splicing regulatory elements (SREs). Taken together, a tool that is able to integrate AS, expression and epigenetic features, along with clinical data from TCGA provides new ways of conceptualizing how the molecular mechanisms behind cancer can be studied.
TCGA data: We compiled level 3 data on transcripts, miRNA expression, DNA methylation, and clinical data for 33 cancer types from the TCGA Genomic Data Commons (GDC) data portal (https://portal.gdc.cancer.gov/). The details for each cancer type and the number of cases in the 33 cancers are summarized in Additional file 1: Table S1. For the AS gene model, we used the genomic positions of exons and introns of all transcripts for each of a total of 20,465 genes, obtained from the Generic Annotation Format (GAF) file based on the hg19 reference and downloaded from the GDC. This is the same GAF file used in the TCGA RNA-seq analysis. We also used the same genome build for the genomic positions of DNA methylation sites, miRNA targets, and SNPs.
miRNA target sites: The miRNA target sites in the 3’ UTR were compiled by integrating three miRNA target databases: 1) miRTarBase, which is based on experimentally validated miRNA targets; 2) TargetScan (Release 7.0), which is based on conserved complementarity between targets of miRNAs and mRNAs; and 3) MicroRNA.org, which is based on the miRanda algorithm. First, we obtained all pairs between mRNAs and miRNAs from the miRTarBase database. Second, we obtained the targeted genomic coordinates in the paired mRNAs using TargetScan and MicroRNA.org. Then, we identified the 3’ UTR regions that each miRNA binds to according to the Ensembl reference information (release version 75, downloaded from http://ftp.ensembl.org/pub/release-75/gtf/homo_sapiens/, September 2016), comprising 322,389 unique pairs between 2649 miRNAs and 14,894 genes.
SRE SNP: To perform a genome-wide scan of SNPs affecting splicing regulatory elements (SREs), we obtained all SNPs and their genomic locations from the VCF files of each chromosome, which were downloaded from the 1000 Genomes Project. The SNPs were divided into two groups, intronic SNPs and exonic SNPs, according to their functional classification. The genome-wide identification of SNPs affecting SREs has been previously described. Using the set of predicted hexameric SRE motifs published by Burge and colleagues, including exonic splicing enhancers (ESEs), exonic splicing silencers (ESSs), and intronic splicing enhancers (ISEs), we identified all potential SRE sites in the entire intragenic region that matched any of these hexamers perfectly, extracting hg19 sequences with the twoBitToFa program. A total of 4,955,866 SRE SNPs, comprising 527,138 SNPs in ESEs, 196,969 in ESSs, and 4,328,005 in ISEs, were annotated.
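The hexamer matching step described above can be sketched as follows. This is only an illustration under stated assumptions, not the pipeline used to build CAS-viewer: the function names, the example sequence, and the single placeholder hexamer are hypothetical, and the sketch only flags SNPs whose position falls inside a perfect reference-sequence match, without modeling allele-specific effects.

```python
def find_sre_sites(sequence, hexamers):
    """Yield (start, hexamer) for every perfect hexamer match (0-based start)."""
    sequence = sequence.upper()
    for start in range(len(sequence) - 5):
        window = sequence[start:start + 6]
        if window in hexamers:
            yield start, window


def snps_in_sres(sequence, region_start, snp_positions, hexamers):
    """Map each SNP genomic position to the SRE hexamers overlapping it.

    region_start is the genomic coordinate of sequence[0] (an assumption
    about how the extracted sequence is anchored).
    """
    hits = {}
    for start, hexamer in find_sre_sites(sequence, hexamers):
        site_start = region_start + start
        for pos in snp_positions:
            if site_start <= pos < site_start + 6:
                hits.setdefault(pos, []).append(hexamer)
    return hits


# Hypothetical inputs: a placeholder motif set, not the published one.
placeholder_hexamers = {"GAAGAA"}
hits = snps_in_sres("TTGAAGAATT", 1000, [1004, 1009], placeholder_hexamers)
print(hits)
```

A set lookup per six-base window keeps the scan linear in the sequence length, which matters when the region is an entire gene.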
The percent spliced in (PSI) value is calculated from X_it and X_st, the expression of transcripts including and skipping an alternatively spliced exon, respectively. With this PSI value, CAS-viewer performs a correlation test with clinical data, methylation level, and miRNA expression as described in the following section.
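As an illustration of how such a ratio can be computed, the sketch below assumes the conventional definition PSI = X_it / (X_it + X_st). Summing expression over all transcripts in each group, and returning no value when both groups have zero expression, are assumptions made for this example; the function name is hypothetical.

```python
def percent_spliced_in(inclusion_expr, skipping_expr):
    """Return X_it / (X_it + X_st) for one case, or None if both are zero.

    inclusion_expr: expression values of transcripts that include the exon.
    skipping_expr: expression values of transcripts that skip the exon.
    """
    x_it = sum(inclusion_expr)
    x_st = sum(skipping_expr)
    total = x_it + x_st
    if total == 0:
        return None  # ratio is undefined when neither group is expressed
    return x_it / total


# Two including transcripts and one skipping transcript in a single case.
print(percent_spliced_in([30.0, 10.0], [60.0]))
```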
Correlation of PSI with clinical data: CAS-viewer performs a Kaplan-Meier survival analysis using the R package survival. First, for each user-defined group of cases, we divide the cases into high-PSI and low-PSI subgroups using K-means clustering. For the high-PSI and low-PSI cases, we calculate differential survival outcomes using the “vital status” and “date to death” information.
Correlation of PSI with methylation and miRNA in the two groups of cases: For each of the two groups of cases defined by the user, we perform linear regression to correlate PSI with methylation levels and miRNA expression. Differential methylation and miRNA expression between the two groups of cases are compared using the Welch two-sample t-test (the t.test function in R).
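The two steps above (splitting cases by PSI, then estimating survival) can be illustrated with a minimal sketch. CAS-viewer itself uses the R survival package; the Python code below is a stand-in written for this example, with hypothetical function names, a simple one-dimensional two-cluster K-means initialized at the minimum and maximum PSI, and a product-limit estimate without confidence intervals or a log-rank test.

```python
def two_means_split(values, iterations=100):
    """Label each value True (high cluster) or False (low cluster) with k-means, k=2."""
    lo, hi = min(values), max(values)
    labels = [False] * len(values)
    for _ in range(iterations):
        labels = [abs(v - hi) < abs(v - lo) for v in values]
        high = [v for v, is_high in zip(values, labels) if is_high]
        low = [v for v, is_high in zip(values, labels) if not is_high]
        if not high or not low:
            break  # all values fell into one cluster
        new_lo, new_hi = sum(low) / len(low), sum(high) / len(high)
        if (new_lo, new_hi) == (lo, hi):
            break  # centroids stopped moving
        lo, hi = new_lo, new_hi
    return labels


def kaplan_meier(times, events):
    """Return [(time, survival)] at each time with at least one observed death.

    times: follow-up time per case; events: True if death was observed,
    False if the case was censored at that time.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e)
        leaving = sum(1 for ti in times if ti == t)
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # deaths and censored cases both leave the risk set
    return curve


psi_values = [0.10, 0.15, 0.20, 0.80, 0.90]
print(two_means_split(psi_values))
print(kaplan_meier([5, 8, 8, 12], [True, True, False, True]))
```

In practice one curve would be estimated for the high-PSI cases and one for the low-PSI cases, and the two curves compared.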
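For readers who want to see the arithmetic behind these two tests, a minimal sketch follows. CAS-viewer runs them in R; the Python functions here are hypothetical stand-ins that return the least-squares fit with its R² and the Welch t statistic with its Welch-Satterthwaite degrees of freedom. Converting the statistic to a p-value requires the t distribution and is left out.

```python
from math import sqrt


def linear_fit(x, y):
    """Ordinary least squares y = a + b*x; returns (intercept, slope, r_squared)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = (sxy * sxy) / (sxx * syy)
    return intercept, slope, r_squared


def welch_t(a, b):
    """Welch's t statistic and degrees of freedom for two independent samples."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((v - ma) ** 2 for v in a) / (na - 1)  # unbiased sample variance
    vb = sum((v - mb) ** 2 for v in b) / (nb - 1)
    se2 = va / na + vb / nb
    t = (ma - mb) / sqrt(se2)
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df


# Hypothetical values: methylation level (x) against PSI (y) within one group,
# then methylation levels compared between two groups of cases.
print(linear_fit([0.1, 0.2, 0.3], [0.2, 0.4, 0.6]))
print(welch_t([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```

Unlike Student's t-test, the Welch variant does not assume equal variances in the two groups, which suits case groups of very different sizes.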
CAS-viewer is currently available at http://genomics.chpc.utah.edu/cas/. On the landing page of CAS-viewer, a user can search for an AS gene with a keyword using the HUGO approved gene symbol, DNA methylation cgid, miRNA id, or SNP rs number. The next page returns the gene(s) whose intragenic region (i.e., defined as “the transcribed gene region” from the start to end of the transcript) matches the genomic location of the entered keyword. Clicking a gene links the user to the main page, which is composed of three components: 1) AS Transcript Navigator: for browsing AS transcript isoforms along with methylation sites, miRNA binding sites, and SNPs; 2) Option: for selecting transcripts and cancer cases of interest, based on AS events (i.e., exon skipping, intron retention, and alternative 5′ and 3′ splice sites) and clinical information, respectively; and 3) Output: shows the results for differential expression between selected transcripts (i.e., PSI; see Methods), its association with selected clinical features, DNA methylation level, targeting miRNA expression, and SNPs located in SREs.
“Exon Usage”, the last transcript track, is composed of the representative exons that are defined by the clustering of overlapping exons (Fig. 1a). Exons are clustered according to genomic location to find overlapping exons. Then, we create “representative exons,” which are essentially a concatenation of the longest exons in each exon cluster. Exons whose length differs from that of the representative exon can be easily recognized as alternatively spliced (i.e., alternative 5′ and 3′ splice sites and intron retention). The color of each exon represents its skipping frequency; the lighter the color, the more frequently the exon is skipped. The user can also see the same information in the mouse-over pop-up for each exon, which shows how many transcripts lack the given exon.
An intron scaling bar was implemented to allow seamless transition from the genome browser (unspliced/pre-mRNA) to the transcript viewer (spliced/mature transcript), combining the advantages of the two viewers. An intron scale of 0% shows transcripts in mRNA coordinates, which more easily indicate splicing features (Fig. 1a). An intron scale of 100% makes the viewer equivalent to the genome browser, which is most convenient for locating genomic features in introns (Fig. 1b). Since introns are much longer than exons (> 5-fold) in eukaryotic genomes, most space in the genome browser is assigned to introns, and AS events, such as alternative splice sites, are not easily recognized. Therefore, scaling introns to a small value helps to show minute variations in exon length while maintaining the exon-intron boundaries.
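One way to read the clustering described above is sketched below. It is an interpretation for illustration, not CAS-viewer's implementation: exons are taken as inclusive (start, end) intervals, the representative exon of a cluster is taken to be its single longest member, and the skipping frequency is the fraction of transcripts with no exon in the cluster. All names and coordinates are hypothetical.

```python
def cluster_exons(transcripts):
    """Group overlapping exons across transcripts.

    transcripts: {transcript_id: [(start, end), ...]} with inclusive coordinates.
    Returns a list of clusters, each a list of (start, end, transcript_id).
    """
    exons = sorted((s, e, tid) for tid, ex in transcripts.items() for s, e in ex)
    clusters = []
    cluster_end = None
    for start, end, tid in exons:
        if clusters and start <= cluster_end:
            clusters[-1].append((start, end, tid))  # overlaps the open cluster
            cluster_end = max(cluster_end, end)
        else:
            clusters.append([(start, end, tid)])    # start a new cluster
            cluster_end = end
    return clusters


def exon_usage(transcripts):
    """Return [(representative_exon, skipping_frequency)] for each cluster."""
    usage = []
    for cluster in cluster_exons(transcripts):
        representative = max(cluster, key=lambda x: x[1] - x[0])[:2]
        present = {tid for _, _, tid in cluster}
        skipped = len(transcripts) - len(present)
        usage.append((representative, skipped / len(transcripts)))
    return usage


example = {
    "t1": [(100, 200), (300, 400), (500, 600)],
    "t2": [(100, 200), (500, 650)],            # skips the middle exon
    "t3": [(100, 180), (300, 400), (500, 600)],
}
for exon, freq in exon_usage(example):
    print(exon, round(freq, 2))
```

The skipping frequency computed this way is the quantity that would drive the exon color, with lighter colors for higher values.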
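The effect of the intron scale on coordinates can be sketched as a simple mapping from genomic position to display position. This is a minimal illustration under stated assumptions rather than the viewer's own code: the function name is hypothetical, exons are sorted, non-overlapping, inclusive intervals, the display origin is the first exon start, and the scale is expressed as a fraction (0.0 for 0%, 1.0 for 100%).

```python
def scaled_position(pos, exons, intron_scale):
    """Map a genomic position to a display coordinate.

    Exonic bases always take one display unit each; intronic bases take
    intron_scale units each, so 0.0 collapses introns and 1.0 keeps
    true genomic spacing.
    """
    offset = 0.0
    prev_end = exons[0][0] - 1  # no intron precedes the first exon
    for start, end in exons:
        if pos < start:
            # position lies in the intron before this exon
            return offset + (pos - prev_end - 1) * intron_scale
        offset += (start - prev_end - 1) * intron_scale  # whole intron
        if pos <= end:
            return offset + (pos - start)                # inside this exon
        offset += end - start + 1                        # whole exon
        prev_end = end
    # downstream of the last exon: treated like intronic sequence
    return offset + (pos - prev_end - 1) * intron_scale


exons = [(100, 199), (1000, 1099)]
for scale in (0.0, 0.1, 1.0):
    print(scale, scaled_position(1000, exons, scale))
```

Because exon lengths are never scaled, small differences in exon boundaries stay visible at any intron scale.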
Option: There are two options that enable users to divide transcripts and cancer cases into two groups: Group transcripts and Group cases.
Once the user selects an exon of interest in AS Transcript Navigator, by default, the transcripts are automatically pre-divided into two groups: transcript groups with and without skipping of the selected exon (Fig. 2a). Then, the user can further re-group the transcripts by 1) clicking the transcript id to unselect the pre-selected transcript and 2) clicking “>” or “<” to move the transcript into another group. Fig. 2b and Fig. 2c show the re-grouping options and regrouped transcripts, respectively. As the expression ratio of certain mRNA transcript isoforms is often imbalanced or altered in cancer cells, the ratio of differential expression (denoted “percent spliced in” (PSI)) between the two transcript groups will be calculated and plotted in “Transcript ratio” of the output panel. The PSI value will be used for correlation tests with clinical data, methylation data, and miRNA data.
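The default pre-grouping and the manual re-grouping can be pictured with a short sketch. The rule used here (a transcript counts as including the selected exon if any of its exons overlaps it) and the function names are assumptions made for this illustration, not a description of the tool's internals.

```python
def group_by_exon(transcripts, selected_exon):
    """Split transcript ids into (including, skipping) lists for one exon.

    transcripts: {transcript_id: [(start, end), ...]} with inclusive coordinates.
    selected_exon: (start, end) of the exon chosen by the user.
    """
    sel_start, sel_end = selected_exon
    including, skipping = [], []
    for tid, exons in transcripts.items():
        overlaps = any(s <= sel_end and e >= sel_start for s, e in exons)
        (including if overlaps else skipping).append(tid)
    return including, skipping


def move_transcript(tid, source, target):
    """Move one transcript id between groups, mimicking the '>' and '<' buttons."""
    source.remove(tid)
    target.append(tid)


example = {
    "t1": [(100, 200), (300, 400), (500, 600)],
    "t2": [(100, 200), (500, 600)],
    "t3": [(100, 200), (300, 380), (500, 600)],
}
group_a, group_b = group_by_exon(example, (300, 400))
print(group_a, group_b)
move_transcript("t3", group_a, group_b)
print(group_a, group_b)
```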
Output: Output comprises five sections in plotting the results: Transcript ratio, Clinical correlation, Methylation, miRNA, and SRE SNPs.
Case study: MAF
The key feature of CAS-viewer is that it can explore multi-omics data such as methylation, miRNA, and clinical information in the context of transcript isoforms. We present a case study of the MAF (MAF BZIP Transcription Factor) gene in bladder cancer to demonstrate how one can use CAS-viewer to gain insight into the molecular complexity among splicing, methylation, and miRNA, and its clinical correlation. MAF encodes a transcription factor and is a well-known oncogene that is a member of the AP1 superfamily. The MAF gene is known to produce two isoforms, uc002ffn.2 (NM_001031804) and uc002ffm.2 (NM_005360) (Fig. 9a and b). The uc002ffn.2 isoform includes a retained intron between exons 1 and 2 in the 3’ UTR region, but uc002ffm.2 does not. We first investigated whether the expression ratio between the transcripts with and without the retained intron (the “PSI” value) differs between the stages of bladder cancer. A higher PSI value indicates higher expression of the isoform with the retained intron relative to the isoform without the retained intron (Fig. 9a). We selected two groups of cases by bladder cancer stage: Group A consists of cases in stage 1 (N = 2) and stage 2 (N = 130), and Group B consists of cases in stage 3 (N = 140) and stage 4 (N = 134). As shown in Fig. 9c, PSI values in Group B (red dots) were significantly higher than in Group A (blue dots) (p = 0.014), suggesting that the expression level of transcript isoforms with the retained intron was higher in Group B than in Group A. Subgroups from the highest quantiles of PSI values in Group B (N = 212) showed a poorer clinical outcome (p = 0.000393, Fig. 9d) compared to subgroups from the highest quantiles of PSI values in Group A (N = 108).
This suggests that a changed or imbalanced expression ratio between spliced mRNA isoforms of MAF may be associated with a worse survival outcome in bladder cancer and may be more important for cases in the advanced stages (3 and 4) than for cases in stages 1 and 2.
Epigenetic factors affecting gene expression, such as methylation and miRNA, are mechanistically linked to splicing as well. We tested whether the methylation site (cg07870982) and miRNA (hsa-mir-10b-3p) located in the retained intron of MAF are associated with the transcript ratio. DNA methylation was not significantly correlated with the transcript ratio in either group (upper plot of Fig. 9e), but the DNA methylation level was higher in Group B than in Group A (p = 0.002, lower plot of Fig. 9e). Notably, as expected, expression of hsa-mir-10b-3p was inversely correlated with the transcript ratio in Group B only (advanced stage of bladder cancer) (p = 0.0004, R2 = 0.04, upper plot of Fig. 9f) but not in Group A, even though hsa-mir-10b-3p was similarly expressed in the two groups (lower plot of Fig. 9f). These results suggest that early and late stages of bladder cancer can be differentiated by the ratio between miRNA and target transcript isoform using MAF alone.
Here, we have presented CAS-viewer, a web-based, splicing-guided integrative analysis tool for multi-layer omics data sets such as RNA-seq, methylation, miRNA, and clinical information on a large number of cases across diverse cancer types. We expect that CAS-viewer will be a useful resource for bioinformaticians and non-bioinformaticians who study cancer. Furthermore, CAS-viewer will aid in the visualization and possible discovery of biomarkers for cancer by integrating multi-omics data from TCGA.
The support and resources from the Center for High Performance Computing and Vice President’s Clinical and Translational Research Scholar Program at the University of Utah are gratefully acknowledged. We gratefully acknowledge the TCGA Consortium and all its members for the TCGA Project initiative, for providing samples, tissues, and data processing, and for making data and results available. The results published here are in whole or part based upon data generated by The Cancer Genome Atlas pilot project established by the NCI and NHGRI. Information about TCGA and the investigators and institutions that constitute the TCGA research network can be found at https://cancergenome.nih.gov/.
The publication cost of this article was funded by Younghee Lee’s development funding at the Department of Biomedical Informatics, University of Utah.
Availability of data and materials
Results are shared in the additional files. CAS-viewer is available at http://genomics.chpc.utah.edu/cas/, including a detailed manual, the data and materials used, and the methods.
About this supplement
This article has been published as part of BMC Medical Genomics Volume 11 Supplement 2, 2018: Proceedings of the 28th International Conference on Genome Informatics: medical genomics. The full contents of the supplement are available online at https://bmcmedgenomics.biomedcentral.com/articles/supplements/volume-11-supplement-2.
SH, DK, and YK contributed equally to designing and developing the analysis workflow and to manuscript writing. KC assisted with showing examples of this study. JM and DK contributed to the interpretation of the analysis and to manuscript writing. YL supervised the overall conception, design, analysis, and interpretation of the study, and led the manuscript writing. All authors read and approved the final manuscript.
Ethics approval and consent to participate
Consent for publication
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Moore MJ, Wang Q, Kennedy CJ, Silver PA. An alternative splicing network links cell-cycle control to apoptosis. Cell. 2010;142(4):625–36.
- Sveen A, Kilpinen S, Ruusulehto A, Lothe RA, Skotheim RI. Aberrant RNA splicing in cancer; expression changes and driver mutations of splicing factor genes. Oncogene. 2016;35(19):2413–27.
- Suzuki H, Maruyama R, Yamamoto E, Kai M. DNA methylation and microRNA dysregulation in cancer. Mol Oncol. 2012;6(6):567–78.
- Passetti F, Ferreira CG, Costa FF. The impact of microRNAs and alternative splicing in pharmacogenomics. Pharmacogenomics J. 2009;9(1):1–13.
- Lev Maor G, Yearim A, Ast G. The alternative role of DNA methylation in splicing regulation. Trends Genet. 2015;31(5):274–80.
- Gao J, Aksoy BA, Dogrusoz U, Dresdner G, Gross B, Sumer SO, Sun Y, Jacobsen A, Sinha R, Larsson E, et al. Integrative analysis of complex cancer genomics and clinical profiles using the cBioPortal. Sci Signal. 2013;6(269):pl1.
- Deng M, Bragelmann J, Kryukov I, Saraiva-Agostinho N, Perner S. FirebrowseR: an R client to the broad Institute's firehose pipeline. Database (Oxford). 2017;2017.
- Strobelt H, Alsallakh B, Botros J, Peterson B, Borowsky M, Pfister H, Lex A. Vials: visualizing alternative splicing of genes. IEEE Trans Vis Comput Graph. 2016;22(1):399–408.
- Ryan MC, Cleland J, Kim R, Wong WC, Weinstein JN. SpliceSeq: a resource for analysis and visualization of RNA-Seq data on alternative splicing and its functional impacts. Bioinformatics. 2012;28(18):2385–7.
- Koch A, De Meyer T, Jeschke J, Van Criekinge W. MEXPRESS: visualizing expression, DNA methylation and clinical TCGA data. BMC Genomics. 2015;16:636.
- Chou CH, Chang NW, Shrestha S, Hsu SD, Lin YL, Lee WH, Yang CD, Hong HC, Wei TY, Tu SJ, et al. miRTarBase 2016: updates to the experimentally validated miRNA-target interactions database. Nucleic Acids Res. 2016;44(D1):D239–47.
- Lewis BP, Burge CB, Bartel DP. Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets. Cell. 2005;120(1):15–20.
- Betel D, Wilson M, Gabow A, Marks DS, Sander C. The microRNA.org resource: targets and expression. Nucleic Acids Res. 2008;36(Database issue):D149–53.
- 1000 Genomes Project Consortium, Auton A, Brooks LD, Durbin RM, Garrison EP, Kang HM, Korbel JO, Marchini JL, McCarthy S, McVean GA, et al. A global reference for human genetic variation. Nature. 2015;526(7571):68–74.
- Lee Y, Gamazon ER, Rebman E, Lee Y, Lee S, Dolan ME, Cox NJ, Lussier YA. Variants affecting exon skipping contribute to complex traits. PLoS Genet. 2012;8(10):e1002998.
- Yeo G, Hoon S, Venkatesh B, Burge CB. Variation in sequence and organization of splicing regulatory elements in vertebrate genes. Proc Natl Acad Sci U S A. 2004;101(44):15700–5.
- Kent WJ. BLAT - the BLAST-like alignment tool. Genome Res. 2002;12(4):656–64.
- Han S, Jung H, Lee K, Kim H, Kim S. Genome wide discovery of genetic variants affecting alternative splicing patterns in human using bioinformatics method. Genes Genom. 2017;39(4):453–9.
- Therneau T. A Package for Survival Analysis in S. version 2.38, 2015. https://CRAN.R-project.org/package=survival.
- Welch BL. The generalisation of student's problems when several different population variances are involved. Biometrika. 1947;34(1–2):28–35.
- Shakir R, Ngo N, Naresh KN. Correlation of cyclin D1 transcript levels, transcript type and protein expression with proliferation and histology among mantle cell lymphoma. J Clin Pathol. 2008;61(8):920–7.
- Ryan M, Wong WC, Brown R, Akbani R, Su X, Broom B, Melott J, Weinstein J. TCGASpliceSeq: a compendium of alternative mRNA splicing in cancer. Nucleic Acids Res. 2016;44(D1):D1018–22.
- Yang IS, Son H, Kim S, Kim S. ISOexpresso: a web-based platform for isoform-level expression analysis in human cancer. BMC Genomics. 2016;17(1):631.