The Cancer Omics Atlas: an integrative resource for cancer omics annotations

Sun, Qingrong; Li, Mengyuan; Wang, Xiaosheng

doi:10.1186/s12920-018-0381-7

Database
Open access
Published: 08 August 2018

The Cancer Omics Atlas: an integrative resource for cancer omics annotations

BMC Medical Genomics volume 11, Article number: 63 (2018) Cite this article

8464 Accesses
19 Citations
1 Altmetric
Metrics details

A Correction to this article was published on 30 August 2018

This article has been updated

Abstract

Background

The Cancer Genome Atlas (TCGA) is an important data resource for cancer biologists and oncologists. However, a lack of bioinformatics expertise often hinders experimental cancer biologists and oncologists from exploring the TCGA resource. Although a number of tools have been developed for facilitating cancer researchers to utilize the TCGA data, these existing tools cannot fully satisfy the large community of experimental cancer biologists and oncologists without bioinformatics expertise.

Methods

We developed a new web-based tool The Cancer Omics Atlas (TCOA, http://tcoa.cpu.edu.cn) for fast and straightforward querying of TCGA “omics” data.

Results

TCOA provides the querying of gene expression, somatic mutations, microRNA (miRNA) expression, protein expression data based on a single molecule or cancer type. TCOA also provides the querying of expression correlation between gene pairs, miRNA pairs, gene and miRNA, and gene and protein. Moreover, TCOA provides the querying of the associations between gene, miRNA, or protein expression and survival prognosis in cancers. In addition, TCOA displays transcriptional profiles across various human cancer types based on the pan-cancer analysis. Finally, TCOA provides the querying of molecular profiles for 2877 immune-related genes in human cancers. These immune-related genes include those that are established or promising targets for cancer immunotherapy such as CTLA4, PD1, PD-L1, PD-L2, IDO1, LAG3, and TIGIT.

Conclusions

TCOA is a useful tool that supplies a number of unique and new functions complementary to the existing tools to facilitate exploration of the TCGA resource.

Peer Review reports

Background

With the development of high-throughput sequencing technology, a large volume of cancer genomics data are emerging and advancing cancer research. Notably, The Cancer Genome Atlas (TCGA) datasets cover 33 different cancer types and more than 10,000 cancer cases in total (https://gdc-portal.nci.nih.gov/). Each TCGA cancer type contains different types of “omics” data, including: whole exome (genome) sequencing; genomic DNA copy number arrays; DNA methylation; mRNA expression array and RNA-Seq data; microRNA (miRNA) sequencing; reverse-phase protein arrays; and clinical metadata. TCGA is becoming a necessary data resource not only for the cancer informatics researchers, but also for experimental cancer researchers and oncologists. Particularly, many cancer researchers are interested in having a preliminary search of TCGA to find or filter their experimental targets; many researchers seek for a validation of their experimental results from TCGA. However, because most of experimental biologists and oncologists lack sufficient skills in bioinformatics analysis, it is usually difficult for them to explore the TCGA resource. Thus, the development of web-based tools with the user-friendly graphical user interface (GUI) must be useful for experimental biologists and oncologists to search what they need from TCGA.

Some web-based tools have been developed to explore the TCGA data. The cBioPortal (http://cbioportal.org) is a web resource for analyzing and visualizing multidimensional cancer genomics data including those from TCGA [1, 2]. The Broad Institute TCGA GDAC Firehose (http://gdac.broadinstitute.org/) provides standardized datasets, algorithms, and analysis results for TCGA. MEXPRESS (http://mexpress.be/) provides query and visualization of the clinical, gene expression and methylation data in TCGA [3]. GEPIA (http://gepia.cancer-pku.cn/) is a web tool for visualizing gene expression comparisons and correlations, and associations with patient survival prognosis based on TCGA and GTEx data [4]. The Cancer Proteome Atlas (http://tcpaportal.org/tcpa/) is a web-based data portal for downloading, visualizing, and analyzing TCGA proteomics data [5]. All these tools provide significantly valuable resources that facilitate cancer biologists and oncologists to explore the TCGA data. However, the existing tools still have many places worth improving to satisfy the large community of experimental biologists and oncologists without bioinformatics expertise. For example, cBioPortal lacks differential expression and survival analyses based on gene expression profiles while these data are often of interests for biologists and oncologists. GDAC Firehose is a good resource for bioinformatics scientist while is not straightforward for cancer biologists and oncologists without bioinformatics training. Although MEXPRESS can provide the fast querying of the visualized clinical, gene expression and methylation data in TCGA, it lacks some important data types that are relevant to cancer biology and oncology such as gene somatic mutations, miRNAs, proteins and their associations with survival prognosis in cancers. Similarly, GEPIA is a recently-published web tool that specializes in querying gene expression and its association with survival prognosis in cancers while it lacks other cancer omics data such as gene somatic mutations, miRNAs and proteomics, and the associations of these molecular profiles with survival prognosis in cancers.

To provide useful functions complementary to these existing tools, we developed a new web-based tool The Cancer Omics Atlas (TCOA, http://tcoa.cpu.edu.cn) for fast and straightforward querying of TCGA gene expression, somatic mutations, miRNA expression, protein expression based on a single molecule or cancer type. TCOA also provides the querying of expression correlations of gene-gene, miRNA-miRNA, protein-protein, gene-miRNA and gene-protein, and the correlation of gene, miRNA, or protein expression with survival prognosis in cancers. Moreover, TCOA provides a portrait of transcriptional landscape of human cancers based on the pan-cancer analysis. In addition, because cancer immunotherapy is showing increasingly noteworthy for its effectiveness in treating a variety of cancers, we specifically provide a tab for querying of 2877 immune-related genes in TCOA.

Construction and content

Database architecture and web interface

TCOA was developed using Hypertext Preprocessor (PHP, version 5.5.10) with a R-based web framework. The back-end database was built by MySQL (version 5.5.36) that contained the TCGA data needed for querying. PHP scripts were used to handle database queries or computational results by R script, generate results and send them to users. The TCOA website was developed by HTML (Hyper TextMarkup Language) and JavaScript for the user interface. TCOA contains six major modules: Gene, MicroRNA, Cancer, Pan-cancer, Immuno-Oncology, and Protein (Table 1). For all the querying from users, TCOA will send visualized results to them in the form of figures (a few in the form of tables).

Table 1 A summary of TCOA functions and data display

Full size table

Table 2 Database statistics

Full size table

Functions of six modules in the database

Functions of the “gene” module

In the “Gene” module, when a user submits the querying of a gene using the gene symbol or Entrez ID, TCOA will output the information on expression and somatic mutations of the gene in 33 cancer types. The gene expression data include: gene expression levels in cancers; expression correlations with other genes in cancers; differential expression comparisons between cancer and normal samples (if the gene expression data in normal samples are available in TCGA); differential expression comparisons between different cancer phenotypes such as stage and grade; associations of gene expression with survival prognosis in cancers. The gene somatic mutation data include: mutation rates in cancers; variants classification in cancers; comparisons of mutation rates between different cancer phenotypes such as stage and grade; comparisons of gene expression between gene-mutated and gene-wildtype cancers; associations of gene mutations with survival prognosis in cancers.

For example, if we are interested in the research of the tumor suppressor gene TP53 in cancers, we can enter into the “Gene” module to search for the gene. Firstly, we obtain a summary of the TP53 mean expression levels and somatic mutation rates in 33 cancer types. We find that TP53 has the highest mutation rate of 91.2% in uterine carcinosarcoma (UCS) and has the second highest mutation rate of 83% in ovarian serous cystadeno-carcinoma (OV). There are ten cancer types that have a TP53 mutation rate greater than 50% in total (Fig. 1a). Moreover, we can find a summary of the variant classification of TP53 mutations in cancers, e.g., in pancreatic adenocarcinoma (PAAD), 64 and 12% of TP53 mutations being missense and frame-shift insertion, respectively (Fig. 1b). Importantly, we can find the associations of TP53 mutations with survival prognosis in cancers. For example, TP53 mutations are associated with worse survival (overall and disease free survival) prognosis in PAAD (Fig. 1c). In addition, one could be interested in the expression associations of other genes with TP53 in cancers, e.g., the expression association between PLK1 and TP53 in PAAD (Fig. 1d). In fact, previous studies have shown that PLK1 interacted with TP53, and that p53 dysfunction caused enhanced expression of PLK1 in cancers [6,7,8,9].

Functions of the “MicroRNA” module

In the “MicroRNA” module, a user can submit the querying of miRNAs using the human miRNA symbol. TCOA will output the miRNA expression-related data in 33 cancer types. These data include: miRNA expression levels in cancers; expression correlations with genes in cancers; expression correlations with other miRNAs in cancers; differential miRNA expression comparisons between cancer and normal samples (if the miRNA expression data in normal samples are available in TCGA); differential miRNA expression comparisons between different cancer phenotypes such as stage and grade; associations of miRNA expression with survival prognosis in cancers.

For example, to explore the human miRNA hsa-mir-100 in cancers, we can submit the querying of hsa-mir-100 in the “MicroRNA” module. Firstly, we obtain a summary of hsa-mir-100 mean expression levels in 33 cancer types. Further, we desire to explore the expression levels of hsa-mir-100 in breast invasive carcinoma (BRCA). In selecting the cancer type, we find that hsa-mir-100 has significantly lower expression levels in BRCA than in normal tissue (Fig. 2a). Moreover, we find that elevated expression of hsa-mir-100 is associated with better overall survival (OS) prognosis in BRCA (Fig. 2b). Furthermore, we explore the expression correlation between gene PLK1 and hsa-mir-100, and find that PLK1 and hsa-mir-100 have significantly negative expression correlation in BRCA while have no significant expression correlation in normal tissue (Fig. 2c). This is consistent with previous studies showing that miR-100 could induce apoptosis and cell cycle arrest in cancer by targeting a number of genes including PIK1 [10, 11]. Accordingly, the TCOA search result shows that elevated expression of PLK1 is associated with worse OS prognosis in BRCA (Fig. 2d).

Functions of the “Cancer” module

In the “Cancer” module, when a user clicks a cancer type, TCOA will output top 50 most frequently mutated genes in the cancer, up-regulated and down-regulated genes, and up-regulated and down-regulated miRNAs in the cancer relative to normal controls. This module also outputs important pathways associated with the highly-expressed genes in the cancer type. TCOA outputs the up-regulated and down-regulated genes or miRNAs depending on the threshold input by users. The threshold includes: fold change of expression levels in cancer compared to normal tissue, and adjusted p-value. The adjusted p-values (FDR q-values) are calculated by the Benjamini and Hochberg (BH) method [12].

For example, if we submit the querying of liver hepatocellular carcinoma (LIHC) in the module, we will find that TTN has the highest mutation rate of 34%, and TP53 has the second highest mutation rate of 31.1% in LIHC. The other frequently-mutated genes in LIHC include CTNNB1, MUC16, ND5, OBSCN, RYR2, ALB etc. (Fig. 3a). TCOA shows that THBS4 has the highest mean expression increase (nearly 40-fold) in LIHC relative to normal tissue. This gene has been shown to be overexpressed in multiple cancer types [13, 14]. The other overexpressed genes in LIHC include ZIC2, GPC3, EPS8L3, CPLX2, IGF2BP1, NUF2, CDC25C, CDC20, and GABRD (Fig. 3b). In contrast, the most down-regulated gene in LIHC is CLEC4M which has nearly 335-fold expression decrease compared to normal tissue. This gene encodes a protein that is involved in the innate immune system and is expressed in the endothelial cells of the lymph nodes and liver. Previous studies have shown that CLEC4M and its product were down-regulated in LIHC and other cancer types [15, 16]. The other repressed genes in LIHC include CLEC4G, INS-IGF2, CLEC1B, CYP1A2, GDF2, FCN2, MARCO, STAB2, HAMP, and MT1H (Fig. 3b). The gene set enrichment analysis of the highly-expressed genes in LIHC shows that the pathways of cell cycle, DNA replication, ECM-receptor interaction, p53 signaling, MAPK signaling, axon guidance, focal adhesion, metabolism, and mismatch repair are enriched in LIHC (Fig. 3c). In addition, TCOA shows that mir-1269, mir-10b, mir-224, and mir-183 are overexpressed in LIHC with more than 4-fold expression increase compared to normal tissue, while mir-1258, mir-675, mir-490, mir-424, mir-483, mir-1247, mir-199b, mir-199a-2, mir-139, mir-199a-1, mir-3607, and mir-451 are underexpressed in LIHC with more than 4-fold expression decrease compared to normal tissue (Fig. 3d).

Functions of the “pan-cancer” module

In the “Pan-cancer” module, TCOA outputs the genes consistently up-regulated or down-regulated, pathways significantly up-regulated, and genes whose deregulation is significantly associated with survival prognosis across various cancer types. This module also outputs the genes that are differentially expressed between cancer and normal samples, and between low-advanced and highly-advanced cancers across various cancer types. We refer to early-stage (Stage I-II) or low-grade (Grade I-II) cancers as lowly-advanced cancers, and late-stage (Stage III-IV) or high-grade (Grade III-IV) cancers highly-advanced cancers. A comparison of tumor mutation burden (TMB, defined as the total number of substitutions, regardless of variant type) among different cancer types is also shown in this module (Fig. 4). Figure 4 shows that cutaneous melanoma (SKCM) has the highest median TMB, followed by lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC). It confirms that TMB was associated with clinical response to immunotherapy [17,18,19] in that several cancer types with high TMB have shown positive response to immune checkpoint blockade treatment such as melanoma [20] and non-small cell lung cancer (NSCLC) [21]. The results presented in the “Pan-cancer” are mainly based on a recent study by our group [22].

Functions of the “Immuno-oncology” module

In the “Immuno-Oncology” module, TCOA provides the querying of 2877 immune-related genes about their expression, mutations and associations with survival prognosis in various cancer types. When a user selects a gene, TCOA will enter into the gene information interface that is the same as that by the “Gene” module querying of that gene. For example, many users could be interested in the gene PD-L1 whose product plays an important role in cancer immune evasion and is an important target for cancer immunotherapy [23]. TCOA shows that PD-L1 has significantly higher expression levels in esophageal carcinoma (ESCA) and kidney chromophobe (KICH), while has significantly lower expression levels in LIHC, LUAD, LUSC and prostate adenocarcinoma (PRAD) compared to their normal tissue (Fig. 5a). Interestingly, elevated expression of PD-L1 is associated with better OS and/or disease free survival (DFS) prognosis in adrenocortical carcinoma (ACC), colon adenocarinoma (COAD), kidney renal clear cell carcinoma (KIRC), and SKCM, while worse OS and/or DFS prognosis in brain lower grade glioma (LGG) and PAAD (Fig. 5b).

Functions of the “protein” module

In the “Protein” module, when a user submits the querying of a protein, TCOA will output expression data for the protein in 33 cancer types. These data include: protein expression levels in cancers; protein-gene expression correlation in cancers; differential protein expression comparisons between different cancer phenotypes such as stage and grade; associations of protein expression with survival prognosis in cancers. Figure 6 shows two DNA mismatch repair proteins MSH2 (MutS protein homolog 2) and MSH6 (MutS protein homolog 6) whose expression is significantly associated with survival prognosis in a wide type of cancers. Elevated expression of MSH2 and MSH6 is associated with worse OS and/or DFS prognosis in BRCA, sarcoma (SARC), uterine corpus endometrial carcinoma (UCEC), thyroid carcinoma (THCA), rectum adenocarcinoma (READ), KIRC, UCS and ACC, while is associated with better OS and/or DFS prognosis in LUSC and COAD.

Computational and statistical analyses

Class comparison to identify differentially-expressed genes, miRNAs or proteins

We normalized the TCGA gene and miRNA expression values by log₂(x + 1) transformation, and used the original downloaded protein expression data since they had been normalized. We compared expression levels of a single gene, miRNA or protein between two classes of samples using Student’s t test.

Comparison of gene mutation rates among different clinical phenotypes

We compared the gene somatic mutation rates among different clinical phenotypes of cancer patients using Fisher’s Exact Test. Each phenotype was divided into two classes: tumor stage (early stage (Stage I-II) vs. late stage (Stage III-IV)); tumor size (T) (small size (T1–2) vs. large size (T3–4)); lymph nodes (N) (without regional lymph nodes (N0) vs. with lymph nodes (N1–3)); metastasis (M) (no metastasis (M0) vs. metastasis (M1)); grade (low grade (Grade I-II) vs. high grade (Grade III-IV)).

Correlation analysis, pathway analysis and survival analysis

We calculated expression correlations of gene-gene, gene-miRNA, miRNA-miRNA and gene-protein by Pearson product-moment or Spearman correlation analysis. We performed pathway analysis of gene sets using the Gene Set Enrichment Analysis (GSEA) software [24]. The KEGG pathways significantly associated with gene sets were displayed (FDR q-value< 0.05). We performed survival analysis of TCGA patients based on gene somatic mutation data, and expression data for genes, miRNAs and proteins, respectively. Kaplan-Meier survival curves were used to show the survival (OS or DFS) differences between gene-mutated cancer patients and gene-wildtype cancer patients, and between gene, miRNA or protein higher-expression-level patients and lower-expression-level patients. Gene, miRNA or protein higher-expression-level and lower-expression-level patients were determined by the median values of expression. If the expression level in a patient was higher than the median value, the patient was classified into the higher-expression-level group; otherwise into the lower-expression-level group. We used the log-rank test to calculate the significance of survival-time differences between two classes of patients.

Utility and discussion

The TCGA data are providing an invaluable resource for cancer researchers and oncologists. However, a lack of bioinformatics expertise often hinders experimental cancer biologists and oncologists from exploring the TCGA resource. Although a number of tools have been developed for helping cancer biologists and oncologists utilize the TCGA data, these existing tools cannot fully satisfy the large community of experimental cancer biologists and oncologists without bioinformatics expertise. To this end, we developed TCOA with additional functions complementary to these existing tools. TCOA provides fast and straightforward querying of TCGA gene expression, somatic mutations, miRNA expression, protein expression based on a single molecule or cancer type. TCOA provides the querying of expression correlation not only between gene pairs, but also between miRNA pairs, gene and miRNA, and gene and protein. TCOA also provides the querying of the associations of gene, miRNA, or protein expression with survival prognosis in cancers. Moreover, TCOA presents transcriptional profiles across various human cancer types based on the pan-cancer analysis [22]. In addition, TCOA provides the querying of molecular profiles for 2877 immune-related genes in human cancers. These immune-related genes include those that are established or promising targets for cancer immunotherapy such as CTLA4, PD1, PD-L1, PD-L2, IDO1, LAG3, and TIGIT. It would be of great interest for cancer researchers and oncologists to query expression, mutations and correlations with cancer survival prognosis of these immune-related genes across various human cancer types.

TCOA will be continuously updated with more functions and modules such as DNA methylation and DNA copy number alteration modules. In addition, for a specific cancer type, one could be interested in molecular alterations across different subtypes. For the immune-related genes, one could be more interested in gene-sets that represent the activities of specific immune cells, functions or pathways [25]. TCOA is expected to provide such functions in future updates.

Conclusions

TCOA is a useful tool that supplies a number of unique and new functions complementary to the existing tools to facilitate exploration of the TCGA resource.

Change history

30 August 2018
Following publication of the original article [1], the authors reported an error in Fig. 1A.

Abbreviations

ACC:: Adrenocortical carcinoma
BLCA:: Bladder urothelial carcinoma
BRCA:: Breast invasive carcinoma
CESC:: Cervical squamous-cell carcinoma and endocervical adenocarcinoma
CHOL:: Cholangiocarcinoma
COAD:: Colon adenocarcinoma
DFS:: Disease free survival
DLBC:: Lymphoid neoplasm diffuse large B-cell lymphoma
ESCA:: Esophageal carcinoma
GBM:: Glioblastoma multiforme
GSEA:: Gene set enrichment analysis
HNSC:: Head and neck squamous cell carcinoma
KICH:: Kidney chromophobe
KIRC:: Kidney renal clear cell carcinoma
KIRP:: Kidney renal papillary cell carcinoma
LAML:: Acute myeloid leukemia
LGG:: Brain lower grade glioma
LIHC:: Liver hepatocellular carcinoma
LUAD:: Lung adenocarcinoma
LUSC:: Lung squamous cell carcinoma
miRNA:: MicroRNA
OS:: Overall survival
OV:: Ovarian serous cystadenocarcinoma
PAAD:: Pancreatic adenocarcinoma
PRAD:: Prostate adenocarcinoma
READ:: Rectum adenocarcinoma
SARC:: Sarcoma
SKCM:: Cutaneous melanoma
STAD:: Stomach adenocarcinoma
TCGA:: The Cancer Genome Atlas
TCOA:: The Cancer Omics Atlas
TGCT:: Testicular germ cell tumors
THCA:: Thyroid carcinoma
THYM:: Thymoma
UCEC:: Uterine corpus endometrial carcinoma
UCS:: Uterine carcinosarcoma
UVM:: Uveal melanoma

References

Cerami E, et al. The cBio cancer genomics portal: an open platform for exploring multidimensional cancer genomics data. Cancer Discov. 2012;2(5):401–4.
Article PubMed Google Scholar
Gao J, et al. Integrative analysis of complex cancer genomics and clinical profiles using the cBioPortal. Sci Signal. 2013;6(269):pl1.
Article PubMed PubMed Central CAS Google Scholar
Koch A, et al. MEXPRESS: visualizing expression, DNA methylation and clinical TCGA data. BMC Genomics. 2015;16:636.
Article PubMed PubMed Central CAS Google Scholar
Tang Z, et al. GEPIA: a web server for cancer and normal gene expression profiling and interactive analyses. Nucleic Acids Res. 2017;45(W1):W98–W102.
Article PubMed PubMed Central CAS Google Scholar
Li J, et al. TCPA: a resource for cancer functional proteomics data. Nat Methods. 2013;10(11):1046–7.
Article PubMed PubMed Central CAS Google Scholar
Wang X, Simon R. Identification of potential synthetic lethal genes to p53 using a computational biology approach. BMC Med Genet. 2013;6:30.
Google Scholar
McKenzie L, et al. p53-dependent repression of polo-like kinase-1 (PLK1). Cell Cycle. 2010;9(20):4200–12.
Article PubMed PubMed Central CAS Google Scholar
Liu Z, Sun Q, Wang X. PLK1, a potential target for Cancer therapy. Transl Oncol. 2017;10(1):22–32.
Article PubMed Google Scholar
Ando K, et al. Polo-like kinase 1 (PLK1) inhibits p53 function by physical interaction and phosphorylation. J Biol Chem. 2004;279(24):25549–61.
Article PubMed CAS Google Scholar
Li C, et al. Multiple roles of MicroRNA-100 in human Cancer and its therapeutic potential. Cell Physiol Biochem. 2015;37(6):2143–59.
Article PubMed CAS Google Scholar
Petrelli A, et al. By promoting cell differentiation, miR-100 sensitizes basal-like breast cancer stem cells to hormonal therapy. Oncotarget. 2015;6(4):2315–30.
Article PubMed Google Scholar
Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc B. 1995;57:289–300.
McCart Reed AE, et al. Thrombospondin-4 expression is activated during the stromal response to invasive breast cancer. Virchows Arch. 2013;463(4):535–45.
Article PubMed CAS Google Scholar
Lin X, et al. Associations of THBS2 and THBS4 polymorphisms to gastric cancer in a southeast Chinese population. Cancer Genet. 2016;209(5):215–22.
Article PubMed CAS Google Scholar
Yin F, et al. Microarray-based identification of genes associated with cancer progression and prognosis in hepatocellular carcinoma. J Exp Clin Cancer Res. 2016;35(1):127.
Article PubMed PubMed Central CAS Google Scholar
Liu X, et al. Low expression of dendritic cell-specific intercellular adhesion molecule-grabbing nonintegrin-related protein in lung cancer and significant correlations with brain metastasis and natural killer cells. Mol Cell Biochem. 2015;407(1–2):151–60.
Article PubMed CAS Google Scholar
Snyder A, et al. Genetic basis for clinical response to CTLA-4 blockade in melanoma. N Engl J Med. 2014;371(23):2189–99.
Article PubMed PubMed Central CAS Google Scholar
Rizvi NA, et al. Cancer immunology. Mutational landscape determines sensitivity to PD-1 blockade in non-small cell lung cancer. Science. 2015;348(6230):124–8.
Article PubMed PubMed Central CAS Google Scholar
Hugo W, et al. Genomic and transcriptomic features of response to anti-PD-1 therapy in metastatic melanoma. Cell. 2016;165(1):35–44.
Article PubMed PubMed Central CAS Google Scholar
Larkin J, et al. Combined Nivolumab and Ipilimumab or monotherapy in untreated melanoma. N Engl J Med. 2015;373(1):23–34.
Article PubMed PubMed Central CAS Google Scholar
Herbst RS, et al. Pembrolizumab versus docetaxel for previously treated, PD-L1-positive, advanced non-small-cell lung cancer (KEYNOTE-010): a randomised controlled trial. Lancet (London, England). 2016;387(10027):1540–50.
Article CAS Google Scholar
Li M, Sun Q, Wang X. Transcriptional landscape of human cancers. Oncotarget. 2017;8(21):34534–51.
PubMed PubMed Central Google Scholar
Chen L, Han X. Anti-PD-1/PD-L1 therapy of human cancer: past, present, and future. J Clin Invest. 2015;125(9):3384–91.
Article PubMed PubMed Central Google Scholar
Subramanian A, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A. 2005;102(43):15545–50.
Article PubMed PubMed Central CAS Google Scholar
Charoentong P, et al. Pan-cancer Immunogenomic analyses reveal genotype-Immunophenotype relationships and predictors of response to checkpoint blockade. Cell Rep. 2017;18(1):248–62.
Article PubMed CAS Google Scholar
Breuer K, et al. InnateDB: systems biology of innate immunity and beyond--recent updates and continuing curation. Nucleic Acids Res. 2013;41(Database issue):D1228–33.
Article PubMed CAS Google Scholar

Download references

Acknowledgements

We thank professor Maode Lai from China Pharmaceutical University for providing many supports for building the database, and thank the Information Technology Center of China Pharmaceutical University for supplying hardware and software resources in building the database.

Funding

This work was supported by the startup fund (no. 3150120001) awarded to XW by the China Pharmaceutical University. This fund was involved in the design of the study and collection, analysis, and interpretation of data and in writing the manuscript.

Availability of data and materials

We downloaded the TCGA gene expression profiles (rnaseqv2 level 3 RSEM), gene somatic mutations (level 3), miRNA gene expression profiles (level 3), protein expression profiles (level 3), and clinical data for 33 cancer types from the genomic data commons data portal (https://portal.gdc.cancer.gov/, Data Release 10.1 - February 15, 2018). We obtained a set of immune genes from the nCounter PanCancer Immune Profiling Panel (nCPCIPP) (http://www.nanostring.com/products/pancancer_immune/). We also downloaded six immune-related gene sets from the InnateDB database [26], and selected the genes that were included in at least two of the six gene sets. We then combined them with the nCPCIPP genes to obtain an immune-related gene set composed of 2877 genes. Table 2 shows the data types and their size in TCOA. The database TCOA can be accessed with the website http://tcoa.cpu.edu.cn.

Author information

Authors and Affiliations

Department of Basic Medicine, School of Basic Medicine and Clinical Pharmacy, China Pharmaceutical University, Nanjing, 211198, China
Qingrong Sun
Biomedical Informatics Research Lab, School of Basic Medicine and Clinical Pharmacy, China Pharmaceutical University, Nanjing, 211198, China
Mengyuan Li & Xiaosheng Wang
Cancer Genomics Research Center, School of Basic Medicine and Clinical Pharmacy, China Pharmaceutical University, Nanjing, 211198, China
Mengyuan Li & Xiaosheng Wang
Big Data Research Institute, China Pharmaceutical University, Nanjing, 211198, China
Mengyuan Li & Xiaosheng Wang

Authors

Qingrong Sun
View author publications
You can also search for this author in PubMed Google Scholar
Mengyuan Li
View author publications
You can also search for this author in PubMed Google Scholar
Xiaosheng Wang
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

QS wrote software codes and built the database. ML helped prepare for the back-end data, tested the software, and helped prepare for the manuscript. XW conceived and designed the database, helped prepare for the back-end data, provided partial software codes, and wrote the manuscript. All the authors read and approved the final manuscript.

Corresponding author

Correspondence to Xiaosheng Wang.

Ethics declarations

Ethics approval and consent to participate

Ethical approval was waived since we used only publicly available data and materials in this study.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Additional information

The original version of this article was revised: figure 1A was replaced and reference 12 was updated.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Reprints and permissions

About this article

Cite this article

Sun, Q., Li, M. & Wang, X. The Cancer Omics Atlas: an integrative resource for cancer omics annotations. BMC Med Genomics 11, 63 (2018). https://doi.org/10.1186/s12920-018-0381-7

Download citation

Received: 18 October 2017
Accepted: 30 July 2018
Published: 08 August 2018
DOI: https://doi.org/10.1186/s12920-018-0381-7

The Cancer Omics Atlas: an integrative resource for cancer omics annotations

Abstract

Background

Methods

Results

Conclusions

Background

Construction and content

Database architecture and web interface

Functions of six modules in the database

Functions of the “gene” module

Functions of the “MicroRNA” module

Functions of the “Cancer” module

Functions of the “pan-cancer” module

Functions of the “Immuno-oncology” module

Functions of the “protein” module

Computational and statistical analyses

Class comparison to identify differentially-expressed genes, miRNAs or proteins

Comparison of gene mutation rates among different clinical phenotypes

Correlation analysis, pathway analysis and survival analysis

Utility and discussion

Conclusions

Change history

30 August 2018

Abbreviations

References

Acknowledgements

Funding

Availability of data and materials

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Ethics approval and consent to participate

Consent for publication

Competing interests

Publisher’s Note

Additional information

Rights and permissions

About this article

Cite this article

Share this article

Keywords

BMC Medical Genomics

Contact us