Comparing variants related to chronic diseases from genome-wide association study (GWAS) and the cancer genome atlas (TCGA)

Jeon, Soohyun; Park, Chaewon; Kim, Jineui; Lee, Jung Hoon; Joe, Sung-yune; Ko, Young Kyung; Gim, Jeong-An

doi:10.1186/s12920-023-01758-7

Research
Open access
Published: 19 December 2023

BMC Medical Genomics volume 16, Article number: 332 (2023)

Abstract

Background

Several genome-wide association studies (GWAS) have been performed to identify variants related to chronic diseases. Somatic variants in cancer tissues are associated with cancer development and prognosis. In this study, expression quantitative trait loci (eQTL) and methylation QTL (mQTL) analyses were performed on chronic disease-related variants in the TCGA dataset.

Methods

MuTect2 variant calls for 33 cancers from TCGA and 296 GWAS variants provided by LocusZoom were used. At least one shared variant was found in 22 TCGA cancers and 23 LocusZoom studies. Differentially expressed genes (DEGs) and differentially methylated regions (DMRs) were selected from three cancers (TCGA-COAD, TCGA-STAD, and TCGA-UCEC). Variants were placed on a world map using the population locations of the 1000 Genomes Project (1GP). Decision tree analysis was performed on the discovered features, and survival analysis was performed according to the resulting clusters.

Results

Based on the DEGs and DMRs with clinical data, the decision tree model classified seven and three nodes in TCGA-COAD and TCGA-STAD, respectively. A total of 11 variants were commonly detected from TCGA and LocusZoom, and eight variants were selected from the 1GP variants, and the distribution patterns were visualized on the world map.

Conclusions

Variants related to tumors and chronic diseases were selected, and their 1GP-based proportions by geographical region are presented. The variant distribution patterns could provide clues for regional clinical trial designs and personalized medicine.

Introduction

Chronic diseases are defined as conditions that last 1 year or more and require medical intervention, restrict activities of daily living, or both. Chronic diseases include hypertension, diabetes, and hyperlipidemia, and many associations between these diseases and cancer are known [1,2,3]. Genome-wide association studies (GWAS) have been used as a research approach to understand chronic diseases. GWAS can help to understand an individual's risk of chronic diseases and specific characteristics, such as cancer morbidity [4, 5]. GWAS results have been made publicly available, and secondary applications of these results have been presented.

Variants are alterations in DNA nucleotide sequences. They include single-base pair substitutions, insertions or deletions (INDELs), and structural variations. A somatic variant is any variant that arises in cells other than germ cells. Unlike germline variants, somatic variants are not inherited, and they reflect genomic instability [6, 7]. Next-generation sequencing (NGS) is widely used to obtain nucleotide sequence data from cancer cells. Variants in cancer cells enable targeted therapy according to genotype. An expression quantitative trait locus (eQTL) is a variant that explains differences in gene expression patterns. Similarly, a methylation QTL (mQTL) is a variant associated with differences in the beta values of CpG sites in the genome. In eQTL and mQTL analyses, variants from GWAS results serve as the independent variables, and gene expression and DNA methylation levels serve as the dependent variables [8, 9]. Many eQTL and mQTL signals have been found in chronic disease samples, and variants related to chronic diseases need to be evaluated as prognostic biomarkers in cancer patients.

The Cancer Genome Atlas (TCGA) is a project that started in 2005 to integrate and accumulate cancer genetic variants, gene expression, and DNA methylation data using bioinformatics technologies [10]. The TCGA database is provided by the National Cancer Institute of the United States. The TCGA Data Portal provides researchers with a platform to search, download, and analyze cancer genomic data. TCGA provides clinical data (subtype, survival, and recurrence) and three types of omics data (variant, expression, and methylation) for 7648 patients and 33 types of cancers. Therefore, by properly processing the clinical and omics datasets for the purpose of the analysis, it is possible to identify the factors that explain the traits of cancer [11,12,13,14].

The 1000 Genomes Project (1GP) was launched to assess human genetic variation by ethnic group. The pilot phase and “phase 3” were completed with 1092 and 2504 genomes, respectively. In 1GP Phase 3, 26 populations were collected [15]. The 1GP helps explain the genetic variants that occur at a population frequency of 1% or more. It also contributes to the development of preventive medicine using genetic variants found in a specific ethnic group [16, 17]. The genomic composition of regional populations has been shaped by evolutionary processes, because selective pressure and SNP density differ by ethnic group. Clinical approaches, such as disease susceptibility and drug response prediction, can therefore also be considered by region [18].

In this study, eQTL and mQTL studies were combined with GWAS to identify genes associated with cancer prognosis, and variants related to cancer were found in TCGA. Relevance to the 1GP for merged variants was confirmed. The relationship between cancer and chronic diseases was confirmed, and regional differences were visualized using 1GP data.

Methods

Data acquisition from TCGA and LocusZoom

Omics and clinical data for 33 cancers were downloaded from the TCGA dataset. Downloads and data processing were performed using the “GDCquery” function of the R package “TCGAbiolinks” [19]. All analyses were performed using R version 4.1.1. GWAS datasets were downloaded from LocusZoom (https://my.locuszoom.org/) [20], and each study was identified by the number in its URL. This study was approved by the Institutional Review Board (IRB) of Korea University (approval number: KUIRB-2020-0191-01) and was performed in accordance with the Declaration of Helsinki. All processes of this study are presented as a flowchart (Fig. 1).

DEG and DMR selection

TCGA RNA-seq data provided the expression levels of 56,457 genes. Analysis with the Illumina 450k chip in TCGA covered approximately 450,000 CpG sites. Differentially expressed genes (DEGs) and differentially methylated regions (DMRs) were selected by comparing patients with and without variants. The fold changes and p-values of the selected DEGs and DMRs are presented as volcano plots, and the expression and DNA methylation levels of genes above a certain threshold are presented as heatmaps. Expression and DNA methylation levels are presented as boxplots for each genotype.

Visualization of variant data

A variant heatmap was drawn using the “Heatmap” function of the “ComplexHeatmap” package, and a waterfall plot for variants was drawn using the “oncoPrint” function [21]. The heatmaps for DEGs and DMRs were drawn with the “pheatmap” package. For the volcano plot, in-house code was written using “plot,” a default function of R.

Validation of variants in the 1000 Genomes Project

The 1000 Genomes Project (Phase 3) data were downloaded from Google Cloud Life Sciences (https://cloud.google.com/life-sciences/docs/resources/public-datasets/1000-genomes). The total data consisted of 84,801,856 variants of 69,006 dbSNP rs numbers for 2504 individuals [15]. The 1000 Genomes Project variants matching the dbSNP rs number of TCGA variants were selected using the “filter” function of the “dplyr” R package.

A world map was presented using the “map” function of the “maps” R library. The “floating.pie” function of the “plotrix” R library was used to present the location and variant proportion of each population. The global positioning system (GPS) information for each population was obtained from GitHub (https://github.com/sinarueeger/map-1000genomes).
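The per-population proportion that each pie on the map displays can be sketched as follows. This is a minimal illustration in Python rather than the R workflow described above, and it assumes that "variant proportion" means the alternate-allele frequency within a population; the function name `variant_proportions` and the toy genotypes are hypothetical and are not taken from the study data.

```python
from collections import defaultdict


def variant_proportions(samples):
    """Return the alternate-allele proportion for each population.

    `samples` is an iterable of (population_code, alt_allele_count) pairs,
    where alt_allele_count is 0, 1, or 2 for a diploid individual.
    """
    alt = defaultdict(int)
    total = defaultdict(int)
    for population, alt_count in samples:
        alt[population] += alt_count
        total[population] += 2  # two alleles per diploid individual
    return {population: alt[population] / total[population] for population in total}


# Toy genotypes (hypothetical values; YRI, CEU, and JPT are 1GP population codes).
toy_samples = [
    ("YRI", 1), ("YRI", 2), ("YRI", 0),
    ("CEU", 0), ("CEU", 0),
    ("JPT", 1), ("JPT", 0),
]
proportions = variant_proportions(toy_samples)
for population, proportion in sorted(proportions.items()):
    print(population, proportion)
```

Each resulting value would then be paired with the coordinates of its population to draw one pie per location.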

Machine learning approaches of clinical data, DEG, and DMR results

Integrative analysis was performed for the selected DEGs and DMRs using the clinical data. A decision tree is a machine learning approach that is used for both classification and regression tasks. The decision tree algorithm recursively divides the dataset into subsets based on the values of different attributes. The aim is to create subsets that are as pure as possible with respect to the target variable. Model design and visualization for decision trees were performed using the “rpart” and “rpart.plot” libraries. The models were fitted and tuned for each cancer. The final decision tree model was obtained by selecting the cost-complexity pruning (cp) value with the minimum error.
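The notion of purity used by a decision tree can be illustrated with a single split. The sketch below is written in Python, not with the “rpart” library used in this study, and it assumes Gini impurity as the purity measure; the function names and the toy expression values are hypothetical. It evaluates every candidate threshold on one numeric feature and keeps the one whose two subsets are purest.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means perfectly pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((count / n) ** 2 for count in counts.values())


def best_split(values, labels):
    """Return (threshold, weighted_impurity) of the best binary split.

    Candidate thresholds are midpoints between consecutive distinct values.
    If no split improves on the unsplit data, the threshold is None.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate identical values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if impurity < best[1]:
            best = (threshold, impurity)
    return best


# Toy data (hypothetical): expression level of one gene and survival status.
expression = [1.0, 1.5, 2.0, 6.0, 7.0, 8.0]
status = ["alive", "alive", "alive", "dead", "dead", "dead"]
threshold, impurity = best_split(expression, status)
print(threshold, impurity)
```

A full tree repeats this search over all features within each resulting subset, and pruning then removes splits that do not reduce the error enough to justify the added complexity.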

Results

TCGA variants processing

In 22 of the 33 cancers, at least one variant overlapped with the variants found in the 23 datasets obtained from LocusZoom. Over 20 variants overlapped in seven cancers (Fig. 2), and at least one variant was observed in 10 or more patients in four cancer types (TCGA-COAD, TCGA-UCEC, TCGA-SKCM, and TCGA-STAD). In the TCGA-SKCM samples, only two of the 103 patients had variants. We excluded TCGA-SKCM from the DEG and DMR analyses because the t-test used in these analyses requires at least three samples per group (Table 1).

Table 1 Descriptions of TCGA dataset

Common variant selection of TCGA and LocusZoom

The TCGA single nucleotide variation (SNV) dataset from 33 cancers was used, and variants satisfying -log10(p-value) > 5 were selected from 230 GWAS datasets. The two datasets were merged by “rs number” using the R default “merge” function. For each commonly detected “rs number,” the number of patients with the variant was counted for each of the 33 cancers. The 22 TCGA cancers that shared at least one variant with LocusZoom are presented as a heatmap (Fig. 2).
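The filtering and merging step described above can be sketched as follows. This is an illustrative Python version, not the R “merge” call used in the study; the row layouts, function names, and toy identifiers (`rsA`, `study1`, `p1`, and so on) are hypothetical.

```python
import math
from collections import defaultdict


def significant_rsids(gwas_rows, min_neg_log10_p=5.0):
    """Map each rs number passing -log10(p) > threshold to its GWAS studies.

    `gwas_rows` is an iterable of (rsid, study, p_value) tuples.
    """
    hits = defaultdict(set)
    for rsid, study, p_value in gwas_rows:
        if p_value > 0 and -math.log10(p_value) > min_neg_log10_p:
            hits[rsid].add(study)
    return dict(hits)


def count_patients(tcga_rows, gwas_hits):
    """Count distinct patients per (rsid, cancer) for rs numbers found in both sources.

    `tcga_rows` is an iterable of (rsid, cancer, patient_id) tuples.
    """
    patients = defaultdict(set)
    for rsid, cancer, patient_id in tcga_rows:
        if rsid in gwas_hits:  # inner join on rs number
            patients[(rsid, cancer)].add(patient_id)
    return {key: len(ids) for key, ids in patients.items()}


# Toy rows (hypothetical identifiers and p-values).
gwas = [("rsA", "study1", 1e-8), ("rsA", "study2", 1e-6), ("rsB", "study1", 1e-3)]
tcga = [
    ("rsA", "COAD", "p1"), ("rsA", "COAD", "p2"), ("rsA", "COAD", "p1"),
    ("rsA", "STAD", "p3"), ("rsB", "COAD", "p4"), ("rsC", "UCEC", "p5"),
]
hits = significant_rsids(gwas)
counts = count_patients(tcga, hits)
print(hits)
print(counts)
```

The resulting counts form the rs number by cancer matrix that a heatmap of this kind would display.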

Sixty variants found in three cancers of TCGA and 13 studies of LocusZoom were selected. A waterfall plot was presented for 21 mutations, with at least 4 mutations found in 62 patients from TCGA (Fig. 3). Eleven variants were commonly found in at least six mutations in TCGA and LocusZoom (Table 2).

Table 2 Most of the eleven variants presented were found six or more times in TCGA or LocusZoom. In the presented LocusZoom studies, at least one variant satisfying -log10(p-value) > 5 was discovered

The chromosomal locations of the variants common to the TCGA and LocusZoom data are presented in a Circos plot (Fig. 4). Among them, we linked the variants of the three cancers of interest (UCEC, COAD, and STAD). UCEC variants were most common on chromosome 6, COAD variants on chromosome 11, and STAD variants on chromosome 2. The links show the relationship between the most abundant variant of each cancer type and the other variants.

Variant distributions from 1000 genomes project data

Eight of the 11 variants were identified in the 1000 Genomes Project (Phase 3) data. Variant proportions were determined for all 26 populations and are displayed on a global map (Fig. 5). The rs141502002 variant, located in the PCSK9 gene, was found in patients with STAD and UCEC and in eight LocusZoom studies. Nevertheless, low variant proportions were observed overall (Fig. 5a). The rs41288783 variant, located in the APOB gene, was found in patients with STAD and in two LocusZoom studies, but also showed a low variant proportion overall (Fig. 5b). The rs113337987 variant, located in the MTTP gene, was found in COAD patients and seven LocusZoom studies and showed slightly higher variant proportions in the Caribbean, South America, and Southern Europe (Fig. 5c). The rs1060901 variant, located in the MYLIP gene, was found in COAD and six LocusZoom studies and was observed in Europe (Fig. 5d). The rs2075799 variant, located in the HSPA1L gene, was found in COAD and seven LocusZoom studies and was observed in Africa and Southeast Asia (Fig. 5e). The rs41269255 variant, located in the POM121L2 gene, was found in COAD and six LocusZoom studies and was observed in Europe (Fig. 5f). The rs3135506 variant of the APOA5 gene was found in COAD and 16 LocusZoom studies but showed low proportions despite being found in several studies; the proportions were particularly low in East Asia (Fig. 5g). The rs12438025 variant, located in the STRC gene, was found in COAD and seven LocusZoom studies and showed the highest variant proportions, which were especially high in Africa (Fig. 5h).

Selection of DEGs and DMRs in three cancers

From three cancers (TCGA-COAD, TCGA-UCEC, and TCGA-STAD), DEGs and DMRs were selected based on whether the patients had variants. The DEGs and DMRs of TCGA-SKCM were not calculated because of the insufficient number of samples in the variant group (n < 3; Table 1). DEGs and DMRs in the three cancers were selected based on fold changes and p-values. The fold change threshold was |FC| > 0.2 for TCGA-COAD and TCGA-UCEC, and |FC| > 0.3 for TCGA-STAD. The p-value thresholds for DEGs were PV < 0.01 in TCGA-COAD and TCGA-UCEC, and PV < 10⁻¹⁰ in TCGA-STAD. The p-value threshold for DMRs was PV < 10⁻¹² in all cases (Table 1).
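Applying a pair of thresholds of this kind amounts to a simple filter over per-feature statistics. The Python sketch below assumes that the fold change and p-value of each gene or CpG site have already been computed from the comparison of patients with and without variants; the function name `select_features`, the feature names, and the numbers are hypothetical.

```python
def select_features(stats, min_abs_fc, max_p):
    """Return the names of features passing both thresholds.

    `stats` maps a feature name to a (fold_change, p_value) tuple.
    A feature is kept when |fold_change| > min_abs_fc and p_value < max_p.
    """
    return sorted(
        name
        for name, (fold_change, p_value) in stats.items()
        if abs(fold_change) > min_abs_fc and p_value < max_p
    )


# Toy statistics (hypothetical features and values).
toy_stats = {
    "geneA": (0.35, 0.001),   # passes both thresholds
    "geneB": (-0.50, 0.005),  # passes both thresholds (negative direction)
    "geneC": (0.10, 0.0001),  # fold change too small
    "geneD": (0.80, 0.20),    # p-value too large
}
selected = select_features(toy_stats, min_abs_fc=0.2, max_p=0.01)
print(selected)
```

The same two quantities are the axes of a volcano plot, so the selected features correspond to the points in its upper left and upper right regions.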

For TCGA-COAD, 10 DEGs (SELENBP1, XKR9, PCP4, TUSC8, PRAC1, RBP4, PGGHG, RUBCNL, TLE2, and ACVRL1) and eight DMRs (cg01785505, cg00014484, cg01440570, PRKCZ, SEMA3D, ELF5, cg06506363, and MUC6) were selected. In the DEG analysis, only one gene was overexpressed in the variant group, and in the DMR analysis, no CpG site was hypomethylated in the variant group. The most overexpressed gene in the variant group was XKR9, and the most hypermethylated CpG site was cg01440570 (Fig. 6).

For TCGA-STAD, five DEGs (PRSS1, CYP2B6, BMP7, BEX2, and SERPINA5) and five DMRs (WHAMM, cg13686615, cg23045594, FOXK1, and PPT2) were selected. The most underexpressed gene in the variant group was CYP2B6, and the most hypermethylated CpG site was located in PPT2 (Fig. 7).

For TCGA-UCEC, four DEGs (ENSG0000213058, PHYHD1, TWIST1, and MUC16) and three DMRs (TP73, cg02621287, and PHACTR1) were selected. In the RNA-seq analysis, the gene with the most statistically significant difference between the two groups was TWIST1, and in the methylation analysis, the most significant CpG site was located in the TP73 gene (Fig. 8).

eQTL and mQTL analysis

The eQTL and mQTL analyses were conducted on the genes identified in the DEG and DMR analyses. For the three cancers, boxplots by variant status are presented for the genes shown in the heatmaps. For TCGA-COAD, 10 DEGs (Fig. 9a; 10 genes) and eight DMRs (Fig. 9b; eight CpG sites) were analyzed. For TCGA-STAD, five DEGs (Fig. 9c; five genes) and five DMRs (Fig. 9d; five CpG sites) were analyzed. Finally, for TCGA-UCEC, four DEGs (Fig. 9e; four genes) and three DMRs (Fig. 9f; three CpG sites) were analyzed. All DEGs were identified from RNA-seq data, and DMRs were obtained from the Illumina 450k chip. Patients were separated into two groups by the presence or absence of variants (Fig. 9).
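The grouping behind such boxplots can be sketched as follows. This Python example only summarizes each group by its size and median, which is the central line a boxplot would show; the function name `summarize_by_variant`, the patient identifiers, and the levels are hypothetical and do not come from the study.

```python
from statistics import median


def summarize_by_variant(levels, has_variant):
    """Summarize expression or methylation levels by variant status.

    `levels` maps patient_id to a numeric level for one gene or CpG site.
    `has_variant` maps patient_id to True when the patient carries a variant;
    patients missing from it are treated as non-carriers.
    """
    groups = {"variant": [], "no_variant": []}
    for patient_id, level in levels.items():
        key = "variant" if has_variant.get(patient_id, False) else "no_variant"
        groups[key].append(level)
    return {
        key: {"n": len(values), "median": median(values) if values else None}
        for key, values in groups.items()
    }


# Toy levels for one gene (hypothetical patients and values).
toy_levels = {"p1": 2.0, "p2": 4.0, "p3": 6.0, "p4": 1.0, "p5": 3.0}
toy_carriers = {"p1": True, "p2": True, "p3": True}
summary = summarize_by_variant(toy_levels, toy_carriers)
print(summary)
```

A difference between the two medians is what the boxplot makes visible; whether it is statistically meaningful still depends on the test and thresholds applied in the DEG and DMR selection.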

Decision tree for survival prediction

A decision tree was designed to predict survival for the three cancer types. The expression or methylation levels of each gene presented in the heatmap and QTL analyses were used as input features, together with two clinical features, sex and age. Therefore, two clinical features, along with 18 genomic features in TCGA-COAD, 10 in TCGA-STAD, and seven in TCGA-UCEC, were used to distinguish survival. No decision tree was constructed for TCGA-UCEC. In TCGA-COAD and TCGA-STAD, which are digestive cancers, survival was classified with seven and three nodes, respectively (Fig. 10).

Discussion

A high proportion of variants in the cancer genome are somatic variants, whereas most variants associated with chronic diseases are germline variants. Therefore, variants related to chronic diseases and variants related to cancers are often considered to have little relevance to each other. Nevertheless, the relationship between these variants could be an important factor in the treatment of cancer and chronic diseases.

Cancer and germline variants are related [22], and a variety of evidence has emerged. For example, genes such as BRCA are affected by germline variants. In particular, germline variants in eQTL and mQTL affect cancer progression and patient survival [23]. In addition, GWAS has shown that variants are related to chronic diseases and cancer prognosis [1, 2, 4]. Therefore, in this study, we aimed to identify cancer-related factors from a chronic disease-related variant database (LocusZoom) and TCGA.

Using statistical analysis, this study revealed germline variants in three cancers that are related to somatic variants, based on the clinical data of patients with chronic disease. There were statistically significant variants in the three cancer types. In COAD, SELENBP1, XKR9, PCP4, TUSC8, PRAC1, RBP4, PGGHG, RUBCNL, TLE2, and ACVRL1 were identified as DEGs, and the CpG sites or genes cg01785505, cg00014484, cg01440570, PRKCZ, SEMA3D, ELF5, cg06506363, and MUC6 were identified as DMRs. In STAD, PRSS1, CYP2B6, BMP7, BEX2, and SERPINA5 were identified as DEGs, and the CpG sites or genes WHAMM, cg13686615, cg23045594, FOXK1, and PPT2 were identified as DMRs. In UCEC, ENSG0000213058, PHYHD1, TWIST1, and MUC16 were identified as DEGs, and the CpG sites or genes TP73, cg02621287, and PHACTR1 were identified as DMRs. In the QTL analysis, the expression or methylation levels of each gene are presented as boxplots by variant.

COAD can be classified into four subtypes (CIN, EBV, MSI, and GS), and subtype proportions and variant patterns have been shown to differ by region [24]. Therefore, a world map was used to show the location and proportion of the selected variants in each population. As shown in the results, the variants occurred at different rates in each population. Therefore, we can expect ancestral differences to appear in the chronic diseases and cancer characteristics associated with the selected variants. This hypothesis should be further tested with a larger dataset and validated using experimental methods on COAD tissues from different regions. Eight variants were found in the 1000 Genomes Project, of which only two were found in STAD and UCEC. These two variants were rare across all 26 populations of the 1000 Genomes Project. This means that, compared with those related to STAD and UCEC, mutations related to COAD show relatively greater differences depending on the population.

Decision trees were used to classify the survival status of the patients with cancer. The decision tree results showed that the selected DEGs and DMRs contributed to survival prediction. We concluded that chronic disease-related variants were associated with at least two cancers. Therefore, the analysis results and methods of this study can be used for cancer progression research, patient prognosis prediction, and diagnosis [25]. In addition, from the perspective of preventive medicine, this study could support regional prevention of cancer and chronic diseases and the development of diagnostic strategies.

References

Bullard T, Ji M, An R, Trinh L, Mackenzie M, Mullen SP. A systematic review and meta-analysis of adherence to physical activity interventions among three chronic conditions: cancer, cardiovascular disease, and diabetes. BMC Public Health. 2019;19(1):636.
Li Y, Schoufour J, Wang DD, Dhana K, Pan A, Liu X, Song M, Liu G, Shin HJ, Sun Q, et al. Healthy lifestyle and life expectancy free of cancer, cardiovascular disease, and type 2 diabetes: prospective cohort study. BMJ. 2020;368:l6669.
Renzi C, Kaushal A, Emery J, Hamilton W, Neal RD, Rachet B, Rubin G, Singh H, Walter FM, de Wit NJ, et al. Comorbid chronic diseases and cancer diagnosis: disease-specific effects and underlying mechanisms. Nat Rev Clin Oncol. 2019;16(12):746–61.
Hartman M, Loy EY, Ku CS, Chia KS. Molecular epidemiology and its current clinical use in cancer management. The Lancet Oncology. 2010;11(4):383–90.
Xing J, Myers RE, He X, Qu F, Zhou F, Ma X, Hyslop T, Bao G, Wan S, Yang H, et al. GWAS-identified colorectal cancer susceptibility locus associates with disease prognosis. Eur J Cancer. 2011;47(11):1699–707.
Huang S. Genetic and non-genetic instability in tumor progression: link between the fitness landscape and the epigenetic landscape of cancer cells. Cancer Metastasis Rev. 2013;32(3):423–48.
Brock A, Chang H, Huang S. Non-genetic heterogeneity — a mutation-independent driving force for the somatic evolution of tumours. Nat Rev Genet. 2009;10(5):336–42.
Gibson G, Powell JE, Marigorta UM. Expression quantitative trait locus analysis for translational medicine. Genome Medicine. 2015;7(1):60.
Nica AC, Dermitzakis ET. Expression quantitative trait loci: present and future. Philosophical Transactions of the Royal Society B: Biological Sciences. 2013;368(1620):20120362.
Tomczak K, Czerwińska P, Wiznerowicz M. The Cancer Genome Atlas (TCGA): an immeasurable source of knowledge. Contemporary Oncology/Współczesna Onkologia. 2015:68–77.
Malta TM, Sokolov A, Gentles AJ, Burzykowski T, Poisson L, Weinstein JN, Kamińska B, Huelsken J, Omberg L, Gevaert O, et al. Machine Learning Identifies Stemness Features Associated with Oncogenic Dedifferentiation. Cell. 2018;173(2):338–354.e315.
Chen H, Li C, Peng X, Zhou Z, Weinstein JN, Caesar-Johnson SJ, Demchok JA, Felau I, Kasapi M, Ferguson ML, et al. A Pan-Cancer Analysis of Enhancer Expression in Nearly 9000 Patient Samples. Cell. 2018;173(2):386–399.e312.
Liu J, Lichtenberg T, Hoadley KA, Poisson LM, Lazar AJ, Cherniack AD, Kovatich AJ, Benz CC, Levine DA, Lee AV, et al. An Integrated TCGA Pan-Cancer Clinical Data Resource to Drive High-Quality Survival Outcome Analytics. Cell. 2018;173(2):400–416.e411.
Sanchez-Vega F, Mina M, Armenia J, Chatila WK, Luna A, La KC, Dimitriadoy S, Liu DL, Kantheti HS, Saghafinia S, et al. Oncogenic Signaling Pathways in The Cancer Genome Atlas. Cell. 2018;173(2):321–337.e310.
Clarke L, Fairley S, Zheng-Bradley X, Streeter I, Perry E, Lowy E, Tassé A-M, Flicek P. The international genome sample resource (IGSR): a worldwide collection of genome variation incorporating the 1000 genomes project data. Nucleic Acids Res. 2017;45(D1):D854–9.
Bonham VL, Green ED, Pérez-Stable EJ. Examining how race, ethnicity, and ancestry data are used in biomedical research. JAMA. 2018;320(15):1533–4.
Duzkale H, Shen J, McLaughlin H, Alfares A, Kelly MA, Pugh TJ, Funke BH, Rehm HL, Lebo MS. A systematic approach to assessing the clinical significance of genetic variants. Clin Genet. 2013;84(5):453–63.
Bachtiar M, Lee CGL. Genetics of population differences in drug response. Current Genetic Medicine Reports. 2013;1(3):162–70.
Colaprico A, Silva TC, Olsen C, Garofano L, Cava C, Garolini D, Sabedot TS, Malta TM, Pagnotta SM, Castiglioni I. TCGAbiolinks: an R/Bioconductor package for integrative analysis of TCGA data. Nucleic Acids Res. 2016;44(8):e71–1.
Pruim RJ, Welch RP, Sanna S, Teslovich TM, Chines PS, Gliedt TP, Boehnke M, Abecasis GR, Willer CJ. LocusZoom: regional visualization of genome-wide association scan results. Bioinformatics. 2010;26(18):2336–7.
Gu Z, Eils R, Schlesner M. Complex heatmaps reveal patterns and correlations in multidimensional genomic data. Bioinformatics. 2016;32(18):2847–9.
Lichtenstein P, Holm NV, Verkasalo PK, Iliadou A, Kaprio J, Koskenvuo M, Pukkala E, Skytthe A, Hemminki K. Environmental and heritable factors in the causation of Cancer — analyses of cohorts of twins from Sweden, Denmark, and Finland. N Engl J Med. 2000;343(2):78–85.
Chatrath A, Ratan A, Dutta A. Germline variants that affect tumor progression. Trends Genet. 2021;37(5):433–43.
Thrumurthy SG, Thrumurthy SSD, Gilbert CE, Ross P, Haji A. Colorectal adenocarcinoma: risks, prevention and diagnosis. BMJ. 2016;354:i3590.
Parsons DW, Roy A, Yang Y, Wang T, Scollon S, Bergstrom K, Kerstein RA, Gutierrez S, Petersen AK, Bavle A, et al. Diagnostic yield of clinical tumor and germline whole-exome sequencing for children with solid tumors. JAMA Oncology. 2016;2(5):616–24.

Acknowledgements

Not applicable.

Funding

This research was supported by a grant from the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI) funded by the Ministry of Health & Welfare, Republic of Korea (grant number: HI21C0012), and the National Research Foundation (NRF) funded by the Ministry of Education (grant number: NRF-2020R1I1A1A01052701).

Author information

Soohyun Jeon and Chaewon Park contributed equally to this work.

Authors and Affiliations

Department of Brain and Cognitive Engineering, Korea University, Seoul, 02841, South Korea
Soohyun Jeon
School of Biomedical Engineering, Korea University, Seoul, 02841, South Korea
Chaewon Park & Sung-yune Joe
Interdisciplinary Program in Precision Public Health, Korea University, Seoul, 02841, South Korea
Chaewon Park & Sung-yune Joe
Department of Microbiology, Institute for Viral Diseases, College of Medicine, Korea University, Seoul, 02841, South Korea
Jineui Kim
Department of Pharmacology, College of Medicine, Korea University, Seoul, 02841, South Korea
Jung Hoon Lee
Division of Pulmonary, Allergy and Critical Care Medicine, Department of Internal Medicine, Korea University Guro Hospital, Seoul, 08308, South Korea
Young Kyung Ko
Department of Medical Science, Soonchunhyang University, Asan, 31538, South Korea
Jeong-An Gim

Contributions

Methodology, S.J., C.P., Y.K.K., J.-A.G.; software, J.-A. G.; validation, Y.K.K.; formal analysis, S.J., C.P., J.K., J.H.L., S.-y.J.; investigation, Y.K.K., J.-A.G.; resources, J.-A.G.; data curation, S.J., C.P., Y.K.K., J.-A.G.; writing—original draft preparation, S.J., C.P., Y.K.K., J.-A.G.; writing—review and editing, Y.K.K., J.-A.G.; visualization, C.P., J.-A.G.; supervision, Y.K.K., J.-A.G.; project administration, J.-A.G.; funding acquisition, J.-A.G. All authors read and agreed to the published version of the manuscript.

Corresponding author

Correspondence to Jeong-An Gim.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

This research presents the results of a study conducted as part of the “Introduction to Next-Generation Sequencing technologies” class at Korea University School of Medicine, where J.-A.G. is the professor in charge and S.J., C.P., J.K., J.H.L., and S.-y.J. are students. Y.K.K. declared no conflict of interest.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Jeon, S., Park, C., Kim, J. et al. Comparing variants related to chronic diseases from genome-wide association study (GWAS) and the cancer genome atlas (TCGA). BMC Med Genomics 16, 332 (2023). https://doi.org/10.1186/s12920-023-01758-7

Received: 18 January 2023
Accepted: 01 December 2023
Published: 19 December 2023
DOI: https://doi.org/10.1186/s12920-023-01758-7
