
A predictive model for assessing prognostic risks in gastric cancer patients using gene expression and methylation data

Abstract

Background

The role(s) of epigenetic reprogramming in gastric cancer (GC) remain obscure. This study was designed to identify methylated gene markers with prognostic potential for GC.

Methods

Five gene expression datasets and two DNA methylation datasets of GC and normal gastric tissue samples were collected from the GEO database and combined by meta-analysis. After quality control, differentially expressed genes (DEGs) and differentially expressed methylation genes (DEMGs) were selected using MetaDE. Correlations between gene expression and methylation status were analysed using Pearson correlation coefficients. Enrichment analyses were then conducted on the signature genes that differed significantly at both the gene expression and methylation levels. Cox regression analyses were performed to identify prognostic clinical factors, which were combined with the signature genes to build a prognosis-related predictive model. The predictive accuracy of this model was evaluated, and the model was then validated in an independent validation dataset.

Results

This study identified 1565 DEGs and 3754 DEMGs in total, of which 369 were differentially expressed at both the gene expression and methylation levels. We identified 12 signature genes, including VEGFC, FBP1, NR3C1, NFE2L2, and DFNA5, which were combined with clinical data to produce a novel prognostic model for GC. This model effectively stratified GC patients into high- and low-risk groups, and this stratification was confirmed in the validation dataset.

Conclusion

The differential methylation of the 12 signature genes, including VEGFC, FBP1, NR3C1, NFE2L2, and DFNA5, identified in this study may help to produce a functional predictive model for evaluating GC prognosis in clinical samples.


Background

Gastric cancer (GC) is the fifth most common cancer in the world and is associated with high mortality and a dismal prognosis owing to delayed diagnosis [1, 2]. Approximately 1 million new cases of GC are diagnosed every year, and GC causes about 720,000 deaths per year worldwide [3]. In Western countries, GC mortality is extremely high because diagnostic screening approaches are insufficient and most patients (≥ 50%) are diagnosed only at later stages [2]. A better understanding of GC pathogenesis has facilitated the identification of novel molecular biomarkers, which researchers hope will enable diagnosis at earlier stages.

Many genetic and epigenetic events have been linked to carcinogenesis. The major epigenetic events include altered methylation of CpG sites within gene promoters and histone modifications at the chromatin level, and the most widely studied and best characterised of these is the differential methylation of tumour suppressors and oncogenes [4].

In GC, multiple epigenetic modifications have been linked to disease progression, and these alterations may help to identify biomarkers for early diagnosis [4]. Using epigenome-wide and gene-specific DNA methylation analyses, a DNA methylation biomarker panel comprising IRF4, ELMO1, CLIP4, and MSC was found to distinguish GC from gastritis, and this panel proved useful in endoscopic biopsies, allowing earlier detection of GC [5]. In a study of a Chinese population, COX-2 methylation levels were reduced in the anti-Helicobacter pylori intervention group compared with the placebo group, indicating that this differential methylation might be a useful indicator of the efficacy of GC chemoprevention [6]. Aberrant methylation of the tumour suppressor gene FAT4 in peripheral blood leukocytes has been linked to increased GC risk [7], and decreased expression of HOXB13, caused by methylation of its promoter, is a reliable marker of poor prognosis in GC [8]. Increased methylation of LINE1 and of the IGF2 differentially methylated region (DMR) correlates with more aggressive GC phenotypes, and both are therefore considered potential biomarkers of GC progression [9].

Despite these encouraging findings, the gene methylation mechanisms underlying GC remain obscure, and methylation has rarely been evaluated in the samples collected in previous studies. We therefore searched public databases for GC-related gene expression and methylation profiles and combined these data by meta-analysis to enlarge the sample size and enhance statistical power. Using a series of bioinformatics tools and survival analyses, we examined the association between gene methylation and GC prognosis and identified several novel prognostic biomarkers.

Methods

Data resource and sample classification

Dataset for meta-analysis

The gene expression and methylation profile datasets were retrieved from the GEO database (http://www.ncbi.nlm.nih.gov/geo/) using the keywords “gastric cancer” and “Homo sapiens”. Datasets were included if they (1) contained both gastric cancer tumour samples and normal tissue samples and (2) comprised at least 50 samples. Five gene expression datasets and two methylation profile datasets met these criteria and were downloaded. GSE26942 (GPL6947 platform; https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE26942) consisted of 205 gastric tumour tissue samples (GC samples) and 12 normal gastric tissue samples (normal samples). GSE29727 (GPL96; https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE29727) included 134 GC samples and 134 normal samples. GSE54129 (111 GC and 21 normal samples; https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE54129) and GSE64951 (63 GC and 31 normal samples; https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE64951) were both generated on GPL570. GSE65801 (GPL14550; https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE65801) consisted of 64 samples, 32 GC and 32 normal tissues. Both methylation datasets were generated on the GPL8490 platform: GSE25869 comprised 74 samples (GC, n = 42; normal, n = 32; https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE25869), and GSE30601 comprised 297 samples (GC, n = 203; normal, n = 94; https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE30601). The attributes of each dataset are summarised in Table 1. Detailed clinical information for each sample was obtained from the GEO database.

Table 1 Summary of the gene expression and methylation profile datasets included in the meta-analysis

Predictive modelling dataset

Gastric cancer gene expression and methylation profiles were downloaded from The Cancer Genome Atlas (TCGA, https://gdc-portal.nci.nih.gov/) database, and the expression and methylation profiles were matched. This produced a single dataset of 398 matched tumour samples, 360 of which had complete prognostic information; these data were used as the training dataset for the prognostic predictive model developed in this study. An additional gastric cancer gene expression dataset, GSE62254, was downloaded from the GEO database (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE62254). This dataset was generated on the GPL570 [HG-U133_Plus_2] Affymetrix Human Genome U133 Plus 2.0 Array platform, included 300 gastric cancer tumour tissue samples, and was used as an independent validation dataset. The patients’ clinical characteristics are listed in Table 2. The mean age of the patients was 64.9 years in the TCGA dataset and 61.9 years in the validation dataset. Most patients in the TCGA dataset had stage II or III disease, whereas those in the validation dataset had stage II to IV disease.

Table 2 Clinical information of patients in the TCGA training dataset and GSE62254 validation dataset

Data normalisation and consistency selection

Data used for meta-analysis

Three of the five expression datasets used for meta-analysis, GSE29727, GSE54129, and GSE64951, were generated on Affymetrix platforms. Their raw data were downloaded in CEL format, probes were mapped to gene symbols, missing values were imputed with the median, and the data were background-corrected using the minimal sets algorithm and normalised using the quantile method [10]. All of these steps were performed with the oligo package in R (version 3.4.1, http://www.bioconductor.org/packages/release/bioc/html/oligo.html). The other two datasets, GSE26942 and GSE65801, were generated on Illumina and Agilent platforms, respectively. These datasets were downloaded in TXT format, probes were annotated to genes using the probe information provided for each platform, and the data were log-transformed and quantile-normalised. These steps were performed with the limma package in R (version 3.4.1, https://bioconductor.org/packages/release/bioc/html/limma.html).
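Quantile normalisation, applied above to all five expression datasets, gives every array the same intensity distribution. The pure-Python sketch below illustrates the idea on invented toy values; it is not the oligo or limma implementation (in limma the equivalent step is available through normalizeBetweenArrays with method = "quantile"), and, as a simplification, it breaks ties by position instead of averaging them. The log2 helper stands in for the logarithmic transformation applied to the Illumina and Agilent data, assuming base 2 and strictly positive intensities; both function names are hypothetical.

```python
import math

def log2_transform(samples):
    """Log2-transform intensities; assumes every value is strictly positive."""
    return [[math.log2(v) for v in col] for col in samples]

def quantile_normalise(samples):
    """Quantile-normalise a list of arrays (one list of probe values per array).

    Each array receives the mean of the sorted values across arrays,
    reassigned according to that array's own ranking of its probes.
    Ties are broken by position, not averaged (a simplification).
    """
    n_probes = len(samples[0])
    if any(len(col) != n_probes for col in samples):
        raise ValueError("all arrays must have the same number of probes")
    sorted_cols = [sorted(col) for col in samples]
    # Reference distribution: mean value at each rank across all arrays.
    rank_means = [
        sum(col[r] for col in sorted_cols) / len(samples) for r in range(n_probes)
    ]
    normalised = []
    for col in samples:
        order = sorted(range(n_probes), key=col.__getitem__)  # probe indices by value
        new_col = [0.0] * n_probes
        for rank, probe in enumerate(order):
            new_col[probe] = rank_means[rank]
        normalised.append(new_col)
    return normalised

# Three toy arrays with four probes each (values assumed already on a log scale).
arrays = [[5, 2, 3, 4], [4, 1, 6, 2], [3, 4, 6, 8]]
print(quantile_normalise(arrays))
```

Every array in the result holds the same four values (2, 3, 14/3 and 19/3), placed according to that array's own ranking of its probes.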

For the methylation datasets GSE25869 and GSE30601, the chromosomal location of each probe and its methylation beta value were obtained using the Methylation Module in GenomeStudio [11].
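The section does not define the beta value. Assuming the standard Illumina definition used by GenomeStudio for Infinium arrays such as GPL8490, beta is the methylated signal divided by the total signal plus a constant offset of 100, so it ranges from 0 (unmethylated) to nearly 1 (fully methylated). The function name and example intensities below are illustrative only.

```python
def illumina_beta(methylated, unmethylated, offset=100.0):
    """Assumed Illumina beta value: max(M, 0) / (max(M, 0) + max(U, 0) + offset).

    Negative intensities are clipped to zero; the offset keeps beta stable
    when both signals are close to background.
    """
    m = max(methylated, 0.0)
    u = max(unmethylated, 0.0)
    return m / (m + u + offset)

# Hypothetical probe intensities for one CpG site in one sample.
print(illumina_beta(methylated=4000, unmethylated=1000))
```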

Quality control and consistency selection

Because these datasets were generated on different platforms, meta-analysis was used to combine consistent data across them into a single larger dataset, increasing statistical power and the reliability of the results. To reduce potential bias arising from platform differences, all datasets were first subjected to quality control using the criteria implemented in the MetaQC package in R (version 3.4.1, https://cran.r-project.org/web/packages/MetaQC/index.html). Five quality measures were calculated and evaluated: internal quality control (IQC), external quality control (EQC), accuracy quality control (AQC), consistency quality control (CQC), and the standardised mean rank (SMR) score. Datasets that passed quality control were then analysed with the MetaDE.ES function of the MetaDE package (https://cran.r-project.org/web/packages/MetaDE) to select the differentially expressed genes (DEGs) and differentially expressed methylation genes (DEMGs) between GC and normal tissue samples. In brief, for each gene we tested the heterogeneity of its expression across the different platforms using τ², the Q statistic and the P value of Q, and then assessed its differential expression between sample groups in the integrated dataset. The resulting P values were corrected for multiple testing to obtain the false discovery rate (FDR), and FDR < 0.05 was used as the significance threshold for DEGs and DEMGs [12]. To ensure that each signature gene behaved consistently across datasets, homogeneity thresholds of τ² = 0 and Q P value > 0.05 were applied.
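The heterogeneity measures named above (τ², the Q statistic and its P value) come from a random-effects meta-analysis of per-dataset effect sizes. A minimal sketch for one gene follows, assuming Hedges' g with a common large-sample variance approximation as the effect size, the DerSimonian-Laird estimator of τ², and Benjamini-Hochberg correction for the FDR. MetaDE's internal formulas and options may differ, and all function names here are hypothetical, not MetaDE's API.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Closed forms for df = 1 or 2, then Q(x; k+2) = Q(x; k) + (x/2)^(k/2) e^(-x/2) / Gamma(k/2 + 1).
    sf, k = (math.erfc(math.sqrt(half)), 1) if df % 2 else (math.exp(-half), 2)
    while k < df:
        sf += half ** (k / 2) * math.exp(-half) / math.gamma(k / 2 + 1)
        k += 2
    return sf

def hedges_g(n1, mean1, sd1, n2, mean2, sd2):
    """Bias-corrected standardised mean difference (tumour vs normal) and its variance."""
    pooled_sd = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    g = (1 - 3 / (4 * (n1 + n2) - 9)) * (mean1 - mean2) / pooled_sd
    return g, (n1 + n2) / (n1 * n2) + g**2 / (2 * (n1 + n2))

def random_effects(effects):
    """DerSimonian-Laird pooling of (g, variance) pairs, one pair per dataset."""
    if len(effects) < 2:
        raise ValueError("need at least two datasets")
    g = [e for e, _ in effects]
    w = [1 / v for _, v in effects]
    sw = sum(w)
    fixed_mean = sum(wi * gi for wi, gi in zip(w, g)) / sw
    q = sum(wi * (gi - fixed_mean) ** 2 for wi, gi in zip(w, g))  # Cochran's Q
    df = len(effects) - 1
    tau2 = max(0.0, (q - df) / (sw - sum(wi**2 for wi in w) / sw))  # between-dataset variance
    w_re = [1 / (v + tau2) for _, v in effects]
    pooled = sum(wi * gi for wi, gi in zip(w_re, g)) / sum(w_re)
    z = pooled / math.sqrt(1 / sum(w_re))
    return {"g": pooled, "p": math.erfc(abs(z) / math.sqrt(2)),
            "Q": q, "Q_p": chi2_sf(q, df), "tau2": tau2}

def bh_fdr(pvalues):
    """Benjamini-Hochberg adjusted P values, returned in input order."""
    n = len(pvalues)
    order = sorted(range(n), key=pvalues.__getitem__)
    adjusted, running_min = [0.0] * n, 1.0
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

def is_consistent(result, fdr, fdr_cut=0.05):
    """Thresholds quoted in the text: FDR < 0.05, tau2 = 0 and Q P value > 0.05."""
    return fdr < fdr_cut and result["tau2"] == 0 and result["Q_p"] > 0.05

# Hypothetical summary statistics (n, mean, SD; tumour then normal) for one gene.
gene_effects = [hedges_g(205, 7.9, 1.1, 12, 7.1, 0.9),
                hedges_g(134, 8.2, 1.0, 134, 7.4, 1.0),
                hedges_g(111, 7.6, 1.2, 21, 6.9, 1.1)]
print(random_effects(gene_effects))
```

Applied gene by gene, `random_effects` yields a pooled effect and P value per gene; passing all genes' P values to `bh_fdr` and filtering with `is_consistent` mirrors the thresholds stated in the text.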

Correlation analysis between gene expression and methylation

The DEG and DEMG lists were then compared to identify overlapping genes, that is, genes that were both differentially expressed and differentially methylated. For each overlapping gene, the Pearson correlation coefficient (CC) between expression and methylation levels was calculated using the cor function of the R stats package (R version 3.4.1), and genes whose expression was significantly associated with their methylation were selected as candidate signature genes. These genes were then subjected to Gene Ontology (GO) biological process enrichment and Kyoto Encyclopedia of Genes and Genomes (KEGG, https://www.kegg.jp/) pathway enrichment analyses using the Database for Annotation, Visualization and Integrated Discovery (DAVID, version 6.8, https://david.ncifcrf.gov/), with significance assessed by a hypergeometric test [13].
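The enrichment P value described above can be sketched as the upper tail of the hypergeometric distribution: given a background of N annotated genes, K of which belong to a GO term or KEGG pathway, how likely is it to see at least k term genes in a list of n genes by chance? The function below is an illustration with hypothetical argument names. DAVID itself reports the EASE score, a more conservative variant of Fisher's exact test that removes one gene from the hit count, so its values will be somewhat larger.

```python
from math import comb

def hypergeom_enrichment_p(hits, list_size, term_size, background_size):
    """P(X >= hits) for X ~ Hypergeometric(background_size, term_size, list_size).

    hits: list genes annotated to the term; list_size: genes in the list;
    term_size: background genes annotated to the term; background_size: all genes.
    """
    total = comb(background_size, list_size)
    tail = sum(
        comb(term_size, k) * comb(background_size - term_size, list_size - k)
        for k in range(hits, min(term_size, list_size) + 1)
    )
    return tail / total  # exact integer arithmetic until this final division

# Hypothetical numbers: 6 of 274 signature genes fall in a 40-gene term,
# against a background of 20,000 annotated genes.
print(hypergeom_enrichment_p(6, 274, 40, 20000))
```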

Screening genetic prognostic biomarkers and clinical factors

Combining the identified tumour signature genes with the corresponding clinical information, we evaluated their association with prognosis using univariate and multivariate Cox regression analyses in the R survival package (R version 3.4.1, https://cran.r-project.org/web/packages/survival/index.html). Factors with P < 0.05 in the log-rank test were considered significant.

Construction and validation of a predictive model of prognostic risk

Construction and validation of a tumour signature gene-based predictive model

Based on the prognostic genes identified in the previous step, we built a tumour signature gene-based predictive model and used it to calculate a prognosis index (PI) for each sample. The median PI was used as the cut-off to classify samples in the training dataset as high- or low-risk. Kaplan–Meier (KM) survival curves were then constructed using the survival package in R (version 3.4.1, https://cran.r-project.org/web/packages/survival/index.html) to evaluate the association between the model's risk groups and clinical outcome [14], and these associations were then tested in the validation dataset. The area under the receiver operating characteristic (ROC) curve (AUROC) was used to assess the predictive accuracy of the model in both the training and validation datasets; the closer the AUROC is to 1.0, the more accurate the model.
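The section does not give the PI formula. Assuming the usual Cox linear predictor, PI = Σ βᵢxᵢ over the genes or factors in the model, the risk grouping, KM curves, median survival and AUROC can be sketched as below. Two further assumptions: samples whose PI equals the median go to the low-risk group, and the AUROC shown is the binary version (for example, death versus survival at the end of follow-up), since the text does not state whether a time-dependent ROC was used. Function names, gene labels and coefficients are hypothetical.

```python
import statistics

def prognostic_index(coefficients, sample):
    """Assumed PI: Cox linear predictor, the sum of beta * covariate value."""
    return sum(beta * sample[name] for name, beta in coefficients.items())

def median_split(pis):
    """Label samples high- or low-risk around the median PI (ties go to low)."""
    cutoff = statistics.median(pis)
    return ["high" if pi > cutoff else "low" for pi in pis], cutoff

def kaplan_meier(times, events):
    """Product-limit estimate as [(event time, survival just after that time)]."""
    curve, surv = [], 1.0
    for t in sorted({ti for ti, e in zip(times, events) if e}):
        at_risk = sum(1 for ti in times if ti >= t)
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e)
        surv *= 1 - deaths / at_risk
        curve.append((t, surv))
    return curve

def median_survival(curve):
    """First event time at which survival drops to 0.5 or below (None if never)."""
    return next((t for t, s in curve if s <= 0.5), None)

def auroc(scores, labels):
    """Chance a random positive outranks a random negative (ties count one half)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical coefficients and covariate values for four patients.
coefs = {"GENE_A": -0.4, "GENE_B": 0.3}
patients = [{"GENE_A": 2.0, "GENE_B": 1.0}, {"GENE_A": 0.5, "GENE_B": 3.0},
            {"GENE_A": 1.0, "GENE_B": 2.0}, {"GENE_A": 3.0, "GENE_B": 0.5}]
groups, cutoff = median_split([prognostic_index(coefs, p) for p in patients])
print(groups, cutoff)
```

The KM curve of each risk group is then `kaplan_meier` applied to that group's follow-up times and event indicators, and `median_survival` reads off the median OS or DFS from it.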

Construction of a prognostic clinical factor-based predictive model

We applied Cox regression to the prognostic clinical factors to build a clinical factor-based predictive model. The PI of each sample was calculated, and the median PI in the training dataset was used as the cut-off between the high- and low-risk groups. As before, KM survival curves were used to assess the association between the model and clinical prognosis, and the results were then assessed in the validation dataset.
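Differences between high- and low-risk KM curves are conventionally tested with the log-rank test, the test named in the preceding subsection. A minimal unstratified two-group sketch follows, assuming the standard hypergeometric variance at each event time; the function name and example data are hypothetical.

```python
import math

def logrank_test(times, events, groups):
    """Two-group log-rank test; groups holds 1 (high-risk) or 0 (low-risk).

    Returns (chi-square statistic with 1 degree of freedom, P value).
    """
    data = list(zip(times, events, groups))
    observed = expected = variance = 0.0
    for t in sorted({ti for ti, ei, _ in data if ei}):
        at_risk = [g for ti, _, g in data if ti >= t]
        n, n1 = len(at_risk), sum(at_risk)
        d = sum(1 for ti, ei, _ in data if ti == t and ei)
        d1 = sum(1 for ti, ei, g in data if ti == t and ei and g == 1)
        observed += d1
        expected += d * n1 / n  # expected deaths in group 1 under no difference
        if n > 1:  # hypergeometric variance of d1
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("no informative event times")
    chi2 = (observed - expected) ** 2 / variance
    return chi2, math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square(1)

# Hypothetical follow-up (months) and death indicators for two risk groups.
months = [30, 42, 51, 60, 60, 8, 12, 15, 22, 40]
died = [1, 0, 1, 0, 0, 1, 1, 1, 1, 0]
high_risk = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
print(logrank_test(months, died, high_risk))
```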

Construction of the predictive model integrating signature genes and clinical factors

We created a novel integrated prognostic model for GC by combining the signature gene-based model with the clinical factor-based model. A new PI was calculated for each sample, and the samples in the training dataset were divided into high- and low-risk groups using the median of the new PI values. KM survival curves were then used to evaluate the predictive value of these PIs, and the results were evaluated in the validation dataset.

Results

Selection of DEGs and DEMGs

After normalisation, the quality of each dataset was assessed by calculating IQC, EQC, AQC, CQC, and SMR. The results indicated that all datasets were eligible for meta-analysis. Using MetaDE, we identified 1565 DEGs and 3754 DEMGs that differed consistently between GC and normal samples across the datasets. The analytical workflow is summarised in Fig. 1.

Fig. 1

Flow chart describing the analytical process in this study. DEG, differentially expressed gene; DEMG, differentially expressed methylation gene; PCC, Pearson correlation coefficient

Correlation between gene expression and methylation

Comparing the DEGs and DEMGs identified 396 overlapping genes that were differentially expressed at both the gene expression and methylation levels. We analysed the correlation between expression and methylation for these 396 genes in the TCGA dataset and in the GSE30601-GSE15460 dataset (the methylation profile matched with its corresponding gene expression profile). Overall, gene expression was negatively correlated with methylation in both the TCGA (CC = −0.5145, P = 0.004) and GSE30601-GSE15460 (CC = −0.72704, P < 0.001) datasets. We then calculated the correlation between expression and methylation for each gene and retained the genes with negative correlations, that is, genes with high methylation and low expression or vice versa. This narrowed the cohort to 274 genes, which were taken forward as tumour signature genes.
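The per-gene filtering step above can be sketched as follows. The text retains genes whose expression and methylation are negatively correlated but does not state a per-gene significance cut-off, so this sketch filters on the sign of r only; gene names and values are placeholders. In R, cor(x, y) with its default Pearson method computes the same coefficient.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences
    (raises ZeroDivisionError if either sequence is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def negatively_correlated(expression, methylation):
    """Keep genes whose expression and methylation move in opposite directions.

    expression, methylation: dicts mapping gene -> per-sample values in the
    same sample order; genes missing from either dict are skipped.
    """
    kept = {}
    for gene in expression.keys() & methylation.keys():
        r = pearson(expression[gene], methylation[gene])
        if r < 0:
            kept[gene] = r
    return kept

# Placeholder genes with made-up values across three samples.
expr = {"GENE_A": [5.0, 6.0, 7.0], "GENE_B": [1.0, 2.0, 3.0]}
meth = {"GENE_A": [0.8, 0.5, 0.2], "GENE_B": [0.1, 0.2, 0.4]}
print(negatively_correlated(expr, meth))
```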

GO functional and KEGG pathway enrichment analyses showed that these 274 genes were significantly enriched in 14 biological process terms, including steroid metabolic process (NR3C1), fructose metabolic process (FBP1), regulation of cell migration (VEGFC), and vitamin metabolic process (ACADM), and in five KEGG pathways, such as glycolysis/gluconeogenesis (FBP1) (Table 3).

Table 3 Enrichment results of the candidate gene markers

Prognostic gene biomarkers and clinical factors

These 274 genes were then subjected to univariate and multivariate Cox regression analyses to identify prognostic genes and clinical factors. We finally selected 12 genes (SLC5A5, SLC7A6, NFE2L2, DFNA5, VEGFC, MUM1, TRIB2, MCOLN1, FBP1, ACADM, WDR37, and NR3C1) that were significantly associated with prognosis (Table 4) and five independent clinical factors (age, pathologic_N, pathologic_T, targeted molecular therapy, and new tumour) for our predictive models (Table 5). The KM survival curves for the five clinical factors are shown in Fig. 2.

Table 4 Gene markers significantly related to the prognosis
Table 5 Clinical factors identified using Cox regression analysis
Fig. 2

The Kaplan–Meier (KM) survival curves for five clinical factors. a age; b pathologic_N; c pathologic_T; d new tumour; e targeted molecular therapy

We generated a clustered heatmap of the expression and methylation status of all 12 tumour signature genes together with their corresponding prognostic clinical factors (Fig. 3). This analysis revealed that four clinical factors, age, pathologic_N, targeted molecular therapy, and new tumour, were significantly associated with the prognosis of patients in the two resulting clusters (P < 0.01).

Fig. 3

Clustered heatmap describing the gene expression and methylation patterns of the 12 signature genes and their correlation with specific prognostic clinical factors

Construction and validation of an integrated prognostic risk prediction model

The signature gene-based risk predictive model

After obtaining the prognostic gene information from the Cox regression, we constructed a prognostic risk prediction model based on our 12 signature genes. Samples in the training dataset were classified as high- or low-risk using the median PI as the cut-off.

In the training dataset, survival analysis indicated that patients in the low-risk group had significantly longer median overall survival (OS) (22.1 m vs. 15.1 m, P < 0.001, Fig. 4a) and median disease-free survival (DFS) (22.1 m vs. 14.5 m, P < 0.001, Fig. 4b) than patients in the high-risk group. The AUROCs for OS and DFS were 0.997 and 0.906, respectively (Fig. 4e), suggesting high predictive accuracy for both outcomes.

Fig. 4

Survival curves generated using the gene-based predictive model. a Overall survival (OS) of patients from the TCGA dataset; b disease-free survival (DFS) of patients from TCGA; c OS of patients from the validation dataset; d DFS of patients from the validation dataset; e the area under the ROC curve (AUROC) for the survival curves from both the TCGA and validation datasets

Similarly, in the validation dataset GSE62254, patients in the low-risk group had better survival than those in the high-risk group, with longer median OS (55.8 m vs. 44.2 m, P = 0.009, Fig. 4c) and DFS (41.6 m vs. 30.6 m, P = 0.036, Fig. 4d). The AUROCs for these two outcomes were 0.995 and 0.912, respectively (Fig. 4e), indicating that the model had high predictive accuracy.

The clinical factor-based prognostic risk model

Five clinical factors were identified using Cox regression. Using their coefficients as weights, we established the clinical factor-based predictive model for prognostic risk in the samples with data for all five factors (n = 283). The PI of each sample was calculated and then used to classify the samples.

Survival analysis showed that OS (19.5 m vs. 16.6 m, P = 0.005, Fig. 5a) and DFS (18.5 m vs. 17.8 m, P = 0.048, Fig. 5b) were significantly longer in the low-risk group than in the high-risk group. The AUROCs for OS and DFS were 0.923 and 0.921, respectively (Fig. 5e).

Fig. 5

Survival curves generated using the prognostic clinical factor-based predictive model. a OS of patients from TCGA dataset; b DFS of patients from TCGA; c OS of patients from the validation dataset; d DFS of patients from the validation dataset; e AUROC for each of the survival curves from both the TCGA and validation datasets

In the GSE62254 validation dataset, only three clinical factors, age, pathologic_N, and pathologic_T, were available. We therefore built a three-factor version of the clinical factor-based model to validate the results obtained in the TCGA training dataset. Again, patients in the low-risk group had significantly longer OS (58.9 m vs. 37.2 m, P < 0.0001, Fig. 5c) and DFS (44.4 m vs. 24.3 m, P < 0.0001, Fig. 5d) than patients in the high-risk group. The AUROC values for OS and DFS were 0.897 and 0.882, respectively (Fig. 5e).

Building an integrated predictive model incorporating both clinical factors and signature gene expression

The integrated predictive model was constructed by combining the weighting coefficients of the 12 signature genes and the five clinical factors. The PI of each sample was then recalculated, and each sample was reassigned to the low- or high-risk group.

In the TCGA dataset, patients in the low-risk group had significantly longer OS (20.3 m vs. 15.8 m, P < 0.001, Fig. 6a) and DFS (19.7 m vs. 14.6 m, P = 0.003, Fig. 6b) than those in the high-risk group. The AUROCs for the two outcomes were 0.985 and 0.939, respectively (Fig. 6e).

Fig. 6

Survival curves generated using the integrated predictive model. a OS of patients from the TCGA dataset; b DFS of patients from TCGA; c OS of patients from the validation dataset; d DFS of patients from the validation dataset; e AUROC values for the survival curves generated from both the TCGA and validation datasets

To evaluate the model in the GSE62254 validation dataset, the clinical factors were reduced from five to three based on data availability, and new PI values were calculated. Patients were then categorised as high- or low-risk, and their survival was compared. Patients in the low-risk group had significantly longer survival than those in the high-risk group (OS: 62.3 m vs. 38.0 m, P < 0.0001, Fig. 6c; DFS: 47.9 m vs. 24.6 m, P < 0.0001, Fig. 6d). The AUROC values for OS and DFS were 0.942 and 0.921, respectively (Fig. 6e).

Discussion

Here, we used a series of bioinformatics analyses to construct a predictive model for the prognosis of GC patients based on 12 signature genes and five clinical factors. All 12 signature genes were also differentially methylated, and the models built from them could stratify patients into high- and low-risk groups. All of these models were validated in an independent validation dataset.

Among our 12 signature genes, four were of particular interest: vascular endothelial growth factor C (VEGFC), nuclear receptor subfamily 3 group C member 1 (NR3C1), nuclear factor, erythroid 2 like 2 (NFE2L2), and fructose-1,6-bisphosphatase 1 (FBP1). VEGFC has been reported to be a key regulator of GC progression, and its encoded protein promotes angiogenesis and endothelial cell growth. Oxidised low-density lipoprotein (oxLDL), a risk factor in cancer pathogenesis through its role in abnormal lipid metabolism, has been shown to promote lymphatic metastasis of GC by upregulating the expression and secretion of VEGFC [15]. MicroRNA (miR)-27b acts as a potential tumour suppressor in GC and targets VEGFC [16], while miR-101 promotes cisplatin (DDP)-induced apoptosis in DDP-resistant GC cells, partly by targeting VEGFC [17]. VEGFC expression is also associated with GC prognosis: survival is significantly poorer in VEGFC-positive than in VEGFC-negative GC patients [18]. Moreover, decreased VEGFC was shown to correlate with an increased risk of tumour progression [19]. Here, we identified VEGFC as one of the 12 signature genes for evaluating GC prognosis, and this gene was enriched in the ‘regulation of cell migration’ category, suggesting that VEGFC methylation may be related to GC prognosis through the regulation of cell migration. However, this relationship needs to be further validated in vitro and in large populations.

FBP1 is a regulatory enzyme of gluconeogenesis that has been associated with metabolic acidosis. Snail, an important mediator in cancer, is upregulated in GC and alters glucose metabolism by downregulating FBP1, thereby indirectly regulating epithelial-mesenchymal transition (EMT) [20]. Decreased FBP1 promotes GC metastasis and is an indicator of poor prognosis [21]. In GC cell lines, FBP1 is downregulated and its promoter is hypermethylated, which promotes carcinogenesis, and FBP1 promoter methylation has been independently associated with GC prognosis [22]. This is consistent with our identification of FBP1 as one of the 12 signature genes with predictive value for GC prognosis. In addition, FBP1 was enriched in glucose metabolism-related functions and pathways. Taken together, these data suggest that DNA methylation of FBP1 may be associated with GC prognosis through altered regulation of glucose metabolism.

The NR3C1 gene encodes the glucocorticoid receptor. NR3C1 is important in GC carcinogenesis and has been used as a marker to identify primary GC [23, 24]. Hypermethylation of the NR3C1 promoter has also been implicated in the initiation of GC, and four SNPs at this locus are strongly associated with increased GC risk in a Chinese population [20]. Here, we found a link between NR3C1 methylation and GC prognosis, suggesting that NR3C1 methylation may be a reliable prognostic indicator for GC.

NFE2L2, also known as NRF2, encodes a transcription factor (TF) involved in GC development, and its overexpression is a predictive marker of prognosis and 5-FU resistance in GC [25]. NRF2-positive GC patients have significantly poorer OS than NRF2-negative patients [26]. Deafness-associated tumour suppressor (DFNA5) is inactivated by methylation in GC, and this methylation is found in half of all patients with primary GC [27]. Here, we propose that the methylation status of NFE2L2 and DFNA5 is related to the prognosis of GC.

Finally, the predictive models produced in this study were relatively precise, probably because of the larger sample size afforded by the meta-analysis. In addition to their relatively high AUROC values, our predictive models gave consistent results in the validation dataset. These encouraging results shed light on potential regulatory roles of gene methylation in GC prognosis. This pilot bioinformatics analysis may also lay the foundation for exploratory biomarker studies that could help identify patients with a low risk of death and a good survival outcome. Importantly, identifying these sensitive methylation gene markers and methylation patterns may deepen our understanding of GC progression and support the development of novel targeted therapies that could improve survival in patients with GC. However, this study has several limitations. The expression and methylation of these signature genes should be validated in vitro and in vivo in additional cell lines and animal models. Moreover, prospective studies in larger clinical cohorts are warranted to validate the prognostic value of these genes before they are adopted in diagnostic and prognostic settings, and we plan to perform these studies in the future.

Conclusion

In conclusion, methylation of 12 signature genes, including VEGFC, FBP1, NR3C1, NFE2L2, and DFNA5, may be associated with the prognosis of GC, and these genes-based risk models may be a useful tool in predicting prognostic outcomes for patients at earlier stages of disease. However, these results require validation in larger patient cohorts before they can be confidently applied in a clinical setting.

Abbreviations

GC:

Gastric cancer

TCGA:

The Cancer Genome Atlas

IQC:

Internal quality control

EQC:

External quality control

AQC:

Accuracy quality control

CQC:

Consistency quality control

SMR:

Standardised mean rank score

DEGs:

Differentially expressed genes

DEMGs:

Differentially expressed methylation genes

FDR:

False discovery rate

CC:

Correlation coefficient

GO:

Gene Ontology

KEGG:

Kyoto Encyclopedia of Genes and Genomes

PI:

Prognosis index

KM:

Kaplan–Meier

ROC:

Receiver operating characteristic

AUROC:

Area under the ROC

VEGFC:

Vascular endothelial growth factor c

References

  1. 1.

    Huang T, Song C, Zheng L, Xia L, Li Y, Zhou Y. The roles of extracellular vesicles in gastric cancer development, microenvironment, anti-cancer drug resistance, and therapy. Mol Cancer. 2019;18(1):62.

    PubMed  PubMed Central  Article  Google Scholar 

  2. 2.

    Petrillo A, Pompella L, Tirino G, Pappalardo A, Laterza MM, Caterino M, Orditura M, Ciardiello F, Lieto E, Galizia G, et al. Perioperative treatment in resectable gastric cancer: current perspectives and future directions. Cancers (Basel). 2019;11(3):399.

    CAS  Article  Google Scholar 

  3. 3.

    Lott PC, Carvajal-Carmona LG. Resolving gastric cancer aetiology: an update in genetic predisposition. Lancet Gastroenterol Hepatol. 2018;3(12):874–83.

    PubMed  PubMed Central  Article  Google Scholar 

  4. 4.

    Kazmi HR, Kumari S, Tiwari S, Khanna A, Narayan G. Epigenetic mechanisms and events in gastric cancer-emerging novel biomarkers. Pathol Oncol Res. 2018;24(4):757–70.

    PubMed  Article  CAS  Google Scholar 

  5. 5.

    Pirini F, Noazin S, Jahuira-Arias MH, Rodriguez-Torres S, Friess L, Michailidi C, Cok J, Combe J, Vargas G, Prado W, et al. Early detection of gastric cancer using global, genome-wide and IRF4, ELMO1, CLIP4 and MSC DNA methylation in endoscopic biopsies. Oncotarget. 2017;8(24):38501–16.

    PubMed  PubMed Central  Article  Google Scholar 

  6. 6.

    Zhang Y, Zeng HM, Nie XR, Zhang L, Ma JL, Li JY, Pan KF, You WC. Alterations of cyclooxygenase-2 methylation levels before and after intervention trial to prevent gastric cancer in a Chinese population. Cancer Prev Res (Phila). 2016;9(6):484–90.

    CAS  Article  Google Scholar 

  7. 7.

    Sun H, Zhou H, Zhang Y, Chen J, Han X, Huang D, Ren X, Jia Y, Fan Q, Tian W, et al. Aberrant methylation of and in peripheral blood leukocytes and their association with gastric cancer risk. J Cancer. 2018;9(13):2275–83.

    PubMed  PubMed Central  Article  CAS  Google Scholar 

  8. 8.

    Sui BQ, Zhang CD, Liu JC, Wang L, Dai DQ. HOXB13 expression and promoter methylation as a candidate biomarker in gastric cancer. Oncol Lett. 2018;15(6):8833–40.

  9. Tahara T, Tahara S, Horiguchi N, Kawamura T, Okubo M, Yamada H, Yoshida D, Ohmori T, Maeda K, Komura N, et al. Methylation status of IGF2 DMR and LINE1 in leukocyte DNA provides distinct clinicopathological features of gastric cancer patients. Clin Exp Med. 2018;18(2):215–20.

  10. Tuo Y, An N, Zhang M. Feature genes in metastatic breast cancer identified by MetaDE and SVM classifier methods. Mol Med Rep. 2018;17(3):4281–90.

  11. Turan N, Ghalwash MF, Katari S, Coutifaris C, Obradovic Z, Sapienza C. DNA methylation differences at growth related genes correlate with birth weight: a molecular signature linked to developmental origins of adult disease? BMC Med Genom. 2012;5(1):10.

  12. Terada A, Tsuda K. Multiple testing tool to detect combinatorial effects in biology. Methods Mol Biol. 2018;1807:83–94.

  13. da Huang W, Sherman BT, Lempicki RA. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protoc. 2009;4(1):44–57.

  14. Goel MK, Khanna P, Kishore J. Understanding survival analysis: Kaplan–Meier estimate. Int J Ayurveda Res. 2010;1(4):274–8.

  15. Ma C, Xie J, Luo C, Yin H, Li R, Wang X, Xiong W, Zhang T, Jiang P, Qi W, et al. OxLDL promotes lymphangiogenesis and lymphatic metastasis in gastric cancer by upregulating VEGF-C expression and secretion. Int J Oncol. 2019;54(2):572–84.

  16. Chen X, Cui Y, Xie X, Xing Y, Yuan Z, Wei Y. Functional role of miR-27b in the development of gastric cancer. Mol Med Rep. 2018;17(4):5081–7.

  17. Li G, Yang F, Gu S, Li Z, Xue M. MicroRNA-101 induces apoptosis in cisplatin-resistant gastric cancer cells by targeting VEGF-C. Mol Med Rep. 2016;13(1):572–8.

  18. Ikeda K, Oki E, Saeki H, Ando K, Morita M, Oda Y, Imamura M, Kakeji Y, Maehara Y. Intratumoral lymphangiogenesis and prognostic significance of VEGFC expression in gastric cancer. Anticancer Res. 2014;34(8):3911–5.

  19. Dai W, Zeller C, Masrour N, Siddiqui N, Paul J, Brown R. Promoter CpG island methylation of genes in key cancer pathways associates with clinical outcome in high-grade serous ovarian cancer. Clin Cancer Res. 2013;19(20):5788–97.

  20. Yu J, Li J, Chen Y, Cao W, Lu Y, Yang J, Xing E. Snail enhances glycolysis in the epithelial-mesenchymal transition process by targeting FBP1 in gastric cancer. Cell Physiol Biochem. 2017;43(1):31–8.

  21. Li J, Wang Y, Li QG, Xue JJ, Wang Z, Yuan X, Tong JD, Xu LC. Downregulation of FBP1 promotes tumor metastasis and indicates poor prognosis in gastric cancer via regulating epithelial-mesenchymal transition. PLoS ONE. 2016;11(12):e0167857.

  22. Liu X, Wang X, Zhang J, Lam EK, Shin VY, Cheng AS, Yu J, Chan FK, Sung JJ, Jin HC. Warburg effect revisited: an epigenetic link between glycolysis and gastric carcinogenesis. Oncogene. 2010;29(3):442–50.

  23. Ren W, Li W, Wang D, Hu S, Suo J, Ying X. Combining multi-dimensional data to identify key genes and pathways in gastric cancer. PeerJ. 2017;5:e3385.

  24. Chang W, Ma L, Lin L, Gu L, Liu X, Cai H, Yu Y, Tan X, Zhai Y, Xu X, et al. Identification of novel hub genes associated with liver metastasis of gastric cancer. Int J Cancer. 2009;125(12):2844–53.

  25. Hu XF, Yao J, Gao SG, Wang XS, Peng XQ, Yang YT, Feng XS. Nrf2 overexpression predicts prognosis and 5-FU resistance in gastric cancer. Asian Pac J Cancer Prev. 2013;14(9):5231–5.

  26. Kawasaki Y, Ishigami S, Arigami T, Uenosono Y, Yanagita S, Uchikado Y, Kita Y, Nishizono Y, Okumura H, Nakajo A, Kijima Y. Clinicopathological significance of nuclear factor (erythroid-2)-related factor 2 (Nrf2) expression in gastric cancer. BMC Cancer. 2015;15(1):5.

  27. Akino K, Toyota M, Suzuki H, Imai T, Maruyama R, Kusano M, Nishikawa N, Watanabe Y, Sasaki Y, Abe T, et al. Identification of DFNA5 as a target of epigenetic inactivation in gastric cancer. Cancer Sci. 2007;98(1):88–95.

Acknowledgements

None.

Funding

None.

Author information

Contributions

DL and QLY participated in the design of this study, and they both performed the statistical analyses. HBW and MT carried out the study and collected important background information. YLZ and JL drafted the manuscript. All authors read and approved the final manuscript prior to submission.

Corresponding authors

Correspondence to YanLei Zou or Jian Liu.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Availability of data and material

The raw data were collected from the GEO public database. GSE26942 was generated on the GPL6947 platform (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE26942); GSE29727 on the GPL96 platform (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE29727); GSE54129 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE54129) and GSE64951 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE64951) on the GPL570 platform; and GSE65801 on the GPL14550 platform (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE65801). The methylation datasets GSE25869 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE25869) and GSE30601 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE30601) were generated on the GPL8490 platform. Relevant gastric cancer gene expression and methylation profiles were downloaded from The Cancer Genome Atlas (TCGA, https://gdc-portal.nci.nih.gov/) database. The gastric cancer gene expression profile GSE62254 was downloaded from the GEO database (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE62254).

Competing interests

The authors have no conflict of interest to declare.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Luo, D., Yang, Q., Wang, H. et al. A predictive model for assessing prognostic risks in gastric cancer patients using gene expression and methylation data. BMC Med Genomics 14, 14 (2021). https://doi.org/10.1186/s12920-020-00856-0

Keywords

  • Gastric cancer
  • Methylation
  • Prognosis
  • Differential expression
  • Biomarker