Genes CEP55, FOXD3, FOXF2, GNAO1, GRIA4, and KCNA5 as potential diagnostic biomarkers in colorectal cancer

Background Colorectal cancer (CRC) is one of the leading causes of death by cancer worldwide and in need of novel potential diagnostic biomarkers for early discovery. Methods We conducted a two-step study. We first employed bioinformatics on data from The Cancer Genome Atlas to obtain potential biomarkers and then experimentally validated some of them on our clinical samples. Our aim was to find a methylation alteration common to all clusters, with the potential of becoming a diagnostic biomarker in CRC. Results Unsupervised clustering of methylation data resulted in four clusters, none of which had a known common genetic or epigenetic event, such as mutations or methylation. The intersect among clusters and regulatory regions resulted in 590 aberrantly methylated probes, belonging to 198 differentially expressed genes. After performing pathway and functional analysis on differentially expressed genes, we selected six genes: CEP55, FOXD3, FOXF2, GNAO1, GRIA4 and KCNA5, for further experimental validation on our own clinical samples. In silico analysis demonstrated that CEP55 was hypomethylated in 98.7% and up-regulated in 95.0% of samples. Genes FOXD3, FOXF2, GNAO1, GRIA4 and KCNA5 were hypermethylated in 97.9, 81.1, 80.3, 98.4 and 94.0%, and down-regulated in 98.3, 98.9, 98.1, 98.1 and 98.6% of samples, respectively. Our experimental data show CEP55 was hypomethylated in 97.3% of samples and down-regulated in all samples, while FOXD3, FOXF2, GNAO1, GRIA4 and KCNA5 were hypermethylated in 100.0, 90.2, 100.0, 99.1 and 100.0%, and down-regulated in 68.0, 76.0, 96.0, 95.2 and 84.0% of samples, respectively. Results of in silico and our experimental analyses showed that more than 97% of samples had at least four methylation markers altered. Conclusions Using bioinformatics followed by experimental validation, we identified a set of six genes that were differentially expressed in CRC compared to normal mucosa and whose expression seems to be methylation dependent. Moreover, all of these six genes were common in all methylation clusters and mutation statuses of CRC and as such are believed to be an early event in human CRC carcinogenesis and to represent potential CRC biomarkers. Electronic supplementary material The online version of this article (10.1186/s12920-019-0501-z) contains supplementary material, which is available to authorized users.


Background
Colorectal cancer (CRC) is one of the leading causes of death by cancer in both genders [1]. CRC occurs through a process of malignant transformation, when numerous genetic and epigenetic events transform normal colon mucosa to adenocarcinoma [2]. It is a very heterogeneous disease, in which three major molecular pathways have been identified. The most common pathway is the chromosomal instability (CIN) pathway, which is characterized by an accumulation of mutations in specific genes (e.g., APC, KRAS, BRAF, TP53) [2], and accounts for 65-70% of sporadic CRC cases [3]. The microsatellite instability (MSI) pathway accounts for approximately 15% of sporadic CRC, and is characterized by deficiency in DNA mismatch repair (MMR) genes (e.g. MLH1, MSH2, MSH6, PMS2) [4]. Silencing of MMR genes in the MSI type of CRC occurs through promoter hypermethylation, a common molecular alteration at epigenetic level. In more than 80% of MSI cases, promoter hypermethylation occurred in the MLH1 gene [5]. The third molecular pathway is the CpG island methylator phenotype (CIMP); an epigenetic instability pathway. One of these three pathways is usually predominant but they are not mutually exclusive [6,7].
CIMP has been extensively studied, not only in CRC but also in bladder, gastric, lung and breast cancer [8]. Some researchers have proposed three CIMP subtypes: CIMP high (CIMP-H), CIMP low (CIMP-L), and non-CIMP subtypes [5]. The CIMP-H subtype is significantly associated with the proximal colon and mutations in gene BRAF, whereas the CIMP-L subtype has intermediate methylation levels and is associated with mutations in KRAS gene [9]. Moreover, The Cancer Genome Atlas (TCGA) consortium describes four epigenetic subtypes (CIMP-H, CIMP-L, Cluster 3, and Cluster 4), of which Clusters 3 and 4 are defined as non-CIMP subtypes [10].
Whereas two research groups, Shen et al [11] and Yagi et al [12], reported three epigenetic subtypes and some genes as hypermethylation markers, Hinoue et al. identified four subtypes based on hierarchical clustering of DNA methylation at loci exhibiting high inter-tumor variability [13]. Two loci, representing CIMP-H and CIMP-L tumors, were associated with BRAF and KRAS mutations, respectively. Tumors in the third cluster were associated with TP53 mutations and prevalence in the distal colon, while the fourth cluster was enriched for tumors from the rectum, with low rates of KRAS and TP53 mutations. Moreover, previous studies have suggested that differences in the CIMP status are associated with differences in the transcriptomic level across several tumor types [8].
Our aim was also to identify new aberrantly methylated gene promoters and observe their expression. Our approach however was different. We used the data from TCGA, in which the DNA methylation experiment was done using microarrays, containing over 450.000 sites within the genome. Unsupervised clustering of methylation data resulted in four clusters and each was compared to the methylation data of normal mucosa samples. The aberrantly methylated probes were intersected among all clusters to obtain the probes common to all clusters. The common methylation sites in all clusters were integrated with gene expression analysis, to identify novel candidate biomarkers, some of which we tested on our experimental set of samples. Finding common epigenetic alterations in all CRC types, regardless of tumor stage, could be a starting point for testing these methylation changes on cfDNA obtained from patient's blood and/or novel therapeutic targets.

Patients and data
Colon adenocarcinoma (COAD) and rectum adenocarcinoma (READ) data were obtained from The Cancer Genome Atlas (TCGA). Data were downloaded from the Broad GDAC Firehose portal (https://gdac.broadinstitute.org/) and contained clinical information, methylation, gene expression and mutation data. Platform used for DNA methylation experiment was Illumina Infinium HumanMethylation450k BeadChip array (HM450), which covers 482,421 CpG sites within the human genome. For methylation analysis level 3 data was used, which is already normalized and contains beta-value calculations, genomic coordinate, chromosome number and HUGO gene symbol for each CpG site on the array. For gene expression analysis mRNAseq experiment performed on Illumina HiSeq platform was used. Gene expression levels were obtained through RNAseqV2 pipeline, which uses a combination of MApSplice and "scaled estimate" (RSEM) to determine expression levels. RNAseqV2 data contains a normalized read count, which represents normalized RSEM count estimates from the upper quartile. Mutation data was obtained through variant calling from DNAseq experiment using MuTect2 pipeline. There were 381 tumor samples with methylation data on HM450 platform and complete mutational profile. From these 381 samples, 359 samples had also Illumina mRNAseqV2 gene expression data.
There were 45 normal samples used for comparison in methylation data and 51 normal samples used for comparison in the gene expression dataset.

Probes and genes
The coordinates of protein-coding genes were downloaded from Ensembl, release 89 (http://www.ensembl. org/). The nomenclature of genes was unified according to The HUGO Gene Nomenclature Committee (HGNC) (http://www.genenames.org/). We mapped the HM450 probes to the GRCh38/hg38 genome using recently published study [17]. Location of mapped probes were overlapped them with promoter regions of regulatory build of genome and assigned to their nearest genes. The genes where transcription start site was within 5 kb of the mapped promoter region were used for further analysis.

Unsupervised clustering
We used the recursively petitioned mixture model (RPMM) for the identification of colorectal tumor subgroups based on the HM450 DNA methylation data. RPMM is a model-based unsupervised clustering approach developed for beta-distributed DNA methylation measurements that lie between 0 and 1 and is implemented as the RPMM Bioconductor package [18]. We removed probes mapped on X and Y chromosome and the probes containing "NA" values and performed RPMM clustering on 4165 probes, that showed the most variable DNA methylation levels (standard deviation > 0.25). A fanny algorithm (a fuzzy clustering algorithm) was used for initialization and level-weighted version of Bayesian information criterion (BIC) as a split criterion for an existing cluster as implemented in the R-based RPMM package.

Differentially methylated probes and differentially expressed genes
Differentially methylated probes and differentially expressed genes were obtained using TCGAbiolinks package in R [19]. Differentially methylated probes were obtained by comparing beta-values of probes between each methylation cluster and probes in normal samples. First, the mean methylation of each group for each probe was calculated, second, p-value was calculated using Wilcoxon test using Bonferroni adjustment method. The cutoff parameters were set to: absolute difference in methylation was larger than 0.2 and adjusted p-value less than 0.01. For obtaining differential gene expression general log-linearized model was used, with cutoff parameters: absolute fold change was larger than 1.0, and false discovery rate (FDR) adjusted p-value less than 0.01. For each cluster, we selected methylation probes mapping to promoter regions and had absolute methylation difference more than 0.3 compared to normal. We selected hypermethylated promoter probes (methylation difference more than 0.3) and down-regulated genes with logarithmic fold change of at least − 1.0. Our selection also included hypomethylated promoter probes (methylation difference was less than − 0.3) and up-regulated genes with logarithmic log fold change more than 1.0. This selection was overlapped among all resulting clusters to obtain the genes with aberrant methylation and differential expression common to all four clusters.

Data visualization, text mining and survival analysis
The HM450 DNA methylation β-values of 4165 most variable probes along with methylation cluster, location, gender, tumor stage, MLH1 promoter methylation and mutations in BRAF, KRAS, APC and TP53 were represented graphically using heatmap visualization from ComplexHeatmap package in R programming software [20]. For construction of protein-protein interaction networks the STRING database (version 10.5) was used which produces a functional association network, using interaction sources, such as text mining, experiments, database, co-expression, neighborhood, gene fusion and co/occurrence. To identify gene ontology processes enriched within our 198 set of genes from the intersection of all resulting clusters the STRING database was used [21]. We used the GeneRIF (Gene Reference into Function) database as the source text for finding gene-disease associations previously published and stored on PubMed system. We performed several queries using different conditions and terms such as: "cancer", "colorectal", "colon", "methylation", "expression" and identification numbers for all 198 genes. For the Cox proportional hazard model package survival in R software was used [22]. The influence of the different clinical and genetic parameters was determined with logrank test, where p-value was less than 0.05. Some hazard ratios could not be computed, since gene was up/down regulated or hypo/hypermethylated in all samples. Hazard ration can be computed when there are two groups.

Experimental validation Clinical samples
Samples used for experimental validation in our study are presented in Table 1. Our study was comprised of 115 samples, of which 90 were fresh frozen tissue samples and 25 tissue samples were stabilized in RNAlater solution (Ambion). All the samples (n = 115) were used in the methylation experiment, however, the latter 25 samples, that were stored in RNAlater, were of sufficient quality to be used also for gene expression experiment.
The latter 25 samples were collected during surgical colectomy of patients diagnosed with primary colorectal adenocarcinoma. The patients' whose samples were collected had no other cancer than CRC, and no previous radio-or chemotherapy. From each patient tumor and normal sample was collected, where normal samples of healthy colon mucosa were collected at least 20 cm away from tumor site. Both tumor and normal mucosa samples were placed in RNAlater solution, which stabilizes tissue and enables DNA and RNA extraction. Samples were submerged in RNAlater and incubated for 24 h at 4°C to allow the solution to penetrate through the sample. After incubation period, the samples were stored at − 20°C.
For all 115 samples data about gender, tumor location, size, nodal infiltration, distant metastasis, and survival data was obtained from Cancer Registry of Slovenia. Patients enrolled in the study signed an informed consent form agreeing to participate in the study. The National Medical Ethics Committee of the Republic of Slovenia approved this research.
RNA/DNA isolation DNA and RNA from tissues stored in RNALater solution were isolated with All prep DNA/RNA Mini Kit (Qiagen), according to the manufacturer's recommendations. DNA and RNA quantity and quality were determined spectrophotometrically by NanoDrop ND-1000 (Thermo Fisher Scientific). DNA (n = 90) was isolated from fresh frozen samples with QIAamp DNA Mini Kit (Qiagen), according to the manufacturer's recommendations. DNA quantity and quality were determined spectrophotometrically by NanoDrop ND-1000 (Thermo Fisher Scientific).

Bisulfite conversion and MS-HRM experiment
After DNA extraction, 1000 ng of DNA was used in bisulfite conversion with innuCONVERT Bisulfite Basic Kit (Analytik Jena AG). Twenty ng of bisulfite converted DNA was used in methylation-sensitive high resolution melting experiment (MS-HRM). Primers for MS-HRM were designed in Methyl Primer Express Software v1.0 (Thermo Fisher Scientific) (Additional file 1: Table S1) to amplify both, methylated and unmethylated DNA. Amplicon length was designed to cover the specific CpG sites in the 5′ UTR region of selected genes differentially methylated from the bioinformatics analysis. For some genes one amplicon covers more than one CpG site. As controls, completely methylated and completely unmethylated commercially available bisulfite converted DNA (EpiTect PCR Control DNA Set, Qiagen) were used in each MS-HRM run, to help with assessment of methylation status of the samples. The amplification was performed using the following protocol: 2.00 μL bisulphite converted DNA, 1.00 μL of each primer, 0.50 μL dNTP, 1.00 μL HotStart Taq Plus Buffer (10x), 0.05 μL HotStart Taq Plus Polymerase (5 U/μL), and 0.3 μL Syto9 with Nuclease-free water to obtain a total PCR reaction volume of 10 μL. Optimized cycling protocol for HRM analysis on the Rotor-Gene Q (Qiagen) was preformed including: initial denaturation at 95°C for 5 min; 45 times at 94°C for 15 s, annealing temperature (Additional file 1: Table S1) for 30 s, extension at 72°C for 30 s (using Fluorescence data acquisition on the "HRM" channel at this step). HRM analysis was performed immediately after PCR under the following conditions: 60-99°C with 0.1°C ramp rate. This step requires fluorescence data acquisition on the "HRM" channel. All amplifications were performed in duplicate, using Rotor-Gene Q (Qiagen), following the manufacturer's recommendations.

Reverse transcription and qPCR experiment
Gene expression levels were determined using SYBR Green-based quantitative polymerase chain reaction T tumor size, N lymph node infiltration, Nx lymph node infiltration not determined, M distant metastasis, Mx distant metastasis not determined, n number of samples (qPCR), which was performed on Rotor-Gene Q (Qiagen) detection system. All the reagents were from Qiagen, except where otherwise indicated. For investigated genes and four endogenous controls (ACTB, GAPDH, RNN18S, and RPL13A) in qPCR experiment, primers were all predesigned and used according to manufacturer's instructions (Qiagen) (Additional file 2: Table S2). Total RNA (300 ng) was reverse transcribed using QuantiTect Reverse Transcription Kit according to manufacturer's instructions (Qiagen). The resulting cDNA was diluted 100-fold, and 3 μl was used for each qPCR reaction in 10 μl PCR master mix (5 μl 2x QuantiTect SYBR Green PCR Master Mix, 1 μl of forward and 1 μl of reverse primer). All the qPCR reactions were performed in duplicates or triplicates. The signal was collected at the endpoint of every cycle. Following amplification, melting curves analysis of PCR products were acquired on the SYBR channel using a ramping rate of 1°C /60 s for 60-95°C. For expression calculation, geometrical average of threshold cycle (C t ) of four endogenous controls (ACTB, GAPDH, RNN18S, and RPL13A) was subtracted from ct of investigated gene to obtain the difference of threshold cycles ΔC t . The comparative threshold cycles (ΔΔC t ) were obtained by subtracting ΔC t of tumor sample from ΔC t of paired normal sample. The comparative threshold cycle is comparable with logFC, which is used for easier comparison with bioinformatics data.

Study design
The study consisted of two major partsbioinformatics analysis and experimental validation ( Fig. 1). Bioinformatics analysis was performed on samples from projects COAD and READ obtained from TCGA. Methylation data was collected by experiment with HM450, which is the most comprehensive methylation data collection available on TCGA. Methylation data were obtained on the HM450 platform of 381 tumor tissue samples, together with 45 normal samples. HM450 covers 482,421 CpG sites within the genome, which were mapped to regulatory regions that are likely to be involved in gene regulation: the open chromatin region, predicted enhancer region, predicted promoter, predicted promoter flanking region and transcription factor binding site. Altogether we obtained 190,920 probes located in 81,467 regulatory regions. Specifically, 14,718 probes mapped into the open chromatin region, 9513 into the enhancer, 122,576 into the promoter, 42,150 into the promoter flanking region, and 13,670 probes mapped into the transcription factor binding site. Some regulatory regions can overlap, so some probes belong to more than one regulatory region.
Unsupervised clustering on methylation data resulted in four clusters. Probes in samples of each cluster were compared to probes of normal samples to obtain significant differentially methylated probes for each cluster. Differentially methylated probes with significant p-values of each cluster were intersected among all clusters. The location of the intersected probes was compared to a list of 190,920 probes in regulatory regions, which resulted in 3513 probes that were both in intersect among clusters and in regulatory regions. Gene expression data of the same samples were available from an Illumina mRNAseq V2 experiment. The second part of the study consisted of experimental validation of six selected genes. For this purpose, we tested DNA methylation status on 115 tumor tissue samples, of which 25 were paired tumor and normal tissue samples of sufficient quality for both DNA methylation and gene expression experiment.

Clustering of methylation data
Unsupervised clustering analysis was performed on the methylation data of 381 samples from COAD and READ. The clustering resulted in four separate clusters, denoted CIMP-H, CIMP-L, Cluster 3 and Cluster 4, according to the names used in the literature [10] (Fig. 2). As established previously, CIMP-H CRCs have a higher rate of hypermethylated promoter of the MLH1 gene and higher mutation rate in gene BRAF. Similarly, we found MLH1 hypermethylation is present in 49.2% of samples in CIMP-H but only 13 samples out of 222 (5.8%) from the other three clusters combined (Table 1). Almost all mutations in gene BRAF were found in the CIMP-H cluster (35.6%), a few were found in CIMP-L (4.8%) and Cluster 3 (3.2%), while none were found in Cluster 4. Cluster CIMP-L was characterized by a high frequency of KRAS mutations, with rare mutations in BRAF, and low rate of TP53 mutations. Indeed, the rate of KRAS mutations in this cluster was the highest  (Table 2). CIMP-H and CIMP-L clusters are both associated with tumor presence in ascending colon, where in our case, tumor presence in the ascending colon was 73.6% in CIMP-H and 52.5% in CIMP-L.
The both non-CIMP clusters, Cluster 3 and Cluster 4, had lower frequencies in mutations in BRAF (3.2 and 0%) and KRAS (16.1 and 9.6%), respectively. There was a higher rate of TP53 mutations, 34.7% in Cluster 3 and 39.4% in Cluster 4. Both of these clusters had a higher rate of tumor presence in sigmoid colon, 27.6% of samples in Cluster 3 and 39.6% in Cluster 4, and rectum, 37.4% in Cluster 3 and 34.1% in Cluster 4. High microsatellite instability was more pronounced in CIMP-H (47.5%), while low microsatellite instability was most frequent in CIMP-L (22.1%). Microsatellite stability was predominant in Cluster 3 (82.3%) and Cluster 4 (77.4%) but it is also quite high in CIMP-L cluster (65.4%). Fig. 2 shows there is no distinct feature (i.e., mutation, promoter methylation) common to all samples in any cluster or common to all clusters.

Aberrantly methylated probes and differentially expressed genes
Our analysis resulted in 590 aberrantly methylated probes found at the intersect between clusters and mapped to regulatory regions. These probes belong to 198 differentially expressed genes, which were differentially expressed in each cluster when compared to normal tissue samples (Additional file 3: Table S3). Using these 198 protein-coding genes, we performed protein-protein interaction network (PPIN), functional and literature mining analysis.
The 198 differentially expressed genes were uploaded to the STRING database to construct a PPIN (Additional file 4: Figure S1). Since genes were selected based on aberrant methylation and differential gene expression  CIMP the CpG island methylator phenotype, n number of samples, MUT mutation, WT wild-type, MSI microsatellite instability, MSS microsatellite stable, T tumor size, N lymph node infiltration, Nx lymph node infiltration not determined, M distant metastasis, Mx distant metastasis not determined present in all four clusters, some of the proteins coded by those genes are connected to networks, while others do not interact. To get some information about the biological functions of selected genes, we conducted gene ontology and pathway analysis (Additional file 5: Table S4). Pathway analysis reviled 12 significant KEGG pathways, of which the first three were neuroactive ligand-receptor interaction, cholinergic synapse and circadian entrainment. The circadian entrainment pathway has previously been associated with rectum adenocarcinoma [23]. We therefore selected two genes, GRIA4 and GNAO1, involved in this pathway, as well as gene KCNA5. KCNA5 is indirectly associated with this pathway through neighboring proteins KCNIP1 to NOS1. According to our PPIN analysis, there is another set of proteins with many interactions, of which BMP4 is the hub, which is involved in Hedgehog and TGF-beta signaling pathways. Moreover, our literature mining analysis revealed that the expression of gene BMP4 had already been studied (Additional file 6: Table S5), so we decided to select the genes FOXD3 and FOXF2 for experimental validation, whose proteins interact with BMP4 and both of which are transcription factors. We selected gene CEP55 on the basis of hypomethylation/up regulation, which is involved in biological process of cell cycle. We constructed child PPIN with the six selected proteins described above, presented in Fig. 3, whereby b), c) and d) were constructed using the neighboring proteins that are coded by genes from our list, and a) was constructed with the first interacted protein added, since CEP55 had no interactions in PPIN constructed from our gene set. Gene CEP55 is involved in cell division and the cell cycle process, GNAO1 and GRIA4 participate in signal transduction, FOXD3 and FOXF2 take part in stem cell differentiation and embryonic organ development, while KCNA5 is involved in negative regulation of cytosolic calcium ion concentration, protein oligomerization and action potential.
According to our in silico analysis, gene CEP55 was hypomethylated and up-regulated, while the other five genes, FOXD3, FOXF2, GNAO1, GRIA4 and KCNA5, were hypermethylated and down-regulated ( Table 3). The genes expressed various levels of difference in methylation and expressions. The most down-regulated gene, regardless of cluster, was FOXD3 (logFC = − 3.05). It was down-regulated in 98.3% of samples. The methylation difference was high (0.4), although present in fewer samples (91.9%). The highest methylation difference was present in all four promoter probes of gene GRIA4 (from 0.46 to 0.54), with two of them present in 98.4% of samples. Regardless of cluster, GRIA4 was down-regulated (logFC = − 2.41) in 98.1% of samples.
We evaluated the aberrant methylation of regulatory region per sample. In cases of GNAO1 and GRIA4, with which there was more than one methylation probe per gene, at least one of the probes had to be hypermethylated to conclude that the section was hypermethylated. No sample had less than two markers aberrantly methylated. There were two samples (0.5%) with two markers and six samples (1.6%) with three markers aberrantly methylated, 97.9% samples had at least four markers aberrantly methylated. Specifically, 33 samples (8.7%) had four, 88 samples (23.1%) had five and 252 samples (66.1%) had all six markers aberrantly methylated.
To test the clinical data available in TCGA, we performed survival analysis using the Cox proportional hazards model (Table 4). Using univariate analysis, we found that late stage compared to early stage tumor has the highest hazard ratio. The second highest hazard ratio was observed with the presence metastasis, followed by presence of polyps and age above 60. By multivariate analysis, we obtained two significant hazard ratios, presence of metastasis and age above 60. Overall model was significant, with p-value 2.475e-06.

Experimental validation
Based on the bioinformatics analysis results described above, we experimentally validated six selected genes. Experimental validation of the methylation results of the in silico analysis was performed using a larger cohort of samples (n = 115). Results revealed CEP55 to be hypo- Methylation analysis on the same cohort of samples (n = 25) revealed that, in CRC compared to normal mucosa, gene CEP55 was completely hypomethylated and up-regulated. Gene GRIA4 had one sample hypomethylated and slightly up-regulated, all other samples were hypermethylated and down-regulated. Gene GNAO1 had one sample that was hypermethylated and up-regulated, all the other samples were hypermethylated and down-regulated. We obtained mixed results for FOXD3, FOXF2, and KCNA5 genes. FOXD3 had one sample hypomethylated and down-regulated, while seven samples were hypermethylated and up-regulated, the rest being hypermethylated and down-regulated. FOXF2 had two samples hypomethylated and up-regulated, four hypermethylated and up-regulated, while the others were hypermethylated and down-regulated. KCNA5 had no hypomethylated samples; four samples were hypermethylated and up-regulated, while the rest were hypermethylated and down-regulated.
Comparing the expression and methylation status of RNAlater stored samples (n = 25) showed that gene GNAO1 had the most down-regulated gene expression and was also hypermethylated in all samples (Table 5). Gene GRIA4 showed hypermethylation and down-regulation in 97.3 and 95.2% of samples, respectively. Up-regulation in all samples and hypomethylation in 97.3% of samples was observed in gene CEP55. The performance of other genes in terms of the correlation of their expression to methylation status was less encouraging, with FOXD3, FOXF2, and KCNA5 exhibiting down-regulation in fewer than 90% of samples, even only 68% for FOXF3.
To test our clinical data, we performed survival analysis using the Cox proportional hazards model (Table 6). Using univariate analysis, we found that the presence of metastasis had the highest hazard ratio, followed by cancer progression, both with the highest significance. The next two highest hazard ratios were tumor size and lymph node infiltration, with lower significance than the previous two. By multivariate analysis, we obtained four significant hazard ratios, presence of metastasis, age above 60, lymph node infiltration and cancer progression. The overall model had a p-value significance of 5.453e-04.

Discussion
The vast database of experimental data (TCGA) was used in our bioinformatics study. Interestingly, we observed that some samples have no mutations in the most commonly mutated tumor suppressors and oncogenes In search of a common epigenetic change, we performed bioinformatics study of methylation and expression analysis based on TCGA data, which resulted in 198 sets of genes. We performed pathway and gene ontology analysis on these genes. Pathway analysis resulted in 12 significant KEGG pathways, of which the first four were neuroactive ligand-receptor interaction, cholinergic synapse, circadian  entrainment and calcium signaling pathway. These pathways included altogether 26 of our genes. Gene Ontology resulted in 137 significant biological processes, in which all the pathways together included 155 genes from our list. The first most significant biological process pathways were mostly related to the nervous system and its development in one part, and in the other part pathways were related to cell differentiation, adhesion and development. Gene ontology molecular function revealed that genes in our selected set are involved in receptor activity, whereas the gene ontology cellular component showed that most of genes in our set are part of plasma membrane. Among prominent identified genes were CEP55, involved in the cell cycle process, FOXD3 and FOXF2, which are involved in stem cell differentiation, GNAO1 and GRIA4, which participate in signal transduction and KCNA5, which is a part of the regulation of the calcium ion concentration. We experimentally validated these six genes on our own CRC tissue samples, confirming the prediction of expression and methylation status. Using both approaches, we found gene CEP55 to be hypomethylated and up-regulated, while the other five genes, FOXD3, FOXF2, GNAO1, GRIA4 and KCNA5, were hypermethylated and down-regulated. A number of studies have already described CEP55 as an overexpressed gene in cancer tissue samples. It maps to chromosomal regions 10q23 and encodes centrosome-and midbody-associated protein [24]. It is the latest member discovered in the centrosomal relative protein family and it has an important role in cell mitosis [25]. Overexpression of gene CEP55 has been observed in variety of solid tumors, including colon cancer [26], bladder cancer [27], hepatocellular carcinoma [28], gastric cancer [29], esophagus adenocarcinoma [30] and ovarian carcinoma [24]. A previous study reported overexpression of CEP55 in 60% (9/15 samples) of CRC tissue samples [26]. Overexpression of CEP55 activates p21 and enhances the cell cycle transition. In contrast, the knockdown of CEP55 inhibits cell growth in gastric [29] and breast cancer [31]. Moreover, CEP55 has an important role in final stage division, which involves the separation of two daughter cells [25,32]. Overexpression of CEP55 leads to an increase in the number of multinucleated cells and defect in cytokinesis, which may lead to tumorigenesis. In our set of genes, FOXD3 and FOXF2 have had a few studies performed on colon or gastric cancers. First, forkhead box D3 (FOXD3) was found to be a suppressor of colon cancer formation. While transcriptional repressor FOXD3 is expressed in many types of embryonic cells, its knockdown dramatically increases human colon cancer cell proliferation, affecting the EGFR-Ras-Raf-MEK-ERK signaling pathway [33]. Methylation in the promoter region of another tumor suppressor FOXF2 has previously been associated with shorter survival in gastric cancer patients. Through the FOXF2-IRF2BPL-β-catenin axis, FOXF2 inhibits Wnt signaling by binding to E3 ligate IRF2BPL promoter and up-regulates IRF2BPL, which interacts with β-catenin for its ubiquitination and degradation [34]. Methylation of both tumor suppressors, FOXD3 and FOXF2, could be responsible for their down-regulation, thus disturbing their interaction with other proteins. The second set of three genes, GNAO1, GRIA4 and KCNA5, has been less researched, with only a few studies related to cancer. Gene GNAO1 was found to be overexpressed in 62.9% of patients with gastric cancer [35], while in our CRC tissue samples gene GNAO1 was down-regulated. An association had been shown between overexpressed gene GNAO1 and tumor size, tumor differentiation, TNM stage and poor prognosis. Their findings also demonstrated that knockdown of GNAO1 leads to reduced proliferation and promotes the apoptosis of gastric cancer cells [35]. However, statistical evaluation of an effect of methylation status or expression of gene GNAO1 on tumor size and TNM status in our case is impossible, since gene methylation was observed in the majority of samples. The second gene from this set, GRIA4, was the most methylated gene in our in silico study, with two probes being present in 98.4% of samples. Moreover, its down-regulation was experimentally confirmed in 98.1% of CRC tissue samples. A recently published study reported detecting a change in methylation in all CRC tissue samples, results similar to ours, and methylated cfDNA of GRIA4 in 68.5% of metastatic CRC patients [36]. Potassium voltage-gated channel subfamily A member 5 (KCNA5) is a protein coding gene involved in tumor cell proliferation in Ewing sarcoma [37], while its role in CRC is still unknown. In our study, gene KCNA5 was methylated in all studied samples, while its expression was decreased in 84% of our CRC tissue samples. Speculatively, as well as Ewing sarcoma, methylation of KCNA5 could be responsible for stable silencing of this gene in CRC, thus contributing to proliferation of tumor cells.
We were not able to perform survival analysis, since the majority of samples had either a hypermethylated or hypomethylated promoter region of validated gene. Since expression analysis was performed on a small cohort of samples (n = 25), it did not seem reasonable to do survival analysis, e.g., for gene FOXD3, with which 68% of tumors had down-regulation of these gene and the remaining 32% had either no change or up-regulation.
A limitation of the study was that there were 115 samples available for experimental methylation analysis and only 25 for experimental expression analysis. There were also small discrepancies when comparing the methylation statuses of the entire cohort of 115 samples and 25 samples ( Table 4). The biggest difference in methylation status was observed in the FOXD3 gene, with which the discrepancy was 4%. Methylation status of CEP55, FOXF2, GRIA4 and KCNA5, exhibited 2.7, 2.2, 1.3 and 0.9% discrepancy, while GNAO1 was methylated in all samples, so showing no discrepancy.