Frequent hypermethylation of orphan CpG islands with enhancer activity in cancer
© The Author(s). 2016
Published: 12 August 2016
CpG islands (CGIs) are interspersed DNA sequences that have unusually high CpG ratios and GC contents. CGIs are typically located in the promoter of protein-coding genes. They normally lack DNA methylation but become hypermethylated and induce repression of associated genes in cancer. However, the biological functions of non-promoter CGIs (orphan CGIs) largely remain unclear.
Here, we identify orphan CGIs that do not map to the promoter of any protein-coding or non-coding transcripts but possess chromatin and transcriptional marks that reflect enhancer activity (termed eCGIs). They exhibit three-dimensional chromatin looping toward multiple target genes with high affinity. Intriguingly, transcription regulators were frequently associated with such CGI-containing enhancers. Remarkably, our analyses in cell lines and clinical tissues showed that eCGIs have more dynamic DNA methylation changes in cancer relative to promoter CGIs. The observed eCGI hypermethylation was accompanied by a loss of enhancer marks and transcriptional inactivation of the target genes.
Our results suggest that eCGIs may constitute a distinct class of enhancers and perform a more instrumental role in tumorigenesis than typical CGIs in gene promoters.
CpG dinucleotides are frequently methylated in vertebrate genomes. Although a significant portion of the genome is methylated at CpG sites, CGIs are usually unmethylated and remain transcriptionally active with active histone marks such as H3K4me3 as a result of the action of CxxC finger protein 1 (CFP1) [1–4]. Half of these CGIs are located in gene promoters and play an important role in development and cancer. For example, important developmental genes have a promoter that often coincides with a CGI and contains a bivalent domain consisting of both active (H3K4me3) and repressive (H3K27me3) histone marks. Those genes that have the bivalent promoter are marginally expressed in embryonic stem cells but increase in expression level via removal of the H3K27me3 mark during cell differentiation. Furthermore, the hypermethylation of promoter CGIs has been identified as one of the driving factors in cancer development because it represses the expression of tumor suppressor genes. This phenomenon was first reported in the promoter of tumor suppressor genes in colorectal cancer and has been confirmed in many cancer types. In addition to promoter CGI hypermethylation, whole genome bisulfite sequencing has recently revealed partially methylated domains and large hypomethylated domains in cancer.
CGIs remote from annotated promoters, located in intergenic or intragenic regions, exhibit variable tissue-specific methylation patterns [8, 9]. These non-promoter CGIs are named orphan CGIs, and account for about half of all CGIs in the human genome. Although these orphan CGIs are distal to annotated promoters, some features are shared with promoter CGIs: marking of H3K4me3, binding of Pol2, and production of transcripts, as indicated by Cap Analysis of Gene Expression (CAGE). Recent studies suggest that these orphan CGIs may function as miRNA promoters, and therefore the presence of an orphan CGI is an important indicator of the activity of miRNA promoters. Meanwhile, intragenic CGIs are known to act as alternative promoters of the genes they reside in.
Although these recent studies propose that orphan CGIs may function as promoters, here we show that not all orphan CGIs produce transcripts, as judged by transcription start sites indicated by CAGE and RNA-seq. To understand the biological features and functions of the orphan CGIs that do not produce any noncoding transcripts, we perform an integrative analysis that entails a large amount of publicly available genomic, transcriptomic, and epigenomic data based on K562, Mcf7, and Hmec cell lines.
ENCODE data processing
Various histone modification, transcriptome, chromatin interactome, and DNA methylation data were downloaded from the ENCODE data portal (https://www.encodeproject.org). We downloaded bam files for various histone modifications including H3K4me1, H3K4me2, H3K4me3, H3K27ac, H3K27me3, H3K9me1, H3K9me3, H3K9ac, H3K79me2, H3K36me3, and H4K20me1. DNase I hypersensitivity site (DHS) data and transcription factor binding data for P300, Pol2, CTCF, RAD21, SMC3, YY1, and ZNF143 were obtained as well. Peak finding for histone modifications and DHSs was performed using the HOMER package with the -size 1000 and -minDist 2500 options.
CAGE and RNA-seq data were used to identify functional transcripts. We used the transcription start sites defined by CAGE. RNA-seq fastq files were aligned using TopHat, and de-novo transcripts were predicted by running StringTie with its default options. Gene expression in each cell line was then determined based on Reads Per Kilobase per Million (RPKM).
Chromatin interactomes in the K562 and Mcf7 cell lines were analyzed based on Chromatin Interaction Analysis by Paired-End Tag (ChIA-PET) sequencing data for RNA polymerase II (Pol2). To retain significant interactions only, we kept interactions with tag counts greater than or equal to 3.
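The RPKM normalization mentioned above can be illustrated with a minimal sketch. The function name and arguments are hypothetical and are not taken from the pipeline used in this study; the formula is the standard definition of reads per kilobase of transcript per million mapped reads.

```python
def rpkm(read_count: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads Per Kilobase of transcript per Million mapped reads.

    Hypothetical helper for illustration only:
    RPKM = reads * 1e9 / (gene length in bp * total mapped reads).
    """
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Example: 500 reads on a 2 kb gene in a library of 10 million mapped reads.
example_rpkm = rpkm(500, 2000, 10_000_000)
```

Dividing by both gene length and library size makes expression values comparable across genes and across cell lines.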
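The tag-count filter can be sketched as follows. The `Interaction` record and its field names are assumptions made for this illustration; only the cutoff of 3 tags comes from the text.

```python
from typing import List, NamedTuple, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end)


class Interaction(NamedTuple):
    """Hypothetical record for one ChIA-PET interaction between two anchors."""
    anchor_a: Interval
    anchor_b: Interval
    tag_count: int


def filter_interactions(interactions: List[Interaction], min_tags: int = 3) -> List[Interaction]:
    """Keep interactions supported by at least `min_tags` paired-end tags."""
    return [i for i in interactions if i.tag_count >= min_tags]


raw = [
    Interaction(("chr1", 100, 600), ("chr1", 9000, 9500), 2),
    Interaction(("chr1", 100, 600), ("chr1", 20000, 20500), 3),
    Interaction(("chr2", 500, 900), ("chr2", 7000, 7400), 7),
]
kept = filter_interactions(raw)  # drops the first interaction (2 tags)
```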
Classification of CGIs
To classify CGIs based on gene annotation, we selected genes whose refGene ID starts with “NM”. The CGIs that are located within 1 kb of the transcription start site of the relevant genes were labeled as promoter CGIs (pCGIs), and the rest as orphan CGIs. The orphan CGIs that overlap with both H3K27ac and DHS peaks in a given cell type were then determined as active orphan CGIs. By checking whether the active orphan CGIs overlap with the transcription start sites defined in the CAGE data and the promoter of de-novo transcripts constructed from the RNA-seq data using StringTie, we defined eCGIs as not producing any protein-coding or non-coding transcripts, and npCGIs (noncoding promoter CGIs) as producing non-coding transcripts.
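The classification rules above can be sketched in a few lines. This is an illustration under stated assumptions rather than the code used in the study: intervals are half-open, "within 1 kb" is measured from the CGI boundaries to the TSS position, overlap with a CAGE or de-novo transcript start is simplified to the start position falling inside the CGI, and the label "inactive orphan CGI" is a hypothetical name for orphan CGIs lacking H3K27ac or DHS peaks.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), half-open
Site = Tuple[str, int]           # (chromosome, position)


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two intervals on the same chromosome share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def near_site(cgi: Interval, site: Site, window: int) -> bool:
    """True if the site lies inside the CGI extended by `window` bp on each side."""
    return cgi[0] == site[0] and cgi[1] - window <= site[1] < cgi[2] + window


def classify_cgi(cgi: Interval,
                 coding_tss: List[Site],
                 h3k27ac_peaks: List[Interval],
                 dhs_peaks: List[Interval],
                 other_tss: List[Site],
                 window: int = 1000) -> str:
    """Assign a CGI to pCGI, npCGI, eCGI, or inactive orphan CGI."""
    # Promoter CGIs: within 1 kb of a protein-coding (NM) transcription start site.
    if any(near_site(cgi, t, window) for t in coding_tss):
        return "pCGI"
    # Active orphan CGIs must overlap both an H3K27ac peak and a DHS peak.
    active = (any(overlaps(cgi, p) for p in h3k27ac_peaks)
              and any(overlaps(cgi, p) for p in dhs_peaks))
    if not active:
        return "inactive orphan CGI"
    # Active orphan CGIs with a CAGE or de-novo transcript start are npCGIs.
    if any(near_site(cgi, t, 0) for t in other_tss):
        return "npCGI"
    return "eCGI"
```

The order of the checks matters: promoter status is decided first from annotation alone, and the transcript evidence is consulted only for orphan CGIs that already carry active chromatin marks.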
Typical enhancers were defined as H3K27ac-harboring DHS peaks that do not overlap with the transcription start site of de-novo transcripts detected from the CAGE and RNA-seq data. We also excluded the H3K27ac-DHS peaks intersecting with any CGIs. Using this method, we found 9282, 18,528, and 20,332 typical enhancers in K562, Mcf7, and Hmec, respectively.
Target gene analysis
To check the function of the target genes of the eCGIs, we used the ‘functional annotation clustering’ of the Database for Annotation, Visualization and Integrated Discovery (DAVID) with the default options. The annotation clusters with the highest enrichment scores seem to be related to transcription.
To confirm that the eCGIs target transcription regulators, we used the list of 1469 sequence-specific transcription factors, 117 chromatin regulators, and 296 transcription-related factors as defined in the AnimalTFDB data for Homo sapiens. We obtained the number of transcription regulators that are linked via chromatin interaction to the eCGIs or typical enhancers. To estimate the statistical significance of the overlap, we selected the same number of random DNA segments as the eCGIs and typical enhancers in each cell type, and compared their overlap frequencies with those of the real eCGIs and typical enhancers.
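A random-background comparison of this kind can be sketched as below. The text does not specify how the random segments were drawn or how significance was summarized, so both functions are assumptions: segments are sampled uniformly with chromosomes weighted by length, and significance is reported as an empirical p-value with a pseudocount.

```python
import random
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end)


def sample_random_segments(chrom_sizes: Dict[str, int], n: int, length: int,
                           rng: random.Random) -> List[Interval]:
    """Draw n random segments of a fixed length (hypothetical sampling scheme).

    Chromosomes are chosen with probability proportional to their size, and
    every chromosome is assumed to be at least `length` bp long.
    """
    chroms = list(chrom_sizes)
    weights = [chrom_sizes[c] for c in chroms]
    segments = []
    for _ in range(n):
        chrom = rng.choices(chroms, weights=weights, k=1)[0]
        start = rng.randrange(0, chrom_sizes[chrom] - length + 1)
        segments.append((chrom, start, start + length))
    return segments


def empirical_p_value(observed: int, null_counts: List[int]) -> float:
    """Fraction of random sets with a count at least as large as the observed one."""
    at_least = sum(1 for c in null_counts if c >= observed)
    return (at_least + 1) / (len(null_counts) + 1)
```

In use, each random segment set would be passed through the same chromatin-interaction lookup as the real eCGIs, and the resulting counts of linked transcription regulators would form `null_counts`.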
DNA methylation analysis
To investigate DNA methylation changes associated with the eCGIs, we used a breast-derived normal and cancer cell line pair (Hmec and Mcf7) and data from The Cancer Genome Atlas (TCGA; http://cancergenome.nih.gov). Heatmaps were generated using differentially methylated CpGs of the Hmec eCGIs in both cell lines and clinical data. The threshold for differentially methylated CpGs was set to |differential methylation of CpGs| > 0.5 in the cell line data and |differential methylation of CpGs| > 0.1 in the clinical data. To determine the target genes of the Hmec eCGIs, we used the genes nearest to the eCGIs because Hmec ChIA-PET data were not available. Enrichment tests of tumor suppressor genes and oncogenes were performed using Fisher's exact test. Tumor suppressor genes and oncogenes used in this study were generated by the TUSON algorithm. We extracted tumor suppressor genes (484) and oncogenes (494) with a low p-value (<0.1).
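The thresholding of differentially methylated CpGs can be sketched as follows. The function name, the dictionary layout, and the "hyper"/"hypo" labels are assumptions for illustration; methylation levels are assumed to be fractions between 0 and 1, and only the cutoffs of 0.5 (cell lines) and 0.1 (clinical data) come from the text.

```python
from typing import Dict


def differential_cpgs(normal: Dict[str, float], cancer: Dict[str, float],
                      threshold: float) -> Dict[str, str]:
    """Label CpGs whose methylation differs by more than `threshold`.

    Only CpGs measured in both samples are compared. A positive difference
    (cancer minus normal) is labeled "hyper", a negative one "hypo".
    """
    calls = {}
    for cpg in normal.keys() & cancer.keys():
        delta = cancer[cpg] - normal[cpg]
        if abs(delta) > threshold:
            calls[cpg] = "hyper" if delta > 0 else "hypo"
    return calls


# Hypothetical methylation fractions for three CpGs.
normal = {"cpg_a": 0.10, "cpg_b": 0.80, "cpg_c": 0.40}
cancer = {"cpg_a": 0.90, "cpg_b": 0.20, "cpg_c": 0.55}
cell_line_calls = differential_cpgs(normal, cancer, threshold=0.5)
clinical_calls = differential_cpgs(normal, cancer, threshold=0.1)
```

The looser clinical cutoff picks up the smaller shift at the third CpG, which the cell line cutoff ignores.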
Classification of CGIs
This new type of CGI does not show the genomic characteristics of the pCGIs. First, the eCGIs are shorter in length than the pCGIs (Additional file 1: Figure S1A). Longer pCGIs may have been favored during evolution because they are well-suited for multiple transcription factors to bind and are related to promoter directionality [16, 17]. Although well-established CGI criteria based on the CpG ratio and GC percent were used, the sequence contents of the pCGIs and eCGIs appear to be different. Compared to the pCGIs, the eCGIs have a lower CpG ratio and CpG percent, and the GC percent differs statistically significantly (Additional file 1: Figure S1B). In other words, the eCGIs have a higher frequency of C and G, but the ratio of CpG sites, which can be methylated, is lower in the eCGIs than in the pCGIs. Although specific mechanisms leading to this discrepancy are currently unknown, it is evident that the eCGIs have distinct genomic features as compared with the pCGIs.
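For readers unfamiliar with these sequence measures, the sketch below computes GC percent and a CpG ratio for a DNA sequence. The text does not give the exact formulas used, so this assumes the commonly used observed-to-expected CpG ratio, (number of CpG × sequence length) / (number of C × number of G); the function name is hypothetical.

```python
from typing import Dict


def cgi_sequence_stats(seq: str) -> Dict[str, float]:
    """Return GC percent and observed/expected CpG ratio for a DNA sequence."""
    s = seq.upper()
    n = len(s)
    if n == 0:
        raise ValueError("sequence must not be empty")
    c = s.count("C")
    g = s.count("G")
    cpg = s.count("CG")  # CpG dinucleotides cannot overlap each other
    gc_percent = 100.0 * (c + g) / n
    # Expected CpG count under independence is (C count * G count) / length.
    cpg_ratio = (cpg * n) / (c * g) if c and g else 0.0
    return {"gc_percent": gc_percent, "cpg_ratio": cpg_ratio}


stats = cgi_sequence_stats("ACGTCGCG")
```

A sequence can therefore have a high GC percent and still a modest CpG ratio if its C and G bases rarely occur as adjacent CpG pairs, which is the pattern described for the eCGIs.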
Enhancer signatures of eCGIs
Next, we examined the binding level of Pol2. Studies indicate that Pol2 transcribes not only mRNA but also noncoding RNA, and that it binds at enhancer regions as well. While the strongest Pol2 binding was observed at the pCGIs, the Pol2 binding levels were similar in the eCGIs and typical enhancers (Fig. 2).
Additionally, we examined the distribution of various histone modifications. H3K4me1 and H3K4me3 are well-established enhancer and promoter markers, respectively. We found that H3K4me1 was enriched in the typical enhancers, whereas H3K4me3 was enriched in the pCGIs. In eCGIs, H3K4me1 was highly enriched, to a degree comparable with the typical enhancers (Fig. 2c). Although the midpoint of the eCGIs showed a signature of nucleosome depletion, the H3K4me1 levels were higher at the boundary areas. The H3K4me2 and H3K4me3 distributions showed intermediate values between the pCGIs and typical enhancers (Fig. 2d, e). Previous studies suggest that CFP1 binds to sequences with high CpG contents and recruits SETD1, which causes trimethylation of H3K4. This might explain the high H3K4me3 in eCGIs regardless of their promoter activity. Taken together, the results indicate that the eCGIs are similar to the typical enhancers in terms of chromatin signatures.
We also examined other histone modifications. H3K9ac, an active promoter marker, was high in the pCGIs. H3K79me2, an elongation marker that is strongly enriched in the first intron, was also high in the pCGIs. H3K9me1, H4K20me1, and H3K36me3 showed marginal enrichment in the typical enhancers. Because these three histone marks are related to transcription elongation, this may reflect typical enhancers residing in the gene body. Repressive markers such as H3K27me3 and H3K9me3 did not show any enrichment patterns (Additional file 1: Figure S2A).
For the eCGIs to have an enhancer function, they should interact with the transcription start site or the promoter of their target genes. Five proteins, CTCF, RAD21, YY1, ZNF143, and SMC3, are known to govern such chromatin interactions. All five proteins were enriched in the eCGIs; in particular, CTCF and SMC3 signals were much stronger in the eCGIs than in the pCGIs and the typical enhancers (Additional file 1: Figure S2B).
Transcriptional activity of eCGIs
eCGIs as a distinct class of enhancers
Previous studies have identified super enhancers, also known as stretch enhancers, which are large clusters of transcriptional enhancers that drive expression of genes that define cell identity [20, 21]. We checked whether the eCGIs we identified here coincide with the super enhancers. In terms of physical overlap in K562, only a small fraction (9.7%) of the eCGIs was previously identified as super enhancers (Additional file 1: Figure S4A). H3K27ac is the major histone mark that is used to identify super enhancers. The H3K27ac levels were much lower in the eCGIs than in the super enhancers (Additional file 1: Figure S4B). Taken together, the results suggest that the eCGIs constitute a new type of enhancer that is different from typical enhancers or super enhancers.
Dynamic tumorigenic changes of DNA methylation at eCGIs
To test whether the oncogenic DNA methylation changes in the eCGIs affect regulatory activities and ultimately gene expression levels, we selected eCGIs with a > 0.5 DNA methylation increase in Mcf7 compared to Hmec, and those with a > 0.1 DNA methylation increase in clinical cancer samples compared to normal samples. The target gene expression level of hypermethylated eCGIs was lower in Mcf7 and clinical cancer data, suggesting transcriptional silencing effects of eCGI hypermethylation in cancer (Fig. 5c). To study the mechanism of this transcriptional silencing, we used the DHS and H3K27ac signals in Hmec and Mcf7. The hypermethylation of these eCGIs was accompanied by a significant reduction in the DHS and H3K27ac signals (Fig. 5d), suggesting that some eCGIs function as enhancers in normal cells but lose their enhancer function due to DNA hypermethylation in cancer. To test whether genes silenced by eCGI hypermethylation are tumor suppressor genes, we performed Fisher's exact test using 484 predicted tumor suppressor genes and 494 predicted oncogenes (Fig. 5e). The enrichment of tumor suppressor genes among the targets of hypermethylated eCGIs was higher than the enrichment of tumor suppressor genes among the targets of random regions and the enrichment of oncogenes among the targets of eCGIs. This indicates that eCGI hypermethylation may inactivate tumor suppressor genes by removing enhancer activity.
An enhancer is a distal regulatory region that activates the expression of remote genes. A global epigenome study reveals that enhancers usually are bound by P300 proteins, and possess H3K4me1 and H3K27ac marks but not H3K4me3, which is known as a promoter marker. However, recent studies show that some active enhancers possess H3K4me3. Also, eRNAs that are made by polymerase II bound at enhancer regions stabilize enhancer-promoter looping. Based on the fact that eRNAs are transcribed bidirectionally, active enhancers were detected using the bidirectional CAGE distribution through FANTOM5.
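The enrichment test can be illustrated with a small, self-contained implementation of a one-sided Fisher's exact test. The sidedness of the test and the layout of the contingency table are not given in the text, so both are assumptions here: rows separate genes targeted by hypermethylated eCGIs from all other genes, and columns separate tumor suppressor genes from the rest.

```python
from math import comb


def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns the hypergeometric probability of observing a count of at least
    `a` in the top-left cell, given the fixed row and column totals.
    """
    row1 = a + b
    col1 = a + c
    total = a + b + c + d
    denominator = comb(total, col1)
    upper = min(row1, col1)
    numerator = sum(comb(row1, k) * comb(total - row1, col1 - k)
                    for k in range(a, upper + 1))
    return numerator / denominator


# Hypothetical counts: 3 of 4 targeted genes are tumor suppressors,
# versus 1 of 4 non-targeted genes.
p_value = fisher_exact_greater(3, 1, 1, 3)
```

With real data the first row would hold the target genes of hypermethylated eCGIs, and the same function could be applied to oncogenes or to random regions to obtain the comparison values.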
CpG islands are DNA sequences having high CpG ratios and GC contents. About half of CpG islands are located around the transcription start sites of protein-coding genes. Promoters with CpG islands have higher expression than non-CpG island promoters, and housekeeping genes usually have promoter CpG islands. These promoter CpG islands are usually hypermethylated during tumorigenesis and inactivate tumor suppressor genes. However, the other half of CpG islands, which are not located at promoter regions, have not been studied extensively. Some studies reported that some orphan CpG islands are promoters of noncoding RNAs, such as microRNA and lncRNA, through CAGE and RNA-seq analyses. Further study is still required.
Here we identified that some of these orphan CGIs possess the characteristics of enhancers, including H3K4me1, H3K27ac, P300 binding, three-dimensional interaction, and transcriptional activity. These eCGIs differ from the typical promoter CGIs not only in terms of genomic features, such as CGI size and sequence contents, but also in terms of epigenomic features such as the intensity patterns of particular histone modifications. The enhancers harboring a CGI were also different from typical enhancers. They are capable of interacting with multiple target genes with a higher chromatin interaction affinity. Intriguingly, most of these targeted genes are transcription regulators. Thus, the eCGIs appear to play a more important role in the regulatory network. Importantly, the eCGIs tend to be hypermethylated during cancer development in both cell lines and clinical breast cancer samples. Although this is a well-established feature of the typical promoter CGIs, the degree of DNA methylation change is greater for the eCGIs than for the typical CGIs. Following eCGI hypermethylation, enhancer signatures disappear and the target genes are downregulated.
We identified eCGIs using various epigenomic and transcriptomic features based on H3K27ac sequencing, RNA-seq, and CAGE data. This method may produce false positives even though the eCGIs that we found have enhancer activity. To overcome this problem, STARR-seq, which can detect enhancers quantitatively, may be useful for finding eCGIs. We also found that eCGIs are frequently hypermethylated in both cell lines and clinical tissues. This suggests that orphan CGIs with enhancer activity in a given cell type should be considered a novel biomarker in cancer diagnosis and treatment.
Publication of this article has been funded by the Ministry of Health and Welfare through the Korea Health Industry Development Institute [HI13C2143], by the KAIST Future Systems Healthcare Project, and by the Ministry of Science, ICT and Future Planning [2013M3A9C4078139; NRF-2015M3C9A4053251]. This article has been published as part of BMC Medical Genomics Volume 9 Supplement 1, 2016: Selected articles from the 5th Translational Bioinformatics Conference (TBC 2015): medical genomics. The full contents of the supplement are available online at https://bmcmedgenomics.biomedcentral.com/articles/supplements/volume-9-supplement-1.
Availability of data and materials
No particular data to be deposited or shared.
MGB and JKC conceived the study. MGB and JYK performed data analysis under supervision of JKC. MGB, JYK, and JKC wrote the manuscript. All authors read and approved the final manuscript.
The authors declare that they have no competing interests.
Consent to publish
Ethics approval and consent to participate
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Guenther MG, Levine SS, Boyer LA, Jaenisch R, Young RA. A chromatin landmark and transcription initiation at most promoters in human cells. Cell. 2007;130:77–88.
- Mikkelsen TS, Ku M, Jaffe DB, Issac B, Lieberman E, Giannoukos G, et al. Genome-wide maps of chromatin state in pluripotent and lineage-committed cells. Nature. 2007;448:553–60.
- Illingworth RS, Gruenewald-Schneider U, Webb S, Kerr ARW, James KD, Turner DJ, et al. Orphan CpG islands identify numerous conserved promoters in the mammalian genome. PLoS Genet. 2010;6(9):e1001134.
- Thomson JP, Skene PJ, Selfridge J, Clouaire T, Guy J, Webb S, et al. CpG islands influence chromatin structure via the CpG-binding protein Cfp1. Nature. 2010;464:1082–6.
- Bernstein BE, Mikkelsen TS, Xie X, Kamal M, Huebert DJ, Cuff J, et al. A bivalent chromatin structure marks key developmental genes in embryonic stem cells. Cell. 2006;125:315–26.
- Esteller M. CpG island hypermethylation and tumor suppressor genes: a booming present, a brighter future. Oncogene. 2002;21:5427–40.
- Berman BP, Weisenberger DJ, Aman JF, Hinoue T, Ramjan Z, Liu Y, et al. Regions of focal DNA hypermethylation and long-range hypomethylation in colorectal cancer coincide with nuclear lamina-associated domains. Nat Genet. 2011;44:40–6.
- Maunakea AK, Nagarajan RP, Bilenky M, Ballinger TJ, D’Souza C, Fouse SD, et al. Conserved role of intragenic DNA methylation in regulating alternative promoters. Nature. 2010;466:253–7.
- Illingworth R, Kerr A, Desousa D, Jørgensen H, Ellis P, Stalker J, et al. A novel CpG island set identifies tissue-specific methylation at developmental gene loci. PLoS Biol. 2008;6:0037–51.
- Ozsolak F, Poling LL, Wang Z, Liu H, Liu XS, Roeder RG, et al. Chromatin structure analyses identify miRNA promoters. Genes Dev. 2008;22(22):3172–83.
- Marsico A, Huska MR, Lasserre J, Hu H, Vucicevic D, Musahl A, et al. PROmiRNA: a new miRNA promoter recognition method uncovers the complex regulation of intronic miRNAs. Genome Biol. 2013;14:R84.
- Rao SSP, Huntley MH, Durand NC, Stamenova EK, et al. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell. 2014;159:1–16.
- Pertea M, Pertea GM, Antonescu CM, Chang T-C, Mendell JT, Salzberg SL. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat Biotechnol. 2015;33:290–5.
- Zhang HM, Chen H, Liu W, Liu H, Gong J, Wang H, et al. AnimalTFDB: a comprehensive animal transcription factor database. Nucleic Acids Res. 2012;40:1–6.
- Chen K, Chen Z, Wu D, Zhang L, Lin X, Su J, et al. Broad H3K4me3 is associated with increased transcription elongation and enhancer activity at tumor-suppressor genes. Nat Genet. 2015;47:1149–57.
- Elango N, Yi SV. Functional relevance of CpG island length for regulation of gene expression. Genetics. 2011;187:1077–83.
- Almada AE, Wu X, Kriz AJ, Burge CB, Sharp PA. Promoter directionality is controlled by U1 snRNP and polyadenylation signals. Nature. 2013;499:360–3.
- Heintzman ND, Hon GC, Hawkins RD, Kheradpour P, Stark A, et al. Histone modifications at human enhancers reflect global cell-type-specific gene expression. Nature. 2009;459:108–12.
- Bernard F, Gelsi-Boyer V, Murati A, Giraudier S, Trouplin V, Adélaïde J, et al. Alterations of NFIA in chronic malignant myeloid diseases. Leuk Off J Leuk Soc Am Leuk Res Fund UK. 2009;23:583–5.
- Whyte W, Orlando DA, Hnisz D, Abraham BJ, Lin CY, Kagey MH, et al. Master transcription factors and mediator establish super-enhancers at key cell identity genes. Cell. 2013;153:307–19.
- Hnisz D, Abraham BJ, Lee TI, Lau A, Saint-André V, Sigova AA, et al. Super-enhancers in the control of cell identity and disease. Cell. 2013;155:934–47.
- Andersson R, Gebhard C, Miguel-Escalada I, Hoof I, Bornholdt J, Boyd M, et al. An atlas of active enhancers across human cell types and tissues. Nature. 2014;507:455–61.
- Arnold CD, Gerlach D, Stelzer C, Boryn LM, Rath M, Stark A. Genome-wide quantitative enhancer activity maps identified by STARR-seq. Science. 2013;339:1074–7.