Genome-wide methylation and expression profiling identifies promoter characteristics affecting demethylation-induced gene up-regulation in melanoma
© Rubinstein et al; licensee BioMed Central Ltd. 2010
Received: 4 September 2009
Accepted: 9 February 2010
Published: 9 February 2010
Aberrant DNA methylation at CpG dinucleotides represents a common mechanism of transcriptional silencing in cancer. Since CpG methylation is a reversible event, tumor suppressor genes that have undergone silencing through this mechanism represent promising targets for epigenetically active anti-cancer therapy. The cytosine analog 5-aza-2'-deoxycytidine (decitabine) induces genomic hypomethylation by inhibiting DNA methyltransferase, and is an example of an epigenetic agent that is thought to act by up-regulating silenced genes.
It is unclear why decitabine causes some silenced loci to re-express, while others remain inactive. By applying data-mining techniques to large-scale datasets, we attempted to elucidate the qualities of promoter regions that define susceptibility to the drug's action. Our experimental data, derived from melanoma cell strains, consist of genome-wide gene expression data before and after treatment with decitabine, as well as genome-wide data on un-treated promoter methylation status, and validation of specific genes by bisulfite sequencing.
We show that the combination of promoter CpG content and methylation level informs the ability of decitabine treatment to up-regulate gene expression. Promoters with high methylation levels and intermediate CpG content appear most susceptible to up-regulation by decitabine, whereas few of those highly methylated promoters with high CpG content are up-regulated. For promoters with low methylation levels, those with high CpG content are more likely to be up-regulated, whereas those with low CpG content are underrepresented among up-regulated genes.
Clinically, elucidating the patterns of action of decitabine could aid in predicting the likelihood of up-regulating epigenetically silenced tumor suppressor genes and others from pathways involved with tumor biology. As a first step toward an eventual translational application, we build a classifier to predict gene up-regulation based on promoter methylation and CpG content, which achieves a performance of 0.77 AUC.
Epigenetic abnormalities, including global losses and local gains in methylation, have been observed in many types of cancer, including melanoma [1–3]. It is thought that, while global hypomethylation may induce genomic instability early in cellular transformation, localized hypermethylation may promote tumorigenesis through silencing of tumor suppressor genes [4, 5]. Some genomic loci are more susceptible to such hypermethylation than others, and significant progress has been made in predicting which CpG islands will be subject to methylation on the basis of sequence motifs [6, 7].
A well-established relationship exists between promoter methylation and transcriptional repression [8–10]. There are two popular models for this phenomenon, the first positing that the methyl groups directly block the binding of transcription factors and the second citing the role of methyl-binding proteins that recruit transcriptional repressors to the methylated sites [12, 13]. Recently, Koga et al. used MeDIP combined with promoter tiling microarrays to evaluate genome-wide methylation levels and transcriptional regulation in a set of melanoma cell strains. Their analysis confirmed that, for promoters containing a minimum number of CpG dinucleotides, increased methylation caused decreased expression.
Recognition of the importance of epigenetic silencing in tumor biology has led to exploration of the therapeutic potential of demethylating agents, such as the DNA methyltransferase inhibitor 5-aza-2'-deoxycytidine (decitabine). The drug received FDA approval for the treatment of myelodysplastic syndrome in 2004 and is currently the subject of clinical trials exploring its utility in treating a variety of solid tumors. The ability of these agents to up-regulate the expression of aberrantly suppressed genes has been demonstrated in several studies, resulting in lists of individual candidate targets for demethylation-induced up-regulation [15–18]. Similar studies have employed RNAi for DNMT knockdown toward the same end. While these efforts are providing valuable insight into the specific genes and gene pathways that may be targetable through epigenetic manipulation, it is still unclear why decitabine causes some silenced loci to become up-regulated, while others remain inactive. We therefore embarked on systematic, genome-wide studies exploring molecular characteristics of decitabine-responsive genes.
Using microarray analysis of both gene expression and promoter methylation, we sought to identify promoter characteristics that predict the likelihood of response to decitabine treatment. We began by stratifying the promoter regions on the basis of CpG content and pre-decitabine methylation level. We then tested for enrichment of up-regulated genes in each of the resulting promoter categories (i.e., different CpG content and methylation level combinations). Using logistic regression and ten-fold cross-validation, we trained and tested a classifier that predicts the likelihood of decitabine-induced up-regulation on the basis of promoter category.
Decitabine-induced up-regulation varies by promoter methylation level and CpG content
Melanoma cell strains.
Melanoma cell line (stage/site of origin):
- IV, soft-tissue metastasis, right thigh
- Primary melanoma, 2.25 mm
- IV, soft-tissue metastasis, neck
- IV, soft-tissue metastasis, left neck
- IV, soft-tissue metastasis, right thigh
Defining the effect of decitabine on transcription in terms of the fold-change in expression before and after treatment (as is often done for differential expression) might be misleading, resulting in a biased definition of up-regulated genes. Genes with low basal expression require a lesser absolute increase to meet a 2-fold up-regulation threshold after treatment than do genes with high basal expression. Also, genes with very low basal expression might show a two-fold increase in response to decitabine but still have biologically insignificant amounts of gene expression. Furthermore, it is important to note that the set of genes with low basal expression values is significantly enriched for low CpG-content promoters, whereas genes with high CpG-content promoters generally have greater baseline expression levels. An up-regulation threshold that favors genes with low starting expression would therefore bias the set of decitabine-responsive genes to include a greater proportion of low CpG-content promoters. Given that our initial observation suggests that promoter CpG content is an important factor in decitabine-responsiveness, we were particularly interested in addressing this bias. To do so, in addition to a 2-fold increase in expression, we require a minimum increase in expression (delta) of 5,000 units in order to consider a gene up-regulated. Additional file 1 contains the list of RefSeq IDs for each promoter bin, as well as an indication of which are up-regulated.
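The two-part criterion described above can be sketched in a few lines of Python. This is an illustration only, not the authors' code: the function name `is_upregulated`, the example values, and the treatment of non-positive basal values are assumptions.

```python
def is_upregulated(pre, post, min_fold=2.0, min_delta=5000.0):
    """True if expression rose at least min_fold-fold AND by at least min_delta units."""
    if pre <= 0:
        # Assumption: a fold-change is undefined for non-positive basal values.
        return False
    return (post / pre) >= min_fold and (post - pre) >= min_delta


# Hypothetical expression values (pre-treatment, post-treatment).
examples = {
    "low basal, 4-fold, small delta": (100.0, 400.0),
    "moderate basal, >2-fold, large delta": (6000.0, 13000.0),
    "high basal, large delta, <2-fold": (20000.0, 26000.0),
}
for label, (pre, post) in examples.items():
    print(label, is_upregulated(pre, post))
```

Only the second example satisfies both conditions; the first illustrates the low-basal-expression bias that the delta threshold is meant to remove.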
We next employed the hypergeometric distribution to test, for each promoter category, whether the number of up-regulated genes differs significantly from the number expected by chance. Those promoter categories significantly enriched or depleted in up-regulated genes are denoted in Figure 2B with a star or a circle, respectively, and the pattern confirms our previously observed trend.
We also investigated whether gene upregulation is explained by factors other than promoter methylation and CpG density. Using the GSEA method over gene expression measurements in the YUMAC strain, we tested for Gene Ontology (GO), KEGG pathway and motif gene set enrichments among the upregulated genes. Our results show that none of the gene sets were enriched at the FDR level of 0.05, indicating that gene upregulation is not dominated by activation of a particular cellular process, or regulation by a transcription factor.
Decitabine-responsive MCF-7 genes
While this independent dataset is limited in size, the overall trend is consistent with our findings and we found no evidence to contradict our hypothesis. Importantly, the MCF-7 cells were treated with low-dose decitabine (100 nM), which is comparable to the 200 nM used in the melanoma experiments. One caveat to the comparison is that the decitabine-response data for the MCF-7 cell line are in the form of fold-change following treatment; we therefore cannot account for potential bias introduced by basal expression levels (discussed above).
Predicting decitabine response
We found that a model combining promoter methylation and CpG content across the entire promoter achieves an area under the ROC curve (AUC) of 77%. Its predictive power is superior to that of either methylation status or CpG content alone (AUC 72% and 63%, respectively).
Performance of predictive models of up-regulation, by promoter region (distance from TSS, in basepairs): -2200 to +500; -2200 to -1500; -1500 to -1000; -1000 to -500; -500 to -200; -200 to +100; +100 to +500.
To further strengthen confidence in the model, we performed 500 permutations of the class label, training and testing the model with ten-fold cross-validation each time. The mean AUC for these 500 permutations was 50.4%, as would be expected of a model with no predictive power. This analysis provides further evidence that the trend observed in promoter characteristics of up-regulated genes was not due to chance alone.
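The label-permutation check can be sketched as follows. This is a simplified illustration under stated assumptions: the helper names (`auc`, `mean_permuted_auc`) and the toy data are hypothetical, and the stand-in `evaluate` callable only re-scores fixed predictions, whereas the analysis described above retrains and cross-validates the model for every permutation.

```python
import random


def auc(scores, labels):
    """Area under the ROC curve by pairwise comparison (ties count as 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_permuted_auc(evaluate, labels, n_permutations=500, seed=0):
    """Average the AUC returned by evaluate() over shuffled copies of the labels."""
    rng = random.Random(seed)
    shuffled = list(labels)
    total = 0.0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        total += evaluate(list(shuffled))
    return total / n_permutations


# Hypothetical toy data: scores that separate the two classes perfectly.
scores = [i / 40 for i in range(40)]
labels = [1 if i >= 20 else 0 for i in range(40)]

observed = auc(scores, labels)
null_mean = mean_permuted_auc(lambda permuted: auc(scores, permuted), labels)
print("observed AUC:", observed)
print("mean AUC over permuted labels:", round(null_mean, 3))
```

In a fuller version, `evaluate` would wrap the complete train-and-test procedure so that each permutation is treated exactly like the real labels.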
Discussion and Conclusion
The ability to re-activate genes that have been epigenetically silenced in cancer could prove a powerful adjunct to existing chemotherapeutics, yet not all methylated genes undergo up-regulation following demethylating treatment. We detect patterns of promoter characteristics that influence susceptibility to decitabine treatment in a group of melanoma cell strains.
Promoters with a high degree of methylation and an intermediate CpG content appear most susceptible to decitabine-induced up-regulation. For these highly methylated promoters, as the CpG content increases, the efficacy of the drug appears to decrease. We conclude that, in CpG-dense promoters with high levels of methylation, the large absolute number of methyl groups overwhelms the ability of the drug to demethylate sufficiently to allow increased expression.
In contrast, for genes with low levels of methylation, the drug's effects become more pronounced as CpG content increases. We theorize that, in promoters that are sufficiently CpG-dense, those with lesser levels of methylation might still contain an absolute number of methyl groups large enough to constitute a viable target for demethylation-induced expression. Silent promoters that are CpG-sparse and have low levels of methylation are likely repressed by regulatory mechanisms that are not responsive to demethylation.
In addition to examining promoter CpG content and methylation level, we have performed initial analyses (data not shown) to suggest that the presence of DNase hypersensitivity sites, reflecting an open chromatin state, also plays a role in determining a promoter's susceptibility to demethylation-induced up-regulation of expression. Further studies using chromatin structure data, as well as more refined analysis of promoter sequence features, will likely improve upon the classifier's predictive power.
In summary, we identified a trend in promoter characteristics that correlates with the likelihood of response to decitabine in a set of melanoma cell strains, and used this trend to build a computational classifier to predict response to treatment. Further study using higher resolution assessment of methylation, as well as integration with genome-wide promoter architecture data (such as DNase hypersensitivity and histone modification), is needed to decipher in more detail the regulatory forces causing gene silencing and the likelihood of up-regulating key tumor suppressor genes using drugs that target DNA methylation.
Methods for cell isolation, culture, drug treatment, MeDIP, and methylation and expression profiling were previously reported [14, 20]. Briefly, methylated DNA was enriched using the MeDIP approach followed by hybridization to genomic promoter tiling arrays (NimbleGen C426-00-01) containing 390,000 probes. Standard normalization methods for two-channel arrays were applied, and relative methylation levels were determined using the MEDME Bioconductor library. Approximately 20-30 million cells were used for each mRNA extraction. Cells were treated with low-dose (200 nM) decitabine for two days, followed by one day of recovery before total RNA extraction. Gene expression data were derived from NimbleGen human whole genome expression microarrays (array 2005_04-20_Human_60 mer_1in2) containing 380,000 probes with an average of 11 probes per RefSeq, located throughout the gene. Probe measurements were then averaged for each RefSeq. The same chip was hybridized with differentially labeled, polyA-selected cDNA from decitabine-treated and untreated cells, and experiments were repeated with dye swapping. Data were captured and processed by NimbleGen Systems Iceland LLC. Normalization within arrays was performed with Loess-based methods to correct for biases due to labeling with different dyes on the two microarray channels and to correct for spatial artifacts. M and A values were then determined, where M describes the amount of differential expression (M = log2(cy5/cy3)) and A associates M with the magnitude of overall expression (A = (log2(cy5) + log2(cy3))/2). Normalization between arrays was performed via quantile-based normalization. mRNA RefSeqs were mapped to the genome and those with < 96% sequence identity, as well as those that mapped to more than two genomic loci, were discarded.
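As a worked illustration of the M and A definitions given above, the following Python sketch computes both values for a single probe. The function name `ma_values` and the intensities are hypothetical; Loess and quantile normalization are not shown.

```python
import math


def ma_values(cy5, cy3):
    """Return (M, A) for one two-channel measurement.

    M = log2(cy5 / cy3) is the log-ratio (differential expression);
    A = (log2(cy5) + log2(cy3)) / 2 is the mean log-intensity.
    """
    m = math.log2(cy5 / cy3)
    a = (math.log2(cy5) + math.log2(cy3)) / 2
    return m, a


# Hypothetical intensities: a 4-fold higher signal in the cy5 channel.
m, a = ma_values(cy5=8.0, cy3=2.0)
print("M =", m, "A =", a)
```

Here M = 2 corresponds to a four-fold ratio between channels, and A = 2 is the average of the two log2 intensities (3 and 1).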
Analysis of the data revealed upregulation of 292 common genes across the cell lines after decitabine treatment, and the treatment effect on demethylation was validated in selected strongly upregulated genes, such as CDKN1A and TGFBI.
For our experiments on decitabine-induced gene upregulation, we pooled information on gene promoter methylation level, CpG density, as well as differential gene expression from 7 cell strains. Data pooling yielded measurements for 22,824 promoters per cell strain, for a total of 159,768 data triplets (methylation, CpG density and differential gene expression).
For each promoter, the sequence from 2,200 basepairs upstream to 500 basepairs downstream of the transcription start site (TSS) was analyzed for CpG content, and five equal categories were defined. Methylation levels for these promoters were previously reported for each of six bins spanning the same 2,700 basepairs around the TSS. In order to obtain a single methylation measurement for each promoter, we used the sum of these six values and divided the results into four categories such that each category contains roughly the same number of promoters (~40,000). Basal and decitabine-induced expression values represent the mean of two replicate experiments for each promoter. Promoters that demonstrate a two-fold increase in expression (post-treatment/pre-treatment expression >= 2) as well as an absolute expression increase of at least 5,000 units are labeled as up-regulated.
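The stratification step can be sketched as below. This is a minimal illustration with hypothetical names and toy inputs. It assumes that CpG content is measured as a count of CG dinucleotides and that both sets of categories are equal-sized (rank-based); the text states equal-sized categories explicitly only for methylation.

```python
def cpg_count(sequence):
    """Count CG dinucleotides in a promoter sequence (case-insensitive)."""
    return sequence.upper().count("CG")


def promoter_methylation(six_bin_levels):
    """Collapse the six per-bin methylation levels into one value by summing."""
    return sum(six_bin_levels)


def equal_size_bins(values, n_bins):
    """Assign each value a category 0..n_bins-1 so categories hold about equal counts.

    Values are ranked, and the rank is mapped onto n_bins consecutive groups.
    Tied values are split by input order (an arbitrary choice).
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins


# Hypothetical toy inputs.
print(cpg_count("ACGTCGCG"))
print(promoter_methylation([0.5, 1.0, 0.0, 2.0, 1.5, 0.0]))
print(equal_size_bins([5, 1, 3, 2, 4, 6, 8, 7], n_bins=4))
```

Each promoter then receives a pair of category indices (CpG content, methylation), which defines the promoter category used in the enrichment tests.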
For each combination of CpG content and methylation level, the non-parametric Wilcoxon signed-rank test was employed to compare post-decitabine to pre-decitabine expression levels. For this analysis we used the wilcox.test function of the R stats package.
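For readers unfamiliar with the test, the following Python sketch shows the signed-rank statistic with a two-sided p-value from the normal approximation. It is not a reimplementation of R's wilcox.test: it omits the exact small-sample distribution, the continuity correction, and the tie correction to the variance, and the function name and data are hypothetical.

```python
import math


def wilcoxon_signed_rank(pre, post):
    """Return (W+, two-sided p) for paired samples using the normal approximation."""
    diffs = [b - a for a, b in zip(pre, post) if b != a]  # zero differences dropped
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0
    # Rank absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return w_plus, p_value


# Hypothetical paired expression values for one promoter category.
pre = [1.0, 2.0, 3.0, 4.0, 5.0]
post = [2.0, 4.0, 6.0, 8.0, 10.0]
w_plus, p_value = wilcoxon_signed_rank(pre, post)
print("W+ =", w_plus, "p =", round(p_value, 4))
```

With only five pairs the normal approximation is rough; it is used here purely to keep the example short.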
For each combination of CpG content and methylation level, we assessed the significance of the difference between the observed number of up-regulated genes and the number expected by chance alone (the total number in the bin multiplied by the fraction of all genes that undergo up-regulation). The probability was calculated from the hypergeometric distribution using the dhyper function of the R stats package, with the following arguments: the number of up-regulated promoters in the CpG/methylation bin, the total number of up-regulated promoters, the total number of promoters not up-regulated, and the number of promoters in the CpG/methylation bin.
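The calculation can be illustrated with a small Python sketch. The function `dhyper` below mirrors the argument order of the R function of the same name (x, m, n, k); the counts are hypothetical. The upper-tail helper is an addition for illustration and is not described in the text, which reports using the point probability.

```python
from math import comb


def dhyper(x, m, n, k):
    """P(X = x): x up-regulated promoters in a bin of size k, drawn from
    m up-regulated and n not up-regulated promoters."""
    if x < max(0, k - n) or x > min(m, k):
        return 0.0
    return comb(m, x) * comb(n, k - x) / comb(m + n, k)


def upper_tail(x, m, n, k):
    """P(X >= x). Assumption: a tail sum is one common way to test enrichment."""
    return sum(dhyper(i, m, n, k) for i in range(x, min(m, k) + 1))


def expected_count(bin_size, n_up, n_not_up):
    """Expected number of up-regulated promoters in the bin under chance alone."""
    return bin_size * n_up / (n_up + n_not_up)


# Hypothetical counts: 100 of 1,000 promoters up-regulated; a bin of 50 holds 12 of them.
n_up, n_not_up, bin_size, observed = 100, 900, 50, 12
print("expected:", expected_count(bin_size, n_up, n_not_up))
print("P(X = observed):", dhyper(observed, n_up, n_not_up, bin_size))
print("P(X >= observed):", upper_tail(observed, n_up, n_not_up, bin_size))
```

An observed count well above the expected count with a small probability would mark the bin as enriched; a count well below it would mark the bin as depleted.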
We used the GSEA program http://www.broadinstitute.org/gsea/ to analyze our expression data for enriched Gene Ontology and motif gene sets. We uploaded the gct and cls files corresponding to our data from the YUMAC cell strain, and set the Metric for ranking genes to Ratio_of_Classes, and the Permutation type to gene_set. All other parameters were set to default.
Expression response data following decitabine treatment of the MCF-7 breast cancer cell line were downloaded from the Broad Institute Connectivity Map. Methylation data for the MCF-7 cell line were downloaded from the supplementary material of a study by Li et al. that used a modified methylation-specific digital karyotyping method for genome-wide methylation profiling of two breast cancer cell lines. Methylation levels were in the form of the number of sequencing reads per fragment. Using the 90th quantiles from the MCF-7 and melanoma datasets, the MCF-7 methylation levels were scaled so that the distribution of values between the two datasets occupied an equivalent range. MCF-7 promoters were then categorized as having either low (0-1), intermediate (1-6), or high (>6) levels of methylation.
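One way to read the scaling step is as a single multiplicative factor that aligns the 90th quantiles of the two datasets. The Python sketch below follows that reading. The nearest-rank quantile definition, the handling of the category boundaries (values of exactly 1 or 6), the function names, and the toy data are all assumptions.

```python
def percentile(values, pct):
    """Nearest-rank percentile; pct is an integer from 1 to 100."""
    ordered = sorted(values)
    rank = (pct * len(ordered) + 99) // 100  # ceiling of pct * n / 100
    return ordered[rank - 1]


def scale_to_reference(values, reference, pct=90):
    """Rescale values so that their pct-th percentile matches that of reference."""
    factor = percentile(reference, pct) / percentile(values, pct)
    return [v * factor for v in values]


def methylation_category(level):
    """Map a scaled methylation level to low (0-1), intermediate (1-6) or high (>6)."""
    if level <= 1:
        return "low"
    if level <= 6:
        return "intermediate"
    return "high"


# Hypothetical toy data: melanoma methylation levels and MCF-7 read counts.
melanoma_levels = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
mcf7_reads = [5, 10, 20, 30, 40, 50, 60, 70, 90, 100]

scaled = scale_to_reference(mcf7_reads, melanoma_levels)
categories = [methylation_category(level) for level in scaled]
print(list(zip(mcf7_reads, categories)))
```

Matching on an upper quantile rather than the maximum makes the scaling less sensitive to a few extreme read counts.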
A computational model of decitabine response was built using the generalized linear model for logistic regression. This was implemented using the glm function of the R stats package with the following arguments: formula = upregulated ~ promoter methylation + promoter CpG content; family = gaussian; method = glm.fit (iteratively reweighted least squares). Briefly, data from the YUMAC cell line was filtered for genes with pre-treatment expression levels below 700 units (approximately 40% of the data). For each promoter bin, and for the region as a whole, the model was then trained and tested using ten-fold cross-validation. Receiver operating characteristic (ROC) curves were generated and AUCs calculated using the ROCR package. The class labels were then permuted 500 times, the model trained and tested for each permutation, and the mean AUC calculated.
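The train-and-test procedure can be sketched in Python without external libraries. This is an illustration, not the authors' pipeline. It fits a logistic (binomial) model by plain gradient descent, whereas the text lists family = gaussian for the glm call, so the choice of the binomial form here is an assumption about intent. Pooling held-out scores across folds before computing one AUC is also an assumption, and the synthetic data and labeling rule are hypothetical.

```python
import math
import random


def auc(scores, labels):
    """Area under the ROC curve by pairwise comparison (ties count as 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def predict(weights, features):
    """Logistic model: probability of up-regulation (weights[0] is the intercept)."""
    z = weights[0] + sum(w * x for w, x in zip(weights[1:], features))
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, learning_rate=0.1, epochs=500):
    """Fit the weights by batch gradient descent on the mean log-loss."""
    weights = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for features, label in zip(X, y):
            error = predict(weights, features) - label
            grad[0] += error
            for j, x in enumerate(features):
                grad[j + 1] += error * x
        weights = [w - learning_rate * g / len(X) for w, g in zip(weights, grad)]
    return weights


def cross_validated_auc(X, y, k=10, seed=0):
    """k-fold cross-validation; AUC is computed on the pooled held-out scores."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    scores = [0.0] * len(X)
    for fold in range(k):
        test = indices[fold::k]
        held_out = set(test)
        train = [i for i in indices if i not in held_out]
        weights = fit_logistic([X[i] for i in train], [y[i] for i in train])
        for i in test:
            scores[i] = predict(weights, X[i])
    return auc(scores, y)


# Hypothetical data: methylation category (0-3) and CpG category (0-4),
# with an arbitrary labeling rule chosen only to give the model some signal.
X = [[m, c] for m in range(4) for c in range(5) for _ in range(5)]
y = [1 if (m >= 2 and c <= 2) else 0 for m, c in X]
print("cross-validated AUC:", round(cross_validated_auc(X, y), 3))
```

Running the same function on shuffled copies of `y` gives the permutation baseline against which the real AUC is compared.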
Gene expression and promoter methylation data have been uploaded to GEO (Accession: GSE13706) and ArrayExpress (Accession: E-MTAB-185).
This work was supported by the National Library of Medicine (NIH Grant T15 LM07056, BIOMEDICAL INFORMATICS RESEARCH TRAINING AT YALE), the MD-PhD Program at Yale University (NIH MSTP TG 5T32GM07205), and the Yale SPORE in Skin Cancer (5P50CA121974). The authors are grateful to Sebastian Szpakowski for his contributions.
- Gonzalez-Zulueta M, Bender C, Yang A, Nguyen T, Beart R, Van Tornout J, Jones P: Methylation of the 5' CpG island of the p16/CDKN2 tumor suppressor gene in normal and transformed human tissues correlates with gene silencing. Cancer Res. 1995, 55 (20): 4531-4535.
- De Smet C, De Backer O, Faraoni I, Lurquin C, Brasseur F, Boon T: The activation of human gene MAGE-1 in tumor cells is correlated with genome-wide demethylation. Proc Natl Acad Sci USA. 1996, 93 (14): 7149-7153. 10.1073/pnas.93.14.7149.
- Martinez R, Martin-Subero J, Rohde V, Kirsch M, Alaminos M, Fernandez A, Ropero S, Schackert G, Esteller M: A microarray-based DNA methylation study of glioblastoma multiforme. Epigenetics. 2009, 4 (4): 255-264.
- Eden A, Gaudet F, Waghmare A, Jaenisch R: Chromosomal instability and tumors promoted by DNA hypomethylation. Science. 2003, 300 (5618): 455. 10.1126/science.1083557.
- Bonazzi V, Irwin D, Hayward N: Identification of candidate tumor suppressor genes inactivated by promoter methylation in melanoma. Genes Chromosomes Cancer. 2009, 48 (1): 10-21. 10.1002/gcc.20615.
- Feltus F, Lee E, Costello J, Plass C, Vertino P: DNA motifs associated with aberrant CpG island methylation. Genomics. 2006, 87 (5): 572-579. 10.1016/j.ygeno.2005.12.016.
- Kim S, Li M, Paik H, Nephew K, Shi H, Kramer R, Xu D, Huang T: Predicting DNA methylation susceptibility using CpG flanking sequences. Pac Symp Biocomput. 2008, 315-326.
- Bird A, Wolffe A: Methylation-induced repression--belts, braces, and chromatin. Cell. 1999, 99 (5): 451-454. 10.1016/S0092-8674(00)81532-9.
- Bestor T: Gene silencing. Methylation meets acetylation. Nature. 1998, 393 (6683): 311-312. 10.1038/30613.
- Iguchi-Ariga S, Schaffner W: CpG methylation of the cAMP-responsive enhancer/promoter sequence TGACGTCA abolishes specific factor binding as well as transcriptional activation. Genes Dev. 1989, 3 (5): 612-619. 10.1101/gad.3.5.612.
- Watt F, Molloy P: Cytosine methylation prevents binding to DNA of a HeLa cell transcription factor required for optimal expression of the adenovirus major late promoter. Genes Dev. 1988, 2 (9): 1136-1143. 10.1101/gad.2.9.1136.
- Martin V, Jørgensen H, Chaubert A, Berger J, Barr H, Shaw P, Bird A, Chaubert P: MBD2-mediated transcriptional repression of the p14ARF tumor suppressor gene in human colon cancer cells. Pathobiology. 2008, 75 (5): 281-287. 10.1159/000151708.
- Barr H, Hermann A, Berger J, Tsai H, Adie K, Prokhortchouk A, Hendrich B, Bird A: Mbd2 contributes to DNA methylation-directed repression of the Xist gene. Mol Cell Biol. 2007, 27 (10): 3750-3757. 10.1128/MCB.02204-06.
- Koga Y, Pelizzola M, Cheng E, Krauthammer M, Sznol M, Ariyan S, Narayan D, Molinaro A, Halaban R, Weissman S: Genome-wide screen of promoter methylation identifies novel markers in melanoma. Genome Res. 2009, 19 (8): 1462-1470. 10.1101/gr.091447.109.
- Al-Romaih K, Sadikovic B, Yoshimoto M, Wang Y, Zielenska M, Squire J: Decitabine-induced demethylation of 5' CpG island in GADD45A leads to apoptosis in osteosarcoma cells. Neoplasia. 2008, 10 (5): 471-480.
- Richter E, Masuda K, Cook C, Ehrich M, Tadese A, Li H, Owusu A, Srivastava S, Dobi A: A role for DNA methylation in regulating the growth suppressor PMEPA1 gene in prostate cancer. Epigenetics. 2 (2): 100-109.
- Gollob J, Sciambi C: Decitabine up-regulates S100A2 expression and synergizes with IFN-gamma to kill uveal melanoma cells. Clin Cancer Res. 2007, 13 (17): 5219-5225. 10.1158/1078-0432.CCR-07-0816.
- Tamm I, Wagner M, Schmelz K: Decitabine activates specific caspases downstream of p73 in myeloid leukemia. Ann Hematol. 2005, 84 (Suppl 1): 47-53. 10.1007/s00277-005-0013-0.
- Foltz G, Yoon J, Lee H, Ryken T, Sibenaller Z, Ehrich M, Hood L, Madan A: DNA methyltransferase-mediated transcriptional silencing in malignant glioma: a combined whole-genome microarray and promoter array analysis. Oncogene. 2009.
- Halaban R, Krauthammer M, Pelizzola M, Cheng E, Kovacs D, Sznol M, Ariyan S, Narayan D, Bacchiocchi A, Molinaro A, Kluger Y, Deng M, Tran N, Zhang W, Picardo M, Enghild JJ: Integrative analysis of epigenetic modulation in melanoma cell response to decitabine: clinical implications. PLoS One. 2009, 4 (2): e4563. 10.1371/journal.pone.0004563.
- Weber M, Hellmann I, Stadler M, Ramos L, Pääbo S, Rebhan M, Schübeler D: Distribution, silencing potential and evolutionary impact of promoter DNA methylation in the human genome. Nat Genet. 2007, 39 (4): 457-466. 10.1038/ng1990.
- Lamb J, Crawford E, Peck D, Modell J, Blat I, Wrobel M, Lerner J, Brunet J, Subramanian A, Ross KN, Reich M, Hieronymus H, Wei G, Armstrong SA, Haggarty SJ, Clemons PA, Wei R, Carr SA, Lander ES, Golub TR: The Connectivity Map: using gene-expression signatures to connect small molecules, genes, and disease. Science. 2006, 313 (5795): 1929-1935. 10.1126/science.1132939.
- Li J, Gao F, Li N, Li S, Yin G, Tian G, Jia S, Wang K, Zhang X, Yang H, Nielsen AL, Bolund L: An improved method for genome wide DNA methylation profiling correlated to transcription and genomic instability in two breast cancer cell lines. BMC Genomics. 2009, 10: 223. 10.1186/1471-2164-10-223.
- Pelizzola M, Koga Y, Urban A, Krauthammer M, Weissman S, Halaban R, Molinaro A: MEDME: an experimental and analytical methodology for the estimation of DNA methylation levels based on microarray derived MeDIP-enrichment. Genome Res. 2008, 18 (10): 1652-1659. 10.1101/gr.080721.108.
- Sing T, Sander O, Beerenwinkel N, Lengauer T: ROCR: visualizing classifier performance in R. Bioinformatics. 2005, 21 (20): 3940-3941. 10.1093/bioinformatics/bti623.
The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1755-8794/3/4/prepub
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.