On the identification of potential regulatory variants within genome wide association candidate SNP sets
© Chen et al.; licensee BioMed Central Ltd. 2014
Received: 22 January 2014
Accepted: 2 June 2014
Published: 11 June 2014
Genome wide association studies (GWAS) are a population-scale approach to the identification of segments of the genome in which genetic variations may contribute to disease risk. Current methods focus on the discovery of single nucleotide polymorphisms (SNPs) associated with disease traits. As there are many SNPs within identified risk loci, and the majority of these are situated within non-coding regions, a key challenge is to identify and prioritize variants affecting regulatory sequences that are likely to contribute to the phenotype assessed.
We focused investigation on SNPs within lung and breast cancer GWAS loci that reached genome-wide significance for potential roles in gene regulation with a specific focus on SNPs likely to disrupt transcription factor binding sites. Within risk loci, the regulatory potential of sub-regions was classified using relevant open chromatin and epigenetic high throughput sequencing data sets from the ENCODE project in available cancer and normal cell lines. Furthermore, transcription factor affinity altering variants were predicted by comparison of position weight matrix scores between disease and reference alleles. Lastly, ChIP-seq data of transcription associated factors and topological domains were included as binding evidence and potential gene target inference.
The sets of SNPs, including both the disease-associated markers and those in high linkage disequilibrium with them, were significantly over-represented in regulatory sequences of cancer and/or normal cells; however, over-representation was generally not restricted to disease-relevant tissue specific regions. The calculated regulatory potential, allelic binding affinity scores and ChIP-seq binding evidence were the three criteria used to prioritize candidates. Fitting all three criteria, we highlighted breast cancer susceptibility SNPs and a borderline lung cancer relevant SNP located in cancer-specific enhancers overlapping multiple distinct transcription associated factor ChIP-seq binding sites.
Incorporating high throughput sequencing epigenetic and transcription factor data sets from both cancer and normal cells into cancer genetic studies reveals potential functional SNPs and informs subsequent characterization efforts.
KeywordsGWAS Lung cancer Regulatory regions Gene regulation Transcription factor binding site alteration Enhancer Topological domains
Genome wide association studies (GWAS) examine common genetic variants, typically single nucleotide polymorphisms (SNPs), to detect statistical association with a trait across a set of unrelated individuals. Using large-scale SNP genotyping data, these studies compare the genetic makeup between two groups of individuals, those with and without the phenotype or disease of interest. GWAS findings have made important contributions to understanding of numerous disorders.
Once disease associated regions are identified, the challenge is to interpret the potential role of each of the associated SNPs, in order to prioritize candidate functional variants contributing to the phenotype. Around 90% of the variants in GWASdb, a repository of phenotype-associated SNPs identified in GWAS, are situated within intergenic or intronic regions . The analysis and interpretation of such variants are challenging, as sequence-specific functions, such as cis-regulatory elements, are present at low density in these regions and have been largely undetermined. The plethora of large-scale data sets generated with human cells creates a new opportunity to probe the relationship between GWAS identified loci and regulatory sequence variants. Both DNase-seq and ChIP-seq datasets generated by consortia such as the ENCODE project  have been key. Open chromatin regions defined by DNase I hypersensitivity assays are enriched for the presence of regulatory regions such as promoters and enhancers. Histone modifications such as mono- and tri- methylation of the lysine at position 4 of histone 3 (referred to as H3K4me1 and H3K4me3 respectively) are associated with enhancer and promoter positions [3, 4], while acetylation of the lysine at position 27 of histone 3 (H3K27ac) reflects active utilization of the regions . Extensive research into the relationships between diverse chromatin modifications and functional roles of the marked DNA segments is ongoing.
The interpretation of cancer GWAS may be particularly impacted by the study of non-coding regions, as cancer can be considered, at the fundamental level, a disorder of gene regulation. Differential activities of distal regulatory elements, such as enhancers, have been identified at prostate and colon cancer risk variants from GWAS [6–8]. Using H3K4me1 ChIP-seq datasets in colon cancer and normal cells, Akhtar-Zaidi et al. identified thousands of variants with loss or gain of H3K4me1, and found them to comprise a signature that is predictive of in vivo colon cancer gene expression patterns . Gerasimova et al. successfully predicted functional SNPs contributing to asthma, in part by taking into account tissue-specificity of enhancers using epigenetic datasets . Paul et al. showed enrichment of SNPs associated with hematological traits within nucleosome depleted regions of hematopoietic cells . One previous study successfully coupled disease associated SNPs to regulatory sequence annotation by pooling and analyzing datasets from multiple cell types to focus on potential regulatory SNPs . As GWAS derived disease-associated SNPs are most commonly found in non-coding regions, incorporating regulatory sequence annotations into the interpretation process is anticipated to further the identification of the causal variations within GWAS loci.
The identification of regulatory sequence variants impacting phenotype has received increasing research attention . Initial methods focused on the identification of mutations that disrupt transcription factor binding sites (TFBS) [13–15]. More specifically, the intersection of GWAS and large-scale regulatory sequence annotation availability has catalyzed the creation of tools focused on the identification or ranking of potential regulatory variants. Ward et al. created the online resource tool, HaploReg, to provide annotations for non-coding variants through chromatin states of 9 cell types, conservation and impact on predicted TFBS . The RegulomeDB tool annotates functional variation using a combination of high throughput sequencing (HTS) data, TFBS predictions, eQTLs and enhancer information . Coetzee et al. created a functional SNP annotator by incorporating ENCODE TF and histone modification datasets within an R package, FunciSNP, which was subsequently used in a breast cancer GWAS analysis [18, 19]. The ChroMoS web server, on the other hand, facilitates SNP prioritization using genetic and epigenetic data, and predicts differential transcription factor and miRNA binding .
UCSC GWAS SNPs and the corresponding LD80 SNPs
We obtained lung cancer, breast cancer, prostate cancer and colorectal cancer SNP lists from the gwasCatalog table of UCSC database  collected by NHGRI , and applied the stringent threshold of P < 5×10−8 on for genome-wide significance (gwasCatalog downloaded on March 17, 2014; Additional file 1). With the p-value threshold, the Lung.cancer set (in the trait column of gwasCatalog) included SNPs from European GWAS on “Lung adenocarcinoma” and “Lung cancer”. The Breast.cancer set included SNPs from European GWAS on “Breast cancer”, “Breast cancer (male)”, and “Breast Cancer in BRCA1 mutation carriers”. The Colorectal.cancer and Prostate.cancer sets included European GWAS on “Colorectal cancer” and “Prostate cancer”, respectively. We note that some of the studies included meta-analysis. We focused on European descent studies to avoid potential differences in epigenetic status among different ethnicities. All data sets were obtained from published studies, thus no specific ethics approval was required for the study.
To account for potentially biased SNP selection of the SNP arrays, we obtained all SNPs in linkage disequilibrium with the SNPs of interest, using the SNAP webtool . The search distance was limited to 500 kilobases for simplicity, and the r2 threshold was set to 0.80 on SNAP for 1000 Genomes project (Pilot 1), Phase III HapMap (release 2), and Phase II HapMap (release 22 and 21) data sets to expand to variants in strongest LD to disease susceptibility SNPs. A union and unique set of variants, which we referred to as LD80, was compiled from these queries for each GWAS SNP set. Since all SNPs in this report were collected from European descent studies, LD80 SNPs were obtained using r2 values with the CEU (Utah residents with ancestry from northern and western Europe) population. To eliminate duplicate or obsolete SNPs, the LD80 SNPs were further filtered for unique genomic coordinates in hg19 through BioMart, and analyses were conducted on these LD80 SNP sets.
High throughput sequencing data from ENCODE
The HTS datasets of H1 embryonic stem (ES) cells, A549 lung cancer cells and its ‘healthy equivalent’ NHLF (Normal Human Lung Fibroblasts), MCF-7 breast cancer and its ‘healthy equivalent’ HMEC (Human Mammary Epithelial Cells), LNCaP prostate cancer and its ‘healthy equivalent’ PrEC (Human Prostate Epithelial Cells), and two colorectal cancer (Caco-2 and HCT-116) cell lines were obtained from the ENCODE project in the hg19 genomic build . We included in our analysis the datasets from H1 ES cells, as GWAS SNPs associated with various cancers have been reported to be enriched in ES cell enhancers . Regions that were in open chromatin as shown by DNase-I hypersensitivity data, and occupied by H3K4me3 and CTCF (an insulator binding protein) were collected along with bound regions for multiple transcription factors. RNA-seq expression data of A549, NHLF, MCF-7, and HMEC cell lines were obtained for expression analysis. Also, the TAF ChIP-seq peaks of the cell lines (A549, H1 ES, HCT-116, MCF-7 cells) were retrieved, and overlapping peaks across the replicates of each TF within each cell line were required for stringency. No TAF data was identified for NHLF, HMEC, LNCaP, PrEC and Caco-2 cells. Detailed information on the retrieved public datasets was provided in Additional file 2.
Annotation of regulatory sequences
In order to interpret the functionality associated with the SNPs, regulatory sequences were annotated in H1 embryonic stem cells, A549 lung cancer, NHLF lung normal, MCF-7 breast cancer, HMEC breast normal, LNCaP prostate cancer, PrEC prostate normal, Caco-2 and HCT-116 colorectal cancer cell lines. Where data were available, open chromatin regions were specified by DNase I hypersensitive regions from DNase-seq, potential promoters by H3K4me3 ChIP-seq regions, potential insulators by ChIP-seq peaks of CTCF, and finally potential enhancers (pEnh) by DNase I regions lacking H3K4me3 in non-exonic regions.
Genomic functional categories
The Ensembl transcripts from ENSEMBL GENES 71 were used for annotation information to specify the locations of exons, untranslated regions (UTRs), transcription start and termination sites (TSS and TTS). SNPs within 10 kb upstream of TSSs were labeled as upstream regions, and SNPs within 10 kb downstream of TTSs were labeled as downstream regions. SNPs falling into more than one category were assigned in the priority order from high to low: 5′ UTR, 3′ UTR, exons, genic, upstream, downstream and intergenic.
Enrichment tests of SNPs in regulatory sequences
In order to test the enrichment of SNPs in regulatory sequences, we randomly drew from the Illumina 660K SNP Array (widely used in GWAS) an equal number of variants with the overall distributions matching three attributes of the LD80 SNP sets: minor allele frequency, GC content in the region +/-500 bps from the variant and distance to the nearest TSS. We separated each attribute into 20 percentile bins and repeated matched drawings 1,000 times. Using these background sets, we determined the distributions of the number of SNPs overlapping each annotated experimental dataset (e.g. open chromatin, H3K4me3, etc.). We then obtained the p-value of each SNP list with respect to each dataset by comparing the true foreground overlapping counts to the background distributions. Multiple hypothesis testing adjustment was conducted using the ‘qvalue’ R package .
Differential TF binding affinity analysis using PWMs
PWM scores have been repeatedly demonstrated to have strong correlation with the sequence specific binding energy of the modeled TF using the PWM scoring procedure described in [28, 29]. We computed the TF affinity scores on 30 base pairs up- and down- stream of each SNP with major and minor alleles using PWMs from the JASPAR 2010 database . For binding site prediction stringency, we used 90 out of 130 vertebrate PWMs with information content greater than 10. The best score overlapping each SNP for each TF was retained, and we took the differences between PWM scores of the major and minor alleles to represent the difference in binding affinity of each TF. We randomly selected 10,000 SNPs from Illumina 660K SNP array that had the same single nucleotide variation, similar GC content (+/-500 bps) and similar distance to the nearest TSS as the studied set of SNPs, and used the TF binding affinity differences of these random SNPs as the background (i.e. we matched the proportions of every possible nucleotide substitution, and 20 percentile bins of GC content and distance to the nearest TSS between the LD80 SNP set and the randomly drawn sets). Using such background distribution of score differences for each PWM, we obtained a two-tailed empirical p-value of the difference in TF binding affinity for each LD80 SNP of interest. SNPs with empirical p-values ≤ 0.05 were considered to induce differential affinity for such TF. To further distinguish meaningful differential affinity that potentially infers enhancement or disruption of a TF binding site, we included an additional criterion of PWM scores greater than 80 for either allele.
Regulatory potential index
For each of the high throughput sequencing data, intersection of regions from replicated data sets was taken, and values were compiled using geometric means across replicates. To obtain a quantitative score for the regulatory sequences, we computed regulatory potential index (RPI) for each cell line by summing the following values for each SNP: the reads per kilobase per million sequencing reads from DNase-seq, the average enrichment values from H3K4me3 and CTCF ChIP-seq data from the ENCODE consortium. The relative regulatory potential was computed as log2 of (RPIcancer + 1)/(RPInormal + 1). In the case of lung cancer, the relative regulatory potential was log2((RPIA549 + 1)/(RPINHLF + 1)). We note that for colorectal cancer, data sets were only available in two cancer cell lines, in such case the relative regulatory potential was computed from the two cancer cell lines and do not reflect comparison between cancer and normal cell lines.
Case study of a lung cancer meta-analysis
Landi et al. published a meta-analysis of lung cancer GWAS . The Lung.Meta SNP list was generated through a meta-analysis of 11 lung cancer GWAS combining histological types with 13,300 primary lung cancer cases and 19,666 controls of European descent. It is a permissive list that is inclusive of all eight SNPs from Lung.cancer collection introduced above. The published set was based on a threshold of P < 8×10−5, which we took as reported for a case study to demonstrate how the regulatory analyses can be applied to borderline lung cancer candidate variant prioritization. We note that the published threshold is less stringent than the threshold applied to the UCSC collections, and greater caution is therefore required in assessing the reliability of the candidate.
Topological domains and chromatin interactions from Hi-C datasets
We downloaded topological associating domains (TADs) and mapped reads of the chromatin interaction datasets in H1 human ES cells and IMR90 fibroblast cells from Hi-C experiments conducted by Dixon et al. using the restriction enzyme HindIII  (summary files obtained through GEO: GSE35156). We lifted the genomic coordinates of TADs and paired-end reads originally mapped to reference human genome hg18 to hg19 build using the liftOver tool . As the number of paired reads between two genomic regions reflects the degree of interactions, and Hi-C technique is limited by resolution, we counted the numbers of paired reads in 20 kilobase bins and plotted the result using the ‘HiTC’ R package .
Cancer susceptibility SNPs frequently occur in non-coding regions
Delineating potential regulatory sequences of the genome in different cell types
In order to assess the potential regulatory roles of SNP-containing segments, in both cancer and normal cell conditions, we analyzed a diverse group of features profiled in ENCODE project HTS data sets from cells relevant to the GWAS (detailed in Methods), including DNase-seq, H3K4me3, and CTCF ChIP-seq data. As shown in Additional file 3A, open chromatin regions detected by DNase I sensitivity experiments are largely inclusive of the promoter regions marked by H3K4me3. In agreement with previous literature , a large proportion of CTCF bound regions and H3K4me3 marks are shared among cell types, whereas DNase-seq data sets are in greater variation. The CTCF bound regions in NHLF cells are noted to differ from other cell types. We defined regulatory sequences using these markers previously associated with promoters, enhancers and insulators (see Methods). The coverage percentages of the genome for all categories are similar across cell types (Additional file 3B).
Cancer susceptibility SNPs are enriched in regulatory sequences
Overall, the Prostate.cancer and Breast.cancer LD80 sets showed the most frequent enrichment for regulatory sequences found both within and across tissues, indicating that these SNPs were disproportionately situated within active regions shared among cell types (Figure 3). The Prostate.cancer and Breast.cancer sets were significantly enriched in 22 and 20 categories of regulatory sequences (q-value <0.05), respectively (Additional file 4B,C). The Lung.cancer set was most strongly enriched in the promoter regions of multiple cancer cell lines, MCF-7, LNCaP, Caco-2 and A549 as well as normal cell lines, such as HMEC and NHLF (Figure 3, Additional file 4A). The Breast.cancer set was most strongly, but not exclusively, enriched in the open chromatin regions of MCF-7 breast cancer, A549 lung cancer and HMEC breast normal cells (Figure 3, Additional file 4B). The Prostate.cancer set was enriched most strongly in open chromatin regions of LNCaP prostate cancer, PrEC prostate normal and all other cancer cell lines. Enrichment was also noted for promoter and enhancer regions of multiple cell types as well as CTCF binding sites in Caco-2 cells (Figure 3, Additional file 4C). Due to data availability, regulatory sequences of two colorectal cell lines were examined for colorectal LD80 SNPs. Enrichment was observed most strongly in active transcribed regions marked by H3K36me3 of Caco-2 colorectal cancer cells, open chromatin regions of HMEC and NHLF normal cell lines, and promoter regions of HCT-116 colorectal cancer cells (Figure 3, Additional file 4D). Interestingly, we observed a depletion of the inactive mark, H3K27me3, in Caco-2 colorectal cancer cell line in all four SNP sets.
Consequence of the SNPs on TF binding affinity scores
Altered binding affinity has been shown to have substantial impacts on the contributions of TFBS . In order to assess the potential impact of SNPs on TF binding, we next scored the predicted differential TF binding affinity between alleles of each SNP using PWMs. We referred to SNPs that result in a significantly higher or lower PWM scores compared to the major alleles as having an increase or decrease of binding affinity for the specific TF, respectively. The impact of a SNP on TF binding affinity was defined by comparing the observed score difference to the distribution of PWM score differences from 10,000 matched randomly selected SNPs from the Illumina 660K SNP array (see Methods). We reported the empirical p-values which reflected the random chance of having such a difference between two alleles at a TF binding site in Additional file 6. We note that a SNP allele can be found with an increase of TF binding affinity for one TF and a decrease for another. The percentages of SNPs with significant (p < 0.05) predicted differential TF binding affinity for at least one TF were 36 for Lung.cancer, 26 for Breast.cancer, 20 for Prostate.cancer, and 30 for Colorectal.cancer. For example, the Sox17 motif was frequently found to have significant differential binding affinity in breast, prostate and colorectal LD80 sets. Our results showed that the SNPs identified in multiple cancer types can alter TF binding affinity as shown by significant changes in PWM scores.
Prioritizing functional SNPs using regulatory potential and TF binding affinity
Prioritizing functional SNPs using all three criteria
We listed SNPs with regulatory potential and TF affinity differences as well as TAF binding evidence in Additional file 6. The nearest differentially expressed genes were indicated based on RNA-seq data, where available. Edwards et al. compiled a list of functional genetic variants and/or target genes that were identified from GWAS, for which regulatory functions of variants were experimentally verified through diverse methods such as electrophoretic mobility shift assays, reporter assays, and more . As a validation of our prioritization approach, we compared our results to the Edwards’ list (Additional file 8). For the 5 Edwards’ SNPs that were present in the datasets we analyzed, at least one (but never all) of the three criteria was fulfilled (i.e. differential regulatory potential, predicted differential TF binding affinity or overlap with TAF ChIP-seq regions).
For those variants meeting all three criteria, we further applied a threshold of ±5 to the relative regulatory potential score and a threshold of 6 to the number of overlapping TAF ChIP-seq data (both parameter values were set based on the 95th percentile of all examined SNPs). While 12 regulatory variants emerged in the Breast.cancer set, none of the Lung.cancer variants met the thresholds. Within the breast cancer candidates, a set of three highly correlated SNPs stood out among the highest ranking with interesting cancer-related characteristics: rs1292011 (GWAS P = 9×10−22), rs1391720 and rs1391721 (highlighted in Figure 5). While all three SNPs had differential TF affinity of different TFs, the rs1391720 showed the strongest differential TF affinity for a nuclear receptor protein, Nr2e3 (an increase from 72.4 to 87.9 in PWM scores; highlighted by a red arrow in Figure 4). These SNPs are located in a gene-sparse region with multiple long intergenic non-coding RNAs (lincRNAs) within 200 kb of distance, and are over 714 kb upstream of the T-box 3 gene involved with developmental processes, TBX3. The SNPs are within a potential cancer-specific enhancer observed to be in open chromatin for multiple cancer cell lines such as MCF-7, A549 and LNCaP, but not in normal NHLF, HMEC and PrEC cell lines (Additional file 9). All three SNPs overlap with binding sites of 9 unique TAFs in MCF-7 cells including Sin3Ak-20, GATA3, Rad21, CTCF, HDAC2, HA-E2F1, NR2F2, ZNF217, and the enhancer marking p300 .
A case study: functional interpretation of a borderline lung cancer relevant SNP located in a cancer-specific enhancer
Inferring potential targets of a SNP using topological domains
Variants in regulatory sequences that affect the transcription rate of target genes can have substantial impact on phenotype. While the interpretation of protein altering differences has progressed rapidly, the identification of cis-regulatory modifying mutations remains a challenge. The substantial challenges in moving from GWAS-mapped loci to specific causal variants may be in part derived from this limited capacity to identify regulatory SNPs. We have shown that functional annotation of the regions around non-coding SNPs can contribute to the interpretation of cancer risk alleles arising from GWAS. We observe that cancer susceptibility SNPs are enriched in regulatory sequences in the genome, and can be situated in TF binding sites. Integrating genome-scale data sets and TF binding site analysis into the interpretation process can highlight key SNPs consistent with regulatory roles. Such analysis was used here to highlight SNPs in ChIP-seq supported binding sites within cancer-specific enhancers. This work provides a general bioinformatics approach for the identification of regulatory variants within GWAS identified risk loci.
Previously reported experimentally validated functional SNPs that were evaluated in our study met one or two of our three criteria: regulatory potential, TF binding affinity and TAF ChIP-seq binding. We highlighted a group of three highly correlated SNPs, rs1391720, rs1391721 and rs1292011 in the Breast.cancer set that fit all three criteria. These SNPs are located in a cancer-specific enhancer overlapping 9 MCF-7 TAF ChIP-seq data, and are within the TAD containing multiple lincRNAs and TBX3 which is situated 714 kb away. Despite the vast distance, a previous study has suggested rs1292011 to be potentially mediating ER-positive breast cancer through an effect on TBX3 due to significant elevation of expression in plasma from individuals with breast cancer . All three SNPs have differential predicted TF binding affinity of distinct TFs, but rs1391720 shows the strongest significance for an Nr2e3 binding site, which has been previously reported to be an upstream regulator of ESR1 in breast cancer .
As a case study, we also investigated the rs12087869 SNP identified as a borderline candidate in a meta-analysis of lung cancer GWAS (Lung.Meta) that fit all three criteria. The SNP is located in a cancer-specific enhancer situated ~60 kb upstream of the cancer-associated ROR1 gene, which is one of the potential target genes located in the TAD containing the candidate SNP. ROR1 is highly expressed in early embryonic stages and in diverse cancers such as human breast cancer and B-cell chronic lymphocytic leukemia [44, 45]. Another potential target gene, Mir-544, has been reported to be associated with osteosarcoma, gastric cancer, and nasopharyngeal carcinoma [46–48]. The enhancer containing the SNP is marked as active by both H3K4me1 and H3K27ac epigenetics data from A549 lung cancer cells, while these marks of activity are absent in NHLF. The enhancer overlaps ChIP-seq binding peaks of 8 transcription associated factors in A549 cells, including the enhancer marking p300 co-activator and other TFs such as c-Myc, Max, and SP1 which are linked to cancer development and growth [49–51]. Although the cancer cells profiled in large-scale studies (e.g. A549) do not carry the risk allele, the data from the cells indicates that the SNP is situated within a region that is open and accessible in multiple cancer cells (A549, MCF-7, and LNCaP). TLX1::NFIC, MAX and Myc are predicted to have stronger binding affinity in those individuals with the risk allele, highlighting the potential functional role of this SNP in lung cancer.
One of the challenges in the study of regulatory sequences is the relevance of experimental data collected from diverse cells and tissues to the specific cancer cell type. We observe that cancer susceptibility GWAS SNPs are significantly over-represented in regulatory sequences from diverse data sources – the enrichment is not restricted to the specific cancer or normal cell types underlying each GWAS study. Such a finding is not in conflict with a functional role, as a regulatory variant could contribute to the creation of an environment in which a cancer is more likely to form or progress to detectable disease whether it is present in a regulatory sequences of cancer or normal cells. The presence of the TFBS altering change may impact expression in a phenotype-altering manner. From the GWAS SNP sets analyzed, the Breast.cancer and Prostate.cancer LD80 SNP sets showed the highest and most significant enrichment across regulatory regions from multiple cell types. The Lung.cancer and Colorectal.cancer SNP sets had relatively fewer enriched categories of regulatory regions. In particular, we note that the support for the regulatory roles at the breast cancer susceptibility SNP rs1391720 and the borderline lung cancer relevant SNP rs12087869 were provided from cancer-related data collections. Thus, if we had limited the analysis to normal cells, we would not have detected the functional SNP. It is our interpretation that in the search for cancer-related regulatory variants, it is appropriate to consider a range of cell and tissue sources.
This work fits into a spectrum of informatics efforts to better understand regulatory regions in the human genome. Building on high-throughput annotation of genome properties such as epigenetics marks, DNase I hypersensitivity and TF binding, the informatics community has been rapidly creating innovative methods for discrimination of regulatory regions. At the TF affinity level, our initial TF affinity study used only the PWMs from JASPAR 2010, since then, JASPAR 2014  has been made available. Future studies could evaluate alternative sources such as HoCoMoCo  which aggregates profiles from multiple resources. At the regulatory sequence level, methods for scoring regulatory potential such as ChromHMM and Segway incorporate diverse data types in order to predict likely regulatory regions [3, 54, 55].
Recently, three groups have applied informatics methods to predict potential regulatory variants within risk loci from specific GWAS of asthma, breast cancer and blood traits [9, 10, 18]. Our work is complementary to these studies, showing that GWAS loci from diverse cancer studies are enriched for candidate regulatory SNPs and providing a structured procedure for their identification.
The informatics methods and candidate variants presented here highlight clear opportunities for further work. For example, the cancer susceptibility regulatory SNPs highlighted in the report will require experimental investigation. While in vitro binding studies of transcription factors with differential binding affinities could be pursued, it has been well established that computational predictions of binding affinity are highly correlated with experimental measurements . Therefore, we think longer-term it will be most appropriate to obtain relevant cancer cells and normal cells with the specific risk allele and measure TF binding in vivo, expression of potential gene targets (inferred from the Hi-C interaction data and topological domains), and the small-scale chromosome conformation capture in the presence or absence of the candidate regulatory variation. Beyond the specific candidate, a key challenge for the field is the computational prediction of the relationship between enhancers and specific promoters, which would allow for targeted experimental approaches.
Cancer GWAS SNP sets extended with high linkage disequilibrium are over-represented in cis-regulatory sequences active in cancer and normal cells. The degree of over-representation varied across assessed GWAS. Notably, such enrichment of the cancer susceptibility SNPs in normal cells highlights the potential contribution of the variants to the earliest phases of tumorigenesis. Analysis of the differential TF binding affinity between the regular and risk alleles reveals functional SNPs with potential disruption or enhancement of TF binding. We highlight candidate regulatory SNPs, one with borderline lung cancer susceptibility and three with significant breast cancer susceptibility, within cancer-specific enhancers with multiple overlapping TAF binding sites and significant increase in TF binding affinity scores with the risk allele. The methods constitute a framework to interpret functionality of GWAS loci candidate regulatory SNPs based on epigenetics, regulatory potential and TF binding affinity properties.
Genome wide association study
Single nucleotide polymorphism
High throughput sequencing
Transcription associated factor
Position weight matrix
Transcription factor binding site
Topological associating domain.
We thank Dr. Angela Brooks-Wilson for critical review, Rebecca Worsley-Hunt, Dr. Anthony Mathelier and Dr. Maja Tarailo-Graovac for helpful discussions and manuscript feedback, David Arenillas for computational assistance, and Dora Pak for research management.
This work was initiated through the Canadian Institutes of Health Research 2012 Summer Program in Taiwan, and CYC was partially funded by CIHR-IG and NSC-DOIC for the duration. CYC was supported by a scholarship from Canada’s National Sciences and Engineering Research Council. The Wasserman laboratory and the work described in this report were supported by the Canadian Institutes of Health Research (CIHR): MOP82875, and the Natural Sciences and Engineering Research Council of Canada (NSERC): 15RGPIN355532-10. The computer systems of the Gene Regulation Bioinformatics Laboratory were funded by the Canada Foundation for Innovation and the BC Knowledge Development Fund. The research was supported by Award Number R01GM084875 from the National Institute of General Medical Sciences and the ABC4DE project funded by Genome Canada and Genome British Columbia. This research was enabled in part by support provided by WestGrid (http://www.westgrid.ca) and Compute Canada Calcul Canada (http://www.computecanada.ca). Work of ISC was partially supported by Taiwan Bioinformatics Core with the grant number NSC-102-2319-B-400-001. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institute of General Medical Sciences or the National Institutes of Health. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
- Li MJ, Wang P, Liu X, Lim EL, Wang Z, Yeager M, Wong MP, Sham PC, Chanock SJ, Wang J: GWASdb: a database for human genetic variants identified by genome-wide association studies. Nucleic Acids Res. 2012, 40: D1047-D1054. 10.1093/nar/gkr1182.View ArticlePubMedGoogle Scholar
- Rosenbloom KR, Dreszer TR, Long JC, Malladi VS, Sloan CA, Raney BJ, Cline MS, Karolchik D, Barber GP, Clawson H, Diekhans M, Fujita PA, Goldman M, Gravell RC, Harte RA, Hinrichs AS, Kirkup VM, Kuhn RM, Learned K, Maddren M, Meyer LR, Pohl A, Rhead B, Wong MC, Zweig AS, Haussler D, Kent WJ: ENCODE whole-genome data in the UCSC Genome Browser: update 2012. Nucleic Acids Res. 2012, 40: D912-917. 10.1093/nar/gkr1012.View ArticlePubMedGoogle Scholar
- Chen CY, Morris Q, Mitchell JA: Enhancer identification in mouse embryonic stem cells using integrative modeling of chromatin and genomic features. BMC Genomics. 2012, 13: 152-10.1186/1471-2164-13-152.View ArticlePubMedPubMed CentralGoogle Scholar
- Heintzman ND, Stuart RK, Hon G, Fu Y, Ching CW, Hawkins RD, Barrera LO, Van Calcar S, Qu C, Ching KA, Wang W, Weng Z, Green RD, Crawford GE, Ren B: Distinct and predictive chromatin signatures of transcriptional promoters and enhancers in the human genome. Nat Genet. 2007, 39: 311-318. 10.1038/ng1966.View ArticlePubMedGoogle Scholar
- Creyghton MP, Cheng AW, Welstead GG, Kooistra T, Carey BW, Steine EJ, Hanna J, Lodato MA, Frampton GM, Sharp PA, Boyer LA, Young RA, Jaenisch R: Histone H3K27ac separates active from poised enhancers and predicts developmental state. Proc Natl Acad Sci U S A. 2010, 107: 21931-21936. 10.1073/pnas.1016071107.View ArticlePubMedPubMed CentralGoogle Scholar
- Akhtar-Zaidi B, Cowper-Sal-lari R, Corradin O, Saiakhova A, Bartels CF, Balasubramanian D, Myeroff L, Lutterbaugh J, Jarrar A, Kalady MF, Willis J, Moore JH, Tesar PJ, Laframboise T, Markowitz S, Lupien M, Scacheri PC: Epigenomic enhancer profiling defines a signature of colon cancer. Science. 2012, 336: 736-739. 10.1126/science.1217277.View ArticlePubMedPubMed CentralGoogle Scholar
- Wasserman NF, Aneas I, Nobrega MA: An 8q24 gene desert variant associated with prostate cancer risk confers differential in vivo activity to a MYC enhancer. Genome Res. 2010, 20: 1191-1197. 10.1101/gr.105361.110.View ArticlePubMedPubMed CentralGoogle Scholar
- Zhang X, Cowper-Sal lari R, Bailey SD, Moore JH, Lupien M: Integrative functional genomics identifies an enhancer looping to the SOX9 gene disrupted by the 17q24.3 prostate cancer risk locus. Genome Res. 2012, 22: 1437-1446. 10.1101/gr.135665.111.View ArticlePubMedPubMed CentralGoogle Scholar
- Gerasimova A, Chavez L, Li B, Seumois G, Greenbaum J, Rao A, Vijayanand P, Peters B: Predicting cell types and genetic variations contributing to disease by combining GWAS and epigenetic data. PLoS One. 2013, 8: e54359-10.1371/journal.pone.0054359.View ArticlePubMedPubMed CentralGoogle Scholar
- Paul DS, Albers CA, Rendon A, Voss K, Stephens J, van der Harst P, Chambers JC, Soranzo N, Ouwehand WH, Deloukas P: Maps of open chromatin highlight cell type-restricted patterns of regulatory sequence variation at hematological trait loci. Genome Res. 2013, 23: 1130-1141. 10.1101/gr.155127.113.View ArticlePubMedPubMed CentralGoogle Scholar
- Schaub MA, Boyle AP, Kundaje A, Batzoglou S, Snyder M: Linking disease associations with regulatory information in the human genome. Genome Res. 2012, 22: 1748-1759. 10.1101/gr.136127.111.View ArticlePubMedPubMed CentralGoogle Scholar
- Worsley-Hunt R, Bernard V, Wasserman WW: Identification of cis-regulatory sequence variations in individual genome sequences. Genome Med. 2011, 3: 65-10.1186/gm281.View ArticlePubMedPubMed CentralGoogle Scholar
- Andersen MC, Engstrom PG, Lithwick S, Arenillas D, Eriksson P, Lenhard B, Wasserman WW, Odeberg J: In silico detection of sequence variations modifying transcriptional regulation. PLoS Comput Biol. 2008, 4: e5-10.1371/journal.pcbi.0040005.View ArticlePubMedPubMed CentralGoogle Scholar
- Gerstein MB, Kundaje A, Hariharan M, Landt SG, Yan KK, Cheng C, Mu XJ, Khurana E, Rozowsky J, Alexander R, Min R, Alves P, Abyzov A, Addleman N, Bhardwaj N, Boyle AP, Cayting P, Charos A, Chen DZ, Cheng Y, Clarke D, Eastman C, Euskirchen G, Frietze S, Fu Y, Gertz J, Grubert F, Harmanci A, Jain P, Kasowski M, et al: Architecture of the human regulatory network derived from ENCODE data. Nature. 2012, 489: 91-100. 10.1038/nature11245.View ArticlePubMedPubMed CentralGoogle Scholar
- Spivakov M, Akhtar J, Kheradpour P, Beal K, Girardot C, Koscielny G, Herrero J, Kellis M, Furlong EE, Birney E: Analysis of variation at transcription factor binding sites in Drosophila and humans. Genome Biol. 2012, 13: R49-10.1186/gb-2012-13-9-r49.View ArticlePubMedPubMed CentralGoogle Scholar
- Ward LD, Kellis M: HaploReg: a resource for exploring chromatin states, conservation, and regulatory motif alterations within sets of genetically linked variants. Nucleic Acids Res. 2012, 40: D930-D934. 10.1093/nar/gkr917.View ArticlePubMedGoogle Scholar
- Boyle AP, Hong EL, Hariharan M, Cheng Y, Schaub MA, Kasowski M, Karczewski KJ, Park J, Hitz BC, Weng S, Cherry JM, Snyder M: Annotation of functional variation in personal genomes using RegulomeDB. Genome Res. 2012, 22: 1790-1797. 10.1101/gr.137323.112.View ArticlePubMedPubMed CentralGoogle Scholar
- Rhie SK, Coetzee SG, Noushmehr H, Yan C, Kim JM, Haiman CA, Coetzee GA: Comprehensive functional annotation of seventy-one breast cancer risk Loci. PLoS One. 2013, 8: e63925-10.1371/journal.pone.0063925.View ArticlePubMedPubMed CentralGoogle Scholar
- Coetzee SG, Rhie SK, Berman BP, Coetzee GA, Noushmehr H: FunciSNP: an R/bioconductor tool integrating functional non-coding data sets with genetic association studies to identify candidate regulatory SNPs. Nucleic Acids Res. 2012, 40: e139-10.1093/nar/gks542.View ArticlePubMedPubMed CentralGoogle Scholar
- Barenboim M, Manke T: ChroMoS: an integrated web tool for SNP classification, prioritization and functional interpretation. Bioinformatics. 2013, 29: 2197-2198. 10.1093/bioinformatics/btt356.View ArticlePubMedPubMed CentralGoogle Scholar
- Landi MT, Chatterjee N, Yu K, Goldin LR, Goldstein AM, Rotunno M, Mirabello L, Jacobs K, Wheeler W, Yeager M, Bergen AW, Li Q, Consonni D, Pesatori AC, Wacholder S, Thun M, Diver R, Oken M, Virtamo J, Albanes D, Wang Z, Burdette L, Doheny KF, Pugh EW, Laurie C, Brennan P, Hung R, Gaborieau V, McKay JD, Lathrop M, et al: A genome-wide association study of lung cancer identifies a region of chromosome 5p15 associated with risk for adenocarcinoma. Am J Hum Genet. 2009, 85: 679-691. 10.1016/j.ajhg.2009.09.012.View ArticlePubMedPubMed CentralGoogle Scholar
- R: A language and environment for statistical computing. [http://www.R-project.org]
- Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JY, Zhang J: Bioconductor: open software development for computational biology and bioinformatics. Genome Biol. 2004, 5: R80-10.1186/gb-2004-5-10-r80.View ArticlePubMedPubMed CentralGoogle Scholar
- A catalog of published genome-wide association studies. [http://www.genome.gov/gwastudies]
- Johnson AD, Handsaker RE, Pulit SL, Nizzari MM, O’Donnell CJ, de Bakker PI: SNAP: a web-based tool for identification and annotation of proxy SNPs using HapMap. Bioinformatics. 2008, 24: 2938-2939. 10.1093/bioinformatics/btn564.View ArticlePubMedPubMed CentralGoogle Scholar
- Teng L, Firpi HA, Tan K: Enhancers in embryonic stem cells are enriched for transposable elements and genetic variations associated with cancers. Nucleic Acids Res. 2011, 39: 7371-7379. 10.1093/nar/gkr476.View ArticlePubMedPubMed CentralGoogle Scholar
- Storey JD, Tibshirani R: Statistical significance for genomewide studies. Proc Natl Acad Sci U S A. 2003, 100: 9440-9445. 10.1073/pnas.1530509100.View ArticlePubMedPubMed CentralGoogle Scholar
- Wasserman WW, Sandelin A: Applied bioinformatics for the identification of regulatory elements. Nat Rev Genet. 2004, 5: 276-287. 10.1038/nrg1315.View ArticlePubMedGoogle Scholar
- Lenhard B, Wasserman WW: TFBS: Computational framework for transcription factor binding site analysis. Bioinformatics. 2002, 18: 1135-1136. 10.1093/bioinformatics/18.8.1135.View ArticlePubMedGoogle Scholar
- Portales-Casamar E, Thongjuea S, Kwon AT, Arenillas D, Zhao X, Valen E, Yusuf D, Lenhard B, Wasserman WW, Sandelin A: JASPAR 2010: the greatly expanded open-access database of transcription factor binding profiles. Nucleic Acids Res. 2010, 38: D105-D110. 10.1093/nar/gkp950.View ArticlePubMedGoogle Scholar
- Dixon JR, Selvaraj S, Yue F, Kim A, Li Y, Shen Y, Hu M, Liu JS, Ren B: Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature. 2012, 485: 376-380. 10.1038/nature11082.View ArticlePubMedPubMed CentralGoogle Scholar
- Hinrichs AS, Karolchik D, Baertsch R, Barber GP, Bejerano G, Clawson H, Diekhans M, Furey TS, Harte RA, Hsu F, Hillman-Jackson J, Kuhn RM, Pedersen JS, Pohl A, Raney BJ, Rosenbloom KR, Siepel A, Smith KE, Sugnet CW, Sultan-Qurraie A, Thomas DJ, Trumbower H, Weber RJ, Weirauch M, Zweig AS, Haussler D, Kent WJ: The UCSC Genome Browser Database: update 2006. Nucleic Acids Res. 2006, 34: D590-598. 10.1093/nar/gkj144.View ArticlePubMedGoogle Scholar
- Servant N, Lajoie BR, Nora EP, Giorgetti L, Chen CJ, Heard E, Dekker J, Barillot E: HiTC: exploration of high-throughput ‘C’ experiments. Bioinformatics. 2012, 28: 2843-2844. 10.1093/bioinformatics/bts521.View ArticlePubMedPubMed CentralGoogle Scholar
- Heintzman ND, Hon GC, Hawkins RD, Kheradpour P, Stark A, Harp LF, Ye Z, Lee LK, Stuart RK, Ching CW, Ching KA, Antosiewicz-Bourget JE, Liu H, Zhang X, Green RD, Lobanenkov VV, Stewart R, Thomson JA, Crawford GE, Kellis M, Ren B: Histone modifications at human enhancers reflect global cell-type-specific gene expression. Nature. 2009, 459: 108-112. 10.1038/nature07829.View ArticlePubMedPubMed CentralGoogle Scholar
- Rowan S, Siggers T, Lachke SA, Yue Y, Bulyk ML, Maas RL: Precise temporal control of the eye regulatory gene Pax6 via enhancer-binding site affinity. Genes Dev. 2010, 24: 980-985. 10.1101/gad.1890410.View ArticlePubMedPubMed CentralGoogle Scholar
- Edwards SL, Beesley J, French JD, Dunning AM: Beyond GWASs: illuminating the dark road from association to function. Am J Hum Genet. 2013, 93: 779-797. 10.1016/j.ajhg.2013.10.012.View ArticlePubMedPubMed CentralGoogle Scholar
- Michailidou K, Hall P, Gonzalez-Neira A, Ghoussaini M, Dennis J, Milne RL, Schmidt MK, Chang-Claude J, Bojesen SE, Bolla MK, Wang Q, Dicks E, Lee A, Turnbull C, Rahman N, Fletcher O, Peto J, Gibson L, Dos Santos Silva I, Nevanlinna H, Muranen TA, Aittomaki K, Blomqvist C, Czene K, Irwanto A, Liu J, Waisfisz Q, Meijers-Heijboer H, Adank M, van der Luijt RB: Large-scale genotyping identifies 41 new loci associated with breast cancer risk. Nat Genet. 2013, 45: 353-361. 10.1038/ng.2563. 361e351-352View ArticlePubMedPubMed CentralGoogle Scholar
- Visel A, Blow MJ, Li Z, Zhang T, Akiyama JA, Holt A, Plajzer-Frick I, Shoukry M, Wright C, Chen F, Afzal V, Ren B, Rubin EM, Pennacchio LA: ChIP-seq accurately predicts tissue-specific activity of enhancers. Nature. 2009, 457: 854-858. 10.1038/nature07730.View ArticlePubMedPubMed CentralGoogle Scholar
- Andersson R, Gebhard C, Miguel-Escalada I, Hoof I, Bornholdt J, Boyd M, Chen Y, Zhao X, Schmidl C, Suzuki T, Ntini E, Arner E, Valen E, Li K, Schwarzfischer L, Glatz D, Raithel J, Lilje B, Rapin N, Bagger FO, Jorgensen M, Andersen PR, Bertin N, Rackham O, Burroughs AM, Baillie JK, Ishizu Y, Shimizu Y, Furuhata E, Maeda S, et al: An atlas of active enhancers across human cell types and tissues. Nature. 2014, 507: 455-461. 10.1038/nature12787.View ArticlePubMedGoogle Scholar
- Nora EP, Lajoie BR, Schulz EG, Giorgetti L, Okamoto I, Servant N, Piolot T, van Berkum NL, Meisig J, Sedat J, Gribnau J, Barillot E, Bluthgen N, Dekker J, Heard E: Spatial partitioning of the regulatory landscape of the X-inactivation centre. Nature. 2012, 485: 381-385. 10.1038/nature11049.View ArticlePubMedPubMed CentralGoogle Scholar
- Berlivet S, Paquette D, Dumouchel A, Langlais D, Dostie J, Kmita M: Clustering of tissue-specific Sub-TADs accompanies the regulation of HoxA genes in developing limbs. PLoS Genet. 2013, 9: e1004018-10.1371/journal.pgen.1004018.View ArticlePubMedPubMed CentralGoogle Scholar
- Ghoussaini M, Fletcher O, Michailidou K, Turnbull C, Schmidt MK, Dicks E, Dennis J, Wang Q, Humphreys MK, Luccarini C, Baynes C, Conroy D, Maranian M, Ahmed S, Driver K, Johnson N, Orr N, Dos Santos Silva I, Waisfisz Q, Meijers-Heijboer H, Uitterlinden AG, Rivadeneira F, Hall P, Czene K, Irwanto A, Liu J, Nevanlinna H, Aittomaki K, Blomqvist C, Meindl A, et al: Genome-wide association analysis identifies three new breast cancer susceptibility loci. Nat Genet. 2012, 44: 312-318. 10.1038/ng.1049.View ArticlePubMedPubMed CentralGoogle Scholar
- Park YY, Kim K, Kim SB, Hennessy BT, Kim SM, Park ES, Lim JY, Li J, Lu Y, Gonzalez-Angulo AM, Jeong W, Mills GB, Safe S, Lee JS: Reconstruction of nuclear receptor network reveals that NR2E3 is a novel upstream regulator of ESR1 in breast cancer. EMBO Mol Med. 2012, 4: 52-67. 10.1002/emmm.201100187.View ArticlePubMedPubMed CentralGoogle Scholar
- Zhang S, Chen L, Cui B, Chuang HY, Yu J, Wang-Rodriguez J, Tang L, Chen G, Basak GW, Kipps TJ: ROR1 is expressed in human breast cancer and associated with enhanced tumor-cell growth. PLoS One. 2012, 7: e31127-10.1371/journal.pone.0031127.View ArticlePubMedPubMed CentralGoogle Scholar
- Uhrmacher S, Schmidt C, Erdfelder F, Poll-Wolbeck SJ, Gehrke I, Hallek M, Kreuzer KA: Use of the receptor tyrosine kinase-like orphan receptor 1 (ROR1) as a diagnostic tool in chronic lymphocytic leukemia (CLL). Leuk Res. 2011, 35: 1360-1366. 10.1016/j.leukres.2011.04.006.View ArticlePubMedGoogle Scholar
- Zhi Q, Guo X, Guo L, Zhang R, Jiang J, Ji J, Zhang J, Zhang J, Chen X, Cai Q, Li J, Liu B, Zhu Z, Yu Y: Oncogenic miR-544 is an important molecular target in gastric cancer. Anticancer Agents Med Chem. 2013, 13: 270-275. 10.2174/1871520611313020013.View ArticlePubMedGoogle Scholar
- Thayanithy V, Sarver AL, Kartha RV, Li L, Angstadt AY, Breen M, Steer CJ, Modiano JF, Subramanian S: Perturbation of 14q32 miRNAs-cMYC gene network in osteosarcoma. Bone. 2012, 50: 171-181. 10.1016/j.bone.2011.10.012.View ArticlePubMedGoogle Scholar
- Luo Z, Zhang L, Li Z, Li X, Li G, Yu H, Jiang C, Dai Y, Guo X, Xiang J, Li G: An in silico analysis of dynamic changes in microRNA expression profiles in stepwise development of nasopharyngeal carcinoma. BMC Med Genomics. 2012, 5: 3-10.1186/1755-8794-5-3.View ArticlePubMedPubMed CentralGoogle Scholar
- Nesbit CE, Tersak JM, Prochownik EV: MYC oncogenes and human neoplastic disease. Oncogene. 1999, 18: 3004-3016. 10.1038/sj.onc.1202746.View ArticlePubMedGoogle Scholar
- Eisenman RN: Deconstructing myc. Genes Dev. 2001, 15: 2023-2030. 10.1101/gad928101.View ArticlePubMedGoogle Scholar
- Li L, Davie JR: The role of Sp1 and Sp3 in normal and cancer cell biology. Ann Anat. 2010, 192: 275-283. 10.1016/j.aanat.2010.07.010.View ArticlePubMedGoogle Scholar
- Mathelier A, Zhao X, Zhang AW, Parcy F, Worsley-Hunt R, Arenillas DJ, Buchman S, Chen CY, Chou A, Ienasescu H, Lim J, Shyr C, Tan G, Zhou M, Lenhard B, Sandelin A, Wasserman WW: JASPAR 2014: an extensively expanded and updated open-access database of transcription factor binding profiles. Nucleic Acids Res. 2014, 42: D142-147. 10.1093/nar/gkt997.View ArticlePubMedGoogle Scholar
- Kulakovskiy IV, Medvedeva YA, Schaefer U, Kasianov AS, Vorontsov IE, Bajic VB, Makeev VJ: HOCOMOCO: a comprehensive collection of human transcription factor binding sites models. Nucleic Acids Res. 2013, 41: D195-D202. 10.1093/nar/gks1089.View ArticlePubMedGoogle Scholar
- Hoffman MM, Buske OJ, Wang J, Weng Z, Bilmes JA, Noble WS: Unsupervised pattern discovery in human chromatin structure through genomic segmentation. Nat Methods. 2012, 9: 473-476. 10.1038/nmeth.1937.View ArticlePubMedPubMed CentralGoogle Scholar
- Ernst J, Kellis M: ChromHMM: automating chromatin-state discovery and characterization. Nat Methods. 2012, 9: 215-216. 10.1038/nmeth.1906.View ArticlePubMedPubMed CentralGoogle Scholar
- Benos PV, Bulyk ML, Stormo GD: Additivity in protein-DNA interactions: how good an approximation is it?. Nucleic Acids Res. 2002, 30: 4442-4451. 10.1093/nar/gkf578.View ArticlePubMedPubMed CentralGoogle Scholar
- The pre-publication history for this paper can be accessed here:http://www.biomedcentral.com/1755-8794/7/34/prepub
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.