Skip to main content
  • Research article
  • Open access
  • Published:

Whole-genome bisulfite sequencing in systemic sclerosis provides novel targets to understand disease pathogenesis



Systemic sclerosis (SSc) is a rare autoimmune connective tissue disease whose pathogenesis remains incompletely understood. Increasing evidence suggests that both genetic susceptibilities and changes in DNA methylation influence pivotal biological pathways and thereby contribute to the disease. The role of DNA methylation in SSc has not been fully elucidated, because existing investigations of DNA methylation predominantly focused on nucleotide CpGs within restricted genic regions, and were performed on samples containing mixed cell types.


We performed whole-genome bisulfite sequencing on purified CD4+ T lymphocytes from nine SSc patients and nine controls in a pilot study, and then profiled genome-wide cytosine methylation as well as genetic variations. We adopted robust statistical methods to identify differentially methylated genomic regions (DMRs). We then examined pathway enrichment associated with genes located in these DMRs. We also tested whether changes in CpG methylation were associated with adjacent genetic variation.


We profiled DNA methylation at more than three million CpG dinucleotides genome-wide. We identified 599 DMRs associated with 340 genes, among which 54 genes exhibited further associations with adjacent genetic variation. We also found these genes were associated with pathways and functions that are known to be abnormal in SSc, including Wnt/β-catenin signaling pathway, skin lesion formation and progression, and angiogenesis.


The CD4+ T cell DNA cytosine methylation landscape in SSc involves crucial genes in disease pathogenesis. Some of the methylation patterns are also associated with genetic variation. These findings provide essential foundations for future studies of epigenetic regulation and genome-epigenome interaction in SSc.

Peer Review reports


Systemic sclerosis (SSc) is a chronic autoimmune disease characterized by skin and visceral organ fibrosis [1, 2]. It is associated with high morbidity and mortality [3,4,5]. Aside from autologous hematopoietic stem cell transplant, which is associated with serious treatment related toxicities and is indicated for a small proportion of patients, no therapies have been shown to meaningfully modify disease progression [6]. The pathogenesis of SSc remains to be fully elucidated and this is key to identifying novel therapeutic targets.

Increasing evidence suggests that genetic risk factors are strongly associated with SSc. Though the human leukocyte antigen (HLA) class II region is traditionally considered the most genetically associated with SSc [7,8,9], non-HLA loci and corresponding genes have also been identified [10, 11]. However, these SSc-susceptibility loci are estimated to account for only a small proportion of the disease risk [11].

In recent years, epigenetic dysregulation, including DNA methylation, histone modifications and microRNA expression, have been associated with SSc pathogenesis [12]. In particular, several studies have demonstrated that aberrant DNA methylation patterns at CpG dinucleotides are associated with altered expression levels of key genes related to SSc [12,13,14]. These findings complement the known genetic associations and provide tantalizing additional clues to the etiology of SSc.

However, the majority of epigenetic studies in SSc to date have three important shortcomings. First, most studies of DNA methylation used assays that measure DNA methylation in restricted genic regions. Since these regions constitute a small proportion of the genome, changes in DNA methylation patterns are yet to be discovered in the remaining regions, both genic and intergenic. Second, though studies of other diseases have unraveled the non-negligible role of CHG/CHH methylation (where “H” implies any nucleotide other than G), particularly in cancer [15, 16], no studies have investigated CHG/CHH methylation in SSc. Third, most studies sequenced peripheral blood mononuclear cells (PBMC) without accounting for the different cell types in the mixture. Given the differences in DNA methylation across different cell types [17, 18], studies based on unsorted PBMC samples are prone to confounding. Though different immune responses convolute in the multi-levelled abnormalities in SSc, T lymphocytes play a disproportionate role in SSc [19].

We undertook this pilot study to investigate genome-wide methylation patterns in CD4+ T cells using whole-genome bisulfite sequencing (WGBS) to profile the methylation status of CpG, CHG and CHH cytosines, and to assess the potential of this platform to identify changes in methylation. We successfully identified T cell-specific aberrant methylation patterns in SSc. We then comprehensively explored the regulatory impacts of these epigenetic alterations on biological functions and diseases. We also inferred potential single nucleotide polymorphisms (SNP)-CpG interactions that implicated underlying genetic control of methylation status. Finally, we provide an application enabling visualization of regions and summaries of our results across the genome.


Study subjects and ethical considerations

Nine SSc patients and nine control subjects gave informed consent and were recruited from an ongoing SSc research cohort based at McGill University, Montreal, Canada. Of the nine SSc patients, none were on immunosuppressive drugs at the time of sampling (three were previously on methotrexate and mycophenolate but those medications had been discontinued for > 1 year).

Cell purification and whole-genome bisulfite sequencing (WGBS)

Forty milliliters of blood were obtained from each study subject and processed fresh within 4 h of being drawn. CD4+ T cells were positively selected [anti-CD4 microbeads (Miltenyi Biotec) and auto-MACS] and their purity assessed with flow cytometry. Only samples with a purity > 95% were used for genomic DNA extraction and sequencing. The samples were processed using the in-house DNA isolation and Illumina HiSeq 4000 PE 100 WGBS workflows at the McGill University and Genome Quebec Innovation Centre. Quality control of the genetic materials was performed using fluorescence assay quantification, agarose gel electrophoresis and NanoDrop nucleic acid quantification to ensure sufficient quantity, quality and purity (Additional file 18:).

Data processing and filtering

The WGBS data were aligned to the human genome GRCh37 (hg19) using the NovoAlign™ pipeline ( For each cytosine retained for further analysis, coverage by both strands in the paired-end sequencing library was required. To ensure accuracy in estimation of methylation level, valid cytosines with good read depth were extracted for CpG, CHG, and CHH motifs respectively. For each valid CpG dinucleotide, the estimated methylation level was obtained after merging methylated and unmethylated read counts for the forward and reverse cytosines. Read depth was required to be deeper than 3 at both C/G or C/H sites and the between-site difference in empirical methylation β values was required to be less than 0.2. For each valid CHG/CHH, minimum read depth required for further analysis was 6. Genome-wide SNPs were identified from the same dataset using the Bis-SNP pipeline [20].

Identification of differentially methylated regions

We used bumphunter version 3.3 [21] to identify DMRs in five sets of comparisons:

(i) SSc cases (N = 9) versus female controls (N = 4);

(ii) SSc cases (N = 9) versus all controls (N = 9);

(iii) Diffuse SSc cases (N = 6) versus female controls (N = 4);

(iv) Limited SSc cases (N = 3) versus female controls (N = 4);

(v) Diffuse SSc cases (N = 6) versus limited SSc cases (N = 3).

In all sets of comparisons, we adjusted for the additive effect of age within bumphunter. Moreover, in (ii), we additionally adjusted for the effect of gender.

To maintain statistical power, we restricted the analysis to regions with consistently high coverage across samples and imposed a minimum coverage rate on a per-CpG dinucleotide or per-CHG/CHH site basis. We imposed a minimum sample-level coverage rate for each site analyzed; minimum sample sizes for the analyses presented in (i)-(v) were 6/9, 4/6, 3/4 and 3/3, respectively. Cytosines were then clustered into regions with a maximum 200 bp gap between two cytosines in the same region. The total number of tests done for each of (i)-(v) was 392,810, 383,235, and 380,756 for CpG, CHG and CHH, respectively.

Regions with an adjusted p value (q-value) < 0.05 and an average methylation level difference > 0.2 reported by bumphunter were considered to be DMRs. A Bonferroni corrected p-value threshold of significance for five genome-wide analyses each comprising more than 106 comparisons would require a significance threshold below 2 × 10−7, which would be very unlikely to achieve here given the sample size. For example, for a simple t-test we would require a standardized difference of 9.2 to obtain 90% power at this significance level; that is, the mean difference in methylation between SSc and controls would have to be nine times larger than the standard deviation. Thus, our results should be considered as preliminary and therefore we have placed most emphasis on the results of analysis (i) as they included all the female patients and avoided the potential confounding effect of sex.

Multiple testing

We used the false discovery rate estimates from bumphunter to select DMRs, and added a filter requiring that the difference in methylation be at least 0.2. We also performed a permutation test of the primary analysis comparison between SSc cases (N = 9) versus female controls (N = 4). We randomly relabelled samples as SSc cases or controls, and then repeated the genome-wide identification of DMRs using bumphunter. We repeated the permutation and genome-wide analysis 40 times, and then counted in how many permutations a previously identified DMR was still identified as a DMR with an identical or smaller p value. We also compared the number of identified DMRs between the original data and the permutations.

Annotation of DMR and functional analysis

Genomic context of each DMR was annotated by annotatr [22] based on the most recent annotations of human genome downloaded from the UCSC genome browser ( Accessed 5 March 2019). All genes overlapping with DMR were regarded as DMGs. We performed functional analysis using Ingenuity Pathway Analysis [23] to investigate potential biological impacts through epigenetic alterations in these DMGs. Adjusted p values and averaged methylation level difference of DMRs were used to indicate the degree of discrepancy. For single-DMR genes, the averaged difference of the corresponding DMR represents the gene-level difference. For genes associated with more than one DMR, we calculated the average of the averaged difference of each DMR to represent overall methylation level difference. Genes with both hypermethylated and hypomethylated DMRs may therefore have had the differences neutralized. Identification of DMR and functional analysis were conducted on CpG, CHG and CHH separately.

Detection of SNP-CpG associations

We explored short-range SNP-CpG associations around a selected subset of CpG-based DMR identified in the comparison between SSc cases (N = 9) versus female controls (N = 4) that could be deemed consequential for SSc. Within a window of ±5 kb around each CpG-based DMR (inclusive), we extracted all SNPs as candidates for cis-interaction with the methylation pattern of the DMR. Simultaneously, within each CpG-based DMR, we first regressed out the effect of age and obtained residual methylation level on each CpG dinucleotide. We then used a multivariate method (PCEV [24]) to test associations between residual methylation levels and the binary disease status. We followed this by examining the PCEV-derived variable importance measures to identify the dinucleotide most strongly associated with disease status in the DMR. We extracted the residual methylation level on this specific CpG dinucleotide as a representative of the DMR and performed linear regression with all candidate SNPs of the corresponding DMR. In total, linear associations between 599 DMR and 36,838 candidate SNPs were tested. Unadjusted p values were reported with adjusted R2 for each of the 36,838 SNP-CpG pairs. To adjust for multiple testing on this analysis would require appropriate adjustments for linkage disequilibrium and were not undertaken here.


Clinical characteristics

We recruited nine SSc patients and nine controls. The characteristics of the cases and controls are presented in Table 1. All cases and 4/9 controls were female. SSc disease duration was 10.4 ± 7.0 years, 6 SSc patients had diffuse and 3 had limited cutaneous skin involvement. None of the SSc patients were on immunosuppressive treatment at the time of sampling.

Table 1 Clinical characteristics

Widespread DNA methylation differences

Our WGBS captured methylation levels at 3,690,885 CpG dinucleotides with high coverage (Additional file 17: Figure S1), which largely exceeds studies using targeted sequencing technologies or BeadChip arrays. We were also able to extensively profile methylation patterns at 8,047,371 CHG and 17,331,920 CHH sites.

Our first (and primary) analyses compared methylation patterns between SSc patients and female controls while adjusting for the effect of age. We identified 599 regions genome-wide that exhibited differential CpG methylation under our criteria of a mean methylation β value difference greater than 0.2 and a bumphunter adjusted p-value < 0.05. These differentially methylated regions (DMRs) exhibited high specificity as none of them was identified as DMR in more than six out of 40 permutation tests, i.e. all DMRs had an empirical p value ≤0.15 (Additional file 17: Fig. S2). Nevertheless, given the small sample size of this pilot study, results must be interpreted cautiously. Additional file 17: Figure S3 shows that the number of DMRs identified on each chromosome varied across permutations, and that the number identified in the primary analysis tended to be higher than most of the permutations.

These 599 DMR regions overlapped with 340 genes [differentially methylated genes (DMGs)], among which 169 showed hypermethylation and 163 showed hypomethylation in SSc (Additional file 1: Table S1). In addition, eight genes were identified as containing both hypermethylated and hypomethylated DMRs. Likewise, we also identified 79 CHG-based DMRs annotated to 39 genes (19 hypermethylated and 20 hypomethylated in SSc; Additional file 2: Table S2), as well as 129 CHH-based DMR annotated to 69 genes (28 hypermethylated and 41 hypomethylated in SSc; Additional file 3: Table S3). These three groups of CpG-, CHG- and CHH-based DMGs barely overlapped with each other (Additional file 17: Figure S4). We further inspected the genomic distribution of DMR and found that they were predominantly in intergenic and intronic regions, and relatively sparse in coding and regulatory regions (Fig. 1).

Fig. 1
figure 1

Genomic distribution of DMRs. DMRs based on (a) CpG, (b) CHG and (c) CHH were annotated to the up-to-date human genome separately. The heights of the bars in the graph indicate numbers of DMRs identified with a given genomic annotation configuration. The x-axis displays different annotation configurations. The dots under the bar graph together with the adjacent horizontal histogram display the configurations and their frequency. FiveUTR: 5′ UTR; ThreeUTR: 3′ UTR; Promoter: < 1 kb upstream of the transcription start site (TSS); Upstream_1to5kb: 1-5 kb upstream of the TSS; Intergenic_Region: > 5 kb upstream of the TSS. For example, in (a), the rightmost element in the display shows that 1 DMR was identified spanning at least one of each of these elements: 1-5 kb upstream region, exon, intron-exon boundary, and intron

In contrast to most published work on WGBS, we were also able to examine the methylation on the X chromosome since this comparison involved only female participants. We identified 12 DMGs on the X chromosome, each of which contained one DMR (Table 2).

Table 2 DMG on X chromosome

In supplementary tables (Additional file 4: Tables S4, Additional file 5: Tables S5, Additional file 6: Tables S6), we also report the DMG identified using the analysis that included the male controls while adjusting for gender. Of note, including the male subjects gave slightly different DMG results from those presented using only females; however, the enriched pathways led to generally consistent interpretations (data not shown). We also investigated the difference between SSc subtypes. The supplement also provides the estimated DMG for diffuse SSc cases vs. controls, limited SSc cases vs. controls, and diffuse SSc cases vs. limited SSc cases (Additional file 7: Tables Tables S7, Additional file 8: Tables S8, Additional file 9: Tables S9, Additional file 10: Tables S10, Additional file 11: Tables S11, Additional file 12: Tables S12, Additional file 13: Tables S13, Additional file 14: Tables S14 and Additional file 15: Tables S15). Here, we observed that differential methylation might contribute to diffuse SSc and limited SSc in different ways since the DMGs identified in these three comparisons showed little overlap (Additional file 17: Fig. S5).

Functional analysis

By functional enrichment analysis, we found that the 340 CpG-based DMGs were significantly associated with various signaling pathways, including HIPPO, Wnt/ β-catenin, RhoGDI, Netrin and Ephrin Receptor signaling (Fig. 2a). We also found CpG-based DMGs were associated with a wide variety of biological functions and multi-system diseases, including connective tissue disorders (Table 3). “Skin lesions” was significantly enriched in “diseases and biological functions” (p = 1.2 × 10−13). To illustrate the richness of this preliminary data, we highlighted the 10 most significantly differentially methylated genes associated with “skin lesions” in Fig. 2b.

Fig. 2
figure 2

Biological impacts of differential methylation. (a) Significantly enriched canonical pathways (p < 0.05) based on CpG-DMRs. Pathways are sorted by p values. Percentage (indicated by bars) represents the proportion of significantly hypermethylated (red) and hypomethylated (green) genes among all genes in the corresponding pathway. Log-transformed p values are denoted by the orange line. (b) Top 10 differentially methylated genes with largest averaged difference in methylation levels in CpG-DMRs related to skin lesions. Five most hypermethylated genes (red) and five most hypomethylated genes (green) in SSc are illustrated

Table 3 Top five networks influenced by CpG differential methylation

The top five diseases and biological functions influenced by CHG differential methylation included projection of axons, similar to CpG-based DMGs, and formation of tight junctions (Additional file 17: Fig. S6). CHH-based DMGs were associated with cancer, as well the skeletal system and connective tissue disorders (Table 4).

Table 4 Top five diseases influenced by CHH differential methylation

Exploratory analysis of SNP-CpG associations

We explored potential associations between SNPs and CpG-based DMR. Though we noticed a point of inflexion around a p value of 0.1 in the QQ-plot (Additional file 17: Fig. S7), we were aware that the small sample size and strong linkage disequilibrium could result in inflation of significant p values. Thus, we imposed a more stringent threshold requiring significant SNP-CpG pair to show a p value less than 5  10−5 and an adjusted R2 higher than 0.7. We identified 238 significant SNP-CpG associations, where the involved CpG-based DMRs were distributed across 54 genes (Additional file 16: Table S16). To illustrate typical associations arising from this analysis, Fig. 3 shows methylation, genotype and phenotype at two multi-CpG DMRs in genic regions, namely FBN3 and CDCA8. These analyses can generate hypotheses for how genetic, epigenetic and phenotype data may interact. However, our sample size was not sufficiently large to formally test 3-way interactions.

Fig. 3
figure 3

Illustration of SNP-CpG associations. (a) Methylation pattern of a nine-CpG DMR in FBN3. One SSc case was removed due to missing data. Methylation patterns exhibited a three-level stratification in this DMR where SSc cases were more prone to be hypomethylated. (b) Association between a C-to-A/G SNP at chr19:8138054 and significant loss of methylation in the DMR displayed in (a). (c) Methylation pattern of a three-CpG DMR in CDCA8. Methylation patterns exhibited a two-level stratification where SSc cases were more prone to be hypomethylated in this DMR. (d) Association between a G-to-A SNP at chr1:38156902 and significant increase in methylation in the DMR displayed in (c). Two SSc cases and one control with top three highest levels of methylation carried the A allele

Viewer of DMR results

Results at each DMR have been deposited online in Supplementary Tables, and we also provide a script (available at to facilitate viewing of CpG methylation patterns.


In this pilot study, we identified DMRs and DMGs in each cytosine context (CpG, CHG and CHH) in SSc. We characterized the widespread functional impacts of differential methylation and potential genetic controls through SNP-CpG interactions.

Our findings regarding CpG-based differential methylation highlighted the role of signaling pathways implicated in the pathogenesis of SSc, including HIPPO [25], Wnt/ β-catenin [26,27,28], RhoGDI [29], Netrin [30] and Ephrin Receptor signaling [31]. Our findings were also consistent with the expected direction of effect. For example, hypomethylation of COL23A1, which codes for collagen XXIII and is expressed across different tissues including the skin and lungs [32], may be relevant to the excessive accumulation of collagen in this disease. COL23A1 has been previously reported to be hypomethylated in dermal fibroblasts of patients with SSc [33]. On the other hand, contrary to previous reports of over-expression of WNT9A in an animal model of SSc [34], we found that it was hypermethylated. Other genes illustrated in Fig. 2b could provide important novel targets of interest.

Abnormal expression of transforming growth factor beta (TGF-β) in SSc fibroblasts is central to disease pathogenesis [35, 36]. In our study of CD4+ T cells, we did not find significant changes in DNA methylation or in pathway analysis of TGF-β genes. However, we identified significant hypomethylation of SMAD3, which is a key signal transducer in the TGF-β signaling pathway and is responsible for maintaining CD4+ T cell homeostasis, particularly by inhibiting T cell receptor-induced naïve CD4+ T cell proliferation [37]. We posit this suggests linked yet different roles of TGF-beta dysregulation in SSc-CD4+ T cells and fibroblasts.

Several type I interferon (IFN) signaling pathway-associated genes were previously found to be hypomethylated in CD4+ and CD8+ T cells in SSc patients [38]. Although IFN-related pathways were not enriched in our pathway analyses, possibly due to the abundance of target genes in other pathways, our results still confirmed the importance of IFN signaling in the pathogenesis of SSc. For instance, we found that MX1 was hypomethylated in diffuse SSc patients (Additional file 7: Table S7) compared to healthy controls and that PARP11 was hypomethylated in limited SSc patients (Additional file 8: Table S8). Since hypomethylation of some of these IFN-related genes has been verified to have a strong impact on gene expression [38], it may be promising to further develop efficient biomarkers associated with this pathway.

In previous studies of SSc, the X chromosome has been mostly overlooked. Since our study consisted of mostly female subjects, we had an opportunity to investigate DMG on the X chromosome. Interestingly, we identified three significant DMG that were closely related to epigenetic modification, namely FTSJ1, coding for 2′-O-methyltransferase [39], FIRRE, a long non-coding RNA shown to be associated with histone H3 lysine 27 trimethylation [40], and MECP2, coding for methyl CpG binding protein 2 and regulating gene expression by modifying chromatin [41]. Interestingly, MECP2 has been reported to be involved in SSc skin fibrosis [42]. Aberrant methylation patterns in these epigenetic regulators suggests that the epigenetic regulatory mechanism in SSc is more complex and hierarchical than previously appreciated. Our findings regarding DNA methylation on the X chromosome were different from those of an earlier study [43]. However, in that study, methylation levels were assessed using peripheral blood mononuclear cells. We posit that our study reduced the confounding effect arising from cell mixtures.

Apart from CpG-based differential methylation, our study identified novel insights into the contribution of CHG and CHH methylation in SSc. For example, CCR3 has been previously reported to be increased in SSc monocytes [44]. We identified other genes with CHH-based DMR that could contribute to fibrosis and angiogenesis, including ADAMTS2 [45] and DDR2 [46,47,48]. All these findings suggest the role of non-CpG methylation that is worthy of further study.

Genome-epigenome interaction is a crucial component of regulation of gene expression and its importance is best established in the pathogenesis of cancer [49, 50]. A previous study reported that the SNP rs17435 linked to MECP2 (mentioned above) was related to the clinical outcome of SSc [51]. This suggests that genome-epigenome interaction could be a key to understand aberrant gene regulation in SSc. Our study pinpointed 238 potential short-range SNP-CpG pairs where the methylation levels were strongly associated with the genotype. As illustrated in Fig. 3, differential methylation in FBN3, a gene that codes for extracellular matrix macromolecules responsible for architectural functions in connective tissues [52, 53], was associated with a SNP at chr19:8138054 (rs7257948). Similarly, differential methylation in the cell division cycle associated 8 (CDCA8) gene, which plays an important role in mitosis [54] and has been implicated in SSc [55], was associated with a SNP at chr1:38156902 (rs3762352). We suggest that differential methylation in these genes may be controlled by genetic mutations and may serve as a mediator towards modulation of gene expression [56,57,58]. It is beyond the scope of this study to investigate these functional effects, although these could be the subject of future research.

We acknowledge that the small sample size and possible confounding due to differences in the characteristics of the cases and controls (Table 1) are important limitations of our study. For instance, in this study, we did not identify significant differential methylation in the HLA genes, where known genetic variants are associated with the pathogenesis of SSc [59]. Regardless, this result could not refute that differential epigenetic modifications might be dependent on specific haplotypes, since the small sample size is not able to support statistical tests for differences at such a calibrated level. However, this study was planned and executed as a pilot to explore the potential of WGBS for assessing genome-wide methylation in SSc. Since adjusting for confounders could have led to overfitting of our statistical models, we focused our primary analyses on the SSc cases (who were all female) and the female controls. Additional analyses including male controls, while adjusting for the fixed effect of sex (Additional file 4: Tables S4, Additional file 5: Table S5 and Additional file 6: Table S6) and comparisons between different SSc subtypes (Additional file 7: Tables S7, Additional file 8: Tables S8, Additional file 9: Tables S9, Additional file 10: Tables S10, Additional file 11: Tables S11, Additional file 12: Tables S12, Additional file 13: Tables S13, Additional file 14: Tables S14 and Additional file 15: Tables S15) are provided in the supplement. We have deliberately refrained from making further interpretations of these additional comparisons to minimize overinterpretation of this analysis. Nonetheless, our study forms a foundation for future studies with larger cohorts. Our results and the scripts we have provided for browsing them can be used to confirm previous or future findings, or to explore how other risk factors interact with epigenetic modifications to promote pathogenesis in different subtypes of SSc.


By profiling genome-wide DNA cytosine methylation landscape in SSc CD4+ T lymphocytes, we found widespread differential methylation involving genes relevant to disease pathogenesis. Some of the abnormal DNA methylation patterns in SSc patients are also associated with neighboring genetic variation. These findings can provide a good source for identifying novel targets of interest in SSc, developing well-profiled epigenetic biomarkers that may supplement current diagnostic and prognostic tests, as well as profoundly investigating genome-epigenome interaction in SSc pathogenesis and progression.

Availability of data and materials

The datasets generated and analysed during the current study are available in GitHub repository



Differentially methylated gene


Differentially methylated region


Human leukocyte antigen


peripheral blood mononuclear cell


Single nucleotide polymorphism


Systemic sclerosis


Transforming growth factor beta


Whole-genome bisulfite sequencing


  1. Jacobson DL, et al. Epidemiology and estimated population burden of selected autoimmune diseases in the United States. Clin Immunol Immunopathol. 1997;84(3):223–43.

    Article  CAS  PubMed  Google Scholar 

  2. Cooper GS, Stroehla BC. The epidemiology of autoimmune diseases. Autoimmun Rev. 2003;2(3):119–25.

    Article  PubMed  Google Scholar 

  3. Al-Dhaher FF, Pope JE, Ouimet JM. Determinants of morbidity and mortality of systemic sclerosis in Canada. Semin Arthritis Rheum. 2010;39(4):269–77.

    Article  PubMed  Google Scholar 

  4. Silman AJ. Scleroderma and survival. Ann Rheum Dis. 1991;50(4):267–9.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  5. Ferdowsi N, et al. Development and validation of the scleroderma clinical trials consortium damage index (SCTC-DI): a novel instrument to quantify organ damage in systemic sclerosis. Ann Rheum Dis. 2019.

  6. Eckes B, et al. Molecular and cellular basis of scleroderma. J Mol Med (Berl). 2014;92(9):913–24.

    Article  CAS  Google Scholar 

  7. Gilchrist FC, et al. Class II HLA associations with autoantibodies in scleroderma: a highly significant role for HLA-DP. Genes Immun. 2001;2(2):76–81.

    Article  CAS  PubMed  Google Scholar 

  8. Arnett FC. HLA and autoimmunity in scleroderma (systemic sclerosis). Int Rev Immunol. 1995;12(2–4):107–28.

    Article  CAS  PubMed  Google Scholar 

  9. Gladman DD, et al. HLA markers for susceptibility and expression in scleroderma. J Rheumatol. 2005;32(8):1481–7.

    CAS  PubMed  Google Scholar 

  10. Assassi S, et al. Genetics of scleroderma: implications for personalized medicine? BMC Med. 2013;11:9.

    Article  PubMed  PubMed Central  Google Scholar 

  11. Mayes MD. The genetics of scleroderma: looking into the postgenomic era. Curr Opin Rheumatol. 2012;24(6):677–84.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  12. Altorok N, et al. Epigenetics, the holy grail in the pathogenesis of systemic sclerosis. Rheumatology (Oxford). 2015;54(10):1759–70.

    Article  CAS  Google Scholar 

  13. Makino T, Jinnin M. Genetic and epigenetic abnormalities in systemic sclerosis. J Dermatol. 2016;43(1):10–8.

    Article  CAS  PubMed  Google Scholar 

  14. Mau T, Yung R. Potential of epigenetic therapies in non-cancerous conditions. Front Genet. 2014;5:438.

    Article  PubMed  CAS  PubMed Central  Google Scholar 

  15. Pulverer W, et al. The stem cell signature of CHH/CHG methylation is not present in 271 cancer associated 5'UTR gene regions. Biochimie. 2012;94(11):2345–52.

    Article  CAS  PubMed  Google Scholar 

  16. Tirado-Magallanes R, et al. Whole genome DNA methylation: beyond genes silencing. Oncotarget. 2017;8(3):5629–37.

    Article  PubMed  Google Scholar 

  17. Jaffe AE, Irizarry RA. Accounting for cellular heterogeneity is critical in epigenome-wide association studies. Genome Biol. 2014;15(2):R31.

    Article  PubMed  PubMed Central  Google Scholar 

  18. Reinius LE, et al. Differential DNA methylation in purified human blood cells: implications for cell lineage and studies on disease susceptibility. PLoS One. 2012;7(7):e41361.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  19. O'Reilly S, Hugle T, van Laar JM. T cells in systemic sclerosis: a reappraisal. Rheumatology (Oxford). 2012;51(9):1540–9.

    Article  CAS  Google Scholar 

  20. Liu Y, et al. Bis-SNP: combined DNA methylation and SNP calling for bisulfite-seq data. Genome Biol. 2012;13(7):R61.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  21. Jaffe AE, et al. Bump hunting to identify differentially methylated regions in epigenetic epidemiology studies. Int J Epidemiol. 2012;41(1):200–9.

    Article  PubMed  PubMed Central  Google Scholar 

  22. Cavalcante RG, Sartor MA. annotatr: genomic regions in context. Bioinformatics. 2017;33(15):2381–3.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  23. Kramer A, et al. Causal analysis approaches in ingenuity pathway analysis. Bioinformatics. 2014;30(4):523–30.

    Article  PubMed  CAS  Google Scholar 

  24. Turgeon M, et al. Principal component of explained variance: an efficient and optimal data dimension reduction framework for association studies. Stat Methods Med Res. 2018;27(5):1331–50.

    Article  PubMed  Google Scholar 

  25. Toyama T, et al. Therapeutic targeting of TAZ and YAP by dimethyl Fumarate in systemic sclerosis fibrosis. J Invest Dermatol. 2018;138(1):78–88.

    Article  CAS  PubMed  Google Scholar 

  26. Bergmann C, Distler JH. Canonical Wnt signaling in systemic sclerosis. Lab Investig. 2016;96(2):151–5.

    Article  CAS  PubMed  Google Scholar 

  27. Wei J, et al. Canonical Wnt signaling induces skin fibrosis and subcutaneous lipoatrophy: a novel mouse model for scleroderma? Arthritis Rheum. 2011;63(6):1707–17.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  28. Wei J, et al. Wnt/beta-catenin signaling is hyperactivated in systemic sclerosis and induces Smad-dependent fibrotic responses in mesenchymal cells. Arthritis Rheum. 2012;64(8):2734–45.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  29. Cao L, Lafyatis R, Burkly LC. Increased dermal collagen bundle alignment in systemic sclerosis is associated with a cell migration signature and role of Arhgdib in directed fibroblast migration on aligned ECMs. PLoS One. 2017;12(6):e0180751.

    Article  PubMed  CAS  PubMed Central  Google Scholar 

  30. Sun H, et al. Netrin-1 regulates Fibrocyte accumulation in the Decellularized fibrotic Sclerodermatous lung microenvironment and in Bleomycin-induced pulmonary fibrosis. Arthritis Rheumatol. 2016;68(5):1251–61.

    CAS  PubMed  PubMed Central  Google Scholar 

  31. Lagares D, et al. ADAM10-mediated ephrin-B2 shedding promotes myofibroblast activation and organ fibrosis. Nat Med. 2017;23(12):1405–15.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  32. Koch M, et al. Expression of type XXIII collagen mRNA and protein. J Biol Chem. 2006;281(30):21546–57.

    Article  CAS  PubMed  Google Scholar 

  33. Altorok N, et al. Genome-wide DNA methylation analysis in dermal fibroblasts from patients with diffuse and limited systemic sclerosis reveals common and subset-specific DNA methylation aberrancies. Ann Rheum Dis. 2015;74(8):1612–20.

    Article  CAS  PubMed  Google Scholar 

  34. Bayle J, et al. Increased expression of Wnt2 and SFRP4 in tsk mouse skin: role of Wnt signaling in altered dermal fibrillin deposition and systemic sclerosis. J Invest Dermatol. 2008;128(4):871–81.

    Article  CAS  PubMed  Google Scholar 

  35. Varga J, Pasche B. Transforming growth factor beta as a therapeutic target in systemic sclerosis. Nat Rev Rheumatol. 2009;5(4):200–6.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  36. Varga J, Whitfield ML. Transforming growth factor-beta in systemic sclerosis (scleroderma). Front Biosci (Schol Ed). 2009;1:226–35.

    Article  Google Scholar 

  37. McKarns SC, Schwartz RH. Distinct effects of TGF-beta 1 on CD4+ and CD8+ T cell survival, division, and IL-2 production: a role for T cell intrinsic Smad3. J Immunol. 2005;174(4):2071–83.

    Article  CAS  PubMed  Google Scholar 

  38. Ding W, et al. Genome-wide DNA methylation analysis in systemic sclerosis reveals Hypomethylation of IFN-associated genes in CD4(+) and CD8(+) T cells. J Invest Dermatol. 2018;138(5):1069–77.

    Article  CAS  PubMed  Google Scholar 

  39. Freude K, et al. Mutations in the FTSJ1 gene coding for a novel S-adenosylmethionine-binding protein cause nonsyndromic X-linked mental retardation. Am J Hum Genet. 2004;75(2):305–9.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  40. Yang F, et al. The lncRNA Firre anchors the inactive X chromosome to the nucleolus by binding CTCF and maintains H3K27me3 methylation. Genome Biol. 2015;16:52.

    Article  PubMed  CAS  PubMed Central  Google Scholar 

  41. Chahrour M, et al. MeCP2, a key contributor to neurological disease, activates and represses transcription. Science. 2008;320(5880):1224–9.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  42. He Y, et al. Methyl-CpG-binding protein 2 mediates antifibrotic effects in scleroderma fibroblasts. Ann Rheum Dis. 2018;77(8):1208–18.

    PubMed  Google Scholar 

  43. Selmi C, et al. X chromosome gene methylation in peripheral lymphocytes from monozygotic twins discordant for scleroderma. Clin Exp Immunol. 2012;169(3):253–62.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  44. Lee R, et al. Enhanced chemokine-receptor expression, function, and signaling in healthy African American and scleroderma-patient monocytes are regulated by caveolin-1. Fibrogenesis Tissue Repair. 2015;8:11.

    Article  PubMed  CAS  PubMed Central  Google Scholar 

  45. Takahashi T, et al. Epithelial Fli1 deficiency drives systemic autoimmunity and fibrosis: possible roles in scleroderma. J Exp Med. 2017;214(4):1129–51.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  46. Zhu T, et al. The anti-angiogenic role of discoidin domain receptor 2 (DDR2) in laser-induced choroidal neovascularization. J Mol Med (Berl). 2015;93(2):187–98.

    Article  CAS  Google Scholar 

  47. Zhang S, et al. A host deficiency of discoidin domain receptor 2 (DDR2) inhibits both tumour angiogenesis and metastasis. J Pathol. 2014;232(4):436–48.

    Article  CAS  PubMed  Google Scholar 

  48. Makino K, et al. Discoidin domain receptor 2-microRNA 196a-mediated negative feedback against excess type I collagen expression is impaired in scleroderma dermal fibroblasts. J Invest Dermatol. 2013;133(1):110–9.

    Article  CAS  PubMed  Google Scholar 

  49. Brena, R.M. and J.F. Costello, Genome-epigenome interactions in cancer. Hum Mol Genet, 2007. 16 Spec No 1: p. R96–105.

  50. Almouzni G, et al. Relationship between genome and epigenome--challenges and requirements for future research. BMC Genomics. 2014;15:487.

    Article  PubMed  CAS  PubMed Central  Google Scholar 

  51. Carmona FD, et al. New insight on the Xq28 association with systemic sclerosis. Ann Rheum Dis. 2013;72(12):2032–8.

    Article  CAS  PubMed  Google Scholar 

  52. Piha-Gossack A, Sossin W, Reinhardt DP. The evolution of extracellular fibrillins and their functional domains. PLoS One. 2012;7(3):e33560.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  53. Tocchioni F, et al. Pectus excavatum and heritable disorders of the connective tissue. Pediatr Rep. 2013;5(3):e15.

    Article  PubMed  PubMed Central  Google Scholar 

  54. Dai C, et al. Transcriptional activation of human CDCA8 gene regulated by transcription factor NF-Y in embryonic stem cells and cancer cells. J Biol Chem. 2015;290(37):22423–34.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  55. Pendergrass SA, et al. Intrinsic gene expression subsets of diffuse cutaneous systemic sclerosis are stable in serial skin biopsies. J Invest Dermatol. 2012;132(5):1363–73.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  56. Wahl S, et al. Epigenome-wide association study of body mass index, and the adverse outcomes of adiposity. Nature. 2017;541(7635):81–6.

    Article  CAS  PubMed  Google Scholar 

  57. Liang L, et al. An epigenome-wide association study of total serum immunoglobulin E concentration. Nature. 2015;520(7549):670–4.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  58. Richardson TG, et al. Mendelian randomization analysis identifies CpG sites as putative mediators for genetic influences on cardiovascular disease risk. Am J Hum Genet. 2017;101(4):590–602.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  59. Angiolilli C, et al. New insights into the genetics and epigenetics of systemic sclerosis. Nat Rev Rheumatol. 2018;14(11):657–73.

    Article  PubMed  Google Scholar 

Download references


We appreciate participation of all study subjects in our study. We thank Dr. Aurélie Labbe for helpful discussion. We thank the support of Scleroderma Quebec. We also thank the reviewers for their constructive comments that improved our manuscript. An abstract of this study was submitted to the International Genetic Epidemiology Society 2019 Conference and has been published online (


This study was funded by the Lady Davis Institute Clinical Research Pilot Project (CLIPP) competition, the Fonds de Recherche du Québec – Santé, the Canadian Institutes of Health Research (MOP 1303445) and the Genome Canada (2017 Bioinformatics). The funding bodies played no role in the design of the study, collection, analysis, and interpretation of data and in writing the manuscript.

Author information

Authors and Affiliations



MH and CG managed the project. MH, CG and TL conceived the analysis. MH, IC and ML collected the data. TL and KOK performed the data analysis. TL, MH and CG wrote the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Marie Hudson.

Ethics declarations

Ethics approval and consent to participate

Ethics approval for this study was obtained from the Institutional Review Board at McGill University, Montreal, Canada. Every study subject signed an informed consent.

Consent for publication

All study subjects gave consent to publish WGBS data and de-identified demographic features.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Additional file 1: Table S1.

Summary of CpG-based DMGs.

Additional file 2: Table S2.

Summary of CHG-based DMGs.

Additional file 3: Table S3.

Summary of CHH-based DMGs.

Additional file 4: Table S4.

Summary of CpG-based DMGs adjusted for sex effect.

Additional file 5: Table S5.

Summary of CHG-based DMGs adjusted for sex effect.

Additional file 6: Table S6.

Summary of CHH-based DMGs adjusted for sex effect.

Additional file 7: Table S7.

Summary of CpG-based DMGs between diffuse SSc and controls.

Additional file 8: Table S8.

Summary of CpG-based DMGs between limited SSc and controls.

Additional file 9: Table S9.

Summary of CpG-based DMGs between diffuse SSc and limited SSc.

Additional file 10: Table S10.

Summary of CHG-based DMGs between diffuse SSc and controls. (XLS 32 kb)

Additional file 11: Table S11.

Summary of CHG-based DMGs between limited SSc and controls.

Additional file 12: Table S12.

Summary of CHG-based DMGs between diffuse SSc and limited SSc.

Additional file 13: Table S13.

Summary of CHH-based DMGs between diffuse SSc and controls.

Additional file 14: Table S14.

Summary of CHH-based DMGs between limited SSc and controls.

Additional file 15: Table S15.

Summary of CHH-based DMGs between diffuse SSc and limited SSc.

Additional file 16: Table S16.

Summary of SNP-CpG associations.

Additional file 17: Figure S1.

Number of cytosines used in our analysis, i.e. passing quality filters for read depth and number of samples covered. Figure S2. Empirically estimated p values for CpG-based DMRs with a q value < 0.05 and with methylation differences larger than 0.2. Figure S3. Number of (a) largely differentially methylated regions (bumps) and (b) significant DMRs identified by bumphunter in original test and 40 permutation tests by chromosome. Figure S4. Overlap of CpG-, CHG- and CHH-based DMGs. Figure S5. Overlap of SSc clinical-type-specific DMGs. Figure S6. Top five enriched diseases and biological functions based on CHG-DMRs. Figure S7. Quantile-quantile plot of unadjusted p values obtained in 36,838 association tests for SNP-CpG associations.

Additional file 18:

Supplementary Information. Quality control report of de-identified samples.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver ( applies to the data made available in this article, unless otherwise stated.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Lu, T., Klein, K.O., Colmegna, I. et al. Whole-genome bisulfite sequencing in systemic sclerosis provides novel targets to understand disease pathogenesis. BMC Med Genomics 12, 144 (2019).

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: