Genome-wide identification of methylated CpG sites in nongenital cutaneous warts

Background Low-risk HPV infection has not been the subject of epigenetic investigation. The present study was carried out in order to investigate the methylation status of CpG sites in non-genital cutaneous warts. Methods Genomic DNA was extracted from 24 paired epidermal samples of warts and normal skin. DNA samples were bisulfite converted and underwent genome-wide methylation profiling using the Infinium MethylationEPIC BeadChip Kit. Results From a total of 844,234 CpG sites, 56,960 and 43,040 CpG sites were found to be hypo- and hypermethylated, respectively, in non-genital cutaneous warts. The most differentially methylated CpG sites in warts were located within the C10orf26, FAM83H-AS1, ZNF644, LINC00702, GSAP, STAT5A, HDAC4, NCALD, and EXOC4 genes. Conclusion Non-genital cutaneous warts exhibit a unique CpG methylation signature.


Background
CpG sites are parts of DNA that consist of a cytosine nucleotide linked to a guanine nucleotide by a phosphate group, and they are often found as a part of CpG islands, the latter of which are areas of high CpG frequencies [1]. From an epigenetic perspective, CpGs are of particular importance due to the fact that DNA methylation in mammals occurs primarily in a CpG context [2]. In mammalian genomes, the majority of CpG sites are methylated, while those in CpG islands are generally hypomethylated [3]. Due to the high mutability of methylcytosine, methylated CpG sites are underrepresented in the human genome [4]. Aberrant CpG methylation patterns increase susceptibility to various diseases, including cancer, but such changes can also be induced during host-pathogen interactions [5,6].
Host gene dysregulation is a common component of viral infection, and such changes are often generated via epigenetic exploitation of the host genome [7]. In order to evade the antiviral immune response, DNA viruses induce aberrant methylation of immune-related genes in the host [8]. One such example is the human papillomavirus (HPV), a DNA virus that alters host methylation patterns as a part of its life cycle and replication mechanisms within keratinocytes [9]. To date, more than 200 HPV genotypes have been characterized, most of which are low-risk and often manifest in the form of benign cutaneous or genital lesions known as warts [10]. However, a small group of HPV types are considered to be high risk, as they are a causative agent for several different types of squamous cell carcinomas [11].
High-risk HPV infection affects cervical cancer progression by increasing levels of DNA methylation, although methylation patterns were heterogenous among different neoplastic grades [12][13][14]. Hypomethylation of a CpG site in the MAL gene was reported to be potentially associated with persistent cervical infection with high-risk HPV [15]. Moreover, HPV-positive head-and-neck squamous cell carcinomas exhibited a novel methylation signature in which hypomethylated CpG islands were functionally correlated with gene expression [16]. In fact, HPV-induced epigenetic changes are a major contributing factor to the stability of malignant head-and-neck squamous cell carcinoma [17]. Similarly, CpG loci were differentially methylated in HPVpositive anal squamous neoplasia, and significant differential methylation was observed between in-situ and invasive samples [18].
Unlike its high-risk counterpart, low-risk HPV infection has not been the subject of epigenetic analysis in the context of non-genital cutaneous warts, the latter of which constitutes an extremely common skin disease that is benign and self-limiting in the majority of cases [19]. The most prevalent type of non-genital cutaneous wart is the common wart, which usually manifests on the hands and feet as a firm, hyperkeratotic papule with an irregular surface [20]. The extensive transformation that an HPV-infected keratinocyte undergoes to form a wart suggests that a similar change in methylation patterns must occur. Subsequently, the aim of the current study is to identify the genome-wide methylation status of CpG sites in warts as compared to normal skin.

Patient recruitment
Twelve patients were recruited at the dermatological clinic in King Abdullah University Hospital in the north of Jordan. The Institutional Review Board (IRB) at Jordan University of Science and Technology (JUST) granted ethical approval to conduct the present study. The inclusion criteria for participants comprised the following characteristics: being male, being free from autoimmune disease, presenting with common warts, not having received prior treatment for their warts, and having given written informed consent. Shave biopsies were performed by a resident dermatologist in order to excise paired normal skin and wart samples from each patient, which were then stored at − 20°C until subsequent processing.

Extraction of genomic DNA and bisulfite conversion
RNA-free genomic DNA was extracted by means of the QIAamp DNA Mini Kit (Qiagen, Germany) and shipped to the Australian Genome Research Facility (AGRF) on dry ice. Upon arriving to the AGRF, further quality control analysis was performed for each sample using the QuantiFluor® dsDNA System (Promega, USA) and 0.8% agarose gel electrophoresis to determine their purity and integrity, respectively. After obtaining assurance of their quality, the EZ DNA Methylation kit (Zymo Research, USA) was employed for the bisulfite conversion of normalized samples.

Genome-wide methylation profiling and data processing
The Infinium MethylationEPIC BeadChip Kit (Illumina, USA) was utilized in order to interrogate over 850,000 methylation sites. The MethylationEPIC array contains 866,895 probes that target 863,904 CpG sites, 2932 CpH    sites, and 59 rs sites. The raw intensity data generated by the array was analyzed using RnBeads, a computational R package [21].

Differential methylation analysis
To calculate the extent of differential methylation (DM) for each CpG site, limma was used to determine three ranks: the beta difference in methylation means between warts (W) and normal skin (NS), the log 2 of the quotient in methylation, and the DM p-value [21]. Limma was also utilized to compute p-values on CpG sites [22]. Multiple testing was corrected for by setting the false discovery rate (FDR) at 5% with the Benjamini-Hochberg procedure. Using these three ranks, a combined rank was formulated in which increased DM at a particular CpG site resulted in a smaller rank [21]. The combined rank was used to sort DM CpG sites in ascending order, and the top-ranking 100,000 sites were selected for further analysis.

Enrichment, pathway, and signaling analysis
Gene ontology (GO) term enrichment analysis as well as KEGG and Reactome pathway analysis of the top 100 CpG sites were carried out using the Database for Annotation, Visualization, and Integrated Discovery (DAVID) v6.8 (https://david.ncifcrf.gov/). GO terms revolved around three criteria (biological process (BP), cellular component (CC), and molecular function (MF)), and the cut-off threshold was fixed at p-value ≤0.05. After selecting the top-ranked 100 DM CpG sites, the Signaling Network Open Resource 2.0 (SIGNOR) was used to analyze the signaling networks of associated genes [23].

Sample clustering
Based on the DM values of the top-ranking 1000 loci, an expected clustering pattern can be observed between the NS and W samples (Fig. 1). Using multidimensional scaling (MDS) and principal component analysis (PCA), Fig. 6 Volcano plot of the top-ranking 1000 differentially methylated sites. Differential methylation was measured by the log2 of the mean quotient in methylation (mean.quot.log2) and the mean fold difference (mean.diff) between warts (W) and normal skin (NS). Data points less than 0 represent relative hypomethylation, while those more than 0 represent relative hypermethylation. The intensity of each data point correlates with the combined rank score as shown on the color scale to the right strong signals in sample methylation values were examined ( Fig. 2a and b).
Processing and filtering of data 17,371 probes were removed due to their overlap with SNPs (Fig. 3a). A further 2,310 probes were filtered out using the Greedycut algorithm in RnBeads. Additional filtering eliminated 2,980 probes with specific contexts (Fig. 3b). In total, 22,661 probes were removed and 844, 234 probes were retained. Both probes and samples were subject to the full RnBeads package pipeline, which entailed quality control, preprocessing, batch effects testing, and normalization (Fig. 4). The complete processed methylation data for the CpG sites can be found in Supplementary File.

Differential methylation of CpG sites
Of the top-ranking 100,000 CpG sites in terms of DM, 56,960 sites were hypomethylated and 43,040 sites were hypermethylated in W compared to NS, with a mean beta difference greater than 0.055 and less than − 0.055 (p-value < 0.032; adjusted p-value < 0.032) (Fig. 5). The beta difference for the hypomethylated and hypermethylated sites ranged from − 0.055 to 0.56 and 0.55 to 0.56, respectively. Similarly, the log 2 of the quotient in methylation between W and NS ranged from − 2.47 to 2.9 ( Fig. 6). The highest concentration of DM sites was seen on chromosomes 1 and 2 (Fig. 7). The top-ranking100 CpG sites, i.e. the most DM, are listed in Table 1.

Functional enrichment analysis
GO enrichment analyses of the genes associated with the top 100 DM CpG sites were performed using the DAVID webtool. Table 2 shows the most significant GO terms (p-value ≤0.05). Associated genes were mainly enriched for "SH3 domain binding", "actin binding", and "GTPase activator activity" on the MF level, "regulation of GTPase activity" and "positive regulation of GTPase" on the BP level, and "postsynaptic membrane" on the CC level. The most significant KEGG and Reactome pathway terms with a p-value ≤0.05 are presented. The genes were mainly enriched in the Rap1 signaling and VxPx cargo-targeting to cilium pathways (Table 3).

Signaling network analysis
Analysis of the genes associated with the top 100 DM CpG sites showed that five genes were found to be common regulators with a minimum of 20 connectivities each. These genes are the PRKD1, HDAC4, and STAT5A genes (Fig. 8).

Discussion
In the present study, the genome-wide methylation profile of CpG sites was demonstrated for the first time in non-genital cutaneous warts. Out of the 844,234 CpG sites that were investigated, 56,960 and 43,040 CpG sites were found to be hypomethylated and hypermethylated, respectively, in warts. The combined rank scoring   Table 1 The 100 CpG sites with the lowest combined rank scores Hypomethylation method revealed the top 100 most differentially methylated CpG sites, which lay within the C10orf26, FAM83H-AS1, ZNF644, LINC00702, GSAP, STAT5A, HDAC4, NCALD, and EXOC4 genes, among others. cg09671951 was found to be the most hypermethylated CpG site in warts, and it is located within the C10orf26 gene, which is also known as the outcome predictor in acute leukemia 1 (OPAL1) gene. The C10orf26 gene has been associated with response to treatment in children with acute lymphoblastic leukemia, and it has also been implicated as a modulator of schizophrenia symptoms and disease progression [24][25][26]. The second most hypermethylated CpG site, cg27071672, lies within the FAM83H-AS1 gene, which codes for the FAM83H antisense RNA 1 (head to head). FAM83H-AS1 dysregulation has been associated with carcinogenesis in breast, colorectal, and lung cancer [27][28][29]. Two of the most hypermethylated CpG sites, cg07385604 and cg01890417, were located within the ZNF644 gene, which encodes the zinc finger protein 644. ZNF644 is associated with transcriptional repression as a part of the G9a/GLP complex, and mutations in this gene are responsible for a monogenic form of myopia [30,31].
cg12432168, located with the LINC00702 gene, and cg06305962, located within the GSAP gene, were the fourth and fifth most hypermethylated CpG sites, respectively. The long intergenic non-protein coding RNA 702 (LINC00702), like other long non-coding RNAs, functions in genetic and epigenetic regulation, and its upregulation has been reported in endometrial cancer as well as malignant meningioma [32,33]. However, the γsecretase activating protein (GSAP) has mostly been reported in the context of Alzheimer's disease pathology [34,35]. Comparatively little is known about functions of the LINC00702 and GSAP genes outside of a disease context.
In contrast, three of the most hypermethylated CpG sites (cg08246644, cg20400915, and cg08569613) were located within the signal transducer and activator of transcription 5A (STAT5A) gene, the latter of which has been extensively studied and elucidated. STAT5A has an essential function in lactogenic and mammopoietic signaling and development in adults, and its expression is upregulated by the tumor protein p53 [36,37]. Aberrant STAT5A expression has been reported in a number of different cancers, including breast, colon, head and neck, and prostate cancer as well as leukemia [38][39][40][41][42]. Of particular interest is the association of STAT5A dysregulation with head and neck squamous carcinoma, which is a type of cancer that can be caused by high-risk HPV infection [43,44]. Although low-risk HPV types lack the carcinogenic potential of their high-risk counterparts, it is intriguing that both the benign and cancerous manifestations of HPV infection exhibit aberrant STAT5A expression.
A further three of the most hypermethylated CpG sites (cg05171197, cg19449565, and cg17356718) were found within the histone deacetylase 4 (HDAC4) gene that functions in the condensation of chromatin and repression of transcription via deacetylation [45]. The survival  and growth of multiple myeloma is regulated by the HDAC4-RelB-p52 complex, and the disruption of the latter blocks the growth of these cells [46]. Moreover, HDAC4 degradation by certain chemotherapeutic agents results in the apoptosis of head-and-neck cancer cells that are resistant to TRAIL, while miR-22-driven HDAC4 repression helped to resensitize fulvestrantresistant breast cancer cells [47,48]. Likewise, eptoposide resistance in human A549 lung cancer cells was conferred by STAT1-HDAC4 upregulation, and HDAC4 inhibition has been reported to induce apoptosis in nonsmall cell lung cancer PC-9 cells [49,50]. HDAC4 has been previously implicated in viral replication as well as the host's antiviral response [51]. For example, HIV-1 DNA integration is facilitated by the involvement of HDAC4 in the post-integration repair process [52]. Moreover, infection with the influenza A virus has been reported to cause airway remodeling in asthmatic individuals via the indirect dysregulation of HDAC4 [53]. HDAC4 is also a critical regulator of antiviral response, and its overexpression hinders the host immune response by suppressing type 1 interferon production [54]. Furthermore, STAT-HDAC4 signaling was reported to induce epithelial-mesenchymal transition, a malignant tumor feature that is also exhibited by keratinocytes during tissue repair [55][56][57]. High-risk HPV infection can similarly result in malignancy by inducing this transition in epithelial and keratinocyte cells [58][59][60].
With regard to functional enrichment analysis of the top-ranking 100 DM CpG sites, the most significantly enriched genes in warts were associated with SH3 domain binding, namely the Rho GTPase activating protein 31 (ARHGAP31), zinc finger protein 106 (ZNF106), synaptic Ras GTPase-activating protein 1 (SYNGAP1), and citron Rho-interacting serine/threonine kinase (CIT) genes. Despite the fact that the SH3 domain plays a role in a range of different fundamental cellular processes, not much is known about the aforementioned genes in the context of skin pathology or HPV infection [61].
In contrast, pathway analysis revealed that the Rap1 signaling pathway was the most significantly enriched term, which included the RAP1 GTPase activating protein (RAP1GAP), adenylyl cyclase type 9 (ADCY9), signal-induced proliferation-associated 1 like protein 1 (SIPA1L1), Rap guanine nucleotide exchange factor (GEF) 4 (RAPGEF4), and protein kinase D3 (PRKD3) genes. RAP1GAP downregulation via promoter hypermethylation was reported to promote the cell proliferation, survival, and migration of melanoma cells [62]. Moreover, sequence analysis of the high-risk HPV 16 E6-binding protein showed that it had the highest degree of homology with the mammalian Rap1GAP protein [63]. In addition, PRKD3 has been previously reported to have an important role in promoting the growth and progression of invasive breast cancer [64].
Signaling network analysis of the top-ranking 100 CpG sites identified three common regulators: the protein kinase D1 (PRKD1), histone deacetylase 4 (HDAC4), and signal transducer and activator of transcription 5A (STAT5A) genes. The PRKD1 gene plays an integral role in anti-differentiative and proliferative keratinocyte processes, and its aberrant expression has been suggested to have a putative tumorigenic function in the skin [65,66]. Similarly, the STAT5A gene has been reported to play a major role in the keratinocyte differentiation process [67]. In the context of HPV infection, STAT5A was found to promote HPV viral replication, and STAT-5 isoforms have been indicated to contribute to the progression of HPV-associated cervical cancer [68,69].

Conclusions
The current study reported a number of novel CpG sites that were differentially methylated in non-genital cutaneous warts compared to normal skin. Such differences in methylation status could be responsible for the HPVinduced wart formation process. The identification of methylation status for the most differentially methylated CpG sites may prove beneficial towards the understanding of the epigenetic factors associated with non-genital cutaneous warts. One limitation of the present study is the relatively small sample size, which may result in suboptimal statistical power for the genome-wide methylation analysis. Future research is required to validate the results on a larger scale.
Additional file 1. Supplementary file. Complete processed methylation data for CpG sites.