Exploring the molecular causes of hepatitis B virus vaccination response: an approach with epigenomic and transcriptomic data
© Lu et al.; licensee BioMed Central Ltd. 2014
Received: 13 November 2013
Accepted: 5 March 2014
Published: 11 March 2014
Variable responses to the Hepatitis B Virus (HBV) vaccine have recently been reported as strongly dependent on genetic causes. Yet, the details on such mechanisms of action are still unknown. In parallel, altered DNA methylation states have been uncovered as important contributors to a variety of health conditions. However, methodologies for the analysis of such high-throughput data (epigenomic), especially from the computational point of view, still lack of a gold standard, mostly due to the intrinsic statistical distribution of methylomic data i.e. binomial rather than (pseudo-) normal, which characterizes the better known transcriptomic data.
We present in this article our contribution to the challenge of epigenomic data analysis with application to the variable response to the Hepatitis B virus (HBV) vaccine and its most lethal degeneration: hepatocellular carcinoma (HCC).
Twenty-five infants were recruited and classified as good and non-/low- responders according to serological test results. Whole genome DNA methylation states were profiled by Illumina HumanMethylation 450 K beadchips. Data were processed through quality and dispersion filtering and with differential methylation analysis based on a combination of average methylation differences and non-parametric statistical tests. Results were finally associated to already published transcriptomics and post-transcriptomics to gain further insight.
We highlight 2 relevant variations in poor-responders to HBV vaccination: the hypomethylation of RNF39 (Ring Finger Protein 39) and the complex biochemical alteration on SULF2 via hypermethylation, down-regulation and post-transcriptional control.
Our approach appears to cope with the new challenges implied by methylomic data distribution to warrant a robust ranking of candidates. In particular, being RNF39 within the Major Histocompatibility Complex (MHC) class I region, its altered methylation state fits with an altered immune reaction compatible with poor responsiveness to vaccination. Additionally, despite SULF2 having been indicated as a potential target for HCC therapy, we can recommend that non-responders to HBV vaccine who develop HCC are quickly directed to other therapies, as SULF2 appears to be already under multiple molecular controls in such patients. Future research in this direction is warranted.
KeywordsHepatitis B virus Vaccine Methylation Omics
DNA methylation (addition of a methyl group to the 5th carbon cytosine residues in CpGs islands) is stably maintained, inheritable and regarded as an epigenetic marker, which augments stability and diversity of biological phenotypes, yet without modifying the genomic sequence. DNA methylation not only plays a crucial role in a spectrum of physiological processes, such as gene imprinting and X-chromosome inactivation , but is also associated with diseases including cancer, autoimmune maladies and psychiatric disorders .
Bisulfite-conversion based approaches are widely used for DNA methylation measurements, and exploit both microarray and sequencing technologies, as it is the case for other omics. Examples include Illumina HumanMethylation 450 K beadchip  for the former, and whole genome short-gun bisulfite sequencing (GWSBS ) and reduced representation bisulfite sequencing (RRBS ) for the latter, all offering fine resolution (down to the single nucleotide).
The degree of methylation is usually denoted as β, ranging from 0 to 1. Methylation data are presented in the same matricial form of expression data (locus × sample), however, cautions must be used in the direct application of transcriptiomic analysis tool to methylation data. In particular, the assumption that most genes are not differentially expressed no longer holds for methylation data: in the human genome 70% to 80% of CpGs are methylated to various extents . Furthermore, the overall expression in a transcriptome is generally assumed to be invariant, which is the principle for ratio-intensity (R-I) plots , but this is not the case for methylation data where the total amount of CpG methylation may also differ substantially across individuals . Most importantly, unlike gene expression data, which are generally assumed to be normally or log-normally distributed, DNA methylation data present a peculiar bimodal distribution, which breaks the normality assumption and defies the applications of Gaussian distribution based statistical approaches such as t-test or ANOVA. Although both SAM (Significance Analysis of Microarray, ) and LIMMA (Linear Models for Microarray Data, ) utilize moderated t-statistic and do not need the assumption of rigorous normality, their sensitivity is generally affected by a non-normal distribution.
Despite these difficulties, the ubiquity of methylation phenomena makes them interesting candidates to explain number of open clinical problems. In particular, Hepatitis B virus (HBV) vaccine is an effective prevention of HBV infection, yet not all people can benefit from it because of varying degrees of responsiveness. We have already shown  that genetic effects have a dominant role in such a response, however, the characterization of the phenomenon is far from being complete, and we here propose to enlarge the picture to epigenetic (methylation) aspects.
Given the importance of the clinical phenomenon (infection rate in Southeast Asia, parts of China and tropical Africa above >8% ) and the numerous computational issues involved in the analysis of methylation data, we chose to adopt a custom pipeline, based on multiple filtering and non-parametric statistics to rank differentially methylated (DM) loci in 25 infants showing different responses to the HBV vaccine. Further, as a mean to better filter the list of DMs, we backed this analysis with published transcriptional (mRNA screens) and post-transcriptional (miRNA screens) data to gain more insight into the molecular effects of altered methylation.
The study protocols and consent procedure were approved by The Medical Ethical Committee of Children’s Hospital of Fudan University, Shanghai, China. Written informed consent forms were obtained from parents on the behalf of the participants involved in the study, conducted in accordance with the guidelines proposed in the World Medical Association Declaration of Helsinki.
Whole-genome DNA methylation data & study subjects
All subjects were recruited at their 1-year-old regular physical examination performed at the children-care clinics of five comprehensive hospitals in Urumqi (Xinjiang Uygur Autonomous Region, China) after they received the 5 μg recombinant HBV vaccine recommended by the Chinese Ministry of Health (KT60, 2004 L00065, Kangtai Biological Product, Shenzhen, Guangdong, China). Sampling was done at each of the three injections, following the national 0-1-6 HBV vaccination schedule. The inclusion and exclusion of subjects, HBV biomarker examination and data collection are described elsewhere .
Twenty-five infants were selected for genome-wide DNA methylation analysis. Thirteen of these were non- or low-responders to the vaccine (cases, anti-HBs < 100mIU/ml) and 12 were normal responders (controls, anti_HBs > 500mIU/ml). Whole blood (2 ml) was collected for testing DNA methylation levels using the Illumina HumanMethylation 450 K microarray, processed and filtered according to the standard Illumina protocol in 2 batches: the first with 17 samples and the second with 8 samples. Clinical data are listed in Additional file 1 and expression data are deposited at the National Center for Biotechnology Information Gene Expression Omnibus (GEO, ) public repository http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE48300.
Methylomic data preprocessing
Differential methylation analysis
The approach uses multiple metrics and statistics to ensure that different and complementary characteristics are retained in the final ranked list.
Methylation differential values were quantified as: (i) abs(mean(β case ) - mean(β control )) and (ii) abs(median(β case ) - median(β control )).
Methylation levels’ stratification
0 ~ 0.2
0.2 ~ 0.5
0.5 ~ 0.8
0.8 ~ 1.0
Top 10 DM loci obtained with LIMMA
Differential expression analysis
We collected data from 3 datasets in Gene Omnibus Express (GEO, ): GSE3049  for the transcriptomic level, GSE19980  and GSE22378 (http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE22378) for the post-transcriptomic level. All 3 studies use immortalized human hepatoma cell line HepG2 as HBV free model and HepG2.2.15, infected with HBV and transformed from HepG2, to mimic human chronic HBV infection.
The expression of mRNAs was monitored by CapitalBio cDNA 22 K long oligo dye-swap microarray, and compared between the two cell lines. Downloaded data were filtered by space- and intensity-dependent normalization (LOWESS), and already summarized as ratio changes for each probe set. No additional pre-processing was performed and fold-change was used with a cut-off 2 to select differentially expressed genes (DEs) in each comparison, leading to 478 DE genes (Additional file 3).
The 2 miRNA datasets were pre-processed (separately, as based on different versions of the CapitalBio mammalian miRNA array) by summarizing the expression value for each set of probes with the median (3 probes for one miRNA). Quality filtering was achieved by removing empty entries, probes for quality control (non-human) and data with more than 40% NAs. Overall 313 and 545 miRNAs remained in GSE19980 and GSE22378, respectively. After pre-processing, t-test, SAM and LIMMA were all applied for differential analysis in each dataset. The union, for completeness, of the results from the 2 datasets was finally retained (Additional file 4).
For DE miRNAs, targets were obtained by searching experimentally validated as well as predicted miRNA target databases, i.e. miRTarBase (release 2.5)  and TarBase (version 5)  for experimentally validated targets; TargetScan (release 6.2)  and microRNA.org (August 2010)  for predictions. Results are listed in Additional file 4 and details on the query settings can be found in Additional file 5.
Results and discussion
Classical approaches were first tested to compute the differentially methylated (DM) loci. Although Student t-test  has been found to be applied to 450 K microarray data , data distribution (see Additional file 6) presents a clearly non-normal behaviour, limiting therefore the validity of the test. Similarly, while SAM  and LIMMA  do not require a rigorous normal distribution and -especially the latter- shows good performance when the sample size is small, we observed that they are not robust enough for cases showing dramatic deviation from normality, a fact also mentioned in LIMMA’s manual.
Figure 2 presents the results of SAM, where the d-statistic (deviation stabilized derivative of t-statistic, free from the normality assumption) obtained from the real (observed) data versus the null (permuted) data is plotted. Only 3 loci were identified as differential (2 hypermethylated in red, 1 hypomethylated in green), and this cannot be remedied by lowering the threshold because of the peculiar “flat” shape of this SAM curve (compared with a plot from normally distributed data, and showing the well-known SAM “S” shape plot, see Additional file 7). Indeed, although SAM exploits permutations to obtain an empirical distribution of the d-statistic, it is largely subject to the outliers (extreme value towards 1 or 0), which are quite frequent and indeed expected in methylomic data, and cannot be simply discarded as they are biologically valid.
LIMMA  computes the B-statistic (posterior odds statistic) by replacing the ordinary standard deviations with posterior residual standard deviations, resulting in a far more stable inference even when the sample size is small. B-statistic denotes the log-ratio of a locus to be differentially methylated over not being methylated, with B > 0 implying p > 0.5, dependent on the prior knowledge of the proportion of differential loci. In our instance, this proportion was set to 0.002, based on the future planned experimental step, which implies the validation of 150 selected loci. Table 2 shows the top 10 differential loci obtained. Although the nominal p-values are significant, the log-ratios (B) indicate only 5 loci with -very weak- differential signals, which confirms the caution recommended when applying LIMMA to Illumina methylation platforms .
Given the intrinsic difficulties in isolating statistically significant differentially methylated loci, due to the numerosity of epigenomic data (curse of dimensionality and multiple hypothesis testing issue), we chose to explore the data with two metrics to combine their diverse advantages, and to support the ranking offered by these measures with two non-parametric statistical tests.
To satisfy the biological rationale that a differential locus should present a variation in β between two phenotypes, we quantified this difference as: (i) abs(mean(β case ) - mean(β control )) to take into account all values presented by the data and (ii) abs(median(β case ) - median(β control )) to better deal with outliers. These values were assumed to be of relevance if above a threshold, usually set to 0.2 .
This ranked selection was backed by the ranking obtained from two non-parametric tests, namely Wilcoxon rank-sum test (WRST, ) and Fisher’s exact test (FET, ) after methylation status discretization . Both statistical tests are free from any assumption on data distribution, yet their sensitivity and specificity vary. Wilcoxon rank-sum test is sensitive to rank orders of β values, rather than to absolute values. For this reason, we added Fisher’s exact test to the statistical framework to retain the otherwise missing information on β value.
This recommendation combined with our 150 top candidate selection lead to threshold = 0.17 (Additional file 2).
To test whether altered methylation states could also be mirrored and supported in altered genes’ expression, we appended to the DM analysis a careful selection of gene expression data at both the transcriptional and post-transcriptional levels from GEO (see Methods). Due to the lack of available blood samples data we turned to hepatic cell lines, implying the assumption that the systemic effects visible in blood mirror events occurring in the disease target organ (liver), a fact that has been observed and confirmed in numerous diseases [30–32]. Differential mRNA and miRNA analyses allowed us to identify 478 DE mRNAs (Additional file 3) and 55 DE miRNAs (Additional file 4).
Epigenomic alterations have recently been discovered as molecular modifications with the potential to stably influence biological systems. Yet, the challenges involved in processing this type of information are numerous, including not only the biological mechanisms triggering and maintaining these modifications, but also the mathematical modelling behind these data, and from there the definition of appropriate methods of analysis. We propose here a combination of approaches to efficiently explore these data and effectively rank a selected number of epigenomic potential causes. In particular, we have been able to highlight the hypomethylation of the transcription factor RNF39 and the controlled expression of SULF2 as two interesting molecular variations related to the HBV vaccine responsiveness.
Hepatitis B virus
Significance analysis of microarrays
Linear models for microarray data
Wilcoxon rank-sum test
Fisher’s exact test
Multiple dimensional scaling.
This work is supported by Chinese National “Twelfth Five-Year” Plan for Science & Technology Support (Grant No. 2012BAI03B00); National Natural Science Foundation of China (Grant No. 30960329 and 81273168); Ministry of Science and Technology (Grant no. 2013DFA30790). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
- Suzuki MM, Bird A: DNA methylation landscapes: provocative insights from epigenomics. Nat Rev Genet. 2008, 9: 465-476.View ArticlePubMed
- Heyn H, Esteller M: DNA methylation profiling in the clinic: applications and challenges. Nat Rev Genet. 2012, 13: 679-692. 10.1038/nrg3270.View ArticlePubMed
- Bibikova M, Barnes B, Tsan C, Ho V, Klotzle B, Le JM, Delano D, Zhang L, Schroth GP, Gunderson KL, Fan J-B, Shen R: High density DNA methylation array with single CpG site resolution. Genomics. 2011, 98: 288-295. 10.1016/j.ygeno.2011.07.007.View ArticlePubMed
- Cokus SJ, Feng S, Zhang X, Chen Z, Merriman B, Haudenschild CD, Pradhan S, Nelson SF, Pellegrini M, Jacobsen SE: Shotgun bisulphite sequencing of the Arabidopsis genome reveals DNA methylation patterning. Nature. 2008, 452: 215-219. 10.1038/nature06745.PubMed CentralView ArticlePubMed
- Meissner A, Mikkelsen TS, Gu H, Wernig M, Hanna J, Sivachenko A, Zhang X, Bernstein BE, Nusbaum C, Jaffe DB, Gnirke A, Jaenisch R, Lander ES: Genome-scale DNA methylation maps of pluripotent and differentiated cells. Nature. 2008, 454: 766-770.PubMed CentralPubMed
- Bock C: Analysing and interpreting DNA methylation data. Nat Rev Genet. 2012, 13: 705-719.View ArticlePubMed
- Quackenbush J: Microarray data normalization and transformation. Nat Genet. 2002, 32: 496-501. 10.1038/ng1032.View ArticlePubMed
- Tusher VG, Tibshirani R, Chu G: Significance analysis of microarrays applied to the ionizing radiation response. Proc Natl Acad Sci. 2001, 98: 5116-5121. 10.1073/pnas.091062498.PubMed CentralView ArticlePubMed
- Smyth G: Linear models and empirical bayes methods for assessing differential expression in microarray experiments. Stat Appl Genet Mol Biol. 2004, 3: 1544-6115.
- Yan K, Cai W, Cao F, Sun H, Chen S, Xu R, Wei X, Shi X, Yan W: Genetic effects have a dominant role on poor responses to infant vaccination to hepatitis B virus. J Hum Genet. 2013, 58: 293-297. 10.1038/jhg.2013.18.View ArticlePubMed
- WHO | hepatitis B. [http://www.who.int/csr/disease/hepatitis/whocdscsrlyo20022/en/index1.html]
- Barrett T, Troup DB, Wilhite SE, Ledoux P, Evangelista C, Kim IF, Tomashevsky M, Marshall KA, Phillippy KH, Sherman PM, Muertter RN, Holko M, Ayanbule O, Yefanov A, Soboleva A: NCBI GEO: archive for functional genomics data sets—10 years on. Nucleic Acids Res. 2011, 39 (suppl 1): D1005-D1010.PubMed CentralView ArticlePubMed
- Ihaka R, Gentleman R: R: a language for data analysis and graphics. J Comput Graph Stat. 1996, 5: 299-314.
- Touleimat N, Tost J: Complete pipeline for Infinium®Human Methylation 450 K BeadChip data processing using subset quantile normalization for accurate DNA methylation estimation. Epigenomics. 2012, 4: 325-341. 10.2217/epi.12.21.View ArticlePubMed
- Wilcoxon F: Individual comparisons by ranking methods. Biometrics. 1945, 1: 80-83. 10.2307/3001968.View Article
- Fisher RA: On the interpretation of χ2 from contingency tables, and the calculation of P. J R Stat Soc. 1922, 85: 87-10.2307/2340521.View Article
- Laurent L, Wong E, Li G, Huynh T, Tsirigos A, Ong CT, Low HM, Kin Sung KW, Rigoutsos I, Loring J, Wei C-L: Dynamic changes in the human methylome during differentiation. Genome Res. 2010, 20: 320-331. 10.1101/gr.101907.109.PubMed CentralView ArticlePubMed
- Edgar R, Domrachev M, Lash AE: Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res. 2002, 30: 207-210. 10.1093/nar/30.1.207.PubMed CentralView ArticlePubMed
- Guo Y, Guo H, Zhang L, Xie H, Zhao X, Wang F, Li Z, Wang Y, Ma S, Tao J, Wang W, Zhou Y, Yang W, Cheng J: Genomic analysis of anti-hepatitis B virus (HBV) activity by small interfering RNA and lamivudine in stable HBV-producing cells. J Virol. 2005, 79: 14392-14403. 10.1128/JVI.79.22.14392-14403.2005.PubMed CentralView ArticlePubMed
- Liu Y, Zhao J, Wang C, Li M, Han P, Wang L, Cheng Y-Q, Zoulim F, Ma X, Xu D-P: Altered expression profiles of microRNAs in a stable hepatitis B virus-expressing cell line. Chin Med J Engl Ed. 2009, 122: 10-14.
- Hsu S-D, Lin F-M, Wu W-Y, Liang C, Huang W-C, Chan W-L, Tsai W-T, Chen G-Z, Lee C-J, Chiu C-M, Chien C-H, Wu M-C, Huang C-Y, Tsou A-P, Huang H-D: miRTarBase: a database curates experimentally validated microRNA-target interactions. Nucleic Acids Res. 2010, 39 (Database): D163-D169.PubMed CentralView ArticlePubMed
- Vergoulis T, Vlachos IS, Alexiou P, Georgakilas G, Maragkakis M, Reczko M, Gerangelos S, Koziris N, Dalamagas T, Hatzigeorgiou AG: TarBase 6.0: capturing the exponential growth of miRNA targets with experimental support. Nucleic Acids Res. 2011, 40: D222-D229.PubMed CentralView ArticlePubMed
- Lewis BP, Burge CB, Bartel DP: Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are MicroRNA targets. Cell. 2005, 120: 15-20. 10.1016/j.cell.2004.12.035.View ArticlePubMed
- Betel D, Wilson M, Gabow A, Marks DS, Sander C: The microRNA.org resource: targets and expression. Nucleic Acids Res. 2007, 36 (Database): D149-D153. 10.1093/nar/gkm995.PubMed CentralView ArticlePubMed
- Rice JA: Mathematical Statistics and Data Analysis. 2007, Stamford, Connecticut: Cengage Learning
- Nakano K, Whitaker JW, Boyle DL, Wang W, Firestein GS: DNA methylome signature in rheumatoid arthritis. Ann Rheum Dis. 2013, 72: 110-117. 10.1136/annrheumdis-2012-201526.PubMed CentralView ArticlePubMed
- Bioconductor - methylumi. [http://www.bioconductor.org/packages/2.10/bioc/html/methylumi.html]
- Du P, Zhang X, Huang C-C, Jafari N, Kibbe W, Hou L, Lin S: Comparison of Beta-value and M-value methods for quantifying methylation levels by microarray analysis. BMC Bioinforma. 2010, 11: 587-10.1186/1471-2105-11-587.View Article
- Marabita F, Almgren M, Lindholm ME, Ruhrmann S, Fagerström-Billai F, Jagodic M, Sundberg CJ, Ekström TJ, Teschendorff AE, Tegnér J, Gomez-Cabrero D: An evaluation of analysis pipelines for DNA methylation profiling using the Illumina HumanMethylation450 BeadChip platform. Epigenetics. 2013, 8: 333-346. 10.4161/epi.24008.PubMed CentralView ArticlePubMed
- Varani K, Laghi-Pasini F, Camurri A, Capecchi PL, Maccherini M, Diciolla F, Ceccatelli L, Lazzerini PE, Ulouglu C, Cattabeni F, Borea PA, Abbracchio MP: Changes of peripheral A2A adenosine receptors in chronic heart failure and cardiac transplantation. FASEB J. 2003, 17: 280-282.PubMed
- Varani K, Caramori G, Vincenzi F, Adcock I, Casolari P, Leung E, MacLennan S, Gessi S, Morello S, Barnes PJ, Ito K, Chung KF, Cavallesco G, Azzena G, Papi A, Borea PA: Alteration of adenosine receptors in patients with chronic obstructive pulmonary disease. Am J Respir Crit Care Med. 2006, 173: 398-406. 10.1164/rccm.200506-869OC.View ArticlePubMed
- Varani K, Vincenzi F, Tosi A, Gessi S, Casetta I, Granieri G, Fazio P, Leung E, MacLennan S, Granieri E, Borea PA: A2A adenosine receptor overexpression and functionality, as well as TNF-α levels, correlate with motor symptoms in Parkinson’s disease. FASEB J. 2010, 24: 587-598. 10.1096/fj.09-141044.View ArticlePubMed
- Lai J-P, Sandhu DS, Yu C, Han T, Moser CD, Jackson KK, Guerrero RB, Aderca I, Isomoto H, Garrity-Park MM, Zou H, Shire AM, Nagorney DM, Sanderson SO, Adjei AA, Lee J-S, Thorgeirsson SS, Roberts LR: Sulfatase 2 up-regulates glypican 3, promotes fibroblast growth factor signaling, and decreases survival in hepatocellular carcinoma. Hepatol Baltim Md. 2008, 47: 1211-1222. 10.1002/hep.22202.View Article
- Lai J-P, Thompson JR, Sandhu DS, Roberts LR: Heparin-degrading sulfatases in hepatocellular carcinoma: potential roles in pathogenesis and identification of therapeutic targets. Future Oncol Lond Engl. 2008, 4: 803-814. 10.2217/14796622.214.171.1243.View Article
- The pre-publication history for this paper can be accessed here:http://www.biomedcentral.com/1755-8794/7/12/prepub
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.