Genome-wide association study identifies new loci associated with risk of HBV infection and disease progression

Recent studies have identified susceptibility genes for HBV clearance, chronic hepatitis B, liver cirrhosis, and hepatocellular carcinoma, and have shown that host genetic factors play an important role in these HBV-related outcomes. We collected samples from participants with different outcomes of HBV infection and genotyped them with the Affymetrix 500k SNP Array. GCTA, PLINK, and the Bonferroni method were applied to the analysis of genotypes and disease progression. ANOVA was used to evaluate the significance of the association between biomarkers and genotypes in healthy controls. PoMo, FST, Vcftools, and the Rehh package were used for building the racial tree and for population analysis. An FST value of 0.15 was used as the threshold for detecting signatures of selection. Of 1104 participants, 1031 passed quality control, including 275 with HBV clearance, 92 with asymptomatic persistence infection (ASPI), 93 with chronic hepatitis B (CHB), 188 with HBV-related decompensated cirrhosis (DC), 214 with HBV-related hepatocellular carcinoma (HCC), and 169 healthy controls (HC). In the case-control study, one novel locus was significantly associated with CHB (SNP: rs1264473, Gene: GRHL2, P = 1.57 × 10−6) and two novel loci were significantly associated with HCC (SNP: rs2833856, Gene: EVA1C, P = 1.62 × 10−6; SNP: rs4661093, Gene: ETV3, P = 2.26 × 10−6). In the trend study across progressive stages after HBV infection, one novel locus (SNP: rs1537862, Gene: LACE1, P = 1.85 × 10−6) and three MHC loci (HLA-DRB1, HLA-DPB1, HLA-DPA2) showed significantly increased progressive risk from ASPI to CHB. In the evolutionary study of HBV-related genes in public databases, the derived alleles of two HBV clearance-related loci, rs3077 and rs9277542, are under strong selection in the European population. In this study, we identified several novel candidate genes associated with individual HBV infection outcomes, progressive stages, and liver enzymes. Two SNPs (HLA-DPA1, HLA-DPB1) show selective significance in non-East Asian (European, American, South Asian) versus East Asian populations, indicating that host genetic factors contribute to the ethnic disparities in susceptibility to HBV infection. Taken together, these findings provide new insight into the role of host genetic factors in HBV-related outcomes and progression.

Study participants
A total of 1104 unrelated, age- and gender-matched Chinese participants were recruited; enrollment criteria were consistent with a previous report [19]. The population with HBV-related phenotypes comprised five subgroups: HBV clearance subjects, asymptomatic persistence infection (ASPI) carriers, chronic hepatitis B (CHB) patients, HBV-related decompensated cirrhosis (DC) patients, and HBV-related hepatocellular carcinoma (HCC) patients. Healthy controls (HC) who were negative for HBV serum markers (HBsAg, anti-HBc) and had no serological evidence of co-infection with HCV, HDV, or HIV were also included. Chronic HBV infection was diagnosed based on HBsAg seropositivity for at least 6 months. ASPI was defined as HBsAg and anti-HBc positivity for at least 6 months, with serum alanine aminotransferase (ALT) and aspartate aminotransferase (AST) within normal values and no history of abnormal values. CHB was defined as HBsAg and anti-HBc positivity for at least 6 months, with abnormal ALT/AST values before or at enrollment. DC was defined as HBsAg and anti-HBc positivity for at least 6 months, with decompensated portal hypertension (gastroesophageal bleeding, ascites, edema, or encephalopathy) or decompensated liver function (albumin < 35 g/L and total bilirubin > 35 μmol/L). HCC was defined by at least one of the following: (a) liver biopsy; or (b) abnormal alpha-fetoprotein (AFP) together with sonographic, CT, or MRI evidence of a space-occupying lesion.

Statistical analysis
GCTA [20] was used to perform principal component analysis for estimating population substructure. The first two eigenvectors, pc1 and pc2, were used to display the population structure. PLINK 1.9 [21] was used to perform logistic regression for identifying susceptibility SNPs for HBV infection and HBV-related outcomes, with gender and age as covariates. The chi-square test for trend in proportions was used to identify SNPs with an increasing effect on disease progression. We used the Bonferroni method to control the false-positive rate caused by multiple testing. The number of independent LD blocks was used as the number of independent tests. We calculated a total of 21,077 independent LD blocks via GEC [22] and then set 0.05/21077 as the threshold of genome-wide significance. The genomic control method was used to measure population stratification by calculating the genomic inflation factor (λ) from the median P-value. ANOVA was used to evaluate the significance of the association between biomarkers and genotypes in healthy controls.
Using the SNPs in HBV infection-related loci in the 1000 Genomes Project [23], we performed evolutionary analyses, including building a phylogenetic tree, detecting signatures of selection, displaying the core haplotypes, and estimating effective population size. The derived and ancestral alleles of SNPs were obtained from the Ensembl human ancestral genome (http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/analysis_results/supporting/ancestral_alignments). PoMo [24], an allele frequency-based approach, was used to build the racial tree based on the allele frequency of SNPs in each population. FST [25], a classical metric of population differentiation, has been widely employed in detecting signatures of selection [26] in human [27,28] and animal genomes [29][30][31]. In our study, FST was used to detect selective signatures between the East Asian population and each other population. Vcftools [32] was used to calculate the FST statistics of SNPs in paired populations. An FST value of 0.15 [33] was used as the threshold for detecting signatures of selection. The Rehh package [34,35] was used to display the haplotype bifurcation diagrams of the associated SNPs in different populations. Relate [36], a method for genome-wide genealogy estimation for thousands of samples, was used with default settings to estimate the historical population size.
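The significance threshold and the genomic inflation factor described in this section can be illustrated with a minimal sketch. It is not the pipeline used in the study: the function names are hypothetical, and λ is computed here by converting the median P-value to a 1-df chi-square statistic and dividing by its expected value under the null, which is one common convention and an assumption on our part.

```python
from statistics import NormalDist, median


def bonferroni_threshold(alpha: float = 0.05, n_tests: int = 21077) -> float:
    """Genome-wide threshold: alpha divided by the number of independent LD blocks."""
    return alpha / n_tests


def p_to_chi2_1df(p: float) -> float:
    """Convert a two-sided P-value in (0, 1] to a 1-df chi-square statistic."""
    # For 1 df, chi2 = z**2 where z is the normal quantile of p / 2.
    z = NormalDist().inv_cdf(p / 2.0)
    return z * z


def genomic_inflation(p_values) -> float:
    """Lambda: observed median chi-square over the expected median under the null."""
    observed = p_to_chi2_1df(median(p_values))
    expected = p_to_chi2_1df(0.5)  # median of a 1-df chi-square, about 0.455
    return observed / expected


threshold = bonferroni_threshold()  # 0.05 / 21077, about 2.37e-6
# P-values reported in the text, compared against the threshold.
print(1.57e-6 < threshold)  # GRHL2 locus
print(3.08e-6 < threshold)  # AC011288.2 locus
```

A λ close to 1 is usually read as little inflation of the test statistics; the P-values passed to `genomic_inflation` would be the full set of association P-values.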

Results
Of the 1104 participants, 1031 passed quality control. The demographic and clinical characteristics of the 1031 participants included in our association study are presented in Table 1. All participants were genotyped with the Affymetrix 500k SNP Array. A total of 607,153 SNPs passed quality control (Additional file 1: Figure S1); SNPs with a minor allele frequency < 1% or a call rate < 95% were filtered out.
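The SNP-level filter above can be sketched as follows. This is an illustrative assumption rather than the study's actual implementation: genotypes are coded as counts of one allele (0, 1, 2) with `None` for a missing call, and the function names are hypothetical.

```python
def call_rate(genotypes) -> float:
    """Fraction of samples with a non-missing genotype call."""
    called = [g for g in genotypes if g is not None]
    return len(called) / len(genotypes)


def minor_allele_frequency(genotypes) -> float:
    """Frequency of the rarer allele among called genotypes (0/1/2 coding)."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    freq = sum(called) / (2 * len(called))
    return min(freq, 1.0 - freq)


def passes_qc(genotypes, min_maf: float = 0.01, min_call_rate: float = 0.95) -> bool:
    """Keep a SNP only if MAF >= 1% and call rate >= 95%."""
    return (call_rate(genotypes) >= min_call_rate
            and minor_allele_frequency(genotypes) >= min_maf)


# Made-up genotype vectors for three hypothetical SNPs.
common_snp = [0, 1, 2, 1, 0] * 20        # MAF 0.4, no missing calls
rare_snp = [0] * 99 + [1]                # MAF 0.005
poorly_called = [1] * 90 + [None] * 10   # call rate 0.90
print(passes_qc(common_snp), passes_qc(rare_snp), passes_qc(poorly_called))
```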
To demonstrate that there is no genetic stratification in the population, we performed a principal component analysis on the SNPs of all participants. The first two principal components show an absence of population structure (Additional file 1: Figure S2). To identify susceptibility SNPs for HBV infection, we performed a GWAS of HBV infection with a design similar to previous studies [8,9]. HBV clearance was used as the control group, and ASPI, CHB, DC, and HCC were combined as the HBV chronic infection (case) group. We observed associations of two novel MHC loci with progression to certain HBV stages (SNP: rs2395166, Gene: HLA-DRA, P = 1.42 × 10−7; SNP: rs615672, Gene: HLA-DRB1, P = 8.54 × 10−7) and two reported MHC loci (SNP: rs3077, Gene: HLA-DPA1, P = 6.60 × 10−9; SNP: rs9277542, Gene: HLA-DPB1, P = 1.53 × 10−8) (Table 2; Fig. 1). These MHC variants replicate the association results of previous studies, affirming that MHC gene alleles confer susceptibility to HBV infection in East Asians. Interestingly, we found that these MHC loci (rs2395166:C, rs615672:G, rs3077:A, rs9277542:T, rs9277341:T) show significant differences in allele frequency between East Asian and non-East Asian populations in the gnomAD database (Table 3), as well as between the HBV infection group and the HBV clearance group. Since different groups may not share an identical minor allele, we compared the derived allele against the ancestral allele when studying allele frequency across populations. The derived allele frequencies in East Asians are much closer to those of the HBV chronic infection group, while those in other populations, such as Europeans, are much closer to those of the HBV clearance group. These genetic differences may suggest a selective signal in non-East Asian versus East Asian populations. To confirm this, we first built a phylogenetic tree based on these loci and then showed the genetic diversity in worldwide populations, in which the East Asian population is at the root. We set East Asians as the ancestral group at these loci according to the derived allele frequencies and the phylogenetic tree. Subsequently, we identified two strong phylogenetic signals (HLA-DPA1, HLA-DPB1) in the European population (Fig. 2) via the FST method. Haplotype bifurcation diagrams of the two core SNPs (rs3077, rs9277542) showed that the resistance allele is associated with long-range, high-frequency homozygosity in the European population (Fig. 3), supporting natural selection. This evidence indicates that the resistance alleles were under strong positive selection in the European population. We estimated the historical population size and showed that these two loci (HLA-DPA1, HLA-DPB1) were under selection during the past 26,000 years (Additional file 1: Figure S3). These results may provide a context for the historical impact of HBV infection.
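To make the FST comparison concrete, the following minimal sketch computes a per-SNP FST between two populations from allele frequencies and applies the 0.15 threshold. It uses Wright's heterozygosity-based definition with equal population weights, which is a simplification: Vcftools implements the Weir and Cockerham estimator, so values from this sketch would not match its output exactly. The function names and the allele frequencies are hypothetical.

```python
def wright_fst(p1: float, p2: float) -> float:
    """FST = (H_T - H_S) / H_T for one biallelic SNP in two equally weighted populations."""
    p_bar = (p1 + p2) / 2.0
    h_total = 2.0 * p_bar * (1.0 - p_bar)  # expected heterozygosity, pooled
    if h_total == 0.0:
        return 0.0  # SNP is monomorphic in both populations
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    return (h_total - h_within) / h_total


def is_selection_candidate(fst: float, threshold: float = 0.15) -> bool:
    """Flag a SNP whose FST reaches the threshold used in this study."""
    return fst >= threshold


# Made-up derived-allele frequencies in two populations.
fst = wright_fst(0.1, 0.9)
print(round(fst, 2), is_selection_candidate(fst))
```

Identical allele frequencies give an FST of 0, and the value grows as the two populations diverge.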
HBV clearance, ASPI, CHB, DC, and HCC are progressive stages after HBV infection [4]. We hypothesized that host genetic factors contribute to the development of outcomes, as well as to the individual outcomes. To investigate this hypothesis, we tested two progressive stages after HBV infection: (1) HBV infection itself (CHB, ASPI, and HBV clearance) and (2) development of CHB (CHB, DC, and HCC). We performed a chi-square test for trend in proportions of alleles to identify SNPs that increase the risk of HBV-related outcomes across the progressive stages. In the trend test of allele frequency across the three outcomes of HBV infection, we observed associations with one novel locus (SNP: rs1537862, Gene: LACE1, P = 1.85 × 10−6), one reported locus (SNP: rs9277542, Gene: HLA-DPB1, P = 1.50 × 10−9), and two variants at MHC genes (SNP: rs615672, Gene: HLA-DRB1, P = 1.39 × 10−6; SNP: rs3128923, Gene: HLA-DPA2, P = 2.06 × 10−6) (Table 4; Fig. 4A). The three reported MHC genes were demonstrated to play a critical role in resistance to HBV infection, and two (HLA-DPB1:rs9277542, HLA-DRB1:rs615672) were also associated with HBV clearance (Table 2). We did not observe any SNP reaching genome-wide significance for the development of CHB; one additional locus (SNP: rs6942409, Gene: AC011288.2, P = 3.08 × 10−6) and the HCC-associated locus (SNP: rs2833856, Gene: EVA1C, P = 1.62 × 10−5) were associated with increased risk of DC and HCC during the development of CHB (Table 5; Fig. 4B).
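The chi-square test for trend in proportions can be sketched in its Cochran-Armitage form as below. This is an illustration under stated assumptions, not the study's code: groups are scored 0, 1, 2 in order of progression, each group contributes a count of risk alleles out of its total alleles, the allele counts are made up, and the function name is hypothetical.

```python
import math


def trend_test(events, totals, scores=None):
    """Chi-square test for linear trend in proportions (1 df).

    events: risk-allele counts per ordered group
    totals: total allele counts per ordered group
    Returns (chi2, p_value).
    """
    if scores is None:
        scores = list(range(len(events)))  # 0, 1, 2, ... by stage
    n_total = sum(totals)
    p_bar = sum(events) / n_total  # pooled risk-allele frequency
    # Score-weighted deviation of observed from expected counts.
    t = sum(s * (x - n * p_bar) for s, x, n in zip(scores, events, totals))
    s1 = sum(n * s for s, n in zip(scores, totals))
    s2 = sum(n * s * s for s, n in zip(scores, totals))
    variance = p_bar * (1.0 - p_bar) * (s2 - s1 * s1 / n_total)
    if variance == 0.0:
        return 0.0, 1.0
    chi2 = t * t / variance
    # Upper tail of a 1-df chi-square via the complementary error function.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Made-up allele counts for three ordered groups (e.g. clearance, ASPI, CHB).
chi2, p = trend_test(events=[10, 20, 30], totals=[100, 100, 100])
print(chi2, p)
```

In this toy example the risk-allele proportion rises from 10% to 30% across the ordered groups, which yields a chi-square statistic of 12.5.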
Host genetic factors have been demonstrated to influence plasma concentrations of liver enzymes, which are widely used as indicators of liver disease [37,38]. Here, to investigate the functional changes in the liver influenced by the HBV-related loci described above, we tested the association between the genotypes at these loci and serum biomarker levels in healthy controls by ANOVA (Additional file 1: Figure S4-9). Six loci (rs1537862, rs3128923, rs9277542, rs9277341, rs9277378, rs4661093) showed modest associations with concentrations of liver enzymes, including ALB, ALP, AFP, and PTA (Fig. 5). These associations suggest pathways linking host genetic factors, metabolism, and liver function that may help explain the mechanisms of infection and disease progression. In sum, our study identified susceptibility SNPs associated with HBV-related outcomes, and SNPs that increased the risk of progressive outcomes from HBV clearance to HBV chronic infection, DC, and HCC, in a Chinese population (Additional file 1: Figure S10).
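The genotype-biomarker comparison rests on a one-way ANOVA, whose F statistic can be sketched as follows. The biomarker values are made up, the function name is hypothetical, and the sketch stops at the F statistic and its degrees of freedom; obtaining the P-value requires the F distribution (for example `scipy.stats.f.sf`, if SciPy is available), which is left out to keep the example dependency-free.

```python
from statistics import fmean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of sample groups."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = fmean([v for g in groups for v in g])
    means = [fmean(g) for g in groups]
    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of observations around their own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between, df_within = k - 1, n - k
    if ss_within == 0.0:
        return float("inf"), df_between, df_within
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up biomarker values grouped by the three genotypes of one SNP.
by_genotype = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
print(one_way_anova_f(by_genotype))
```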

Discussion
HBV infection leads to a wide spectrum of clinical outcomes, including spontaneous clearance, asymptomatic carriage, chronic hepatitis B, liver cirrhosis, and hepatocellular carcinoma. Previous studies showed that MHC genes play an important role in the outcomes of HBV infection [7]. Alleles associated with HBV infection versus HBV clearance affect infection risk, and a low-risk allele indicates an effect on virus clearance. By contrast, loci associated with CHB versus ASPI indicate risk of severe progression, and a low-risk allele affects tolerance of the virus. The tolerance-related gene, GRHL2, was demonstrated to influence inflammation in hepatocytes by regulating microRNA 122 (MIR122) and the target of MIR122, HIF1α [39]. Levels of GRHL2 were increased in liver tissues of patients with alcoholic liver disease and correlated with decreases in levels of MIR122. Increased levels of MIR122 in hepatocytes of mice with ethanol-induced liver disease and advanced fibrosis reduced levels of HIF1α and reduced serum levels of alanine aminotransferase (ALT). Taken together, we propose that the low-risk allele rs1264473:T at GRHL2 ablates severe persistent inflammation by increasing the levels of MIR122.
Our previous studies [40,41] showed that the NTCP S267F mutation significantly affected disease progression to cirrhosis (P = 0.017) and hepatocellular carcinoma (P = 0.023) versus CHB [40], and that the rs3077:T allele was associated with decreased risk of chronic HBV infection (OR = 0.62, P = 0.001) [41]. In this study, we used GWAS to search for host genetic factors that increase the risk of the development-related outcomes. One novel locus, LACE1, and three infection-related MHC loci were associated with the progression of HBV infection. These results show that host genetic factors, in both MHC and non-MHC genes, increase the risk of progressive outcomes after HBV infection, as well as HBV mutation. HBV infection has been reported to alter mitochondrial metabolism and mitochondrial dynamics, which results in mitochondrial injury and liver disease [42]. LACE1 was reported to affect mitochondrial protein homeostasis [43]. Knockdown of LACE1 altered the expression of OPA1, a crucial regulator of mitochondrial dynamics [43][44][45]. In addition, we found that the risk allele, LACE1:rs1537862:T, significantly decreased the level of ALB (P = 0.025, Fig. 5). ALB is a critical marker that decreases with the deterioration of chronic liver diseases [46][47][48]. Biosynthesis of ALB is affected by proinflammatory cytokines [49,50] and by excess amounts of oxidative agents released by mitochondria from the injured liver [46,51]. Taken together, we propose that LACE1 may affect hepatic infection by changing hepatic mitochondrial metabolism, leading to the progression of HBV infection.
Our study has a limitation: we do not have an additional cohort for a replication study. In spite of that, we showed that the reported loci in the MHC region are significantly related to HBV infection. This replication of previous results supports the reliability of our findings in this cohort. Here, we provide novel candidate genes related to individual outcomes, progressive stages, and liver enzymes. Moreover, we identified two SNPs (HLA-DPA1, HLA-DPB1) that show selective significance in non-East Asian (European, American, South Asian) versus East Asian populations. East Asian populations seem more susceptible to HBV infection than non-East Asian populations, and the differences in susceptibility are affected by HBV genotype [52], immunity [53], and environmental exposure [53,54]. Even in an identical environment (United States), chronic HBV infection is more prevalent in Asians than in non-Asians [53]. It seems likely that host genetic factors contribute to the ethnic disparities in susceptibility to HBV infection. Taken together, the genetic associations and evolutionary signals in our findings provide new insight for the study of HBV.

Fig. 5 The association between HBV-related loci and serum liver enzyme levels in healthy controls. P values were calculated by ANOVA. White circles indicate the mean liver enzyme level for each genotype. The significant differences indicate that these SNPs contribute to liver enzyme activity.

Conclusion
In the case-control study, we identified one novel locus (SNP: rs1264473, Gene: GRHL2, P = 1.57 × 10−6) significantly associated with CHB and two novel loci (SNP: rs2833856, Gene: EVA1C, P = 1.62 × 10−6; SNP: rs4661093, Gene: ETV3, P = 2.26 × 10−6) significantly associated with HCC. In the trend study across multiple outcomes, we identified one novel locus (SNP: rs1537862, Gene: LACE1, P = 1.85 × 10−6) and three MHC loci (HLA-DRB1, HLA-DPB1, HLA-DPA2) with significantly increased progressive risk from HBV clearance through ASPI to CHB. In the evolutionary study, we showed that the derived alleles of two HBV clearance-related loci, rs3077 and rs9277542, are under strong selection in the European population. We suggest that these selected alleles may play a role in reducing susceptibility to HBV in Europeans. Our findings provide new insight into the role of host genetic factors in HBV-related outcomes and progression.
Additional file 1: Figure S1. Summary of the characteristics of the final SNPs, including MAF, call rate, and P-value of the Hardy-Weinberg equilibrium test. Figure S2. Principal component analyses indicated no population stratification among the 6 subgroups. Abbreviation: ASPI, asymptomatic persistence infection; CHB, chronic hepatitis B; DC, decompensated cirrhosis; HC, healthy controls; HCC, hepatocellular carcinoma. Figure S3. Effective population sizes inferred using Relate across all individuals of each population at two loci (HLA-DPA1, HLA-DPB1). Recent size histories (26,000 years ago) in the European (purple) population showed a modest difference compared with the East Asian (red) population. Abbreviation: EUR, European; AMR, American; SAS, South Asian; AFR, African. Figure S4. Boxplots of rs2395166 genotype and serum liver enzyme levels in HC. Figure S5. Boxplots of rs615672 genotype and serum liver enzyme levels in HC. Figure S6. Boxplots of rs3077 genotype and serum liver enzyme levels in HC. Figure S7. Boxplots of rs1264473 genotype and serum liver enzyme levels in HC. Figure S8. Boxplots of rs2833856 genotype and serum liver enzyme levels in HC. Figure S9. Boxplots of rs6942409 genotype and serum liver enzyme levels in HC. Figure S10. Summary of the associated SNPs contributing to HBV-related outcomes and progression. Abbreviation: PI, persistence infection; ASPI, asymptomatic persistence infection; CHB, chronic hepatitis B; DC, decompensated cirrhosis; HCC, hepatocellular carcinoma.