GWASs provide information about common variants associated with disease susceptibility. Although GWASs allow the identification of disease risk alleles without prior knowledge of their position or biological function, they require large study cohorts to identify associations at the genome-wide significance threshold (5 × 10⁻⁸). By contrast, smaller GWASs using the same threshold may generate false-negative results. For association screening, we performed pooled-DNA GWASs and applied novel selection criteria that took into account strong allele linkage disequilibrium. We selected SNP blocks, defined by a distance of less than 30 kb between each pair of at least 10 SNPs associated with disease at P < 5 × 10⁻³, with an “index SNP” in the block for which the association was significant at P ≤ 10⁻⁴. All of the associations identified using this approach would have been missed using the standard genome-wide significance threshold. Of 22 and 29 associations with PBC and PSC, respectively, selected for validation, 19 and 21 SNPs were verified using TaqMan SNP genotyping assays of individual patient and control samples. In total, 19 SNPs reached the stringent (corrected) significance threshold, while the other 21 reached a nominal level of significance (P < 0.05 with OR > 1.2 or < 0.83), demonstrating at least suggestive evidence for association (Table 2). However, the expected number of false-positive results for 50 independent tests, with a significance threshold of 0.05, is < 3, while appropriate correction for multiple comparisons would reduce this to < 1. In our validation studies, correction for multiple testing reduced the number of associations at the nominal level of significance from 40 to 19, thus likely generating a large number of false-negative results and only slightly reducing the expected number of false-positive results.
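The block-selection criteria and the false-positive arithmetic described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study. The `Snp` record, the function names, and the reading of "between each pair" as "between consecutive SNPs on the same chromosome" are all assumptions made for illustration, as is the use of a Bonferroni-style correction, which the text does not name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snp:
    """Hypothetical record for one SNP from a pooled-DNA screen."""
    rsid: str
    chrom: str
    pos: int    # base-pair position
    p: float    # association P value


def select_blocks(snps, max_gap=30_000, min_snps=10,
                  member_p=5e-3, index_p=1e-4):
    """Return (index_snp, block) pairs that satisfy the stated criteria.

    Assumption: a block is a run of SNPs with P < member_p in which
    consecutive SNPs on the same chromosome lie less than max_gap apart.
    """
    candidates = sorted((s for s in snps if s.p < member_p),
                        key=lambda s: (s.chrom, s.pos))
    blocks, current = [], []
    for s in candidates:
        # Start a new block on a chromosome change or a gap of >= max_gap.
        if current and (s.chrom != current[-1].chrom
                        or s.pos - current[-1].pos >= max_gap):
            blocks.append(current)
            current = []
        current.append(s)
    if current:
        blocks.append(current)

    selected = []
    for block in blocks:
        index = min(block, key=lambda s: s.p)  # most significant SNP
        if len(block) >= min_snps and index.p <= index_p:
            selected.append((index, block))
    return selected


def expected_false_positives(n_tests, alpha):
    """Expected false positives among n_tests independent true-null tests."""
    return n_tests * alpha


# 50 independent tests at the nominal threshold: 2.5 expected (i.e. < 3).
nominal = expected_false_positives(50, 0.05)
# Assumed Bonferroni-style per-test threshold: about 0.05 expected (i.e. < 1).
corrected = expected_false_positives(50, 0.05 / 50)
```

Under these assumptions, a run of ten nominally associated SNPs is retained only if its most significant member reaches P ≤ 10⁻⁴. A run of nine such SNPs, or a run with no index SNP, is discarded.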
This study identified 57 SNPs associated with either PBC or PSC or both disorders, which represent 38 genetic regions (Table 2). As expected, there were higher numbers of associations with HLA and non-HLA loci mapping to the MHC region (chromosome 6p21). Of 13 SNPs from this region, two, five, and six SNPs were associated with PBC, PSC, and both disorders, respectively. Of the SNPs shared between PBC and PSC, only two exhibited the same direction of effect.
As both PBC and PSC are hepatobiliary autoimmune diseases with low prevalence, the largest GWASs of these diseases have recruited individuals from different populations, which has certainly introduced a level of heterogeneity into the results, arising from the different genetic backgrounds of the geographically distinct populations. Consequently, these large cohort studies may have missed some subtle, sub-population-specific risk variants that may account for missing heritability. Conversely, studies with smaller sample sizes typically reveal a smaller fraction of the heritability of a complex disease, because many true associations fail to reach statistical significance thresholds [26]. The relative homogeneity of the Polish population may explain, at least in part, why our investigation identified so many SNPs significantly associated with PBC and/or PSC.
A GWAS including 536 North American PBC patients uncovered disease associations for several gene variants in the HLA class II region and coding variants in the interleukin-12A (IL12A) and IL-12 receptor β2 (IL12RB2) genes [11]. Further GWASs have replicated these findings in a European population, and identified additional risk genes overlapping with other autoimmune diseases [17, 18]. In six GWASs, 27 non-HLA risk loci associated with PBC were identified [4]. Most have also been implicated in other autoimmune diseases, highlighting different immunoregulatory pathways. Our findings indicated the possible involvement of six previously described regions (1p31.3, rs3790567 [11]; 3q13 [13, 16]; 7q32.1, rs10488631 [12, 14]; 11q23.3 [13, 16]; 17q12, rs9303277 [18]; and 19q13.33, rs3745516 [14]) and the HLA-containing 6p21 locus in the development of PBC in Polish patients (Table 2).
Genomic studies of PSC have uncovered 18 associated genetic regions: 1p36 (TNFRSF14, MMEL1); 2q13 (BCL2L1); 2q33 (CD28); 2q35 (GPBAR1); 2q37.3 (GPR35); 3p21 (USP4, MST1); 4q27 (IL2, IL21); 6q15 (BACH2); 6p21 (HLA region); 10p15 (IL2RA); 11q23 (SIK2); 12q13 (HDAC7); 12q24 (SH2B3, ATXN2); 13q31 (GPC5/6); 18q21.1 (TCF4); 18q22 (CD226); 19q13 (PRKD2, STRN4); and 21q22 (PSMG1) [5, 6, 8, 27, 28]. Of these, only the MHC region was replicated in our PSC patients (Table 2).
The MHC region, which contains more than 224 genes and is highly polymorphic, is known to be associated with more than 100 different autoimmune and infectious diseases [29–31]. In European-based GWASs, the most pronounced MHC associations with PSC were with class I (HLA-B and -C) rather than class II (HLA-DRB1 and -DQB1) loci [27]. A meta-analysis of three independent PBC cohorts identified HLA class II alleles (HLA-DRB1, HLA-DQA1, and HLA-DQB1) achieving genome-wide significance levels, with similar allele frequencies in Canadian, US, and Italian PBC cohorts [15]. Outside of the MHC region, our investigation confirmed six and zero genetic regions uncovered by previous GWASs as associated with PBC and PSC, respectively [27]. Of 30 chromosomal regions representing novel susceptibility loci, 13, 9, and 8 were associated with PBC, PSC, and both disorders, respectively. Of these, 17 SNPs have a shared genetic association with IBD, three with rheumatoid arthritis, two with lupus erythematosus, and single SNPs with psoriasis, lateral sclerosis, T1D, and intrahepatic cholestasis of pregnancy.
While well-designed GWASs should be conducted with groups of at least 1,000 patients and 1,000 controls, the appropriate level of statistical power to test for genetic associations (at P < 5 × 10⁻⁸) often relates to higher effect sizes [32]. However, since loci with a high effect size have generally been efficiently removed from the human population by natural selection, the identification of a common polymorphic susceptibility locus strongly associated with disease, with an OR > 2 or < 0.5, is unlikely [33]. Instead, a large number of previously identified loci associated with different disorders exhibit relatively small effect sizes, with ORs < 1.3. The present study uncovered only one SNP (rs35730843, POLR2G, P = 1.2 × 10⁻⁵, OR = 0.393) strongly associated with PBC and 11 SNPs strongly associated with PSC (rs3822659, coding in WWC1, P = 0.0051, OR = 0.236; rs9686714, intron of WWC1, P = 0.00077, OR = 0.195; rs13191240, intron of ADGRB3, P = 0.0095, OR = 0.2; rs7454108, intergenic between LOC100294145 and C4B_2, P = 0.0013, OR = 0.326; rs2524163, intron of HLA-B, P = 5.9 × 10⁻⁷, OR = 2.02; rs2187668, intron of HLA-DQA1, P = 1.5 × 10⁻⁷, OR = 2.47; rs3130484, intron of MSH5-SAPCD1, P = 5.1 × 10⁻¹¹, OR = 3.23; rs1264377, intergenic between PSORS1C3 and MIR877, P = 8 × 10⁻⁸, OR = 2.39; rs3130626, intron of PRRC2A, P = 1.5 × 10⁻⁶, OR = 2.10; rs419788, intron of SKIV2L, P = 1.2 × 10⁻⁶, OR = 2.03; and rs34708188, intergenic close to SENCR, P = 0.0056, OR = 2.27). Of these, nine SNPs map to the MHC region (6p21), while POLR2G, WWC1, and ADGRB3 are located in other genomic regions.
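The effect-size cut-off used above (OR > 2 or < 0.5 for a strong association) can be expressed as a short check. The following Python sketch is illustrative only. The function name `is_strong_effect` and its default cut-off argument are assumptions, and the protective bound is derived here as the reciprocal of the risk bound.

```python
def is_strong_effect(odds_ratio, cutoff=2.0):
    """Return True if the OR lies outside the interval [1/cutoff, cutoff].

    With the default cutoff of 2.0 this reproduces the 'OR > 2 or < 0.5'
    rule quoted in the text. The comparison is strict on both sides.
    """
    if odds_ratio <= 0:
        raise ValueError("an odds ratio must be positive")
    return odds_ratio > cutoff or odds_ratio < 1.0 / cutoff


# ORs taken from the text: rs35730843 (0.393) and rs2524163 (2.02).
strong_protective = is_strong_effect(0.393)
strong_risk = is_strong_effect(2.02)
# A typical small effect (OR below 1.3) is not classed as strong.
small_effect = is_strong_effect(1.25)
```

Treating the protective bound as the reciprocal of the risk bound keeps the rule symmetric on the log-OR scale, which matches the paired thresholds quoted in the text (2 and 0.5).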
Our results indicated that a rare variant in the POLR2G gene promoter is associated with decreased risk of PBC, with a high effect size in the Polish population. POLR2G encodes one of the subunits of the RNA polymerase II complex, which is responsible for transcribing protein-coding genes, miRNAs, and some classes of non-coding RNAs [34]. The gene maps to the 11q12.3 locus, within which variants associated with chronic obstructive pulmonary disease [35] and asthma [36] have been identified. We also identified a decreased risk for PSC (OR < 0.25) conferred by a single rare variant in the ADGRB3 gene and two rare variants (rs3822659 and rs9686714) in the WWC1 gene. ADGRB3 encodes transmembrane adhesion G protein-coupled receptor B3 (BAI3), which is broadly expressed in the brain and involved in the regulation of excitatory synapse connectivity [37]. Furthermore, BAI3 can promote myoblast fusion in vertebrates [38]. A previous GWAS identified SNPs at the ADGRB3 locus as associated with early-onset venous thromboembolism [39]. WWC1 encodes the KIBRA protein, which plays versatile roles, including in the regulation of cellular signaling, cell polarity, vesicular trafficking, and cell migration and division [40]. Specifically, KIBRA is a regulator of the Hippo signaling pathway, which controls tissue growth and tumorigenesis by inhibiting cell proliferation and promoting apoptosis [41]. Notably, WWC1 hypermethylation occurs in 70% of B-cell acute lymphocytic leukemias [42] and its epigenetic silencing is also associated with unfavorable prognostic parameters in chronic lymphocytic leukemia [43]. Interestingly, the triggering of IL-6 trans-signaling, a process of aggregation of extracellular soluble IL-6 receptor and IL-6 associated with rheumatoid arthritis [44] and IBD [45], significantly increased WWC1 expression in human airway smooth muscle cells [46], suggesting a link between its expression and inflammatory diseases.
Importantly, rs3822659 is a missense variant (Ser735Ala) that alters the interaction of KIBRA with phosphatidylinositol 3-phosphate [47]. Other GWASs have implicated SNPs in WWC1 as associated with memory performance and cognition [48], as well as Alzheimer’s disease [49].