Our results show that the human CLCN5 gene comprises at least 20 exons, eight of which lie in the 5′UTR region, with transcription initiating from at least three different start sites. As a result of alternative 5′ splicing in some exons, 11 different mRNAs are generated. Our findings highlight the structural complexity of the CLCN5 5′UTR region in renal and extrarenal tissues, and suggest that this region is likely involved in ClC-5 expression and Dent disease pathogenesis. Although some authors [5, 8] have tested the promoter region in their patients without detecting variants, the deeper characterization obtained in our study should make it possible to explore previously unanalyzed regions of the gene in search of rare variants that may act as disease-causing mutations or modifier alleles.
To further characterize the functional organization of the gene, the 5′-flanking regions of exons 1a, 1b1 and I were analyzed for possible promoter regions and transcription factor binding sites. We used data from the ENCODE project, which aims to delineate all of the functional elements encoded in the human genome sequence, including histone modifications, transcription factor (TF) binding sites mapped by chromatin immunoprecipitation (ChIP), and transcriptional regulatory regions. Specifically, the Transcription track, the Overlayed H3K4Me1 and Overlayed H3K27Ac tracks, the DNase Clusters track, and the Txn Factor ChIP tracks were considered. These tracks complement each other and together can shed much light on regulatory DNA [20].
The results suggest that three functional promoters of different strength are present in the CLCN5 gene, giving rise to all isoforms with varying efficiency. Strong promoters are present upstream of exon 1a and exon I, and indeed variant 3 and variants 1 and 2 are the mRNA species most expressed in the kidney (Figure 5). Both promoters lack characteristic features of eukaryotic promoters, but instead contain consensus binding sites for transcription factors. For mRNA variants 1 and 2, GATA1 and GATA2 binding sites are present, together with consensus binding sites for other transcription factors including E2F1 (E2F transcription factor 1), ZNF263 (zinc finger protein 263), Nrf1 (nuclear respiratory factor 1), HMGN3 (high mobility group nucleosomal binding domain 3), USF1 (upstream transcription factor 1) and Ini1 (RING finger-like protein Ini1). For mRNA variant 3, the identified binding sites are those for USF1, USF2 (upstream transcription factor 2, c-fos interacting), KAP1 (kinesin-ii-associated protein), and CTCF (CCCTC-binding factor). All these transcription factors were identified in a kidney cell line (HEK293). A weaker promoter appears to control the expression of mRNA variant 4, alternative variant 4 and variants 6 and 7. This promoter contains consensus binding sites for transcription factors such as FOXA2 (forkhead box A2) and SETDB1 (SET domain, bifurcated 1). For all promoters, the specific region containing the transcription factor binding sites overlaps with a region that is DNaseI sensitive. At the functional level, DNase hypersensitivity suggests that a region is very likely to be regulatory in nature, and promoters are particularly DNase sensitive.
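The overlap between transcription factor binding sites and DNaseI-sensitive regions described above can be illustrated with a simple interval check. The following is a minimal sketch only: the `Interval` type, the helper names and all coordinates are hypothetical and are not taken from the ENCODE tracks or from our data.

```python
from typing import List, NamedTuple


class Interval(NamedTuple):
    """A named genomic interval, 0-based and half-open: [start, end)."""
    name: str
    start: int
    end: int


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two half-open intervals share at least one position."""
    return a.start < b.end and b.start < a.end


def sites_in_open_chromatin(sites: List[Interval],
                            dnase_clusters: List[Interval]) -> List[str]:
    """Names of binding sites overlapping at least one DNase cluster."""
    return [s.name for s in sites
            if any(overlaps(s, d) for d in dnase_clusters)]


# Hypothetical coordinates, for illustration only.
sites = [
    Interval("USF1", 120, 140),
    Interval("CTCF", 300, 320),
    Interval("FOXA2", 900, 915),
]
dnase_clusters = [Interval("DHS_1", 100, 350)]

covered = sites_in_open_chromatin(sites, dnase_clusters)
print(covered)  # ['USF1', 'CTCF']
```

With real data, the intervals would be read from the exported track files, and all coordinates would need to refer to the same genome assembly.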
The data from the ENCODE project did not identify sites for the transcription factor HNF1α. By contrast, an in silico analysis conducted by Tanaka et al. [22] revealed numerous HNF1α binding sites in the 5′ regulatory sequences of both the mouse and human Clcn5/CLCN5 genes. The transactivation of the Clcn5/CLCN5 promoter by HNF1α was verified in vitro, and the binding of HNF1α to the Clcn5 promoter in vivo was confirmed by chromatin immunoprecipitation in mouse kidney [22].
mRNA variant 4, alternative variant 4 and variants 6 and 7 share the same transcription start site but have different lengths, depending on which donor site is used. They are probably generated, with differential efficiency, by the use of multiple alternative 5′ splice sites within a single exon. This type of exon commonly originates from an ancestral constitutive exon in which mutation(s) inside the exon or along the flanking intron create new alternative splice sites that compete with the ancient one for splice site selection [23].
In the case of variants 6 and 7, the two alternative 5′ splice sites in exon c have similar strength, and so regulation is essential. The delicate balance between cis-acting elements, namely exonic splicing enhancers (ESE), exonic splicing regulatory elements (ESR) and exonic splicing silencers (ESS), located immediately upstream of each splice site is probably the major factor governing the level of usage of each site in splicing [23, 24]. To determine whether this is the case, a bioinformatic analysis was performed using the Human Splicing Finder version 2.4.1 program [17]. This analysis showed that two ESE, two ESR and six ESS are present in the first 15 nt upstream of the donor splice site of exon c.1, whereas eight ESE, three ESR and five ESS are present upstream of the donor splice site of exon c. Therefore, although both sites have a similar strength, a higher density of ESE-ESR and a lower density of ESS upstream of the exon c donor site promote the use of this splice site. Consistent with this observation, the expression levels of variant 7 are higher than those of variant 6 (Figure 5). In the case of mRNA variant 4, the donor splice site of exon 1b has a higher strength (value of 82.4) and is therefore favored. This isoform also contains 10 ESE, 2 ESR and 6 ESS. However, this is not in agreement with our experimental results, because this isoform is barely expressed. Alternative variant 4, whose level of expression follows that of mRNA variant 3 and of mRNA variants 1, 2 and 8-11, is characterized by the presence of exon 1b and the retention of intron 1b (exon 1b1) which, contrary to what usually happens, has not been removed during the processing of the primary transcript to mature messenger. Most likely, other factors play an important role in the regulation of its transcription levels. Exon 1b1 could represent the ancestral constitutive exon from which all other exons (1b, c and c.1) originated (Figure 4). This exon is, in fact, usually present as the main product relative to the others (Figure 5).
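The comparison of motif densities reported above can be summarized as a simple ratio of enhancing elements (ESE plus ESR) to silencers (ESS). The sketch below uses only the counts given in the text. The ratio itself is an illustrative summary that we assume here; it is not a score produced by Human Splicing Finder, and the counts for exon 1b are assumed to refer to a comparable window.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MotifCounts:
    """Numbers of splicing motifs upstream of a donor splice site."""
    ese: int  # exonic splicing enhancers
    esr: int  # exonic splicing regulatory elements
    ess: int  # exonic splicing silencers

    def enhancer_to_silencer_ratio(self) -> float:
        # Illustrative summary (an assumption, not an HSF output):
        # (ESE + ESR) / ESS. Requires at least one silencer.
        return (self.ese + self.esr) / self.ess


# Counts as reported in the text.
counts = {
    "exon c.1": MotifCounts(ese=2, esr=2, ess=6),
    "exon c": MotifCounts(ese=8, esr=3, ess=5),
    "exon 1b": MotifCounts(ese=10, esr=2, ess=6),
}

ratios = {site: c.enhancer_to_silencer_ratio() for site, c in counts.items()}
ranked = sorted(ratios, key=ratios.get, reverse=True)

for site in ranked:
    print(f"{site}: {ratios[site]:.2f}")
# exon c: 2.20
# exon 1b: 2.00
# exon c.1: 0.67
```

The ratio separates exon c from exon c.1, in line with the higher expression of variant 7. It does not account for the low expression of variant 4, whose exon 1b donor site has a ratio close to that of exon c.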
The GC content around splice sites is closely associated with splice site usage [18, 19, 25]. We considered a region of 141 nucleotides surrounding the donor splice sites of exons c.1, c and 1b (70 nucleotides upstream and 70 nucleotides downstream of the splice site). The highest GC content was found around the donor splice site of exon c (11 GC), followed by exon c.1 (7 GC) and exon 1b (5 GC).
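The window-based GC count described above can be sketched as follows. The function names and the toy sequence are hypothetical and serve only to illustrate the 141-nt window (70 nt on each side of the splice site position); they are not CLCN5 sequence, and the exact counting convention used in the analysis is not reproduced here.

```python
def donor_window(seq: str, donor_index: int, flank: int = 70) -> str:
    """Return the (2 * flank + 1)-nt window centred on donor_index (0-based)."""
    if donor_index - flank < 0 or donor_index + flank >= len(seq):
        raise ValueError("window extends beyond the sequence")
    return seq[donor_index - flank:donor_index + flank + 1]


def gc_count(window: str) -> int:
    """Number of G and C nucleotides in the window (case-insensitive)."""
    w = window.upper()
    return w.count("G") + w.count("C")


def gc_fraction(window: str) -> float:
    """Fraction of G and C nucleotides in the window."""
    return gc_count(window) / len(window)


# Toy sequence (not CLCN5): AT-rich flanks around a donor-like motif.
toy = "AT" * 40 + "GGCAGGTAAGT" + "AT" * 40
donor_index = toy.index("GTAAGT")  # position of the G of the donor GT

window = donor_window(toy, donor_index)
print(len(window), gc_count(window))  # 141 6
```

Applying the same functions to the genomic sequence around each donor site would give directly comparable counts for exons c, c.1 and 1b.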
The web server mRNAfold was then used to predict the pre-mRNA secondary structure via calculation of the minimum free energy [18, 19]. It has been reported that local RNA secondary structures affect splice site selection, with the splice sites closest to the transcription start site forming more stable structures than those located in more central RNA locations [18, 19]. The minimum free energy calculated by the software was -44.70, -38.84 and -30.70 kcal/mol for the donor splice sites downstream of exons c, c.1 and 1b, respectively. Both the evaluation of GC content and the calculation of free energy are, once again, in agreement with the results of the expression study. In fact, the expression level of the isoform containing exon c is higher than those of the isoforms containing exons c.1 and 1b (Figure 5).
To conclude our characterization, we performed an open reading frame analysis using the ORF Finder program. ORF analysis revealed that mRNA variants 3, 6, 7 and alternative variant 4, as well as variants 8-11, encode the canonical ClC-5 protein of 746 amino acids, whereas variant 4 and variants 1-2 encode proteins with 20 and 70 additional in-frame amino acids, respectively. It is of note that the presence of exon VI and/or V in the long transcripts stabilizes the initiation of translation at the ATG in exon 2 and does not add additional amino acids to the protein. This is the most common situation among genes that have alternative promoters: they do not generate different protein isoforms, but have mRNA variants that differ in transcription pattern and translation efficiency.
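The effect of an upstream in-frame ATG on the predicted protein length can be illustrated with a minimal forward-strand ORF scan. This is a sketch under stated assumptions, not the algorithm of the ORF Finder program: it considers only ATG start codons, the three forward frames and the standard stop codons, and the sequences are toy examples rather than CLCN5 transcripts.

```python
from typing import Optional, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str) -> Optional[Tuple[int, int, int]]:
    """Longest ATG-to-stop ORF on the forward strand.

    Returns (start, end, n_codons), where end is the index just past the
    stop codon and n_codons excludes the stop; None if no ORF is found.
    """
    seq = seq.upper()
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG since the last stop in this frame
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i - start) // 3
                if best is None or n_codons > best[2]:
                    best = (start, i + 3, n_codons)
                start = None
    return best


# Toy sequences (hypothetical, not CLCN5).
core = "ATG" + "GCA" * 4 + "TAA"      # 5 codons before the stop
short_utr = "CCTGAC" + core           # no upstream ATG
extended = "ATGGGTGGT" + core         # upstream in-frame ATG, no stop between

print(longest_orf(short_utr))  # (6, 24, 5)
print(longest_orf(extended))   # (0, 27, 8)
```

In the second toy sequence the upstream ATG is in frame with the core start codon and no stop codon intervenes, so the predicted product gains three N-terminal residues. This is the same situation as the additional in-frame amino acids described for variant 4 and variants 1-2.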
The ClC-5 translated region was expressed in all human tissues examined. Our results are in agreement with those reported by Steinmeyer et al. [26], who demonstrated in the mouse that ClC-5 was predominantly expressed in the kidney but was also observed in brain, liver, lung, and testis. Unlike Ludwig et al. [5], but in agreement with Ramos-Trujillo et al. [27], we demonstrated that ClC-5 is present in the human liver, brain and skeletal muscle.
By contrast, not all the 5′UTR isoforms are expressed in the various tissues. mRNA variants 3, 2, 7 and alternative variant 4 appear to be the most abundant in the human kidney. 5′UTR exons that are commonly present among expressed isoforms are candidates for mutation analysis in Dent disease patients without genetic variation in the CLCN5 coding region. Polymorphisms or rare variants that act as modifier alleles might also reside in these regions and might explain the phenotypic heterogeneity of Dent disease, not only in Dent disease 1 but also in Dent disease 2. We have demonstrated, in fact, that variants in both the OCRL and CLCN5 genes may act in concert in determining Dent disease phenotype variability [28].
Despite widespread expression of ClC-5, the Dent disease 1 phenotype is largely renal. Different 5′UTR ends present in various tissues may serve to differentially regulate gene expression in response to physiological and pathological stimuli, through mechanisms involving not only transcription but also translation efficiency. It is known, in fact, that the 5′UTR region has several roles in translational efficiency and translation inhibition, probably through interaction with the ribosome and specific RNA-binding proteins or through elements contained in non-coding regions. It is therefore possible that CLCN5 mRNA levels do not correspond to ClC-5 protein levels and actual ClC-5 function.
Also of note is the presence of ClC-5 in the human brain and skeletal muscle. Although CNS and muscle impairment is common in Lowe syndrome, it has not been described in Dent disease 1 [11]. We recently evaluated a patient carrying a CLCN5 mutation whose clinical symptoms suggested a Dent 2 phenotype or a mild Lowe phenotype (unpublished). Our findings point to the possibility that certain Dent cases with CLCN5 disease-causing mutations might manifest extrarenal symptoms or a mild Lowe phenotype.
The tissues that are most similar, in terms of both abundance and expression pattern of CLCN5 UTR isoforms, are kidney, colon and testis. It is known that in rats and pigs ClC-5 is expressed in intestinal tissues that have endocytotic machinery [29, 30]. As in renal proximal tubular and intercalated collecting duct cells, ClC-5 in intestinal and colon epithelial cells is predominantly, if not exclusively, intracellular, located in densely packed endocytotic vesicles in rats [29]. Some authors have evaluated a role for ClC-5 in intestinal calcium absorption through direct regulation of the expression of calcium transport proteins, such as TRPV6 [30–33]. Although in humans intestinal calcium absorption takes place mainly in the small intestine, our data can support, albeit indirectly, the hypothesis that hypercalciuria in Dent disease may be due to increased intestinal absorption of calcium rather than decreased tubular reabsorption.
No phenotype associated with testicular dysfunction has been described so far in Dent disease patients. Future studies might be warranted to explore the possible role of ClC-5 in male infertility and to determine testicular function in Dent disease 1 patients, analogous to the role of the CFTR gene in male infertility [34].