Phenotypic spectrum and genetics of PAX2-related disorder in the Chinese cohort

Background: Pathogenic variants of PAX2 cause autosomal-dominant PAX2-related disorder, whose phenotypes range from renal coloboma syndrome (RCS) and congenital anomalies of the kidney and urinary tract (CAKUT) to nephrosis. This phenotypic variability makes it difficult to define the phenotypic spectrum associated with each genotype.
Methods: We collected the phenotypes of patients enrolled in the China national multicenter registry who were diagnosed with a pathogenic variant in PAX2, and reviewed all published cases of PAX2-related disorder. We conducted a phenotype-based cluster analysis by variant type and molecular modeling of the structural impact of missense variants.
Results: Twenty different PAX2 pathogenic variants were identified in 32 individuals (27 families) from the Chinese cohort, with diagnoses of RCS (9), CAKUT (11) and nephrosis (12). Individuals with abnormal kidney structure (RCS or CAKUT group) tended to carry likely/presumed gene-disruptive (LGD) variants (Fisher's exact test, p < 0.05). A systematic review of the 234 cases reported to date indicated a clear association of RCS with heterozygous loss-of-function (LGD) PAX2 variants. Furthermore, we identified a subset of PAX2 missense variants in the DNA-binding domain, predicted to affect the protein structure or protein-DNA interaction, that were associated with the RCS phenotype.
Conclusion: Defining the phenotypic spectrum together with genotype in PAX2-related disorder allows us to predict which pathogenic variants are associated with abnormal renal and ophthalmological development. It also highlights that structure-based analysis can be incorporated into diagnostic strategies to aid precise and timely diagnosis.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12920-021-01102-x.

Here we analyzed PAX2 pathogenic variants from the Chinese Children Genetic Kidney Disease Database (CCGKDD), which has assembled the largest genetically screened cohort of children with kidney disease in China to date [12]. We also reviewed all published cases of PAX2 pathogenic variants to explore the genotypic and phenotypic spectrum of PAX2-related disorders. Finally, we established an approach to define the phenotypic spectrum associated with PAX2 missense variants, in order to determine how confidently the clinical phenotype can be predicted by quantifying the pathogenicity of amino acid substitutions.

Study design and participants
Among the individuals with kidney disease consecutively enrolled in the national multicenter registry CCGKDD (www.ccgkdd.com.cn) from 2014 to 2020, we recruited patients diagnosed with PAX2-related disorder. Participants were recruited through clinicians contributing to the CCGKDD [12]. Participants were asked to provide information on presenting clinical features, genetic diagnosis (trio-exome sequencing or family-based exome sequencing), medical management, age at clinical events (initial presentation, end-stage renal disease [ESRD], transplantation), and status at the last follow-up (native renal function, dialysis, transplantation or deceased). All enrolled patients had been followed up for more than 12 months. No identifying information was collected about patients or respondents. The name of the reporting center was collected so that entries could be compared to avoid duplicates.
PAX2-related disorder was diagnosed on the basis of the clinical phenotype and the identification of a pathogenic PAX2 variant by exome sequencing. Clinical phenotypes were stratified according to abnormal urinalysis or radiological findings. The RCS (papillorenal syndrome) group was defined by renal developmental defects combined with ocular anomalies. Patients without ocular anomalies who were diagnosed with vesicoureteral reflux (VUR), unilateral or bilateral renal hypodysplasia or multicystic dysplastic kidney were classified as the isolated CAKUT group. Patients who initially presented with proteinuria or hematuria were classified as the nephrosis group.
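The grouping rules above can be written as a small decision function. This is a minimal sketch, not the registry's code: the `Patient` fields and the `phenotype_group` name are hypothetical, and the text does not say which rule takes precedence when a patient meets more than one criterion, so the order used here (RCS first, then proteinuria or hematuria at presentation, then structural anomaly) is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    # Hypothetical fields for illustration; not the CCGKDD data model.
    ocular_anomaly: bool
    structural_kidney_anomaly: bool  # VUR, renal hypodysplasia or MCDK
    proteinuria_or_hematuria_at_onset: bool


def phenotype_group(p: Patient) -> str:
    """Assign a study group; the precedence order is an assumption."""
    # RCS: renal developmental defect combined with ocular anomalies.
    if p.structural_kidney_anomaly and p.ocular_anomaly:
        return "RCS"
    # Nephrosis: initial presentation with proteinuria or hematuria.
    if p.proteinuria_or_hematuria_at_onset:
        return "nephrosis"
    # Isolated CAKUT: structural anomaly without ocular involvement.
    if p.structural_kidney_anomaly:
        return "CAKUT"
    return "unclassified"
```

For example, `phenotype_group(Patient(False, True, False))` returns `"CAKUT"`. Making the precedence explicit in code forces the decision a written rule can leave implicit.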

Genetic analysis
Samples from enrolled individuals underwent whole-exome sequencing (WES) of parent-child trios after informed consent was obtained. The WES procedure and variant annotation have been described in detail previously [12]. Variant interpretation was performed manually by a panel of nephrologists and clinical molecular geneticists [12]. For clinical sequence interpretation, variants were classified according to the American College of Medical Genetics and Genomics (ACMG) guidelines. The Human Gene Mutation Database (HGMD), the Leiden Open Variation Database (LOVD) and ClinVar were searched for known disease-causing PAX2 variants. All information on pathogenic variants and variants of uncertain significance (VUS) in known pathogenic genes is available at www.ccgkdd.com.cn with a guest account (browse only, no download). Nucleotide and amino acid changes are described using the National Center for Biotechnology Information (NCBI) RefSeq accession numbers for PAX2 (NM_003987 and NP_003978).

Literature review for PAX2-related disorder
We collected reported loci, with references, from public databases, namely the Human Gene Mutation Database (HGMD, http://www.hgmd.cf.ac.uk/ac), the Leiden Open Variation Database (LOVD, http://www.lovd.nl/3.0/ and http://LOVD.nl/GENE) and ClinVar (https://www.ncbi.nlm.nih.gov/clinvar), and through literature searching. Cases in which PAX2 function was disrupted by chromosomal aberrations were excluded. We searched the PubMed database with the terms "PAX2", "renal coloboma syndrome", "FSGS" and "CAKUT", including systematic reviews of reported PAX2 mutations, case reports of PAX2-related disorders and retrospective or prospective studies investigating the prevalence of variants in genes including PAX2 in affected populations. The search was conducted during the first week of November 2020 (start date: November 2nd, 2020; end date: November 8th, 2020). Genotype and phenotype information was then analyzed systematically. Cases without adequately described kidney phenotypes, or without an identified pathogenic variant, were excluded. Phenotypic data were curated from the original publications. Clustering analyses and heat maps were generated using the R packages cluster and gplots and the function heatmap. Within heatmap, the functions dist and daisy were used to compute the distance and dissimilarity matrices, respectively.
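The clustering step can be illustrated with a dependency-free Python analogue of the R workflow: a Manhattan distance matrix (the distance named in the heat-map legend) followed by complete-linkage agglomerative clustering. Complete linkage is the default of R's `hclust`, which `heatmap` calls; whether the authors changed it is not stated, so this is an assumption. Each variant is represented by a vector of patient counts per phenotype (RCS, CAKUT, nephrosis, CKDu), and the profiles below are made-up toy data, not study data.

```python
from itertools import combinations


def manhattan(a, b):
    """L1 (city-block) distance between two phenotype count vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def complete_linkage(vectors, k):
    """Merge clusters until k remain; cluster distance = max pairwise distance."""
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > k:
        best = None
        for (ia, a), (ib, b) in combinations(enumerate(clusters), 2):
            d = max(manhattan(vectors[i], vectors[j]) for i in a for j in b)
            if best is None or d < best[0]:
                best = (d, ia, ib)
        _, ia, ib = best
        merged = clusters[ia] + clusters[ib]
        clusters = [c for n, c in enumerate(clusters) if n not in (ia, ib)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)


# Toy per-variant counts of patients: (RCS, CAKUT, nephrosis, CKDu).
profiles = [(3, 0, 0, 0), (2, 1, 0, 0), (3, 0, 0, 0), (0, 0, 2, 0), (0, 1, 2, 0)]
print(complete_linkage(profiles, k=2))
```

With these toy profiles the call returns `[[0, 1, 2], [3, 4]]`: the RCS-dominated profiles separate from the nephrosis-dominated ones, which is the kind of pattern the heat map is used to reveal.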

Protein structural analysis
The structural analysis of PAX2 was performed using the crystallographic structure of PAX5 in complex with DNA (PDB accession 1k78). Global sequence alignment of PAX2 and PAX5 showed a percent identity of 70.44%. The sequence of the paired domain of PAX5 differs from that of PAX2 by just three residues (97, 122 and 123), all relatively far from those affected by the mutations, so generating a homology model was not necessary [9]. The effects of all missense variants were modeled with FoldX [13,14], using both the protein monomer alone (PDB accession 6pax) and the protein:DNA complex (PDB accession 1k78). Default FoldX parameters were used, with ten replicates per variant. Variants with a monomer ΔΔG > 1.6 kcal/mol (the same threshold as described previously) were classified as "destabilize folding" (D). Variants with a monomer ΔΔG ≤ 1.6 kcal/mol but a ΔΔG > 0.8 kcal/mol for the full complex were classified as "perturb protein:DNA interaction" (P) [15]. All other variants were classified as "unknown molecular effect" (U). In total, 12 different computational phenotype predictors [16] were run on all pathogenic and putatively benign (gnomAD) variants in the primary PAX2 isoform (UniProt ID: Q02962-1). The predicted properties of all variants are provided in Additional file 1: Table S2.
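The D/P/U rule can be expressed as a small function applied to FoldX output. This sketch follows the thresholds exactly as stated; it assumes that the ten replicate ΔΔG values per variant are averaged before thresholding (the text does not say how replicates were combined). The function names are hypothetical and the example values are invented.

```python
from statistics import mean

MONOMER_THRESHOLD = 1.6  # kcal/mol: folding destabilization
COMPLEX_THRESHOLD = 0.8  # kcal/mol: protein:DNA complex


def classify_variant(monomer_ddg, complex_ddg):
    """Return 'D', 'P' or 'U' from FoldX ddG values in kcal/mol."""
    if monomer_ddg > MONOMER_THRESHOLD:
        return "D"  # destabilize folding
    if complex_ddg > COMPLEX_THRESHOLD:
        return "P"  # perturb protein:DNA interaction
    return "U"  # unknown molecular effect


def classify_replicates(monomer_runs, complex_runs):
    # Assumption: replicate runs are averaged before applying thresholds.
    return classify_variant(mean(monomer_runs), mean(complex_runs))
```

Because the monomer test is checked first, a variant that both destabilizes folding and weakens DNA binding is labeled D, matching the stated rule that P requires a monomer ΔΔG ≤ 1.6 kcal/mol.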

Statistics and analysis
Data were analyzed using Excel. Continuous variables were summarized as median and interquartile range (IQR), and categorical data as proportions. The Mann-Whitney test (for continuous variables) and Fisher's exact test (for categorical variables) were used to compare groups.
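Fisher's exact test and the median/IQR summaries can be reproduced outside Excel with the Python standard library. In this sketch the two-sided p-value sums the probabilities of all tables with the same margins that are no more likely than the observed one (the convention used by R's `fisher.test`). The quartiles use the inclusive method, which matches Excel's QUARTILE.INC. As a check, the test is applied to counts given in the Results for missense variants predicted to act through folding or DNA binding (15/16 in the RCS group vs 10/22 in the non-RCS group). The function names are illustrative.

```python
from math import comb
from statistics import median, quantiles


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, col1 = a + b, a + c
    n = a + b + c + d

    def table_prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = table_prob(a)
    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    # Small relative tolerance so floating-point ties are counted.
    return sum(p for p in map(table_prob, range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-7))


def median_iqr(values):
    """Median and (Q1, Q3) via the inclusive method (Excel QUARTILE.INC)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return median(values), q1, q3


# Rows: RCS, non-RCS; columns: D or P category, U category.
p = fisher_exact_two_sided(15, 1, 10, 12)
print(f"p = {p:.4f}")
```

This gives p ≈ 0.002, in line with the value reported in the Results.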

Clinical characteristics of patients with PAX2-related disorder
A total of 32 patients with PAX2-related disorder were identified among 2256 affected individuals with a wide spectrum of kidney diseases enrolled in the CCGKDD from 2014 to 2020. The 32 patients, from 27 families, had a median age at initial presentation of 10 years (IQR, 5-22) and a female-to-male ratio of 1:1.3. Patient characteristics are shown in Table 1. Twenty-five patients were initially diagnosed with CAKUT, five of whom had concomitant proteinuria or hematuria. Six patients presented with proteinuria, hematuria and chronic kidney disease (CKD stages 2-5) of unknown etiology, and one had a primary diagnosis of steroid-resistant nephrotic syndrome. Two patients were diagnosed with CAKUT at birth after prenatal ultrasound. Kidney developmental phenotypes included renal hypodysplasia (22), vesicoureteral reflux (VUR, 6), cystic kidneys (2) and multicystic dysplastic kidney (1). Renal biopsy, performed in 6 patients, showed focal segmental glomerulosclerosis (FSGS, 3), IgA nephropathy (Haas class IV, 1), membranous nephropathy (1) and tubulointerstitial nephropathy (1).
Combining phenotype and genotype, a final diagnosis of RCS was established in 9 patients, PAX2-related CAKUT in 11 and PAX2-related nephrosis in 12. No missense variants were detected in patients with RCS. Individuals with abnormal kidney structure (RCS or CAKUT group) were more likely to carry a pathogenic LGD variant (Fisher's exact test, p = 0.02). The proportion of variants located in the paired domain of PAX2 did not differ significantly between phenotype groups (Fisher's exact test, p > 0.05). Recurrent variants were found in 16 individuals and were located in the paired domain (p.Arg50Trp, p.Val26Glyfs*28, p.Glu74_Thr75dup) or the transactivation domain (p.Gln376Pro).

Phenotypic cluster analysis identifies LGD variants of PAX2 correlated with RCS
To identify pathogenic variants that either mimic haploinsufficiency or represent hypomorphic alleles, we reviewed the phenotypic and genetic data from all international registry sources and published cases accessible to us. In the available literature, we identified 90 reported pathogenic PAX2 variants in 234 patients with kidney disease (Fig. 1, Additional file 1: Table S1). Among the reported cases of PAX2-related disorders, 147/234 (62.8%) individuals were diagnosed with […]. We performed a cluster analysis to determine whether missense or LGD variants (deletion, frameshift, insertion, truncating and splice-site variants) in PAX2 could be distinguished on the basis of their phenotypes. This analysis confirmed that RCS was strongly associated with LGD variants (Fig. 2). Missense variants were more common in cases presenting with nephrosis than in cases with RCS, isolated CAKUT or CKD of unknown etiology (Fisher's exact test, p < 0.05, Fig. 3). Of the 156 individuals with LGD variants, 39 did not have RCS; 26 of these 39 came from families in which other affected members had RCS.
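The missense/LGD split used for clustering can be approximated from HGVS notation with simple string rules. The sketch below is a simplified, hypothetical heuristic, not the curation procedure used in this review. It treats frameshift, nonsense, deletion, duplication/insertion and intronic splice-site (c.N+M or c.N-M) notations as LGD, and a single three-letter amino acid substitution as missense. It only handles unparenthesized notation and does not check whether a variant is predicted to trigger nonsense-mediated decay. The example variants are taken from the text.

```python
import re

# Single amino acid substitution in three-letter code, e.g. p.Arg50Trp.
MISSENSE = re.compile(r"^p\.[A-Z][a-z]{2}\d+[A-Z][a-z]{2}$")
# Intronic position next to an exon, e.g. c.410+1G>A (simplified splice rule).
SPLICE = re.compile(r"^c\.\d+[+-]\d+")


def variant_class(hgvs: str) -> str:
    """Classify an HGVS string as 'LGD', 'missense' or 'other' (heuristic)."""
    if SPLICE.match(hgvs):
        return "LGD"
    # Frameshift or stop-gain, checked before missense so p.Arg139Ter is LGD.
    if "fs" in hgvs or hgvs.endswith("*") or hgvs.endswith("Ter"):
        return "LGD"
    # In-frame deletions, duplications and insertions.
    if any(tag in hgvs for tag in ("del", "dup", "ins")):
        return "LGD"
    if MISSENSE.match(hgvs):
        return "missense"
    return "other"


for v in ["p.Val26Glyfs*28", "p.Glu74_Thr75dup", "p.Arg50Trp", "p.Gln376Pro"]:
    print(v, variant_class(v))
```

Checking the truncating patterns before the missense pattern matters, because a stop-gain written as p.Arg139Ter also looks like a three-letter substitution.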

Protein structural analysis reveals that PAX2 missense variants affecting DNA binding are highly associated with kidney and ocular development deficiency
To explore the molecular basis of the PAX2 variants, we examined the structure of the PAX2 protein, which includes an N-terminal DNA-binding paired domain and a C-terminal transactivation domain (Fig. 1). Pathogenic variants were distributed throughout multiple domains of the PAX2 protein (Fig. 1). Variants from patients with RCS, isolated CAKUT or CKD of unknown etiology clustered in the DNA-binding domain more than those from patients with nephrosis (Fisher's exact test, p < 0.05, Fig. 1). Next, we used the molecular modeling program FoldX to predict whether PAX2 missense variants could disrupt folding or interactions with DNA. We classified all 38 pathogenic missense variants into three categories: those predicted to be highly destabilizing and to disrupt protein folding (D, n = 15), those predicted to perturb DNA binding (P, n = 10), and those of unknown effect (U, n = 13). Most of the variants identified in the RCS group (15/16; 31 individuals) were predicted to affect DNA binding or the stability of protein folding, compared with only 10/22 in the non-RCS group (47 individuals; Fisher's exact test, p = 0.002; Fig. 4, Additional file 1: Table S1). Three recurrent missense substitutions located in the paired domain (p.Arg71Thr, p.Gly76Ser and p.Pro80Leu) were predicted to destabilize the protein structure or the protein-DNA interaction. By contrast, the missense variants reported in our CCGKDD cohort (p.Arg50Trp, p.Pro80Gln, p.Pro149Ser, p.Ala177Thr, p.Thr313Ile and p.Gln376Pro) did not appear to perturb the interaction with DNA. These six missense variants were carried by patients with nephrosis or isolated CAKUT.
We also evaluated the ability of existing sequence-based phenotype predictors to identify pathogenic PAX2 missense variants (Table 2). None of these tools discriminated significantly between RCS and non-RCS variants. However, many performed very well at distinguishing pathogenic PAX2 missense variants from the 223 putatively benign variants present in the gnomAD database (Additional file 1: Table S2). Interestingly, FoldX, using structure alone, performed comparably to the other top-ranking predictors, which are largely based on evolutionary conservation. These results support the predictive value of protein structure for both the pathogenicity and the phenotype of PAX2 missense variants.
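Table 2 is not reproduced here, and the text does not state which performance measure it reports. One common way to quantify how well a score separates pathogenic from putatively benign variants is the area under the ROC curve (AUC), which equals the probability that a randomly chosen pathogenic variant scores higher than a randomly chosen benign one. The sketch below computes AUC directly from that definition, assuming that higher scores mean "more damaging" (as for FoldX ΔΔG). The scores in the example are invented.

```python
def roc_auc(pathogenic_scores, benign_scores):
    """AUC as P(score_pathogenic > score_benign), counting ties as 0.5."""
    wins = 0.0
    for sp in pathogenic_scores:
        for sb in benign_scores:
            if sp > sb:
                wins += 1.0
            elif sp == sb:
                wins += 0.5
    return wins / (len(pathogenic_scores) * len(benign_scores))


# Invented ddG-like scores (kcal/mol); not values from the study.
pathogenic = [3.1, 2.4, 0.9, 1.8]
benign = [0.2, 0.5, 1.0, -0.3]
print(roc_auc(pathogenic, benign))
```

Here 15 of the 16 pathogenic/benign pairs are ordered correctly, so the AUC is 0.9375. The double loop costs n × m comparisons, which is negligible for a few dozen pathogenic and a few hundred benign variants.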

Discussion
In this study, we explored the phenotypic and genotypic features of a group of 32 patients carrying heterozygous PAX2 pathogenic variants. Together with our cohort, a systematic review of the 234 reported cases with phenotypic and genotypic information allowed us to define the phenotypic spectrum associated with variants in PAX2. The association of typical RCS with heterozygous loss-of-function (LGD) PAX2 variants understandably dominates the human disease literature on PAX2-related disorders. Here we put forward an approach to identify a subset of PAX2 missense variants, predicted to affect the protein structure or protein-DNA interaction, that are associated with the RCS phenotype. A wide variety of clinical phenotypes have been reported in individuals with PAX2-related disorder [8,9,11].
Ninety pathogenic variants reported in the available literature from 234 patients with PAX2-related kidney disease were included in the clustering analysis. Heat map generated using Manhattan distance, with dendrogram and PAX2 variants shown (right panel). All reported pathogenic missense variants in PAX2 (white) and a size-matched cohort of likely/presumed gene-disruptive (LGD) variants (MediumAquamarine; deletion, truncating, nonsense and frameshift variants with predicted nonsense-mediated decay) were clustered according to phenotypic features using the R packages cluster and gplots and the function heatmap (red, renal coloboma syndrome [RCS]; yellow, CAKUT; blue, nephrosis; NavajoWhite, CKD of unknown etiology [CKDu]). The clustering reveals that LGD variants are predominantly associated with RCS, whereas missense variants have a wider phenotypic spectrum that includes distinct clusters of CAKUT, nephrosis or CKD of unknown etiology phenotypes. Detailed versions of the phenotyping data set used for the clustering analysis and the heat map, with all missense and LGD variants labeled, are available (Additional file 1: Table S1).
Dysplasia of the optic nerve was the main ophthalmological finding of the disorder, present in 63% of literature-based cases. Among the nine patients with RCS in this study, five had not been screened for fundus disease until their PAX2 variants were identified, and were subsequently diagnosed with unilateral coloboma. Underdiagnosis of eye lesions may be attributed to a lack of awareness of PAX2-related disorder, especially in patients with unilateral coloboma who have no complaints of poor eyesight. Discordant diagnoses within the same family have also been reported in the literature (Additional file 1: Table S1). This may be attributable to phenotypic variability during development and to underdiagnosis of extrarenal anomalies related to PAX2 variants. Further fundus examination should be conducted for the thirteen patients in the current study, even those with normal visual acuity. These findings indicate that genetic diagnosis of PAX2 pathogenic variants can help detect ophthalmological abnormalities early.
Fig. 4 Protein structural analysis and DNA interaction of PAX2 missense variants. (A) Locations of missense variants stratified by phenotype. The structural analysis of PAX2 was performed using the crystallographic structure of PAX5 in complex with DNA (PDB accession 1k78). The sequence of the paired domain of PAX5 differs from that of PAX2 by just three residues (97, 122 and 123), all relatively far from those affected by the variants, so generating a homology model was not necessary. Residues affected by pathogenic variants in individuals with renal coloboma syndrome (RCS) are shown in red, residues associated with variants in individuals with isolated CAKUT in yellow, and residues associated with variants in individuals with nephrosis in cornflower blue. The cartoon representation shows both the N- and C-terminal domains. The figure was generated with the molecular visualization software PyMOL. (B) Schematic view of the distribution of PAX2 variants and the energy results predicted by FoldX. Residues are color coded according to their change in stability; residues with two colors represent results for different variants at the same position. Residue numbering throughout the article is based on UniProt numbering (Q02962-1, Isoform 1). Secondary structure is shown based on the PAX5-DNA complex (PDB accession 1k78), with α-helices as gray cylinders and β-sheets as gray arrows.
Members of the paired box (PAX) family act as transcription factors required for embryonic development, regulating lineage specification and the subsequent morphogenesis of tissues and organs [17]. Phylogenetic analysis shows that the paired domain is conserved (https://www.ncbi.nlm.nih.gov/Structure/cdd/cddsrv.cgi?uid=smart00351) [16], indicating sequence similarity between the paired domain of PAX2 and those of other PAX family proteins. According to the available crystallographic structures, PAX2 contains an N-terminal DNA-binding paired box domain and a C-terminal homeodomain that acts as a second DNA-binding motif (Fig. 1) [9,15]. In this regard, it is notable that variants from patients with developmental defects of the kidney or eye clustered in the DNA-binding domain, compared with those from patients with nephrosis. We showed that FoldX can be used to predict perturbation of DNA binding from the structure of the protein-DNA complex. We observed that most PAX2 missense variants in the paired domain that lead to kidney and ocular defects are predicted to act through loss of domain stability, which is consistent with the earlier concept that developmental defects are caused by haploinsufficiency. Indeed, destabilization of the domain could result in its spatial reorganization and a complete loss of binding, as a truncated protein would. The protein structural analysis indicated a correlation between the energetic and structural effects of the missense variants and their phenotypic outcome. Although previous studies did not reveal a consistent genotype-phenotype correlation [6], we identified a subset of missense variants associated with the RCS phenotype that are predicted to perturb the DNA-binding domain.
Previous functional studies showed that the weak transactivation activity of the dominant PAX2 variants p.Arg56Gln, p.Pro80Leu and p.Ser133Phe was due to decreased protein-DNA binding [9]. Structure-based prediction can help discriminate between the RCS phenotype and isolated CAKUT or nephrosis. It may also allow accurate prioritization of PAX2 missense variants when assessing the potential effect of a pathogenic variant detected in utero. Importantly, however, molecular prediction alone cannot explain all of the phenotypic heterogeneity observed among PAX2 variants. This is highlighted by several examples in our dataset of different patients with the same variant (p.Val26Glyfs*28; p.Thr75dup) exhibiting different phenotypes. Recent studies have demonstrated substantial genetic complexity underpinning renal diseases, including some well-documented cases of digenic inheritance [18]. Further work on epigenetic or digenic mechanisms could provide novel insights into phenotypic heterogeneity in PAX2-related disorders.
Our study had several limitations. First, we reported few extrarenal phenotypes involving other organ systems (e.g., skeletal deformity or Müllerian duct anomalies) in our patients. The PAX2 gene is expressed in many tissues besides the kidney and eye, including the optic vesicle, genitourinary tract, pancreas, cerebellum, hypothalamus, and midbrain/hindbrain boundaries [6]. It