A 3-biomarker-panel predicts renal outcome in patients with proteinuric renal diseases

Clinical and histological parameters are valid prognostic markers in renal disease, although they may show considerable interindividual variability and sometimes limited prognostic value. Novel molecular markers and pathways have the potential to increase the prognostic value of the so-called “traditional markers”. Transcriptomics profiles from laser-capture microdissected proximal tubular epithelial cells from routine kidney biopsies were correlated with a chronic renal damage index score (CREDI), an inflammation score (INSCO), and clinical parameters. We used data from 20 renal biopsies with various proteinuric renal diseases with a median follow-up of 49 months (discovery cohort). For validation we performed microarrays from whole kidney biopsies from a second cohort consisting of 16 patients with a median follow-up time of 28 months (validation cohort). 562 genes correlated with the CREDI score and 285 genes correlated with the INSCO score. 39 CREDI and 90 INSCO genes also correlated with serum creatinine at follow-up. After hierarchical clustering we identified 5 genes from the CREDI panel and 10 genes from the INSCO panel which showed kidney-specific gene expression. After exclusion of genes that correlated with each other (Pearson |R| ≥ 0.5) we identified VEGF-C from the CREDI panel and BMP7, THBS1, and TRIB1 from the INSCO panel. Traditional markers for chronic kidney disease progression and inflammation score predicted 44% of the serum creatinine variation at follow-up. VEGF-C did not further enhance the predictive value, but BMP7, THBS1 and TRIB1 together predicted 94% of the serum creatinine variation at follow-up (p < 0.0001). The model was validated in a second cohort of patients, also yielding a significant prediction of follow-up creatinine (48%, p = 0.0115).
We identified and validated a panel of three genes in kidney biopsies which predicted serum creatinine at follow-up and therefore might serve as biomarkers for kidney disease progression.


Background
In biopsy-proven renal diseases several clinical and histopathological features have been established as markers indicative of progression [1][2][3][4][5]. Nonetheless, most of these traditional risk markers have limited accuracy and reliability, which may be improved by including further molecular markers. Microarray technology and integrative bioinformatics strategies resulted in the identification of novel molecular features, multi-gene expression patterns, and biological pathways being associated with renal disease progression. In native kidney disease, Reich and colleagues identified a panel of eleven genes expressed in kidney biopsies which were related to the degree of proteinuria and allowed to distinguish biopsies from patients with IgA nephritis (IgAN) from control subjects [6]. Boettinger and coworkers identified a panel of 30 TGF-beta 1-related transcripts expressed in the renal tubulointerstitial compartment which correlated significantly with eGFR in patients with CKD I-V [7]. Our group showed that diminished renal tubular expression of VEGF-A and increased expression of hypoxia response genes at time of biopsy better predict renal outcome in CKD patients than serum creatinine and proteinuria [8]. In zero-hour renal allograft biopsies Perco et al. reported a panel of three genes enhancing the predictive value for allograft function one year after kidney transplantation by 2-fold as compared to traditional markers of allograft function [9]. Einecke et al. published a panel of 30 genes from "for cause"-kidney transplant biopsies which predicted graft loss better than pathohistological features or function at time of biopsy [10]. More recently, Sellares et al. developed a molecular score for antibody-mediated rejection in kidney grafts consisting of the expression values of 30 specific genes, which significantly predicted future graft failure [11]. 
Pathways that have been identified as highly affected in high-throughput Omics studies in the context of progressive native kidney disease included for example the VEGF-signaling and hypoxia response pathways in various glomerulonephritis [8], or the NF-κB module NFKB_IRFF_01 pathway in diabetic nephropathy [12]. These studies further underline a substantial benefit of multi-gene over single-gene patterns. However, most gene expression studies on the association of renal transcriptomics and kidney function decline are cross-sectional with only a limited number of longitudinal studies available.
The condition of the tubulointerstitial compartment and in particular of proximal tubular epithelial cells (PTECs) plays a pivotal role in the progression of chronic renal failure. A variety of mechanisms possibly responsible for renal function decline have been identified in this context, such as tubular atrophy, interstitial fibrosis, tubulointerstitial hypoxia, capillary rarefaction, impaired angiogenesis, epithelial-mesenchymal transition and inflammation [13]. Importantly, these histopathological features can be found in various renal diseases, such as FSGS, minimal change disease, lupus nephritis or IgA nephropathy, independently of diagnosis, and they correlate with poor renal prognosis. Therefore, we and others focused on gene-expression profiles derived from microdissected PTECs or from the tubulointerstitial compartment [8,12].
In this project we followed a longitudinal study setup utilizing transcriptomics data from two of our recent studies [8,14]. The given gene expression sets derived from laser-capture micro dissected (LCM) human renal proximal tubule cells were correlated to histological characteristics and to renal function after a median follow up of 49 months to identify candidate markers for the prediction of renal disease progression. The results from the first cohort were validated in an independent cohort of whole kidney biopsies.

Renal biopsies, RNA isolation and microarray hybridization
In the current study we analyzed pre-existing microarray expression data of laser-capture microdissected (LCM) PTECs from 50 renal biopsy samples from patients with proteinuric kidney diseases (discovery cohort) from two of our previous studies [8,14]. Of these samples those with insufficient material for detailed histological assessment of renal damage and inflammation score (see below) were excluded. Additionally, we excluded those that were not on a stable dosing of immunosuppression, where applicable (e.g. Lupus nephritis), prior to biopsy or developed AKI within 1 week after biopsy. Finally, 27 renal biopsy samples (discovery cohort) fulfilled these criteria and were in depth analysed for histological signs of damage and inflammation (see below).
The degree of glomerular sclerosis (gs), interstitial fibrosis and tubular atrophy (ifta) as well as interstitial inflammation (ii) was assessed for each of the biopsies as follows: 0 = 0% (none), 1 = 1-10% (slight), 2 = 11-25%, 3 = 26-50% (moderate) and 4 = >50% (severe). The chronic renal damage index (CREDI) for each biopsy was derived from the sum score of gs and ifta. The degree of interstitial inflammation resulted directly in the inflammation score (INSCO). Follow-up laboratory data were available for 20 of these patients, with a median follow-up time of 49 months (range: 29-68 months).
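The grading scheme above can be written down as a minimal Python sketch. The function names are hypothetical, and the handling of values between the stated integer bounds (for example 0.5% or 10.5%) is an assumption, since the text only gives integer ranges.

```python
def grade(percent_affected):
    """Map the percentage of affected tissue to the 0-4 grade described in the text.

    Assumption: any non-zero value up to 10% is grade 1, and values between
    the stated integer bounds fall into the next higher grade.
    """
    if not 0 <= percent_affected <= 100:
        raise ValueError("percentage must lie between 0 and 100")
    if percent_affected == 0:
        return 0  # none
    if percent_affected <= 10:
        return 1  # slight
    if percent_affected <= 25:
        return 2
    if percent_affected <= 50:
        return 3  # moderate
    return 4      # severe


def credi(gs_percent, ifta_percent):
    """Chronic renal damage index: sum of the gs grade and the ifta grade (0-8)."""
    return grade(gs_percent) + grade(ifta_percent)


def insco(ii_percent):
    """Inflammation score: the interstitial inflammation grade itself (0-4)."""
    return grade(ii_percent)
```

With this definition a biopsy with 30% glomerular sclerosis and 60% ifta would receive a CREDI of 3 + 4 = 7.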
A second group of patients with proteinuric kidney diseases was used for validation. We analyzed microarray expression data of whole kidney biopsies from 16 renal biopsy samples from patients with proteinuric kidney diseases (validation cohort). The degree of gs, ifta and ii was assessed for each of the biopsies as stated above. The median follow-up time was 28 months (range: 1-72 months).
In the discovery and the validation cohort progressive disease was defined as either doubling of serum creatinine or reaching end-stage renal disease during follow-up, all other patients were defined as stable. For data security purposes individual ages are not stated in the tables; instead age groups have been defined: group 1 = age <30 years, group 2 = age 30-45 years, group 3 = age 46-60 years, group 4 = age > 60 years. However, we used individual ages for statistics concerning age comparing progressive and stable patients.
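The outcome definition and the age grouping can be illustrated with a short Python sketch. The function names are hypothetical; taking creatinine at time of biopsy as the reference for "doubling" and counting an exact doubling as progression are assumptions, as the text does not state either explicitly.

```python
def is_progressive(creatinine_at_biopsy, creatinine_at_followup, reached_esrd):
    """Progressive disease: doubling of serum creatinine or end-stage renal disease.

    Assumption: doubling is measured against the value at time of biopsy,
    and a value of exactly twice the baseline counts as doubling.
    """
    return reached_esrd or creatinine_at_followup >= 2 * creatinine_at_biopsy


def age_group(age_years):
    """Age groups as used in the tables: 1 = <30, 2 = 30-45, 3 = 46-60, 4 = >60."""
    if age_years < 30:
        return 1
    if age_years <= 45:
        return 2
    if age_years <= 60:
        return 3
    return 4
```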
The LCM, RNA isolation, and microarray hybridization have been described in detail in previous work [14,15]. In brief, PTECs were stained for alkaline phosphatase using 4-nitro blue tetrazolium chloride/5-bromo-4-chloro-3-indolyl phosphate under RNase-free conditions, and the cells were isolated using the PixCell IIs Laser Capture Microdissection System and CapSure™ LCM Caps (Arcturus, Mountain View, CA, USA). Total RNA was isolated using the Pico Pure™ RNA Isolation Kit (Arcturus, Mountain View, CA, USA). Owing to low RNA amounts, we performed two rounds of linear RNA amplification using the RiboAmp™ RNA Amplification Kit (Arcturus, Mountain View, CA, USA). The quality of the amplified RNA was assessed by spectrophotometry (A260/280) and with the Agilent Bioanalyzer and RNA6000 LabChip™ Kit (Agilent, Palo Alto, CA, USA). cDNA-microarrays were obtained from the Stanford Functional Genomics Facility (https://microarray.org/sfgf/). The arrays contained 41 792 spots, representing 30 325 genes assigned to a UniGene cluster and 11 467 ESTs. Arrays were scanned using a GenePix 4000B microarray scanner, and the images were analyzed with the GenePix Pro 4.0 software (Axon Instruments, Union City, CA, USA). All samples were processed in technical duplicates and gene expression values were averaged.
The Institutional Review Board (IRB) of the Medical University of Innsbruck (Ethikkommittee Medizinische Universität Innsbruck) approved the use of surplus material from routine for-cause renal biopsies for research purposes. Written consent was not obtained (and not required by the IRB), as the biopsies were performed in the context of clinical routine and were analyzed after patient treatment and hospitalization had been completed. All other patient records were anonymized and de-identified prior to analysis.
Establishment and validation of a biomarker model predictive for follow-up creatinine
Correlation of genome-wide gene expression with histological scores
We included genes with intensity values of more than 2.5 over background and a valid signal in more than 80% of the processed arrays in further analysis. Pearson correlation coefficients were calculated between gene expression values and the CREDI as well as the INSCO scores, respectively.
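The filtering and correlation step can be sketched in plain Python as follows. This is an illustration, not the original analysis pipeline: the function names are hypothetical, and reading "2.5 over background" as a signal-to-background ratio is an assumption.

```python
import math


def passes_quality_filter(signal_to_background, min_ratio=2.5, min_valid_fraction=0.8):
    """Keep a gene if more than 80% of arrays show a signal above the threshold.

    signal_to_background: one value per array, None where the spot is missing.
    Assumption: the threshold is a ratio of signal over background.
    """
    if not signal_to_background:
        raise ValueError("no arrays given")
    valid = [v for v in signal_to_background if v is not None and v > min_ratio]
    return len(valid) / len(signal_to_background) > min_valid_fraction


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two sequences of equal length with at least 3 values")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t value for testing r against zero, with n - 2 degrees of freedom."""
    if 1 - r * r <= 0:
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1 - r * r))
```

The p-value follows from the t value and a t distribution with n - 2 degrees of freedom; a statistics package would normally supply that step.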
Significantly correlated genes (p-value < 0.01) with at least one of the two histological scores were functionally annotated using gene ontology terms as provided by the SOURCE tool (http://source.stanford.edu). We furthermore searched for biological pathways being enriched or depleted with respect to the set of significantly correlated genes using the PANTHER (Protein ANalysis THrough Evolutionary Relationships) Classification System [16][17][18].

Linear regression model for prediction of creatinine at follow-up
Univariate linear regression models including expression values of candidate genes were computed to predict serum creatinine values at follow-up. For genes showing a significant prediction (p < 0.05), kidney-specific gene expression was evaluated using information as provided by the SOURCE tool. We focused on kidney-specific genes on purpose, as the microarray data were derived from laser-capture microdissected renal proximal tubule cells. From the set of candidate genes, the ones showing high correlation to serum creatinine but low Pearson correlation coefficients in their pairwise comparison (Pearson R < 0.5 and > −0.5, respectively) were selected for building multivariate regression models. This procedure was applied to avoid including highly correlated variables in one model. Multiple linear regression models for the prediction of kidney function were established using a combination of genes based on either the CREDI or the INSCO panel along with the gold standard parameters INSCO, CREDI, creatinine and proteinuria at time of biopsy. The best model was generated in a step-wise selection procedure. To obtain a 95% confidence interval for the coefficient of determination we applied bootstrap sampling using 2000 case-resampled datasets. The confidence interval was computed by the bias-corrected algorithm [19].
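The bootstrap step can be illustrated with a minimal Python sketch. It is a simplification under stated assumptions: it uses a single predictor instead of the multi-gene model, it implements the bias-corrected percentile interval without an acceleration term, and the function names and the fixed seed are hypothetical.

```python
import random
from statistics import NormalDist


def r_squared(xs, ys):
    """Coefficient of determination of a simple linear regression of ys on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        return 0.0  # degenerate resample, no variance to explain
    return sxy * sxy / (sxx * syy)


def bc_bootstrap_ci(xs, ys, n_boot=2000, alpha=0.05, seed=0):
    """Bias-corrected bootstrap confidence interval for R^2 (case resampling)."""
    rng = random.Random(seed)
    n = len(xs)
    observed = r_squared(xs, ys)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample whole cases
        stats.append(r_squared([xs[i] for i in idx], [ys[i] for i in idx]))
    stats.sort()
    normal = NormalDist()
    # Bias correction: how far the bootstrap distribution is shifted
    # relative to the observed statistic.
    below = sum(s < observed for s in stats) / n_boot
    below = min(max(below, 1 / n_boot), 1 - 1 / n_boot)  # keep inv_cdf defined
    z0 = normal.inv_cdf(below)
    p_low = normal.cdf(2 * z0 + normal.inv_cdf(alpha / 2))
    p_high = normal.cdf(2 * z0 + normal.inv_cdf(1 - alpha / 2))
    low = stats[min(n_boot - 1, int(p_low * n_boot))]
    high = stats[min(n_boot - 1, int(p_high * n_boot))]
    return observed, low, high
```

Resampling whole cases keeps each patient's predictor and outcome values together, which is what "case resampled" refers to.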
For validation of the model found in the discovery cohort we performed linear regression analysis for the predictive value in a second cohort of patients (validation cohort).
For a better comparison of our data with published literature we also calculated sensitivity and specificity for each biomarker candidate using the renal endpoints "end stage renal disease (ESRD)" and "doubling of serum creatinine" to define progressive kidney disease (see Tables 1 and 2). Using Youden's J statistic [20] (J = Sensitivity + Specificity − 1) we calculated the optimal cut-off values (microarray fluorescence intensity values) for each biomarker with respect to maximized sensitivity and specificity.
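A minimal Python sketch of the cut-off search with Youden's J is given below. The function name, the choice of observed values as candidate cut-offs and the `higher_is_risk` switch (for markers whose expression is lower in progressive patients) are assumptions made for illustration.

```python
def youden_cutoff(values, progressive, higher_is_risk=True):
    """Return (J, cutoff, sensitivity, specificity) for the cut-off maximizing J.

    values: marker intensity per patient
    progressive: True for progressive patients, False for stable ones
    """
    n_pos = sum(1 for y in progressive if y)
    n_neg = len(progressive) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both progressive and stable patients are required")
    best = None
    for cutoff in sorted(set(values)):
        if higher_is_risk:
            predicted = [v >= cutoff for v in values]
        else:
            predicted = [v <= cutoff for v in values]
        true_pos = sum(1 for p, y in zip(predicted, progressive) if p and y)
        true_neg = sum(1 for p, y in zip(predicted, progressive) if not p and not y)
        sensitivity = true_pos / n_pos
        specificity = true_neg / n_neg
        j = sensitivity + specificity - 1
        if best is None or j > best[0]:
            best = (j, cutoff, sensitivity, specificity)
    return best
```

For a marker that separates the two groups perfectly, the search returns J = 1 with sensitivity and specificity both equal to 1.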

General statistics
The following procedure was carried out to test the significance of findings: unless the data were categorical, values from stable and progressive patients were tested for Gaussian distribution using the Kolmogorov-Smirnov test. In case of non-Gaussian distribution the non-parametric unpaired Kruskal-Wallis test was applied. In all other cases Student's t-test was used. P values below 0.05 were defined as statistically significant.

Results
Establishment of a biomarker model predictive for follow-up creatinine: discovery cohort
Patient characteristics
Detailed patient characteristics were published previously and are summarized in Table 1 [8,14]. The proportion of female patients was 37%. Ten patient samples showed a CREDI of less than or equal to 2, seven samples had a CREDI of 3 to 4, and ten samples showed severe damage with a CREDI score above 4. Twenty-three samples had INSCO values of 0 or 1, three samples had a score of 2, and one patient sample was scored with a value of 3. There was a significant correlation between CREDI and INSCO (R = 0.57, p < 0.05). Twenty patients with sufficient clinical follow-up data were included in the subsequent correlation analysis of CREDI and INSCO associated genes with kidney function during follow-up. There were

Correlation of gene expression with chronic renal damage index
Transcriptomics profiles from the 20 samples were correlated to CREDI. The expression status of 562 unique genes correlated significantly with CREDI. Of these 562 genes, 222 showed a positive correlation (i.e. being upregulated) with increasing damage score, and 340 genes showed a negative correlation (i.e. being downregulated) with increasing damage score (Additional file 1: Table S1). Pathways found to be enriched either in positively or negatively correlated genes are summarized in Table 3. Among others, the insulin-like growth factor-, the transforming growth factor-, and the integrin signaling-pathway were significantly enriched in the list of 222 positively correlated genes. Using the 340 downregulated genes we identified 27 significantly enriched pathways. These pathways included, among others, metabolism (e.g. ATP synthesis), signaling (e.g. FGF-signaling, EGF receptor signaling, VEGF-signaling, angiogenesis), and inflammation (e.g. T-cell activation, B-cell activation).

Correlation of gene expression with inflammation score
Next, we correlated gene expression data from the same 20 samples to INSCO. The expression values of 285 genes correlated with INSCO, of which 150 showed a positive and 135 showed a negative correlation (Additional file 1: Table S2). In the 150 positively correlated genes three pathways were significantly overrepresented, namely the chorismate biosynthesis-, the nicotinic acetylcholine receptor signaling-, and the mannose metabolism-pathway. Thirteen pathways were identified on the basis of the 135 negatively correlated genes (Table 3). These included several pathways linked to amino-acid synthesis, such as the leucine-, alanine-, valine-, isoleucine- and serine glycine biosynthesis-pathways. Furthermore, mRNA splicing- and general transcription by RNA polymerase I-pathways were identified as being enriched with downregulated genes.

2-step approach for identification of candidate genes for prediction of renal outcome
We correlated the expression values of genes showing statistically significant association to CREDI (562 genes) and INSCO (285 genes) with follow-up creatinine. Significant correlation (p < 0.05) was identified for 39 genes from the CREDI group and for 90 genes from the INSCO set of genes. To maximize the predictive value of selected genes we applied a 2-step approach. First, kidney specificity of the genes was evaluated using information as provided by the SOURCE tool (http://source.stanford.edu). Five (CREDI) and ten (INSCO) genes were found to be highly expressed in human kidney (Table 4). Second, we evaluated the pairwise correlation of expression to identify those candidate genes showing the lowest correlation in expression, thus having an independent predictive value for follow-up creatinine. Four CREDI genes (vascular endothelial growth factor C (VEGF-C), podoplanin (PDPN), semaphorin 6A (SEMA6A), integrin beta 6 (ITGB6)), and six INSCO genes (thrombospondin 1 (THBS1), tribbles homolog 1 (TRIB1), bone morphogenetic protein 7 (BMP7), chordin-like 1 (CHRDL1), apoptosis-inducing factor 1 (AIFM), syntaxin 7 (STX7)) were identified following this approach.
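The pairwise-correlation step can be sketched in Python as below. The text does not specify the order in which genes were compared, so the greedy strategy used here (rank by absolute correlation with follow-up creatinine, then keep a gene only if its absolute Pearson R with every gene already kept is below 0.5) is an assumption, and all names are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


def select_independent_genes(expression, followup_creatinine, max_abs_r=0.5):
    """Greedy selection of candidate genes with low mutual correlation.

    expression: dict mapping gene name -> expression values across patients
    followup_creatinine: outcome values in the same patient order
    """
    ranked = sorted(
        expression,
        key=lambda gene: abs(pearson_r(expression[gene], followup_creatinine)),
        reverse=True,
    )
    selected = []
    for gene in ranked:
        # Keep the gene only if it is weakly correlated with all genes kept so far.
        if all(
            abs(pearson_r(expression[gene], expression[kept])) < max_abs_r
            for kept in selected
        ):
            selected.append(gene)
    return selected
```

In a toy example with three hypothetical genes, two of which are nearly collinear, only the one more strongly related to the outcome and the weakly correlated third gene are retained.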

Linear regression analysis
We first calculated the predictive value of the classical markers (creatinine at time of biopsy and histopathological grading) and then the additive predictive value of a gene panel of CREDI or INSCO genes. Adjusted [...] (Table 5). The best model after a step-wise selection using all traditional markers and the CREDI genes resulted in a model consisting of VEGF-C and INSCO with a comparable predictive value of 0.51 (p = 0.0009).
Using the INSCO genes together with the traditional parameters resulted in a model consisting of the three genes thrombospondin 1 (THBS1), bone morphogenetic protein 7 (BMP7), and tribbles homolog 1 (TRIB1). The predictive value of this model was 0.94 (p < 0.0001), thus predicting 94% of the variation of follow-up creatinine for the given sample set with a median follow-up time of 49 months (Table 5). The bias-corrected bootstrap confidence interval for the coefficient of determination was 0.558-0.987. These three genes were found to be significantly differentially expressed when comparing array data from stable and progressive patients. In progressive patients we found a significantly higher expression of THBS1 (p = 0.009) and TRIB1 (p = 0.011), whereas the expression levels of BMP7 were significantly lower (p = 0.007) as compared to stable patients.

Validation cohort
Patient characteristics
In order to validate the established biomarker model, microarrays were performed using whole kidney biopsies of a second cohort of patients with proteinuric kidney diseases (n = 16). The patient characteristics are shown in detail in Table 2. This cohort consisted of 25% females. There was no significant difference in age, follow-up time, or creatinine/proteinuria at time of biopsy between stable and progressive patients. Comparing the clinical data with those from the discovery cohort we also found no significant difference, except for a shorter follow-up time (49 vs. 28 months).

Linear regression analysis
The predictive value of the model consisting of creatinine at time of biopsy plus the three biomarkers of the INSCO panel was calculated. The predictive value of this model was 0.4852 (p = 0.0115), thus predicting 48% of the variation of follow-up creatinine for the given sample set with a median follow-up time of 28 months. Comparing the traditional parameter creatinine at time of biopsy with the 3-biomarker panel yielded a significantly better prediction of follow-up creatinine using the 3 biomarkers (p = 0.0236).
Again, these three genes were found to be differentially expressed when comparing array data from stable and progressive patients; however only TRIB1 reached statistical significance (p = 0.023).

Discussion
In this project we used a transcriptomics approach to identify tubular gene expression profiles associated with (i) glomerulosclerosis, tubular atrophy or interstitial fibrosis, or (ii) interstitial inflammatory infiltration. The histopathological features of tubular atrophy and interstitial fibrosis correlate better with poor renal prognosis than glomerular lesions, which is in accordance with previously published data. The identified activation of features assigned to IGF-, PDGF-, TGF-beta- and integrin-pathways in fibrotic tissue is in line with data from various groups [21][22][23]. More interestingly, a number of downregulated features were enriched in specific pathways including several receptor- and intracellular signaling-, metabolism-, cell cycle- and angiogenesis-pathways. The impact on ATP synthesis- and cell cycle-pathways is in accordance with data showing that hypoxia, in particular in proximal tubule cells, depletes the cells of ATP, induces mitochondrial fragmentation, and finally leads to apoptosis [24].
We next correlated tubular gene expression with the presence of interstitial inflammation. Several pathways linked to amino acid synthesis, mRNA-splicing, transcription, PDGF-signaling, apoptosis and p53-signaling showed enrichment in downregulated genes. Seven of the 13 pathways were defined as significantly overrepresented because of the presence of two genes: phosphoserine aminotransferase 1 (PSAT1) and branched chain amino-acid transaminase 2, mitochondrial (BCAT2) (data not shown). PSAT1 as a member of the PLP biosynthesis and the serine glycine biosynthesis pathway is involved in serine, glycine and threonine metabolic pathways, and also in the biosynthesis of vitamin B6 (pyridoxine). BCAT2 is expressed in mitochondria and catalyzes the first step in the production of the branched amino-acids leucine, isoleucine and valine. These results may emphasize the role of mitochondrial dysregulation in CKD [24][25][26].
In the next step we aimed at the identification of novel molecular markers indicative for progression of chronic renal failure. Since the extent of tubulointerstitial fibrosis represents an established risk factor for kidney disease progression, we hypothesized that genes from the CREDI panel will outperform genes from the INSCO panel regarding the predictive value. Surprisingly, only 39 of 562 (7%) CREDI genes also correlated with serum creatinine during follow up, while 90 of 285 (32%) INSCO genes showed a significant correlation. Furthermore, INSCO itself predicted 44% of follow-up creatinine variation (p = 0.0012), while the predictive value of CREDI was not significant. It has been generally accepted that tissue fibrosis and inflammation both contribute to progressive renal scarring and are finally associated with renal function decline [27]. Although some uncertainty exists about details of the causal and chronological relationship, it has been generally proposed that renal injury is followed by recruitment of inflammatory cells, release of fibrogenic cytokines, and finally the activation of collagen-producing cells [28]. Our results of significant correlation of inflammation but not fibrosis with renal function decline point towards a biopsy bias favouring acute inflammatory glomerulonephritis rather than slowly progressing fibrosing renal disease. On the other hand these findings might also be compatible with the chronological sequence of inflammation followed by fibrosis processes.
We identified a panel of three genes which were highly predictive for the variation of follow-up creatinine after a median follow up time of more than 4 years. Using the expression values of THBS1, BMP7 and TRIB1 we were able to predict an additional 43% of creatinine variance on top of traditional progression markers, being creatinine and inflammation score at time of biopsy. Again, all three genes were identified from the inflammation gene panel further emphasizing the relevance of tissue inflammation in progressive renal failure.
In order to validate this model found in microdissected renal proximal tubule cells, we performed microarrays from whole kidney biopsies in a second cohort of patients suffering from proteinuric kidney diseases. We decided to use whole kidney biopsies on purpose, as LCM of renal proximal tubule cells is not a standard procedure in the histopathological work-up after kidney biopsy, and we were interested in whether genomic patterns derived from LCM proximal tubule cells can be found in the background signal from all other cell types in whole kidney biopsies. Using our 3-biomarker model we were able to predict 48% of creatinine variance at follow-up (p = 0.0115).
We used creatinine at follow-up as the clinical outcome parameter in this project, since worsening of creatinine (e.g. doubling of serum creatinine) and reaching end-stage renal disease (ESRD) are the only established endpoints in clinical renal research approved by the Food and Drug Administration (FDA). Alternatively, one can use eGFR or eGFR slope as a parameter for changes in kidney function, but eGFR formulas (MDRD, CKD-EPI) are not reliable at eGFR values above 60 and below 20 ml/min/1.73 m² [29][30][31]. In our cohorts (Tables 1 and 2) a substantial number of patients are in these GFR ranges, hence we did not correlate gene expression data with eGFR or eGFR slope. However, creatinine at follow-up showed a strong and significant correlation (Pearson R = 0.848, p < 0.0001) with creatinine slope, and the three identified biomarker candidates also showed a significant correlation with creatinine slope.
The primary approach of this study was to establish a model to predict serum creatinine levels at the follow-up time point. By establishing multiple linear regression models using continuously distributed biomarker candidates and gold standard parameters (e.g. creatinine or proteinuria at time of biopsy), we identified a model consisting of 3 biomarkers (THBS1, TRIB1 and BMP7). In addition we calculated sensitivity and specificity of each of these markers to facilitate the interpretation of our data in the context of published literature [32,33]. For this purpose we defined progression of chronic kidney disease using established parameters, i.e. doubling of serum creatinine and ESRD. Furthermore, best cut-off values for the single markers were calculated using Youden's statistic [20]. We found that THBS1 was the best marker, followed by BMP7 and TRIB1. It was surprising that THBS1 performed that well, providing a sensitivity and specificity of 1.00 in the discovery cohort.
However, given the limited number of patients investigated, these results should not be generalized to other populations without caution. We are currently planning to establish a staining of our three biomarkers in patient kidney biopsy samples in order to prospectively address the predictive value of this biomarker panel.
Concerning the role of TRIB1, Kiss-Toth and coworkers first described tribbles homologs as MAPK activity controlling proteins [34]. Recently, it was shown that TRIB1 might be utilized in renal allografts as a biomarker for chronic antibody-mediated rejection [35]. Furthermore, TRIB1 expression was correlated to non-diabetic end-stage renal disease [36]. BMP7 is a member of the TGF-beta superfamily and has been shown to be implicated in regulation of renal function and determination of the number of renal progenitor cells [37]. In particular, various groups have described the pivotal role of BMP7 in EMT, but results are controversial. Xu et al. [38,39] published data providing evidence that BMP7 exerts antifibrotic effects via blocking and reversing TGF-beta 1 induced EMT, whereas Dudas and colleagues did not find an inhibitory effect of BMP7 on TGF-beta 1 mediated EMT [40]. In a previous project we were able to show increased expression of BMP7 mRNA and protein in renal tubule cells in biopsies from proteinuric renal diseases as compared to controls [14]. However, most of the patients included in this former study showed a stable course of kidney disease with virtually no decline of kidney function over time. Recently, low BMP7 RNA levels were proposed to be an early marker of renal allograft dysfunction [41]. In amyloidosis patients Denizli et al. described a non-significant correlation of high levels of BMP7 and CKD progression [42]. Thrombospondin 1 is a subunit of a disulfide-linked homotrimeric protein. This protein is an adhesive glycoprotein that mediates cell-to-cell and cell-to-matrix interactions. Hugo et al. [43] have shown that THBS1 precedes and is an early marker for the development of tubulointerstitial kidney disease, which might be explained by its role as an endogenous activator of TGF-beta in type 1 diabetes [44] and fibrotic renal disease [45]. Cui et al. have shown that THBS1 is an important mediator of obesity-induced kidney dysfunction [46].
Recently, a THBS1 short hairpin RNA suppressed peritubular capillary injury and tubulointerstitial fibrosis in unilateral ureteral obstruction (UUO)-induced renal fibrosis [47].
To summarize, THBS1 and BMP7 are involved in TGF-beta pathways, which have been shown to be activated in progressive CKD. This paper has several limitations. More clinical and demographic data are needed to adjust the calculated models for confounders, such as hypertension or hypercholesterolemia. However, it was not possible to obtain this information from the patients' records. Another interesting issue would have been to analyse the gene expression patterns in response to therapeutic interventions. First, the cohorts and the interventions (if there were any at all) were too heterogeneous to collect and analyze data in a reliable manner. Second, some of the patients were not treated at our center, so a significant proportion of intermediate follow-up data is missing. And third, we did not continuously perform kidney biopsies during the follow-up, so the outcome of the interventions on the histological and genomic level is unclear. Hence, it was not possible to investigate the predictive value of changes of gene expression in response to treatment.
We correlated gene expression data with the variability of creatinine at follow-up, which might be a very hard endpoint, so that more subtle transcriptomic changes may have been missed. Due to the limitations of eGFR mentioned above we did not correlate the expression of the transcripts with delta eGFR. However, the significant correlation between creatinine at follow-up and creatinine slope, and between the expression values of the three biomarker candidates and creatinine slope, corroborates our results.
Certainly, the degree to which the 3-gene panel can be generalised to other chronic kidney disease cohorts needs to be tested in further studies in larger validation sample sets, as the predictive power delineated on the given cohort is probably an overestimation of the true predictive value due to the small size of the first study cohort. However, the bootstrap sampling using 2000 case-resampled datasets resulted in a bias-corrected confidence interval of 0.558-0.987, nevertheless suggesting a higher predictive value of these 3 genes than a combination of traditional parameters such as creatinine and INSCO. The procedure for delineating such gene sets as presented in this work might well be applicable to larger cohorts. Additionally, we analysed the 3-biomarker model in a second (validation) cohort of patients and also found a statistically significant predictive value for follow-up creatinine.

Conclusion
We identified distinct gene expression profiles from laser-capture microdissected renal tubule cells associated with chronic renal damage or inflammation. The 3-gene panel THBS1, BMP7 and TRIB1 from the inflammation gene panel predicted follow-up creatinine significantly better than traditional markers such as serum creatinine at time of biopsy and the presence of inflammatory infiltrates in the biopsy. These data were validated using gene expression profiles from whole kidney biopsies in a second cohort of patients.

Additional file
Additional file 1: Tables S1 and S2. The tables show the 562 and 285 genes that were either negatively or positively correlated with CREDI and INSCO, respectively. Gene names, accession numbers, UniProt IDs, Gene IDs, GO terms (abbreviated) and some further potentially important information concerning the respective genes are listed in the first 12 columns. Diagnosis and CREDI/INSCO values are listed in the first 2 lines above each patient (patient numbers (as in Table 1) are listed in the third line above expression values). The last 5 columns show the number of missing expression values per transcript (blanks) and whether at least 80% of signals in the microarray per gene met our quality standards (80% filter; 1 = yes, 0 = no); furthermore, the correlation coefficient (Pearson) and t- and p-values are listed for each gene.