Short and long term gene expression variation and networking in human proximal tubule cells when exposed to cadmium

Cadmium (Cd2+) is a known nephrotoxin causing tubular necrosis during acute exposure and potentially contributing to renal failure in chronic long-term exposure. To investigate changes in global gene expression elicited by cadmium, an in-vitro exposure system was developed from cultures of human renal epithelial cells derived from cortical tissue obtained from nephrectomies. These cultures exhibit many of the qualities of proximal tubule cells. Using these cells, a study was performed to determine the cadmium-induced global gene expression changes after short-term (1 day, 9, 27, and 45 μM) and long-term cadmium exposure (13 days, 4.5, 9, and 27 μM). These studies revealed fundamental differences in the types of genes expressed during each of these time points. The obtained data was further analyzed using regression to identify cadmium toxicity responsive genes. Regression analysis showed 403 genes were induced and 522 genes were repressed by Cd2+ within 1 day, and 366 and 517 genes were induced and repressed, respectively, after 13 days. We developed a gene set enrichment analysis method to identify the cadmium induced pathways that are unique in comparison to traditional approaches. The perturbation of global gene expression by various Cd2+ concentrations and multiple time points enabled us to study the transcriptional dynamics and gene interaction using a mutual information-based network model. The most prominent network module consisted of INHBA, KIF20A, DNAJA4, AKAP12, ZFAND2A, AKR1B10, SCL7A11, and AKR1C1.


Introduction
Cadmium is a toxic heavy metal that is widely distributed due to industrial pollution such as mining, refining of metals, burning of fossile fuels, battery production and the use of tobacco. The health threats presented by cadmium has been increasingly recognized by government agencies, healthcare providers and the general public [1]. One of the primary target organs of cadmium toxicity is the human kidney where cadmium accumulates and induces a number of adverse events. The proximal tubule is the main site for cadmium accumulation and the site of cadmium induced toxicity. When the levels of cadmium exceed a tolerance level of the intracellular defense systems, such as metallothionein and glutathione, cells become susceptive to the toxic effect of this heavy metal. [2,3]. Under high dose acute exposure, cadmium is known to cause proximal tubule necrosis. On the other hand, a few cohort studies showed chronic low doses of cadmium can also induce proximal tubule toxicity [4,5]. Toxicokinetic models indicated that the half-life of cadmium in humans is between 15 and 20 years [6]. This suggests that low level exposure of cadmium may be harmful to human kidney via the long term accumulation of this metal in proximal tubule. In this study, we investigate the acute and chronic effects of cadmium on human proximal tubule (HPT) cells. The HPT cell culture system developed in our lab has been shown to have important properties that resemble human proximal tubule [7]. The ability of forming dome by confluent cell monolayers implies the HPT cells retain the vectorial transport activity. The cells are mortal and exposure to high concentration of cadmium induces a necrotic mechanism of cell death [8,9]. By exposing HPT cells to various doses of cadmium in either a short period of 1 day or a long period of 13 days, we expect different physiological responses to be elicited which represent the acute and chronic responses to cadmium toxicity. In order to investigate the underlying molecular mechanisms of cell toxicity and compensatory cellular defenses related to cadmium exposure, we profiled the global gene expression in HPT cells exposed to increasing concentrations of this metal during short-term (1 day) and long-term (13 days) exposure.
An initial analysis of the microarray data has previously reported a number of genes that were differentially expressed between cadmium treated and control samples [10]. In this manuscript, we investigated how genes respond to different doses of cadmium and how the gene network is formed in response to cadmium toxicity. The high cost of microarray chips prevents researchers from collecting microarray data with large sample sizes; thereby, limiting the power of statistical analysis. In this study, we only have one sample for each condition-time point, however, the overall experimental design enabled us to investigate the gene expression profiles correlated with cadmium concentration. Furthermore, we developed a gene set enrichment analysis (GSEA) method that can be applied to regression-based tests for individual genes, while traditional GSEA methods focus on two-sample comparison analysis, such as t test.
This study adopted a complicated design by perturbing HPT cells with a variety of doses of cadmium and under various time periods that resulted in variation and dynamics of global gene expression. Therefore, it provides a unique opportunity to investigate the dynamic change of gene expression levels and the network structure of gene interactions in a well-designed experimental environment. A mutual information-based network model [11] was used to investigate the gene networking in response to cadmium toxicity and a network module with a small set of activated genes with high interaction between each other was identified.

Materials and methods
Cell culture and total RNA collection The HPT cell culture system and cadmium treatment has been described previously [7,10]. RNA was purified from triplicate cultures of HPT cells by the RNeasy Mini kit (Qiagen, Valencia, CA). The integrity and quality of RNA was ensured via the ribosomal bands using agarose gel electrophoresis.

Microarray analysis
Microarray experiment was performed by Genome Explorations Inc. (Memphis, TN) using Affymetrix GeneChip Human Genome U133 Plus 2.0 arrays, which cover over 47,000 human transcripts and variants representing approximately 39,000 of the best characterized human gene transcripts and expressed sequence tags (ests) (Affymetrix, Santa Clara, CA). The scanned images were analyzed using programs resident in Gene-Chip Operating System v1.4 (GCOS; Affymetrix). The raw data was processed and normalized using MAS 5.0 statistical algorithm that gave detection call regarding signal present or absent for each probe. The correlation of gene expression with cadmium concentrations was assessed using linear regression. The gene signal pathways that were activated by cadmium were identified using gene set enrichment analysis (GSEA) [12]. The R package GSA that was based on standardized statistics was employed for gene set enrichment analysis [13]. The gene list for each pathway was obtained from the Kyoto Encyclopedia of Genes and Genomes (KEGG) http://www.genome.jp/kegg. Cluster analysis was used to group the microarray samples using hierarchical clustering method. The distances between samples were calculated by Pearson dissimilarity and the agglomeration of clusters was performed by Ward linkage method.

Networks analysis
The relations between genes were defined by mutual information [11]. The resulting adjacent matrix was used for networks analysis. Network modules were identified by hierarchical clustering. The set of genes that have short distances between each other and large distance with other genes was defined as a network module. The most prominent network module was identified for further investigation. All data analyses were performed using R statistical programming language.

Results
Gene expression variation in response to acute (1 day) cadmium toxicity The HPT cells were exposed to various levels of concentrations of Cd 2+ for a short term of 24 hours. The Cd 2+ concentrations were 0, 9, 27, and 45 μM. Our hypothesis is that Cd 2+ exposure will elicit an acute toxicity response to HPT cells within one day. As reported before, the HPT cells exhibited a stress response during 24 hour exposure to increasing concentrations of Cd 2+ and showed visible signs of toxicity at the high dose [10]. The goal of the current study is to discover the genes that are responsive to the short-term exposure of Cd 2+ using messenger RNA microarray data. For each Cd 2+ concentration, a sample of HPT cells was harvested, total RNA was purified and submitted for microarray analysis. The Affymetrix U133 Plus 2.0 array was used for this analysis. When using MAS 5.0 for testing, we found 28,720 genes that were present in at least one sample. These genes were used for subsequent analysis. In order to identify the genes whose expression levels were dependent on Cd 2+ concentrations, we conducted regression analysis for correlating gene expression with concentrations. This resulted in 4,036 genes that have significant P values (< 0.05). However, none of them had a false discovery rate below 0.05 due to the small size of only 4 samples. Figure 1a shows the P value distribution that was tilted to low values implying that many genes had positive responses to acute Cd 2+ toxicity. If only a few genes had positive responses, the P values would be evenly distributed. Using a P value cutoff of 0.01, we reported 403 genes that were positively correlated with cadmium concentration (Table 1), and 522 genes that were negatively correlated ( Table 2). The positively correlated genes in the top list included heat shock genes like HSP40, HSP70 and HSPBAP1, and ion and anion transporters SLC7A11, SLC16A1, SLC20A2, SLC25A37 and SLC7A1. Both the highest positively and negatively correlated genes contained cell cycle arrest and cell death P values genes such as G0S2, GSPT1 and BAG5. These activated genes represented a stress response for the acute Cd 2+ toxicity.
Our next aim was to identify the gene regulatory pathways that were activated in the HPT cells in response to the acute, 1 day exposure to Cd 2+ . The gene pathway database compiled in Kyoto Encyclopedia of Genes and Genomes (KEGG) has been a valuable source for investigating pathways. The gene set enrichment analysis (GSEA) method was employed to identify activated KEGG pathways using the microarray data. The approach of GSEA is to test whether the ranks of P values of a set of genes are randomly distributed among all the P values [12]. Because most of GSEA methods employ two sample comparison method such as unpaired or paired t test [13], we set out to develop a statistical testing method for GSEA based on regression analysis for each individual gene. As described earlier, out of a total of 28,720 genes,  Using this function, we can test whether x is significantly larger than what would be expected from a hypergeometric distribution. Statistically, if we observe r genes in the significant set, P(X > r) designates the P value for the significance of r genes. Thus, P value = P (X > r) = p(r+1) + p(r+2) + p(r+3) + ... +p(x). This gene set enrichment analysis found 3 significant pathways, NOD-like receptor signaling pathway, base excision repair, and steroid biosynthesis, with p values 0.0486, 0.0385, 0.0009, respectively. NOD-like receptors play an important role in regulation of inflammatory and apoptotic responses and are activated to stress response [14]. DNA repair is a mode of action known to response to heavy metal toxicity. Steroid biosynthesis is a metabolic process that is required for recovery from an acute toxicity attack.
Gene expression variation in response to long term (13 day) cadmium toxicity HPT cells were exposed to Cd 2+ for a long period in order to determine the mechanism by which global gene expression adapted to environmental challenge. HPT cells were exposed to various concentrations of Cd 2+ , 4.5, 9 and 27 μM Cd +2 for 13 days. Lower concentration of Cd 2+ were used for the 13 day time course since 45 μM Cd 2+ caused toxicity and a loss of cell viability. There was no loss of cell viability when the cells were treated with 4.5 and 9 μM Cd 2+ over a 13 day period, while exposure to 27 μM Cd +2 caused an approximate 30% loss in cell viability [10]. The ability of the HPT cells to form domes was only affected by exposure to 27 μM Cd +2 for 13 days. The microarray data were processed and analyzed as described above for 1 day exposure. The distribution of P values for each individual gene is shown in Figure 1b. The distribution is significantly skewed to small P values, implying a large number of genes were altered by cadmium exposure. From a total of 28,720 genes, 366 were positively correlated with the cadmium concentration, and 517 were negatively correlated at a P value cutoff of 0.01 (Table 3 and 4). Similar to that observed for 1 day exposure, long term cadmium exposure altered the expression of a number of cell cycle control and cell death genes, including GSPT1, G2E3 and BCL2. The expression of metal binding and transport genes, such as CAB39L, TMC5, SLC5A3 and SLC35F2, were affected as well. However, expression levels of stress response genes such as heat shock and DNA repair genes were not changed after 13 days exposure, instead, many tumor-related genes were activated. These included RAB42, DEK, RRAD, TPD52L1 and a variety of mitogen related kinases. The GSEA analysis revealed 9 altered KEGG pathways including 3 cancer pathways, prostate cancer, melanoma and p53 signaling pathway ( Table 5). None of the pathways overlapped with the ones identified after 24 hours exposure.

The common expression patterns between 1 day and 13 day of cadmium exposure
In order to investigate genes that showed expression variation throughout both short and long term cadmium exposure, we conducted analysis for the pooled data of the two time points. Figure 2 shows a heat map for the 8 samples for 1 day and 13 day exposure. The data were normalized to a grand mean of 0 and only 901 genes with standard deviation > 1 were used for this analysis. Hierarchical clustering method that was based on Pearson dissimilarity and Ward linkage showed as expected that the HPT cells treated with high doses of Cd 2+ tended to be grouped together. The cells exposed to 27 and 45 μM Cd 2+ for both 1 day and 13 days formed a group, and all cells exposed to 9 μM or lower Cd 2+ concentration formed the other major group. The change of dissimilarity and linkage methods did not alter the results. There were 17 genes that were positively correlated with cadmium concentration for both 1 day and 13 day exposure (Figure 3a). Given that the number of positively correlated genes was 403 and 366 for 1 and 13 day respectively, the random number of overlapping genes can be modeled as a hypergeometric distribution. The probability of observing more than 17 overlapping genes from the random sets of 403 and 366 genes is 5.89E-06. Thus, the two exposure time periods had a significant number of common genes that were positively correlated with cadmium concentration. Similarly, we found 26 genes that were negative correlated with cadmium concentration for both time periods ( Figure  3b). This number of overlapping genes was statistically significant when compared to selecting 26 gene by chance (P value = 1.32E-06). Therefore, we conclude that the HPT cells have a common response component in global gene expression variation during short term and long term cadmium exposure.

Gene regulatory network in response to cadmium exposure
In response to heavy metal toxicity, thousands of genes will be activated forming a complicated gene regulatory network interacting cooperatively. To decode the complete gene network for cadmium response is impossible for the small sample size used in this study and high complexity of network structure [15]. It is also computational forbidden to construct a network by including a large number of genes. Therefore, we modeled the gene network focusing on only the 901 identified genes as having high variability shown in the heat map. Mutual information for transcriptional levels was used to quantify the relations between genes. The resulting adjacent matrix was analyzed for the presence of an active network module that has high connectivity within the module and low connectivity with other components of the network. Figure 4 shows a network modules consisting of 8 genes. DNAJA4 encodes the heat shock protein, SLC7A11 an anion amino acid transporter, AKR1B10 and AKR1C1 are subunits for a reductase associated with cancer, ASPM and KIF20A are involved in mitotic spindle functions, ZFAND2A is a zincfinger protein that is inducible by arsenite, and INHBA is known to inhibit cell proliferation and to have tumor-suppressor activity. The network modules provide hypotheses for gene interactions which can be verified in subsequent biological experiments.

Discussion
Microarray experiments have become a standard approach to simultaneously quantify the transcriptional levels of thousands genes under various conditions. However, due to the high cost of chips, microarray experiments are often carried out with a limited sample size, which requires a careful experimental design. In this study, we chose two time points, 1 day and 13 day, to account for toxicity caused by short and long cadmium exposure. For each time point, we had HPT cells exposed to 3 cadmium concentrations plus a control sample (non-treated). Thus, even with only a single replicate at each concentration, we were able to test the correlation between transcriptional levels and cadmium concentrations. Our analysis suggests that experimental design with various conditions overcomes the difficulty of lack of replications.
Exposure of HPT cells to 1 day and 13 days of cadmium induces distinct sets of genes. At 1 day, HPT cells appeared to have a stress response that activated a number of heat shock and DNA repair genes and pathways.  This was expected because stress responses that are characterized with induced expression of heat shock proteins are acute and transient that last for only a few hours to a few days [16][17][18][19][20]. This acute response to Cd 2 + with the induction of heat shock proteins, such as HSP 27, 60, 70 and 90, have been repeatedly verified in our previous studies [21][22][23][24]. The 13 days exposure was used to model the chronic response of kidney to environment of cadmium exposure. The molecular changes in the HPT cells in response to long term cadmium exposure were manifested with activation of many oncogenes and cancer pathways. The association of cancer occurrence and cadmium exposure has been established by numerous cohort studies [25][26][27]. Although 13 days of exposure might not exactly represent the chronic effect of cadmium on the kidney living environment, our experiment suggests that the cell culture system serves as a valid model for studying the long term toxicity of cadmium on the kidney.
Although the two time periods of cadmium exposure appeared to induce two distinct toxicity responses, the acute and chronic responses, they shared some common features in gene expression. The heat map showed that the samples were not separated by time of exposure, instead, the samples tended to be grouped by doses across two time points. All the HPT samples that were treated with 27 μM Cd 2+ for both 1 day and 13 days formed a cluster. When looking into the genes that were correlated with cadmium concentrations in both 1 day and 13 days, we found that the number of overlapping genes was significantly larger than what was expected from random selection. The set of overlapping genes included ion transporter proteins, kinases, and transcriptional factors. These genes should be further investigated to reveal their functions in toxicity responses.
For any biophysical process, thousands of genes interact to form a network to maintain their biological functions. There are commonly two statistical approaches to study gene interaction, gene set enrichment analysis (GSEA) and networks analysis. GSEA has been successfully applied to microarray data to identify the activated pathways. However, GSEA has been so far focused on comparing different treatment groups, mostly for two groups, treatment and control [12,13,28,29]. For this study, we developed a GSEA method that can be used for ranks of P values from any statistical tests. We have showed that our proposed GSEA statistic follows a hypergeometric distribution. Thus, the P value of the GSEA is available.
Gene networks analysis is computationally challenging. Networks are often highly complicated, consisting of a large number of genes that can be fairly dynamic in nature. Decoding such a gene network requires researchers to quantify the global gene expression under dynamic and variable conditions. The current study provides a unique opportunity in that microarray data was acquired at multiple time points and at various levels of Cd 2+ exposure. We used mutual information to account for the relations between genes. Mutual information is able to measure gene dependency without assuming data distributions and linear relationship. However, the mutual information-based network models require huge amount of computation that is not feasible for modeling all the genes. For this reason, we focused on only a set of most    variable genes across all samples. The network analysis revealed gene network modules consisting of a small group of genes that intensively interacted with each other. The network modules can be verified in molecular experiments and they serve as hypothesized models for functional relations of genes when cooperating for toxicology responses induced by cadmium.