BMC Medical Genomics: Microarray analysis in B cells among siblings with/without MS - role for transcription factor TCF2

Background: We investigated if global gene expression and transcription networks in B-lymphocytes of siblings with multiple sclerosis (MS) were different from healthy siblings.


Background
MS is a complex genetic disease associated with inflammation predominantly in the white matter of brain and spinal cord. It is thought to be mediated by autoreactive T cells [1,2]. Susceptibility to MS is determined by both inherited and non-inherited factors [3]. Approximately 15-20% of MS patients have a family history of MS, but large extended pedigrees are uncommon. Studies in twins [4,5] and conjugal pairs [6] show that much of the familial clustering is the result of shared genetic risk factors.
MS susceptibility is linked to HLA-DR2 [7]. Increased risk of MS in women has been detected with interleukin-1 receptor antagonist (IL-1RA) allele 2 [8], 5G5G genotype of plasminogen activator inhibitor 1 (PAI-1) gene [9] and interaction between estrogen receptor 1 (ESR1) and HLA-DR2 [10]. B cells are implicated in MS and have been found in the cerebrospinal fluid (CSF) of MS patients [11]. Additionally, oligoclonal bands identified in CSF point to the role of B cells in MS pathogenesis [12]. Furthermore, antibody-secreting B cells contribute to tissue injury [13]. We hypothesized that B cells derived from MS patients could harbor genes that confer a higher MS risk as compared to B cell gene expression in healthy siblings.
Human B cells have a receptor for Epstein-Barr virus (EBV) and can become immortalized after in vitro infection with EBV. Additionally, the evidence linking EBV to MS is strong [14] though inconclusive. We therefore hypothesized that comparing gene expression and transcription networks in EBV-transformed B cells from siblings with and without MS could yield important clues to understanding the pathology of MS.
Large-scale analyses of transcripts from peripheral blood cells or brain lesions from MS patients have created possibilities for therapeutics [15] and global gene expression analysis using microarrays is a sensitive method to investigate molecular heterogeneity [16]. In this study, we tested a new software tool that is in development, to map transcription networks in the microarray data.
Our objectives were i) to determine whether gene expression and transcription networks in EBV-transformed B-lymphocytes of siblings with MS differed from those of healthy siblings and ii) to validate the data using qPCR techniques.

Methods
EBV-transformed B cell lines were obtained from the Coriell Institute for Medical Research (Camden, NJ, USA) and the National Institute of General Medical Sciences (NIGMS, Bethesda, MD). As shown in Table 1, B cells were harvested from one family (# 2108; proband and affected sister) together with an unaffected brother (control), and from a second family (# 2112; proband and affected brother) together with three unaffected brothers (controls). Cells from the third family (# 2102) comprised a proband and an affected sister but no unaffected controls. None of the patients were on immunomodulatory agents (IMAs) to treat MS.
Cells were harvested in 4 ml of Tri-Reagent (Molecular Research Center, Inc) and the total RNA was isolated using standard protocols. Briefly, B cells were incubated at 37°C and 5% CO2 overnight, and the cells were counted the following day. Cells were expanded in RPMI 1640 with 2 mM L-glutamine and 15% fetal bovine serum following Coriell's lymphoblast line maintenance protocols. Two vials were frozen and 1 × 10^7 cells were spun down and resuspended in 2 ml of Tri-Reagent. The cell mixture was frozen at -80°C until further analysis. The RNA was isolated using the standard Tri-Reagent protocol and further cleaned up using RNeasy kits (Qiagen) with a protocol that included an on-column DNase step. RNA samples were checked for quality on a Bioanalyzer and a NanoDrop spectrophotometer. Samples with a 28S/18S ratio > 1.8 were deemed acceptable for the microarray experiment. The total RNA (1 µg) was first converted to double-stranded cDNA using the standard Codelink protocol. This was then purified on Qiaquick PCR columns and in vitro transcribed to labeled cRNA using biotin-11-UTP (Perkin Elmer). The labeled cRNA was purified on Qiagen RNeasy Mini columns and 10 µg was fragmented as per the Codelink protocol, following which it was hybridized to the Codelink Human Whole Genome bioarrays for 18 h at 37°C. The arrays were subsequently washed, stained with Cy5-Streptavidin (GE Amersham) and scanned on an Axon Genepix 4000B scanner. The data were collected using Codelink Expression Analysis software and analyzed using Genespring software version 7.3.1 (Agilent). All microarray data are deposited at Gene Expression Omnibus as GSE10064 and linked via [17].

Microarray data analysis
Values below 0.01 were set to 0.01. Each measurement was divided by the 50th percentile of all measurements in that sample. All samples were then normalized to the median of the control samples: each measurement for each gene was divided by the median of that gene's measurements in the corresponding control samples. The gene list (54,902 transcripts) was filtered using the cross-gene error model for replicates to select genes with good signal-to-noise ratios (31,168 genes). This gene list was subjected to a Student's t-test (p-value < 0.05), which yielded 1,417 genes with significant differences between the two conditions. This list was further filtered for confidence using a Benjamini-Hochberg false discovery rate correction to obtain 1,405 genes. A 1.5-fold cut-off filter was then applied to identify genes that were preferentially upregulated (260) or downregulated (452) in samples with disease as compared to controls. These 712 genes were further filtered to select only those that were consistently "absent" or "present" in 3 of 4 controls and 5 of 6 diseased samples. The final gene list had 705 genes (259 upregulated and 446 downregulated), while a 2-fold cut-off resulted in 84 upregulated and 111 downregulated genes (Tables 2 & 3).
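The normalization and filtering steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the GeneSpring implementation: the function names and data layout are hypothetical, the t-test itself is omitted (p-values are taken as given), and the cross-gene error model and the absent/present filter are not reproduced.

```python
from statistics import median

FLOOR = 0.01  # values below 0.01 are set to 0.01, as stated in the text


def normalize(samples, control_ids):
    """samples: dict sample_id -> dict gene -> raw intensity (hypothetical layout)."""
    # Per-chip step: floor the values, then divide by the sample's 50th percentile.
    per_chip = {}
    for sample_id, genes in samples.items():
        floored = {g: max(v, FLOOR) for g, v in genes.items()}
        chip_median = median(floored.values())
        per_chip[sample_id] = {g: v / chip_median for g, v in floored.items()}
    # Per-gene step: divide by the median of that gene across the control samples.
    gene_names = list(next(iter(per_chip.values())))
    control_median = {
        g: median(per_chip[c][g] for c in control_ids) for g in gene_names
    }
    return {
        sample_id: {g: v / control_median[g] for g, v in genes.items()}
        for sample_id, genes in per_chip.items()
    }


def benjamini_hochberg(pvalues, alpha=0.05):
    """Return the indices of tests kept at false discovery rate alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        # Keep the largest rank whose p-value is below the step-up threshold.
        if pvalues[i] <= rank * alpha / m:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])


def fold_change_call(ms_value, control_value, cutoff=1.5):
    """Classify a gene as up, down, or unchanged at the given fold cut-off."""
    ratio = ms_value / control_value
    if ratio >= cutoff:
        return "up"
    if ratio <= 1 / cutoff:
        return "down"
    return "unchanged"
```

Calling `fold_change_call` with `cutoff=2.0` instead of the default corresponds to the stricter 2-fold list mentioned in the text.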

Hierarchical clustering
This gene list was subjected to a hierarchical clustering routine in GeneSpring using a Pearson correlation similarity measure across conditions and a standard correlation similarity measure across genes. The final tree is represented in Figure 1.
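As an illustration of clustering samples by correlation, the following sketch groups expression profiles using a distance of 1 minus the Pearson correlation. It is not the GeneSpring routine: average linkage is an assumption (the linkage rule is not stated in the text), the sample names and values are invented, and only the Pearson measure is shown.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def cluster(profiles):
    """Agglomerative clustering; returns the list of merges in order.

    profiles: dict name -> list of expression values (hypothetical layout).
    Distance between clusters is the average of (1 - r) over all member
    pairs (average linkage, an assumption made for this sketch).
    """
    clusters = [(name,) for name in profiles]

    def distance(c1, c2):
        pairs = [(a, b) for a in c1 for b in c2]
        total = sum(1 - pearson(profiles[a], profiles[b]) for a, b in pairs)
        return total / len(pairs)

    merges = []
    while len(clusters) > 1:
        # Find the closest pair of clusters and merge them.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: distance(clusters[ij[0]], clusters[ij[1]]),
        )
        merges.append((clusters[i], clusters[j]))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges
```

With profiles in which diseased samples rise together and control samples fall together, the first merges pair samples within each group before the two groups are joined, which is the pattern a tree such as Figure 1 is meant to reveal.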

Pathway analysis
The 1.5-fold list was further analyzed using Onto-Express and Pathway-Express, part of the Onto-Tools package available from the Draghici laboratory. This software identifies pathways that show enrichment in the microarray data; it is a novel tool that translates lists of differentially regulated genes into functional profiles characterizing the impact of the condition studied.
The expression levels of these 11 genes were evaluated by two-step Taqman-based RT-qPCR. First strand cDNA was synthesized with M-MLV reverse transcriptase (Ambion, Inc., Austin, TX) and primed with oligo-dT per the manufacturer's specifications. The cDNA equivalent of 20 ng starting RNA was then included in reactions containing 1× Taqman Master Mix (Applied Biosystems, Foster City, CA), and 1× gene-specific assay reagents as recommended by the manufacturer (Applied Biosystems). All reactions were run in triplicate. Reactions that did not contain template cDNA were included as negative controls.
Reaction plates were processed on an Applied Biosystems 7900 HT Sequence Detection System. The AmpliTaq Gold polymerase was activated at 95°C for 10 min followed by  [18]. GAPDH expression was used as an endogenous control to normalize expression within each sample.
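The normalization to an endogenous control can be illustrated with the widely used comparative Ct calculation, in which relative expression is 2 raised to the power of minus ∆∆Ct. The sketch below assumes this formula and perfect doubling per cycle; the function name and the Ct values in the usage note are hypothetical and are not taken from the study.

```python
from statistics import mean


def relative_expression(target_ms, reference_ms, target_ctl, reference_ctl):
    """Relative expression of a target gene in MS vs. control samples.

    Each argument is a list of replicate Ct values (e.g. triplicates).
    The reference gene plays the role of the endogenous control (GAPDH
    in the text). Assumes 100% amplification efficiency.
    """
    # Delta Ct: target minus endogenous control, within each condition.
    delta_ms = mean(target_ms) - mean(reference_ms)
    delta_ctl = mean(target_ctl) - mean(reference_ctl)
    # Delta-delta Ct: MS relative to control.
    delta_delta = delta_ms - delta_ctl
    return 2 ** (-delta_delta)
```

For example, a target that reaches threshold two cycles earlier in MS samples than in controls, with an unchanged reference gene, gives a relative expression of about 4, i.e. roughly 4-fold up regulation.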

Transcription network analysis
We employed a software program that one of the authors (G.A.) continues to develop to recast the matrix of regulatory interactions found in a prior step as a directed graph, overlaying color information for gene responses. Our regulatory network was constructed by integrating transcription factor binding site information (from TRANSFAC) with gene expression data using graph TFs (Bioconductor functions for transcription network analysis). Standard microarray experiments cannot measure transcription factor activities (TFAs) directly, since transcription factors depend on post-translational modifications or ligand binding in order to exert their function.
We hypothesized that decoding some of the important TFAs involved in B cells of MS patients might provide important clues to disease pathogenesis or uncover potential molecular targets for therapy. We started with an initial list of all genes changing at the 1.5-fold level. In this dual-sample study, we assumed all pairs of genes moving in the same direction to be concordant and all pairs moving in opposite directions to be discordant. Combining this with the potential for regulatory relationships inferred from a TRANSFAC binding site dataset, we were able to extrapolate the potential transcription regulatory networks driving expression changes. We found that TCF2 and CDC5L bind very similar promoter motifs and seem to control overlapping sets of up- and downregulated genes in this study. Most notable among the transcripts uncovered in this dataset are CCR1 and CXCR4, molecules that are associated with MS. Multiplying corresponding cells of the transcription factor binding site (TFBS) and correlation matrices yields a regulation matrix in which the sign of the interaction score reflects positive or negative regulation and the magnitude of the score reflects the amount of supporting evidence, as shown in the grid in Figure 2. We examined these networks for various thresholds of fold change and TRANSFAC evidence scores to obtain the picture of regulation of downstream molecules by TCF2 and CDC5L shown in Figure 3.
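The cell-by-cell multiplication that produces the regulation matrix can be sketched as follows. This is an illustrative reconstruction, not the authors' software: the function names and the evidence scores are hypothetical, and the concordance of a pair is taken to be the product of the two expression directions (+1 for up, -1 for down), following the concordant/discordant assumption described above.

```python
def regulation_matrix(tfbs_evidence, direction):
    """Combine binding-site evidence with expression concordance.

    tfbs_evidence: dict tf -> dict gene -> non-negative evidence score
                   (hypothetical stand-in for TRANSFAC-derived scores).
    direction:     dict name -> +1 (upregulated) or -1 (downregulated).
    Returns dict tf -> dict gene -> signed interaction score.
    """
    regulation = {}
    for tf, targets in tfbs_evidence.items():
        regulation[tf] = {}
        for gene, evidence in targets.items():
            # +1 if TF and gene move in the same direction, -1 otherwise.
            concordance = direction[tf] * direction[gene]
            regulation[tf][gene] = evidence * concordance
    return regulation


def edges(regulation, min_evidence=1):
    """List (tf, gene, score) edges whose magnitude meets the threshold."""
    return sorted(
        (tf, gene, score)
        for tf, targets in regulation.items()
        for gene, score in targets.items()
        if abs(score) >= min_evidence
    )
```

Raising `min_evidence` prunes weakly supported edges, which mirrors examining the network at different evidence thresholds.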
Figure 1. Hierarchical clustering of all statistically significant genes that show differential expression between diseased and normal samples.

Results
As shown in Figure 1, the rows represent hierarchical clustering of all differentially expressed genes between the diseased (MS) and control (normal siblings) samples. The MS samples were 8923, 8922A, 8830, 8839, 9013 and 9016. The unaffected siblings were 9017, 9018, 9023 and 8921A. The Onto-Express analyses depicted in Figure 4 show that transcription factor binding is a highly significant component in our microarray data. Results from the Pathway-Express analysis are listed in Table 4 and show that immune function and neurospecific pathways are also highly implicated in this data set.
We selected a subset of genes that were up and down regulated in the MS samples for further validation by RT-qPCR. Our results generally confirmed the results obtained with the Codelink arrays. TCF2, CXCL10, and FUT4 were all up regulated in the MS samples whereas CDC5L, TNFRSF19 and HLA-DR were down regulated (Table 5).
For three other genes, we obtained disparate results for expression when the control and MS samples were pooled for analysis. CD83 was up regulated by RT-qPCR while down regulated by array analysis, SERPINB9 was down regulated by RT-qPCR while up regulated on the arrays, and CD9 was only minimally up regulated by RT-qPCR while clearly up regulated on the arrays (Table 6). For each of these genes, some samples were negative for expression by RT-qPCR (control sample 8921A for SERPINB9 and CD9, MS sample 8922A for CD83), and the influence of the negative values is not fully accounted for when pooling the samples prior to analysis by the ∆∆Ct method. Thus, for these genes we further evaluated the RT-qPCR data by analyzing the control and MS samples individually and then averaging, rather than pooling the samples prior to the ∆∆Ct analysis. After evaluating individual samples, we also found that some control samples were significant outliers for the CD83 and SERPINB9 genes. Control sample 8921A had a 26-fold increase in CD83 expression relative to the next highest control sample, and control sample 9018 had a 255-fold increase in SERPINB9 expression and a 156-fold increase in CD9 expression relative to the next highest samples for those genes (data not shown). To compensate for these outliers, we eliminated the highest expression samples from both the control and MS samples from the analysis and also included the negative samples. When analyzed in this way, CD83 expression in the MS samples was down regulated by 0.58-fold, SERPINB9 expression was up regulated by 2.47-fold, and CD9 expression was up regulated 3.5-fold relative to the control samples (Table 6), and these results are in concordance with array data for these genes.
Figure 2. Analysis of transcription networks from microarray data.
Figure 3. Transcription factor network deciphered from genes differentially expressed at a 1.5-fold cutoff in microarray data (panels A and B). The color "red" shows upregulation while "green" shows downregulated expression. Genes are depicted by circles while transcription factors are represented as rectangles. Green arrows indicate that the nodes at each end of the arrow are regulated in the same direction (up or down) while the red arrows connect the nodes that are anticorrelated.
Although we included a Taqman assay for LYPLA3 in our validation studies, we were not able to evaluate expression of this gene by RT-qPCR. None of the four control samples were consistently positive for expression of this gene by RT-qPCR, and only one of the six MS samples was consistently positive. The probe for this gene on the Codelink array is located near the 3' end of the coding sequence whereas the Taqman assay detects sequences at the 5' end of the coding sequence, at the junction between exons 1 and 2 (6 total exons). Thus, full-length cDNA would be required to detect LYPLA3 expression with the Taqman assay we employed. Our inability to consistently detect this gene suggests that we did not generate full-length LYPLA3 cDNA during the RT step, either due to mRNA instability or inefficient reverse transcription.
Figure 4. Results from functional profiling of data using Onto-Express.

Additional data
A list of genes at 1.5 fold differential expression from microarray data can be found in Additional file 1.

Discussion and Conclusion
The genes involved in our analyses encode proteins involved in apoptosis, cytokine pathways and inflammation. Although up- or downregulation of gene transcription is not reflected in a 1:1 translation of protein expression, the gene product generally follows gene regulation dynamics [19]. A growing number of expression profiling studies provide experimental evidence indicating the presence of a transcriptionally distinct gene pattern in MS. Much of our current knowledge of MS stems from the analysis of a mouse model, experimental autoimmune encephalomyelitis (EAE), that is thought to be similar to MS. While EAE has striking clinical and histopathological similarities to MS, it has failed in predicting the efficacy of new therapeutics [20]. Our discussion is largely restricted to those genes that had > 2.5-fold up-regulation and the two transcription factors that had the most effect on other genes. Among the total list of differentially expressed genes, the proportion of downregulated genes was higher than that of upregulated ones (262 vs. 81), an observation that suggests an interplay involving complex inflammatory cascades in MS.

Study limitations
The findings in our study are limited by a) the small sample size and b) the use of EBV-transformed cells: the cellular pathways that Epstein-Barr virus (EBV) regulates remain poorly characterized, and there are few data on how EBV may influence gene pathways in B cells derived from MS patients vs. controls. The ideal starting material would be native B cells from controls and MS patients; however, collection and processing bias are hard to eliminate in any microarray-based study.

Upregulated genes
The gene for CXCL10 (interferon-γ-inducible protein-10) encodes an interferon-γ-induced, secreted protein of 10 kDa, a chemokine of the CXC subfamily that is one of the ligands for the receptor CXCR3. The binding of this protein to CXCR3 causes pleiotropic effects, including stimulation of monocytes, natural killer and T-cell migration, and modulation of adhesion molecule expression (UCSC web browser). Its levels are increased in cerebrospinal fluid of MS patients with symptomatic attacks of demyelination, suggesting a role for this molecule in MS [21]. Following mouse hepatitis virus (MHV) infection in mice [22], CXCL10 protein was expressed during both acute and chronic stages of disease, suggesting a role for this protein in disease exacerbation.
Table footnote: RE, relative expression. +/-, calculated by determining relative expression for the value obtained by adding the ∆∆Ct and ∆Ct standard deviation (RE +/-), and then subtracting the RE +/- value from the RE.
Serpin B1 is a leukocyte elastase inhibitor and regulates the activity of the neutrophil proteases elastase, cathepsin G and proteinase-3. In humans, serpins constitute 10% of the plasma proteins and are known as critical regulators of both thrombotic and fibrinolytic systems. Serpins participate in the regulation of the complement cascade, angiogenesis, apoptosis and innate immunity. Most of the human clade B serpins inhibit serine and/or papain-like cysteine proteinases and protect cells from exogenous and endogenous proteinase-mediated injury. As some serpins also guard cells against the deleterious effects of promiscuous proteolytic activity [23], it is possible that the cytoprotective function is a common feature of intracellular serpin clades, and we hypothesize that up-regulation of the serpin B1 gene may be protective in MS.
While our findings need to be validated, serpin B1 gene may represent a novel therapeutic target to ameliorate MS, considering the importance of these molecules in regulating proteolytic cascades.
We found that FUT4 gene was upregulated by 2.5 fold. It is a member of the interleukin 1 cytokine family and the protein encoded by the gene is produced by activated macrophages as a pro-protein and processed to its active form by caspase 1 (CASP1/ICE). This cytokine is an important mediator of the inflammatory response, cell proliferation, differentiation, and apoptosis. The induction of cyclooxygenase-2 (PTGS2/COX2) by this cytokine in the central nervous system (CNS) is found to contribute to inflammatory pain hypersensitivity. FUT4 mRNA is increased in Jurkat cells undergoing apoptosis.
LYPLA3 (lysosomal phospholipase A2): Lysophospholipases are enzymes that act on biological membranes to regulate the multifunctional lysophospholipids. The protein encoded by this gene hydrolyzes lysophosphatidylcholine to glycerophosphorylcholine and a free fatty acid. LYPLA3 is present in the plasma and thought to be associated with high-density lipoprotein. Cellular phospholipases are key participants in cellular transduction [24] and are thought to be involved in the pathogenesis of local and systemic inflammatory disorders [25]. Interestingly, cytosolic phospholipase A2 levels were found to be low, possibly reflecting proteolysis or inactivation of enzyme activity in brain MS lesions [26].
The gene for XCL1 was upregulated; it encodes for chemokines, a group of small (8-14 kD) molecules that regulate cell trafficking of leukocytes through interactions with a subset of 7-transmembrane, G protein-coupled receptors.
Chemokines also play fundamental roles in the development, homeostasis, and function of the immune system, and they have effects on cells of the central nervous system as well as on endothelial cells involved in angiogenesis or angiostasis. The protein product of XCL1 has chemotactic activity for neutrophils, may play a role in inflammation, and exerts its effects on endothelial cells in an autocrine fashion. We also found PSTPIP1 (proline-serine-threonine phosphatase interacting protein 1) upregulated; its role in MS is unclear.

Downregulated genes
An important finding in our study is an 11.7-fold downregulation of the eukaryotic translation initiation factor 1A, Y-linked (EIF1AY) gene. It is thought that EIF1AY encodes a minor histocompatibility antigen (mHA), and the B-cell mediated antibody response to Y-chromosome encoded histocompatibility antigens (H-Y antigens) is associated with maintenance of disease remission in graft-vs-host disease [27]. [30].
The downregulation of the insulin receptor tyrosine kinase substrate (BAIAP2L1) gene in B cells of MS siblings is significant. The insulin-like growth factor 1 receptor (IGF-1R) is a member of the receptor tyrosine kinase family of growth factor receptors. In rheumatoid arthritis (RA) it has been shown that IgG antibodies from the sera of patients can stimulate synovial fibroblasts through interaction with IGF-1R, provoking trafficking of T cells [31]. Those data demonstrated, for the first time, a bridging link between B-cell activity and T-cell trafficking. In addition, they are of potential importance for the development of innovative therapeutic strategies, in which interrupting the IGF-1/IGF-1R axis could result in sustained disease modification by affecting both the growth factor-triggered activation of fibroblasts and the accumulation of T lymphocytes. A similar link in MS has not been demonstrated but is plausible.
Other genes downregulated include ADAMTS16, BID, MIF and DAPK2, involved in the apoptosis pathway.

TCF2, diabetes and MS
Our transcription and microarray analysis showed that TCF2 was upregulated. TCF2 encodes transcription factor 2, a liver-specific factor of the homeobox-containing basic helix-turn-helix family. The TCF2 protein forms heterodimers with another liver-specific member of this transcription factor family, TCF1; depending on the TCF2 isoform, activation or inhibition of transcription of target genes occurs. We found that TCF2 upregulates the CC chemokine receptor 1 (CCR1), a molecule expressed on macrophages and T cells. In MS plaques, numerous CCR1-positive infiltrating macrophages and microglial cells, associated with CCL3, were found [2], and CCR1 has been associated with newly infiltrating monocytes in MS lesions [32]. CCR1-deficient mice show an attenuated EAE course [33]. Additionally, CXCR1, a pro-inflammatory molecule, is upregulated by TCF2. Furthermore, TCF2 downregulated CD28, a co-stimulatory molecule involved in T-cell stimulation, and IL6R, a pro-inflammatory molecule. Taken together, it is plausible that TCF2 plays a central role in the pathogenesis of MS.
Interestingly, a mutation in TCF2 has been linked to the etiology of MODY5 (maturity-onset diabetes of the young, type 5), and in a recent Danish nationwide cohort study, intra-individual and, to a lesser degree, intra-familial co-occurrence was evident in MS and type 1 diabetes. The underlying mechanisms may involve both genetic and environmental factors [34]. Individuals with type 1 diabetes are more than three times more likely to develop MS than controls; in addition, the two diseases appear to be linked, albeit to a weak extent, within families. The exact mechanism by which mutations in these transcription factors cause diabetes remains unknown. To reiterate, a link between TCF2 mutation and MODY5 has been established, while type 1 diabetics have an increased risk of MS development. Our study points to TCF2 having a prominent role in regulation of other transcription factors. In summary, the role of TCF2 in MS needs to be further explored since MODY5 and TCF2 are linked and type 1 diabetes confers increased risk of MS development.
Another transcription factor that we found in our transcription analysis, CDC5L, was downregulated. It is a well-characterized pre-mRNA splicing factor and is involved in cell cycle kinetics in the yeast S. pombe; its role in MS, if any, is not characterized.