Identification of protein-coding and non-coding RNA expression profiles in CD34+ and in stromal cells in refractory anemia with ringed sideroblasts

Background: Myelodysplastic syndromes (MDS) are a group of clonal hematological disorders characterized by ineffective hematopoiesis with morphological evidence of marrow cell dysplasia resulting in peripheral blood cytopenia. Microarray technology has permitted a refined high-throughput mapping of the transcriptional activity in the human genome. Non-coding RNAs (ncRNAs) transcribed from intronic regions of genes are involved in a number of processes related to post-transcriptional control of gene expression, and in the regulation of exon-skipping and intron retention. Characterization of ncRNAs in progenitor cells and stromal cells of MDS patients could be strategic for understanding gene expression regulation in this disease.
Methods: In this study, gene expression profiles of CD34+ cells of 4 patients with MDS of refractory anemia with ringed sideroblasts (RARS) subgroup and stromal cells of 3 patients with MDS-RARS were compared with healthy individuals using 44 k combined intron-exon oligoarrays, which included probes for exons of protein-coding genes, and for non-coding RNAs transcribed from intronic regions in either the sense or antisense strands. Real-time RT-PCR was performed to confirm the expression levels of selected transcripts.
Results: In CD34+ cells of MDS-RARS patients, 216 genes were significantly differentially expressed (q-value ≤ 0.01) in comparison to healthy individuals, of which 65 (30%) were non-coding transcripts. In stromal cells of MDS-RARS, 12 genes were significantly differentially expressed (q-value ≤ 0.05) in comparison to healthy individuals, of which 3 (25%) were non-coding transcripts.
Conclusions: These results demonstrated, for the first time, the differential ncRNA expression profile between MDS-RARS and healthy individuals, in CD34+ cells and stromal cells, suggesting that ncRNAs may play an important role during the development of myelodysplastic syndromes.


Background
Myelodysplastic syndromes (MDS) are a heterogeneous group of clonal hematological disorders characterized by ineffective hematopoiesis with morphological evidence of marrow cell dysplasia resulting in peripheral blood cytopenia [1,2]. Low-risk MDS are characterized by profound anemia and transfusion dependency, and a relatively low risk of progression to acute myeloid leukemia. Refractory anemia with ringed sideroblasts (RARS) is a subtype of low-risk MDS in which an excess of iron accumulates in the perinuclear mitochondria of ringed sideroblasts in the form of mitochondrial ferritin (MtF) [3][4][5]. However, the molecular genetic basis of RARS remains unknown.
Gene expression profiling of hematopoietic progenitor cells of MDS patients has demonstrated the involvement of genes related to differentiation and proliferation of progenitor cells [6][7][8][9]. Furthermore, there is increasing evidence that, in certain hematological disorders, the marrow microenvironment is abnormal, both in composition and function [10]. In MDS, the adherent layer of bone marrow stroma is defective in supporting normal myelopoiesis in vitro, presenting a poor maintenance of hematopoietic stem cells [11]. Alterations of stromal components may be implicated in the modification of the development and apoptosis of hematopoietic cells [12,13].
The ENCODE project identified and characterized the transcriptionally active regions in 1% of the human genome, and reported that the majority (63%) of transcripts were long non-coding RNAs (ncRNAs). These transcripts resided outside GENCODE annotations, both in intronic (40.9%) and intergenic (22.6%) regions [14]. Non-coding RNAs are known to be involved in different biological processes such as cell survival and regulation of cell-cycle progression [15], transcriptional or post-transcriptional control of gene expression [16,17], genomic imprinting [18] and biogenesis of mature RNAs through changes in the intron-exon structure of host genes [19][20][21]. In addition, introns have been shown to be sources of short ncRNAs such as microRNAs [22] and small nucleolar RNAs [23].
Microarray technology has permitted a refined high-throughput mapping of the transcriptional activity in the human genome [24], and has revealed different expression signatures of long intronic ncRNAs in prostate, liver and kidney [25]. In addition, sets of long intronic ncRNAs were found to be responsive to physiological stimuli such as retinoic acid [26] or androgen hormone [27]. Long ncRNAs change the interpretation of the functional basis of many diseases [28,29] such as α-thalassemia [30], Prader-Willi syndrome [31], and cancer [32][33][34][35]. Recently, a myelopoiesis-associated regulatory intergenic non-coding RNA transcript has been described [36].
In the present study, expression profiles of CD34+ and stromal cells of MDS-RARS patients and healthy individuals were characterized with a 44 k combined intron-exon oligoarray platform, allowing the identification of protein-coding and intronic ncRNA expression signatures in MDS-RARS patients.

Patients
Bone marrow (BM) samples were collected from 4 healthy subjects and 7 MDS patients seen at the Hematology and Hemotherapy Center, University of Campinas. All patients were diagnosed with RARS according to the French-American-British (FAB) classification and did not present chromosomal abnormalities; they received no growth factors or any further MDS treatment. Patients' characteristics are shown in Table 1. All patients and healthy subjects provided informed written consent, and the National Ethical Committee Board of the School of Medical Science, University of Campinas, approved the study.

CD34+ cell and stromal cell selection
BM mononuclear cells were isolated by density-gradient centrifugation through Ficoll-Paque Plus (GE Healthcare, Uppsala, Sweden), labeled with CD34 MicroBeads, and CD34+ cells were isolated using MACS magnetic cell separation columns (Miltenyi Biotec, Mönchengladbach, Germany) according to the manufacturer's instructions. The purity of CD34+ cells was at least 92% as determined by fluorescence-activated cell sorting (FACS), using anti-CD34 antibody (Caltag Laboratories, Burlingame, CA). The mononuclear cells depleted of CD34+ cells were plated in Iscove's Modified Dulbecco's Medium (IMDM) (Sigma, St Louis, MO, USA) supplemented with 10% fetal bovine serum and 10% horse serum. Supernatant with non-adherent cells was removed weekly and replaced with fresh medium. When the monolayer was established (90% confluence), cells were trypsinized and plated under the same conditions. After three re-platings, a homogeneous cell population was obtained and the stromal cells were evaluated by FACS for the absence of CD34, CD45 and CD68 antigens.

RNA extraction
Total RNA was extracted with RNAspin Mini RNA Isolation Kit (GE Healthcare, Freiburg, Germany). The integrity of RNA was evaluated using Agilent 2100 Electrophoresis Bioanalyzer (Agilent Technologies, Santa Clara, CA).

Microarray experiments and data analysis
Gene expression measurements were performed using 44 k intron-exon oligoarrays that were custom-designed by the group of Verjovski-Almeida and collaborators [25] following Agilent Technologies probe specifications [37] and were printed by Agilent. The array comprises 60-mer oligonucleotide probes for 24,448 long (>500 nt) ncRNAs mapping to 6,282 unique gene loci, with genomic coordinates of the Human Genome May 2004 Assembly (hg17). Non-coding RNAs probed on the array are transcribed either from intronic regions, in the sense or the antisense orientation with respect to a protein-coding gene [25], or from 1,124 intergenic regions [38]. The array also comprises 13,220 probes for the respective protein-coding genes [25]. Gene locus name annotation for intronic ncRNA is that of the protein-coding gene of the same locus; intergenic ncRNA is annotated with the name of the nearest protein-coding gene in that chromosome. All annotations were updated as of December 2009. The array design is deposited in the GEO platform under accession number GPL9193.
For each individual sample, 150 ng of total RNA was amplified and labeled with Cy3 or Cy5 using the Agilent Low RNA Input Fluorescent Linear Amplification Kit PLUS, Two-Color (Agilent Technologies) according to the manufacturer's recommendations. Labeled cRNA was hybridized using the Gene Expression Hybridization Kit (Agilent).
Slides were washed and processed according to the Agilent Two-Color Microarray-Based Gene Expression Analysis protocol (Version 5.5) and scanned on a GenePix 4000B scanner (Molecular Devices, Sunnyvale, CA, USA). Fluorescence intensities were extracted using Feature Extraction (FE) software (version 9.0; Agilent). A gene was considered expressed if probe intensity was significantly (p < 0.05) higher than the local background intensity, as calculated by the FE software. The software then applied local background subtraction and corrected unequal dye incorporation using the default LOWESS (locally weighted linear regression) method. Only genes that were detected as expressed in all samples were included in further statistical analyses.
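The detection filter described above can be sketched in a few lines. This is only an illustration under stated assumptions: the probe names and the boolean detection calls are hypothetical, and the function stands in for the per-probe "expressed" call that the text attributes to the Feature Extraction software.

```python
def expressed_in_all_samples(flags):
    """Return the IDs of genes flagged as expressed in every sample.

    `flags` maps a gene ID to a list of booleans, one per sample, where True
    means the probe was called significantly above local background.
    """
    return sorted(gene for gene, calls in flags.items() if calls and all(calls))


# Hypothetical detection calls for three probes across four samples.
example_flags = {
    "probe_A": [True, True, True, True],
    "probe_B": [True, False, True, True],  # missed in one sample, so dropped
    "probe_C": [True, True, True, True],
}

kept = expressed_in_all_samples(example_flags)
print(kept)
```

Requiring detection in all samples is a conservative choice: it avoids fold changes computed against background noise, at the cost of discarding transcripts that are switched on or off in only one group.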
Data were normalized among the samples by the quantile method [39] using Spotfire DecisionSite® for Microarray Analysis (TIBCO Software Inc, Somerville, MA, USA). Genes differentially expressed between MDS-RARS and healthy individuals were identified with the Significance Analysis of Microarrays (SAM) statistical approach [40] followed by a patient leave-one-out cross validation [41], which consisted of removing one sample and determining a new set of significantly altered genes using the remaining samples. This procedure was repeated for each sample, computing the statistical significance of each gene in the various leave-one-out datasets. For both CD34+ and stromal cell samples we used the following parameters: two-class unpaired responses, t-statistic, 500 permutations. We considered as significantly altered those genes that showed a minimum fold change of 1.7 and a maximum false discovery rate (FDR) of 1% for CD34+ cells or 5% for stromal cells in all leave-one-out datasets. Less stringent parameters (5% FDR for CD34+ cells or 15% for stromal cells) were used for generating a list of altered genes that was uploaded to Ingenuity Pathways Analysis (IPA) software (Ingenuity® Systems, http://www.ingenuity.com) for identification of relevant altered gene networks. The software assigns statistical scores, taking into account the user's set of significant genes, network size, and the total number of molecules in the Ingenuity Knowledge Base. The network score is the negative logarithm of the p-value, which reflects the probability of finding the focus molecules in a given network by random chance. The identified network is then presented as a graph, indicating the molecular relationships between gene products. The raw data have been deposited in the Gene Expression Omnibus (GEO) database (http://www.ncbi.nlm.nih.gov/geo/) under accession number GSE18911.
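The leave-one-out step can be illustrated with a minimal sketch. It is not the SAM procedure itself: the selection function below applies only the 1.7-fold criterion to mean log2 intensities and ignores the FDR estimate, it assumes that only patient samples are left out in turn, and all gene names and values are hypothetical.

```python
import math
from statistics import mean

MIN_FOLD_CHANGE = 1.7  # fold-change threshold stated in the text


def fold_change_hits(patients, controls, min_fold=MIN_FOLD_CHANGE):
    """Genes whose mean log2 expression differs by at least log2(min_fold).

    Each sample is a dict mapping gene ID -> log2 intensity. This is a
    stand-in for the SAM call: fold-change criterion only, no FDR.
    """
    threshold = math.log2(min_fold)
    hits = set()
    for gene in patients[0]:
        diff = mean(s[gene] for s in patients) - mean(s[gene] for s in controls)
        if abs(diff) >= threshold:
            hits.add(gene)
    return hits


def leave_one_out_signature(patients, controls, select=fold_change_hits):
    """Keep only genes selected in every leave-one-patient-out dataset."""
    signatures = []
    for i in range(len(patients)):
        reduced = patients[:i] + patients[i + 1:]  # drop patient i
        signatures.append(select(reduced, controls))
    return set.intersection(*signatures)


# Hypothetical log2 intensities: gene_X is consistently higher in patients,
# gene_Y is driven by a single outlier patient, gene_Z does not change.
patients = [
    {"gene_X": 9.0, "gene_Y": 12.0, "gene_Z": 7.0},
    {"gene_X": 9.2, "gene_Y": 7.0, "gene_Z": 7.1},
    {"gene_X": 8.9, "gene_Y": 7.1, "gene_Z": 6.9},
    {"gene_X": 9.1, "gene_Y": 6.9, "gene_Z": 7.0},
]
controls = [
    {"gene_X": 7.0, "gene_Y": 7.0, "gene_Z": 7.0},
    {"gene_X": 7.1, "gene_Y": 7.1, "gene_Z": 7.1},
    {"gene_X": 6.9, "gene_Y": 6.9, "gene_Z": 6.9},
    {"gene_X": 7.0, "gene_Y": 7.0, "gene_Z": 7.0},
]

signature = leave_one_out_signature(patients, controls)
print(sorted(signature))
```

In this toy data set the outlier-driven gene passes the threshold when all patients are pooled but disappears once the outlier patient is left out, which is the kind of individual variability the cross validation is meant to suppress.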
The Gene Ontology Annotation (GOA) database (http://www.ebi.ac.uk/GOA/) was used for annotating the biological processes of proteins encoded by transcripts that were statistically significantly altered and showed at least 1.7-fold change in expression levels. Intronic non-coding RNA transcripts were annotated according to the corresponding protein-coding genes transcribed from the same loci. Functional descriptions of the genes were obtained from the Online Mendelian Inheritance in Man (OMIM) database of the National Center for Biotechnology Information (NCBI) (http://www.ncbi.nlm.nih.gov/).

Quantitative real-time RT-PCR
Real-time RT-PCR was performed to confirm the microarray expression data for selected transcripts. Reverse transcription was performed using SuperScript™ III Reverse Transcriptase (Invitrogen Life Technologies). Primers (Table 2) were designed using PRIMER3 software (version 0.4.0) (http://frodo.wi.mit.edu/primer3/) with published sequence data from the NCBI database. For real-time PCR analysis, Power SYBR® Green PCR Master Mix (Applied Biosystems) was used according to the manufacturer's instructions, and the reactions were run in a 7500 Real-Time PCR System (Applied Biosystems) for 40 cycles. Each sample measurement was performed in triplicate, and a negative control ("No Template Control") was included for each primer pair. Expression of transcripts was normalized to the HPRT endogenous gene, and the relative expression was calculated as $2^{-\Delta\Delta C_T}$, where $\Delta\Delta C_T$ is the $C_T$ difference (transcript minus HPRT) for each patient minus the average $C_T$ difference of samples from healthy subjects ($\Delta\Delta C_T$ method) [42].
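As a worked illustration of the ΔΔCT calculation, the sketch below computes the relative expression of one patient sample. The CT values are hypothetical, and the function name and argument layout are assumptions made for this example only.

```python
from statistics import mean


def relative_expression(ct_target, ct_reference, control_delta_cts):
    """Relative expression by the 2^-ddCT method.

    ct_target, ct_reference: CT of the transcript of interest and of the
    endogenous control (HPRT in the text) in one patient sample.
    control_delta_cts: dCT values (target minus reference) of healthy samples.
    """
    delta_ct = ct_target - ct_reference
    delta_delta_ct = delta_ct - mean(control_delta_cts)
    return 2 ** (-delta_delta_ct)


# Hypothetical values: patient dCT = 25.0 - 22.0 = 3.0, healthy mean dCT = 5.0,
# so ddCT = -2.0 and the relative expression is 2^2 = 4.
patient_fold = relative_expression(25.0, 22.0, [5.0, 4.5, 5.5])
print(patient_fold)
```

A value above 1 indicates higher expression in the patient than in the healthy average, and a value below 1 indicates lower expression.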

Protein-coding and non-coding transcript expression profiles in CD34+ cells
CD34+ cells obtained from 4 patients with MDS-RARS (nos. 1-4; Table 1) were compared with CD34+ cells of healthy individuals using custom-designed combined intron-exon expression oligoarrays [25]. The oligoarray includes probes for protein-coding genes and for both sense and antisense strands of ncRNAs, as described in the Methods. In order to reduce the effect of individual variability, and thus favor the identification of a robust gene expression signature of MDS-RARS, Significance Analysis of Microarrays (SAM) [40] was combined with a patient leave-one-out cross validation [41]. A total of 216 significantly (q-value ≤ 0.01) differentially expressed transcripts between MDS-RARS patients and healthy individuals were identified (Figure 1), of which 129 were down-regulated and 87 up-regulated in MDS-RARS (Additional file 1). Interestingly, 65 differentially expressed transcripts were ncRNAs, 32 down-regulated and 33 up-regulated (Table 3). Differentially expressed protein-coding genes were related to cell adhesion, apoptosis, ion transport and regulation of transcription (Table 4). Six protein-coding genes, namely ABCB7, EBF1, IFI30, IL10RA, NR4A2 and VEGF, have been previously reported in the literature as differentially expressed in MDS-RARS [7][43][44][45]. In addition, we identified a number of protein-coding transcripts not previously described as altered in MDS-RARS.
Ingenuity Pathways Analysis (IPA) was used for identifying enriched gene networks and functions among the differentially transcribed protein-coding genes. We identified 11 relevant networks that were significantly enriched (p-value < 0.001) with genes belonging to the MDS-RARS gene expression signature in CD34+ cells (Additional file 2). Figure 2 shows a gene network involved in hematological system development and function, humoral immune response and tissue morphology. Eight transcripts were chosen to validate microarray data by real-time RT-PCR. Expression in RNA extracted from CD34+ cells of 5 MDS-RARS patients (nos. 2-6; Table 1) was calculated as fold change compared with healthy controls. All transcripts were confirmed by real-time RT-PCR (Figure 3).

Table 3: ncRNAs with significantly altered expression in CD34+ cells of MDS-RARS patients in relation to healthy individuals. Gene locus names follow the annotation convention described in the Methods; the significance reported is the minimum among all patient leave-one-out analyses.

Protein-coding and ncRNA transcript expression profiles in stromal cells
Stromal cells obtained from three MDS-RARS patients (nos. 4-6; Table 1) were compared with stromal cells of healthy individuals using the same custom-designed combined intron-exon expression oligoarrays and significance analysis. SAM combined with patient leave-one-out cross validation identified 12 significantly (q-value ≤ 0.05) differentially expressed genes (10 up-regulated and 2 down-regulated in stromal cells of MDS-RARS patients) (Figure 4; Additional file 3), of which 3 were ncRNAs (up-regulated in MDS-RARS patients) (Table 5). The low number of differentially expressed genes was mostly due to the high homogeneity of stromal cells from patients and donors (correlation coefficient between all donor and patient stromal samples = 0.93, compared with 0.9 for CD34+ cells, $p = 10^{-5}$). The signature expression profile of protein-coding transcripts in MDS-RARS stromal cells revealed genes related to several biological processes, such as cell motility, DNA replication, protein amino acid phosphorylation and protein transport (Table 4).
Ingenuity Pathways Analysis (IPA) of stromal cell genes statistically differentially expressed between MDS-RARS patients and healthy individuals (fold change ≥ 1.7; q-value ≤ 0.15 in all patient leave-one-out cross-validation analyses) identified two significantly enriched gene networks (p-value < 0.001) (Additional file 2). The most significantly enriched gene network involves genes related to cell morphology, cellular compromise, and neurological disease (Figure 5).
Four transcripts were chosen to validate microarray data by real-time RT-PCR. RNA extracted from stromal cells from 5 MDS-RARS patients (nos. 3-7; Table 1) was used for validation studies; all transcripts were confirmed by real-time RT-PCR (Figure 6).

Discussion
In this study, we used the 44 k intron-exon oligoarray and stringent statistical criteria to determine the protein-coding and intronic non-coding transcript expression profiles in CD34+ and stromal cells of MDS-RARS patients and healthy individuals. We validated the expression of a set of selected transcripts by real-time RT-PCR in five MDS-RARS patients; however, future confirmation in a larger group of MDS-RARS cases is warranted. Pathway analyses of differentially expressed protein-coding transcripts pointed to new genetic networks that are altered in both CD34+ and stromal cells of MDS-RARS patients (Additional file 2).

Figure 1 (legend): Each row represents a single gene probe (151 protein-coding and 65 ncRNAs) and each column represents a separate CD34+ donor sample. Donor samples were clustered according to the correlation of expression profiles using the Unweighted Pair-Group Method, which resulted in two homogeneous groups: MDS-RARS patients (4 columns at left) and healthy individuals (4 columns at right). Expression level of each gene is represented by the number of standard deviations above (red) or below (green) the average value for that gene across all samples. In MDS-RARS patients, a total of 87 genes were up-regulated and 129 down-regulated.

MDS are characterized by hematopoietic insufficiency associated with cytopenia, leading to severe morbidity in addition to an increased risk of leukemic transformation [46]. The exact stage of CD34+ progenitor cells involved in the process of MDS and transformation to AML is still under debate. The bone marrow microenvironment contributes to the regulation of self-renewal, commitment, differentiation, proliferation and the dynamics of apoptosis of hematopoietic progenitors [47], and CD34+ progenitor cells are known to be severely impacted in MDS by the composition of microenvironmental stimuli [48]. Detection of differentially expressed transcripts in MDS-RARS stromal cells suggests that these transcripts could contribute to the maintenance of CD34+ cells.
One of the mechanisms that contribute to hypercellular marrow and peripheral blood cytopenia of patients with early stage MDS is the significant increase in apoptosis of hematopoietic cells [51]. The higher expression of the Fas-FasL system found in MDS plays a role in inducing MDS bone marrow apoptosis and works in either an autocrine (hematopoietic cell-hematopoietic cell interaction) or a paracrine (hematopoietic cell-stromal cell interaction) pattern [52]. The protein encoded by SEMA3A (class 3 semaphorins), a secreted member of the semaphorin family that is involved in axonal guidance, organogenesis and angiogenesis and is highly expressed in several tumor cells [53,54], has recently been demonstrated to be an important determinant of the sensitivity of leukemic cells to the Fas-mediated apoptosis signal [55]. Furthermore, Sema3A has already been described to act through different signaling pathways, controlling neural progenitor cell repulsion by activating Erk1/2 or triggering an apoptosis process involving p38MAPK [56]. Surprisingly, SEMA3A is present in both affected networks of MDS-RARS stromal cells (see Additional file 2), suggesting participation of this gene in diverse abnormalities implicated in the modification of hematopoietic cell development and apoptosis in MDS [12,13].
The non-coding expression profiles of CD34+ and stromal cells of MDS-RARS were clearly distinct from those obtained from CD34+ and stromal cells of healthy controls, representing 30% and 25% of the total number of differentially expressed genes in CD34+ and stromal cells of MDS-RARS patients, respectively. Evidence of the biological roles played by ncRNAs has been increasing, especially for those transcribed from partially conserved introns of protein-coding genes [57]. Recently, eosinophil granule ontogeny (EGO) has been shown to involve an ncRNA expressed during IL-5 stimulation, whose function is to regulate MBP granule protein and EDN mRNA levels [58].

Figure 4 (legend): Each row represents a single gene (9 protein-coding genes with names in black, and 3 ncRNAs in blue) and each column represents a separate stromal donor sample. Donor samples were clustered according to the correlation of expression profiles using the Unweighted Pair-Group Method, which resulted in two homogeneous groups: MDS-RARS patients (3 columns at right) and healthy individuals (4 columns at left). Expression level of each gene is represented by the number of standard deviations above (red) or below (green) the average value for that gene across all samples. In MDS-RARS patients, a total of 10 genes were up-regulated and 2 down-regulated.

Interestingly, our results showed 13 differentially expressed ncRNA transcripts in CD34+ cells of MDS-RARS patients for which there was a simultaneous change in expression of the protein-coding gene in the corresponding locus: for 7 of them both the ncRNA and the protein-coding gene were down-regulated in MDS-RARS, for 5 both were up-regulated, and in one gene locus the totally intronic non-coding (TIN) RNA was up-regulated whereas the protein-coding gene was down-regulated. The concurrent change in expression of the protein-coding and non-coding transcripts of the same locus suggests that these intronic ncRNAs may act upon cis-regulatory factors, modulating the stability and/or processing of the corresponding protein-coding transcript, or even directly affecting the levels and/or the splicing of protein-coding isoforms [27,59]. We found 2 altered genes of the nuclear receptor subfamily 4, group A (NR4A2 and NR4A3), known to be involved in T-cell apoptosis, brain development, and vascular disease [60]. Both showed a simultaneous up-regulation of the protein-coding transcript and the ncRNA from the same locus in MDS-RARS, suggesting that these ncRNAs could be involved in the control of protein-coding gene expression of this gene family in MDS-RARS patients.

Conclusion
The presence of differentially expressed intronic ncRNA transcripts in CD34+ and stromal cells may shed light upon the not yet fully understood molecular mechanisms involved in the heterogeneity of myelodysplastic syndromes, and suggests that ncRNAs may play a role during disease development. Characterization of those ncRNA transcripts would contribute to a better understanding of MDS-RARS, and possibly to the development of biomarkers and therapeutic targets.