RNA sequencing from human neutrophils reveals distinct transcriptional differences associated with chronic inflammatory states

Background The transcriptional complexity of mammalian cells suggests that they have broad abilities to respond to specific environmental stimuli and physiologic contexts. These abilities were not apparent a priori from the structure of mammalian genomes, but have been identified through detailed transcriptome analyses. In this study, we examined the transcriptomes of cells of the innate immune system, human neutrophils, using RNA sequencing (RNAseq). Methods We sequenced poly-A RNA from nine individual samples corresponding to specific phenotypes: three children with active, untreated juvenile idiopathic arthritis (JIA)(AD), three children with the same disease whose disease was inactive on medication (CRM), and three children with cystic fibrosis (CF). Results We demonstrate that transcriptomes of neutrophils, typically considered non-specific in their responses and functions, display considerable specificity in their transcriptional repertoires dependent on the pathologic context, and included genes, gene isoforms, and long non-coding RNA transcripts. Furthermore, despite the small sample numbers, these findings demonstrate the potential of RNAseq approaches to biomarker development in rheumatic diseases. Conclusions These data demonstrate the capacity of cells previously considered non-specific in function to adapt their transcriptomes to specific biologic contexts. These data also provide insight into previously unrecognized pathological pathways and show considerable promise for elucidating disease and disease-state specific regulatory networks. Electronic supplementary material The online version of this article (doi:10.1186/s12920-015-0128-7) contains supplementary material, which is available to authorized users.

Conclusions: These data demonstrate the capacity of cells previously considered non-specific in function to adapt their transcriptomes to specific biologic contexts. These data also provide insight into previously unrecognized pathological pathways and show considerable promise for elucidating disease and disease-state specific regulatory networks.

Background
Neutrophils are the most common leukocytes in the human circulation and an important sentinel for recognizing invading micro-organisms and tissue damage. Thus, they are an important component of the acute response to infection and tissue injury. However, in recent years, we have also demonstrated that neutrophils show transcriptional aberrations in chronic childhood inflammatory diseases, including juvenile idiopathic arthritis (JIA) [1] and juvenile dermatomysositis [2]. In JIA, these transcriptional aberrations do not correct with therapy [3] and are associated with specific perturbations in cellular metabolic function [1]. Thus, in addition to their role in acute infectious and inflammatory disease, neutrophils appear to play important roles in chronic, indolent human inflammatory diseases.
The gene expression data used to elucidate the above findings were generated using conventional hybridizationbased gene microarrays. The limits of hybridizationbased microarrays are well documented [4]. Furthermore, hybridization-based arrays fail to capture the full complexity of the transcriptome, including novel alternatively spliced isoforms and non-coding RNAs. Therefore, gene microarrays have serious limits from the standpoint of understanding the transcriptional-rewiring [5] that very likely underlies many complex human diseases.
RNA sequencing techniques carry the promise of revolutionizing our understanding of the transcription processes that underlie phenotypes [6]. As data from projects like ENCODE [7] reveal the complexities of the transcriptome in eukaryotic cells, it is becoming clear that, in order to fully understand human pathological cellular networks, we are going to need more detail of the transcriptional events that underlie disease phenotypes.
Neutrophils are a particularly challenging cell with which to work. The presence of endonucleases within human neutrophils, a part of the host defense against bacteria [8], presents particular challenges to preparing high-quality nucleic acid for sequencing studies. Neutrophils are thus conspicuously absent from both the ENCODE and Roadmap Epigenomics data sets. The studies we report here were undertaken to determine the specificity of neutrophil transcriptomes to specific human illnesses or disease states, a prerequisite for biomarker development, by examining specific phenotypes that show subtle differences from one another.

Patients and patient samples
Neutrophils were collected from nine children after informed consent was obtained from their parents according to a protocol approved by the University of Oklahoma Health Sciences Center Institutional Review Board. Three of the samples were from children (ages 5-10 years, all girls) with newly-diagnosed, untreated polyarticular juvenile idiopathic articular arthritis (JIA). Samples were also obtained from 3 patients; also girls aged 5-10, who fit criteria for clinical remission on medication (CRM). That is, these children had normal physical exams, no symptoms of arthritis (morning stiffness, gait disturbance, fatigue) and normal laboratory studies (complete blood counts, erythrocyte sedimentation rate) and had maintained this state for at least 6 continuous months. In addition, a control population consisting of 3 children with cystic fibrosis (CF) (ages 6-21 years, all boys) was also studied. The latter group is an important and seldom used-control; children with CF have chronic, indolent inflammation in the lung, and thus allow us to discern disease-specific characteristics in JIA from those that might be seen in any chronic, sub-acute inflammatory state. Children with CF were seen during routine followup and were stable from the standpoint of pulmonary symptoms at the time they were studied.

Cell isolation
Whole blood was drawn into 10 mL CPT tubes (Becton Dickinson, Franklin Lakes, NJ), which is an evacuated blood collection tube system containing sodium citrate anticoagulant and blood separation media composed of a thixotropic polyester gel and a FICOLL™ Hypaque™ solution. Cell separation procedures were started within 1 h from the time the specimens were drawn. Neutrophils were separated by density-gradient centrifugation at 1,700× g for 20 min. After removing red cells from neutrophils by hypotonic lysis, neutrophils were then immediately placed in TRIzol® reagent (Invitrogen, Carlsbad, CA) and stored at −80°C until used for RNA isolation. Cells prepared in this fashion are more than 98 % CD66b + by flow cytometry and contain no contaminating CD14+ cells, as previously reported [9]. Thus, although these cell preparations contained small numbers of other granulocytes, they will be referred to here as "neutrophils" for brevity and convenience.

RNA isolation and sequencing
Total RNA was extracted using Trizol® reagent according to manufacturer's directions. RNA was further purified using RNeasy MiniElute Cleanup kit including a DNase digest according to the manufacturer's instructions (QIA-GEN, Valencia, CA). RNA was quantified spectrophotometrically (Nanodrop, Thermo Scientific, Wilmington, DE) and assessed for quality by capillary gel electrophoresis (Agilent 2100 Bioanalyzer; Agilent Technologies, Inc., Palo Alto, CA). Single-end cDNA libraries were prepared for each sample and sequenced using the Illumina TruSeq RNA Sample Preparation Kit by following the manufacture's recommended procedures and sequenced using the Illumina HiSeq 2000. Library construction and RNA sequencing were performed in the Columbia Genome Center in Columbia University Medical Center.

Data processing and analysis
The short reads were mapped to the reference genome (Human: NCBI/build37.2) using TopHat (version 2.0.4) [10] with 4 mismatches (−−read-mismatches = 4) and 10 maximum multiple hits (−−max-multihits = 10). Transcripts were assembled and the relative abundance (aka expression level) of genes and splice isoforms were estimated using Cufflinks in "fragments per kilobase of exon model per million mapped reads" (FPKM) [11]. (version 2.0.2) with default settings. Differential expression genes and exomes were tested using DEseq. To define significantly differential expression genes/exomes, we used a p-value < 0.05 as the cutoff. The Database for Annotation, Visualization and Integrated Discovery (DAVID), v6.7, (http://david.abcc.ncifcrf.gov/home.jsp) was used for Gene Ontology (GO) analysis.

Ingenuity Pathway Analysis (IPA)
To identify upstream regulators of the differentially expressed genes between AD and CRM or between AD and CF, we used IPA software (Ingenuity Systems, Redwood City, CA). Gene symbols were used as identifiers and the Ingenuity Knowledge Base gene set as a reference for a pathway analysis. Identification of upstream transcription regulator was assessed using IPA where the activation or inhibition of a transcription regulator was determined from expression patterns of the transcription factor and its downstream-regulated genes within the differentially expressed list. The absolute value of the z-score ≥ 2.0 was considered statistically significant with a positive value indicating activation and a negative values indicating inhibition of the transcription factor.
Differentially expressed genes and LncRNA expression validation by quantitative real-time RT-PCR Total RNA was reverse transcribed with iScript™ cDNA synthesis kit according to the directions of the manufacturer (Bio-Rad, Hercules, CA, USA). Real-time RT-PCR was performed using SYBR Green reagents on a StepOne Plus (for the testing group; Applied Biosystems, Foster City, CA, USA) as described previously [3]. Gene-specific amplification was confirmed by a single peak in the ABI Dissociation Curve software. Average threshold cycle (Ct) values for GAPDH (run in parallel reactions to the genes of interest) were used to normalize average Ct values of the gene of interest. These values were used to calculate averages for each group, and the relative ΔCt was used to calculate fold-change values between the groups. The nucleotide sequences of the primers are listed in Table 1. All primers applied were tested to display an efficiency of amplification approximate 98 % (±SD 4.65 %).

An overview of RNA-seq data
In this study, we conducted genome-wide RNA sequencing for 9 individual samples corresponding to specific phenotypes: 3 children with active, untreated juvenile idiopathic arthritis (JIA), a common chronic disease in children characterized by inflammation and hypertrophy of synovial membranes [12,13]. These subjects will be referred to as AD. We also studied 3 children with the same disease whose disease had been inactive and stable for 6 continuous months on the anti-inflammatory, immunosuppressive drug, methotrexate. The children were described as being in clinical remission on medication (CRM) as defined by accepted international criteria [14]. Finally, we studied 3 children with cystic fibrosis (CF), an autosomal recessively inherited disorder characterized by chronic, indolent inflammation in the lungs. For all 3 AD samples, we generated an average of 19.35 million 101 bp reads per sample, and average number of reads mapped to the genome was 13.59 millions. For the CRM samples, the average number of reads per sample was 20.82 million, and average number of reads mapped to the genome was 14.34 million. For the CF samples, an average number of reads per sample was 20.53 million, and the average number of reads mapped to genome was 16.84 millions ( Table 2). There were no significant differences in the number of reads between AD and CRM or CF samples. The sequencing performance and library quality was further assessed using RNA-SeQC v1.1.7 [15]. The results show that 76.8 % of reads mapped to known exons, and~18.7 % mapped to intronic regions. These statistics indicate that the sequencing data is robust (Table 3).
Long non-coding RNAs (lncRNAs) are defined as transcripts of greater than 200 nucleotides without evident protein coding function [18]. Thus, lncRNA is a broad definition that encompasses multiple different classes of RNA transcripts, including enhancer RNAs, small nucleolar RNA (snoRNA) hosts, intergenic transcripts, and transcripts overlapping other transcripts in either sense or antisense orientation. So far, only a few RNA-Seq studies have detected or analyzed lncRNAs [19]. We took advantage of our next generation RNA-Seq data to identify all lncRNA expressed in neutrophils. We mapped the RNA-seq data to a comprehensive compendium of long non-coding RNAs (www.LNCipedia.org, version 2.0). The current version of this long non-coding RNA database contains 32,183 human annotated lncRNAs [20]. We found 2981 lncRNAs were expressed in neutrophils in at least one of nine subjects (FPKM ≥ 1) (Additional file 3: Table S3). As expected, these non-coding transcript FPKM values were lower than protein coding genes (Fig. 1). Multiple studies have shown that lncRNA expression details vary for different classes [21,22] and there is evidence that lncRNAs participate in multiple networks regulating gene expression and function [23]. LncRNAs also interact with multiple proteins, enabling scaffolding functions and combinatorial control [24]. For example, the recently identified lincRNA-Cox2 mediates both the activation and repression of distinct classes of immune genes [25]. However, the in vivo functions of most We used Cufflinks to assemble transcripts and estimate the abundance of isoforms. There are 31,046 isoforms expressed in neutrophils in at least one of nine subjects with FPKM value >0 (Additional file 4: Table  S4). While the genes have several alternatively spliced transcripts, their isoforms are not expressed at equivalent levels. For most genes, one isoform is expressed more highly than others (data not shown), a finding compatible with what has been reported in B cells [11].

Identification of differentially expressed genes between phenotypes
Although neutrophils are typically regarded as nonspecific mediators of inflammatory responses, data from hybridization-based gene expression arrays suggest that there may be subtle differences in neutrophil transcriptomes that correlate with human disease phenotypes [26]. We used DESeq [27] to test the differential expression of genes comparing the three phenotypes. When the expressed genes are defined as FPKM ≥1 in all 3 replicates, there are 8597 expressed genes in the AD neutrophils, 8668 expressed genes in CRM neutrophils and 8102 expressed genes in the CF neutrophils, which were aligned to the reference genome (data no shown).
One hundred fifty-nine genes showed differential expression when children with active juvenile arthritis (the AD group) were compared to children with sustained, inactive disease on medication (the CRM group -Additional file 5: Table S5). Twenty of these genes were expressed in higher levels in AD and 136 were expressed in lower levels in AD compared with CRM (top 10 up and down regulated DE genes in Table 5). There are two genes (ADARB2, C20orf134) only expressed in AD and one gene, FLJ31813 was expressed only in CRM. The ADARB2 gene (RNA-specific adenosine deaminases, B2) is associated with brain tumors [28] and a single nucleotide polymorphism (SNP) in ADARB2 is associated with metabolic disorders [29]. An inflammatory/immune function for ADARD2 is not known. Genes showing lower levels of expression in AD included ICAM-1, IL-1B, CCR1, IFIH1, SOCS1, TNFAIP3 and TNFSF13B, which are strongly associated with acute and chronic inflammation and adult rheumatoid arthritis. For example, SOCS1 was upregulated in the synovial membranes from patients with RA when compared with osteoarthritis [30]. Interferoninduced proteins (IFI35, IFI44, IFI44L, IFI6, IFIH1, IFIT2,  IFIT3, IFIT5, IFITM1, IFITM3) also show lower levels of expression in neutrophils of children with AD, suggesting that attenuated interferon responses may be an underlying aspect of the disease. In support of this idea was the finding that Type I IFN response genes OAS1, OAS2 and OAS3, were also expressed in lower levels in AD samples compared with CRM. This finding corroborates studies that have shown that, at least in adult patients with rheumatoid arthritis, therapies with anti-TNF antibody, a common treatment for severe forms of the disease, induce significant increases in type I IFN response gene activity [31]. Other genes expressed in lower levels in AD  are associated with immune responses or responses to viruses. The C5orf56 gene also showed lower expression in AD. This gene is of special interest, as, in a recent GWAS for the juvenile arthritis phenotype studied here, SNPs within the C5orf56 region had one of the strongest associations with genetic risk among the regions identified [32]. The PAM gene (Peptidylglycine alpha-amidating monooxygenase) was up-regulated in AD. PAM cleaves immature adrenomedullin (an antiapoptotic peptide) to the mature peptide. PAM also expressed rheumatoid arthritis fibroblast-like synoviocyte and osteoarthritis fibroblast-like synoviocyte [33]. In order to characterize the differentially expressed genes, we conducted a Gene-Ontology analysis using DAVID as described above. Unsurprisingly, the top three functional groups enriched in differentially expressed genes are: (a) immune response (p-value = 4.34E-13, FDR = 7.06E-10), including 28 differentially expressed genes; (b) response to viral infection (p-value = 4.33E-10, FDR = 7.04E-7), including 12 genes, and (c) host defense responses (p value = 1.65E-4, FDR = 0.2682), including 15 genes (Fig. 2).
We further examined the specificity of neutrophil transcriptomes by examining another phenotype characterized by chronic, soft tissue inflammation. CF is an autosomalrecessively inherited disease caused by mutations in the CFTR gene that lead to abnormal ion transport in respiratory epithelial cells [34,35]. These abnormalities are associated with chronic, indolent inflammation in the small airways and lung parenchyma, due, in part, to chronic infection/colonization with pseudomonas aerigninosa as well as other bacteria [36]. Lungs of affected patients show a characteristic neutrophilic infiltrate and high concentrations of tumor necrosis factor-alpha (TNF-a), which amplifies the inflammatory process by stimulating release of IL-1, IL-6, IL-8 [37,38]. CF thus presents a good control and comparison group to children with JIA, as it allows us to identify transcriptional reorganization that is diseasespecific and distinguish it from changes that are generic to chronic inflammation in soft tissues.
Because the CF patients consisted entirely of males, while the JIA patients were females, we excluded genes on either sex chromosome that showed differential expression in the comparison of CF and JIA patients. There were 113 genes that showed differential expression with at least 1.9-fold change (p < 0.05) when neutrophils from children with untreated arthritis were compared with those from children with CF (Additional file 5: Table S5). Forty of these gene transcripts were found in greater abundance in the neutrophils from children with untreated arthritis and 70 were found in higher in the neutrophils from CF (top 10 up and down regulated DE genes in Table 6). There are two genes (MYH6, PPP4R4) expressed only in CF and gene ELOVL2 expressed only in AD. Predictably, many of the DE genes are involved inflammatory responses (KLRG1, CCR5, C4A, CCR4, CFD, SPP1) and immune responses (CCR5, C4A, IFITM2, CCR4, MSH2, FCGR2C, TREM1, CFD). Interestingly, there are ten DE genes that are associated with cell cycle regulation (CDKN1A, PLK3, ZC3HC1, MSH2, CYP26B1, KIF20B, CENPV, G0S2, CDK6, AHR).
To confirm the differences in gene expression between AD and CRM observed in the RNA-seq experiments, we performed real-time qRT-PCR. Ten genes that showed significant differentially expressed between AD and CRM in the RNA-seq analysis (DDX60, IFH1, IFITM3, IGHMBP, MOV10, OAS1, PML, RNF213, TNFAIP6, TRIM5) were analyzed by real-time qRT-PCR in an independent patient cohort. Table 1 shows that seven of ten genes differentially expressed in the RNAseq analysis were also differentially expressed in the real-time qRT-PCR.

Sample reproducibility
To assess reproducibility of gene expression of biological represents in neutrophils, Principal Component Analysis (PCA) and sample pairwise correlation coefficient calculation are performed to obtain an overview of gene expression for the three conditions (AD, CRM, CF). A Genes were classified into three groups, high-expression (25 %), mediumexpression (50 %) and low expression (25 %). Gene ontology analysis was performed using DAVID. Top 20 categories for each group was presented here PCA scatter plot (Fig. 3a) and heat map of heat map of sample distance based on Jensen Shannon entropy over gene counts (Fig. 3b) show that the three groups (AD, CRM, CF) formed three distinct clusters. While the sample designated AD1 appears to be somewhat distinct in the heat map, this sample still clusters closest with the other 2 AD samples. This sample was from a 10 year old girl, while the other 2 were from 5 and 6 year old girls. It is possible that early pre-pubertal changes may have had an effect on gene expression in this patient. Thus, although the differences in the three phenotypes are relatively subtle from an immunologic standpoint, neutrophils in each show subtle specificity that suggests fine-tuning of the transcriptome in response to specific inflammatory environments, or, in the case of JIA, as a result of immune suppressive/anti-inflammatory therapies.

Transcription factor networks
A long-term goal of our work is to understand how cellular networks are perturbed in chronic inflammatory diseases and the effects of therapy on restoring normal transcriptional wiring. We therefore undertook network analysis of RNA expression in the samples we had subjected to RNAseq. Upstream regulator analysis in IPA allowed the identification and determination of the state of activation of upstream regulators that might be responsible for the observed gene expression changes between different groups. We identified 2 transcription factors (TFs), GFI1 and NUPR1, from 131 differentially expressed genes between AD and CF, which were identified as being inhibited. GFI1 regulates genes CDKN1A, EGR3, PCOLCE2, PLOD1 and SPP1; while NURP1 regulates genes FLVCR1, FUCA1, HBEGF, PLK3 and TMEM158. Analysis of the 159 genes differentially expressed between AD and CRM identified 121 TFs. Twelve of the 121 TFs (EGR1, BRCA1, IRF1, IRF7, IRF3, IRF5, NFATC2, NFKB1, SMARCB1, STAT1, STAT2 and STAT3) were identified as being inhibited (Fig. 4a), and 4 TFs (GFI1, MYC, NKX1-3 and TRIM24) were identified as activated (Fig. 4b). These TFs regulate more than 5 genes in a single overlapping regulatory network (Fig. 4). Signal Transducer and Activator of Transcription (STATs) have numerous functions in innate immunity [39,40]. Specifically, STATs have key functions for neutrophils and regulate gene transcription by alternative proteolytic processing [41,42]. NF-κB is also included is this network. The NF-κB pathway is induced by a wide variety of stimuli, including cytokines such as the tumor necrosis factor-alpha and interleukin-1β, both of which are the targets of biologic therapies used to treat rheumatoid arthritis and juvenile idiopathic arthritis [43,44]. The involvement of the oncogenes MYC in  [26].

Differential usage of exons
We applied DEXSeq [45] to test for differential exon usage in RNA-seq data. The DEXSeq analysis revealed that there were 204 significantly differential exon usages (p < 0.01) between AD and CRM (Additional file 6: Table  S6). These differential exon usages were distributed in 179 genes. Of 179 genes demonstrating differential exon usage, 7 genes (BTN3A2, IFI44L, IFIT3, LOXHD1, PAM, RNF213, SH3RF3) also showed differential expression between AD and CRM at the gene level. We also found 838 genes that demonstrated significant differential exon usage (p < 0.01) when we compared AD and CF (Additional file 7: Table S7). Differential exon usage was distributed in 678 genes. Of 678 genes demonstrating differential exon usage, 6 genes (IFITM2, LOXHD1, MSH2,, RPH3A, SH3RF3, ZNF107) also showed differential expression between AD and CF at the gene level.  The comparison of AD vs. CRM reveals a neutrophil response to therapeutic intervention and thus may be somewhat artificial. In contrast, the CF and AD situations are rather similar (chronic inflammation in soft tissues), and subtle re-organization of the transcriptome in these parallel but slight different scenarios is actually quite interesting.

Differentially expressed lncRNA
As noted above, we identified 2981 lncRNAs expressed in neutrophils in at least one of nine subjects. We applied differential expression analysis of all the lncRNAs that were expressed in all the libraries using DESeq. The analysis revealed there are 38 lncRNAs that showed differential expression when the AD group was compared with CRM. There were 30 lncRNAs that showed detectable differences in expression between AD and CF. There were 63 DE lncRNAs in CRM vs CF comparison (Additional file 8: Table S8). These results show that lncRNA expression is disease and disease-state specific.
To validate the RNA-seq data, we chose randomly 7 of 38 DE lncRNAs between AD and CRM, and performed real-time PCR in an independent patient cohort. Table 1 shows that 7 lncRNAs differentially expressed in the RNA-seq analysis were also differentially expressed in real-time PCR. It is interesting to note that two of these lncRNA, lncIFITM-4 and lncPML-1, are adjacent to differentially-expressed genes (IFITM3 and PML, respectively), suggesting that these transcripts may act directly in to regulate gene expression in neutrophils and fine-tune transcriptional responses to specific inflammatory/disease states. As the functions of lncRNAs are largely unknown, an approach for inferring putative functions of long ncRNAs is to examine protein-coding genes located near ncRNAs of interest [46,47]. We examined the expression pattern of paired neighbor protein-coding genes of differential expressed lncRNAs between AD and CRM. Interestingly, we found that six neighbor protein-coding genes (CHSY1, IFITM3, LILRA5, PGM5, PML, ZCCHC2) of these DE lncRNAs were also differentially expressed at same direction. Of these 5 genes, IFITM3 mRNA and PML mRNA are known to be upregulated in labial minor salivary glands and associated with in primary Sjogren's syndrome [48].

Discussion
We have demonstrated the feasibility of preparing highquality RNA from human neutrophils in sufficient quantity to perform RNA-Seq in the context of different human phenotypes. Furthermore, we have demonstrated that such studies can be undertaken even with the relatively small amounts of human blood available for translational studies in children. Finally, we demonstrate that neutrophil transcriptomes show subtle variations that correspond to specific human phenotypes and inflammatory conditions. Neutrophils are a critical cell in human defense, and depletion of neutrophils from peripheral blood, whether as a consequence of therapeutic efforts or human disease, has almost immediate adverse consequences [49]. Furthermore, neutrophils play a critical role in their ability to direct and instruct the adaptive immune system [50]. Despite the importance of these cells, they are conspicuously absent from the both ENCODE and Roadmap Epigenomics data sets. Thus, we know little of the functional genomics of these cells and how neutrophil genomes adapt to regulate transcription in response to external signals and disease states.
Like most leukocyte genomes, neutrophils show substantial transcriptional complexity. Although most transcripts were expressed at low levels (compared with lymphocytes, for example) we found that more than 7700 genes, or about 30 % of all the known protein-coding genes, were detected in all nine samples. Furthermore, isoform usage was extensive, with more than 9500 detected in each of the three different phenotypes. While extensive RNA splicing is known to characterize adaptive immune responses [51], these findings suggest that even neutrophils carry the capacity for supple, threat-specific adaptations to host injury or infection. The latter point is corroborated by the subtle differences in the transcriptomes of the three different childhood phenotypes that we studied.
Because of the differences between the three different phenotypes, our studies suggest that RNAseq may be a substantial improvement over hybridization-based gene expression arrays for the development of informative biomarkers in human disease. As gene expression microarrays became widely available and affordable, there was considerable excitement about their use in developing predictive or diagnostic biomarkers [4]. It was disappointing, then, when biomarkers identified in one cohort (e.g., for prediction of response to therapy) showed little or overlap with biomarkers developed in independent cohorts [52,53]. While there have been successful attempts to corroborate gene array data in independent patient cohorts [3], the limited dynamic range and considerable technical variation (large batch effects) of hybridization-based arrays will very likely continue to limit their utility for medical purposes. With next-gen sequencing costs continuing to fall, the possibility of developing "personalized transcriptomes" for diagnosis Fig. 3 Sample reproducibility. a Principal Component Analysis (PCA) was performed based on differentially expressed genes (p < 0.05) to obtain an overview for the three conditions (AD, CRM, CF) on gene expression. The PCA scatter plot showed that the three groups (AD, CRM, CF) is the best separation and formed three distinct clusters. b Heat map of sample distance based on Jensen Shannon entropy over gene counts. The result showed three groups were able to be separated or prognosis (e.g., predicting therapeutic response) seems an achievable goal.
This sample size is too small to make broad inferences about the pathogenesis of JIA, although some interesting findings emerge from these data. For example, we have previously reported the involvement of interferon gamma in gene expression networks constructed from gene microarray data in JIA neutrophils [9]. The current data suggest an attenuation of type 1 interferon responses in JIA neutrophils, a new finding and not one that we have previously discerned in expression data in JIA neutrophils [1,3,26], although we see decreased expression of numerous pro-inflammatory genes in JIA neutrophils using this approach. These findings are probably medically relevant: a recent study shows that IFN-response gene expression levels in neutrophils in adult RA correlates with a good response to TNF inhibitor therapy [54]. While we believe that our findings support efforts to continue biomarker development from human neutrophils in chronic inflammatory diseases, it is unlikely that, by itself, neutrophil transcriptome profiling will not be sufficient to crisply elucidate the pathogenesis of JIA or other complex traits characterized by chronic, indolent inflammation.
RNAseq also allowed us to characterized the expression of lncRNAs in neutrophils in CF patients and JIA patients with different disease states. Our results demonstrate a large number of lncRNAs commonly expressed in neutrophils, and these data thus provide a useful resource for lncRNA expression in human neutrophils. We also analyzed differential expression analysis of the lncRNAs between diseases or disease states and identified 38 differentially expressed lncRNAs in AD vs CRM comparison, and 30 in AD vs CF. These results provide further evidence that neutrophils exhibit considerable adaptability in their transciptomes and that gene and transcript expression is disease or disease state specific.
(See figure on previous page.) Fig. 4 Transcription factors (inner circle) identified by the IPA software analysis for differentially expressed genes in the comparison of neutrophil RNAseq data between AD and CRM juvenile idiopathic arthritis patients. IPA analysis elucidated12 TFs that are predicted to be inhibited (a) and 4 TFs predicted to be activated (b). Each TF is connected with its target genes; dashed and solid connecting lines indicate an indirect or direct relationship with the master regulator respectively. The colors of the lines connecting TFs and their corresponding targets indicate the predicted activation (orange) or inhibition (blue) status