Profiling non-coding RNA levels with clinical classifiers in pediatric Crohn’s disease
BMC Medical Genomics volume 14, Article number: 194 (2021)
Crohn’s disease (CD) is a heritable chronic inflammatory disorder. Non-coding RNAs (ncRNAs) play an important role in epigenetic regulation by affecting gene expression, but can also directly affect protein function, thus having a substantial impact on biological processes. We investigated whether ncRNAs are dysregulated at diagnosis in CD across different disease locations and future disease behaviors, to determine whether ncRNA signatures can serve as an index of outcomes.
Using subjects from the RISK cohort, we analyzed ncRNA from the ileal biopsies of 274 CD patients and 71 non-IBD controls (345 samples), and from the rectal biopsies of 329 CD patients and 61 non-IBD controls. Reads were aligned with the STAR package using Human Genome version 38 (hg38) as the reference. Differential expression (DE) analysis was performed with the edgeR package, and DE ncRNAs were defined by a fold change (FC) > 2 and FDR < 0.05 after multiple-testing correction.
In total, we identified 130 CD-specific DE ncRNAs (89 in ileum and 41 in rectum) compared with non-IBD controls. In addition, 35 DE ncRNAs were identified between B1 and B2 in ileum, whereas no differences among CD disease behaviors were noted in rectum. We also found inflammation-specific ncRNAs between inflamed and non-inflamed groups in ileal biopsies. Overall, expression of mir1244-2, mir1244-3, and mir1244-4 was increased, and that of RN7SL2 decreased, during CD, regardless of disease behavior, location, or inflammatory status. Lastly, we tested baseline ncRNA expression as a potential tool to predict disease status, disease behavior, and disease inflammation at 3-year follow-up.
We identified ncRNAs that are specific to disease location, disease behavior, and disease inflammation in CD. Both ileal- and rectal-specific ncRNAs change over the course of CD, particularly during disease progression in the intestinal mucosa. Collectively, our findings show changes in ncRNA levels during CD that may have clinical utility in the early identification and characterization of disease progression.
Inflammatory bowel disease (IBD) is a disorder affecting the intestines with two prominent disease types, ulcerative colitis (UC) and Crohn’s disease (CD): UC is confined mostly to the colonic mucosa with persistent chronic inflammation, whereas CD is a transmural disease that can affect any part of the gastrointestinal tract. The rising incidence of IBD has been attributed to gut-microbiome interactions, genetic predisposition, and environmental triggers, but more recently attention has turned to epigenetic mechanisms. Non-coding RNAs (ncRNAs) play an important role in epigenetic regulation by affecting gene expression but can also directly affect protein function, thus having a substantial impact on biological processes. Although there are many types of ncRNA, they are commonly categorized by length, which ranges from ~22 nucleotides (nt) to >200 nt; microRNAs (miRNAs), for example, are 20–25 nt long.
The Montreal classification system defines CD behavior as B1, non-stricturing/non-penetrating; B2, stricturing; and B3, penetrating. Similarly, the site of CD manifestation is stratified into three main locations: L1 (ileal), L2 (colonic), and L3 (ileocolonic). Previous CD studies suggest that environmental factors can modulate predisposition and affect disease outcomes, likely through genetic and/or epigenetic mechanisms. However, little is known about the role epigenetics or ncRNA expression might play in shaping specific disease behavior, location, or inflammatory status during CD.
Indeed, very few studies have investigated the role of ncRNAs in epigenetic mechanisms to determine whether they play a beneficial or detrimental role in IBD development and progression. We previously performed the first large-scale ncRNA analysis in CD and identified two differentially expressed (DE), up-regulated lncRNAs, LINC01272 and HNF4A-AS1, in CD patients. Likewise, a few other studies have used RT-PCR on smaller datasets of intestinal tissue and plasma samples and observed changes in IBD-specific ncRNAs, i.e. LINC01272, DIO3OS, and KIF9-AS1, whose expression was greater in IBD patients than in controls. However, none of these studies explored whether ncRNA expression is specific to tissue type, CD disease behavior, disease location, or disease inflammation. Therefore, our goal in this study was to identify ncRNAs associated with tissue location (ileum vs. rectum), CD disease behavior (B1, B2, and B3), and inflammation by disease location (L1, L2, and L3), using the clinical and disease characteristics of CD participants in the large RISK cohort.
Using the RISK cohort, we evaluated high-density ncRNA transcriptomic profiles from 735 samples (345 ileal and 390 rectal biopsies) with well-defined 5-year patient follow-up and clinical metadata. We assessed whether CD-specific ncRNA expression was consistent across tissue location (ileum vs. rectum) and whether changes in ncRNA levels reflect CD disease behavior (B1, B2, and B3). Lastly, we tested whether changes in ncRNA levels can distinguish inflammatory status by disease location within CD patients. Our results show dysregulated ncRNA abundance at different mucosal locations during distinct stages of CD progression (i.e., disease behavior), which is potentially useful for future clinical companion diagnostics.
All samples used in this study were part of the RISK study (Risk Stratification and Identification of Immunogenetic and Microbial Markers of Rapid Disease Progression in Children with Crohn’s Disease). Before diagnosis and treatment, disease status and extent were confirmed by a physician through histological evaluation. Ileal and rectal bulk biopsies were obtained by colonoscopy from newly diagnosed CD patients. Patients with no bowel pathology, no gut inflammation, and no IBD symptoms were considered non-IBD controls. Ileal biopsies were obtained from 71 non-IBD controls and 274 CD patients at diagnosis. Similarly, rectal biopsies were obtained from 61 controls and 329 CD patients at diagnosis from the same RISK cohort. CD patients were clinically classified at baseline according to the Montreal classification system: physician-assessed disease behavior as B1 (inflammatory), B2 (stricturing), or B3 (penetrating), and inflammation location as L1 (ileal), L2 (colonic), or L3 (ileocolonic). Disease severity classifications, demographics, and clinical information were collected for each patient at enrollment and during follow-up; these are provided in Table 1 together with other patient metrics, including age, sex, disease type, disease behavior, inflammatory status, and inflammation by disease location.
All biopsies were extracted and processed as previously described [14, 15], and Illumina RNA sequencing libraries were prepared with the NEBNext Ultra RNA Library Prep Kit following the manufacturer’s recommendations (NEB, Ipswich, MA, USA). Approximately 77% of the 274 CD ileal biopsies (n = 212) were used in our previous publication; these were re-sequenced at greater depth alongside the other ileal samples. Libraries were sequenced on the HiSeq system using paired-end (PE) 150 bp chemistry by GENEWIZ (South Plainfield, NJ). Whole-biopsy RNA sequencing for both ileal and rectal samples was done in a single batch. Reads were aligned to the GENCODE v28 (hg38) reference genome using the STAR package and quantified. edgeR was used to identify differentially expressed long ncRNAs (DE lncRNAs) [17, 18]. In total, 20,779 ncRNAs were analyzed in both the ileum and rectum datasets, spanning fifteen ncRNA types categorized by nucleotide length (Additional file 1: Table S1).
The workflow and overall layout of this study are provided in Additional file 2: Fig. S1, which shows the number of samples included and the number of differentially expressed ncRNAs (DEncRNAs) identified in each comparison. Overall, we examined differential ncRNA expression across tissue types, CD disease behaviors, and inflammation by disease location in intestinal biopsies from a subset of pediatric CD patients in the RISK study (Additional file 3: Table S2).
Genome-wide differential expression analysis was conducted with the R packages edgeR and SARTools in RStudio version 1.2. Both up- and down-regulated DE ncRNAs were defined by FC > 2 and FDR < 0.05 after multiple-testing correction, and principal component analysis (PCA) was performed using the princomp function.
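The FC > 2 / FDR < 0.05 rule can be illustrated with a short Python sketch. It assumes that "FC > 2" means an absolute fold change above 2 in either direction (|log2FC| > 1) and that the multiple-testing correction is Benjamini–Hochberg, the default in edgeR's topTags; the text does not name the correction. The helper names `benjamini_hochberg` and `call_de` are hypothetical, and per-ncRNA log2FC and p values are taken as given rather than computed with edgeR's statistical models.

```python
import math


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p values (FDR), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # ascending p values
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def call_de(results, fc_cutoff=2.0, fdr_cutoff=0.05):
    """Classify ncRNAs as 'up' or 'down' using FC > fc_cutoff and FDR < fdr_cutoff.

    results: list of (name, log2fc, pvalue) tuples, e.g. exported from edgeR.
    Assumption: FC > 2 means |log2FC| > 1 in either direction.
    """
    fdrs = benjamini_hochberg([p for _, _, p in results])
    log2_cutoff = math.log2(fc_cutoff)
    calls = {}
    for (name, log2fc, _), fdr in zip(results, fdrs):
        if fdr < fdr_cutoff and abs(log2fc) > log2_cutoff:
            calls[name] = "up" if log2fc > 0 else "down"
    return calls


# Hypothetical toy input, not data from this study.
toy = [("A", 2.9, 1e-6), ("B", -1.5, 0.001), ("C", 0.5, 1e-8), ("D", 3.0, 0.2)]
print(call_de(toy))  # {'A': 'up', 'B': 'down'}
```

In the toy input, C is excluded despite a very small p value because its fold change is below 2, and D is excluded because its adjusted FDR is 0.2.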
Thermodynamic calculations, binding affinities, and base-pair modeling were conducted using IntaRNA version 2.3.0 in conjunction with the ViennaRNA package 2.4.9 [2, 22, 23]. Using IntaRNA, possible interactions between the target miRNAs mir-1244-2, mir-1244-3, and mir-1244-4 and the query ncRNA RN7SL2 were computed with the following parameters: sliding window size, 150; maximum length of unpaired regions, 150; maximum distance between two paired bases, 100; weight of ED values for both target and query RNA, 1; and temperature, 37 °C. The heuristic for hybridization ends was also incorporated into the base-pairing predictions.
Random forest prediction
The samples in each comparison were split into a training set and a validation set, with the training sets containing an equal number of samples per class. Samples were randomly selected after sorting by alignment quality from best to worst. Of the selected samples, 50% formed the training set, whereas the validation set contained 50% of the remaining data. Using RandomForest, classifiers for disease status, behavior, or inflammation were built on the training set with n = 100,000 trees and mtry set to 2. Each classifier was evaluated with five-fold cross-validation, in which samples were randomly assigned to training and test folds; the cross-validation folds were fixed across all comparisons. Accuracies were calculated from confusion matrices of test-set class labels versus test-set predictions and indicate the overall robustness of each model.
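The class balancing, fixed five-fold split, and confusion-matrix accuracy described above can be sketched in plain Python. This is an illustration under stated assumptions only: the random forest itself (100,000 trees, mtry = 2) is not reproduced, the folds are a simple seeded shuffle rather than stratified folds, and the helper names `downsample_balanced`, `fixed_folds`, and `accuracy_from_confusion` are hypothetical.

```python
import random
from collections import Counter


def downsample_balanced(labels, seed=0):
    """Return sorted sample indices with every class cut to the smallest class size.

    Mirrors randomly down-sampling a larger group (e.g. B1) to the size of a
    smaller one (e.g. B2); the fixed seed is an assumption for reproducibility.
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    n_min = min(len(idxs) for idxs in by_class.values())
    keep = []
    for idxs in by_class.values():
        keep.extend(rng.sample(idxs, n_min))
    return sorted(keep)


def fixed_folds(indices, k=5, seed=0):
    """Shuffle once with a fixed seed and deal indices into k (unstratified) folds."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def accuracy_from_confusion(y_true, y_pred):
    """Accuracy = diagonal of the confusion matrix divided by all predictions."""
    confusion = Counter(zip(y_true, y_pred))  # (true, predicted) -> count
    correct = sum(n for (t, p), n in confusion.items() if t == p)
    return correct / len(y_true)


# Hypothetical example: 40 B1 vs 10 B2 samples.
labels = ["B1"] * 40 + ["B2"] * 10
keep = downsample_balanced(labels, seed=1)
folds = fixed_folds(keep, k=5, seed=1)
print(Counter(labels[i] for i in keep), [len(f) for f in folds])
```

Reusing the same seed makes the fold assignment reproducible, which is one simple way to keep folds fixed across repeated runs.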
ncRNA profiles within mucosal biopsies separate CD from controls
Overall, the first two principal components (PCs) of the ncRNA transcriptomic profiles explained 34% and 12% of the variance in ileal biopsies (Fig. 1a), but only 16% and 7% in rectal biopsies (Fig. 1b). The first two PCs of the entire ncRNA transcriptomic profile suggest that ncRNA levels alone can discriminate CD from non-IBD controls (Fig. 1a, b). Differential expression (DE) analysis identified 89 DE ncRNAs in the ileum when comparing 274 CD cases with 71 controls; of these, 62 were up-regulated and 27 down-regulated in CD (Fig. 1c). A similar comparison in rectal biopsies identified 41 DE ncRNAs between 329 CD cases and 61 controls (Fig. 1d), of which 18 were up-regulated and 23 down-regulated in CD. Hierarchical clustering of the DE ncRNAs yielded two independent clusters representing the CD and control groups, in both ileal and rectal biopsies when each tissue's own DE ncRNAs were used (Fig. 1e, f). All FDR-significant DE ncRNAs (FDR < 0.05) and nominally significant DE ncRNAs (P < 0.05) for ileal and rectal biopsies are listed in Additional file 4: Table S3. To test whether CD-specific ncRNA expression is consistent across intestinal locations, we compared the log2FC of ncRNAs observed in both ileal and rectal biopsies. Of the 130 DE ncRNAs tested, 17 were shared by the ileum and rectum, 87 were differentially expressed only in the ileum, and 25 only in the rectum (Fig. 1g, Table 2, Additional file 2: Fig. S2a). We then determined whether the 130 DE ncRNAs changed in the same direction and to a similar magnitude regardless of tissue type by comparing the log2FC of ileal DE ncRNAs with those in the rectum (Fig. 1g).
Surprisingly, 88% (n = 114) of DE ncRNAs changed in the same direction regardless of tissue type, with a strong positive correlation (R = 0.69; P < 2.2e−16). The remaining 16 were directionally inconsistent, with a strong negative correlation (R = −0.79; P < 2.2e−16) (Fig. 1h); these 16 ncRNAs showed statistically significant differences between CD and controls in both ileal and rectal samples, but in opposite directions. As in our previous analysis, most DEncRNAs found here were antisense RNAs or lincRNAs.
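The direction-concordance comparison can be sketched as follows. The sketch assumes R is a Pearson correlation of log2FC values computed separately within the same-direction and opposite-direction groups; the text does not name the coefficient, and P values are not computed here. The function names `pearson_r` and `split_by_direction` and the example values are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def split_by_direction(ileum, rectum):
    """Split ncRNAs measured in both tissues by whether their log2FC signs agree.

    ileum, rectum: dicts mapping ncRNA name -> log2FC (CD vs controls).
    ncRNAs with log2FC exactly 0 in either tissue fall into neither group.
    """
    shared = sorted(set(ileum) & set(rectum))
    same = [n for n in shared if ileum[n] * rectum[n] > 0]
    opposite = [n for n in shared if ileum[n] * rectum[n] < 0]
    return same, opposite


# Hypothetical log2FC values for illustration only.
ileum = {"a": 2.0, "b": -1.0, "c": 1.5, "e": 0.8}
rectum = {"a": 1.2, "b": -2.0, "c": -0.5, "e": 0.6}
same, opposite = split_by_direction(ileum, rectum)
r_same = pearson_r([ileum[n] for n in same], [rectum[n] for n in same])
print(same, opposite, round(r_same, 2))
```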
Next, to test the robustness and reliability of our findings, we compared the current results with our previous study, which included a subset of ileal samples from the same RISK cohort. For this comparison, we excluded the ileal samples (n = 212) shared with the previous study; in this subset analysis, 188 DE ncRNAs were observed. A direct comparison of log2FC in CD patients from the two studies showed a directionally consistent pattern with a strong positive correlation (R2 = 0.86; P < 2.2E−16), supporting the replicability of our methods (Additional file 2: Fig. S2b).
Taken together, these results show that changes in mucosal ncRNA levels are specific to CD and most prevalent in the small intestine, although distinct ncRNA signature changes also occur in the rectum. Among the 114 directionally consistent disease-specific DE ncRNAs, a small number were miRNAs (short ncRNAs of ~22 nucleotides), namely mir-1244-2, mir-1244-3, and mir-1244-4, regardless of tissue location.
Pathways annotated to CD specific ncRNAs
Total ncRNA transcriptomic data showed larger CD-specific differences in the ileum than in the rectum. Accordingly, topGO pathway analysis of DE ncRNAs between cases and controls yielded more significant pathway hits in ileal (n = 136) (Additional file 2: Fig. S3a, b) than in rectal biopsies (n = 36) (Additional file 2: Fig. S4a, b) (Additional file 5: Table S4 and Additional file 6: Table S5). However, 29 pathways were common to the ileal and rectal gene ontologies, including intracellular transport (GO:0046907) in the biological process category, which was annotated by RN7SL2.
Behavior-specific DE ncRNAs in intestinal biopsies
Next, we examined whether PC1 of the 130 DE ncRNAs identified between CD and controls could differentiate CD behaviors (B1, B2, and B3). As expected, PC1 showed potential to separate the behavior groups from one another (Fig. 2a), so we extended our analysis by CD disease behavior. First, we compared ncRNA expression in each CD behavior group against controls in both ileal and rectal biopsies, revealing ncRNAs specific to distinct behavior groups. We observed more DE ncRNAs in ileal biopsies, B1 (n = 70), B2 (n = 124), and B3 (n = 22) (Fig. 2b, Additional file 7: Table S6), than in rectal biopsies, B1 (n = 23), B2 (n = 9), and B3 (n = 14) (Fig. 2c, Additional file 8: Table S7).
Similarly, comparisons among CD behavior groups from inflammatory (B1) to stricturing (B2) to penetrating (B3) showed an increasing proportion of variance explained: 32%, 35%, and 45%, respectively (Additional file 2: Fig. S5a–g). DE analysis among B1, B2, and B3 showed a similar ileal-centric trend, with more DE ncRNAs observed in the ileum for B2 versus B1 (n = 35), B3 versus B1 (n = 13), and B3 versus B2 (n = 14) than in the rectum (Additional file 9: Table S8; Fig. 2c). Interestingly, all DE ncRNAs observed in B3 versus B1 (Fig. 2e) were also observed in the B2 versus B1 (Fig. 2d) and B3 versus B2 (Fig. 2f) comparisons, potentially reflecting CD characteristics shared across B1, B2, and B3. Notably, most of them were antisense RNAs or lincRNAs (Additional file 2: Fig. S6a, b). A similar comparison in rectal biopsies identified no statistically significant DE ncRNAs. Thus, our results indicate that a set of ncRNAs in ileal biopsies reflects the Montreal CD disease behaviors, whereas no such pattern was observed in rectal biopsies of CD patients.
Inflammation and disease location-specific ncRNAs in CD
Because inflammation is often a visible hallmark of CD, we next examined whether ncRNA expression could distinguish inflamed from non-inflamed CD patients and, furthermore, the location of disease. We used two physician-assigned groupings: i) inflammatory status (inflamed vs. non-inflamed), and ii) disease (inflammation) location, namely ileal (L1), colonic (L2), and ileocolonic (L3). We tested whether PC1 of the 130 CD-specific DE ncRNAs (Fig. 1) could differentiate inflammatory status and disease location in CD patients. Overall, PC1 from ileal biopsies (Additional file 2: Fig. S7a) separated inflamed from non-inflamed/control samples more significantly than PC1 from rectal biopsies (Additional file 2: Fig. S7b). In particular, PC1 largely differentiated CD patients with ileal inflammation (L1 and L3) from controls (P < 2.2E−16), compared with L2 patients, whose inflammation was colonic (P < 2.2E−14), or patients with non-inflamed sites (P < 3.5E−08) (Fig. 3a).
We therefore sought to identify DE ncRNAs specific to inflammatory status and disease location, as classified by physicians’ clinical assessment. Because this study focuses on CD, which largely occurs in the ileum, we restricted this analysis to ileal biopsies. To identify ncRNAs specific to disease location, L1 and L3 CD patients (n = 198) were combined as the inflamed group and compared first with non-inflamed ileal CD patients alone (n = 20) and then with non-inflamed + L2 patients together (n = 76) (Additional file 2: Fig. S8a–c), noting that L2 CD patients were inflamed only in the colon, not in the ileum. DE analysis of ileal biopsies identified 21 DE ncRNAs for L1 + L3 versus non-inflamed (Fig. 3b) and 31 DE ncRNAs for L1 + L3 versus non-inflamed + L2 (Fig. 3c) (Additional file 10: Table S9), with 10 DE ncRNAs shared by both comparisons (Fig. 3d). Likewise, log-normalized FPM (fragments per million) of the 21 DE ncRNAs better separated the inflamed (L1 + L3) and non-inflamed groups (Fig. 3e) than the 31 DE ncRNAs did in the comparison that included L2 samples in the non-inflamed group (Fig. 3f). Further, FPM analysis by specific inflammation location showed the L1 and L3 groups to be similar, whereas the L2 and non-inflamed groups were more closely related (Fig. 3g, h). Thus, in terms of inflammation and location, the ncRNA transcriptomic profiles of L2 CD patients resembled those of CD patients with non-inflamed ileum more than those with inflamed ileal disease (L1, L3).
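Log-normalized FPM values like those used above can be approximated with a few lines of Python. This is a simplified sketch: it scales raw fragment counts by total library size and adds a pseudocount of 1 before the log2 transform, both of which are our assumptions; edgeR's own cpm() additionally applies normalization factors and its own prior count. The function name `log2_fpm` is hypothetical.

```python
import math


def log2_fpm(counts, pseudocount=1.0):
    """log2(FPM + pseudocount) for every feature of one sample.

    counts: dict mapping ncRNA name -> fragment count for a single library.
    The total is taken over the features supplied; in practice it should cover
    the whole library. Raw-total scaling (no TMM factors) and the pseudocount
    are assumptions.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("library contains no fragments")
    return {
        name: math.log2(count / total * 1_000_000 + pseudocount)
        for name, count in counts.items()
    }


# Hypothetical counts for a library of one million fragments.
print(log2_fpm({"RN7SL2": 250_000, "LINC01272": 750_000, "OVCH1-AS1": 0}))
```

The pseudocount keeps zero-count ncRNAs finite (they map to 0) so that samples can be compared on the log scale.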
DE ncRNAs RN7SL2, mir-1244-2,3,4
miRNAs regulate multiple facets of gene expression, including other non-coding RNAs, and are known to be dysregulated during CD, yet the mechanisms remain unclear. Of particular interest is the down-regulation of RN7SL2 by mir-125b to control cell death. In our analysis, levels of mir-1244-2, -3, and -4 increased and levels of RN7SL2 decreased in CD versus controls (FDR-significant, but not meeting the log2FC threshold) (Fig. 4a). Using IntaRNA to test for molecular interactions between RN7SL2 and mir-1244-2, -3, and -4, we obtained six possible predicted conformations with stable base pairing (Table 3). In two of these, the first showed complementary base pairing between RN7SL2 nucleotides 268–289 and mir-1244-2, -3, -4 nucleotides 28–48, while the second paired RN7SL2 nt 195–199 with mir-1244-2, -3, -4 nt 8–12. The ΔG values of these interactions suggest stable binding; they are represented schematically in Fig. 4b, along with a more realistic molecular model generated from SRP RNA structures based on ribosomal RNA interactions (Fig. 4c). Taken together, these results suggest that changes in miRNA levels during CD could have physiological impacts that alter cellular function and potentially disease outcomes, with RN7SL2 a potential candidate for targeted therapy.
ncRNA as a potential tool to predict disease status, disease behaviors and disease location in IBD
Lastly, we tested how accurately these non-coding elements predict disease versus controls, disease behavior, and disease inflammation in ileal biopsies using a random forest approach. To test whether ncRNAs could serve as an index of disease status, we used the entire dataset of CD patients and controls (CTRLs). The model achieved an average AUC of 0.80 and 84% accuracy, indicating that these DEncRNAs can distinguish CD from controls (Fig. 5a). For disease behavior, because B2 and B3 had limited sample sizes compared with B1, we randomly down-sampled the larger group in each comparison to the size of the smaller group; thus, to predict B2 and B3 from B1, B1 was randomly down-sampled to mitigate sampling bias. With this approach, the model predicted B2 from B3 with a mean AUC of 0.84 and 80% accuracy (Fig. 5c), B2 from B1 with an AUC of 0.72 and 62% accuracy (Fig. 5b), and B3 from B1 with an AUC of 0.68 and 68% accuracy (Fig. 5d). Likewise, for inflammatory status, the inflamed samples were down-sampled to the number of non-inflamed samples. The model better distinguished non-inflamed (excluding L2) from inflamed samples, with an AUC of 0.63 and 72% accuracy (Fig. 5e); prediction was poorer when non-inflamed and L2 samples together were tested against inflamed samples (AUC 0.55; 61% accuracy) (Fig. 5f). Sample sizes for each comparison and five-fold cross-validation results for disease status, disease behavior, and inflammatory status are provided in Additional file 11: Table S10, Additional file 12: Table S11, Additional file 13: Table S12, Additional file 14: Table S13, Additional file 15: Table S14, and Additional file 16: Table S15.
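The AUC values above summarize how well classifier scores rank one class above the other. As a sketch, the AUC can be computed from any per-sample score (for example, the fraction of trees voting for the positive class) through its Mann–Whitney interpretation: the probability that a randomly chosen positive sample outscores a randomly chosen negative one, with ties counted as one half. The function name `roc_auc` and the 0/1 label encoding are assumptions; the text does not describe how its AUCs were computed.

```python
def roc_auc(labels, scores):
    """ROC AUC via the Mann-Whitney statistic.

    labels: 1 for the positive class, 0 for the negative class.
    scores: higher means 'more positive', e.g. a fraction of tree votes.
    Ties between a positive and a negative score count as one half.
    """
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))  # 0.75
```

The pairwise loop is quadratic in sample size, which is fine for cohorts of a few hundred samples; a rank-based formula gives the same value faster.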
While changes in ncRNA during CD have been documented [6, 10, 11, 28, 29], little is known about the role of ncRNAs in different locations of the intestine. At diagnosis, the location of the diseased mucosal biopsy obtained through colonoscopy often informs the diagnosis of IBD subtype: inflammation in the ileum is most common in CD, whereas rectal inflammation is common in UC. However, CD often manifests at multiple locations along the intestine. Using the same IBD patients from the RISK cohort, our previous transcriptomic analysis of protein-coding genes, together with a combined analysis with genotypes (eQTL) of both UC and CD in different tissue types (ileum and rectum), showed that transcriptomic and eQTL signatures are distinct to disease characteristics. Herein, we applied a similar approach to profile ncRNAs in relation to disease location and behavior, and expanded on previous analyses by including a larger number of CD patients. Importantly, we showed that ncRNA changes correlate with clinical subtypes originally diagnosed by the physician; thus, ncRNA profiling could potentially be applied as a tool to categorize CD and the corresponding inflamed disease location.
We previously observed that gene signatures associated with the development of complications in CD are ileal-specific and often involve genes that produce extracellular matrix (ECM) during progression from B1 to B2 CD. In the current study, we found the highest degree of DE ncRNAs in the ileum, consistent with ileal-centric disease, but also observed multiple DE ncRNAs associated with the ECM in both the ileum and rectum. For example, the ncRNA AC016735.2 has been observed to regulate COL1A1 and COL1A2, which are involved in collagen organization (collagen being a prevalent component of the stromal ECM), and we found AC016735.2 to be the most differentially expressed ncRNA in B1 and B3 disease. Because the intestinal epithelium requires a properly functioning ECM to establish a barrier against luminal contents that prevents adverse immune responses, tracking early signs of collagen dysregulation by monitoring ncRNA levels may be an important diagnostic tool and a potential target for therapy. The involvement of AC016735.2 in other gastrointestinal diseases, such as gastric cancer, suggests that imbalances in this molecule’s function may play a negative role in IBD outcomes. Thus, by mapping such changes through ncRNA profiling, together with the locations where they occur, we have characterized an ncRNA profile of CD patients and provided further insight into a potential epigenetic source of disease manifestation within the mucosa, i.e., ncRNA dysregulation.
In a Danish cohort of 213 CD patients, individuals with ileal disease (L1) and stricturing behavior (B2) had the highest risk of surgical intervention. In our results based on CD classifications, individuals with B2 CD had the lowest correlation coefficient, independent of tissue type, compared with B1 and B3. Likewise, ileal (L1) CD exhibited the lowest correlation coefficients in both the rectal and ileal datasets. Patients with B2 CD and ileal (L1) disease localization were the most distinct in ncRNA transcriptomic profiles. These results indicate that levels of non-coding genetic elements reflect distinct changes in CD patients that correlate with other clinical indicators and diagnostic indexes, further supporting their potential use as biomarkers.
Consistent with our previous reports and Braga-Neto et al., we detected increased LINC01272 levels (1.65× DE; ~2.9 log2FC) in CD versus controls in both tissue types. Across the ileum and rectum samples, multiple DE ncRNAs, such as OVCH1-AS1, RP11-143J24.4, and RP11-184E9.1, showed increased expression in CD versus controls. Notably, RN7SL2 was the only down-regulated DE ncRNA observed in both the rectum and ileum, with significantly lower levels in CD samples. RN7SL2 encodes a 7SL RNA that scaffolds the cytoplasmic ribonucleoprotein complex known as the signal recognition particle (SRP). This type of non-coding ribonucleoprotein (ncRNP) is conserved across multiple species, and mammalian SRP comprises six proteins and one RNA molecule, RN7SL2.
RN7SL2 (RNSRP2) was down-regulated in both ileal and rectal biopsies of pediatric CD patients compared with controls. It was the only ncRNA in the assessment panel that was statistically significant across disease status (CD versus controls) and disease behavior (B1, B2) in both the ileal and rectal mucosa. This suggests that certain non-coding genetic elements associated with CD may be detected across different tissues and sites of disease manifestation with similar DE trends. RN7SL2 functions in delivering secretory proteins into the lumen of the endoplasmic reticulum (ER), in some cases via co-translational insertion. In addition, an alternative conformation of 7SL RNA (RN7SL2) forms a propeller-like secondary structure, with two hairpins joined by a tetranucleotide bulge loop at its 5′ end, which increases its topological efficiency and allows up-regulated activation of pol III transcription. Thus, changes in this molecule could potentially affect protein processing through the ER, export to the cell surface, and vesicle release as microparticles, and could even affect protein translation at the ribosome by altering pol III transcription of rRNA and tRNA. However, how changes in RN7SL2 levels drive the onset and/or progression of CD requires further analysis.
Collectively, our random forest results show that ncRNAs could be used to discriminate CD from non-IBD controls with good accuracy. Nevertheless, our sample sizes for disease behavior and inflammatory status were too limited to achieve better prediction models. Together with the clinical disease classifiers, these results further support the utility of non-coding elements as potential biomarkers, prognostic tools, and pharmaceutical targets for therapy.
Taken together, our studies reveal a clear change in mucosal ncRNA levels during distinct phases of CD that correlates well with clinical classifiers. The predominant changes were ileal-specific signatures that likely involve both the ECM and factors that regulate protein function at the ribosome, ER, and nucleus. ncRNA changes are thus promising indexes of disease behavior and could potentially serve as therapeutic targets for distinct stages of CD.
Our study has shown that ncRNAs in the intestinal mucosa change during CD in both the ileum and the rectum and correlate well with clinical indicators, with the largest share of changes occurring in ileal tissue, reflecting the ileal-centric nature of CD. Because these signatures correlate with severe disease and disease location, they may be useful indicators of disease status. Although our analysis cannot determine whether the changes in ncRNA levels reflect cellular repair or contribute to mucosal injury, the dysregulated ncRNA levels in the mucosa of CD patients suggest that ncRNAs play a role in CD and might have clinical utility in the early identification and characterization of disease progression.
Availability of data and materials
The ileal bulk-biopsy RNA sequencing data included in this study have been deposited under BioProject accession PRJNA594730 (https://www.ncbi.nlm.nih.gov/bioproject/PRJNA594730). The rectal bulk-biopsy RNA sequencing data are deposited under GEO series accession GSE117993. ncRNA annotation was performed using GENCODE v28 (HG38) (https://www.ncbi.nlm.nih.gov/assembly/GCF_000001405.26/) and Ensembl (http://useast.ensembl.org).
Abbreviations
- IBD: Inflammatory bowel disease
- DE ncRNAs: Differentially expressed non-coding RNAs
- FDR: False discovery rate
- p. adj: Adjusted p value after multiple test corrections
- ROC: Receiver operating characteristic curve
- AUC: Area under curve
Adams SM, Bornemann PH. Ulcerative colitis. Am Fam Physician. 2013;87:699–705.
Nobrega VG, et al. The onset of clinical manifestations in inflammatory bowel disease patients. Arq Gastroenterol. 2018;55:290–5.
Fiocchi C. Inflammatory bowel disease: etiology and pathogenesis. Gastroenterology. 1998;115:182–205.
Ray G, Longworth MS. Epigenetics, DNA organization, and inflammatory bowel disease. Inflamm Bowel Dis. 2019;25:235–47.
Bautista RR, et al. Correction to: Long non-coding RNAs: implications in targeted diagnoses, prognosis, and improved therapeutic strategies in human non- and triple-negative breast cancer. Clin Epigenet. 2018;10:106.
Lv X, et al. FAL1: A critical oncogenic long non-coding RNA in human cancers. Life Sci. 2019;236:116918.
Niu YW, Wang GH, Yan GY, Chen X. Integrating random walk and binary regression to identify novel miRNA-disease association. BMC Bioinform. 2019;20:59.
Thurgate LE, Lemberg DA, Day AS, Leach ST. An overview of inflammatory bowel disease unclassified in children. Inflamm Intest Dis. 2019;4:97–103.
Agrawal M, Burisch J, Colombel JF, Shah SC. Viewpoint: Inflammatory bowel diseases among immigrants from low- to high-incidence countries: opportunities and considerations. J Crohns Colitis. 2019;14:267–73.
Haberman Y, et al. Long ncRNA landscape in the ileum of treatment-naive early-onset Crohn disease. Inflamm Bowel Dis. 2018;24:346–60.
Wang S, et al. KIF9AS1, LINC01272 and DIO3OS lncRNAs as novel biomarkers for inflammatory bowel disease. Mol Med Rep. 2018;17:2195–202.
Kugathasan S, et al. Prediction of complicated disease course for children newly diagnosed with Crohn’s disease: a multicentre inception cohort study. Lancet. 2017;389:1710–8.
Moon JS, et al. Clinical characteristics and postoperative outcomes of patients presenting with upper gastrointestinal tract Crohn. Ann Coloproctol. 2020;36:243–8.
Haberman Y, et al. Corrigendum. Pediatric Crohn disease patients exhibit specific ileal transcriptome and microbiome signature. J Clin Investig. 2015;125:1363.
Kelly D, et al. Microbiota-sensitive epigenetic signature predicts inflammation in Crohn’s disease. JCI Insight. 2018;3:e122104.
Pouzat C, Chaffiol A. Automatic spike train analysis and report generation. An implementation with R, R2HTML and STAR. J Neurosci Methods. 2009;181:119–44.
Robinson MD, McCarthy DJ, Smyth GK. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2010;26:139–40.
McCarthy DJ, Chen YS, Smyth GK. Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation. Nucleic Acids Res. 2012;40:4288–97.
Varet H, Brillet-Gueguen L, Coppee JY, Dillies MA. SARTools: A DESeq2- and EdgeR-based R pipeline for comprehensive differential analysis of RNA-seq data. PLoS ONE. 2016;11:e0157022.
Loraine AE, et al. Analysis and visualization of RNA-Seq expression data using RStudio, bioconductor, and integrated genome browser. Methods Mol Biol. 2015;1284:481–501.
Allen GI, Maletic-Savatic M. Sparse non-negative generalized PCA with applications to metabolomics. Bioinformatics. 2011;27:3029–35.
Mann M, Wright PR, Backofen R. IntaRNA 2.0: enhanced and customizable prediction of RNA-RNA interactions. Nucleic Acids Res. 2017;45:W435–9.
Wright PR, et al. CopraRNA and IntaRNA: predicting small RNA targets, networks and interaction domains. Nucleic Acids Res. 2014;42:W119–23.
Mohammadi A, Kelly OB, Smith MI, Kabakchiev B, Silverberg MS. Differential miRNA expression in ileal and colonic tissues reveals an altered immunoregulatory molecular profile in individuals with Crohn’s disease versus healthy subjects. J Crohns Colitis. 2019;13:1459–69.
Su Y, et al. Regulatory non-coding RNA: new instruments in the orchestration of cell death. Cell Death Dis. 2016;7:e2333.
Becker MM, Lapouge K, Segnitz B, Wild K, Sinning I. Structures of human SRP72 complexes provide insights into SRP RNA remodeling and ribosome interaction. Nucleic Acids Res. 2017;45:470–81.
Goldstein BA, Hubbard AE, Cutler A, Barcellos LF. An application of Random Forests to a genome-wide association dataset: methodological considerations and new findings. BMC Genet. 2010;11:49.
Braga-Neto MB, et al. Deregulation of long intergenic non-coding RNAs in CD4+ T cells of lamina propria in Crohn’s disease through transcriptome profiling. J Crohns Colitis. 2020;14:96–109.
Man HJ, Marsden PA. LncRNAs and epigenetic regulation of vascular endothelium: genome positioning system and regulators of chromatin modifiers. Curr Opin Pharmacol. 2019;45:72–80.
Venkateswaran S, et al. Bowel location rather than disease subtype dominates transcriptomic heterogeneity in pediatric IBD. Cell Mol Gastroenterol Hepatol. 2018;6:474–6.
Wang Y, Zhang J. Identification of differential expression lncRNAs in gastric cancer using transcriptome sequencing and bioinformatics analyses. Mol Med Rep. 2018;17:8189–95.
Okamoto R, Watanabe M. Role of epithelial cells in the pathogenesis and treatment of inflammatory bowel disease. J Gastroenterol. 2016;51:11–21.
Lo B, et al. Changes in disease behaviour and location in patients with Crohn’s disease after seven years of follow-up: a Danish population-based inception cohort. J Crohns Colitis. 2018;12:265–72.
Braga Neto MB, Ramos GP, Loftus EV Jr, Faubion WA, Raffals LE. Use of immune checkpoint inhibitors in patients with pre-established inflammatory bowel diseases: retrospective case series. Clin Gastroenterol Hepatol. 2020. https://doi.org/10.1016/j.cgh.2020.06.031.
Massenet S. In vivo assembly of eukaryotic signal recognition particle: A still enigmatic process involving the SMN complex. Biochimie. 2019;164:99–104.
Ullu E, Weiner AM. Human genes and pseudogenes for the 7SL RNA component of signal recognition particle. EMBO J. 1984;3:3303–10.
Englert M, Felis M, Junker V, Beier H. Novel upstream and intragenic control elements for the RNA polymerase III-dependent transcription of human 7SL RNA genes. Biochimie. 2004;86:867–74.
Data interpretation and analysis were also supported by the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) of the National Institutes of Health under grant numbers R01-DK098231 and R01-DK087694 to S.K.
Ethics approval and consent to participate
The institutional review board (IRB) at Emory University reviewed and approved the protocols. Written informed consent was obtained from parents or guardians, and all patients and participants provided appropriate written assent. The IRB study identifier is Emory University IRB00012206.
Competing interests
The authors have no competing interests.
List of fifteen types of non-coding RNAs, categorized by nucleotide length.
Overall workflow of analysis: Differential expression analysis was performed on n = 345 ileal and n = 390 rectal bulk biopsies stored in RNAlater. The biopsies were sequenced in a single batch using Illumina deep RNA sequencing. Reads were aligned to the hg38 reference with the STAR package. This yielded 58,381 transcripts, of which n = 20,779 were ncRNAs. edgeR was used to test these ncRNAs for differential expression in three comparisons: disease status, disease behavior, and disease inflammation. For each analysis, the figure shows the total number of DE ncRNAs in the ileal and rectal datasets.
Fig. S2. (a) The log2FC values obtained from the case-control analysis of all 20,779 ncRNAs were compared between ileal and rectal biopsies. Each point represents one ncRNA and is colored by whether it is tissue-specific or disease-specific. (b) We compared DE results from a subset of newly diagnosed CD patients (n = 133) in the ileal dataset (n = 345) with those of our previous study on the same RISK cohort (Haberman et al. 2018). The strong positive correlation between the two sets of log2FC values supports the reliability and replicability of our analysis.
Fig. S3. Gene Ontology analysis of Crohn’s disease versus controls DE ncRNAs in ileal biopsies: Gene ontology analysis was conducted in topGO on the n = 89 DE ncRNAs in ileal biopsies. The results showed significant (FDR < 0.05) hits in (a) cellular components (n = 79) and (b) biological processes (n = 21).
Fig. S4. Gene Ontology analysis of Crohn’s disease versus controls DE ncRNAs in rectal biopsies: Gene ontology analysis was conducted in topGO on the n = 41 DE ncRNAs in rectal biopsies. The results showed significant (FDR < 0.05) hits in (a) cellular components (n = 12) and (b) biological processes (n = 17).
Fig. S5. Principal components and volcano plots of DE ncRNAs by Crohn’s disease behavior: PCs were calculated from the entire list of ncRNAs (n = 20,779), and the first two PCs were plotted. Each point represents a subject. The plots show the CD disease behavior groups separating from controls in ileal biopsies for (a) B1 versus controls, (b) B2 versus controls, and (c) B3 versus controls. Volcano plots show the DE ncRNAs identified in (d) B1 versus controls (n = 70), (e) B2 versus controls (n = 124), and (f) B3 versus controls (n = 22).
Fig. S6. Types of ncRNAs among Crohn’s disease behavior DE ncRNAs: Sankey plots of the DE ncRNAs observed in ileal samples when comparing (a) B2 versus B1 (n = 35) and (b) B3 versus B2 (n = 14). These DE ncRNAs were correlated with protein-coding genes (n = 18,927) from the same dataset, and ncRNA-mRNA pairs with correlations > 0.50 were deemed significant. The correlated ncRNA-mRNA pairs were then classified as cis (same strand) or trans (different strands). The coding strand is represented by (−) and the non-coding strand by (+).
Fig. S7. Principal component analysis of ncRNAs in CD patients with available inflammation status: The disease location classifiers L1, L2, and L3 were used to define inflammation status. CD patients with disease at L1, L2, or L3 were considered inflamed, and the rest were considered non-inflamed. The first two PCs were plotted to look for clusters by inflammation status in (a) ileal (n = 345) and (b) rectal (n = 390) biopsies.
Fig. S8. Principal component analysis of ncRNAs by inflammation location: PCs were computed with princomp from all n = 20,779 ncRNAs to visualize trends among the disease inflammation location groups across all CD patients. PCs were calculated for inflamed versus non-inflamed samples. (a) CD patients with ileal disease (L1 + L3, n = 198) formed the inflamed group. This group was compared against (b) the non-inflamed group (n = 20) and (c) the non-inflamed + L2 group (n = 76). L2 patients were added to the non-inflamed group because their inflammation is in the colonic region.
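The ncRNA-mRNA pairing step described for Fig. S6 can be sketched as below. This is a minimal illustration under stated assumptions. The caption does not name the correlation measure, so Pearson correlation is assumed here. The gene names and expression values are hypothetical toy data. The cis/trans labels follow the caption's own same-strand/different-strand definition.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length (>= 2)")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sxy / sqrt(sxx * syy)

def correlated_pairs(ncrnas, mrnas, threshold=0.50):
    """Return (ncRNA, mRNA, r, relation) for every pair with r > threshold.

    Each dict maps a gene name to (strand, expression values over the same samples).
    relation follows the caption: 'cis' = same strand, 'trans' = different strands.
    """
    pairs = []
    for nc_name, (nc_strand, nc_expr) in ncrnas.items():
        for m_name, (m_strand, m_expr) in mrnas.items():
            r = pearson(nc_expr, m_expr)
            if r > threshold:
                relation = "cis" if nc_strand == m_strand else "trans"
                pairs.append((nc_name, m_name, r, relation))
    return pairs

# Hypothetical toy data: one DE ncRNA and three protein-coding genes over four samples.
ncrnas = {"ncRNA_A": ("+", [1.0, 2.0, 3.0, 4.0])}
mrnas = {
    "GENE_X": ("+", [2.0, 4.0, 6.0, 8.0]),  # perfectly co-expressed, same strand
    "GENE_Y": ("-", [4.0, 3.0, 2.0, 1.0]),  # anti-correlated, filtered out
    "GENE_Z": ("-", [1.0, 3.0, 2.0, 5.0]),  # moderately correlated, opposite strand
}
for row in correlated_pairs(ncrnas, mrnas):
    print(row)
```

The threshold keeps only positive correlations above 0.50, as the caption states. Whether anti-correlated pairs were also considered is not stated, so they are dropped here.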
Cohort metadata chi-squared tests: Chi-squared tests of patient characteristics (sex, age, disease type, disease behavior, and inflammatory status) in the ileal (n = 345) and rectal (n = 390) datasets.
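The caption does not give the test settings. As a rough illustration, a Pearson chi-squared test of independence on a contingency table (for example sex by CD/control status) can be sketched as below. Several assumptions apply. The statistic is computed without a continuity correction; R's `chisq.test` applies Yates' correction to 2 x 2 tables by default, so its values can differ. The p-value helper covers only the 1-degree-of-freedom case. The counts are hypothetical, not cohort data.

```python
from math import erfc, sqrt

def chi_squared(table):
    """Pearson chi-squared statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total  # count expected under independence
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

def p_value_df1(stat):
    """Upper-tail p-value of a chi-squared statistic with 1 degree of freedom."""
    # With 1 df, X = Z**2, so P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return erfc(sqrt(stat / 2))

# Hypothetical 2 x 2 table: rows = male / female, columns = CD / control.
table = [[30, 10],
         [20, 20]]
stat, df = chi_squared(table)
print(stat, df, p_value_df1(stat))
```

Continuous characteristics such as age would first need to be binned into categories before they could be tested this way.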
Case-control DE analysis results for ileal (n = 345) and rectal (n = 390) biopsies, comparing disease status in the ileal (CD = 274 vs. CTRL = 71) and rectal (CD = 329 vs. CTRL = 61) datasets. All FDR-significant DE ncRNAs and all nominally significant (P < 0.05) ncRNAs from the ileal (n = 3520) and rectal (n = 1447) datasets are provided.
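The split into FDR-significant DE ncRNAs and nominally significant ncRNAs can be illustrated with a short sketch. It assumes Benjamini-Hochberg adjustment, which is the default `adjust.method` in edgeR's `topTags`. It also reads the FC > 2 criterion as |log2FC| > 1 in either direction, which is an assumption. The input rows are hypothetical stand-ins for edgeR output.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down; the running minimum keeps adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def split_hits(results, fdr=0.05, min_abs_log2fc=1.0, nominal=0.05):
    """Separate (name, log2FC, p) rows into FDR-significant DE hits and nominal-only hits."""
    padj = bh_adjust([p for _, _, p in results])
    de, nominal_only = [], []
    for (name, log2fc, p), q in zip(results, padj):
        if q < fdr and abs(log2fc) > min_abs_log2fc:
            de.append(name)
        elif p < nominal:
            nominal_only.append(name)
    return de, nominal_only

# Hypothetical edgeR-style rows: (ncRNA, log2FC, unadjusted p-value).
results = [("nc1", 2.0, 0.001), ("nc2", -1.5, 0.01), ("nc3", 0.5, 0.002),
           ("nc4", 3.0, 0.2), ("nc5", 1.2, 0.5)]
print(split_hits(results))
```

In this toy example, `nc3` passes the FDR cut-off but not the fold-change cut-off, so it lands only in the nominal list.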
Ileum Gene Ontology: topGO analysis of the 89 DE ncRNAs from ileal samples (CD versus controls) returned 136 GO terms: 36 molecular function, 79 cellular component, and 21 biological process terms.
Rectum Gene Ontology: topGO analysis of the 41 DE ncRNAs from rectal samples (CD versus controls) returned 36 GO terms: 7 molecular function, 12 cellular component, and 17 biological process terms.
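topGO scores GO terms with tests such as Fisher's exact test. It also offers several algorithms (for example classic, elim and weight01) that treat the GO graph differently, and the captions do not state which settings were used. The sketch below shows only the classic one-sided Fisher (hypergeometric upper-tail) test for a single term, using hypothetical counts. It ignores the GO hierarchy. The per-term p-values would still need multiple-testing correction before the FDR < 0.05 cut-off could be applied.

```python
from math import comb

def go_term_p(universe, term_size, list_size, hits):
    """One-sided hypergeometric p-value P(X >= hits) for a single GO term.

    universe:  annotated genes in the background
    term_size: background genes annotated to the term
    list_size: genes in the input list (e.g. annotated DE ncRNAs)
    hits:      input genes annotated to the term
    """
    upper = min(term_size, list_size)
    # Sum the probabilities of observing `hits` or more term genes in the list.
    tail = sum(comb(term_size, i) * comb(universe - term_size, list_size - i)
               for i in range(hits, upper + 1))
    return tail / comb(universe, list_size)

# Hypothetical counts for one term; real values depend on the annotation background.
print(go_term_p(universe=1000, term_size=50, list_size=89, hits=12))
```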
Ileum Disease Behavior DE ncRNAs: DE analysis results for ileal (n = 345) biopsies, comparing each disease behavior against controls and the CD disease behaviors against each other. All FDR-significant DE ncRNAs with log2FC > 1 and all nominally significant (P < 0.05) ncRNAs are listed for B1 versus controls (n = 70), B2 versus controls (n = 124), and B3 versus controls (n = 22).
Rectum Disease Behavior DE ncRNAs: DE analysis results for rectal (n = 390) biopsies, comparing each disease behavior against controls and the CD disease behaviors against each other. All FDR-significant DE ncRNAs and all nominally significant (P < 0.05) ncRNAs are listed for B1 versus controls (n = 23), B2 versus controls (n = 9), and B3 versus controls (n = 14).
Intra-Disease Behavior DE ncRNAs: DE analysis results for ileal (n = 345) biopsies, comparing the disease behaviors against each other. All FDR-significant DE ncRNAs and all nominally significant (P < 0.05) ncRNAs are listed for B2 versus B1 (n = 35), B3 versus B1 (n = 13), and B3 versus B2 (n = 14).
Inflammation and disease location specific DE ncRNAs: DE analysis results for ileal (n = 345) biopsies, comparing disease inflammation status (non-inflamed, inflamed) against controls and within CD. All FDR-significant DE ncRNAs and all nominally significant (P < 0.05) ncRNAs are listed. Two comparisons were made. The L1 + L3 ileal inflamed group (n = 198) versus the non-inflamed ileal group (n = 20) identified 31 DE ncRNAs. The same inflamed group versus the non-inflamed + L2 group (n = 47; 20 non-inflamed and 27 L2) identified 21 DE ncRNAs.
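The two ileal comparisons above differ only in where L2 patients are placed. A minimal sketch of the grouping follows. It assumes each CD sample carries a single Montreal-style location label (L1 ileal, L2 colonic, L3 ileocolonic), or no label when non-inflamed. The encoding and sample IDs are hypothetical.

```python
def inflammation_groups(cd_samples):
    """Build the ileal comparison groups described in the caption.

    cd_samples: list of (sample_id, location) pairs for CD patients, where location is
    'L1', 'L2', 'L3', or None when no inflamed location was recorded (hypothetical encoding).
    """
    inflamed = [s for s, loc in cd_samples if loc in ("L1", "L3")]  # ileal involvement
    non_inflamed = [s for s, loc in cd_samples if loc is None]
    colonic_only = [s for s, loc in cd_samples if loc == "L2"]       # inflamed outside the ileum
    return {
        "inflamed": inflamed,
        "non_inflamed": non_inflamed,
        "non_inflamed_plus_L2": non_inflamed + colonic_only,
    }

# Hypothetical samples.
samples = [("s1", "L1"), ("s2", "L3"), ("s3", None), ("s4", "L2"), ("s5", "L1")]
groups = inflammation_groups(samples)
for name, members in groups.items():
    print(name, members)
```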
Random Forest, disease versus controls: The n = 345 ileal samples were randomly downsampled to equal class sizes (n = 71 controls and n = 71 CD). These were split into training and test datasets for cross-validation of AUC, using the n = 13,777 ncRNAs with base mean > 10. The model classified disease versus control status, and its accuracy on the test data was evaluated by sensitivity and specificity.
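The class balancing and feature filtering in these Random Forest captions can be sketched as follows. Several assumptions apply. "Base mean" is taken to be the mean normalized count of each ncRNA across samples. The split is a single stratified 50/50 split within each class, because the captions mention equal test and train datasets but do not describe the cross-validation scheme. The sample IDs are hypothetical. The forest itself (for example R's randomForest package or scikit-learn's RandomForestClassifier) is not shown.

```python
import random

def base_mean_filter(counts, min_mean=10.0):
    """Keep features whose mean (normalized) count across samples exceeds min_mean."""
    return {f: v for f, v in counts.items() if sum(v) / len(v) > min_mean}

def balanced_split(cases, controls, test_fraction=0.5, seed=1):
    """Downsample both classes to the smaller class size, then split each class
    into train/test with the same fraction so both sets stay balanced."""
    rng = random.Random(seed)  # fixed seed for a reproducible split
    n = min(len(cases), len(controls))
    n_test = round(n * test_fraction)
    train, test = [], []
    for label, group in (("case", cases), ("control", controls)):
        chosen = rng.sample(list(group), n)  # random subset without replacement
        test += [(s, label) for s in chosen[:n_test]]
        train += [(s, label) for s in chosen[n_test:]]
    return train, test

# Hypothetical IDs mirroring the ileal class sizes (274 CD, 71 controls).
cd = [f"CD{i}" for i in range(274)]
ctrl = [f"CTRL{i}" for i in range(71)]
train, test = balanced_split(cd, ctrl)
print(len(train), len(test))
print(base_mean_filter({"ncA": [4.0, 6.0], "ncB": [20.0, 35.0]}))
```

Downsampling the larger class before splitting keeps a classifier from reaching high accuracy simply by predicting the majority class.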
Random Forest, B1 versus B2: The n = 274 ileal CD patients were randomly downsampled to equal class sizes (n = 14 B1 and n = 14 B2). These were split into training and test datasets for cross-validation of AUC, using the n = 13,777 ncRNAs with base mean > 10. The model classified B1 versus B2 disease behavior, and its accuracy on the test data was evaluated by sensitivity and specificity.
Random Forest, B2 versus B3: The n = 274 ileal CD patients were randomly downsampled to equal class sizes (n = 14 B2 and n = 14 B3). These were split into training and test datasets for cross-validation of AUC, using the n = 13,777 ncRNAs with base mean > 10. The model classified B2 versus B3 disease behavior, and its accuracy on the test data was evaluated by sensitivity and specificity.
Random Forest, B1 versus B3: The n = 274 ileal CD patients were randomly downsampled to equal class sizes (n = 14 B1 and n = 14 B3). These were split into training and test datasets for cross-validation of AUC, using the n = 13,777 ncRNAs with base mean > 10. The model classified B1 versus B3 disease behavior, and its accuracy on the test data was evaluated by sensitivity and specificity.
Random Forest, inflamed versus non-inflamed: From the n = 274 ileal CD patients, the input comprised n = 20 non-inflamed and n = 20 inflamed (L1 = 10, L3 = 10) patients. These were randomly split into equal training and test datasets for cross-validation of AUC, using the n = 13,777 ncRNAs with base mean > 10. The model classified disease inflammation status, and its accuracy was evaluated by sensitivity and specificity.
Random Forest, inflamed versus non-inflamed + L2: From the n = 274 ileal CD patients, the input comprised n = 76 non-inflamed patients (including L2 = 56) and n = 76 inflamed patients (L1 = 38, L3 = 38). These were randomly split into equal training and test datasets for cross-validation of AUC, using the n = 13,777 ncRNAs with base mean > 10. The model classified disease inflammation status, and its accuracy was evaluated by sensitivity and specificity.
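These captions report test-set performance as AUC plus sensitivity and specificity. The sketch below assumes the forest outputs a score for each test sample (for example the fraction of trees voting for the positive class). It also assumes a 0.5 cut-off for sensitivity and specificity, and the scores are hypothetical. AUC uses the Mann-Whitney formulation: the fraction of positive/negative pairs in which the positive sample scores higher, with ties counted as one half.

```python
def auc(scores, labels):
    """ROC AUC as P(score of a random positive > score of a random negative), ties = 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos) * len(neg))

def sens_spec(scores, labels, threshold=0.5):
    """Sensitivity and specificity when samples scoring >= threshold are called positive."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical test-set scores; label 1 = positive class (e.g. inflamed), 0 = negative.
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
labels = [1, 1, 1, 0, 0, 0]
print(auc(scores, labels), sens_spec(scores, labels))
```

AUC does not depend on any cut-off, whereas sensitivity and specificity shift with the chosen threshold. This is why the two kinds of summary are usually reported together.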
Cite this article
Pelia, R., Venkateswaran, S., Matthews, J.D. et al. Profiling non-coding RNA levels with clinical classifiers in pediatric Crohn’s disease. BMC Med Genomics 14, 194 (2021). https://doi.org/10.1186/s12920-021-01041-7