Biomarker discovery: quantification of microRNAs and other small non-coding RNAs using next generation sequencing
© Lopez et al. 2015
Received: 16 March 2015
Accepted: 16 June 2015
Published: 1 July 2015
Small ncRNAs (sncRNAs) offer great hope as biomarkers of disease and response to treatment. This has been highlighted in the context of several medical conditions such as cancer, liver disease, cardiovascular disease, and central nervous system disorders, among many others. Here we assessed several steps involved in the development of an ncRNA biomarker discovery pipeline, ranging from sample preparation to bioinformatic processing of small RNA sequencing data.
A total of 45 biological samples were included in the present study. All libraries were prepared using the Illumina TruSeq Small RNA protocol and sequenced using the HiSeq2500 or MiSeq Illumina sequencers. Small RNA sequencing data was validated using qRT-PCR. At each stage, we evaluated the pros and cons of different techniques that may be suitable for different experimental designs. Evaluation methods included quality of data output in relation to hands-on laboratory time, cost, and efficiency of processing.
Our results show that good quality sequencing libraries can be prepared from small amounts of total RNA and that varying degradation levels in the samples do not have a significant effect on the overall quantification of sncRNAs via NGS. In addition, we describe the strengths and limitations of three commercially available library preparation methods: (1) Novex TBE PAGE gel; (2) Pippin Prep automated gel system; and (3) AMPure XP beads. We describe our bioinformatics pipeline, provide recommendations for sequencing coverage, and describe in detail the expression and distribution of all sncRNAs in four human tissues: whole-blood, brain, heart and liver.
Ultimately, this study provides tools and outcome metrics that will aid researchers and clinicians in choosing an appropriate and effective high-throughput sequencing quantification method for various study designs, and in generating valuable information that can contribute to our understanding of small ncRNAs as potential biomarkers and mediators of biological functions and disease.
Keywords: Biomarker; microRNA; Small non-coding RNA; Next-generation sequencing; Small RNA sequencing; Whole-blood; Brain; Heart; Liver; Clinical samples
There is significant interest in the prediction and early detection of disease through the analysis of biological markers, or biomarkers, which have the potential to significantly improve clinical outcomes [1, 2]. Biomarkers are defined as any molecule derived from a biological sample that can indicate current disease status, evaluate progression of the disease, or assess potential responsiveness to a particular medication . Biomarkers come in many forms including DNA mutations, proteins, and messenger RNA (mRNA) transcripts . For example, ratios of aspartate/alanine aminotransferase are used as a reliable biomarker for liver fibrosis , protein levels of S100-beta are used as a biomarker of treatment response for malignant melanoma , while mutations of the genes BRCA1 and BRCA2 are well known biomarkers predicting the development of breast cancer . DNA methylation is also a well-studied biomarker [8–10]. Though not a focus of the current report, methylated cytosine residues have been associated with several diseases, including cancer and neurological disorders .
Over the years, non-coding RNAs (ncRNAs) have become a focus of biomarker research, an approach that has been used successfully to investigate response to treatment in several medical conditions. There are several types of ncRNAs, of which microRNAs (miRNAs) are the best known and the most frequently assessed for their potential role as biomarkers. MiRNAs have been proposed as molecular biomarkers in cancer , liver and cardiovascular disease [13, 14], and central nervous system disorders [15–18], among many others [19–22]. MiRNAs are small ncRNA molecules that follow a well-characterized biogenesis pathway that includes processing through the DGCR8/DROSHA, Exportin-5, Dicer and RISC molecular complexes . Through post-transcriptional activity, these small, single-stranded, 19–25-nt RNA transcripts regulate the expression of numerous genes. Binding of the miRNA to the complementary sequence of a target mRNA relies on recognition of the seed region, nucleotides 2–8 counted from the 5′ end of the miRNA, and leads to either mRNA degradation or translational repression [19, 21, 24].
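A minimal sketch of the seed-pairing rule described above, in Python: it extracts nucleotides 2–8 of a miRNA and scans a target sequence for their exact reverse complement (a so-called 7mer-m8 site). The let-7a sequence and the short target string are illustrative inputs only, and the sketch deliberately ignores G:U wobble pairs, other site types, 3′ supplementary pairing and site context, all of which real target-prediction tools take into account.

```python
# Minimal sketch: locate perfect seed-match sites (miRNA nucleotides 2-8) in a target.
# Inputs are RNA strings (A, C, G, U); G:U wobble pairs are deliberately ignored.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def seed(mirna: str) -> str:
    """Return nucleotides 2-8 of the miRNA, counted from its 5' end."""
    return mirna[1:8]


def seed_site(mirna: str) -> str:
    """Return the 5'->3' target sequence that pairs with the seed (reverse complement)."""
    return seed(mirna).translate(COMPLEMENT)[::-1]


def find_seed_sites(mirna: str, target: str) -> list[int]:
    """Return 0-based start positions of every seed-complementary site in `target`."""
    site = seed_site(mirna)
    return [i for i in range(len(target) - len(site) + 1)
            if target[i:i + len(site)] == site]


let7a = "UGAGGUAGUAGGUUGUAUAGUU"  # hsa-let-7a-5p, used only as an example
print(seed(let7a), seed_site(let7a))           # GAGGUAG CUACCUC
print(find_seed_sites(let7a, "AAACUACCUCAA"))  # [3]
```

Because the site is defined on the seed alone, a single miRNA can match many targets, which is consistent with the one-to-many regulation described in this section.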
Other ncRNA species such as PIWI-interacting RNAs (piRNAs), small nucleolar RNAs (snoRNAs), small nuclear RNAs (snRNAs) and long non-coding RNAs are also gaining support as key components of cellular regulation [19, 25], and thus could also be assessed as biomarkers of disease. PiRNAs are small ncRNAs 24–31 nt in length. In contrast to miRNAs, they are Dicer-independent and interact with the PIWI subfamily of Argonaute proteins, which are involved in the regulation of genome stability [26, 27]. PIWI proteins are involved in gene regulation through RNA degradation and have been linked to DNA methylation . In addition, piRNAs have been reported as potential biomarkers for bladder , breast , and gastric  cancers. SnoRNAs are key components of small nucleolar ribonucleoproteins (snoRNPs), which are responsible for sequence-specific 2′-O-methylation of ribosomal RNA (rRNA) . SnoRNAs have been shown to participate in post-transcriptional regulation of rRNA by targeting snoRNPs in the nucleus . In addition, snoRNAs have been proposed as potential biomarkers for several forms of human cancer [34–36]. Long non-coding RNAs are another class of ncRNAs that have recently gained considerable attention as potential biomarkers [37–40]. They comprise a heterogeneous group of ncRNAs longer than 200 nt, which includes long non-coding RNAs (lncRNAs), large intergenic non-coding RNAs (lincRNAs) and transcribed ultraconserved regions (T-UCRs), among others . LncRNAs are known to regulate DNA methylation by recruiting chromatin remodeling complexes . LincRNAs have been associated with active transcription in regions of transcriptional elongation . Finally, while the function of T-UCRs is still unknown, they have been shown to interact with microRNAs and might have a role in the development of disease . T-UCRs have recently been proposed as potential diagnostic and prognostic biomarkers in colorectal cancer patients .
While any ncRNA is a putative biomarker, miRNAs have received the most attention because they possess several features that render them especially powerful : (1) they are highly conserved, and evolutionary complexity correlates with miRNA complexity, which suggests an important biological function; (2) there are a relatively small number of individual miRNAs with a large dynamic range of expression; (3) they are secreted into circulation and can be measured in all biological fluids; (4) they are not easily degraded and are thus highly stable in clinical samples; (5) they are involved in pathway regulation, as one miRNA can target many genes, and a single gene can be regulated by many different miRNAs; (6) miRNAs show tissue and cell specific expression profiles; and (7) there is a large body of literature supporting their role in the pathophysiology of disease .
Most ncRNA quantification studies performed to date rely on qRT-PCR, in situ hybridization, or microarray techniques. These methods have several strengths but also some important limitations, including the number of miRNA molecules that can be analyzed simultaneously, the amount of RNA required to analyze multiple targets, dependence on the quality and source of the RNA, the sensitivity of detection, and the need for prior knowledge of targets . Next-generation sequencing (NGS) provides researchers with a powerful tool for the detection of RNA molecules in biological samples. NGS offers methodological advantages such as increased throughput, lower RNA input, consistent and high-quality data, greater detection depth, analysis of all RNA populations, and discovery of novel molecules. Furthermore, protocol length, sequencing time, and prices continue to fall, making NGS an ideal tool for biomarker research .
In terms of clinical utility, blood is a reliable and non-invasive source of biological tissue that reflects different stages of disease. Blood samples are relatively easy to collect and can be stored for long periods without a significant effect on the levels of miRNAs and other ncRNAs in whole-blood, plasma or serum . As biomarker research using ncRNAs is still in its infancy, there is not yet a consensus on the best blood fraction for the study of disease. Some studies suggest that whole-blood, peripheral blood mononuclear cells (PBMCs), or white blood cells (WBCs) are good sources for exploring ncRNAs that have been secreted into circulation; these cells can also provide important information on inflammatory states . Others argue that plasma or serum are optimal for investigating ncRNAs that are actively secreted into circulation via exosomes, lipoproteins or protein complexes [50, 51]. Several methods are available for blood collection, storage, and RNA isolation, depending on the source of interest and the study design, for example: (1) PAXgene Blood RNA System, for collection of whole-blood (PreAnalytiX, Switzerland); (2) EDTA-Vacuette tubes, followed by centrifugation, to collect plasma or serum; (3) ExoQuick System for isolation of exosomes (System Biosciences, USA); or (4) LeukoLOCK Total RNA Isolation System, for isolation of RNA from WBCs (Life Technologies, USA). In this study, we used PAXgene tubes, which are intended for easy collection and transport but, more importantly, are optimized for the stabilization of RNA and long-term storage of blood samples. However, PAXgene tubes do not allow any of the blood fractions to be separated, so only whole-blood can be analyzed. Although we did not test blood collection procedures or RNA extraction methods, the source of RNA and the extraction method can have a significant impact on the measured levels of ncRNAs. Prichard et al. provide a comprehensive review on sample collection and processing for miRNA quantification .
The objective of this study is to provide researchers with general guidelines for quantification, data processing and analysis of miRNAs and other small non-coding RNAs (sncRNAs) from human clinical samples using NGS. Here, we test critical, alternative library preparation steps based on the widely used Illumina TruSeq small RNA sequencing methodology, as well as the effects of total RNA input and quality. Additionally, we describe methods for data processing, data analysis, and downstream validation. Finally, we provide expression patterns and distribution of miRNAs and other sncRNAs from human whole-blood, brain, heart, and liver samples. This study provides tools and outcome metrics that will aid researchers and clinicians in choosing an appropriate quantification method, processing large amounts of data efficiently, and generating valuable information that can contribute to our understanding of small non-coding RNAs as potential biomarkers and mediators of biological functions and disease.
A total of 45 biological samples were included in the present study: 1) peripheral blood samples (N = 32) obtained from healthy anonymous volunteers at a community outpatient clinic at the Douglas Mental Health University Institute; 2) postmortem prefrontal cortex brain tissue (N = 4), obtained in collaboration with the Quebec Coroner’s Office and the Douglas-Bell Canada Brain Bank (Douglas Mental Health University Institute, Montreal, Canada); and 3) commercially available human brain (N = 1), heart (N = 4), and liver (N = 4) samples (Ambion). Ethics approval for this study was obtained from the Institutional Review Board of the Douglas Mental Health University Institute, and written informed consent was obtained from volunteers or family members, as appropriate.
Sample processing and RNA extractions
Peripheral blood samples were collected in PAXgene Blood RNA tubes (PreAnalytiX, Switzerland). PAXgene tubes were frozen using a sequential freezing process: tubes were kept at room temperature for 3 h, transferred to 4 °C overnight, held at −20 °C for 6–8 h, and finally stored at −80 °C. Total RNA (including the miRNA fraction) was isolated from whole-blood using the PAXgene Blood miRNA Kit (Qiagen, Canada), according to the manufacturer’s instructions. Total RNA was isolated from frozen brain, heart and liver tissues using the miRNeasy Mini Kit protocol (Qiagen, Canada) with no modifications. RNA and miRNA yield and quality were determined using the NanoDrop 1000 (Thermo Scientific, USA) and the Agilent 2100 Bioanalyzer (Agilent Technologies, USA).
Small RNA library preparation
1) Comparison of small RNA library preparation methods (Fig. 1a)
2) Testing RNA input amounts for small RNA library preparation (Fig. 1b)
3) Exploring the effects of RNA quality on small RNA library preparation (Fig. 1c)
4) Testing sequencing coverage for small RNA sequencing (Fig. 1d)
5) Characterization of ncRNA expression patterns in four human tissues (Fig. 1e)
Library preparation methods
Purification by Novex TBE PAGE gel: 50 μl of amplified cDNA from samples A1-A3 and C1 were loaded into a 6 % Novex gel and run for 80 min at 130–135 V. After cleaning the gel with RNase free water, a band was manually cut to contain all fragments sized 145–160 nt, corresponding to mature miRNAs and other regulatory small RNA molecules (Additional file 1: Figure S1).
Purification by Pippin Prep automated gel system (Sage 3 %): The Pippin Prep system (PPS) allows automatic selection of specified cDNA products. 25 μl of amplified cDNA from samples A4-A7 were loaded into a Pippin Prep machine. Furthermore, in order to test variability between machines, samples A4 and A5 were loaded into PPS1, while samples A6 and A7 were loaded into PPS2. Size selection was automated for products between 125 and 180 nt (Additional file 1: Figure S2).
Purification by AMPure XP beads: AMPure XP beads are carboxyl-coated paramagnetic beads that bind DNA fragments above a size cutoff set by the ratio of bead suspension to sample volume. 50 μl of amplified cDNA from samples A8-A10 were mixed with AMPure XP beads at a 1.8:1 ratio (beads:sample) and purified twice. This ratio retains all products larger than 100 nt.
Libraries were validated and quantified using an Agilent 2100 Bioanalyzer High Sensitivity DNA chip and qRT-PCR with the KAPA library quantification kit (Kapa Biosystems, USA). Sample C1 (control-human brain) was not sequenced. All additional samples (A1-A10), as well as sample AC (control-no purification method), were sequenced.
Total RNA input amounts
Next, we tested the optimal amount of total RNA input required to prepare small RNA libraries from peripheral blood samples. As previously done, we split total RNA from the same individual into 5 aliquots and each was used as a technical replicate. We prepared 5 additional libraries, starting with different amounts of RNA: A11 (1 μg), A12 (0.5 μg), A13 (0.25 μg), A14 (0.1 μg), and A15 (0.05 μg) (Fig. 1b). All 5 libraries were purified using PPS and validated using an Agilent 2100 Bioanalyzer High Sensitivity DNA chip and qRT-PCR with the KAPA library quantification kit.
Effects of RNA integrity
We also explored the effects of RNA integrity on library preparation for small RNA sequencing. To address this issue, we selected peripheral blood samples from 15 healthy volunteers. These samples were collected and processed following the same protocols as previously described, but were selected based on varying RNA integrity number (RIN) values. These values represent the level of RNA degradation in the sample, where 10 and 0 are the highest and lowest quality scores, respectively. The 15 samples were split into 5 groups with average RIN values of 9, 7, 5.4, 2.2 and 0 (Fig. 1c). Small RNA libraries were prepared as previously described, validated and quantified using an Agilent 2100 Bioanalyzer High Sensitivity DNA chip and qRT-PCR with the KAPA library quantification kit.
Small RNA sequencing coverage
Next, we tested how sequencing depth affects the amount of information obtained from whole-blood samples. We prepared small RNA libraries using total RNA from an additional 12 healthy volunteers, as previously described. All 12 libraries were pooled and sequenced on both the Illumina HiSeq2500 and MiSeq sequencers (Fig. 1d).
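Planning a pooled run comes down to simple arithmetic. The sketch below, a hypothetical helper rather than anything used in this study, divides a run's read yield among equally pooled libraries, optionally discounting the fraction of reads lost to QC filters, and checks the result against the lower bounds of the Illumina ranges quoted in this study (1–2 M reads per sample for expression profiling, 2–5 M for discovery). The 24 M-read yield in the example is a placeholder, not a measured value for either instrument.

```python
# Hypothetical planning helper; run yields are user-supplied, not study values.
# Per-library targets are the lower bounds of the Illumina ranges quoted in this
# study: 1-2 M reads for expression profiling, 2-5 M reads for discovery.
TARGET_READS = {"profiling": 1_000_000, "discovery": 2_000_000}


def reads_per_library(run_reads: float, n_libraries: int,
                      surviving_fraction: float = 1.0) -> float:
    """Expected usable reads per library when a run is shared equally."""
    return run_reads * surviving_fraction / n_libraries


def max_libraries(run_reads: float, application: str,
                  surviving_fraction: float = 1.0) -> int:
    """Largest pool that still gives every library the target read count."""
    return int(run_reads * surviving_fraction // TARGET_READS[application])


print(reads_per_library(24_000_000, 12))           # 2000000.0
print(max_libraries(24_000_000, "profiling"))      # 24
print(max_libraries(24_000_000, "discovery", 0.5)) # 6
```

In practice pooling is never perfectly equal, so a margin above the target is prudent.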
Small ncRNA expression in human whole-blood and brain
To characterize the expression and explore tissue specificity of small ncRNAs in human biological samples, we prepared 16 additional libraries from human whole-blood, brain, heart and liver tissues (Fig. 1e). Brain, heart and liver libraries were prepared with 1 μg of total RNA, purified using AMPure beads, validated and quantified using an Agilent 2100 Bioanalyzer High Sensitivity DNA chip and qRT-PCR with the KAPA library quantification kit.
Sequencing data processing and analysis: Small RNA-Seq Pipeline
Samples were sequenced at the McGill University and Genome Quebec Innovation Centre (Montreal, Canada) and the European Molecular Biology Laboratory (EMBL), Genomics Core Facility (Heidelberg, Germany), using the HiSeq2500 or MiSeq Illumina sequencers with 50 nt single-end reads. All sequencing data were processed using CASAVA 1.8+  and extracted from FASTQ files. Fastx_toolkit  was used to trim the Illumina adapter sequences. Additional filters were then applied to retain only high-quality data: 1) mean Phred quality (Q) score higher than 30, 2) read length between 15 and 40 nt, 3) adapter detection based on a perfect 10-nt match, and 4) removal of reads in which no adapter was detected. Each of these cutoffs can be adjusted to suit a particular experimental design. For instance, one can lower the Q score threshold, allow mismatches in adapter detection, or adjust the size selection range. However, relaxing these filters increases the risk of introducing sequencing errors or background noise into the data. Additionally, we used Bowtie  to align reads to the human genome (GRCh37)  and ncPRO-seq  in combination with miRBase (V20) to match them to known miRNA sequences [57, 58]. We used the Rfam  and NCBI’s piRNA  databases to map other small RNA sequences. Furthermore, all sequencing data were normalized with the Bioconductor DESeq2 package , using a detection threshold of 1 count per miRNA (present at least once in each of the libraries tested). All RNA sequencing data used in this study are available on the NCBI Gene Expression Omnibus database with accession code GSE69825.
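The read-level filters listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the study's pipeline (which used Fastx_toolkit): it assumes Phred+33 quality encoding, assumes the TruSeq Small RNA 3′ adapter sequence shown in the code, detects the adapter by a perfect match to its first 10 nt, and applies the mean-quality filter to the trimmed insert rather than to the whole read. Recall that a Phred score Q corresponds to an error probability of 10^(−Q/10), so Q = 30 means a 1/1000 chance that a base call is wrong.

```python
from typing import Optional

# Assumptions for this sketch (not taken from the study's code): Phred+33
# quality encoding and the TruSeq Small RNA 3' adapter sequence below.
ADAPTER = "TGGAATTCTCGGGTGCCAAGG"
SEED_LEN = 10               # adapter detected by a perfect match to its first 10 nt
MIN_LEN, MAX_LEN = 15, 40   # retained insert length, in nt
MIN_MEAN_Q = 30             # mean Phred score must be higher than this


def mean_phred(qual: str, offset: int = 33) -> float:
    """Average Phred score of an ASCII-encoded quality string."""
    return sum(ord(ch) - offset for ch in qual) / len(qual)


def filter_read(seq: str, qual: str) -> Optional[str]:
    """Return the adapter-trimmed insert if the read passes all filters, else None."""
    pos = seq.find(ADAPTER[:SEED_LEN])
    if pos == -1:
        return None          # filter 4: no adapter detected
    insert, insert_qual = seq[:pos], qual[:pos]
    if not MIN_LEN <= len(insert) <= MAX_LEN:
        return None          # filter 2: too short (incl. adapter dimers) or too long
    if mean_phred(insert_qual) <= MIN_MEAN_Q:
        return None          # filter 1: mean quality not above Q30
    return insert


read = "TGAGGTAGTAGGTTGTATAGTT" + ADAPTER    # let-7a insert followed by adapter
print(filter_read(read, "I" * len(read)))    # TGAGGTAGTAGGTTGTATAGTT
print(filter_read(read, "#" * len(read)))    # None
```

Each constant maps to one of the adjustable cutoffs, so loosening a filter means changing a single value; as noted above, doing so trades read yield against noise.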
Quantitative Real-Time Polymerase Chain Reaction (qRT-PCR)
Small RNA sequencing data were validated using qRT-PCR. Total RNA samples were reverse transcribed using TaqMan RT-PCR microRNA assays (Applied Biosystems) according to the manufacturer’s instructions. Real-time PCR reactions were run in quadruplicate on the ABI 7900HT Fast Real-Time PCR System, and data were collected using Sequence Detection System (SDS) 2.4 software (Applied Biosystems). Expression of miRNAs was quantified using miRNA TaqMan probes (Applied Biosystems) and calculated using the Absolute Quantitation (AQ) standard curve method. RNU6B was used as an endogenous control because its expression was abundant and remained relatively constant, with low variance, across the samples tested.
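The Absolute Quantitation standard-curve method fits Ct against log10 of known input quantities from a dilution series and then interpolates unknowns on that line. A minimal sketch in plain Python follows; the dilution series in the example is made up, averaging replicate Cts before interpolation is an assumption, and dividing by the RNU6B quantity is shown as one way of applying the endogenous control, not necessarily the exact calculation performed in the SDS software.

```python
import math


def fit_standard_curve(quantities, cts):
    """Least-squares fit of Ct = slope * log10(quantity) + intercept."""
    xs = [math.log10(q) for q in quantities]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(cts) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cts))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def amplification_efficiency(slope):
    """Efficiency implied by the slope; 1.0 means perfect doubling per cycle."""
    return 10 ** (-1 / slope) - 1


def quantity_from_ct(ct, slope, intercept):
    """Interpolate an unknown's quantity from its Ct on the standard curve."""
    return 10 ** ((ct - intercept) / slope)


def normalized_expression(target_cts, ref_cts, target_curve, ref_curve):
    """Target quantity divided by endogenous-control (e.g. RNU6B) quantity.

    Replicate Cts (e.g. quadruplicates) are averaged before interpolation;
    averaging Cts rather than quantities is an assumption of this sketch.
    """
    target = quantity_from_ct(sum(target_cts) / len(target_cts), *target_curve)
    ref = quantity_from_ct(sum(ref_cts) / len(ref_cts), *ref_curve)
    return target / ref


# Hypothetical 10-fold dilution series (arbitrary units) and its Ct values.
slope, intercept = fit_standard_curve([1, 10, 100, 1000], [30.0, 26.68, 23.36, 20.04])
print(round(slope, 2), round(intercept, 2))  # -3.32 30.0
print(amplification_efficiency(slope))       # close to 1.0 (about 100 % efficiency)
```

A slope near −3.32 corresponds to roughly one doubling per cycle, which is the usual sanity check on a standard curve before unknowns are interpolated.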
All numerical data are expressed as mean ± s.e.m. Differences among groups were analyzed by Student’s t-test and one-way ANOVA with post-hoc correction, and associations were assessed with Pearson’s correlation coefficients. Statistical analyses were performed using GraphPad Prism 5 and SPSS 20. P < 0.05 was considered statistically significant.
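For reference, the summary statistic and the correlation coefficient named above reduce to short formulas. The sketch below computes mean ± s.e.m. (using the n − 1 sample standard deviation) and Pearson's r in plain Python; it does not compute p-values, which in this study came from GraphPad Prism 5 and SPSS 20. Correlating log-transformed sequencing counts with log-transformed qRT-PCR quantities, as in the usage lines, is an assumption about how such a validation might be set up, and the numbers are hypothetical.

```python
import math


def mean_sem(values):
    """Return (mean, standard error of the mean) using the sample SD (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return mean, sd / math.sqrt(n)


def pearson_r(xs, ys):
    """Pearson's product-moment correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical log10 values for the same four miRNAs measured both ways.
ngs_log = [3.1, 4.2, 2.5, 5.0]
pcr_log = [2.9, 4.0, 2.8, 4.7]
print(mean_sem(ngs_log))
print(pearson_r(ngs_log, pcr_log))
```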
Results and Discussion
This study assessed several steps involved in the development of an ncRNA biomarker discovery pipeline, ranging from sample preparation to bioinformatic processing. At each stage, we evaluated the pros and cons of different techniques that may be suitable in some circumstances but not others, depending on experimental design. Evaluation methods included quality of data output in relation to hands-on laboratory time, cost, and efficiency of processing.
Bioinformatic output measures for small RNA sequencing quality control
According to Illumina guidelines for small RNA sequencing, 1–2 M reads is an accepted range for expression profiling experiments, while 2–5 M reads is the accepted range for discovery applications.
To avoid background noise due to small fragments of degraded RNA, we removed all reads <15 nt. Size filtering can easily be modified to target a specific small RNA species: for example, 15–28 nt for miRNAs, 24–31 nt for piRNAs, or 15–40 nt to capture all small ncRNAs.
Quality (Q) is based on a Phred score, which estimates sequencing error probabilities per base. A Q = 10 means a 1/10 probability of incorrect base calling or 90 % accuracy; Q = 20 (1/100; 99 %); Q = 30 (1/1000; 99.9 %); and Q = 40 (1/10000; 99.99 %). We removed reads with a quality score <30.
Adapter detection can be adjusted to allow one or more mismatches in the first 10 nt used to identify and trim the adapters. To favor high-quality reads, we set our adapter detection threshold to a perfect 10-nt match. Ligation of the 3′ and 5′ adapters to each other happens by chance at a very low rate; however, this can become an important issue for libraries prepared from very small amounts of RNA. We removed all adapter-adapter reads.
RNAs > 40 nt
This feature refers to reads longer than 40 nt. In most cases these reads map to midsize and larger non-coding RNA populations. The percentage of reads >40 nt can vary (1 %–50 %) depending on the library preparation method used.
This metric shows the number of reads that pass all the quality and trimming filters described above. A good-quality library should have a surviving rate between 50 % and 100 %, depending on the method used.
Due to sequencing errors, stringent QC filters, or RNA from other species (usually added as a control, e.g., PhiX), a very small percentage of reads do not map to any human genomic location.
Unique & Multi-Mapped
In contrast to other types of sequencing (DNA and larger RNAs), the percentage of reads that map to multiple genomic locations in small RNA sequencing is expected to be high (>50 %), because several small RNAs are encoded at more than one genomic location. This redundancy is thought to be a compensatory mechanism that buffers against the loss of ncRNAs through random mutations.
We used miRBase to align our reads to known miRNA species. A high percentage of reads aligned to miRNAs is expected. However, this percentage can vary depending on the source and quality of RNA.
Rfam and NCBI’s piRNA databases were used to map our reads to other small RNA species. The number of these reads is very small compared to miRNAs. However, just like with miRNAs, the number of reads mapping back to other sncRNAs is associated with the source and quality of RNA.
(Repeat, Coding gene, Unknown)
This refers to an additional portion of reads that map to repetitive sequences, coding genes, and unknown sequences in the human genome. The number of these reads is expected to be low.
We set a detection threshold at one count per miRNA (present at least once in each of the libraries tested) in order to get a better picture of lowly expressed miRNAs. However, for quantification and discovery studies, we recommend higher detection thresholds, usually >10 or >20 counts per miRNA, to avoid background noise and false positives.
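Detection thresholds and library-size normalization interact: DESeq2's default size factors are computed by the median-of-ratios method, which needs miRNAs with non-zero counts in every library. The plain-Python sketch below reimplements that idea for illustration only (for real analyses, use DESeq2's `estimateSizeFactors`); `counts` is a hypothetical dictionary mapping miRNA names to per-library raw counts. A `threshold` of 1 reproduces the "present at least once in each library" rule used here, and raising it to 10 or 20 gives the stricter filters recommended above (use 11 or 21 if a strict >10 or >20 is intended).

```python
import math
from statistics import median


def detected(counts, threshold=1):
    """Keep miRNAs whose raw count is >= `threshold` in every library."""
    return {name: c for name, c in counts.items() if all(x >= threshold for x in c)}


def size_factors(counts):
    """Median-of-ratios size factors, one per library.

    Only miRNAs with a non-zero count in every library are used, because the
    per-miRNA geometric mean is zero (log undefined) otherwise.
    """
    usable = detected(counts, threshold=1)
    n_libs = len(next(iter(usable.values())))
    log_geo_mean = {m: sum(math.log(x) for x in c) / n_libs for m, c in usable.items()}
    return [
        math.exp(median(math.log(c[j]) - log_geo_mean[m] for m, c in usable.items()))
        for j in range(n_libs)
    ]


def normalize(counts, factors):
    """Divide each library's raw counts by its size factor."""
    return {m: [x / f for x, f in zip(c, factors)] for m, c in counts.items()}


# Hypothetical counts: library 2 was simply sequenced twice as deeply.
counts = {"miR-a": [10, 20], "miR-b": [20, 40], "miR-c": [5, 10], "miR-d": [0, 3]}
print(sorted(detected(counts, threshold=10)))        # ['miR-a', 'miR-b']
print([round(f, 3) for f in size_factors(counts)])   # [0.707, 1.414]
```

After division by these factors, each miRNA has the same normalized value in both libraries, which is what one expects when the only difference between them is sequencing depth.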
Library purification methods of small RNA sequencing
First, we tested three commercially available library preparation methods for small RNA sequencing: (1) Novex TBE PAGE gel; (2) Pippin Prep automated gel system; and (3) AMPure XP beads (Fig. 1a). It is important to point out that the main goal of this experiment was not to single out the “best” purification method, but rather to highlight the strengths and limitations of the top available options and provide guidelines on which would best fit a particular study design. We obtained good-quality sequencing libraries for all samples, but found significant differences across purification methods.
Before purification, adapter-ligated libraries for all samples showed a peak around 147 nt corresponding to miRNAs (Additional file 1: Figure S3). After purification, all libraries showed a sharp, single peak corresponding to miRNAs and other small non-coding RNA molecules (Additional file 1: Figure S4). Samples purified using a Novex TBE PAGE gel showed a sharp, single peak at 147 nt (Additional file 1: Figure S4a-c). The four libraries purified using PPS also showed single peaks corresponding to miRNAs, but contained more than 50 times as much product after purification as those purified with the Novex gel method (Additional file 1: Figure S4e-h). Finally, samples purified with AMPure XP beads showed results similar to PPS, but these libraries also contained other small RNA molecules ranging from 160–225 nt in length (Additional file 1: Figure S4i-k). All libraries, plus a control sample (no purification), were pooled and sequenced in a single lane of the HiSeq2500.
[Table: sequencing output by library purification method. Rows: Size (<15 nt); Low Quality (Q <30); RNAs >40 nt; miRNA Count (≥1)]
The Novex TBE PAGE gel proved to be the most specific method for isolating the miRNA population. This is because we were able to manually and carefully cut the 145–160 nt band corresponding to miRNAs from the gel, avoiding smaller or larger RNA populations in the samples. However, we lost a significant amount of library product during gel purification and, in the end, generated fewer reads after sequencing. In addition, this method requires a significant amount of hands-on laboratory time, which ultimately translates into very low throughput and significantly higher cost. We found purification by Novex gel to be a very good and specific method, particularly suited to projects with small sample sizes in which miRNAs are the main focus.
PPS generated the highest numbers of both total reads and distinct miRNAs identified, together with very high specificity for miRNAs. This can be attributed to several factors: (1) libraries purified with PPS contained more than 50 times as much product after purification as those purified with the Novex gel method, because the PPS is an automated system that does not require library products to be extracted directly from the gel, a step that loses library product; (2) the range of the automatically isolated band can be optimized to a desired product size (we used 125–180 nt), and owing to this size selection and specificity, PPS libraries had the fewest reads removed for being shorter than 15 nt or longer than 40 nt; and (3) PPS libraries showed the lowest number of adapter-adapter ligated reads. However, because each PPS instrument limits a run to only 4 samples, we also tested variation across instruments and found a significant difference in the final number of miRNAs identified per machine, with 50 more miRNAs identified with PPS2. The PPS therefore showed limitations in terms of consistency, and while the protocol requires less hands-on laboratory time, it does not significantly change throughput (only 4 samples per run) or cost. We believe this is a very good method for medium-sized projects.
Library preparation: purification methods
- Novex TBE PAGE gel: manually cut band; very specific
- Pippin Prep automated gel system: automated band; less specific; 4 libraries/run (2 hrs)
- AMPure XP beads: all products >100 nt; 24 libraries/2 hrs; (50 and up)
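The throughput figures listed above translate directly into processing time for a given project size. The sketch below is a hypothetical helper that assumes batches are run one after another on a single instrument and counts only the batch times listed (4 libraries per 2-h Pippin Prep run; 24 libraries per 2 h with AMPure XP beads); Novex is omitted because no figure is given for it, and other hands-on steps are ignored.

```python
import math

# Libraries per batch and hours per batch, as listed for each method above.
# Assumption: one instrument/batch at a time; other hands-on steps ignored.
THROUGHPUT = {
    "Pippin Prep": (4, 2.0),
    "AMPure XP": (24, 2.0),
}


def purification_hours(method: str, n_libraries: int) -> float:
    """Purification time if batches are processed one after another."""
    per_batch, hours_per_batch = THROUGHPUT[method]
    return math.ceil(n_libraries / per_batch) * hours_per_batch


for method in THROUGHPUT:
    print(method, purification_hours(method, 48))
# Pippin Prep 24.0
# AMPure XP 4.0
```

The gap widens linearly with project size, which is why batch-friendly bead purification becomes attractive for larger studies.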
Finally, control sample AC was not purified or size-selected before sequencing, so that its results could be compared with those of the three methods tested. Like all other libraries in the study, however, it was prepared using the Illumina TruSeq Small RNA protocol. This protocol is well suited to the investigation of small RNA species because it exploits the structure of most small RNA molecules, ligating specific adapters to the 5′-phosphate and 3′-hydroxyl groups that are molecular signatures of their biogenesis pathway. In theory, therefore, if adapter ligation works well, the libraries do not require any further purification. In practice, the success of purification methods also depends on suppressing adapter-dimer products so that their representation stays at acceptable levels, ideally <2.5 %. The AC control results were similar to those obtained with AMPure XP beads because, as explained above, AMPure XP beads do not provide a narrow size selection (all products >100 nt), unlike Novex (145–160 nt) or PPS (125–180 nt).
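Adapter dimers are easy to recognize in the reads themselves: with no insert between the adapters, the 3′ adapter begins at the first base of the read. The sketch below, in Python, generalizes the perfect 10-nt adapter match to an optional number of mismatches and reports the dimer fraction, which can be compared with the <2.5 % target. The adapter prefix is the first 10 nt of the TruSeq Small RNA 3′ adapter and is an assumption, as are the helper names. Allowing mismatches rescues adapters containing sequencing errors but also raises the chance of spurious matches inside genuine inserts.

```python
# Assumption: first 10 nt of the TruSeq Small RNA 3' adapter.
ADAPTER_PREFIX = "TGGAATTCTC"


def find_adapter(read: str, prefix: str = ADAPTER_PREFIX, max_mismatches: int = 0) -> int:
    """Leftmost start of `prefix` in `read` with <= max_mismatches substitutions, else -1."""
    k = len(prefix)
    for i in range(len(read) - k + 1):
        mismatches = sum(a != b for a, b in zip(read[i:i + k], prefix))
        if mismatches <= max_mismatches:
            return i
    return -1


def adapter_dimer_fraction(reads, max_mismatches: int = 0) -> float:
    """Share of reads whose 3' adapter starts at position 0, i.e. no insert."""
    dimers = sum(1 for r in reads if find_adapter(r, max_mismatches=max_mismatches) == 0)
    return dimers / len(reads)


reads = ["TGGAATTCTCGGGTGCCAAGG",               # adapter dimer
         "TGAGGTAGTAGGTTGTATAGTTTGGAATTCTC"]    # let-7a insert + adapter
fraction = adapter_dimer_fraction(reads)
print(fraction, fraction < 0.025)  # 0.5 False
```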
Total RNA input amounts for small RNA sequencing from whole-blood samples
[Table: sequencing output by total RNA input. Rows: Size (<15 nt); Low Quality (Q <30); RNAs >40 nt; miRNA Count (≥1)]
Effects of RNA quality on small RNA sequencing
[Table: RNA degradation: whole-blood. Rows: Size (<15 nt); Low Quality (Q <30); RNAs >40 nt; miRNA Count (≥1)]
Sequencing coverage for small RNA sequencing
[Table: sequencing output by sequencing coverage. Rows: Size (<15 nt); Low Quality (Q <30); RNAs >40 nt; miRNA Count (≥1); miRNA Count (≥10); miRNA Count (≥20)]
[Table: Number of total miRNAs expected per million reads in whole-blood. Columns: # of Reads (million); miRNA Count (>1); miRNA Count (>10); miRNA Count (>20)]
Expression of miRNAs and other small ncRNAs in human biological samples
MicroRNA expression patterns can be tissue and cell specific. For example, miR-1 has been shown to be enriched in cardiomyocytes  while miR-122 is the most highly expressed miRNA in the liver . Others have shown that some miRNAs are uniquely present in specific body fluids, such as plasma, tears, breast milk, and seminal fluid . To explore this, we sequenced 16 samples (E1-E16) on a MiSeq sequencer to compare the expression of small ncRNAs in four human tissues: whole-blood, brain, heart, and liver (Fig. 1e). We used the miRBase, Rfam and NCBI piRNA databases to map miRNAs and other small RNAs.
The goal of this study was to highlight some fundamental details of small ncRNA profiling, and provide the reader with general guidelines for quantification, data processing and analysis of sncRNAs from clinical samples using NGS. Our results show that good quality sequencing libraries can be prepared from small amounts of total RNA and that varying degradation levels in the samples do not have a significant effect on the overall quantification of sncRNAs via NGS. In addition, we discuss the strengths and limitations of three commercially available library preparation methods, describe our bioinformatics pipeline, provide recommendations for sequencing depth and coverage, and describe in detail the expression and distribution of all sncRNAs in four human tissues: whole-blood, brain, heart and liver. Ultimately, this study provides valuable information that will help researchers plan and execute future small RNA profiling studies that will contribute to the understanding of sncRNAs as potential biomarkers and mediators of biological functions and disease.
We are grateful for the invaluable contributions made by volunteers consenting to donate blood samples to the McGill Group for Suicide Studies. We thank all participants from the 2013 EMBO Practical Course-Analysis of small non-coding RNAs: From discovery to function. http://www.embl.de/training/events/2013/RNA13-01/programme/index.html. We thank Raphael Pujol who helped establish the computational pipeline and performed information analyses for this manuscript. In addition, we would like to thank Tao Ye, Dr. Alfredo Staffa, Dr. Jonathon Blake, and Dr. Mark McCarthy for their kind advice on bioinformatic strategies and methods. This work was supported by operating grants from the Canadian Institutes of Health Research (CIHR) (2013#311113), as well as support from the Fonds de recherche du Québec–Santé (FRQS) through its network program (RQSHA). J.P.L received a doctoral funding award from CIHR. G.T. is an FRQS chercheur national. C.E is supported by the Canada Research Chairs program.
- Hampel H, Frank R, Broich K, Teipel SJ, Katz RG, Hardy J, et al. Biomarkers for Alzheimer’s disease: academic, industry and regulatory perspectives. Nat Rev Drug Discov. 2010;9(7):560–74. doi:10.1038/nrd3115.
- Shaw LM, Korecka M, Clark CM, Lee VM, Trojanowski JQ. Biomarkers of neurodegeneration for diagnosis and monitoring therapeutics. Nat Rev Drug Discov. 2007;6(4):295–303. doi:10.1038/nrd2176.
- Davis J, Maes M, Andreazza A, McGrath JJ, Tye SJ, Berk M. Towards a classification of biomarkers of neuropsychiatric disease: from encompass to compass. Mol Psychiatry. 2014. doi:10.1038/mp.2014.139.
- Strimbu K, Tavel JA. What are biomarkers? Curr Opin HIV AIDS. 2010;5(6):463–6. doi:10.1097/COH.0b013e32833ed177.
- Liu Z, Que S, Xu J, Peng T. Alanine aminotransferase-old biomarker and new concept: a review. Int J Med Sci. 2014;11(9):925–35. doi:10.7150/ijms.8951.
- de Blacam C, Byrne C, Hughes E, McIlroy M, Bane F, Hill AD, et al. HOXC11-SRC-1 regulation of S100beta in cutaneous melanoma: new targets for the kinase inhibitor dasatinib. Br J Cancer. 2011;105(1):118–23. doi:10.1038/bjc.2011.193.
- Helleday T, Eshtad S, Nik-Zainal S. Mechanisms underlying mutational signatures in human cancers. Nat Rev Genet. 2014;15(9):585–98. doi:10.1038/nrg3729.
- Kandimalla R, van Tilborg AA, Zwarthoff EC. DNA methylation-based biomarkers in bladder cancer. Nat Rev Urol. 2013;10(6):327–35. doi:10.1038/nrurol.2013.89.
- Ordovas JM, Smith CE. Epigenetics and cardiovascular disease. Nat Rev Cardiol. 2010;7(9):510–9. doi:10.1038/nrcardio.2010.104.
- Warton K, Samimi G. Methylation of cell-free circulating DNA in the diagnosis of cancer. Front Mol Biosci. 2015;2:13. doi:10.3389/fmolb.2015.00013.
- Schubeler D. Function and information content of DNA methylation. Nature. 2015;517(7534):321–6. doi:10.1038/nature14192.
- Calin GA, Croce CM. MicroRNA signatures in human cancers. Nat Rev Cancer. 2006;6(11):857–66. doi:10.1038/nrc1997.
- Szabo G, Bala S. MicroRNAs in liver disease. Nat Rev Gastroenterol Hepatol. 2013;10(9):542–52. doi:10.1038/nrgastro.2013.87.
- Flemming A. Heart failure: targeting miRNA pathology in heart disease. Nat Rev Drug Discov. 2014;13(5):336. doi:10.1038/nrd4311.
- Lopez JP, Fiori LM, Gross JA, Labonte B, Yerko V, Mechawar N, et al. Regulatory role of miRNAs in polyamine gene expression in the prefrontal cortex of depressed suicide completers. Int J Neuropsychopharmacol. 2014;17(1):23–32. doi:10.1017/S1461145713000941.
- Lopez JP, Lim R, Cruceanu C, Crapper L, Fasano C, Labonte B, et al. miR-1202 is a primate-specific and brain-enriched microRNA involved in major depression and antidepressant treatment. Nat Med. 2014;20(7):764–8. doi:10.1038/nm.3582.
- Maffioletti E, Tardito D, Gennarelli M, Bocchio-Chiavetto L. Micro spies from the brain to the periphery: new clues from studies on microRNAs in neuropsychiatric disorders. Front Cell Neurosci. 2014;8:75. doi:10.3389/fncel.2014.00075.
- O’Connor RM, Dinan TG, Cryan JF. Little things on which happiness depends: microRNAs as novel therapeutic targets for the treatment of anxiety and depression. Mol Psychiatry. 2012;17(4):359–76. doi:10.1038/mp.2011.162.
- Esteller M. Non-coding RNAs in human disease. Nat Rev Genet. 2011;12(12):861–74. doi:10.1038/nrg3074.
- Li Z, Rana TM. Therapeutic targeting of microRNAs: current status and future challenges. Nat Rev Drug Discov. 2014;13(8):622–38. doi:10.1038/nrd4359.
- Qureshi IA, Mehler MF. Emerging roles of non-coding RNAs in brain evolution, development, plasticity and disease. Nat Rev Neurosci. 2012;13(8):528–41. doi:10.1038/nrn3234.
- Rukov JL, Vinther J, Shomron N. Pharmacogenomics genes show varying perceptibility to microRNA regulation. Pharmacogenet Genomics. 2011;21(5):251–62. doi:10.1097/FPC.0b013e3283438865.
- Ha M, Kim VN. Regulation of microRNA biogenesis. Nat Rev Mol Cell Biol. 2014;15(8):509–24. doi:10.1038/nrm3838.
- Hu W, Coller J. What comes first: translational repression or mRNA degradation? The deepening mystery of microRNA function. Cell Res. 2012;22(9):1322–4. doi:10.1038/cr.2012.80.
- Taft RJ, Pang KC, Mercer TR, Dinger M, Mattick JS. Non-coding RNAs: regulators of disease. J Pathol. 2010;220(2):126–39. doi:10.1002/path.2638.
- Aravin AA, Sachidanandam R, Girard A, Fejes-Toth K, Hannon GJ. Developmentally regulated piRNA clusters implicate MILI in transposon control. Science. 2007;316(5825):744–7. doi:10.1126/science.1142612.
- Brennecke J, Aravin AA, Stark A, Dus M, Kellis M, Sachidanandam R, et al. Discrete small RNA-generating loci as master regulators of transposon activity in Drosophila. Cell. 2007;128(6):1089–103. doi:10.1016/j.cell.2007.01.043.
- Kuramochi-Miyagawa S, Watanabe T, Gotoh K, Takamatsu K, Chuma S, Kojima-Kita K, et al. MVH in piRNA processing and gene silencing of retrotransposons. Genes Dev. 2010;24(9):887–92. doi:10.1101/gad.1902110.
- Chu H, Hui G, Yuan L, Shi D, Wang Y, Du M, et al. Identification of novel piRNAs in bladder cancer. Cancer Lett. 2015;356(2 Pt B):561–7. doi:10.1016/j.canlet.2014.10.004.
- Zhang H, Ren Y, Xu H, Pang D, Duan C, Liu C. The expression of stem cell protein Piwil2 and piR-932 in breast cancer. Surg Oncol. 2013;22(4):217–23. doi:10.1016/j.suronc.2013.07.001.
- Cui L, Lou Y, Zhang X, Zhou H, Deng H, Song H, et al. Detection of circulating tumor cells in peripheral blood from patients with gastric cancer using piRNAs as markers. Clin Biochem. 2011;44(13):1050–7. doi:10.1016/j.clinbiochem.2011.06.004.
- Kiss-Laszlo Z, Henry Y, Bachellerie JP, Caizergues-Ferrer M, Kiss T. Site-specific ribose methylation of preribosomal RNA: a novel function for small nucleolar RNAs. Cell. 1996;85(7):1077–88.
- King TH, Liu B, McCully RR, Fournier MJ. Ribosome structure and activity are altered in cells lacking snoRNPs that form pseudouridines in the peptidyl transferase center. Mol Cell. 2003;11(2):425–35.
- Thorenoor N, Slaby O. Small nucleolar RNAs functioning and potential roles in cancer. Tumour Biol. 2015;36(1):41–53. doi:10.1007/s13277-014-2818-8.
- Martens-Uzunova ES, Olvedy M, Jenster G. Beyond microRNA--novel RNAs derived from small non-coding RNA and their implication in cancer. Cancer Lett. 2013;340(2):201–11. doi:10.1016/j.canlet.2012.11.058.
- Mannoor K, Liao J, Jiang F. Small nucleolar RNAs in cancer. Biochim Biophys Acta. 2012;1826(1):121–8. doi:10.1016/j.bbcan.2012.03.005.
- Yarmishyn AA, Kurochkin IV. Long noncoding RNAs: a potential novel class of cancer biomarkers. Front Genet. 2015;6:145. doi:10.3389/fgene.2015.00145.
- Duggirala A, Delogu F, Angelini TG, Smith T, Caputo M, Rajakaruna C, et al. Non coding RNAs in aortic aneurysmal disease. Front Genet. 2015;6:125. doi:10.3389/fgene.2015.00125.
- Jin K, Luo G, Xiao Z, Liu Z, Liu C, Ji S, et al. Noncoding RNAs as potential biomarkers to predict the outcome in pancreatic cancer. Drug Des Dev Ther. 2015;9:1247–55. doi:10.2147/DDDT.S77597.
- Zhang W, Ren SC, Shi XL, Liu YW, Zhu YS, Jing TL, et al. A novel urinary long non-coding RNA transcript improves diagnostic accuracy in patients undergoing prostate biopsy. Prostate. 2015;75(6):653–61. doi:10.1002/pros.22949.
- Gupta RA, Shah N, Wang KC, Kim J, Horlings HM, Wong DJ, et al. Long non-coding RNA HOTAIR reprograms chromatin state to promote cancer metastasis. Nature. 2010;464(7291):1071–6. doi:10.1038/nature08975.
- Guttman M, Amit I, Garber M, French C, Lin MF, Feldser D, et al. Chromatin signature reveals over a thousand highly conserved large non-coding RNAs in mammals. Nature. 2009;458(7235):223–7. doi:10.1038/nature07672.
- Calin GA, Liu CG, Ferracin M, Hyslop T, Spizzo R, Sevignani C, et al. Ultraconserved regions encoding ncRNAs are altered in human leukemias and carcinomas. Cancer Cell. 2007;12(3):215–29. doi:10.1016/j.ccr.2007.07.027.
- Sana J, Hankeova S, Svoboda M, Kiss I, Vyzula R, Slaby O. Expression levels of transcribed ultraconserved regions uc.73 and uc.388 are altered in colorectal cancer. Oncology. 2012;82(2):114–8. doi:10.1159/000336479.
- Berezikov E. Evolution of microRNA diversity and regulation in animals. Nat Rev Genet. 2011;12(12):846–60. doi:10.1038/nrg3079.
- van Rooij E. The art of microRNA research. Circ Res. 2011;108(2):219–34. doi:10.1161/CIRCRESAHA.110.227496.
- Pritchard CC, Cheng HH, Tewari M. MicroRNA profiling: approaches and considerations. Nat Rev Genet. 2012;13(5):358–69. doi:10.1038/nrg3198.
- Weiland M, Gao XH, Zhou L, Mi QS. Small RNAs have a large impact: circulating microRNAs as biomarkers for human diseases. RNA Biol. 2012;9(6):850–9. doi:10.4161/rna.20378.
- De Guire V, Robitaille R, Tetreault N, Guerin R, Menard C, Bambace N, et al. Circulating miRNAs as sensitive and specific biomarkers for the diagnosis and monitoring of human diseases: promises and challenges. Clin Biochem. 2013;46(10–11):846–60. doi:10.1016/j.clinbiochem.2013.03.015.
- Huang X, Yuan T, Tschannen M, Sun Z, Jacob H, Du M, et al. Characterization of human plasma-derived exosomal RNAs by deep sequencing. BMC Genomics. 2013;14:319. doi:10.1186/1471-2164-14-319.
- Spornraft M, Kirchner B, Haase B, Benes V, Pfaffl MW, Riedmaier I. Optimization of extraction of circulating RNAs from plasma--enabling small RNA sequencing. PLoS One. 2014;9(9):e107259. doi:10.1371/journal.pone.0107259.
- Illumina. Illumina CASAVA 1.8. 2011. http://support.illumina.com/content/dam/illumina-support/documents/myillumina/33d66b02-53b5-4f4d-9d8b-f94237c7e44d/casava_qrg_15011197b.pdf.
- Gordon A. FASTX-toolkit. Computer program distributed by the author. http://hannonlab.cshl.edu/fastx_toolkit/index.html. Accessed 2014–2015.
- Song L, Florea L, Langmead B. Lighter: fast and memory-efficient sequencing error correction without counting. Genome Biol. 2014;15(11):509. doi:10.1186/PREACCEPT-9663167051308943.
- Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM, et al. The human genome browser at UCSC. Genome Res. 2002;12(6):996–1006. doi:10.1101/gr.229102.
- Chen CJ, Servant N, Toedling J, Sarazin A, Marchais A, Duvernois-Berthet E, et al. ncPRO-seq: a tool for annotation and profiling of ncRNAs in sRNA-seq data. Bioinformatics. 2012;28(23):3147–9. doi:10.1093/bioinformatics/bts587.
- Kozomara A, Griffiths-Jones S. miRBase: integrating microRNA annotation and deep-sequencing data. Nucleic Acids Res. 2011;39(Database issue):D152–7. doi:10.1093/nar/gkq1027.
- Kozomara A, Griffiths-Jones S. miRBase: annotating high confidence microRNAs using deep sequencing data. Nucleic Acids Res. 2014;42(Database issue):D68–73. doi:10.1093/nar/gkt1181.
- Gardner PP, Daub J, Tate JG, Nawrocki EP, Kolbe DL, Lindgreen S, et al. Rfam: updates to the RNA families database. Nucleic Acids Res. 2009;37(Database issue):D136–40. doi:10.1093/nar/gkn766.
- Sai Lakshmi S, Agrawal S. piRNABank: a web resource on classified and clustered Piwi-interacting RNAs. Nucleic Acids Res. 2008;36(Database issue):D173–7. doi:10.1093/nar/gkm696.
- Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15(12):550. doi:10.1186/s13059-014-0550-8.
- Camps C, Saini HK, Mole DR, Choudhry H, Reczko M, Guerra-Assuncao JA, et al. Integrated analysis of microRNA and mRNA expression and association with HIF binding reveals the complexity of microRNA expression regulation under hypoxia. Mol Cancer. 2014;13:28. doi:10.1186/1476-4598-13-28.
- van de Bunt M, Gaulton KJ, Parts L, Moran I, Johnson PR, Lindgren CM, et al. The miRNA profile of human pancreatic islets and beta-cells and relationship to type 2 diabetes pathogenesis. PLoS One. 2013;8(1):e55272. doi:10.1371/journal.pone.0055272.
- Jung M, Schaefer A, Steiner I, Kempkensteffen C, Stephan C, Erbersdobler A, et al. Robust microRNA stability in degraded RNA preparations from human tissue and cell samples. Clin Chem. 2010;56(6):998–1006. doi:10.1373/clinchem.2009.141580.
- Gantier MP, McCoy CE, Rusinova I, Saulep D, Wang D, Xu D, et al. Analysis of microRNA turnover in mammalian cells following Dicer1 ablation. Nucleic Acids Res. 2011;39(13):5692–703. doi:10.1093/nar/gkr148.
- Zhang Z, Qin YW, Brewer G, Jing Q. MicroRNA degradation and turnover: regulating the regulators. Wiley Interdiscip Rev RNA. 2012;3(4):593–600. doi:10.1002/wrna.1114.
- Bail S, Swerdel M, Liu H, Jiao X, Goff LA, Hart RP, et al. Differential regulation of microRNA stability. RNA. 2010;16(5):1032–9. doi:10.1261/rna.1851510.
- Wang Y, Sheng G, Juranek S, Tuschl T, Patel DJ. Structure of the guide-strand-containing argonaute silencing complex. Nature. 2008;456(7219):209–13. doi:10.1038/nature07315.
- Nagy C MM, Lopez JP, Vaillancourt K, Cruceanu C, Gross J, Arnovitz M, Mechawar N, Turecki G. The effects of post-mortem interval on biomolecule integrity in the brain. J Neuropathol Exp Neurol. 2015; in press.
- Fiedler J, Thum T. MicroRNAs in myocardial infarction. Arterioscler Thromb Vasc Biol. 2013;33(2):201–5. doi:10.1161/ATVBAHA.112.300137.
- Zhang Y, Jia Y, Zheng R, Guo Y, Wang Y, Guo H, et al. Plasma microRNA-122 as a biomarker for viral-, alcohol-, and chemical-related hepatic diseases. Clin Chem. 2010;56(12):1830–8. doi:10.1373/clinchem.2010.147850.
- Weber JA, Baxter DH, Zhang S, Huang DY, Huang KH, Lee MJ, et al. The microRNA spectrum in 12 body fluids. Clin Chem. 2010;56(11):1733–41. doi:10.1373/clinchem.2010.147405.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.