A novel method for the normalization of microRNA RT-PCR data
© Qureshi and Sacan; licensee BioMed Central Ltd. 2012
Published: 23 January 2013
MicroRNAs (miRNAs) are short non-coding RNA molecules that regulate mRNA transcript levels and translation. Deregulation of microRNAs is implicated in a number of diseases, and microRNAs are seen as a promising target for biomarker identification and drug development. miRNA expression is commonly measured by microarray or real-time polymerase chain reaction (RT-PCR). The findings of RT-PCR studies are highly dependent on the normalization techniques used during preprocessing of the Cycle Threshold readings from RT-PCR. Some of the commonly used endogenous controls have themselves been found to be differentially expressed in various conditions such as cancer, making them inappropriate internal controls.
We demonstrate that RT-PCR data contains a systematic bias resulting in large variations in the Cycle Threshold (CT) values of the low-abundant miRNA samples. We propose a new data normalization method that considers all available microRNAs as endogenous controls. A weighted normalization approach is utilized to allow contribution from all microRNAs, weighted by their empirical stability.
The systematic bias in RT-PCR data is illustrated on a microRNA dataset obtained from primary cutaneous melanocytic neoplasms. We show that, through a single control parameter, this method is able to emulate other commonly used normalization methods and thus provides a more general approach. We explore the consistency of RT-PCR expression data with microarray expression by utilizing a dataset where both RT-PCR and microarray profiling data are available for the same miRNA samples.
A weighted normalization method allows the contribution of all of the miRNAs, whether they are highly abundant or have low expression levels. Our findings further suggest that the normalization of a particular miRNA should rely on only miRNAs that have comparable expression levels.
MicroRNAs (miRNAs) are short non-coding RNA sequences that average 22 nucleotides in length [1–3]. This class of RNAs is distinct from other short sequence RNA types such as siRNA and snRNA. The first RNA of this class was identified in C. elegans in 1993. However, miRNAs were not recognized as a special class of RNAs until a decade ago. To date, all animal and plant species have been found to express miRNAs. At this time approximately 1000 miRNA sequences have been identified in the human microribonucleome. miRNA sequences are highly evolutionarily conserved among mammals [4, 8–12]. Approximately 80% of known miRNA genes are found in intronic regions of the genome [13, 14]. miRNAs are involved in many biological processes by influencing the regulation of specific target genes, generally resulting in the down-regulation of those target genes. There are two postulated mechanisms by which miRNAs act on their target genes. If the miRNA binds an mRNA transcript with high complementarity, it causes degradation of the mRNA. If the miRNA binds with incomplete complementarity, it causes translational repression of the mRNA. In plants the primary mechanism of action of miRNAs is mRNA transcript degradation, while in animals translational repression is more common. An estimated 60% of mammalian mRNAs are targeted by one or more miRNAs [10, 12].
miRNAs have been discovered to play a role in many diseases and pathologies [2, 10, 13, 15, 16]. The role of miRNAs in cancer has been examined and several miRNAs have been found to regulate tumor-related genes [1–3, 10, 13, 17–19]. In fact, more than half of all miRNA genes are located in cancer-associated regions of the genome or in fragile sites [3, 13]. As a result, therapeutic applications of miRNAs are being investigated. Furthermore, due to the link between many miRNAs and cancer, these RNA molecules are being investigated as potential cancer biomarkers. The fact that some miRNAs can be found extracellularly and maintain their stability in the extracellular environment facilitates their usage as biomarkers.
In principle, endogenous controls are selected because they have low variance in their expression levels across samples. In the case of miRNAs, the endogenous controls are typically recommended by the manufacturer of the miRNA kit used in the PCR. Some of the most commonly used endogenous controls are RNU44, RNU48, and U6. However, the usage of these endogenous controls is problematic: even though they have stable expression levels in normal tissue samples, they have been found to be differentially expressed in cancerous tissue compared with normal tissue.
Directly applying this method can lead to misleading results if the CT values in the data are not normalized. There are several commonly used methods for miRNA normalization, including quantile normalization, median normalization, and cyclic loess. Quantile normalization involves sorting the expression values of the genes in a given sample in order from least to greatest. This is done for each sample in the study. The vectors of sorted CT values for each sample are combined into a matrix, with one column per sample. The mean of each row of the matrix is calculated, and the CT value in each element of the row is replaced with that row mean. In the case of median quantile normalization the median of the row is used instead of the mean. The CT values in each sample are then rearranged back into their original order. This causes the distribution of CT values across all samples to assume the same shape, which minimizes the variance except for that resulting from the experimental condition being studied [21, 22].
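The quantile normalization steps described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used in the study: the function name `quantile_normalize`, the dictionary layout, and the handling of ties (tied values receive adjacent row summaries in their original order) are assumptions made for the example.

```python
from statistics import mean, median


def quantile_normalize(ct, use_median=False):
    """Quantile-normalize CT values.

    ct maps a sample name to a list of CT values; every list must use the
    same gene order and have the same length (assumed layout).
    """
    samples = list(ct)
    n_genes = len(ct[samples[0]])
    summarize = median if use_median else mean

    # Sort each sample from least to greatest; the sorted vectors form the
    # columns of the matrix described in the text.
    sorted_cols = {s: sorted(ct[s]) for s in samples}

    # Summary (mean or median) of each row of the sorted matrix.
    row_summary = [
        summarize([sorted_cols[s][r] for s in samples]) for r in range(n_genes)
    ]

    normalized = {}
    for s in samples:
        # order[rank] is the index of the gene holding that rank in sample s.
        order = sorted(range(n_genes), key=lambda g: ct[s][g])
        values = [0.0] * n_genes
        for rank, gene in enumerate(order):
            # Put the row summary back at the gene's original position.
            values[gene] = row_summary[rank]
        normalized[s] = values
    return normalized
```

After normalization every sample contains exactly the same set of values, only assigned to different genes, which is what forces the distributions to share one shape.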
Median normalization shifts the CT values in each sample such that the median CT value of each sample is the same. The median of each plate should be determined, and the medians of all plates should be arranged in a vector and sorted to determine the median of the medians. In each plate the difference between the median of the sample and the overall median should be subtracted from the CT value of each gene.
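A minimal sketch of this median shift is given below, assuming each plate is stored as a list of CT values keyed by a plate name. The function name `median_normalize` and the plate names in the usage note are hypothetical.

```python
from statistics import median


def median_normalize(plates):
    """Shift each plate so that all plates share the same median CT.

    plates maps a plate name to its list of CT values (assumed layout).
    """
    # Median of each individual plate.
    plate_medians = {name: median(values) for name, values in plates.items()}
    # Median of the plate medians.
    overall = median(plate_medians.values())
    # Subtract (plate median - overall median) from every CT on the plate.
    return {
        name: [v - (plate_medians[name] - overall) for v in values]
        for name, values in plates.items()
    }
```

For example, with plates `p1 = [20, 22, 24]`, `p2 = [23, 25, 27]` and `p3 = [30, 31, 32]`, the plate medians are 22, 25 and 31, the overall median is 25, and after the shift every plate has a median of 25.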
In cyclic loess normalization, pairs of plates are considered. For each pair of plates, the difference between the log CT values of each gene on the two plates is represented by M, and the average of the two log CT values of each gene is represented by A. A loess curve is then fit by regression of M on A, which results in a fitting vector F. The genes in the first sample are adjusted by adding half the F value corresponding to the log of the CT for each gene. In the second sample, half the F value is subtracted from the log CT of the gene [9, 21].
A number of normalization methods developed for microarrays have been applied to RT-PCR experiments. These methods assume that all miRNAs present in the organism are being profiled in the experiment. While microarrays can profile all miRNAs encoded in a genome, this assumption does not hold for RT-PCR experiments, which typically only profile a few hundred miRNAs at a given time. Mar et al. investigated the use of quantile normalization as well as rank-invariant set normalization. In rank-invariant set normalization, genes are ranked by their expression for each sample and the ranked list is compared to the ranked list of genes for a reference sample. Genes are considered to be rank-invariant if they have similar ranks in the reference sample and the experimental sample. All experimental samples are compared to the reference sample and an intersection of these lists is used to identify the rank-invariant genes, which can then be used for normalization. Deo et al. compared several normalization methods and concluded that data-driven methods performed best. They compared normalization by endogenous controls, using the mean as a pseudo-control, and two different methods of quantile normalization. They concluded that quantile normalization performed the best despite the fact that using the mean as an endogenous control produced lower standard deviation.
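The selection of rank-invariant genes can be illustrated with the short sketch below. The text does not specify what counts as "similar ranks", so the tolerance parameter `max_rank_shift` is an assumption introduced for this example, as are the function names; it is not the criterion used by Mar et al.

```python
def ranks(values):
    """Return the rank (0 = lowest CT) of each gene in one sample."""
    order = sorted(range(len(values)), key=lambda g: values[g])
    result = [0] * len(values)
    for rank, gene in enumerate(order):
        result[gene] = rank
    return result


def rank_invariant_genes(reference, samples, max_rank_shift=1):
    """Indices of genes whose rank stays close to the reference rank.

    reference is the CT list of the reference sample; samples is a list of
    CT lists for the experimental samples, all in the same gene order.
    max_rank_shift is a hypothetical tolerance for "similar ranks".
    """
    ref_ranks = ranks(reference)
    invariant = set(range(len(reference)))
    for values in samples:
        sample_ranks = ranks(values)
        close = {
            g
            for g in range(len(reference))
            if abs(sample_ranks[g] - ref_ranks[g]) <= max_rank_shift
        }
        # Keep only genes that are invariant in every sample seen so far.
        invariant &= close
    return sorted(invariant)
```

The genes returned by `rank_invariant_genes` would then serve as the control set, for instance by averaging their CT values within each sample.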
One of the main problems with RT-PCR that remains as yet unaddressed by current normalization methods is the systematic bias present within the data. We observe that standard deviation increases as CT values increase. We believe the most likely cause is that the usual assumption, namely that each PCR cycle exactly doubles the amount of product, is inaccurate. There seems to be an accumulation of an expression-level specific rate-limiting effect. As a result, a small difference in the size of the initial sample being amplified causes larger variations in the CT values of the less abundant microRNA molecules. Consequently, using endogenous controls, which are usually chosen from highly expressed microRNAs, for normalization becomes inappropriate for the less-abundant microRNAs. Even quantile normalization has been observed to produce more variance at high CT values than was present in the original raw data. One potential solution is to use the mean expression value of all genes in a sample as the endogenous control, as proposed by Mestdagh et al., and calculate ΔCT by subtracting this mean CT value from the CT value of each gene in the sample. However, this approach is not ideal because the mean of the entire sample is sensitive to fluctuating genes as well as undetected genes, which have high CT values. As a result, the mean-value normalization method is dominated by the large fluctuations of the less-abundant microRNAs and may cause spurious differential expression levels for otherwise stable microRNAs. In this study, we propose a method of using a weighted mean as an artificial endogenous control to calculate ΔCT values. The standard deviation of a microRNA across all samples is considered as a stability measure, and each microRNA is weighted by its stability to generate the artificial endogenous control levels.
The primary dataset used in this study was obtained from a recently deposited microRNA RT-PCR dataset in the Gene Expression Omnibus (GEO). The data was from a study by Jukic et al. that examined the difference in miRNA expression profiles in melanocytic neoplasms between young and older adults. Their study examined 10 young adults and 10 older adults and measured the expression of 666 microRNAs. We used the raw CT values measured in their data to compare different approaches to normalizing the data. This dataset has been previously used by Deo et al. to compare various normalization techniques; this dataset is highly suited to the comparison of normalization studies due to the large number of samples and the use of multiple cards.
We have investigated several normalization methods, including quantile, mean, and median normalization methods, and endogenous controls identified using various stability criteria. In mean and median normalization, the mean and median of all of the genes in a given sample are used as the value for CT0. For identification of endogenous controls, we calculate the standard deviation of each microRNA across all samples, and rank them in the order of increasing standard deviation. The CT values of the top-k microRNAs are averaged in each sample to provide the CT0 values.
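The top-k selection just described can be sketched as follows. This is an illustrative sketch under stated assumptions: the sample standard deviation (`statistics.stdev`) is used because the text does not say whether the sample or population form was applied, and the function names and data layout are hypothetical.

```python
from statistics import mean, stdev


def top_k_control_ct0(ct, k):
    """Pick the k most stable microRNAs and average them per sample.

    ct maps a microRNA name to its list of CT values across samples
    (assumed layout; every list has one entry per sample).
    """
    # Rank microRNAs by increasing standard deviation across samples.
    stable = sorted(ct, key=lambda g: stdev(ct[g]))[:k]
    n_samples = len(ct[stable[0]])
    # CT0 for sample j is the mean CT of the selected controls in sample j.
    ct0 = [mean([ct[g][j] for g in stable]) for j in range(n_samples)]
    return stable, ct0


def delta_ct(ct, ct0):
    """Subtract the per-sample CT0 from every microRNA's CT."""
    return {g: [v - c for v, c in zip(vals, ct0)] for g, vals in ct.items()}
```

Setting `k` to the total number of microRNAs reduces this to plain mean normalization, since every microRNA then contributes equally to CT0.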
The weighted mean CT0 of a sample is calculated as

$$CT_0 = \sum_{i=1}^{n} w_i \, CT_i, \qquad w_i = \frac{\left(1/\mathrm{STD}(CT_i)\right)^{wmp}}{\sum_{k=1}^{n} \left(1/\mathrm{STD}(CT_k)\right)^{wmp}}$$

where wmp is the weighted mean power, which can be adjusted to shift the dominance between stable and unstable microRNAs, n is the number of genes or microRNAs, and STD is a function that returns the standard deviation of a gene across all samples. The weight of a gene is obtained by raising the inverse of its standard deviation across all samples to the weighted mean power, which is usually specified as 1, and dividing by the sum of the same quantity over all genes, so that the weights sum to one. CT0 is calculated for each sample by summing the products of the raw CT values in the sample and the corresponding weights. To calculate ΔCT, this CT0 value is subtracted from the CT of each gene in the sample. This method gives a higher weight to genes with a lower standard deviation.
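The weighted mean described in this paragraph can be sketched in Python as shown below. It is a sketch rather than the authors' code, and it rests on two assumptions: the weights are normalized so that they sum to one, and the sample standard deviation is used. A microRNA with zero standard deviation would cause a division by zero and would need separate handling.

```python
from statistics import stdev


def weighted_mean_ct0(ct, wmp=1.0):
    """Weighted-mean CT0 per sample, weighting microRNAs by stability.

    ct maps a microRNA name to its list of CT values across samples
    (assumed layout). wmp is the weighted mean power.
    """
    # Unnormalized weight: inverse standard deviation raised to wmp.
    raw = {g: (1.0 / stdev(vals)) ** wmp for g, vals in ct.items()}
    total = sum(raw.values())
    # Normalize so the weights sum to one (assumption, see lead-in).
    weights = {g: w / total for g, w in raw.items()}
    n_samples = len(next(iter(ct.values())))
    ct0 = [sum(weights[g] * ct[g][j] for g in ct) for j in range(n_samples)]
    return weights, ct0
```

With `wmp=0` every weight equals 1/n and CT0 is the plain mean of the sample. Increasing `wmp` concentrates the weight on the most stable microRNAs, which moves the result toward normalization by a small set of stable endogenous controls.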
We also examined the reproducibility of miRNA expression experiments between RT-PCR and microarray. To explore this topic we utilized data from Chen et al. They evaluated miRNA expression in murine myoblasts utilizing both RT-PCR and microarrays. They evaluated the consistency of different RNA preparation methods for RT-PCR. We harnessed their data to explore the correlation of RT-PCR with microarray. We further explored whether the expression level of a particular miRNA in RT-PCR would bias its correlation with its expression on a microarray.
In conclusion, the fluctuations of the low-abundant miRNAs are not random. The changes in their expression levels are correlated well with the overall changes in all miRNAs, which is assumed to be due to different starting sample sizes for the PCR reactions. We see that there is a systematic bias in the CT values that causes the expression levels of the low-abundant miRNAs to be more sensitive to the initial sample sizes.
[Figure: panels "Weighted Mean Normalization" and "Using the top 10 miRNAs as endogenous controls"; axis label: mean CT0]
We explored the phenomenon whereby differences in the initial sample size of miRNA in an RT-PCR experiment were magnified with increasing CT levels. This was illustrated by the strong correlation of the CT values of the individual miRNAs with the average CT values of all miRNAs and by the increased sensitivity in the CT values of the low-abundant miRNAs to the average CT values. We conclude that a systematic bias in RT-PCR exists in which the fluctuations in the CT are dependent on the expression levels of the particular miRNAs. We further proposed a novel data-driven method of addressing this bias by using the weighted mean instead of an endogenous control in the calculation of ΔCT. We demonstrated that the new normalization method produces lower standard deviations and is more stable than other methods.
Note that, while the power parameter in the weighted mean normalization method provides a convenient way of adjusting how much one wishes to let the less stable microRNAs influence the normalization of other microRNAs, its optimization currently requires enumerating different values and using the one with the best overall stability. CT0 values can be calculated for several values of the weighted mean power; the power that produces the lowest standard deviation, or that is determined to be the most stable by geNorm, can then be used for normalization. The standard deviation and geNorm stability calculations are two methods to quantitatively determine the ideal weighted mean power. Other criteria, such as the significance of the differentially expressed microRNAs, can also be utilized in this optimization. Furthermore, a different custom CT0 value for each microRNA may be used, such that each microRNA is normalized differently, dependent on its average expression level.
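The enumeration of candidate powers can be sketched as a simple grid search. The stability criterion used here, the mean standard deviation of the ΔCT values over all microRNAs, is an assumption chosen for illustration; geNorm stability or another criterion could be substituted. The function names, the candidate list, and the normalization of the weights are likewise hypothetical.

```python
from statistics import mean, stdev


def weighted_ct0(ct, wmp):
    """Per-sample weighted-mean CT0 (weights assumed to sum to one)."""
    raw = {g: (1.0 / stdev(vals)) ** wmp for g, vals in ct.items()}
    total = sum(raw.values())
    n_samples = len(next(iter(ct.values())))
    return [
        sum(raw[g] / total * ct[g][j] for g in ct) for j in range(n_samples)
    ]


def mean_delta_ct_std(ct, wmp):
    """Average standard deviation of delta-CT across microRNAs."""
    ct0 = weighted_ct0(ct, wmp)
    return mean(
        [stdev([v - c for v, c in zip(vals, ct0)]) for vals in ct.values()]
    )


def best_power(ct, candidates):
    """Return the candidate power with the lowest mean delta-CT std."""
    return min(candidates, key=lambda p: mean_delta_ct_std(ct, p))
```

A call such as `best_power(ct, [0, 0.5, 1, 2])` evaluates each candidate once, so the cost grows linearly with the number of powers tried.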
We further examined the reproducibility of miRNA expression experiments across two different platforms by comparing RT-PCR and microarray results. We explored the relationship between the CT value and the consistency of the expression of a miRNA between RT-PCR and microarray. We leave as future work the comparison of the ability of different normalization methods to detect differentially expressed genes.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.