Gene expression techniques such as qRT-PCR, microarrays and Northern blotting require accurate normalization methods in order to obtain reliable, quantitative data. A common approach is to use an endogenous reference gene. The purpose of the reference gene(s) is to remove differences not attributable to real biological variation. However, numerous reports indicate that the expression levels of commonly used endogenous reference genes vary considerably between different tissues and between different experimental conditions. Thus, there are no universal reference genes for all tissues or experimental conditions [2, 3, 29–35].
To address this problem, the best reference genes must be determined for each individual tissue or experimental setting. Moreover, a combination of several endogenous control genes produces more reliable normalization than any single control gene. We therefore examined the expression of over 39,000 genes in 526 samples of whole blood from men, women and children that included healthy controls and individuals with a variety of different diseases. The 100 most stably expressed probe-sets were identified as those with the least variance across all of these samples over a broad range of expression values. These 100 genes could also be useful as endogenous internal controls for microarray studies, an approach not commonly used at present. Fifty percent of the annotated genes were involved in primary metabolic processes. Thus these genes appear to be good "housekeeping genes" for human blood because of their stable expression across various ages, genders, and diseases.
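The variance-based ranking described above can be illustrated with a minimal sketch. It assumes log2-transformed expression values per probe-set and ranks probe-sets by their standard deviation across samples. The function name, the toy probe-set IDs, and the optional intensity bounds are hypothetical and are not taken from the study's actual pipeline.

```python
import statistics

def select_stable_probes(log2_expr, n_top=100, min_mean=None, max_mean=None):
    """Return the n_top probe-sets with the lowest SD across samples.

    log2_expr: dict mapping probe-set ID -> list of log2 expression values
               (one value per sample).
    min_mean / max_mean: optional bounds on mean log2 expression
               (hypothetical cutoffs, not the study's values).
    """
    candidates = []
    for probe, values in log2_expr.items():
        mean = statistics.mean(values)
        if min_mean is not None and mean < min_mean:
            continue
        if max_mean is not None and mean > max_mean:
            continue
        # Sample SD of the log2 values is the stability score (lower = more stable).
        candidates.append((statistics.stdev(values), probe))
    candidates.sort()
    return [probe for _, probe in candidates[:n_top]]

# Toy data: four probe-sets measured in four samples (illustrative values only).
toy = {
    "probe_A": [8.0, 8.1, 7.9, 8.0],
    "probe_B": [6.0, 7.5, 5.0, 8.0],
    "probe_C": [10.0, 10.2, 9.8, 10.1],
    "probe_D": [14.0, 14.0, 14.1, 14.0],  # stable, but very highly expressed
}
stable = select_stable_probes(toy, n_top=2, max_mean=13.0)
```

With an upper intensity limit in place, a very stable but highly expressed probe-set such as `probe_D` is excluded from the ranking even though its SD is the lowest.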
qRT-PCR validation was then performed on 10 candidate reference genes, four commonly used reference genes (ACTB, GAPDH, B2M, and HMBS) and PPIB, which is reported to be stably expressed in blood. The two frequently used control reference genes in nervous tissue, ACTB and GAPDH, were not stably expressed under our experimental conditions and are therefore not suitable reference genes for human whole blood. They have also been shown to vary in a variety of experimental settings and tissues [2, 33, 36–39], including in whole blood. B2M and PPIB showed stable expression in our study. They were not identified as stably expressed in our Affymetrix data because their expression was higher than our upper limit for selection. PPIB was also reported to be stably expressed in whole blood of patients with local and systemic inflammatory syndromes and healthy controls. HMBS had a high standard deviation (SD) in our Affymetrix array data on the 526 samples (SD of log-2-transformed values = 0.72). For comparison, the most stably expressed genes in the same intensity interval had an SD of the log-2-transformed values of around 0.2. HMBS, however, showed stable expression when using qRT-PCR in the MS versus healthy controls experiment. Closer inspection of the expression stability on the 526 samples from the Affymetrix chips showed that the HMBS SD of the log-2-transformed values for most of the studies was around 0.5, but it was 0.94 for the stroke study. This result suggests that there were study-specific differences for stroke versus the other diseases, leading to the unstable expression of HMBS. Our conclusion that HMBS was unstably expressed across our 526 samples is consistent with a previous study reporting that HMBS was unstably expressed in leukocytes. Among the 10 candidate reference genes, TRAP1, DECR1, FPGS, and FARP1 showed the most stable expression.
In most research studies, the target genes or genes regulated within the study are expressed at different levels. In such situations, it is preferable that the endogenous reference or "housekeeping" genes are expressed at levels comparable to those of the target gene. This is essential if the microarray, RT-PCR or Northern blotting platform is non-linear at very low or at very high expression levels. Thus, having a set of endogenous reference genes that are stably expressed at different levels makes it possible to select those whose expression is comparable to that of the target genes.
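As a small illustration of matching reference genes to a target by expression level, the sketch below picks the stable reference genes whose mean log2 expression lies closest to that of the target. The function name, gene labels, and expression values are hypothetical placeholders.

```python
def closest_references(target_level, reference_levels, k=2):
    """Return the k reference genes whose mean log2 expression is
    nearest to the target gene's mean log2 expression.

    reference_levels: dict mapping gene name -> mean log2 expression.
    """
    ranked = sorted(
        reference_levels,
        key=lambda gene: abs(reference_levels[gene] - target_level),
    )
    return ranked[:k]

# Hypothetical stable reference genes spanning low to high expression.
references = {"ref_low": 6.0, "ref_mid": 9.0, "ref_high": 12.0}
matched = closest_references(8.5, references, k=2)
```

Here a target expressed at a mean log2 level of 8.5 would be paired first with `ref_mid` and then with `ref_low`.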
An optimal set of reference genes was identified based on the expression stability measure M as utilized in the geNorm algorithm. It is based on the concept that the ratio between two ideal reference genes should always be constant regardless of the experimental conditions used. An important assumption here is that the genes in the analysis are not coordinately regulated and have independent functions. The six genes analyzed in geNorm seem to be involved in different biological processes and molecular functions and/or are located in different cellular compartments (Table 4). TRAP1 and PPIB are involved in the same biological process; however, their cellular compartmentalization is different. TRAP1 is a mitochondrial molecular chaperone also known as heat shock protein 75 (Hsp75). PPIB, encoding the cyclophilin B protein, is a cytoplasmic peptide isomerase. The broad spectrum of functions of the stably expressed genes makes it less likely that they are coordinately regulated.
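The idea behind the stability measure M can be sketched as follows. For each candidate gene, M is taken as the average, over all other candidates, of the standard deviation of the log2-transformed expression ratios across samples. A gene whose ratio to the others stays constant gets a low M. This is a simplified single-pass illustration, not the geNorm software itself, which also excludes the least stable gene stepwise. The function name and toy data are hypothetical.

```python
import math
import statistics

def stability_m(expr):
    """Compute a geNorm-style stability value M for each gene.

    expr: dict mapping gene name -> list of relative expression
          quantities on a linear scale (> 0), same sample order for all genes.
    Returns a dict mapping gene name -> M (lower = more stable).
    """
    genes = list(expr)
    m_values = {}
    for j in genes:
        pairwise_sds = []
        for k in genes:
            if k == j:
                continue
            # Log2 ratio of gene j to gene k in every sample.
            log_ratios = [math.log2(a / b) for a, b in zip(expr[j], expr[k])]
            # Pairwise variation: SD of those log ratios across samples.
            pairwise_sds.append(statistics.stdev(log_ratios))
        m_values[j] = statistics.mean(pairwise_sds)
    return m_values

# Toy data: g1 and g2 keep a constant 1:2 ratio; g3 varies independently.
toy_expr = {
    "g1": [1.0, 2.0, 4.0],
    "g2": [2.0, 4.0, 8.0],
    "g3": [1.0, 4.0, 1.0],
}
m = stability_m(toy_expr)
least_stable = max(m, key=m.get)
```

In this toy example `g3` receives the highest M and would be the first candidate to drop. Note that two coordinately regulated genes would also show a constant ratio, which is why the independence assumption stated above matters.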
An advantage of the approach taken in this study is that we performed a whole-genome survey of a large number of subjects of different ages, genders and diseases to identify the most stably expressed genes in whole blood. Drawbacks to this approach include the fact that it is an array-based discovery method that is less likely to reliably detect genes expressed at very low levels. In addition, the subjects chosen reflect some bias related to the diseases studied, as well as other biases in subject selection, such as a predominance of hospitalized and male patients.
It should be emphasized that the ultimate usefulness of the proposed endogenous reference genes for whole blood has not been demonstrated in this study. To do this, future studies must show that, after correcting for technical and individual variability using the control genes, the discriminatory power of the diagnostic genes is substantially improved.
It is important to point out the utility of the reference genes identified here for use in clinical human testing. Should the reference genes identified in this study provide sufficiently accurate normalization, then genes of interest could be normalized to these reference genes in healthy individuals and in individuals with a given disease or physiological condition. Once reference expression levels and standard deviations are derived in a large group of healthy individuals, deviations could be identified in target individuals using these endogenous genes for normalization without the need for repeated samples from normal, healthy individuals. For non-clinical research studies, such normalization to the same endogenous reference genes would allow for comparison across studies.
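The clinical workflow outlined above could look roughly like the following sketch. It assumes qRT-PCR threshold cycle (Ct) values and the common delta-Ct convention, in which a target gene's Ct is normalized by subtracting the mean Ct of several reference genes (the arithmetic mean of Ct values corresponds to the geometric mean of expression on the linear scale). A patient's normalized value is then compared with the distribution from a healthy reference group. All function names and numbers are hypothetical and for illustration only.

```python
import statistics

def delta_ct(target_ct, reference_cts):
    """Normalize a target gene's Ct to the mean Ct of the reference genes.
    A lower delta-Ct means higher relative expression."""
    return target_ct - statistics.mean(reference_cts)

def z_score(value, healthy_values):
    """Express a normalized value as the number of SDs from the healthy mean."""
    mu = statistics.mean(healthy_values)
    sd = statistics.stdev(healthy_values)
    return (value - mu) / sd

# Hypothetical delta-Ct values of one target gene in a healthy reference group.
healthy_delta_cts = [5.0, 5.2, 4.8, 5.1, 4.9]

# Hypothetical patient: target Ct plus Ct values of three reference genes.
patient_delta = delta_ct(27.0, [24.0, 24.5, 23.5])
patient_z = z_score(patient_delta, healthy_delta_cts)

# Flag the patient if the value lies more than 2 SDs from the healthy mean
# (the threshold of 2 is an arbitrary illustrative choice).
is_deviant = abs(patient_z) > 2.0
```

Because the healthy distribution is derived once, new patients can be scored without rerunning healthy control samples, provided the same reference genes and assay conditions are used.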