Decoding the transcriptome is a major step toward a better understanding of the mechanisms underlying a disease and toward the discovery of potential therapeutic targets. Using microarray technology, high-throughput expression data for the whole genome can be generated within a short period of time. This offers the unique possibility of screening a large number of patient samples for a very large number of features. Nevertheless, working with clinical samples is challenging. Investigations of this kind require highly standardized procedures (e.g. storage of tissue or isolation of RNA); otherwise a technical bias might arise that overrides the biological differences between the samples. Furthermore, tissue quality is a serious concern, as is RNA quality. RNA degradation always takes place in tissue that is removed and stored for later processing. However, this process is influenced not only by the time between removal of the specimen and its storage, but also by the time between disruption of the blood supply and removal of the specimen, which depends on the type of tissue that is removed. The level of degradation can vary tremendously, which raises the question of how far degradation influences the results of gene expression screens.
To assess this influence, we simulated RNA degradation in vitro by heat treatment of patient samples. We believe that this approach provides data that are more relevant to our setting of gene expression profiling. However, it remains unclear to what extent it resembles the in vivo situation or the changes that occur during tissue preparation. In previous studies, tissue was treated to achieve degradation [4–6], so those studies have to deal with differences caused by functional changes in living cells, for example apoptotic changes. We therefore used heat-induced degradation, which has previously been used in .
Moreover, degradation of the mRNA itself, rather than degradation of the tissue after removal, is closer to the clinical routine, for two reasons. Firstly, in the clinical practice of taking rectal cancer biopsies, for example, the tissue is taken from the patient and transferred into RNA-conserving fluid within less than 20 seconds. Secondly, when the same mRNA is used for later analysis, the homogeneity of the investigated material is higher than when different tissue samples are taken. For normal tissue, the problem of heterogeneity might be negligible, but in cancer, which is well known for its heterogeneous composition, using different tissue samples introduces the risk that the expression differences are caused by the heterogeneity and not by the degradation.
To standardize the degradation state of the samples, we used the RNA integrity number (RIN). As expected, the RIN decreased as the duration of heat treatment increased. The degradation followed a similar trend in all patient samples, and the trends were highly correlated.
The main purpose of our analysis was to investigate to what extent RNA degradation limits the comparability of gene expression data derived from rectal cancer biopsies from different patients. Although quality thresholds of RIN 6 or 7 were set in the past [4, 5], these results have to be interpreted carefully, since non-cancerous tissue (e.g. rat liver) or different conservation methods (fresh frozen) were used. Furthermore, as already discussed, different biopsies from patients with rectal cancer that had previously been degraded in a time-dependent manner were analyzed.
In a preliminary test, RNA was exposed to different temperature levels. No degradation occurred at room temperature and only very little at 45°C. Additionally, the levels of degradation observed were very different from those in the setup at 60°C. To obtain comparable degradation results, we therefore chose four time points that showed RIN differences of up to roughly 4 units (P159; 0:00 h RIN 9 and 3:15 h RIN 4.7). Interestingly, PCA as well as clustering analysis revealed a very high similarity of samples from the same patient, whereas samples from different patients did not cluster together, revealing their high biological divergence.
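The observation that samples group by patient rather than by degradation state can be checked with a much simpler stand-in for PCA or hierarchical clustering: for each sample, find the other sample with the most strongly correlated expression profile and test whether it comes from the same patient. The following is a minimal sketch of that idea, not the analysis performed here. The function names, patient labels and expression values are hypothetical and chosen only for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def nearest_neighbour_patient(profiles):
    """Map each sample to the patient of its most correlated other sample.

    profiles maps (patient, time_point) -> list of log expression values.
    """
    result = {}
    for key, values in profiles.items():
        others = [(pearson(values, v), k) for k, v in profiles.items() if k != key]
        best_key = max(others)[1]      # sample with the highest correlation
        result[key] = best_key[0]      # keep only its patient label
    return result


# Made-up toy profiles (NOT data from this study): two patients, two time points.
profiles = {
    ("A", "0:00"): [8.0, 3.0, 6.0, 1.0, 9.0],
    ("A", "3:15"): [7.6, 3.2, 5.5, 1.3, 8.4],
    ("B", "0:00"): [2.0, 9.0, 4.0, 7.0, 1.0],
    ("B", "3:15"): [2.4, 8.3, 4.2, 6.5, 1.5],
}

neighbours = nearest_neighbour_patient(profiles)
# True when every sample's closest partner comes from the same patient.
same_patient = all(key[0] == patient for key, patient in neighbours.items())
print(same_patient)
```

If biology dominates, as in the toy profiles, every sample's nearest neighbour is a differently degraded sample from the same patient. If degradation dominated, samples would instead pair up by time point.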
Although biology was the dominating effect in both PCA and cluster analysis in our case, certain gene expression differences were found, and they became more evident the more the RNA was degraded. Apart from an overlap of genes between the samples at comparable degradation time points, genes that were differentially expressed at an early degradation time point reappeared as differentially expressed at the later time points as well, indicating that the changes in gene expression observed in all three comparisons originated exclusively from the degradation. Furthermore, the direction of change (over- or under-expression) of a given gene was constant throughout the entire experiment. Since these changes were not derived from specific regulatory processes, we regard them not as changes in expression itself but rather as changes in the representation of the genes.
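The two criteria described here, reappearance of a gene at every degradation time point and a constant direction of change, can be expressed as a set intersection followed by a sign check. The sketch below illustrates this under assumptions: the function name, gene identifiers and log fold changes are hypothetical placeholders, not results from this study.

```python
def consistent_genes(de_by_timepoint):
    """Return genes flagged at every time point with the same direction.

    de_by_timepoint is a list of dicts, one per degradation time point
    (earliest first), each mapping gene -> log fold change relative to
    the untreated sample.
    """
    # Genes that appear as differentially expressed at all time points.
    shared = set(de_by_timepoint[0])
    for de in de_by_timepoint[1:]:
        shared &= set(de)
    # Keep only genes whose sign of change never flips.
    return {
        gene
        for gene in shared
        if len({de[gene] > 0 for de in de_by_timepoint}) == 1
    }


# Hypothetical example values, for illustration only.
early = {"GENE_A": -1.2, "GENE_B": 0.9}
middle = {"GENE_A": -1.8, "GENE_B": 1.1, "GENE_C": -1.0}
late = {"GENE_A": -2.5, "GENE_B": -1.3, "GENE_C": -1.6, "GENE_D": 1.4}

stable = consistent_genes([early, middle, late])
print(sorted(stable))
```

In this toy input only `GENE_A` is reported: `GENE_B` changes direction at the last time point, and `GENE_C` and `GENE_D` are absent from the earliest comparison.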
Since we considered these changes to be specific, we looked more closely at the affected genes to find functional similarities. A Gene Ontology analysis identified a few processes, but none of them could be put into the context of degradation.
These results suggest that the changes induced by heating are more closely related to the nature of mRNA degradation than to the activation of a pathway involved in a specific biological process. We found short mRNAs to be under-represented, which might be due to the higher probability that degradation affects sequences that are important for detection by the microarray.
A third marker of more stable gene expression was the position within the gene of the sequence complementary to the spotted probe. Accordingly, genes with probes that were designed to bind closer to the 3' end were found to be more stable.
Investigating the length distribution and probe positions of the most under-represented genes, we found that short genes in particular were affected, as were genes with a short distance between the 5' end and the probe-binding sequence.
This finding can be explained by two essential degradation mechanisms: first, degradation from the 5' end and, second and more obviously, random degradation. In the latter case, damage to the mRNA between the 3' end (where the poly-A tail is located, which is used for reverse transcription in Agilent arrays) and the sequence that later binds to the array probe becomes more probable the longer the distance between the probe and the 3' end, or, equivalently, the shorter the distance between the probe and the 5' end. In this context, we can also assume that degradation by heating is less effective for longer transcripts with a high GC content.
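The argument about random degradation can be made concrete with a deliberately simple model. Assume, purely for illustration, that every nucleotide between the poly-A priming site and the probe-binding sequence is cleaved independently with the same small probability p. A transcript can then be detected only if none of the d nucleotides in that stretch is cleaved, which happens with probability (1 - p)^d. The cleavage rate and the distances below are assumed values, not measurements, and the model ignores 5' end degradation and GC content.

```python
def intact_probability(distance_nt, cut_rate_per_nt):
    """Probability that a stretch of distance_nt nucleotides has no cut,
    assuming independent cuts with probability cut_rate_per_nt per nucleotide."""
    return (1.0 - cut_rate_per_nt) ** distance_nt


cut_rate = 0.001  # assumed per-nucleotide cleavage probability (hypothetical)

# Probe close to the 3' end versus probe far from the 3' end.
near = intact_probability(200, cut_rate)
far = intact_probability(2000, cut_rate)

print(f"probe 200 nt from 3' end:  {near:.3f}")
print(f"probe 2000 nt from 3' end: {far:.3f}")
```

Under these assumptions the detectable fraction falls off exponentially with the distance between the probe and the 3' end, which is consistent with the higher stability observed for genes whose probes bind close to the 3' end.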
However, these results have some limitations. The platform used for these analyses has to be taken into consideration. Apart from different methods of reverse transcription, for example the use of oligo-dT primers versus random hexamers, the platform itself might play a major role. While the probes spotted on the Agilent 44 K array used here are 60-mers, those on the Affymetrix chip, for example, are only 25-mers. This difference might strongly influence the binding characteristics on the microarray, especially when degraded RNA is used. Furthermore, we investigated only a small group of different rectal cancers. The results, which imply a much larger difference based on biology than on degradation, might only hold true as long as such heterogeneity is present within the investigated samples.