Conversion of a molecular classifier obtained by gene expression profiling into a classifier based on real-time PCR: a prognosis predictor for gliomas

Background The advent of gene expression profiling was expected to dramatically improve cancer diagnosis. However, despite intensive efforts and several successful examples, the development of profile-based diagnostic systems remains a difficult task. In the present work, we established a method to convert molecular classifiers based on adaptor-tagged competitive PCR (ATAC-PCR) (with a data format that is similar to that of microarrays) into classifiers based on real-time PCR. Methods Previously, we constructed a prognosis predictor for glioma using gene expression data obtained by ATAC-PCR, a high-throughput reverse-transcription PCR technique. The analysis of gene expression data obtained by ATAC-PCR is similar to the analysis of data from two-colour microarrays. The prognosis predictor was a linear classifier based on the first principal component (PC1) score, a weighted summation of the expression values of 58 genes. In the present study, we employed the delta-delta Ct method for measurement by real-time PCR. The predictor was converted to a Ct value-based predictor using linear regression. Results We selected UBL5 as the reference gene from the group of genes with expression patterns that were most similar to the median expression level from the previous profiling study. The number of diagnostic genes was reduced to 27 without affecting the performance of the prognosis predictor. PC1 scores calculated from the data obtained by real-time PCR showed a high linear correlation (r = 0.94) with those obtained by ATAC-PCR. The correlation for individual gene expression patterns (r = 0.43 to 0.91) was smaller than for PC1 scores, suggesting that errors of measurement were likely cancelled out during the weighted summation of the expression values. The classification of a test set (n = 36) by the new predictor was more accurate than histopathological diagnosis (log rank p-values, 0.023 and 0.137, respectively) for predicting prognosis. Conclusion We successfully converted a molecular classifier obtained by ATAC-PCR into a Ct value-based predictor. Our conversion procedure should also be applicable to linear classifiers obtained from microarray data. Because errors in measurement are likely to be cancelled out during the calculation, the conversion of individual gene expression is not an appropriate procedure. The predictor for gliomas is still in the preliminary stages of development and needs analytical clinical validation and clinical utility studies.


Background
Since the inception of gene expression profiling, researchers have sought to use this technology to improve the diagnosis of diseases, especially cancers. Recently, MammaPrint [1,2] and Oncotype DX [3,4] were established as diagnostic tests based on multiple gene assays for breast cancer. Despite the success of these diagnostic tests, the development of assays for gene expression profiling is still difficult. In particular, there have been few examples of microarray-based diagnostic tests, although microarrays are frequently used as a discovery tool. One reason for the paucity of microarray-based diagnostic tests is that DNA microarrays require considerable effort to achieve the level of technical refinement necessary for diagnostic practice. On the contrary, real-time PCR is stable and robust and is frequently used for diagnosis. Because there are many studies describing the use of microarrays at the discovery phase, a convenient method to convert a microarraybased algorithm into one based on real-time PCR would help to accelerate the development of diagnostic systems based on gene expression profiling.
Previously, we performed gene expression profiling of 152 glioma tissues [5] with a high-throughput quantitative PCR technique called adaptor-tagged competitive PCR (ATAC-PCR) [6,7]. ATAC-PCR is an advanced version of quantitative competitive PCR characterised by the addition of unique adaptors for different cDNAs. A single ATAC-PCR reaction includes five cDNA samples and two different amounts of a control cDNA sample with different adaptor tags, and it measures the relative expression of the samples against that of the control. We discovered a correlation between gene expression profiles and glioma prognosis, and we developed a prognosis predictor based on a 58-gene profile [5]. The performance of the predictor based on ATAC-PCR was cross-validated with a learning set of 110 glioma samples and validated with a test set of 42 samples. Cox regression analysis revealed that the correlation between the predictor and the prognosis was superior to that of histological classification and was an independent risk factor. The current prognostic standard, the histopathological classification system, is limited in its diagnostic accuracy, and prognoses range widely even within the same grade. Diagnosis depends on individual pathologists, and the results are often discordant among multiple pathologists [8]. The performance of the prognosis predictor based on ATAC-PCR indicated that this predictor held promise for the support of conventional histopathological classification. Our classifier is also expected to bring benefits in the clinical setting for personalized management of glioma patients. For example, various molecular-targeted drugs have recently been evaluated in clinical trials for gliomas. These novel treatments should be considered for tumours that are resistant to conventional chemoradiotherapy. Yet, it is important to avoid using such a therapy for tumours that are sensitive to conventional chemoradiotherapy, based on the cost and adverse effects associated with this technique. Considering elevated expression of angiogenesis-related genes in the poor prognosis group, [5], our classifier might be useful for selection of patients for anti-VGEF agents.
In the present study, we converted the conventional predictor to one based on real-time PCR. This new predictor is based on the delta-delta Ct method [9] and requires only the measurement of the cycle threshold (Ct) of diagnostic genes. For the conversion, we first identified a reference gene for real-time PCR. Then we constructed the parameters for the conversion formula using data obtained from the learning set, which was used to construct the original classifier. Finally, the new classifier was validated with a test set. Because there is a linear correlation between microarray data and Ct values [10], the conversion process could be applicable for classifiers based on microarrays.

Patients and tumour samples
Specimens excised from 80 patients with high-grade glioma (69 cases of glioblastoma and 11 cases of anaplastic astrocytoma) at Kyoto University Hospital or nearby regional hospitals between 1998 and 2008 were stored at -70°C until use. All histological diagnoses were performed in the Kyoto University Pathology Unit according to the 2000 or 2007 WHO classifications.
Sixty of the 80 samples were recruited from those used in the previous study [5]. They were collected from patients enrolled in a phase II clinical trial using nimustine, carboplatin, vincristine, and IFN-β with radiotherapy for high-grade gliomas (the KNOG study) [11]. The remaining 20 patients were treated with temozolomide and radiotherapy. The learning set included 44 samples (43 glioblastoma, 1 anaplastic astrocytoma) from the KNOG study. Recurrence was detected in 36 of the 44 patients and their median progression-free survival was 7 months. The test set included 36 samples (26 glioblastoma and 10 astrocytoma). Twenty-three of the 36 patients showed tumour progression, and their median progression-free survival was 8 months.
Institutional approval for this study was obtained from the Institutional Review Board of Kyoto University, and informed consent was obtained from all patients prior to surgery.

RNA extraction and cDNA synthesis
Total RNA was isolated from 100 mg of the tumour specimen using TRIzol (Invitrogen, Carlsbad, CA, USA) according to the manufacturer's instructions. RNA concentrations and A260/A280 ratios were measured using a NanoDrop ND-1000 (NanoDrop Technologies, Montchanin, DE, USA). Only RNA samples with A260/A280 ratios above 1.90 were included in the study. RNA integrity was confirmed by analysis with the Agilent 2100 bioanalyser.
After DNase treatment, 5 μg of total RNA in 10 μl of distilled water was incubated with 1 μl of oligo(dT) primer for 5 min at 70°C. Total RNA was reverse transcribed in a total volume of 20 μl containing 4 μl of 5× first strand buffer, 1 μl of RNase inhibitor (Invitrogen), 2 μl of 0.1 M DTT, 0.5 μl of 20 mM dNTP and 1 μl of SuperScript III Reverse Transcriptase (Invitrogen). The samples were incubated at 45°C for 1 hr. Next, a reaction mixture (total volume of 103 μl) containing 10 μl of 10× Escherichia coli (E. coli) ligation buffer, 2 μl of 20 mM dNTPs, 2 μl of 0.1 M DTT, 2 μl of E. coli ligase (Invitrogen), 1 μl of RNase H (Invitrogen), 4 μl of E. coli DNA polymerase (Invitrogen) and 82 μl of nuclease-free water was added. The resulting reaction mixture was incubated at 16°C for 120 min and then at 70°C for 20 min. The reaction mixture was then diluted five-fold with nuclease-free water and stored at -30°C until RT-PCR analysis.

Primer design and optimisation
Gene sequences were retrieved using the UCSC Genome Bioinformatics http://genome.ucsc.edu/ program, and primers sequences were designed using Primer3Plus http://www.bioinformatics.nl/cgi-bin/primer3plus/pri-mer3plus.cgi. Specific interactions between primers and target genes were confirmed using either NCBI BLAST http://blast.ncbi.nlm.nih.gov/Blast.cgi) or BlastView (http://uswest.ensembl.org/index.html. The specificity of the expected RT-PCR products was determined based on melting curve analyses of reactions with glioma cDNA and human cDNA libraries. The product-specific melting curves showed only single peaks and no primerdimer peaks or artefacts.

Quantitative real-time reverse transcription-PCR
Quantitative PCR amplification assays were performed by a SYBR Green fluorescent assay using the ABI PRISM 7500 real-time PCR sequence detection system (Applied Biosystems, Foster City, CA, USA). Reactions were performed in a 96-well plate with 20-μl reaction solutions containing SYBR Premix Ex Taq II (10 μl) (Takara Bio., Inc., Japan), ROX reference dye II (0.4 μl), 10 μM forward and reverse primers (0.8 μl), 1 μl of cDNA template, and nuclease-free water (7 μl). Cycling conditions included an initial denaturation for 10 sec at 95°C, followed by 40 cycles of 5 sec at 95°C and 34 sec at 60°C. For determination of the reference gene, a standard curve was generated for each assay using seven serial dilutions of an amplified human brain cDNA library ranging from 20 ng to 20 fg.
The delta-delta Ct method was employed for the diagnostic assays. Ct values were calculated following the manufacturer's instructions (Applied Biosystems, Foster City, CA, USA), using UBL5 as the internal reference. The diagnostic genes fulfilled the criterion that the absolute value of the slope of the log input amount vs. ΔCt should be < 0.1.

Data analysis
Thirty primers for the selected gene candidates and for the internal and negative controls were added in triplicate to 96-well plates, and the samples were measured using one plate per sample. The negative controls showed no detectable amplification or background levels of amplification (Ct ≥ 37, compared with 16 to 31 with sample DNAs). The mean and the standard deviation of differences of Ct values between duplicates were 0.060 and 0.086, respectively. Sequence detection software (Applied Biosystems) results were exported as tabdelimited text files and imported into Microsoft Excel for further analysis.
Statistical data processing was performed using Excel and SPSS, and Pearson's correlation coefficients (r) were computed for each cross-platform comparison. Progression-free survival was measured from the day of surgery to the time of the first event of progression or to the last day of follow-up, according to the Kaplan-Meier method. Curves were compared using the log-rank test.

Selection of the reference gene
We chose the delta-delta Ct method [9] for real-time PCR measurement rather than using calibration curves. Although the delta-delta Ct method has stricter requirements, it can substantially reduce the number of PCR reactions.
The handling of gene expression data obtained by ATAC-PCR was similar to the handling of data from two-colour microarrays [12]. In both methods, the relative gene expression level compared to a control sample is measured and used for statistical analysis after data normalisation. In data normalisation of ATAC-PCR, each expression value was divided by the median of gene expression and then logarithmically converted. To choose the reference gene candidates whose expression was least changed between gliomas, we selected twelve genes exhibiting expression patterns that were most similar to the median gene expression pattern from 3,456 genes in the previous gene expression data matrix of 152 gliomas [5]. These twelve genes were expected to produce minimal variations in expression between glioma samples. To select the best reference gene, the expression levels of the twelve genes were measured in 32 glioma samples using real-time PCR. The results are shown in Figure 1. Gene expression values are influenced by the amount of mRNA and the random variation caused by biological and experimental factors [13]. Because variation in the amount of mRNA was common to all of the genes, the difference in measurement was primarily due to the latter. The measurement of UBL5 had the smallest variation; therefore, we selected it as the reference gene. Although the use of multiple reference genes is recommended by several reports [13,14], we chose a single reference gene for this case because the use of multiple reference genes would increase variations in measurement.
The first prognosis predictor developed for gliomas was based on the expression of 58 genes [5]. For the delta-delta Ct method, the amplification efficiency of a gene must be approximately equal to that of the reference gene. We performed real-time PCR amplification and fulfilled this criterion for 30 of the 58 genes. The original prognosis predictor classified gliomas into good and poor prognosis groups. The diagnostic scores calculated with the original 58 genes and the 30 genes chosen in this study had a high correlation (r = 0.95), and there was no difference between the classification results in the test set and those in the previous study [5]. Therefore, we decided to proceed with the 30 genes. A list of the genes and primer sequences is shown in Table 1.

Strategy for conversion
In our previous report of gene expression profiling of gliomas [5], we measured the relative expression levels against a control sample. Because the Ct value is inversely proportional to the amount of target nucleic acid present in the sample, the relative expression level of gene i of sample x, er i (x), is described as follows: Linear classifiers are most commonly used for molecular classification by gene expression profiles; an example is MammaPrint [2]. With a linear classifier, the diagnostic score is the sum of the normalised expression values multiplied by a coefficient determined from the learning data set. The diagnostic score of the prognosis predictor, the PC1 score, is described with Ct values as follows: Here, PC1(x) is the PC1 score of sample x. "a i " is a constant determined from the learning set in the previous study [5]. "n" is the number of diagnostic genes. PC1(x) is alternatively described as follows, defining PC1 rt (x) as the PC1 score of sample x measured by realtime PCR.  Ct UBL5 (x) of the corresponding samples. The conversion formula would then be validated with the test set. It should be noted that this method does not require the use of a control sample (i.e., measurement of Ct i (c) and Ct UBL5 (c)).

Construction of the prognosis predictor based on realtime PCR
Using 44 samples from the learning set, we determined PC1 rt (x) by measuring the Ct values. As expected, there was a high linear correlation between PC1(x) and PC1 rt (x) (r = 0.94), as shown in Figure 2.
We then measured the correlation in individual gene expression (Table 2) between the ATAC-PCR data (lognormalised) and the ΔCt values (ΔCt(x) = Ct i (x) -Ct UBL5 (x)). The correlation for individual genes was less robust than that for the PC1 score: the correlation coefficients ranged from 0.6 to 0.9. These results suggest that the PC1 score could eliminate errors in measurement through the weighted averaging of gene expression. Because three genes (VMP1, TNC and RHOC) exhibited no correlation, we eliminated them from the diagnostic gene set. Because ATAC-PCR uses a single gene-specific primer designed for the 3' end of the mRNA, it may be less specific than conventional PCR using two primers. The absence of correlation may be due to the amplification of different genetic fragments or splicing variants. The parameters of the conversion formula were determined by linear regression (b 1 , -0.37: b 0 , -0.002).
Specific features of the expression of each gene may be obtained from the regression coefficient and intercept. Because the ATAC-PCR data were converted to a common logarithm during normalisation, the regression coefficient should be somewhere between zero and 0.30 (= log 10 2). In reality, the values ranged from 0.2 to 0.43, and ten genes demonstrated values exceeding 0.30. These results suggest a substantial degree of discrepancy between measurements obtained with ATAC-PCR and

Validation of the converted predictor
The converted predictor with 27 genes was validated with an additional sample set consisting of 16 samples from the previous test set [5] and 20 new samples. The samples were from anaplastic astrocytoma (grade III) or glioblastoma (grade IV). The PC1 score (PC1(x)) of each sample was calculated using ΔCt values measured using real-time PCR. The samples were classified into two prognosis groups with the threshold value set at zero, which was the threshold used in our previous study [5]. The performance of the classification was compared to conventional histopathological diagnosis. To have clinical utility, the predictor must have a classification ability superior to that of histopathological classification. The results of the Kaplan-Meier plot from the 36 samples revealed that the molecular classification was superior to histopathological diagnosis (log rank p-values, 0.023 and 0.137, respectively) ( Figures 3A, B). The hazard ratio was 2.70 (95% confidence interval, 1.05-6.92) (p = 0.039) for molecular classification. No significant hazard ratio was obtained with histopathology (p = 0.16). We also noted that the classification results for the 16 samples from the original test set were the same as those previously obtained by ATAC-PCR. Thus, the new predictor based on real-time PCR is comparable to the previous predictor based on ATAC-PCR.

Further considerations
In the delta-delta Ct method, the selection of the reference gene is the most important technical point. It has been frequently noted that housekeeping genes are not necessarily adequate for use as reference genes [14,15] because of their variable expression levels. Although it is possible to use a combination of housekeeping genes [14], a reference gene or a set of reference genes selected from the expression data matrix of the target tissues is more desirable because the measurement of other tissues is not performed in diagnostic practice. We selected a reference gene from a set of genes exhibiting expression patterns that were similar to the median gene expression pattern for the glioma data. Alternative methods to select reference genes should also be applicable to the conversion method described here [13,16]. In the present study, the original classifier was developed from gene expression data obtained by ATAC-PCR. Our conversion method is based on the linear correlation between gene expression profiling data and ΔCt values. A linear correlation was observed between normalised microarray data and ΔCt values regardless of the normalisation procedure [17]. Thus, our method should also be applicable to linear classifiers obtained using microarrays. As described above, the correlation between diagnostic scores is higher than that between individual genes. As demonstrated by diagnostic tests for breast cancer, the scores calculated from multiple gene expression correlate with the biology (malignancy) much better than individual gene expression, which includes noise of biological and experimental origin. The higher correlation of diagnostic scores between the two PCR techniques is not surprising. This result suggests that the conversion should be performed with the diagnostic score; it is not appropriate to perform the conversion at the level of individual gene expression.
It should be noted that validation experiments were performed only for the conversion process and that the predictor itself is in the preliminary stages of development and still needs analytical clinical validation and clinical utility studies. In particular, because the original predictor may also be applicable for the prognosis prediction of grade II gliomas [5], the future cohort should include a large number of grade II gliomas. In grade II and III glioma patients, the optimal timing of radiation therapy is still controversial [18,19]. Precise risk assessment, including the ability to predict possible malignant transformation, may be useful for timing decisions and is the most promising feature of the new classification scheme.

Conclusions
We successfully converted a molecular classifier obtained by ATAC-PCR into a Ct value-based classifier. Our conversion procedure should also be applicable to linear classifiers developed from microarray data. Because errors in measurement are likely to be cancelled out during the calculation, the conversion of individual gene expression data is not an appropriate procedure. The predictor for gliomas is still in the preliminary stages of development and requires analytical clinical validation and clinical utility studies. responsible for the clinical aspects of the study. KK and SK wrote the manuscript. All authors have read and approved the manuscript.