
Title: Translating a gene expression signature for Multiple Myeloma prognosis into a robust high-throughput assay for clinical use.
ABSTRACT
Background: Widespread adoption of genomic technologies in the management of heterogeneous indications, including Multiple Myeloma, has been hindered by concern over variation between published


Background
By coupling immunomagnetic and fluorescence-based cell separation with microarray gene expression profiling, researchers have dramatically improved the understanding of how hematological malignancies, including Multiple Myeloma (MM), develop, progress, and respond to therapy. Multiple Myeloma accounts for 1% of all cancers, affecting an estimated 22,350 people in the US in 2013 and resulting in 10,710 deaths (cancer.gov). Gene expression signatures, generated using tissue obtained at the time of diagnosis, have been demonstrated to accurately predict patient outcome and stratify patients into clinically relevant molecular subgroups in many types of cancers [1][2][3][4][5].
By performing large multidisciplinary studies of multiple myeloma, researchers at the University of Arkansas for Medical Sciences (UAMS) developed a 70-gene signature of aggressive disease (GEP70), corresponding to increased risk of relapse and poorer overall survival probability [6]. This signature was independently validated in separate patient populations for its ability to predict risk of relapse and shorter overall survival in newly diagnosed multiple myeloma and proved superior to other prognostic risk scores in multivariate analyses. In the post-relapse setting, GEP70 is able to stratify patients into groups with highly significant differences in overall survival [7]. Since 2006, the UAMS GEP70 assay has been validated on patient cohorts totaling over 4,700 patients, described in the 17 publications listed in Table 1. These validation studies, performed independently by German, French, Italian, British, Dutch, and US-based clinical research groups, have repeatedly shown that the prognostic significance of the 70-gene algorithm is superior to both conventional risk stratification methods and other gene expression signatures in multivariate analyses. Patients identified as high risk by GEP70 (ranging from 15-30% of all patients, depending on the characteristics of the patient population profiled) may benefit from alternative treatment regimens and/or referral to an appropriate clinical trial. Importantly, the vast majority of cases, defined as low risk, might benefit from reduced intensity treatments.
In order to translate any gene expression signature from the research setting to routine use in a clinical laboratory, a number of logistical and technical challenges must be overcome. These include defining the minimum amount of patient specimen (e.g. bone marrow aspirate) required to isolate sufficient plasma cell RNA for expression profiling and establishing a comprehensive quality control framework in order to monitor laboratory performance over time and ensure reliability of results. Yet another challenge is how best to present the gene expression algorithm results in order to enable straightforward interpretation by treating physicians and incorporation into patient management regimens.
In this paper we describe the use of a high-throughput process, combining cell isolation, flow cytometry and gene expression profiling, to provide physicians with personalized prognostic assessments of multiple myeloma from bone marrow aspirate, based on the comprehensively validated GEP70 signature. Data are presented to describe the stability of the assay over time as performed in a CLIA-certified clinical laboratory diagnostic setting.

Plasma cell quantification and separation
Processing of bone marrow aspirate specimens submitted for MyPRS® analysis occurs largely as previously described [23]. CD138+ plasma cell isolation from red blood cell lysed bone marrow aspirates is performed by immunomagnetic bead selection with monoclonal mouse anti-human CD138 antibodies using the AutoMACS Pro automated separation system (Miltenyi-Biotec, Auburn, CA). A minimum plasma cell (PC) purity of ≥80% homogeneity is confirmed by 2-color flow cytometry using CD38+/CD45− post-sort (after immunomagnetic bead selection) criteria (Becton Dickinson, San Jose, CA).
Determination of CD138+ cell presence is performed on the initial whole bone marrow aspirate by removing an aliquot from the gently homogenized bone marrow aspirate that was mixed with EDTA at the time of collection. This aliquot is incubated with CD138 PE and CD45 FITC antibodies (Miltenyi, CA), and then the red blood cells are lysed. The remaining cells are washed with phosphate buffered saline (PBS) and flow cytometry is performed. The pre-sort cell percentage (prior to immunomagnetic bead selection) is determined by identifying the CD138+/CD45− cells from the total population after red blood cell (RBC) lysis. This determination is performed on either the FACS Calibur system or the FACS Aria III system (Becton Dickinson, NJ). Once the presence of CD138+ cells has been confirmed, the bone marrow aspirate undergoes RBC lysis and is washed with autoMACS Running Buffer (Miltenyi, CA).
Cell count is determined using the Nucleocounter NC-100 (Chemometec, Denmark) according to manufacturer recommendations. Immunomagnetic beads are then bound to the cells and the remaining unbound beads are removed through a second Running Buffer wash. The CD138+ cells are isolated from the remaining cells using the AutoMACS Pro Separator (Miltenyi, CA) according to manufacturer recommendations. If 80% cell homogeneity is not obtained, the specimen undergoes a second immunomagnetic isolation and/or is enriched using CD38 PE and CD45 FITC antibodies on the FACS Aria III (Becton Dickinson, NJ).
In keeping with institutional, federal, and Helsinki Declaration guidelines, all identifiable patients gave written informed consent for undergoing bone marrow sampling for gene expression profiling and the institutional review board of the University of Arkansas for Medical Sciences approved the research studies. Consent was not obtained from patients where data were analyzed anonymously and not associated with any identifiable or longitudinal information.

RNA isolation and microarray analysis
Cell lysis and total-RNA isolation from purified CD138+ plasma cells are performed using the RNeasy Micro Kit (Qiagen, Germany). RNA concentration and purity are determined using a Nanodrop Spectrophotometer (Thermo Scientific, Wilmington) and the integrity is assessed using the Agilent Bioanalyzer 2100 system (CA). Double-stranded complementary DNA (cDNA) and amplified biotinylated antisense RNA (aRNA or cRNA) are synthesized from total RNA using the Affymetrix 3′ IVT Express Kit. The aRNA is fragmented and hybridized to whole-genome U133 Plus 2.0 GeneChip microarrays (Affymetrix, Santa Clara, CA), according to manufacturer recommendations. Hybridized GeneChips are scanned with the Affymetrix GeneChip Scanner 3000DX V2, an FDA-cleared, CE-IVD marked system. Scanned GeneChip files (CEL files) are normalized and assessed for hybridization success and sample quality by a proprietary gene expression data quality control system (ResultsPX™), previously described [2]. GEP70 risk scores are calculated using the method originally described by Shaughnessy et al. [6], with the additional modification of scaling the score to a range of 0 to 100 to assist in interpretation. This scaling is done using the equation: scaled GEP70 = (original GEP70 + 1.6) × 20.
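The scaling step can be illustrated with a minimal Python sketch. Only the arithmetic is taken from the equation above; the function names `scale_gep70` and `unscale_gep70` are hypothetical and are not part of the ResultsPX™ software.

```python
# Constants taken from the scaling equation in the text.
OFFSET = 1.6   # added to the original GEP70 score before multiplying
FACTOR = 20.0  # multiplier applied after the offset


def scale_gep70(original_score: float) -> float:
    """Return the scaled GEP70 score: (original + 1.6) * 20."""
    return (original_score + OFFSET) * FACTOR


def unscale_gep70(scaled_score: float) -> float:
    """Invert the scaling to recover the original GEP70 score."""
    return scaled_score / FACTOR - OFFSET
```

Under this mapping an original score of −1.6 becomes 0 and an original score of 3.4 becomes 100. Inverting the equation shows that the scaled classification threshold of 45.2 used in this paper corresponds to 0.66 on the original scale.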

Control sample analysis
Positive and negative control specimens are analyzed alongside all clinical MyPRS® samples. Positive control sample analysis is performed using the multiple myeloma cell line H929, which is grown as recommended (American Type Culture Collection, Chantilly, VA). To prepare the cell line for repeated control sample use, cells are consolidated and analyzed for homogeneity and for CD138+ presence. Once ≥80% CD138+ homogeneity is confirmed, the cells are pelleted, placed in RLT (plus 2-mercaptoethanol) buffer and frozen at −80°C. Each aliquot of H929 cells is tested over several months to generate sufficient data to calculate the standard deviation of its GEP70 risk score. A Levey-Jennings plot is used to analyze the positive control specimen processed in parallel with each batch of clinical specimens, with results outside the median ± 3 SD range being rejected.
Negative control analysis is performed using frozen aliquots of RLT (plus 2-mercaptoethanol) buffer. The negative control is inserted into a batch of samples at RNA isolation and is carried all the way through aRNA amplification. Detection of aRNA in the negative control specimen prior to microarray hybridization is a sign of contamination and cause for rejection.
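The acceptance rules for the two controls can be sketched as follows. This is an illustration only, not the laboratory's software: the function names are hypothetical, the use of the sample standard deviation is an assumption (the text does not state which estimator is used), and results falling exactly on a limit are treated as acceptable.

```python
from statistics import median, stdev


def control_limits(history, n_sd=3.0):
    """Return (lower, upper) limits as median +/- n_sd * SD of past control scores."""
    center = median(history)
    spread = stdev(history)  # sample standard deviation (assumption)
    return center - n_sd * spread, center + n_sd * spread


def control_passes(score, history, n_sd=3.0):
    """True if the positive control score lies within the control limits."""
    lower, upper = control_limits(history, n_sd)
    return lower <= score <= upper


def batch_accepted(positive_score, history, negative_arna_detected):
    """A batch is rejected if the positive control is out of range
    or if aRNA is detected in the negative control."""
    return control_passes(positive_score, history) and not negative_arna_detected
```

Here `history` stands for the GEP70 scores previously recorded for the same H929 aliquot.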

Replication of GEP70 signature between UAMS and Signal Genetics laboratories
To evaluate the difference between GEP70 risk scores generated in the original research laboratory (UAMS) and the clinical laboratory (Signal Genetics LLC, AR), a series of 99 bone marrow aspirates were analyzed. Specimen preparation, microarray hybridization and data analysis methods were performed as described above. GEP70 scores were compared by performing intra-class correlation, Passing and Bablok regression and chi-square analysis of high/low risk group classification in MedCalc 12.7.8 (MedCalc Software bvba, Belgium) [24].

Interpretation of GEP70 score using personalized gene expression heat maps
The GEP70 risk score for each MyPRS analysis performed was visualized by creation of a personalized two-dimensional heat map generated by the ResultsPX™ genomic data management platform developed by Signal Genetics. This system uses Microsoft SQL Server (Redmond, WA) databases, R [25], Bioconductor [26] and custom scripts to display the expression profile of the prognostic GEP70 gene score within the context of the 559 multiple myeloma patients from two previously published datasets, including data from patients used to develop the algorithm [6]. The 5-year relapse status of each patient is shown at the top of each gene profile (red: relapse, blue: no relapse) along with the corresponding risk score.
To generate a personalized gene expression data heat map for each patient analyzed, the following steps are performed by Signal Genetics ResultsPX™ analysis software: No 'batch effect' modification is performed during this procedure as the 70 gene x 559 patient dataset was that used to originally develop the GEP70 algorithm. These data were produced by the UAMS MIRT laboratory and were used in the validation studies performed herein to ensure the 70-gene assay produces statistically equivalent results when performed in the Signal Genetics clinical laboratory.

Results
Genomic profiling of paired aliquots of patient CD138+ cells in research and clinical laboratories shows high correlation of GEP70 scores
Ninety-nine patient bone marrow aspirate specimens were split into two aliquots and processed as described in both the UAMS research laboratory and Signal Genetics' CLIA-certified laboratory (Figure 1). The intraclass correlation coefficient for the set of 99 GEP70 scores generated by UAMS and Signal Genetics is 0.98 (95% CI: 0.97 to 0.99), indicating a high level of consistency. The Cusum test for linearity revealed no significant deviation from linearity (P = 0.17) [24].
In order to assess the clinical implications of the small difference in risk scores observed between the two laboratories, ROC analysis was performed using 5-year relapse-free survival as the binary outcome metric. The AUC of the UAMS risk score was 0.67 (95% CI: 0.57 to 0.76) compared to the Signal Genetics AUC of 0.66 (95% CI: 0.56 to 0.75), a difference that was not statistically significant (P = 0.402). This indicates that the association of the GEP70 score with multiple myeloma relapse risk does not differ significantly between specimens processed in the research and clinical laboratory settings.
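To make the AUC figures concrete, the sketch below computes an ROC AUC as the probability that a randomly chosen relapsed patient has a higher risk score than a randomly chosen non-relapsed patient, with ties counted as one half. This illustrates the concept only and is not the MedCalc procedure used in the study; it does not compute confidence intervals or the P value for the comparison of two AUCs. The function name and the encoding of the outcome as a boolean "relapsed within 5 years" are assumptions.

```python
def roc_auc(scores, relapsed):
    """Area under the ROC curve via pairwise comparison.

    scores   -- risk scores, higher meaning higher predicted risk
    relapsed -- booleans, True if the patient relapsed within 5 years
    """
    positives = [s for s, r in zip(scores, relapsed) if r]
    negatives = [s for s, r in zip(scores, relapsed) if not r]
    if not positives or not negatives:
        raise ValueError("both outcome classes are required")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # correctly ranked pair
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(positives) * len(negatives))
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect ranking, so values of 0.67 and 0.66 indicate similar discrimination in the two laboratories.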

Analysis of control specimen GEP70 risk score shows high level of consistency in MyPRS® analyses over time
Along with each batch of clinical bone marrow aspirate samples analyzed, an aliquot of RNA from a multiple myeloma cell line (H929) is analyzed and its GEP70 score is assessed for stability. Over a twelve-month period from August 2012 to August 2013, 102 control sample analyses were performed. As shown in Figure 2, the median value of the GEP70 score over time was 91.2, with a standard deviation of 2.7 and a coefficient of variation of 3.0%.
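The reported coefficient of variation can be checked from the two summary values given above. The sketch assumes that the CV was obtained by dividing the standard deviation by the reported central value (91.2); the CV is conventionally defined relative to the mean, which is not reported here. The function name is hypothetical.

```python
def coefficient_of_variation_percent(sd, center):
    """Return the coefficient of variation as a percentage of the central value."""
    if center == 0:
        raise ValueError("central value must be non-zero")
    return 100.0 * sd / center


# Summary values reported for the H929 control over twelve months.
cv = coefficient_of_variation_percent(2.7, 91.2)
print(round(cv, 1))  # 3.0
```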
Next we sought to evaluate the reproducibility of the GEP70 scores across the dynamic range of the assay (i.e. low risk 0 to 45.2, high risk 45.2 to 100). Thirty specimens of MM RNA were analyzed in duplicate, approximately one month apart (Figure 3). A high degree of correlation was observed between the repeated measurements (r² = 0.99) with no statistically significant deviation from linearity detected by the Cusum test (P = 0.98).

Variation in clinical bone marrow aspirate specimen plasma cell content has negligible impact on RNA isolation and gene expression profile quality
Bone marrow aspirate specimens of varying absolute and relative CD138+ plasma cell content are submitted for GEP70 analysis from treatment centers throughout the United States. Immunomagnetic and fluorescence-based isolation of CD138+ plasma cells is routinely performed on every specimen, using the methods described above, to isolate and if necessary enrich the target plasma cells.
We investigated the association between the relative malignant cell content, RNA integrity and the resulting GEP70 risk scores by performing a retrospective analysis of data generated from routine bone marrow aspirate specimens submitted to Signal Genetics over a period of twelve months. Agilent Bioanalyzer RNA integrity number (RIN; range 0-10) and GEP70 risk score (range 0-100) data were compiled from 1000 randomly selected specimens submitted to Signal Genetics for routine GEP70 analysis between August 2012 and July 2013.
Within this series of 1000 specimens, the CD138+ cell content ranged from the lower acceptance threshold of 0.25% to 96.2% (mean: 12.80%, median: 4.63%). Despite this wide range of cellularity, skewed toward the lower end of the spectrum, Figure 4a and b show that the pre-sort percentage of CD138+ cells is only weakly correlated with the RNA integrity number and the GEP70 score (r² = 0.13 and 0.010, respectively). After cell sorting, specimens with less than 80% purity are excluded from further analysis in order to ensure the genomic profile represents the cells of interest rather than other potentially contaminating material. Inspection of specimens with cell content between 80 and 100% revealed no significant association between the relative CD138+ plasma cell content and RNA integrity or the GEP70 risk score (r² = 0.017 and r² < 0.001, respectively; Figure 4c-d).
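The risk grouping implied by the dynamic range above can be sketched as a simple threshold rule. The text gives 45.2 as the boundary between the low and high risk ranges but does not state to which group a score of exactly 45.2 belongs; assigning it to the high risk group is an assumption made here. The `margin` argument of the hypothetical `is_borderline` helper has no value given in the text and must be supplied by the caller.

```python
HIGH_RISK_THRESHOLD = 45.2  # scaled GEP70 classification threshold from the text


def classify_risk(scaled_score):
    """Return 'high' or 'low' for a scaled GEP70 score.

    Scores equal to the threshold are treated as high risk (assumption).
    """
    return "high" if scaled_score >= HIGH_RISK_THRESHOLD else "low"


def is_borderline(scaled_score, margin):
    """True if the score lies within `margin` of the threshold.

    `margin` is a caller-supplied value; none is specified in the text.
    """
    return abs(scaled_score - HIGH_RISK_THRESHOLD) <= margin
```

For example, the H929 control median of 91.2 falls in the high risk range, well away from the threshold.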
These data show that the specimen preparation methods used to isolate the malignant cells from a patient's bone marrow aspirate are robust and not impacted by natural variations in specimen quality and relative quantity of malignant plasma cells. This ensures the GEP70 prognostic risk score is an accurate and reproducible prediction of patient prognosis, with negligible impact from biological or other sources of specimen variation.

Determining the minimum amount of CD138+ plasma cell RNA required for reliable gene expression profiling
As stated, patient bone marrow aspirate specimens vary in terms of plasma cell number, viability and purity. To determine the minimum number of viable CD138+ plasma cells necessary to generate a high quality GEP and reproducible GEP70 score, we performed two titration studies in which varying amounts of pooled MM aRNA were hybridized to Affymetrix microarrays in triplicate.
By hybridizing varying amounts of pooled aRNA (range: 10 μg to 2 μg) to multiple GeneChips we were able to extrapolate to a minimum number of CD138+ plasma cells required to accept a specimen for routine GEP70 analysis. GeneChip quality control metrics and GEP70 risk scores were used to assess the impact of using lower amounts of aRNA than those used in the protocols of previous studies [6,7,9]. After repeating the experiment twice, it was apparent that there was negligible variation in the GEP70 score, even using as little as 2 μg of pooled aRNA, as shown in Table 2. The variance of the GEP70 score in titration experiment 1 was 1.5% and in experiment 2 was 2.8%.
These data also showed no significant changes in GEP data quality metrics across the range assessed. All hybridizations successfully passed the automated series of quality control metrics developed by Signal Genetics, which comprise chip and data assessments (with associated pass/warning/fail thresholds) that have been established by analyzing large databases of GeneChip quality data generated using aRNA concentrations at or above those recommended by the manufacturer [2].
Next we tested individual fresh or archival MM bone marrow aspirate samples submitted for MyPRS® analysis with RNA yields similar to those analyzed in the pooled-sample titration study. We identified twenty-two cases in which the amount of either total RNA or aRNA obtained from the patient's CD138+ plasma cells was insufficient for GeneChip hybridization. These were hybridized using standard methods and the resulting GeneChip quality metrics were analyzed to refine the minimum RNA/aRNA thresholds necessary to generate a high quality GEP suitable for clinical use. As shown in Table 3, 12/13 specimens with ≥3 ng/μL of total RNA and ≥280 ng/μL aRNA resulted in successful hybridizations (defined as zero failed chip metrics and no more than three warning metrics). Below these thresholds a drop in hybridization quality was observed, indicating the GEP from such cases may be unreliable for clinical use.
Consequently a threshold of ≥3 ng/μL of total RNA and ≥280 ng/μL aRNA was set for routine MyPRS testing. This amount of total RNA can be isolated from approximately 20,000 CD138+ plasma cells. After GeneChip hybridization, each profile must pass the ResultsPX GeneChip QC model before being used to calculate a patient's GEP70 risk score, ensuring result integrity.
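The acceptance criteria described in this paper can be collected into a short sketch. The threshold values are taken from the text; the `Specimen` class, its field names and the function names are hypothetical, and the ResultsPX GeneChip QC model itself is proprietary and is represented here only by counts of failed and warning metrics.

```python
from dataclasses import dataclass

MIN_POST_SORT_PURITY_PCT = 80.0   # minimum CD138+ purity after sorting
MIN_TOTAL_RNA_NG_PER_UL = 3.0     # minimum total RNA concentration
MIN_ARNA_NG_PER_UL = 280.0        # minimum aRNA concentration
MAX_WARNING_METRICS = 3           # warnings tolerated in a successful hybridization


@dataclass
class Specimen:
    post_sort_purity_pct: float
    total_rna_ng_per_ul: float
    arna_ng_per_ul: float


def meets_input_thresholds(specimen):
    """True if purity, total RNA and aRNA all reach the minimum thresholds."""
    return (
        specimen.post_sort_purity_pct >= MIN_POST_SORT_PURITY_PCT
        and specimen.total_rna_ng_per_ul >= MIN_TOTAL_RNA_NG_PER_UL
        and specimen.arna_ng_per_ul >= MIN_ARNA_NG_PER_UL
    )


def hybridization_successful(failed_metrics, warning_metrics):
    """Success is defined as zero failed chip metrics and at most three warnings."""
    return failed_metrics == 0 and warning_metrics <= MAX_WARNING_METRICS
```

A specimen that passes `meets_input_thresholds` would proceed to hybridization, and its profile would be used for risk scoring only if `hybridization_successful` also holds.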

Personalized gene expression heat-map assists in interpreting a patient's GEP70 score in the context of patients with known outcomes
For each MyPRS® specimen analyzed, the 70 gene expression values used to compute the patient's risk score are combined with a matrix of 70-gene data from 559 patients used to originally train and validate the prognostic algorithm [9] (data available at NCBI GEO ID: GSE2658). These gene expression profiles were generated from newly diagnosed patients who were enrolled in Total Therapy 2 (TT2; Thalidomide/Dexamethasone or Dex + high dose melphalan (Mel) supported autologous stem cell transplantation (ASCT)) or Total Therapy 3 (TT3; bortezomib-thalidomide-dexamethasone + Mel-ASCT) at UAMS prior to the commencement of their treatment.
In order to visualize the relationship between direction of differential gene expression and risk of relapse, hierarchical clustering is used to arrange the matrix rows (genes), while the columns (patients) are ordered by increasing GEP70 risk score. The published relapse-free survival (RFS) times for patients in this trial are used to label each patient as <5 yrs RFS (blue) or >5 yrs RFS (red), allowing the physician to interpret the gene expression patterns in context with the end point of interest.
As the example in Figure 5 illustrates, the GEP profile of the test patient is highlighted (yellow line), allowing the physician to visually compare the expression of the 70 prognostic genes in their patient with that of a large number of other multiple myeloma patients with known outcomes.
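The column ordering of the heat map can be sketched as follows. This covers only the ordering of patients by increasing risk score and the position of the test patient; the hierarchical clustering of gene rows and the rendering performed by ResultsPX™ are not reproduced. Placing the test patient at its sorted position among the reference patients is an assumption, and the function names and identifiers are hypothetical.

```python
from bisect import bisect_left


def order_columns(reference_scores):
    """Return patient IDs sorted by increasing GEP70 risk score.

    reference_scores -- dict mapping patient ID to scaled GEP70 score
    """
    return sorted(reference_scores, key=reference_scores.get)


def insert_test_patient(reference_scores, test_id, test_score):
    """Return (ordered IDs including the test patient, column index to highlight)."""
    ordered = order_columns(reference_scores)
    ordered_scores = [reference_scores[pid] for pid in ordered]
    position = bisect_left(ordered_scores, test_score)
    return ordered[:position] + [test_id] + ordered[position:], position
```

The returned index identifies the column that would be highlighted for the test patient.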

Discussion
The UAMS 70-gene expression profile has been established as a powerful predictor of disease outcome in newly diagnosed and relapsed multiple myeloma patients. To enable use of this GEP algorithm in a high throughput clinical setting, a direct comparison of GEP70 scores generated in two laboratories was performed and minimum specimen requirements and quality control metrics were devised in order to ensure reliable, high quality prognostic results.
Outcome prediction was found to be highly similar for specimens analyzed in either the UAMS or Signal Genetics' CLIA-certified laboratories. Importantly, a small number of cases in this study had discordant risk group predictions between the laboratories. These were cases where the risk score was very close to the classification threshold (45.2), indicating that care should be exercised when interpreting a risk score close to this value. The number of cases at the threshold is exceedingly small, as indicated by the bi-modal distribution of risk scores [6]. Further validation work was subsequently carried out to determine an appropriate confidence interval for the risk score, based on the observed technical noise present in the system. The prognostic test was found to be highly stable over time, as evidenced by the GEP70 scores from the MM cell line H929 control sample over a period of 12 months (Figure 2a). The 3% CV observed in these data over a twelve-month period is similar to that of other microarray-based prognostic assays such as MammaPrint® (Agendia, CA), a prognostic assay for breast cancer [27]. Analysis of more recent control sample data (late 2013-early 2014) shows further improvement in the assay's consistency over time (CV 1.9%; Figure 2b). As a final and important observation from these control samples, no gradual shift in the risk score over time is detected, highlighting the long-term stability of the test.
Although the technical accuracy of MyPRS® is extremely high, samples close to the threshold have a higher chance of misclassification than samples further away from it. However, the strong bi-modal distribution of scores indicates that such cases are extremely rare. In principle, the chance of a patient with a poor clinical outcome incorrectly being assigned to a good prognosis profile should be minimized. Based on the known variation in the GEP70 risk profile, a small proportion of samples with indices close to the prediction threshold may be misclassified. For results that are close to the classification threshold of 45.2, it is recommended to evaluate the result in the context of the prognostic information present in the additional genomic signatures included in the MyPRS® assay, i.e. Molecular Subtype and Virtual Karyotype [9,28]. Additionally, a second sample from a separate anatomical site might be warranted. The implementation and validation of these additional signatures will be described in a separate publication.
By performing titration studies and analysis of low cellularity clinical specimens with RNA yields below manufacturer recommendations, we have determined that the Affymetrix GeneChip platform is able to generate high quality, reproducible gene expression profiles with RNA that can be isolated from as few as 20,000 CD138+ plasma cells in this context, even though the RNA yield is lower than the amount previously considered necessary. Multiple comparisons have shown the MyPRS® test is robust to the natural variation in clinical specimens submitted for analysis.
Table footnote: Values in bold type correspond to those passing the minimum acceptance threshold. Hybridization success is predicted using post-sort number of cells (>20,000), RNA concentration (≥3 ng/μL) and aRNA concentration (≥280 ng/μL).

Conclusion
GEP70 has been repeatedly demonstrated to be a statistically superior, standardized method of personalized multiple myeloma prognosis and molecular characterization, with less subjectivity than conventional methods such as FISH and cytogenetics. Even as the expression or mutational status of single genes is shown to influence response to specific myeloma treatments, e.g. expression levels of the glucocorticoid receptor gene NR3C1 and thalidomide [29], 'treatment-independent' risk-stratification assays such as GEP70 are likely to remain an important component in providing tailored treatment plans and ensuring optimal outcomes.
The reproducibility of the MyPRS® test and the similarity of its results to those obtained from specimens analyzed in academic research laboratories demonstrate that it is an excellent tool to predict outcome of disease in MM patients. Standardization of specimen processing and the establishment of a comprehensive quality control program make the GEP70 assay highly suitable for the routine diagnostic clinical setting. Despite the wide variation in bone marrow aspirate specimen cellularity, we describe a series of novel quality control measurements that reliably produce high quality gene expression data, suitable for clinical use.
The MyPRS test is a stable, objective and standardized method for predicting prognosis in patients with multiple myeloma, supported by extensive clinical and technical validation data.