This paper highlights complex but fundamental issues related to validation of tissue-based genomic biomarkers. In our assessments of the published methods for several clinically relevant genomic assays, we found that normal contamination is an important source of bias in genomic predictors. However, contaminating normal tissue has different impacts depending upon the genes included in the assay. While the 70-gene assay provides stable results at high tumor percentage, unpredictable bias occurs when tumor percentage is low (< 70%). The 21-gene assay also showed an unpredictable direction of bias due to contaminating normal tissue. Both of these assays, in their commercial forms, have implemented quality control strategies to account for tumor nuclei content. These strategies appear important given that these assays misclassify a large number of non-neoplastic specimens as more aggressive tumor types. For the PAM50 assay, contaminating normal tissue induced predictable and unidirectional changes in subtype. The PAM50 genes have low variability within normal tissue and distinct expression between normal and tumor tissue, perhaps because of the way in which these genes were selected: by identifying genes with high variation between different tumors and low variation between different samplings of the same specimen. To correct for normal contamination in the PAM50 assay, we calculated the median normal expression for each of the PAM50 genes and applied a simple, linear correction to several datasets representing more than 800 breast cancer patients. Our results demonstrate that computational approaches adjusting for normal tissue contamination bias can improve the predictive value of PAM50 genomic classification.
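As a concrete illustration of the kind of linear correction described above, the mixture can be inverted gene by gene once a contamination fraction is assumed. This is a minimal sketch, not the published implementation: the mixing model, the function name, and the use of a single per-sample contamination fraction `alpha` are all assumptions.

```python
import numpy as np

def correct_for_normal(observed, normal_median, alpha):
    """Remove an assumed normal-tissue component from an observed
    expression profile under a simple linear mixture model:

        observed = alpha * normal_median + (1 - alpha) * tumor

    observed:      (n_genes,) expression profile of the mixed specimen
    normal_median: (n_genes,) median expression over normal specimens
    alpha:         assumed fraction of normal tissue, 0 <= alpha < 1

    Returns the estimated pure-tumor profile.
    """
    # Solve the mixture equation for the tumor term.
    return (observed - alpha * normal_median) / (1.0 - alpha)
```

In practice, `alpha` would have to come from an external estimate, such as pathologist-evaluated tumor cellularity, or be bounded by it.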
A few previous studies have attempted to identify gene expression signatures reflective of pure tumor cells [15–23] or associated with the percentage of stroma in tumors. Other studies have used microdissection to isolate or enrich for malignant epithelial cells. Our results suggest that identifying genomic predictors that quantitatively estimate percent normal in tumor specimens is a challenging problem; we were unable to validate previous signatures or identify a new signature that accurately predicts normal contamination in independent datasets. Until such a signature is identified and validated, it will remain difficult to implement correction strategies for individual patients. However, to illustrate the importance of the problem, we have used public data to conduct a careful sensitivity analysis of the potential for normal contamination to affect genomic assay results. Our sensitivity analysis was designed to evaluate a plausible scenario for the effects of normal contamination, but the actual effects of a given percentage of normal may be over-estimated for some samples in these datasets. This over-estimation could arise from differences in the yield of RNA per cell between normal and malignant cells, such that a histologically evaluated percentage of normal does not correspond linearly with a similar percentage change in gene expression. In fact, for the Naderi data, where median tumor cellularity was 60%, only a 20% correction was required, suggesting that contaminating normal tissue contributes less RNA per cell, or that the signatures are robust to some percentage of contamination, or both. Pathologic evaluation of percentage tumor cellularity is also subject to inter-rater variability. However, by assuming a 1:1 RNA yield between normal and tumor cells, the current study shows a plausible worst-case scenario of the biasing effects of a given percentage of normal tissue and highlights how vulnerability may vary across different genomic assays.
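The 1:1-yield worst case can be made concrete with an in-silico mixing experiment: blend a tumor profile with a normal profile at increasing normal fractions and record the point at which a classifier's call flips. The sketch below uses a hypothetical nearest-centroid classifier (Pearson correlation to subtype centroids) as a stand-in for the actual assay; the function names, profiles, and centroids are illustrative, not the study's implementation.

```python
import numpy as np

def nearest_centroid(profile, centroids):
    """Return the subtype whose centroid has the highest Pearson
    correlation with the given expression profile."""
    return max(centroids,
               key=lambda s: np.corrcoef(profile, centroids[s])[0, 1])

def contamination_sweep(tumor, normal, centroids, fractions):
    """For each normal fraction f, form the worst-case (1:1 RNA yield)
    mixture f * normal + (1 - f) * tumor and record the subtype call."""
    return {f: nearest_centroid(f * normal + (1 - f) * tumor, centroids)
            for f in fractions}
```

Sweeping `f` from 0 upward reveals the normal fraction at which the call changes, i.e. that sample's tolerance to contamination under the worst-case yield assumption.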
Future work should assess the strengths and weaknesses of various strategies for dealing with normal contamination, ranging from pathologist review and dissection to genomic methods for assessing and correcting for normal bias. Microdissection or other methods of gross dissection may be necessary for assays that include genes that are highly variable in normal tissue or stroma. Alternatively, the development of preanalytic criteria, such as requiring a minimal percentage of malignant cells in a particular sampling, may remain important for ensuring quality results. The commercial version of the 70-gene assay implements such preanalytic criteria. One disadvantage of requiring high levels of tumor nuclei is that some samples will be excluded; patients likely to be excluded are those with small tumors at the time of detection. For some assays, computational adjustment may obviate the need for microdissection. For assays with low variation in normal tissue and a predictable direction of effect, pathologist evaluation of percentage tumor remains useful for determining the likelihood of normal contamination bias or for identifying bounds on correction rates, but less labor-intensive sampling strategies may be possible to minimize the cost of these assays.
For the PAM50 assay, tolerance to contamination by normal tissue is greatest for the Luminal B, HER2E, and Basal-like subtypes and High risk classes. For these subtypes, the tumor signature remains strongly evident even at low percentages of tumor. The observation of more stable classification for Basal-like breast cancers coincides with the recent observation that Basal-like breast cancers are more robustly identified (relative to other subtypes) in single sample predictors. Conversely, this demonstrates that accurate identification of Luminal A tumors depends upon high malignant cell percentages (i.e., low levels of normal contamination). The majority of these erroneously classified Luminal A tumors are Luminal B after adjustment for normal contamination, suggesting that Luminal B tumors may "masquerade" as Luminal A tumors due to the presence of high levels of normal tissue in the specimen. This observation is particularly important because such misclassification could lead to undertreatment if the error is not modeled and corrected.
It has been argued that the scientific rigor of translational biomarker research has lagged behind that of treatment research and that second-generation genomic tests should address the limitations of first-generation tests, including the need for higher levels of evidence. Our results suggest that desirable features of second-generation tests will include attention to important sources of preanalytic variation in tumor specimens, including normal contamination and its quantitative effects in biasing tumor classification. Characterizing these biases, including their direction, magnitude, and predictability, and thoroughly assessing assay sensitivity to them are important considerations. A given assay is not simply resistant or vulnerable to normal contamination; rather, the particular genes in an assay create complex patterns of bias that must be further characterized. Other sources of variation in biospecimen processing [29–31] should also be carefully considered using similar sensitivity analyses. The next generation of genomic tests for clinical stratification of breast cancer patients will make important improvements upon currently available tests by attending to these important variables.