This study reports the development of a tumour cell percentage (TCP) assessment method for breast cancer samples based on transcriptional analysis of 13 genes. The 13-gene molecular profile has been validated on an independent cohort of 238 samples and can accurately identify breast tumour samples with a sufficient number of tumour cells for microarray diagnostics. Tumour percentage scoring based on the molecular profile is identical to the pathologist scoring for more than 90 percent of all analysed tumour samples. Although the variation in pathologist TCP scoring is consistent with the number of discrepancies between the pathological and molecular profile classifications, these discrepancies are likely caused by the difference between the number of tumour cells present in a tumour sample and the actual tumour cell-specific mRNA levels. In addition to the known variation and inconsistency in pathological scoring of tumour slides (e.g. Figure 1 and [6, 7]), Hsu et al. indicated that formalin fixation of slides can result in tumour cell shrinkage and can lead to an underestimation of tumour cell content. Since the TCP profile is based on the transcriptional levels of genes whose expression is related to a high TCP within fresh or frozen tumour tissue, we believe that the developed gene profile likely gives a better indication of whether a sample is suitable for microarray diagnostics than pathological tumour cell percentage scoring on formalin-fixed, H/E-stained slides.
The utility of the gene profile lies in its capability to distinguish tumours with a high percentage of tumour cells from tumours with insufficient tumour content for subsequent microarray diagnostics. Conventional histopathological review yields tumour cell percentage scores at a resolution of 10% increments but is laborious and requires an experienced (in-house) pathologist. The profile, on the other hand, is able to distinguish samples with low, medium or high tumour cell content. Although the 13-gene profile provides a more qualitative measurement than the quantitative pathological assessment, it is a valuable tool for objective TCP scoring based on transcriptional levels that can quickly identify samples suitable for diagnostics.
Since the gene profile was developed to mimic a pathological scoring that has been described as inconsistent and subjective, one might argue that the TCP profile also suffers from these factors. The main goal of this study, however, was not to develop a more accurate tool for TCP assessment but an objective method that can be performed independently of pathological expertise and that is based on transcriptional gene levels instead of the number of tumour cells. A robust cross-validation procedure that mimicked pathological variation was therefore included in the selection model, which should, in principle, select a set of genes that is robust to this variation. Nevertheless, the 10 percent misclassification between the gene profile and pathological scoring might partly be attributed to this phenomenon.
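As a minimal illustration of how agreement between the pathological and molecular classifications could be tallied, the Python sketch below bins pathologist scores into low, medium and high classes and computes the fraction of samples on which both methods agree. The cutoffs (30% and 50%), the function names and the example data are assumptions made for illustration only; they are not the thresholds, code or results of this study.

```python
from collections import Counter

# Hypothetical class boundaries in percent tumour cells (assumed, not from the study).
LOW_MAX = 30    # scores below this are called "low"
HIGH_MIN = 50   # scores at or above this are called "high"


def bin_pathology_score(tcp_percent):
    """Map a pathologist TCP score (0-100) to a coarse low/medium/high class."""
    if not 0 <= tcp_percent <= 100:
        raise ValueError("TCP score must lie between 0 and 100")
    if tcp_percent < LOW_MAX:
        return "low"
    if tcp_percent < HIGH_MIN:
        return "medium"
    return "high"


def concordance(pathology_scores, profile_classes):
    """Return the agreement fraction and a (pathology, profile) confusion table."""
    if len(pathology_scores) != len(profile_classes):
        raise ValueError("both scorings must cover the same samples")
    if not pathology_scores:
        raise ValueError("at least one sample is required")
    pairs = [
        (bin_pathology_score(score), label)
        for score, label in zip(pathology_scores, profile_classes)
    ]
    agreeing = sum(1 for path_class, prof_class in pairs if path_class == prof_class)
    return agreeing / len(pairs), Counter(pairs)


# Made-up example data for ten samples (not study results).
pathology = [10, 40, 60, 80, 50, 20, 70, 30, 90, 60]
profile = ["low", "medium", "high", "high", "medium",
           "low", "high", "medium", "high", "high"]

rate, table = concordance(pathology, profile)
print(f"agreement: {rate:.0%}")
for (path_class, prof_class), count in sorted(table.items()):
    print(f"pathology={path_class:<6} profile={prof_class:<6} n={count}")
```

A confusion table of this kind shows not only how often the two scorings disagree but also whether the disagreements cluster around a class boundary, which is where variation in pathologist scoring would be expected to matter most.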
Future application of the developed TCP profile could significantly improve the throughput of microarray diagnostics. Currently, after sample arrival, RNA processing and expression analysis cannot proceed until a histopathological TCP analysis confirms that the tumour sample contains sufficient tumour cells for analysis. As indicated above, pathological analysis will remain necessary for detection of ductal carcinoma in situ and necrosis, and for a detailed assessment of the percentage of tumour cells to define the suitability of the specimen. However, replacing the initial pathologist TCP scoring with a faster transcription-based analysis that can identify whether a sample likely contains sufficient tumour tissue will shorten the processing time for microarray diagnostics. More importantly, the 13-gene TCP profile enables gene expression diagnostics for small samples or those on which no H/E evaluation is possible.
While gene expression profiles tend to be more robust with inclusion of a larger set of genes, we decided to limit the developed profile to a relatively small number of genes with optimal performance. The rationale behind this strategy was that transcriptional TCP assessment is preferably done before diagnostic microarray analysis, as this assessment will indicate whether a sample qualifies for gene expression profiling. Future development of the 13-gene molecular profile into an RT-qPCR-based assay will allow a less subjective qualification of breast tumour samples as suitable for microarray diagnostics. This way, only samples with a sufficient TCP will be used for microarray diagnostics, saving time and money and eliminating the need for a pathologist to score TCP on qualified specimens.