In the present study, we used GEP microarrays to analyse 116 lymph node biopsies to assess the feasibility of this technology as a diagnostic tool in a clinical setting. This study is preceded by a significant body of research on GEP of lymphoma that has focused on understanding the pathogenesis of individual subtypes of lymphoma and refining the diagnosis and prognosis of these subtypes. However, our aim was to examine the practical question of whether GEP could be used to classify lymph node samples into the major subtypes of lymphoma and also to distinguish them from reactive lymph nodes.
The ability of GEP to diagnose biopsies of reactive, cHL, DLBCL and FL origin was examined with three strategies: global multi-class, local binary-class and global binary-class classification. The global multi-class approach classified each sample into one of the four diagnostic types with limited accuracy; accuracy is known to decrease when more than two classes are considered simultaneously in linear classification algorithms. Our binary comparisons, which compared a particular diagnostic type with either another type (local) or with the remainder of all cases (global), resulted in high (>80%) accuracy rates for independent test sets, except when comparing FL with DLBCL (76.1%); DLBCL was the subtype most frequently misclassified. This limitation of GEP in classifying DLBCL may be related to the high degree of heterogeneity of the disease itself. Distinct molecular forms of DLBCL have been identified in other GEP studies [4, 16, 17], although this does not readily explain the misclassified cases of this study, which included both GCB and non-GCB DLBCL as judged by the Hans algorithm for immunohistochemistry. As partial involvement of a tissue biopsy by lymphoma cannot be excluded, sampling error may also contribute to classification error rates. With regard to the comparison of RL with lymphoma, both misclassified RL samples were reactive hyperplasia. It should be noted that our reactive nodes were unselected and, as such, not all would necessarily have been B-cell predominant reactions. Therefore, the random sampling of reactive nodes, which have different compartments, may contribute to sampling error. The accuracy of distinguishing benign from malignant samples may be improved by increasing the number of cases used to build the classifier, especially since the number of reactive biopsies (23) is much smaller than the number of cancerous cases (93).
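The three strategies differ only in how the diagnostic labels are grouped before a classifier is trained. As a minimal sketch (not the analysis pipeline used in this study), the Python below shows one way the label sets could be constructed, together with inverse-frequency class weights for the reactive (23) versus lymphoma (93) comparison. The function names, the toy label list and the weighting scheme are illustrative assumptions.

```python
from collections import Counter


def global_multiclass(labels):
    """One task: every sample keeps its own four-way diagnostic label."""
    return list(labels)


def local_binary(labels, a, b):
    """Keep only samples of types a and b; returns (sample index, label) pairs."""
    return [(i, lab) for i, lab in enumerate(labels) if lab in (a, b)]


def global_binary(labels, target):
    """Relabel every sample as either the target type or 'rest'."""
    return [lab if lab == target else "rest" for lab in labels]


def balanced_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * m) for c, m in counts.items()}


# Hypothetical toy labels, for illustration only (not study data).
toy = ["RL", "cHL", "DLBCL", "FL", "DLBCL", "FL"]
fl_vs_dlbcl = local_binary(toy, "FL", "DLBCL")   # local binary task
chl_vs_rest = global_binary(toy, "cHL")          # global binary task

# Class sizes reported in the text: 23 reactive and 93 lymphoma biopsies.
weights = balanced_weights(["reactive"] * 23 + ["lymphoma"] * 93)
```

With these weights, each reactive sample counts roughly 2.5 times and each lymphoma sample roughly 0.62 times, so the two groups contribute equally to a weighted loss. Such reweighting is one possible way of handling the imbalance and does not substitute for collecting more reactive cases.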
Application of our findings to clinical practice would require a much larger study, not only to verify the gene signatures we identified for particular types but also to assess the profile of uncommon lymphoma subtypes. We nonetheless feel that this work represents an important step in testing the principle of using GEP, based on simple and inexpensive arrays, as an ancillary diagnostic test for lymph node biopsies. Our laboratory practices were easily adapted to allow routine allocation of a portion of each biopsy specimen for microarray analysis, because routine tests for the diagnosis of lymphoma, such as flow cytometry and cytogenetics, also require fresh (not formalin-fixed) specimens. The development of new techniques such as quantitative nuclease protection assays on formalin-fixed, paraffin-embedded tissue blocks would overcome any difficulty in obtaining fresh tissue for microarray gene expression profiling and make GEP much more widely available, even for small biopsies.
The 18% technical exclusion rate of samples arrayed in this study limits the diagnostic utility of microarray analysis. However, increased familiarity with the assay will reduce the exclusion rate, and in laboratories with a limited caseload, referral to a centralised service may be preferable. Given the substantial improvement in microarray technology since the initiation of this study, the use of newer genome-wide microarray platforms such as Illumina bead arrays would also improve the utility of this technology and help to reduce the technical exclusion rate seen here. Incorporation of microRNA array data may also be appropriate, especially given the reported stability of microRNA expression.
In our study, 13 of the 40 classifier genes identified from a specific (local) comparison of cHL with NHL were also strong classifiers when cHL was globally compared with both NHL and reactive samples. This indicates that our classification strategy encompasses unique gene sets that can classify across more than two types of pathological conditions. Although some gene classifiers identified in our study were common to other reported GEP studies, the absence of some previously identified key classifiers may be due to variable probe make-up across different microarray platforms or to differences between the diagnostic classes used in our classification and those used in most published GEP studies. Because each of our global binary comparisons set one diagnostic type against a mixture of lymphoma subtypes and non-cancerous samples, they are likely to have identified gene signatures that represent the particular diagnostic type in question.
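The overlap between the local and global cHL classifier lists is a simple set intersection. The sketch below illustrates the calculation with placeholder gene identifiers (hypothetical, not the study's gene lists), arranged so that 13 of the 40 local classifiers also appear in the global list, as reported above. The size of the global list is an assumption.

```python
def classifier_overlap(local_genes, global_genes):
    """Return the shared genes and the fraction of the local list they represent."""
    local_set = set(local_genes)
    shared = local_set & set(global_genes)
    return shared, len(shared) / len(local_set)


# Placeholder identifiers, for illustration only (not the study's genes).
local_chl = [f"gene_{i}" for i in range(40)]        # 40 local cHL-vs-NHL classifiers
global_chl = [f"gene_{i}" for i in range(27, 67)]   # assumed global list sharing 13 genes

shared, fraction = classifier_overlap(local_chl, global_chl)
print(len(shared), f"{fraction:.1%}")
```

Thirteen shared genes out of 40 corresponds to 32.5% of the local classifier list.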
High expression of CD7, CCL17 and STAT1 has been reported to be associated with cHL [22–24], which supports the reliability of the microarray data presented in this study. As Hodgkin and Reed-Sternberg cells account for only about 1% on average of the mixed cell types present in HL infiltrates, it is likely that the expression of some of the HL classifiers is derived from the stromal cell population. This should not influence the applicability of lymph node GEP to the diagnosis of HL, given that this stromal reaction is likely to be similar across different HL samples and that stromal gene expression profiles have been reported to predict the outcome of HL. Similarly for FL, the reduced expression of CD163, a macrophage marker, that we detected may reflect a low number of macrophages in the node microenvironment in many cases of FL. The importance of this information is not diminished, as increased reactive macrophages in a rare subset of FL have been reported to be associated with poorer survival. LMO2, another strong molecular classifier identified for FL, has been reported to be expressed in approximately 50% of FL. However, it is better known as a key gene expressed in the GCB cell type of DLBCL and as a strong predictor of superior outcome in DLBCL. Given the importance of LMO2 expression in DLBCL, its absence from our list of the top 20 classifying genes of DLBCL may be due to the fact that only 5 of the cases examined (26%) were of GCB cell origin by immunohistochemistry. Instead, we identified the gene cyclin-dependent kinase inhibitor 3 (CDKN3), a known marker of ABC-like DLBCL, as more highly expressed in our DLBCL samples than in the other diagnostic types examined in this study.
The lower expression of several immunoglobulin genes in reactive node tissue may reflect differences in the cellular make-up of the microenvironment of non-malignant lymph node tissue compared with that of nodes involved by lymphoma. Consistent with the phenotype of non-cancerous tissue, we detected reduced expression of TAF3, a potentially oncogenic gene that is a negative regulator of the tumour suppressor p53.