Using logistic regression to improve the prognostic value of microarray gene expression data sets: application to early-stage squamous cell carcinoma of the lung and triple negative breast carcinoma
- David W Mount†1,
- Charles W Putnam†2,
- Sara M Centouri3,
- Ann M Manziello1,
- Ritu Pandey1,
- Linda L Garland4 and
- Jesse D Martinez5
© Mount et al.; licensee BioMed Central Ltd. 2014
Received: 15 October 2013
Accepted: 27 May 2014
Published: 10 June 2014
Numerous microarray-based prognostic gene expression signatures of primary neoplasms have been published but often with little concurrence between studies, thus limiting their clinical utility. We describe a methodology using logistic regression, which circumvents limitations of conventional Kaplan Meier analysis. We applied this approach to a thrice-analyzed and published squamous cell carcinoma (SQCC) of the lung data set, with the objective of identifying gene expressions predictive of early death versus long survival in early-stage disease. A similar analysis was applied to a data set of triple negative breast carcinoma cases, which present similar clinical challenges.
Important to our approach is the selection of homogeneous patient groups for comparison. In the lung study, we selected two groups (including only stages I and II), equal in size, of earliest deaths and longest survivors. Genes varying at least four-fold were tested by logistic regression for accuracy of prediction (area under a ROC plot). The gene list was refined by applying two sliding-window analyses and by validations using a leave-one-out approach and model building with validation subsets. In the breast study, a similar logistic regression analysis was used after selecting appropriate cases for comparison.
A total of 8594 variable genes were tested for accuracy in predicting earliest deaths versus longest survivors in SQCC. After applying the two sliding window and the leave-one-out analyses, 24 prognostic genes were identified; most of them were B-cell related. When the same data set of stage I and II cases was analyzed using a conventional Kaplan Meier (KM) approach, we identified fewer immune-related genes among the most statistically significant hits; when stage III cases were included, most of the prognostic genes were missed. Interestingly, logistic regression analysis of the breast cancer data set identified many immune-related genes predictive of clinical outcome.
Stratification of cases based on clinical data, careful selection of two groups for comparison, and the application of logistic regression analysis substantially improved predictive accuracy in comparison to conventional KM approaches. B cell-related genes dominated the list of prognostic genes in early stage SQCC of the lung and triple negative breast cancer.
When commercial microarrays encompassing most of the human genome transcripts became available, much attention was focused upon gene expression patterns of primary tumors as indicators of likely disease progression. The presumption was that evidence of dysregulation of certain genes within the excised primary tumor could be used to improve the prognostic discrimination of clinical and pathologic staging alone [1, 2], by indicating the likelihood [3–6] that dissemination of the tumor had already occurred [7, 8]. Although this strategy has yielded limited success with certain malignancies, the hope that microarray analysis would provide prognostic data complementary to clinical staging has largely remained unfulfilled [9–16]. This difficulty becomes quite evident when gene lists from similar studies are compared and show little if any overlap. By way of example, to date 13 analyses of large expression data sets of squamous cell carcinoma of the lung (SQCC) cases have been published [11, 17–28]. However, the deduced gene profiles have very few genes in common, even when the same data set was analyzed independently by three different groups [18, 20, 22]. Similarly, Roepman, et al., compiled prognostic genes from eight analyses of NSCLC and found only five of 327 genes in common. Three of the consensus genes were from two independent reports of the same data set [29, 30].
Although a number of factors, from tissue acquisition to compilation of clinical data, conspire to complicate the task of identifying prognostic gene expressions (reviewed in [31, 32]), we focus here upon two vital considerations in the analysis of microarray data sets: optimal use of clinical data and rigorous, robust mathematical analysis. In this report, we describe the application of the well-established statistical approach, logistic regression, to the analysis of large gene expression data sets which include corresponding clinical data, such as survival or therapeutic response. Typically, an expression data set is analyzed by (1) identifying individual gene expression variations which demonstrate the largest excursions within the data set; (2) grouping the cases into quantiles based on sorted expression values of these genes; (3) comparing survival between quantiles, using Cox proportional hazard models to stratify clinical data and Kaplan Meier (KM) plots; (4) applying statistical tests to deduce the success of the quantiles in predicting survival; and (5) compiling a predictive “signature” or “metagene” and, often, constructing a mathematical formula in which expression values of the signature genes are weighted to optimize its predictive success.
Our approach differs substantively from KM analysis, and consequently circumvents several limitations of the methodology just described. First, two classes of patient cases, equal in size, are compared (in this report, “earliest deaths” and “longest survivors”) to assess the accuracy of gene expression predictors; this strategy avoids relying upon KM survival plots, which are often based upon incomplete or heavily right-censored clinical data. Second, after isolating a subset of genes which are highly variable across the entire data set, and using the groups just described, logistic regression is employed to identify those genes offering statistically significant predictive value, as judged by the area under the curve (AUC) of a receiver operating characteristic (ROC) plot and statistical examination of the logistic regression model [36, 37]. This initial list of prognostic genes is further refined by first enlarging the two groups and then executing two sliding window analyses of the larger groups of early deaths and longest survivors. The final list of independently prognostic genes is validated by assigning training and testing subsets using a leave-one-out or similar approach.
Our approach evolved as we sought to identify genes prognostic of early death or long survival in patients with early-stage SQCC, using a large published data set and accompanying clinical information. In this report we describe our analytic process using logistic regression; we ultimately identified 24 genes which have excellent prognostic discrimination. Application of a conventional KM approach to the same data, however, succeeded in identifying only a minority of the 24 genes found by logistic regression. Interestingly, immune cell-related genes, especially those associated with the B cell lineage, dominated the 24-gene list, in agreement with a substantial body of other experimental evidence, as recently reviewed by Whiteside. As further proof of the utility of the logistic regression method for identifying prognostic genes, we extended the same computational methods to a triple negative breast carcinoma data set. Treatment of this disease presents similar clinical challenges to SQCC. Remarkably, the analysis revealed a major role for B-cell and also for other immune-related genes in disease recurrence after tumor resection.
Methods
All data analyses, including statistical calculations, graphical displays, and probe annotations, were produced using R programming tools (http://www.R-project.org) and BioConductor libraries (http://www.bioconductor.org). For the lung study, a previously published data set of gene expression measurements of tissue samples of non-small-cell lung cancer on Affymetrix HGU133A microarrays was obtained from the Gene Expression Omnibus (GEO) at NCBI (http://www.ncbi.nlm.nih.gov/gds). “The samples were collected from patients from the University of Michigan Hospital between October 1991 and July 2002 with patient consent and Institutional Review Board approval”. Additional clinical information was obtained from the original authors’ submission, the SOFT file in entry GSE4573, and from supplementary data in the published paper. Matching of clinical cases to microarray samples was aided by using Unix scripts. The GDS expression data had been log transformed and normalized across the data sets for each Affymetrix probe. Density plots of each array revealed that the distribution of intensities was similar across the set and thus could readily be compared. When multiple probes were present for a gene, the probe data were averaged. In order to identify genes that were predictors of survival, we chose gene subsets in which the interquartile difference was 0.5 or 1.0 log units and in which more than 25% of the log values exceeded 6.6.
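The variability and abundance filter just described can be sketched as follows. This is a minimal illustration in Python rather than the R code used for the study; the function name `is_variable_gene`, the toy expression values, the reading of the interquartile threshold as "at least 0.5 log units", and the quartile convention are all assumptions made for the example.

```python
from statistics import quantiles

def is_variable_gene(log_values, min_iqr=0.5, floor=6.6, min_fraction=0.25):
    """Return True if a gene passes both the variability and abundance filters."""
    # Quartiles by linear interpolation (assumed convention).
    q1, _, q3 = quantiles(log_values, n=4, method="inclusive")
    iqr = q3 - q1
    # Fraction of samples whose log expression exceeds the abundance floor.
    fraction_above = sum(v > floor for v in log_values) / len(log_values)
    return iqr >= min_iqr and fraction_above > min_fraction

# Hypothetical log-scale expression values, not taken from GDS2373.
expression = {
    "GENE_A": [5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 7.5, 6.8],  # variable and abundant
    "GENE_B": [7.0, 7.1, 7.0, 7.2, 7.1, 7.0, 7.1, 7.2],   # abundant but flat
    "GENE_C": [3.0, 4.0, 5.0, 6.0, 3.5, 4.5, 5.5, 6.7],   # variable but low
}
selected = [gene for gene, values in expression.items() if is_variable_gene(values)]
print(selected)
```

Raising `min_iqr` to 1.0 gives the stricter of the two subsets mentioned in the text.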
For the breast study, a total of 2874 HGU133A Affymetrix CEL files were obtained from GEO data sets GSE31519, GSE11121, GSE2034, GSE2990, GSE3494, GSE5327, GSE6532, and GSE7390, and the 98 of those that were triple negative cases were selected. These CEL files were processed using the rma function of the BioConductor affy library, and probes for the same gene were averaged. Since the files originated from multiple data sets, the data for each array were normalized to standard scores centered on zero using the scale function. These standard scores were used in the analysis. (However, similar results were obtained with the original scores.) The clinical data for 578 breast cancer cases were provided by the GSE31519 data set. These data were used to select a set of 63 cases that were suitable for logistic regression analysis of the early recurrence and long term, event-free survival groups. To select cases clinically similar to those used in the SQCC analysis, only patients with breast cancers classified as triple-negative, which carries a particularly poor prognosis, and who had not received adjuvant chemotherapy, were included.
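As an illustration of the per-array standardization step, the sketch below converts each array's values to standard scores (mean zero, unit standard deviation). It is written in Python rather than R, uses the sample standard deviation (n − 1 denominator) on the assumption that this matches the default behaviour of R's scale function, and operates on hypothetical array names and values.

```python
from statistics import mean, stdev

def standard_scores(array_values):
    """Center one array's values on zero and divide by the sample standard deviation."""
    center = mean(array_values)
    spread = stdev(array_values)  # n - 1 denominator (assumed to match R's scale)
    return [(value - center) / spread for value in array_values]

# Hypothetical expression values for two arrays from different data sets.
arrays = {
    "array_1": [6.0, 8.0, 10.0],
    "array_2": [2.0, 3.0, 4.0],
}
scaled = {name: standard_scores(values) for name, values in arrays.items()}
print(scaled)
```

After scaling, arrays that started on different intensity scales share a common, zero-centered scale, which is what allows cases from several GEO series to be pooled.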
The SQCC cases were first sorted based on given survival times, then the group of 20 earliest death cases was compared with the group of 20 longest survivors. In later analyses, groups of 20 from among the 40 longest survivors were compared to early death cases 1 through 20; conversely, groups of 20 from the 40 earliest deaths were compared to the longest 20 survivors. For each of the 80 comparisons, a logistic model for each of the 8,594 most variable genes was produced, and the accuracy of each model in predicting survival class was evaluated. Accuracy is the area under a ROC curve, which plots 1 − specificity on the x axis against sensitivity on the y axis. Sensitivity is the proportion of early death (positive) cases that are predicted correctly: sensitivity = TP/(TP + FN), where TP is the number of early death cases predicted correctly and FN is the number of early death cases incorrectly predicted to be long term survivors. Specificity is the proportion of long survival cases predicted correctly: specificity = TN/(TN + FP), where TN is the number of long term survival cases predicted correctly and FP is the number of long term survival cases incorrectly predicted to be early deaths. It should be noted that the area under a ROC curve can be calculated by a simple, intuitive method, as described by Hosmer and Lemeshow. Using this method, the ratio of each value in one class (early death group) to every value in the other class (longest survivor group) is calculated to determine how often the value in one class is less than or greater than the value in the other class. If, for example, 320 of the 400 ratios are greater than 1, the accuracy of that gene in predicting the correct class based on its expression values is 320/400 = 0.8. This proportion is precisely the area under the ROC curve.
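The pairwise calculation of the area under the ROC curve can be sketched in a few lines. The Python example below compares values directly rather than forming ratios, which gives the same count when expression values are positive; counting ties as one half is an assumption, and the function name and toy values are hypothetical.

```python
def pairwise_auc(early_death_values, long_survivor_values):
    """Fraction of (early death, long survivor) pairs in which the survivor value is higher."""
    favorable = 0.0
    for early in early_death_values:
        for late in long_survivor_values:
            if late > early:
                favorable += 1.0
            elif late == early:
                favorable += 0.5  # assumed convention: a tie counts as half a pair
    return favorable / (len(early_death_values) * len(long_survivor_values))

# Hypothetical expression values for one gene in two small groups.
early_death = [1.0, 2.0, 3.0, 4.0]
long_survivor = [2.5, 5.0, 6.0, 7.0]
auc = pairwise_auc(early_death, long_survivor)
print(auc)  # 14 of the 16 pairs favor the survivors: 0.875
```

With two groups of 20 there are 400 pairs, so a gene for which 320 pairs are ordered the same way scores 0.8, as in the example in the text. A gene that is lower in long survivors yields a value below 0.5, and its accuracy can be read as one minus that value.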
The significance of each gene model was further evaluated using a chi squared ANOVA test of the logistic model slope coefficient, as described. In the leave-one-out validation test, early death cases 5 through 24 were used to refine the gene selection; in our clinical experience, the first four early postoperative deaths, at least, were unlikely to be related to SQCC progression.
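To make the single-gene model and its significance test concrete, the following Python sketch fits a one-predictor logistic regression by Newton-Raphson and then compares it with an intercept-only model by a likelihood-ratio chi-square test on one degree of freedom. Treating the "chi squared ANOVA test" as a likelihood-ratio (deviance) test is an assumption, as are the function names and the toy data; the study itself used R.

```python
import math

def sigmoid(z):
    """Numerically safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(x, y, iterations=25):
    """Fit P(y = 1) = sigmoid(b0 + b1 * x) by Newton-Raphson; returns (b0, b1)."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        p = [sigmoid(b0 + b1 * xi) for xi in x]
        w = [pi * (1.0 - pi) for pi in p]
        g0 = sum(yi - pi for yi, pi in zip(y, p))                 # score for intercept
        g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))   # score for slope
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

def log_likelihood(x, y, b0, b1):
    """Bernoulli log-likelihood of the data under the given coefficients."""
    total = 0.0
    for xi, yi in zip(x, y):
        p = sigmoid(b0 + b1 * xi)
        total += math.log(p) if yi == 1 else math.log(1.0 - p)
    return total

def slope_p_value(x, y):
    """Likelihood-ratio chi-square test (1 df) of the slope against an intercept-only model."""
    b0, b1 = fit_logistic(x, y)
    positives = sum(y)
    null_b0 = math.log(positives / (len(y) - positives))
    statistic = 2.0 * (log_likelihood(x, y, b0, b1) - log_likelihood(x, y, null_b0, 0.0))
    # Upper tail of a chi-square distribution with one degree of freedom.
    return math.erfc(math.sqrt(max(statistic, 0.0) / 2.0))

# Hypothetical log expression values; y = 1 marks early death, y = 0 long survival.
expression = [4.0, 4.5, 5.0, 5.5, 6.5, 7.5, 5.2, 6.0, 7.0, 7.8, 8.0, 8.5]
early_death = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
intercept, slope = fit_logistic(expression, early_death)
p_value = slope_p_value(expression, early_death)
print(intercept, slope, p_value)
```

The toy groups overlap on purpose: Newton-Raphson does not converge when the two classes are perfectly separated by the predictor, so a production analysis would need a safeguard for that case.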
Results and discussion
Data acquisition and case selection
Initially, we set as the aim of our statistical analyses the identification of individual gene expression changes prognostic of early death versus long survival in patients with stage I or II squamous cell carcinoma (SQCC) of the lung, a subset of patients in whom treatment choices are especially difficult. We used a previously published data set (GDS2373, see Methods) of 130 primary SQCC samples from 129 patients, including 107 stage I and II cases and 23 stage III cases. Gene expression values were derived from tissue samples collected at the time of surgical resection and were analyzed using the Affymetrix U133A microarray platform. The accompanying clinical data were obtained as described in Methods. The three published reports [18, 20, 21] of this data set included the 23 stage III cases. However, our analysis was limited to data from the 107 stage I and II cases, a selection consonant with the principle of using the clinical data in optimal fashion to achieve the objective of the study; limiting the cases to stages I and II provided a relatively homogeneous patient sample in which the most prominent variable was survival.
Once assigned to a group, each case was considered comparable to all other cases in its group, without regard to the precise duration of survival. Doing so, which is possible because of the clinical homogeneity of the patient population under analysis, overcomes a major limitation of Kaplan Meier analysis: its dependency upon accurate survival data. In many studies, the survival data are right-censored to varying degrees because of infrequent assessments and limited follow-ups. The logistic regression approach is less affected by incomplete or heavily right-censored survival data than KM analysis. An additional difficulty with analyses dependent upon durations of survival is that in the elderly population typical of SQCC, patient deaths not infrequently result from co-morbidities, such as infection, heart disease, stroke, emphysema and diabetes, rather than from cancer. Duration of survival, as used in the KM method, is therefore an inadequate proxy for disease progression. Comparing groups of early deaths and long survivors minimizes errors introduced by limitations in the available survival data and by deaths not directly attributable to cancer progression. Similarly, in our method, the two groups were not defined by arbitrary time intervals, e.g., deaths within two years or survival greater than five years; instead an equal number of cases was selected from either extreme of the survival spectrum.
Initial prognostic gene selection by logistic regression
Refinement of the prognostic gene list by sliding window analysis
Just as was found in the initial analysis, the majority of the 24 genes are immune system-related, especially reflecting B cell activity (Additional file 2: Table S2). Because the original tissue samples analyzed for the GDS2373 data set were limited to ones having a tumor cell population greater than 70% (Supplementary Information, ), it is unlikely that stromal cells surrounding the tumor biased the expression data. A second possibility which must be entertained is that the SQCC neoplastic cells themselves might express genes ordinarily assumed to be of immune cell origin, for example IgG [46, 47]. We favor a third hypothesis: namely, that lymphocytes, especially B cells, had infiltrated the tumors to varying degrees, a well-documented phenomenon in solid tumors, as reviewed by Fridman, et al.
Validation of the 24 prognostic genes
The results of the leave-one-out analysis are shown in Figure 8. First, we ascertained that the most accurate genes correctly predicted 80-85% of early deaths and long survivals; however, even the two least accurate models (MXI1 and INPPL1) nonetheless predicted 65% of the cases correctly. Second, ANOVA chi-square tests were applied to each of the 24 logistic regression gene models and the range of probabilities for each was determined. These ranged from between 10⁻⁴ and 10⁻³ for MXI1 to between 10⁻⁸ and 10⁻⁶ for CD79A (Additional file 2: Table S2), indicating, for example, an especially high level of confidence in the logistic model slope coefficient for the latter gene. Models for the 17 genes in the upper portion of Figure 8 were likewise strongly supported by this analysis.
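A leave-one-out evaluation of a single-gene logistic model can be sketched as below. This Python example refits the model with each case held out in turn and predicts the held-out case's class; the 0.5 probability cut-off, the function names, and the toy data are assumptions for illustration and are not taken from the study.

```python
import math

def sigmoid(z):
    """Numerically safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(x, y, iterations=25):
    """Fit P(y = 1) = sigmoid(b0 + b1 * x) by Newton-Raphson; returns (b0, b1)."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        p = [sigmoid(b0 + b1 * xi) for xi in x]
        w = [pi * (1.0 - pi) for pi in p]
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

def leave_one_out_accuracy(x, y):
    """Hold out each case once, refit on the rest, and score the held-out prediction."""
    correct = 0
    for i in range(len(x)):
        train_x = x[:i] + x[i + 1:]
        train_y = y[:i] + y[i + 1:]
        b0, b1 = fit_logistic(train_x, train_y)
        predicted = 1 if sigmoid(b0 + b1 * x[i]) >= 0.5 else 0  # assumed cut-off
        correct += int(predicted == y[i])
    return correct / len(x)

# Hypothetical log expression values; y = 1 marks early death, y = 0 long survival.
expression = [4.0, 4.5, 5.0, 5.5, 6.5, 7.5, 5.2, 6.0, 7.0, 7.8, 8.0, 8.5]
early_death = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
accuracy = leave_one_out_accuracy(expression, early_death)
print(accuracy)
```

Because every prediction is made by a model that never saw the case in question, the resulting fraction is a less optimistic estimate of accuracy than one computed on the fitted data.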
Finally, when the 24 prognostic genes were clustered based on their case-by-case predictions as shown on the left side of Figure 8, it was evident that five of the early death cases and two of the longest survivor cases were incorrectly predicted by most of the gene models. In a similar context, Zhao, et al. have discussed the difficulty of predicting clinical outcomes from gene expression data in patients with rapidly progressive disease. Some of the uniformity of predictions, both accurate and erroneous, might be the consequence of disproportionate representation of certain cell types among the tissue samples or might arise more directly from close functional relationships among the genes, hence an increased likelihood of coordinate gene expression. In support of the latter possibility, the cases which failed prediction by the CD79A model (indicated by closed circles in Figure 7) were consistent outliers; the same cases were incorrectly predicted by most gene models (Figure 8). Pearson correlation coefficients of the expression values for CD79A versus the other 23 genes were greater than 0.7 for 16 genes and greater than 0.8 for 10 genes (Additional file 4: Table S4). The gene expressions which did not correlate as well with CD79A are the lower six in Figure 8. Well-coordinated genes cannot be considered independent predictors of outcome. Nonetheless, the fact that so many immune-related genes were identified by each of our independent analyses supports their biological and functional relevance to survival. Hence, our data suggest that the strongest genetic signal for long-term patient survival in early-stage squamous cell carcinoma of the lung is an expression pattern reflective of increased number and/or activity of immune cells within the primary tumor.
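The correlation screen against a reference gene can be illustrated with a short Python sketch. The gene names and expression values below are hypothetical placeholders, and only the 0.7 threshold is taken from the text.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length value lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    covariance = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    variance_a = sum((x - mean_a) ** 2 for x in a)
    variance_b = sum((y - mean_b) ** 2 for y in b)
    return covariance / math.sqrt(variance_a * variance_b)

# Hypothetical expression values across five cases.
reference = [5.0, 6.0, 7.0, 8.0, 9.0]            # stands in for the reference gene
others = {
    "GENE_X": [5.1, 6.2, 6.9, 8.3, 8.8],          # tracks the reference closely
    "GENE_Y": [7.0, 5.0, 8.0, 6.0, 7.5],          # only weakly related
}
correlated = [gene for gene, values in others.items() if pearson(reference, values) > 0.7]
print(correlated)
```

Genes that pass such a threshold carry largely overlapping information, which is why they cannot be counted as independent predictors.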
As a more critical validation of the survival models, the cases in the same early death and long survival groups (cases 5-24 and 88-107) were each divided into two subsets of ten by taking every other case, giving four subsets in all. For example, early cases 1, 3, 5, 7, 9, 11, 13, 15, 17 and 19 were used to predict the even-numbered early cases, and this process was then reversed. A similar grouping and comparison was performed for the long survival cases, thus providing a total of four comparisons, and the accuracy of these predictions was determined. For two of the best predictive genes in the leave-one-out analysis, CD79A and CD27, the average accuracies in the four-group comparisons were 0.76 and 0.78, respectively, thus further validating the prognostic value of these two genes.
One alternative to these approaches is to randomly and repeatedly select groups of 20 patients from the 40-case earliest death and longest survival groups in a bootstrap or resampling type of analysis, and collect a list of most predictive genes. The bootstrap method may be more appropriate if patient survival is not as accurately specified as in the GDS2373 data set or if there are other clinical variables that may be a factor in choosing predictive genes. Additional resampling approaches have been discussed by others [52, 53].
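The resampling alternative can be sketched as follows: repeatedly draw a random subgroup from each of the two extreme-survival pools, score every gene on that draw, and tally how often each gene clears an accuracy threshold. This Python example draws without replacement, uses a pairwise AUC as the score, and relies on hypothetical case labels, gene names, group sizes, and threshold; none of these details come from the study.

```python
import random
from collections import Counter

def pairwise_auc(group_a, group_b):
    """Proportion of (a, b) pairs with a < b; ties count one half."""
    wins = sum(1.0 if a < b else 0.5 if a == b else 0.0
               for a in group_a for b in group_b)
    return wins / (len(group_a) * len(group_b))

def resample_predictive_genes(expression, early_cases, late_cases,
                              group_size, rounds, threshold, seed=0):
    """Count, per gene, the resampling rounds in which its AUC reaches the threshold."""
    rng = random.Random(seed)
    hits = Counter()
    for _ in range(rounds):
        early = rng.sample(early_cases, group_size)  # drawn without replacement
        late = rng.sample(late_cases, group_size)
        for gene, values in expression.items():
            auc = pairwise_auc([values[c] for c in early], [values[c] for c in late])
            if max(auc, 1.0 - auc) >= threshold:     # accept either direction of effect
                hits[gene] += 1
    return hits

# Hypothetical pools of eight early death and eight long survival cases.
early_cases = [f"E{i}" for i in range(1, 9)]
late_cases = [f"L{i}" for i in range(1, 9)]
noise = random.Random(1)
expression = {
    "GENE_SIGNAL": {**{c: float(i) for i, c in enumerate(early_cases)},
                    **{c: 10.0 + i for i, c in enumerate(late_cases)}},
    "GENE_NOISE": {c: noise.random() for c in early_cases + late_cases},
}
hits = resample_predictive_genes(expression, early_cases, late_cases,
                                 group_size=4, rounds=50, threshold=0.8)
print(hits)
```

Genes that are selected in nearly every round are robust to the exact composition of the two groups, whereas genes selected only occasionally depend on which cases happened to be drawn.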
Logistic regression versus Kaplan Meier analysis
The list of 24 prognostic genes identified by logistic regression was also compared to a list of genes obtained from the same data set using the more conventional approach of KM plots of expression quantiles. Initially, the 8594 most variable genes were tested as predictors of survival for the 107 stage I and II cases using right-censored survival for each case and the chi square statistic as a test of equality between four quantiles. Fourteen of the 24 genes found by the logistic regression method were also present in the list of the 40 best scoring genes (P < 10⁻³) by KM analysis (Additional file 5: Table S5) and five (IGLJ3, IGKC, IGHD, GM2A, DTNB) were in the top ten. The functions of the remaining genes found by KM analysis did not appear to be related to the immune system.
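For comparison with the class-based approach, the product-limit (Kaplan Meier) estimate that underlies the quantile survival curves can be sketched as below. This Python example shows only the estimator for a single group of cases with right-censoring; the quantile grouping and the chi square comparison between quantiles are not included, and the function name and toy follow-up data are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct time with a death.

    times  : follow-up time for each case
    events : 1 if the case died at that time, 0 if it was censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        current_time = data[i][0]
        deaths = 0
        leaving = 0
        # Gather every case (death or censored) recorded at this time.
        while i < len(data) and data[i][0] == current_time:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((current_time, survival))
        at_risk -= leaving  # censored cases leave the risk set without a step down
    return curve

# Hypothetical follow-up in years; the second case at 3 years and the last case are censored.
times = [2, 3, 3, 5, 8]
events = [1, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
print(curve)  # survival steps down to about 0.8, 0.6 and 0.3
```

The estimate depends on every recorded time, which is why sparse or heavily censored follow-up degrades it, whereas the class-based comparison needs only enough follow-up to place cases at the two extremes.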
To more closely compare the two methods, a similar KM analysis was also performed using the same 40 cases that were used for the logistic regression analysis shown in Figure 8. Nine of the top 24 genes found by the logistic regression method (IGHM, GM2A, DTNB, INPPL1, CD27, TNFRSF1, LAX1, IGKV4-1, IGHD) were in the list of 24 best scoring genes (P < 10⁻⁴) by this modified KM analysis, whereas the remaining 15 were not apparently related to the immune system. Four of these genes were the highest scoring ones (P < 10⁻⁵, genes GM2A, INPPL1, CD27, and IGHD) by KM analysis. Thus, the KM method used with all 107 stage I and II cases, or with a reduced set of 40 early death and long term survival cases, also revealed that a set of immune genes is strongly predictive of survival. Finding similar sets of immune-related genes by the KM and logistic regression methods, which use different computational approaches, provides additional confirmation that these genes are reliable predictors. This result also extends the validation analysis of the logistic regression models performed in Figure 8. The two methods contrast in that the KM method predicts a survival curve based on the quantile rank of a gene expression value, whereas the logistic regression method predicts a survival class (early death within two years or long survival greater than six years) for a given gene expression value.
That the GDS2373 clinical data included a preponderance of accurate survival times with long follow-ups undoubtedly contributed to the sensitivity of the KM method in this instance. Ordinarily, patient survival data are derived from a censoring analysis in which the survival time of each patient must be estimated and, often, many of the cases have limited follow-ups spaced at longer intervals. As the intervals between censoring assessments increase and their numbers decline, the sensitivity of the KM method decreases. In contrast, the logistic regression method described here only requires of the survival data that two approximately equal-sized groups can be chosen from opposing extremes of the survival spectrum; these groups can be identified with a relatively small number of assessments of patient survival.
One theoretical limitation of the logistic regression method, however, is that by choosing groups at the survival extremes, not all cases in the data set are included in the analysis. In fact, 80 (75%) of the 107 available stage I and II cases were used in our analysis. Moreover, the intermediate survival cases, which are heavily right censored and may thus degrade the analysis, are of lesser significance for predicting survival class and need not be used. The experimental objective articulated in the original analysis of this data set by Raponi, et al., was to identify gene profiles that influenced the duration of survival, whereas our logistic regression method was designed to identify genes predictive of a survival class. The latter objective simplifies the experimental design and allows less frequent assessments of survival; thus for clinical studies it may be more practical and less expensive.
In all three of the previously reported studies [18, 20, 21] of the GDS2373 data set, stage III cases were included in the KM survival analyses. Of the 112 genes identified as prognostic in the three studies, only four appear on our 24 gene list. Consequently, we repeated our KM analysis with all 130 cases, including the 23 stage III cases. Only two (INPPL1 and GM2A, which are perhaps not immune-related, Additional file 2: Table S2) of the 24 genes found by the logistic regression method were present among the 40 top scoring genes (4 × 10⁻⁵ < P < 1.4 × 10⁻³) found by KM analysis. Many of the remaining 38 (data not shown) were tumor-related genes commonly identified in such studies (e.g., KRT7, VEGFA). An obvious but important conclusion is that immune system genes are identifiable by conventional KM analysis only when the expression data are limited to stage I and II cases. As a further comparison to KM methodology, the logistic regression analysis was repeated but this time including the stage III cases in the data set. Doing so changed the compositions of the 20-case early death and long survival groups, with the consequence that immune system genes were less prevalent in the most predictive gene set (data not shown). These differences are not unexpected, as the more advanced stage III tumors almost certainly have undergone additional genetic changes, which in turn influence their expression profiles, likely overwhelming the immune cell contributions to the gene expression pool. Also, rapid proliferation and attendant necrosis of cells within stage III primary tumors may alter lymphocyte to tumor cell ratios, again decreasing relative B cell gene expressions.
Although our KM analysis did identify some immune-related genes as prognostic, the logistic regression approach proved superior in that it identified a larger number of highly correlated B cell genes in the stage I and II cases of the GDS2373 data set. Importantly, with logistic regression, one can increase the number of comparisons for each gene model by using sliding and revolving windows of early death and long survival cases, providing additional evidence in support of the prognostic gene list. Our results with logistic regression (and, for that matter, with KM analysis) also demonstrate the necessity of stratifying the available clinical data commensurate with the study objective in order for the prognostic gene profiles obtained to be of potential clinical value. These results also underscore the importance of using clinical data appropriately to achieve a more informative statistical analysis. As mentioned earlier, stage I and II cases present difficult therapeutic decisions. Somewhat less than half of the patients will ultimately die of disease progression and therefore should be treated aggressively; however, if every patient is so treated, the majority will suffer the adverse consequences of therapy unnecessarily. Thus, for stages I and II, accurate prognostic information complementary to staging will improve therapeutic decision making [42, 57].
Application of the logistic regression method for predicting clinical outcome in a triple negative breast carcinoma (TNB) data set
An immune cell signature has also been found to be predictive for clinical outcome in triple negative breast carcinoma. In the published study, clustering of genes with respect to time of first event (recurrence of the tumor) against gene expression values revealed a group of genes that included immune-related genes. The median gene value of this set was then used in Cox proportional hazard models with clinical variables and KM plots to reveal an influence of immune cell expression on outcome. Because of the clinical similarities of TNB and SQCC with respect to rate and timing of recurrence in early stage cases, we also applied our logistic regression approach to a TNB subset of their data set.
For our logistic regression analysis we selected a group of 63 triple negative breast cancer cases (see Additional file 6: Table S6 for the list of CEL files) from the supplementary data of the original report. The cases selected had complete clinical data, and were early stage lesions classified as T1, N0 malignancies (and tumor grades 1, 2 or 3). All patients included in the long term survival group were event-free at the time of the last follow-up visit. Of the 63 TNB cases, 31 had first events (recurrence of the tumor) within 18 months and 32 were event-free ten years after tumor removal. From the 63 triple negative breast cancer cases, a group of 20 cases with the earliest recurrence of the tumor and a second group of 20 cases that had not experienced tumor recurrence for the longest duration were selected. Each gene in the normalized data set was then subjected to a logistic regression analysis and the area under the ROC curve (AUC) determined. Unlike in the SQCC analysis, less variable genes were not filtered out, in order to capture the full extent of involvement of the selected genes. AUC values for a set of immune related genes within the data set were then determined. A total of 203 immune-related genes represented on the HGU133A microarray were found using the search terms “immuno”, “lymph”, “B-cell”, and “T-cell”, and by adding 20 of the 24 genes found in the lung study. The list of genes and the AUC values are given in Additional file 7: Table S7. Three of the genes had AUC values > 0.8, 19 genes greater than 0.75, and 45 genes greater than 0.7. The two top-scoring immune genes were BANK1 (AUC = 0.86) and BLNK (AUC = 0.8), which encode a B-cell scaffold protein and a B-cell linker, respectively. A significant difference between the distribution of AUC values for all genes and that for the sample of 203 immune related genes was also found (P < 0.0016, by Kolmogorov-Smirnov test).
There were just three non-immune related genes with AUC values greater than the most predictive immune gene (AUC > 0.86); this list is provided in Additional file 8: Table S8.
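The comparison of two AUC distributions by a two-sample Kolmogorov-Smirnov test can be sketched as follows. This Python example computes the maximum gap between the two empirical distribution functions and an approximate P value from the asymptotic Kolmogorov series; the function names, the small-argument shortcut, and the toy AUC values are assumptions, and statistical packages may apply different small-sample corrections.

```python
import math
from bisect import bisect_right

def ks_statistic(sample_a, sample_b):
    """Largest absolute difference between the two empirical distribution functions."""
    a, b = sorted(sample_a), sorted(sample_b)
    largest_gap = 0.0
    for value in sorted(set(a) | set(b)):
        cdf_a = bisect_right(a, value) / len(a)
        cdf_b = bisect_right(b, value) / len(b)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap

def ks_p_value(statistic, n, m, terms=100):
    """Asymptotic two-sided P value from the Kolmogorov series (large-sample approximation)."""
    scaled = statistic * math.sqrt(n * m / (n + m))
    if scaled < 0.2:
        return 1.0  # the alternating series converges poorly near zero; P is essentially 1
    series = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * scaled * scaled)
                 for k in range(1, terms + 1))
    return min(max(2.0 * series, 0.0), 1.0)

# Hypothetical AUC values for a background gene sample and an immune gene sample.
background_auc = [0.50, 0.55, 0.60, 0.65]
immune_auc = [0.70, 0.75, 0.80, 0.85]
d = ks_statistic(background_auc, immune_auc)
p = ks_p_value(d, len(background_auc), len(immune_auc))
print(d, p)
```

In this toy case the two samples do not overlap, so the statistic takes its maximum value of 1; with only four values per sample the asymptotic P value is a rough guide at best.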
The role of B cells in early-stage SQCC of the lung and triple negative breast cancer
Numerous reports have analyzed immune cell, especially T cell, responses to malignancies (reviewed by Whiteside and Prado-Garcia, et al.). Recently, however, attention has been drawn to B-cell gene expressions, as indicative or suggestive of improved survival, in various solid tumors, including NSCLC, as reviewed by Suzuki et al.; adenocarcinoma, small cell, and large cell carcinomas of the lung; breast cancer [64, 65]; and colorectal carcinoma. Prognostic B cell gene expressions in patients with solid tumors have also been documented in analyses of regional lymph nodes and peripheral blood mononuclear cells [67, 68]. The role of immune cell-related genes, especially those of B cell origin, as prognostic of SQCC survival, has likewise been suggested previously. Roepman, et al., in a 72-gene classifier derived by Cox proportional hazards models from a 172 NSCLC patient data set (of which 53% were SQCC cases), identified a number of immune-related genes, about 20% of their 72 gene list. As in our analysis, the patients in their study were limited to stages I and II and did not receive adjuvant therapy.
Similarly, we have identified numerous immune-related genes as prognostic in triple-negative breast cancer. Although not a novel finding per se, the clarity of the observations suggests that as with SQCC of the lung, TNB cancers should be scrutinized further to better define the role of immune cells in preventing recurrence.
Genome sequencing of tumors has led to the realization that mutations in a relatively small number of driver genes promote tumor development by influencing only a few key signaling pathways, which in turn affect cell survival, cell fate or genome maintenance. Nearly all solid tumors in adults carry, in addition to driver mutations, appreciable numbers of mutations which do not confer a growth advantage; non-small-cell lung cancers are especially rich in these passenger mutations because of exposure to carcinogens before and during tumor cell development. Many of the mutations, of driver and passenger genes alike, can be presumed to influence the gene expression profile of each lung cancer cell, adding to the difficulty of finding common gene profiles; the signal of cancer-related changes must be found against a large, variable background of noise. This background may explain the difficulty in obtaining reproducible profiles of genes affecting survival when tissues from different studies are used.
The present study does not, in fact, report conserved tumor cell profiles but rather expression patterns that suggest the presence, among the malignant cells of the primary tumor, of immune cells constituting a highly conserved defense system against neoplastic cells. The importance of this defense system is underscored by our observation that immune cell, especially B cell, expressions are greater in nearly all of the long survivors than in the early deaths among the stage I and II SQCC cases in this study. Kawano et al. and Rena et al. have reported that up to 25% of stage I NSCLC patients in their studies were found to have isolated tumor cells or micrometastases when regional lymph nodes (RLN) removed contemporaneously with tumor resection were carefully examined by immunohistochemistry. However, survival rates were no different in the patients with RLN micrometastases, suggesting that host immune defense responses play a determinant role in the early phase of the disease. The presence of this defense system has been reported previously but has probably more often escaped detection in gene expression analyses, in large part because of inappropriate use of clinical data and the application of less satisfactory analytical methods.
Based upon our application of the logistic regression strategy to the GDS2373 data set, as well as the corroborating observations cited above, we suggest that B cell function within the primary tumor may be an important prognostic indicator for stage I and II cases of SQCC. This conclusion warrants further study, for example, by analyzing comparable tumor samples for B cell gene activity using immunohistochemical methods or RT-PCR, in conjunction with accurate, non-censored survival data. Given the apparent activity of B cells in early-stage SQCC, NSCLC, and other solid tumors, one critical role for these cells might be recognition of tumor-specific antigens. Recruitment of T cells to tumor sites and/or occult metastatic foci, together with the destruction of tumor cells by humoral antibodies and lymphocytes, could then interface to dictate survival. It has been suggested that genes over-expressed by neoplastic cells, and specifically their protein and carbohydrate products, could be the source of such recognition. Further analysis of expression data, supported by immunochemistry, may result in identification of additional candidate tumor-specific antigens [65, 72–74].
The many large gene expression data sets available in the public domain afford invaluable opportunities for analyzing and understanding genetic and epigenetic effects on the cellular phenotypes that dictate outcomes in patients with malignancies. In this report we describe a logistic regression methodology for data set analysis that circumvents the principal shortcoming of conventional Kaplan Meier approaches: their reliance upon accurate survival data. Because cases are compared as classes (here, earliest deaths versus longest survivors) rather than by exact survival times, inaccurate or incomplete survival data can still be used effectively. No less important are the careful stratification of cases based on clinical data and the choice of classes for comparison.
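The class-comparison idea summarized above can be illustrated with a minimal sketch. It is not the authors' code (their analysis was carried out with R and BioConductor), and everything specific in it is an assumption made for illustration: the synthetic cases, the placeholder gene names `GENE_A` and `GENE_B`, the rule used to pick the "longest survivors", and the small ridge term that keeps the fit stable. It restricts cases to stages I and II, takes equal-sized groups of earliest deaths and longest survivors, fits a one-gene logistic regression to each gene, and scores each gene by the area under the ROC curve.

```python
import math


def sigmoid(z):
    # Clamp the linear predictor so math.exp cannot overflow.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(x, y, ridge=1e-3, max_iter=100):
    """Fit P(y=1) = sigmoid(b0 + b1*x) by Newton-Raphson.

    The tiny ridge penalty on b1 is an assumption of this sketch; it keeps
    the slope finite if a gene separates the two classes perfectly.
    """
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = sigmoid(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += yi - p            # gradient w.r.t. intercept
            g1 += (yi - p) * xi     # gradient w.r.t. slope
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        g1 -= ridge * b1
        h11 += ridge
        det = h00 * h11 - h01 * h01
        if det < 1e-12:
            break
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < 1e-10:
            break
    return b0, b1


def auc(scores, labels):
    """Area under the ROC curve as the fraction of (positive, negative)
    pairs ranked correctly, counting ties as one half."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def select_extreme_groups(cases, n_per_group, stages=("I", "II")):
    """Return (earliest deaths, longest survivors) among eligible stages.

    Assumed rule: earliest deaths are the shortest times among cases that
    died; longest survivors are the longest times among the remaining cases.
    """
    eligible = [c for c in cases if c["stage"] in stages]
    deaths = sorted((c for c in eligible if c["died"]),
                    key=lambda c: c["months"])
    early = deaths[:n_per_group]
    early_ids = {c["id"] for c in early}
    rest = sorted((c for c in eligible if c["id"] not in early_ids),
                  key=lambda c: c["months"], reverse=True)
    return early, rest[:n_per_group]


# Synthetic cases; expression values are invented log2 intensities.
cases = [
    {"id": "c1", "stage": "I", "months": 5, "died": True, "expr": {"GENE_A": 6.1, "GENE_B": 8.0}},
    {"id": "c2", "stage": "II", "months": 8, "died": True, "expr": {"GENE_A": 7.9, "GENE_B": 7.5}},
    {"id": "c3", "stage": "I", "months": 11, "died": True, "expr": {"GENE_A": 6.8, "GENE_B": 8.4}},
    {"id": "c4", "stage": "III", "months": 3, "died": True, "expr": {"GENE_A": 6.0, "GENE_B": 7.9}},
    {"id": "c5", "stage": "I", "months": 40, "died": True, "expr": {"GENE_A": 8.2, "GENE_B": 8.2}},
    {"id": "c6", "stage": "I", "months": 96, "died": False, "expr": {"GENE_A": 10.2, "GENE_B": 8.1}},
    {"id": "c7", "stage": "II", "months": 88, "died": False, "expr": {"GENE_A": 7.5, "GENE_B": 7.7}},
    {"id": "c8", "stage": "I", "months": 102, "died": False, "expr": {"GENE_A": 9.6, "GENE_B": 8.3}},
]

early, late = select_extreme_groups(cases, n_per_group=3)
labels = [0] * len(early) + [1] * len(late)  # 1 = long survivor

aucs = {}
for gene in ("GENE_A", "GENE_B"):
    x = [c["expr"][gene] for c in early + late]
    b0, b1 = fit_logistic(x, labels)
    fitted = [sigmoid(b0 + b1 * xi) for xi in x]
    aucs[gene] = auc(fitted, labels)
    print(gene, round(aucs[gene], 3))
```

Note that the exact survival times are used only to assign cases to the two classes; once the classes are fixed, the fit and the AUC depend only on class membership, which is why moderately inaccurate follow-up data need not change the result.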
Our logistic regression analysis of a previously thrice-analyzed SQCC data set revealed a number of B cell immune-related genes, all highly correlated in expression. This represents a novel finding in SQCC, although similar gene lists have been reported for other solid tumors. Indeed, we have also identified the predictive value of B cell gene expressions in TNB. We propose that B cell activity within primary SQCC tumors is an important indicator of prolonged survival and, as such, merits further examination and experimentation. Understanding the role of B cells in determining outcomes in patients with SQCC may lead to improvements in diagnosis and therapy of this aggressive carcinoma.
ROC: Receiver operating characteristic
SQCC: Squamous cell carcinoma (of the lung)
KM: Kaplan Meier analysis
AUC: Area under the curve (of a ROC plot)
ANOVA: Analysis of variance
NSCLC: Non-small cell lung cancer
RLN: Regional lymph nodes
TNB: Triple negative breast carcinoma.
The authors gratefully acknowledge the developers of the R-project and BioConductor resources which were used for all of the data analysis performed; we also acknowledge the use of NCBI GEO resources for data retrieval. Catherine C. Liu carefully critiqued and edited the manuscript at various stages in its preparation. This work was supported in part by the Arizona Cancer Center Core Support grant NIH P30 CA23074 with funds allocated to the Bioinformatics Shared Service (DWM, RP, AM) and NIH grant CA107510 to JDM.
- Vallieres E, Shepherd FA, Crowley J, Van Houtte P, Postmus PE, Carney D, Chansky K, Shaikh Z, Goldstraw P: The IASLC Lung Cancer Staging Project: proposals regarding the relevance of TNM in the pathologic staging of small cell lung cancer in the forthcoming (seventh) edition of the TNM classification for lung cancer. J Thorac Oncol. 2009, 4: 1049-1059.
- Vogelstein B, Papadopoulos N, Velculescu VE, Zhou S, Diaz LA, Kinzler KW: Cancer genome landscapes. Science (New York, NY). 2013, 339: 1546-1558.
- Bernards R, Weinberg RA: A progression puzzle. Nature. 2002, 418: 823.
- Ge M, Wang M, Wu Q, Qin Z, Chen L, Li L, Li L, Zhao X: Genetic fingerprint concerned with lymphatic metastasis of human lung squamous cancer. Zhongguo Fei Ai Za Zhi. 2009, 12: 945-950.
- Hoang CD, Guillaume TJ, Engel SC, Tawfic SH, Kratzke RA, Maddaus MA: Analysis of paired primary lung and lymph node tumor cells: a model of metastatic potential by multiple genetic programs. Cancer Detect Prev. 2005, 29: 509-517.
- Kikuchi T, Daigo Y, Katagiri T, Tsunoda T, Okada K, Kakiuchi S, Zembutsu H, Furukawa Y, Kawamura M, Kobayashi K, Imai K, Nakamura Y: Expression profiles of non-small cell lung cancers on cDNA microarrays: identification of genes for prediction of lymph-node metastasis and sensitivity to anti-cancer drugs. Oncogene. 2003, 22: 2192-2205.
- Dai CH, Li J, Yu LC, Li XQ, Shi SB, Wu JR: Molecular diagnosis and prognostic significance of lymph node micrometastasis in patients with histologically node-negative non-small cell lung cancer. Tumour Biol. 2013, 34: 1245-1253.
- Matthews MJ, Kanhouwa S, Pickren J, Robinette D: Frequency of residual and metastatic tumor in patients undergoing curative surgical resection for lung cancer. Cancer Chemother Rep 3. 1973, 4: 63-67.
- Santos ES, Blaya M, Raez LE: Gene expression profiling and non-small-cell lung cancer: where are we now? Clin Lung Cancer. 2009, 10: 168-173.
- Subramanian J, Simon R: Gene expression-based prognostic signatures in lung cancer: ready for clinical use? J Natl Cancer Inst. 2010, 102: 464-474.
- Sun Z, Yang P: Gene expression profiling on lung cancer outcome prediction: present clinical value and future premise. Cancer Epidemiol Biomarkers Prev. 2006, 15: 2063-2068.
- Zhu CQ, Pintilie M, John T, Strumpf D, Shepherd FA, Der SD, Jurisica I, Tsao MS: Understanding prognostic gene expression signatures in lung cancer. Clin Lung Cancer. 2009, 10: 331-340.
- Dupuy A, Simon RM: Critical review of published microarray studies for cancer outcome and guidelines on statistical analysis and reporting. J Natl Cancer Inst. 2007, 99: 147-157.
- Kratz JR, Jablons DM: Genomic prognostic models in early-stage lung cancer. Clin Lung Cancer. 2009, 10: 151-157.
- Michiels S, Koscielny S, Hill C: Prediction of cancer outcome with microarrays: a multiple random validation strategy. Lancet. 2005, 365: 488-492.
- Ntzani EE, Ioannidis JP: Predictive ability of DNA microarrays for cancer outcomes and correlates: an empirical assessment. Lancet. 2003, 362: 1439-1444.
- Larsen JE, Pavey SJ, Passmore LH, Bowman R, Clarke BE, Hayward NK, Fong KM: Expression profiling defines a recurrence signature in lung squamous cell carcinoma. Carcinogenesis. 2007, 28: 760-766.
- Raponi M, Zhang Y, Yu J, Chen G, Lee G, Taylor JM, Macdonald J, Thomas D, Moskaluk C, Wang Y, Beer DG: Gene expression signatures for predicting prognosis of squamous cell and adenocarcinomas of the lung. Cancer Res. 2006, 66: 7466-7472.
- Roepman P, Jassem J, Smit EF, Muley T, Niklinski J, van de Velde T, Witteveen AT, Rzyman W, Floore A, Burgers S, Giaccone G, Meister M, Dienemann H, Skrzypski M, Kozlowski M, Mooi WJ, van Zandwijk N: An immune response enriched 72-gene prognostic profile for early-stage non-small-cell lung cancer. Clin Cancer Res. 2009, 15: 284-290.
- Zhu CQ, Strumpf D, Li CY, Li Q, Liu N, Der S, Shepherd FA, Tsao MS, Jurisica I: Prognostic gene expression signature for squamous cell carcinoma of lung. Clin Cancer Res. 2010, 16: 5038-5047.
- Skrzypski M, Jassem E, Taron M, Sanchez JJ, Mendez P, Rzyman W, Gulida G, Raz D, Jablons D, Provencio M, Massuti B, Chaib I, Perez-Roca L, Jassem J, Rosell R: Three-Gene Expression Signature Predicts Survival in Early-Stage Squamous Cell Carcinoma of the Lung. Clin Cancer Res. 2008, 14: 4794-4799.
- Sun Z, Wigle DA, Yang P: Non-overlapping and non-cell-type-specific gene expression signatures predict lung cancer survival. J Clin Oncol. 2008, 26: 877-883.
- Baty F, Facompre M, Kaiser S, Schumacher M, Pless M, Bubendorf L, Savic S, Marrer E, Budach W, Buess M, Kehren J, Tamm M, Brutsche MH: Gene profiling of clinical routine biopsies and prediction of survival in non-small cell lung cancer. Am J Respir Crit Care Med. 2010, 181: 181-188.
- Inamura K, Fujiwara T, Hoshida Y, Isagawa T, Jones MH, Virtanen C, Shimane M, Satoh Y, Okumura S, Nakagawa K, Tsuchiya E, Ishikawa S, Aburatani H, Nomura H, Ishikawa Y: Two subclasses of lung squamous cell carcinoma with different gene expression profiles and prognosis identified by hierarchical clustering and non-negative matrix factorization. Oncogene. 2005, 24: 7105-7113.
- Pelletier MP, Edwardes MD, Michel RP, Halwani F, Morin JE: Prognostic markers in resectable non-small cell lung cancer: a multivariate analysis. Can J Surg. 2001, 44: 180-188.
- Sun Z, Yang P, Aubry MC, Kosari F, Endo C, Molina J, Vasmatzis G: Can gene expression profiling predict survival for patients with squamous cell carcinoma of the lung? Mol Cancer. 2004, 3: 35.
- Tomida S, Koshikawa K, Yatabe Y, Harano T, Ogura N, Mitsudomi T, Some M, Yanagisawa K, Takahashi T, Osada H, Takahashi T: Gene expression-based, individualized outcome prediction for surgically treated lung cancer patients. Oncogene. 2004, 23: 5360-5370.
- Xie Y, Xiao G, Coombes KR, Behrens C, Solis LM, Raso G, Girard L, Erickson HS, Roth J, Heymach JV, Moran C, Danenberg K, Minna JD, Wistuba II: Robust gene expression signature from formalin-fixed paraffin-embedded samples predicts prognosis of non-small-cell lung cancer patients. Clin Cancer Res. 2011, 17: 5705-5714.
- Beer DG, Kardia SL, Huang CC, Giordano TJ, Levin AM, Misek DE, Lin L, Chen G, Gharib TG, Thomas DG, Lizyness ML, Kuick R, Hayasaka S, Taylor JM, Iannettoni MD, Orringer MB, Hanash S: Gene-expression profiles predict survival of patients with lung adenocarcinoma. Nat Med. 2002, 8: 816-824.
- Lu Y, Lemon W, Liu PY, Yi Y, Morrison C, Yang P, Sun Z, Szoke J, Gerald WL, Watson M, Govindan R, You M: A gene expression signature predicts survival of patients with stage I non-small cell lung cancer. PLoS Med. 2006, 3: e467.
- Ahmed AA, Brenton JD: Microarrays and breast cancer clinical studies: forgetting what we have not yet learnt. Breast Cancer Res. 2005, 7: 96-99.
- Cahan P, Rovegno F, Mooney D, Newman JC, St Laurent G, McCaffrey TA: Meta-analysis of microarray results: challenges, opportunities, and recommendations for standardization. Gene. 2007, 401: 12-18.
- Kaplan E, Meier P: Nonparametric estimation from incomplete observations. J Am Stat Assoc. 1958, 53: 457-481.
- Nieto FJ, Coresh J: Adjusting survival curves for confounders: a review and a new method. Am J Epidemiol. 1996, 143: 1059-1068.
- Vervolgyi E, Kromp M, Skipka G, Bender R, Kaiser T: Reporting of loss to follow-up information in randomised controlled trials with time-to-event outcomes: a literature survey. BMC Med Res Methodol. 2011, 11: 130.
- Lasko TA, Bhagwat JG, Zou KH, Ohno-Machado L: The use of receiver operating characteristic curves in biomedical informatics. J Biomed Inform. 2005, 38: 404-415.
- Heagerty PJ, Zheng Y: Survival model predictive accuracy and ROC curves. Biometrics. 2005, 61: 92-105.
- Ma S, Huang J: Additive risk survival model with microarray data. BMC Bioinformatics. 2007, 8: 192.
- Whiteside TL: Immune responses to cancer: are they potential biomarkers of prognosis? Front Oncol. 2013, 3: 107.
- Criscitiello C, Azim HA, Schouten PC, Linn SC, Sotiriou C: Understanding the biology of triple-negative breast cancer. Ann Oncol. 2012, 23: vi13-vi18.
- Hosmer DW, Lemeshow S: Applied Logistic Regression. 2000, John Wiley & Sons, Inc: New York, 2nd edition.
- Felip E, Martinez-Marti A, Martinez P, Cedres S, Navarro A: Adjuvant treatment of resected nonsmall cell lung cancer: state of the art and new potential developments. Curr Opin Oncol. 2013, 25: 115-120.
- Moeschberger ML, Klein JP: A comparison of several methods of estimating the survival function when there is extreme right censoring. Biometrics. 1985, 41: 253-259.
- Petrelli F, Barni S: Non-cancer-related mortality after cisplatin-based adjuvant chemotherapy for non-small cell lung cancer: a study-level meta-analysis of 16 randomized trials. Med Oncol. 2013, 30: 641.
- Zhao C, Shi L, Tong W, Shaughnessy JD, Oberthuer A, Pusztai L, Deng Y, Symmans WF, Shi T: Maximum predictive power of the microarray-based models for clinical outcomes is limited by correlation between endpoint and gene expression profile. BMC Genomics. 2011, 12 (Suppl 5): S3.
- Chen Z, Gu J: Immunoglobulin G expression in carcinomas and cancer cell lines. FASEB J. 2007, 21: 2931-2938.
- Qiu X, Zhu X, Zhang L, Mao Y, Zhang J, Hao P, Li G, Lv P, Li Z, Sun X, Wu L, Zheng J, Deng Y, Hou C, Tang P, Zhang S, Zhang Y: Human epithelial cancers secrete immunoglobulin g with unidentified specificity to promote growth and survival of tumor cells. Cancer Res. 2003, 63: 6488-6495.
- Fridman WH, Galon J, Dieu-Nosjean MC, Cremer I, Fisson S, Damotte D, Pages F, Tartour E, Sautes-Fridman C: Immune infiltration in human cancer: prognostic significance and disease control. Curr Top Microbiol Immunol. 2011, 344: 1-24.
- Cleveland WS: Robust Locally Weighted Regression and Smoothing Scatterplots. J Am Stat Assoc. 1979, 74: 829-836.
- Moreau Y, Aerts S, De Moor B, De Strooper B, Dabrowski M: Comparison and meta-analysis of microarray data: from the bench to the computer desk. Trends Genet. 2003, 19: 570-577.
- Jacobsen M, Repsilber D, Gutschmidt A, Neher A, Feldmann K, Mollenkopf HJ, Kaufmann SH, Ziegler A: Deconfounding microarray analysis - independent measurements of cell type proportions used in a regression model to resolve tissue heterogeneity bias. Methods Inf Med. 2006, 45: 557-563.
- Molinaro AM, Simon R, Pfeiffer RM: Prediction error estimation: a comparison of resampling methods. Bioinformatics. 2005, 21: 3301-3307.
- Subramanian J, Simon R: An evaluation of resampling methods for assessment of survival risk prediction in high-dimensional settings. Stat Med. 2011, 30: 642-653.
- Panageas KS, Ben-Porat L, Dickler MN, Chapman PB, Schrag D: When you look matters: the effect of assessment schedule on progression-free survival. J Natl Cancer Inst. 2007, 99: 428-432.
- Tarin D: Role of the host stroma in cancer and its therapeutic significance. Cancer Metastasis Rev. 2013, 32: 553-566.
- Hawson G, Zimmerman PV, Ford CA, Johnston NG, Firouz-Abadi A: Primary lung cancer: characterization and survival of 1024 patients treated in a single institution. Med J Aust. 1990, 152: 230-234.
- Heon S, Johnson BE: Adjuvant chemotherapy for surgically resected non-small cell lung cancer. J Thorac Cardiovasc Surg. 2012, 144: S39-42.
- Schmidt M, Bohm D, Von Torne C, Steiner E, Puhl A, Pilch H, Lehr HA, Hengstler JG, Kolbl H, Gehrmann M: The humoral immune system has a key prognostic impact in node-negative breast cancer. Cancer Res. 2008, 68: 5405-5413.
- Prado-Garcia H, Romero-Garcia S, Aguilar-Cazares D, Meneses-Flores M, Lopez-Gonzalez JS: Tumor-induced CD8+ T-cell dysfunction in lung cancer patients. Clin Dev Immunol. 2012, 2012: 741741.
- Schmidt M, Hellwig B, Hammad S, Othman A, Lohr M, Chen Z, Boehm D, Gebhard S, Petry I, Lebrecht A, Cadenas C, Marchan R, Stewart JD, Solbach C, Holmberg L, Edlund K, Kultima HG, Rody A, Berglund A, Lambe M, Isaksson A, Botling J, Karn T, Müller V, Gerhold-Ay A, Cotarelo C, Sebastian M, Kronenwett R, Bojar H, Lehr HA, et al: A comprehensive analysis of human gene expression profiles identifies stromal immunoglobulin kappa C as a compatible prognostic marker in human solid tumors. Clin Cancer Res. 2012, 18: 2695-2703.
- Suzuki K, Kachala SS, Kadota K, Shen R, Mo Q, Beer DG, Rusch VW, Travis WD, Adusumilli PS: Prognostic immune markers in non-small cell lung cancer. Clin Cancer Res. 2011, 17: 5247-5256.
- Eerola AK, Soini Y, Paakko P: A high number of tumor-infiltrating lymphocytes are associated with a small tumor size, low tumor stage, and a favorable prognosis in operated small cell lung carcinoma. Clin Cancer Res. 2000, 6: 1875-1881.
- Eerola AK, Soini Y, Paakko P: Tumour infiltrating lymphocytes in relation to tumour angiogenesis, apoptosis and prognosis in patients with large cell lung carcinoma. Lung Cancer. 1999, 26: 73-83.
- Chen Z, Gerhold-Ay A, Gebhard S, Boehm D, Solbach C, Lebrecht A, Battista M, Sicking I, Cotarelo C, Cadenas C, Marchan R, Stewart JD, Gehrmann M, Koelbl H, Hengstler JG, Schmidt M: Immunoglobulin kappa C predicts overall survival in node-negative breast cancer. PLoS One. 2012, 7: e44741.
- Kotlan B, Simsa P, Foldi J, Fridman WH, Glassy M, McKnight M, Teillaud JL: Immunoglobulin repertoire of B lymphocytes infiltrating breast medullary carcinoma. Hum Antibodies. 2003, 12: 113-121.
- Lores B, Garcia-Estevez JM, Arias C: Lymph nodes and human tumors (review). Int J Mol Med. 1998, 1: 729-733.
- Kossenkov AV, Vachani A, Chang C, Nichols C, Billouin S, Horng W, Rom WN, Albelda SM, Showe MK, Showe LC: Resection of non-small cell lung cancers reverses tumor-induced gene expression changes in the peripheral immune system. Clin Cancer Res. 2011, 17: 5867-5877.
- Rotunno M, Hu N, Su H, Wang C, Goldstein AM, Bergen AW, Consonni D, Pesatori AC, Bertazzi PA, Wacholder S, Shih J, Caporaso NE, Taylor PR, Landi MT: A gene expression signature from peripheral whole blood for stage I lung adenocarcinoma. Cancer Prev Res (Phila). 2011, 4: 1599-1608.
- Kawano R, Hata E, Ikeda S, Sakaguchi H: Micrometastasis to lymph nodes in stage I left lung cancer patients. Ann Thorac Surg. 2002, 73: 1558-1562.
- Rena O, Carsana L, Cristina S, Papalia E, Massera F, Errico L, Bozzola C, Casadio C: Lymph node isolated tumor cells and micrometastases in pathological stage I non-small cell lung cancer: prognostic significance. Eur J Cardiothorac Surg. 2007, 32: 863-867.
- Van den Eynde BJ, van der Bruggen P: T cell defined tumor antigens. Curr Opin Immunol. 1997, 9: 684-693.
- Chen G, Wang X, Yu J, Varambally S, Yu J, Thomas DG, Lin MY, Vishnu P, Wang Z, Wang R, Fielhauer J, Ghosh D, Giordano TJ, Giacherio D, Chang AC, Orringer MB, El-Hefnawy T, Bigbee WL, Beer DG, Chinnaiyan AM: Autoantibody profiles reveal ubiquilin 1 as a humoral immune response target in lung adenocarcinoma. Cancer Res. 2007, 67: 3461-3467.
- Jia J, Cui J, Liu X, Han J, Yang S, Wei Y, Chen Y: Genome-scale search of tumor-specific antigens by collective analysis of mutations, expressions and T-cell recognition. Mol Immunol. 2009, 46: 1824-1829.
- Kotlan B, Simsa P, Teillaud JL, Fridman WH, Toth J, McKnight M, Glassy MC: Novel ganglioside antigen identified by B cells in human medullary breast carcinomas: the proof of principle concerning the tumor-infiltrating B lymphocytes. J Immunol. 2005, 175: 2278-2285.
- The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1755-8794/7/33/prepub
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.