Relation between smoking history and gene expression profiles in lung adenocarcinomas
© Staaf et al.; licensee BioMed Central Ltd. 2012
Received: 29 November 2011
Accepted: 7 June 2012
Published: 7 June 2012
Lung cancer is the worldwide leading cause of death from cancer. Tobacco usage is the major pathogenic factor, but not all lung cancers are attributable to smoking. Specifically, lung cancer in never-smokers has been suggested to represent a distinct disease entity compared to lung cancer arising in smokers due to differences in etiology, natural history and response to specific treatment regimes. However, the genetic aberrations that differ between lung carcinomas of smokers and never-smokers remain to a large extent unclear.
Unsupervised gene expression analysis of 39 primary lung adenocarcinomas was performed using Illumina HT-12 microarrays. Results from unsupervised analysis were validated in six external adenocarcinoma data sets (n=687), and six data sets comprising normal airway epithelial or normal lung tissue specimens (n=467). Supervised gene expression analysis between smokers and never-smokers was performed in seven adenocarcinoma data sets, and results were validated in the six normal data sets.
Initial unsupervised analysis of 39 adenocarcinomas identified two subgroups of which one harbored all never-smokers. A generated gene expression signature could subsequently identify never-smokers with 79-100% sensitivity in external adenocarcinoma data sets and with 76-88% sensitivity in the normal materials. A notable fraction of current/former smokers were grouped with never-smokers. Intriguingly, supervised analysis of never-smokers versus smokers in seven adenocarcinoma data sets generated similar results. Overlap in classification between the two approaches was high, indicating that both approaches identify a common set of samples from current/former smokers as potential never-smokers. The gene signature from unsupervised analysis included several genes implicated in lung tumorigenesis, immune-response associated pathways, genes previously associated with smoking, as well as marker genes for alveolar type II pneumocytes, while the best classifier from supervised analysis comprised genes strongly associated with proliferation, but also genes previously associated with smoking.
Based on gene expression profiling, we demonstrate that never-smokers can be identified with high sensitivity in both tumor material and normal airway epithelial specimens. Our results indicate that tumors arising in never-smokers, together with a subset of tumors from smokers, represent a distinct entity of lung adenocarcinomas. Taken together, these analyses provide further insight into the transcriptional patterns occurring in lung adenocarcinoma stratified by smoking history.
Keywords: Lung cancer, Smoking, Gene expression analysis, Adenocarcinoma, EGFR, Never-smokers, Immune response
Due to high incidence and poor survival, lung cancer is the worldwide leading cause of death from cancer. Small cell lung cancer accounts for about 15% of all lung cancer diagnoses whereas non-small cell lung cancer constitutes the majority of cases, primarily including adenocarcinoma (AC) and squamous cell carcinoma. Although the use of cigarettes is the major pathogenic factor, not all cases of lung cancer can be attributed to smoking. Lung cancer in never-smokers has been suggested to represent a different disease entity compared to lung cancer arising in smokers [2, 3]. Specifically, lung cancer in never-smokers has been associated with female sex, East Asian ethnicity, AC histology, differences in mutational pattern of EGFR, KRAS, and TP53, and response to EGFR inhibitors [2–4]. However, despite numerous reports of gene expression derived AC subtypes [5–10], a distinct subtype comprising only or predominantly never-smokers has not been identified. Taken together, this warrants further investigation of the transcriptional differences between AC arising in never-smokers and smokers.
In the present study, we aimed to delineate transcriptional differences between never-smokers and current/former smokers with AC by both unsupervised and supervised gene expression analysis, combined with conventional molecular assays, measurements of pathway activation by different gene expression metagenes, and histopathological data, across several AC data sets.
The study was approved by the Regional Ethical Review Board in Lund, Sweden (Registration no. 2004/762 and 2008/702). Written informed consent was obtained from all patients diagnosed after 2004, whereas for the retrospective part of the material, i.e. patients diagnosed earlier than 2004, study inclusion was approved by the Regional Ethical Review Board in Lund, Sweden, provided that patients (or their family members/survivors) had not stated otherwise when they were informed about the study in 2006.
Table 1 Baseline data for used AC cohorts. Recoverable headings: cohort (including Beer et al.); data set type (including Affymetrix U133 2plus); number of AC cases*; median age (years); mean follow-up OS (years)**.
External lung AC expression data sets
The DCC (n = 444, Affymetrix U133A), GSE10072 (n = 58, Affymetrix U133A), GSE12667 (n = 75, Affymetrix U133 2 plus), Beer et al. (n = 86, Affymetrix HU6800), GSE32863 (n = 58, Illumina WG6 version 3), and GSE11969 (n = 158 including 90 AC, Agilent GPL7015) gene expression data sets were used for supervised analysis and to validate the gene signature derived from unsupervised analysis. The GSE7895 (n = 104, Affymetrix U133A), GSE19027 (n = 52, Affymetrix U133A), GSE19667 (n = 121, Affymetrix U133 2 plus), GSE11952 (n = 83, Affymetrix U133 2 plus), GSE32863 (n = 58, Illumina WG6 version 3), and GSE10072 (n = 49, Affymetrix U133A) data sets were used to investigate the gene signature from unsupervised analysis in histologically normal bronchial airway epithelial cells or normal adjacent lung tissue (GSE32863 and GSE10072). Only probe sets present on the U133A chip were used for U133 2 plus arrays in all analyses. Affymetrix data sets were MAS5 normalized, updated for probe annotations as described, and individually mean-centered. Normalized expression data for GSE11969 were converted to log2 scale and mean-centered using all 158 samples from GEO. Normalized expression data for GSE32863 were obtained from GEO and mean-centered using either all AC samples or all normal samples, respectively. Only samples in external data sets with smoking annotations were used in comparisons. Clinical and histopathological data for cases in external data sets are summarized in Table 1. Never-smoking patient history was inferred if a specific annotation existed, and/or if pack-years were equal to zero (Beer et al. and GSE11969). Pack-year data for smokers were available for GSE11969, Beer et al., GSE32863, GSE19027, GSE7895, GSE19667 and GSE11952.
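The log2 conversion and per-probe mean-centering described above can be illustrated with a minimal sketch. It assumes a probes-by-samples matrix held as nested lists; the function names and the toy intensities are hypothetical and are not taken from the study's pipeline.

```python
import math

def log2_transform(values):
    """Convert strictly positive intensities to log2 scale."""
    return [math.log2(v) for v in values]

def mean_center(matrix):
    """Mean-center each probe (row) across samples (columns)."""
    centered = []
    for row in matrix:
        mean = sum(row) / len(row)
        centered.append([v - mean for v in row])
    return centered

# Toy intensities (made-up numbers): 2 probes x 4 samples.
raw = [[2.0, 4.0, 8.0, 16.0],
       [1.0, 1.0, 4.0, 4.0]]
log_data = [log2_transform(row) for row in raw]
centered = mean_center(log_data)
```

After centering, each probe has mean zero within its own data set, which is what allows samples from different platforms to be compared against a common centroid.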
Unsupervised gene expression analysis
Unsupervised gene expression analysis was performed on a set of 39 AC analyzed by Illumina Human HT-12 V3 microarrays (Illumina, San Diego, CA). Total RNA was labeled in a 96-well format using the Total Prep-96 RNA amplification kit, hybridized and scanned according to the manufacturer’s instructions. Seventy-two lung carcinomas of various histologies were profiled similarly and quantile normalized gene expression data were extracted for all 39 AC cases from this cohort. Gene expression data for the 72 cases are available through Gene Expression Omnibus (GEO) as series GSE29016. Normalized gene expression data for the 39 AC cases were subsequently mean-centered across tumors for each probe. Probes with standard deviation >1 of expression (log2ratio) across samples were used in unsupervised analyses. Hierarchical clustering was performed in MeV using Pearson correlation and complete linkage. Significance Analysis of Microarrays (SAM) analysis, performed in MeV, was used to identify genes discriminating between groups identified from unsupervised analysis. A centroid-based gene expression signature was constructed based on discriminating genes from SAM analysis between the two clusters identified by unsupervised analysis of AC cases. Centroid values for each gene correspond to the average expression of the gene across samples in each group. Illumina probes in the gene expression signature were merged on gene identifier prior to validation in external data sets. When multiple Agilent or Affymetrix probe sets from external data sets matched a gene in the gene signature, the probe set with the highest log2ratio standard deviation across samples was selected to represent the gene. Classification of samples was performed by calculating Pearson correlations between samples and centroids, assigning samples to the gene expression centroid with the highest correlation. The latter implies that there are no unclassified samples.
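The centroid construction and nearest-centroid assignment described above can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the MeV workflow used in the study: the four-gene vectors are made-up values, the labels "AC1" and "AC2" are simply attached to two toy groups, and the helper names are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length vectors (non-zero variance assumed)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def build_centroid(samples):
    """Average expression of each gene across the samples of one group."""
    n = len(samples)
    return [sum(s[g] for s in samples) / n for g in range(len(samples[0]))]

def classify(sample, centroids):
    """Assign the sample to the centroid with the highest Pearson correlation."""
    return max(centroids, key=lambda label: pearson(sample, centroids[label]))

# Toy mean-centered expression over four signature genes (made-up numbers).
group_ac1 = [[1.0, 0.8, -1.0, -0.9], [1.2, 1.0, -0.8, -1.1]]
group_ac2 = [[-1.0, -1.1, 0.9, 1.0], [-0.8, -0.9, 1.1, 1.2]]
centroids = {"AC1": build_centroid(group_ac1), "AC2": build_centroid(group_ac2)}

label_a = classify([0.5, 0.7, -0.2, -0.6], centroids)
label_b = classify([-0.3, -0.4, 0.6, 0.2], centroids)
```

Because every sample is assigned to whichever centroid correlates best, no sample is left unclassified, however weak its best correlation is.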
Supervised gene expression analysis based on smoking history
Supervised analyses between never-smokers and smokers (current or former) were performed for the original Illumina cohort and the DCC, GSE11969, GSE10072, GSE12667, Beer et al., and GSE32863 data sets. For each data set, probes/probe sets with log2ratio standard deviation >1 across samples were identified and used in SAM analysis, performed in MeV, of annotated never-smokers versus smokers. Probes with false discovery rate < 5% from SAM analysis were used to create a never-smoker and a smoker gene expression centroid. Due to the fixed false discovery threshold, centroid probe numbers differed between data sets. To ensure that a sufficient number of up-regulated/down-regulated probes were present in the centroids for the correlation analyses, centroids were checked for number of up- or down-regulated genes. If a centroid contained < 20 probes with log2 ratio fold change <0, or >0, respectively, then probes with higher false discovery rate were added to the centroids (up to 20 probes in either direction). Centroids for a data set were subsequently used to classify all seven data sets into either smokers or never-smokers. Probes/probe sets in gene expression signatures were merged on gene identifier prior to validation in other data sets. When multiple Agilent, Affymetrix or Illumina probe sets matched a gene in a gene signature, the probe set with the highest log2ratio standard deviation across samples was selected to represent the gene. Classification of samples was performed by calculating Pearson correlations between samples and centroids, assigning samples to the gene expression centroid with the highest correlation. The latter implies that there are no unclassified samples. To investigate the effect of different classification thresholds, we also applied fixed Pearson correlation cut-offs for the DCC-derived centroid classifier, ranging from 0 (all samples classified) to 0.4. This introduced unclassified samples with increasing cut-offs.
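How a fixed correlation cut-off introduces unclassified samples can be sketched as below. The exact rule applied in the study is not spelled out, so this sketch assumes that a sample is assigned to its best-correlated centroid only if that correlation reaches the cut-off, and is otherwise left unclassified (returned as `None`). The centroids and samples are made-up values and the function names are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length vectors (non-zero variance assumed)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def classify_with_cutoff(sample, centroids, cutoff=0.0):
    """Return the best-correlated centroid label, or None if below the cut-off."""
    best_label, best_r = None, -2.0  # -2.0 is below any possible correlation
    for label, centroid in centroids.items():
        r = pearson(sample, centroid)
        if r > best_r:
            best_label, best_r = label, r
    return best_label if best_r >= cutoff else None

# Toy centroids over four genes (made-up numbers).
centroids = {"smoker": [1.0, 1.0, -1.0, -1.0],
             "never-smoker": [-1.0, -1.0, 1.0, 1.0]}

weak = [1.0, -0.8, 0.3, -0.5]    # best correlation is roughly 0.14
strong = [0.9, 1.1, -1.0, -0.8]  # best correlation is close to 1

calls_at_0 = [classify_with_cutoff(s, centroids, 0.0) for s in (weak, strong)]
calls_at_04 = [classify_with_cutoff(s, centroids, 0.4) for s in (weak, strong)]
```

In this toy case the weakly correlated sample is called a smoker at cut-off 0 but becomes unclassified at cut-off 0.4, while the strongly correlated sample keeps its label.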
Gene expression metagenes for measuring activation of different pathways
A gene expression metagene for proliferation was created by taking the average log2ratio of genes in the CIN70 signature. Gene expression metagenes for 27 cellular processes originally reported by Bryant et al., referred to as pathways hereon, were computed as described. For external Affymetrix data sets the pathway probe set annotations from Bryant et al. were used to compute mean pathway expression, otherwise matching was made based on gene symbol.
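A metagene of this kind reduces to a per-sample average over the signature genes. The sketch below assumes a dictionary mapping gene symbols to mean-centered log2ratios for one sample; the gene names are placeholders and do not represent actual CIN70 members, and skipping genes that are absent from the platform is an assumption of this sketch.

```python
def metagene_score(expression, signature_genes):
    """Average log2ratio over the signature genes that are present in the sample."""
    values = [expression[g] for g in signature_genes if g in expression]
    if not values:
        raise ValueError("no signature genes found in the expression profile")
    return sum(values) / len(values)

# Toy sample profile (made-up numbers, placeholder gene names).
sample = {"GENE_A": 1.0, "GENE_B": -0.5, "GENE_C": 0.5}
signature = ["GENE_A", "GENE_B", "GENE_NOT_ON_ARRAY"]

score = metagene_score(sample, signature)
```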
Functional pathway analysis
Functional analysis was performed using LitVAn and the Ingenuity Pathway Analysis (IPA) software (Ingenuity Systems Inc, Redwood City, CA). For IPA, a p-value < 0.05 for a canonical pathway was considered significant.
Immunohistochemical (IHC) staining was performed on 3 μm sections after deparaffinization and rehydration. Heat induced antigen retrieval was performed in low pH buffer (PTEN, Dako S1699), high pH buffer (pAKT, Dako S2367) or TE buffer (CD117/cKIT). Antibodies were obtained from either Cell Signaling Technology (PTEN; 1:100 dilution) or Dako (pAKT; 1:15 dilution, CD117/cKIT; 1:500 dilution). Stainings were visualized using Envision™ (pAKT, CD117/cKIT) or LSAB™ (PTEN) (Dako). EGFR was stained using the mouse monoclonal anti-human EGFR clone 2- antibody and the EGFR pharmDX kit (Dako). After IHC staining, sections were counterstained with hematoxylin, dehydrated and mounted.
KRAS mutations were investigated using the TheraScreen K-ras mutation kit (Qiagen). The assay was performed according to the manufacturer’s instructions on a Rotor Gene 3000 instrument (Corbett Research). Mutations of exons 18 through 21 of the EGFR gene and of exons 9 and 20 of the PIK3CA gene were analyzed by direct DNA sequencing using the BigDye Terminator Cycle Sequencing Kit v1.1 (Applied Biosystems). Sequencing products were separated by capillary electrophoresis in an ABI 3130xl Genetic Analyzer (Applied Biosystems) and the sequence curves were analyzed using the 3100 data collection software (Gene Code Corporation). All sequence alterations were confirmed after a repeated extraction of DNA.
Quantitative real time-PCR
Quantitative real time-PCR was performed using Rotor Gene 3000 (Corbett Research) and the binding dye iTaq™ SYBR® Green Supermix (Bio-Rad). To determine the copy number of the EGFR gene we used the genes for albumin and glucokinase as controls. The ratios were compared to similar ratios of control DNA. A standard curve for each run was constructed from serial dilutions. The CT-threshold was set to 0.2. Amplification mixes (20 μL) contained 10 ng sample DNA, 10 μL binding dye, 1 μL primer and dH2O. Thermal cycling conditions comprised 10 min at 95 °C and 45 cycles of 95 °C for 15 s, 55 °C for 30 s and 72 °C for 30 s. All samples were analyzed in triplicate and the serial dilutions were performed in duplicate. Relative gene copy numbers were calculated using the Pfaffl method, representing average values of EGFR gene copy numbers in relation to albumin and glucokinase. A ratio ≥1.5 signified amplification.
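The Pfaffl calculation can be sketched as follows. In the Pfaffl model, the relative ratio is the target efficiency raised to the Ct difference (control minus sample) divided by the reference efficiency raised to the corresponding Ct difference, with efficiency estimated from the standard-curve slope as 10^(-1/slope). Averaging the ratios obtained against the two reference genes is an assumption about how "average values" were formed; the Ct values, efficiencies, and function names below are made up for illustration.

```python
def efficiency_from_slope(slope):
    """Amplification efficiency from a standard-curve slope (Ct versus log10 input)."""
    return 10 ** (-1.0 / slope)

def pfaffl_ratio(e_target, ct_target_control, ct_target_sample,
                 e_ref, ct_ref_control, ct_ref_sample):
    """Pfaffl relative ratio of target versus one reference gene."""
    numerator = e_target ** (ct_target_control - ct_target_sample)
    denominator = e_ref ** (ct_ref_control - ct_ref_sample)
    return numerator / denominator

def mean_copy_ratio(target, references):
    """Average the Pfaffl ratio over several reference genes (assumed procedure)."""
    ratios = [pfaffl_ratio(*target, *ref) for ref in references]
    return sum(ratios) / len(ratios)

# Each tuple is (efficiency, Ct in control DNA, Ct in sample DNA); made-up values.
egfr = (2.0, 25.0, 24.0)
albumin = (2.0, 26.0, 26.0)
glucokinase = (2.0, 27.0, 27.0)

ratio = mean_copy_ratio(egfr, [albumin, glucokinase])
is_amplified = ratio >= 1.5  # threshold stated in the text
```

With perfect doubling efficiency, a one-cycle earlier Ct for EGFR and unchanged reference Cts corresponds to a two-fold relative copy number, which exceeds the 1.5 threshold.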
Unsupervised gene expression analysis identifies subgroups of lung adenocarcinoma associated with smoking history
Validation of the association of adenocarcinoma subgroups with smoking history in external AC data sets
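Table 2 Association of AC1/AC2 subgroups in external AC data sets with smoking history. Recoverable columns: number of never-smokers (NS) classified as AC1; % smokers classified as AC1 (all/current smokers (CS)/former smokers (FS)); Fisher’s exact test P-value**. Recoverable entries: P-value 4 × 10−8/0.01; data set Beer et al.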
Comparison of adenocarcinoma subgroups with results from supervised analysis based on smoking history
Table 3 Identification of never-smokers based on supervised analysis of never-smokers versus smokers in seven AC data sets. Recoverable column: Illumina centroids*.
To investigate whether classification sensitivity and specificity could be improved we applied a series of more stringent classification thresholds for the DCC-derived classifier specifically (see Methods). Notably, increased classification stringency improved sensitivity only slightly, specificity less, while introducing a large number of unclassified samples across the seven tested data sets for this classifier (in Additional file 3: Figure S1). Notably, in the DCC, GSE10072, GSE12667, GSE11969, Beer et al., GSE32863, and original Illumina cohort 87%, 92%, 71%, 82%, 80%, 78% and 95%, respectively, of samples classified as never-smokers by the DCC classifier were also classified as AC1. Moreover, analysis of pack-year data from Beer et al., GSE32863, and GSE11969 revealed no significant difference between smokers classified differently by the DCC-classifier (p > 0.05 all comparisons, Student’s t-test). Taken together, these comparisons indicate that the unsupervised and supervised approaches both identify a core set of samples as “potential never-smokers” that comprises both true never-smokers and smokers, with the latter including both current and former smokers.
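The quantities compared above (sensitivity, specificity, and the overlap between two classifications) can be computed as in the following sketch. It treats "never-smoker" as the positive class and, as an assumption, excludes unclassified samples (`None`) from both measures. All labels are toy values and the function names are hypothetical.

```python
def sensitivity_specificity(truth, predicted, positive="never"):
    """Sensitivity and specificity for the positive class, skipping None calls."""
    tp = fn = tn = fp = 0
    for t, p in zip(truth, predicted):
        if p is None:          # unclassified sample, excluded by assumption
            continue
        if t == positive:
            tp += p == positive
            fn += p != positive
        else:
            tn += p != positive
            fp += p == positive
    return tp / (tp + fn), tn / (tn + fp)

def overlap_fraction(calls_a, calls_b, label_a, label_b):
    """Fraction of samples labeled label_a in calls_a that are label_b in calls_b."""
    selected = [b for a, b in zip(calls_a, calls_b) if a == label_a]
    if not selected:
        raise ValueError("no samples carry label_a")
    return sum(1 for b in selected if b == label_b) / len(selected)

# Toy annotations and calls (made-up values).
truth = ["never", "never", "smoker", "smoker", "smoker", "never"]
supervised_calls = ["never", "never", "never", "smoker", None, "smoker"]
sens, spec = sensitivity_specificity(truth, supervised_calls)

dcc_calls = ["never", "never", "never", "smoker", "never"]
cluster_calls = ["AC1", "AC1", "AC2", "AC2", "AC1"]
overlap = overlap_fraction(dcc_calls, cluster_calls, "never", "AC1")
```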
Functional analysis of gene signatures from unsupervised and supervised analysis
Functional analyses of the AC1/AC2 and DCC-derived gene signatures were performed using LitVAn and IPA. For the AC1/AC2 signature LitVAn analysis revealed that genes with lower expression in AC1 showed enrichment for only a few gene ontology terms, e.g., fibrinogen. In contrast, LitVAn and IPA both identified a strong association of genes overexpressed in AC1 with different immunological functions (in Additional file 4: Table S3).
LitVAn analyses of the centroid classifiers from supervised analysis showed that terms associated with proliferation were the main functional associations of classifiers derived from analysis of the DCC (in Additional file 4: Table S3), GSE12667, and GSE10072 data sets. The strong influence of proliferation was further highlighted by the marked differences in CIN70 metagene expression between classification groups for the DCC classifier across investigated data sets (in Additional file 5: Figure S2). Notably, the AC1/AC2 classification showed a similar CIN70 expression pattern as the DCC classifier across the majority of data sets, with lower expression in the AC1 group harboring the true never-smokers, despite differences in functional associations (in Additional file 5: Figure S2). This similarity in CIN70 expression is likely explained by the previously described high overlap between the two classifiers. Moreover, in the GSE11969 data set, representing the only external data set with EGFR, KRAS, and TP53 mutation data, both the AC1/AC2 signature and the DCC derived classifier were strongly associated with EGFR mutations (p = 0.002 and 0.001 respectively, Fisher’s exact test), but not with KRAS or TP53 mutations. In further support of the latter finding, the AC1/AC2 signature and the DCC-classifier were also not associated with p53 status or KRAS mutations in the Beer et al. data set.
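Associations such as the one between classification and EGFR mutation status rest on a two-sided Fisher's exact test on a 2 × 2 table. The sketch below is a pure-Python version that sums the hypergeometric probabilities of all tables no more likely than the observed one; the counts are a textbook toy example, not data from any of the studies, and the small relative tolerance used for the comparison is an implementation choice of this sketch.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are at most as probable as the observed one.
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_observed * (1 + 1e-9))

# Toy 2 x 2 table, e.g. rows = two classification groups,
# columns = mutated / wild-type (made-up counts).
p_value = fisher_exact_two_sided(3, 1, 1, 3)
```

For larger analyses a statistics library would normally be preferred; the explicit sum is shown here only to make the definition of the test visible.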
Association of tumor derived gene signatures with smoking history in normal airway epithelial samples and adjacent lung tissue
Table 4 Association of the AC1/AC2 and DCC signature with smoking history in normal airway epithelial samples, and adjacent lung tissue. Recoverable columns: number of never-smokers (NS) classified as AC1; % smokers (S)/current smokers (CS)/former smokers (FS) classified as AC1; P-value AC1/AC2 classification**; number of NS correctly identified by DCC signature***; % S/CS/FS classified as never-smokers by DCC signature; P-value DCC classification**. Recoverable entries: P-values 8 × 10−7 and 2 × 10−5.
Moreover, we also investigated the AC1/AC2 and DCC classifiers in normal adjacent lung tissue (n = 107) included in two of the AC data sets (GSE32863 and GSE10072). Notably, results for the AC1/AC2 classification and the DCC classifier were in line with the four normal airway epithelial data sets (Table 4). Again, analysis of pack-years in GSE32863 revealed no difference between AC1-smokers and AC2-smokers, or for the DCC-classifier (p = 0.35 and p = 0.08, respectively, Student’s t-test, in Additional file 6: Figure S3). Moreover, overlap between the AC1/AC2 and DCC classifications was similar to that in the airway data sets, as 61% and 69% of cases classified as never-smokers by the DCC derived signature were also classified as AC1 in GSE10072 and GSE32863, respectively.
The genetic basis for initiation and development of lung carcinoma has a clinical impact through targeted therapeutics, diagnostic tools, prognostics, and predictive markers. Gene expression and genomic profiling have been used extensively in lung cancer to dissect the diversity of the disease and to derive prognostic gene signatures [5, 6, 8, 10, 25, 26]. Furthermore, such high throughput studies have also been performed to identify gene signatures associated with cigarette smoking in both tumor and bronchial epithelial tissue [13, 15, 16]. Indeed, lung cancer in never-smokers is among the top ten causes of cancer mortality in the world and successful genome-wide characterization of lung cancer stratified by patients’ smoking history may have large future implications for evaluation of lung cancer risk in the absence of smoking. However, although lung cancer in never-smokers has been suggested to represent a different disease entity compared to cancers arising in smokers [2, 3], numerous reports of gene expression derived AC subtypes have reported consistent lack of a never-smokers’ or a never-smoker predominant AC subtype [5–10].
In the current study we aimed to delineate transcriptional differences between AC arising in smokers and never-smokers in seven AC data sets by both unsupervised and supervised gene expression analysis. Notably, these data sets were analyzed by different microarray platforms and represent patient materials of different stage, differentiation, ethnicity, age, and sex. Our initial unsupervised analysis of a small, but well characterized AC cohort (n = 39) broadly divided cases into two main subgroups termed AC1 and AC2 (Figure 1). Intriguingly, AC1 harbored all never-smokers together with more than half of AC smoker cases, including both current and former smokers. We next validated the association of the AC1 group with never-smoking patient status through a derived gene expression signature in six larger external AC data sets (Table 2) and, notably, across all validation sets, confirmed the existence of an AC1 profile displaying roughly similar proportions of smokers (current/former) and never-smokers as in the original cohort (Table 2). Importantly, although the gene signature for the AC1 and AC2 subgroups was derived from initial analysis of a small cohort comprising only nine never-smokers, it was successfully validated across much larger AC data sets, e.g., the DCC (n = 349), profiled by different microarray platforms and comprising in total 687 AC tumor cases. Moreover, characteristics of the AC1 and AC2 groups appear consistent with findings from several studies demonstrating differences between smokers and never-smokers with AC. This includes association with female sex in two of the external AC data sets (DCC and GSE10072, data not shown), successful validation in patient cohorts of different ethnicity, higher proliferation in smoking compared to never-smoking cases within AC1, and association of AC1 with increased EGFR activity (GSE11969 and our original data). Moreover, in line with subtypes reported by Takeuchi et al., AC1 cases in GSE11969 were more often classified as terminal respiratory unit (TRU)-type AC (p = 0.03, Fisher’s exact test), proposed to represent a subgroup of AC originating from the peripheral airway epithelium under less influence of smoking and retaining certain progenitor characteristics.
Motivated by the high sensitivity, but lower specificity, in identification of never-smokers by the AC1/AC2 gene signature generated from unsupervised analysis, we also performed supervised analysis between never-smokers and smokers in seven AC data sets (n = 726). For each data set, we identified differentially expressed genes that we used to generate a centroid classifier, which we subsequently used to classify all data sets. Interestingly, the centroid classifiers with the best sensitivity in identifying never-smokers across the seven AC data sets (i.e. classifiers from the DCC and GSE10072 data sets) showed similar performance as the corresponding AC1/AC2 classification (Tables 2 and 3). In line with our original findings from the unsupervised clustering, all centroid classifiers derived from supervised analysis grouped a notable fraction of smokers as potential “never-smokers”, including both current and former smokers (in Additional file 2: Table S2). Moreover, there was a strong overlap of samples classified as never-smokers by the DCC-derived classifier and by the AC1/AC2 classification across all analyzed tumor data sets. This overlap indicates that the two approaches identify a core set of samples as potential never-smokers that comprises both true never-smokers and smokers. Thus, despite differences in the type of analysis, in size of original data sets generating the classifiers, and in apparent functional associations of the two signatures, a consistency regarding classification of AC stratified by smoking history could indeed be demonstrated by the two approaches herein. These results could indicate the existence of a potential molecular subtype of AC with a presumed non-smoking-associated etiology. Landi et al. recently proposed a gene expression signature characteristic of smoking, heavily weighted on cell cycle genes, that separated both smokers from non-smokers in lung tumors and early stage tumor tissue from non-tumor tissue.
Interestingly, the DCC-classifier showed considerable overlap with results from Landi et al. Specifically, seven out of the 20 up-regulated genes reported by Landi et al. were present in the DCC-derived classifier, including nearly all of the genes involved in regulation of mitotic spindle formation highlighted by Landi et al. as an important pathway deregulated between AC arising in smokers and never-smokers. In contrast, none of the 20 up-regulated genes reported by Landi et al. were present in the AC1/AC2 signature. Moreover, an average metagene expression value of the 20 up-regulated genes in the Landi signature showed a Pearson correlation of 0.99 with corresponding CIN70 expression values for cases in the original Illumina cohort (data not shown). This correlation suggests that the coherent pattern of DCC classification with CIN70 expression (DCC classified smokers high CIN70, DCC classified never-smokers low CIN70 expression) resembles findings by Landi et al. However, although classification by the supervised DCC classifier to a large extent appears coherent with expression of proliferation associated genes (in Additional file 5: Figure S2), specificity in identifying never-smokers remained low to medium even when markedly increasing the classification threshold in the seven AC data sets (in Additional file 3: Figure S1).
Interestingly, when the AC1/AC2 and DCC classifiers were applied to four data sets comprising histologically normal airway epithelial tissue (n = 360 cases), and two data sets with normal adjacent lung tissue (n = 107), sensitivity in detecting never-smokers was high for both tumor-derived classifiers. However, similar to the tumor analysis, never-smokers could not be singled out as a unique group (Table 4). Cigarette smoke exposure has been demonstrated to create a “field of injury” in airway epithelial cells, and genes involved in regulation of oxidant stress, xenobiotic metabolism, and oncogenesis have been reported to be induced by smoking, while genes involved in tumor suppression and inflammation pathways have been reported to be down regulated. The latter, in combination with findings by Landi et al. that current smoking altered expression of immune response associated genes in non-tumor tissue, appears consistent with the functional association of the AC1/AC2 signature (in Additional file 4: Table S3). Moreover, expression of several genes in the AC1/AC2 signature appears consistent with reports about gene expression changes in relation to smoking in airway epithelial cells. For example, two (CX3CL1 and PLA2G10) of the 13 genes reported to be irreversibly altered by cigarette smoke by Spira et al. are present in the AC1/AC2 gene signature. CX3CL1, a well-known chemokine, was found to be irreversibly downregulated in smokers, consistent with its lower expression in AC2 cases, while PLA2G10 was found irreversibly upregulated in smokers, in line with its elevated expression in AC2 cases. Moreover, MUC5AC, GPX2, UCHL1, and CABYR have all been associated with increased expression in smokers compared to never-smokers, in line with their higher expression in the AC2 centroid compared to the AC1 centroid (in Additional file 1: Table S1) [28–30].
In addition to genes associated with smoking, the AC1/AC2 classifier included several genes implicated in lung cancer tumorigenesis, such as KIT, ID1, MMP7, MYCN, XRN2, and CYP24A1, as well as type II pneumocyte marker genes such as NKX2-1 (TTF1/TITF1), LAMP3 (CD208), and surfactant proteins SFTPB and SFTPC (in Additional file 1: Table S1). Type II pneumocytes have an intriguing role in lung disease, as anomalies in pulmonary surfactant protein levels have been associated with certain respiratory diseases frequently observed in smokers. Moreover, type II pneumocytes in the alveoli of the lung have been associated with progenitor-like characteristics due to their ability to regenerate the alveolar epithelium after injury and also play an important role in the innate immune response of the lung through secretion of surfactant proteins and different proinflammatory mediators [32, 33]. Notably, the DCC-derived classifier also included, besides genes associated with proliferation, genes reported to be affected by smoking in airway epithelial cells, such as CX3CL1, GPX2, UCHL1, HLF, CYP1B1, and S100A8, with expression consistent with previous reports. In summary, the findings of a considerable number of reported smoking induced genes with consistent expression in the tumor-derived gene signatures suggest that these signatures are in fact related to patient smoking history. However, whether the relationship is due to expression differences in the tumor cells or the surrounding stromal tissue remains to be determined, as delineation of the expression from non-microdissected heterogeneous tissue is highly problematic.
Taken together, results from the current study in combination with previous reports on different AC subtypes [6, 7, 9, 10] indicate that never-smokers cannot be completely separated from smokers based on transcriptional differences, and consequently, that AC arising in never-smokers does not appear to represent a distinct entity based on transcriptional patterns. Instead, this may suggest a shared biology between AC arising in never-smokers and in a subgroup of smokers, the latter thus perhaps representing tumors that arose in smokers “by chance”, i.e., possibly independent, or less dependent, of a positive smoking history, which warrants further investigation.
In the current study we have sought to identify transcriptional patterns specific for never-smokers with AC compared to tumors arising in smokers. Both unsupervised and supervised gene expression analysis identified simple classifiers (harboring both smoking induced genes and genes implicated in lung tumorigenesis) with high sensitivity in identifying never-smokers across multiple AC and normal tissue data sets. Furthermore, and consistent between original and validation data sets, a subset of tumors arising in smokers (both current and ex) was classified together with tumors arising in never-smokers, thus together forming a subgroup of AC with shared transcriptional patterns and, as discussed above, also other strong similarities. Taken together, these analyses provide further insight into the heterogeneous transcriptional patterns occurring in AC stratified by smoking history.
MP conceived of the study. JS, GJ, and MP wrote the manuscript. JS and GJ performed data analysis. MJ and AK performed array experiments. MJ, AS, SI, and MP performed IHC and mutational analysis. SBE, PJ, MS, and HP included the patients and contributed with tumor material. LJ initiated consecutive bio banking of fresh frozen lung cancer tissue, did most of the tissue sampling and performed the histopathological examinations. All authors approved the final manuscript.
We thank Erik Gyllstedt and Leif Hagman for including patients, Lena Mårtensson and Birgitta Sjögren for valuable administrative help, the Personnel at Ward no. 1 at Skåne University Hospital in Lund for their excellent involvement in the Southern Lung Cancer Study, the staff at the SCIBLU Genomics Core Facility at Lund University for technical support with Illumina analyses, and Il-Jin Kim at the Thoracic Oncology Program, Department of Surgery, University of California, US for critical reading of this manuscript.
The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1755-8794/5/22/prepub