Genome wide SNP comparative analysis between EGFR and KRAS mutated NSCLC and characterization of two models of oncogenic cooperation in non-small cell lung carcinoma

Background: Lung cancer with EGFR mutation has been shown to be a specific clinical entity. To better understand the biology behind this disease, we used genome-wide characterization of loss of heterozygosity and amplification by Single Nucleotide Polymorphism (SNP) array analysis to identify chromosome segments linked to EGFR mutations. To do so, we compared genetic profiles between EGFR mutated adenocarcinomas (ADC) and KRAS mutated ADC from 24 women with localized lung cancer. Results: Patterns of alterations differed between EGFR and KRAS mutated tumors, and specific chromosome alterations were linked to the EGFR mutated group. Chromosome regions 14q21.3 (p = 0.027), 7p21.3-p21.2 (p = 0.032), 7p21.3 (p = 0.042) and 7p21.2-7p15.3 (p = 0.043) were significantly amplified in EGFR mutated tumors. Within those regions, three genes are of special interest: ITGB8, HDAC9 and TWIST1. Moreover, homozygous deletions at CDKN2A and LOH at RB1 were identified in EGFR mutated tumors. We therefore tested for a link between EGFR mutation, CDKN2A homozygous deletion and cyclin amplification in a larger series of tumors. In a series of non-small-cell lung carcinomas (n = 98), homozygous deletions at CDKN2A were linked to EGFR mutations and absence of smoking, whereas cyclin amplifications (CCNE1 and CCND1) were associated with TP53 mutations and smoking habit. Conclusion: Altogether, our results show that genome-wide patterns of alteration differ between EGFR and KRAS mutated lung ADC, describe two models of oncogenic cooperation involving either EGFR mutation and CDKN2A deletion or cyclin amplification and TP53 inactivating mutations, and identify new chromosome regions at 7p and 14q associated with EGFR mutations in lung cancer.


Background
Lung cancer is the leading cause of cancer-related deaths in the western world [1]. Non-small cell lung cancer (NSCLC) accounts for approximately 85% of cases and represents a heterogeneous group mainly consisting of adenocarcinoma (ADC), large cell carcinoma (LCC) and squamous cell carcinoma (SCC). The incidence of subtypes has changed in the last decades, with an increasing incidence of ADC. Moreover, while smoking remains the major risk factor for lung cancer, a subgroup of patients develop lung ADC without smoking history. It is not clear whether lung cancer in non-smokers is increasing in western countries, but it clearly has particular clinical and biological features. Population-based studies showed that lung cancer in non-smokers occurs preferentially in women, with almost 20% of non-smoker lung cancer diagnosed in women versus 2.5% in men [2]. Different studies have shown that genetic abnormalities can be specifically identified in cancer from non-smokers. Indeed, we and others showed that KRAS mutations were linked to tobacco consumption whereas EGF receptor (EGFR) mutations were found in non-smokers [3][4][5]. The development of EGFR targeted therapies demonstrated that patients with major clinical response were those who had never smoked, had ADC with a bronchioloalveolar component, were women and had EGFR mutations [6][7][8]. The biology underlying the pathogenesis of the disease may be different from that of smokers, and risk factors have not been clearly identified, although environmental etiologies are suspected, especially in Asians [9]. Transformation of a normal phenotype into a malignant phenotype requires accumulation of multiple genetic and/or epigenetic changes resulting in growth advantage.
The genetic alteration proved to be linked with ADC from non-smokers is the presence of EGFR activating mutations. In order to improve our knowledge of lung cancer biology in non-smokers, one of the first questions to answer is: what are the molecular alterations associated with EGFR mutations in lung cancer?
Lung cancer develops as a result of multiple genetic alterations. Loss of heterozygosity (LOH) and gains of chromosome segments are common mechanisms of disease progression. Recently, high-density oligonucleotide-based single nucleotide polymorphism (SNP) arrays have been used to quantify chromosome copy number and have proved efficient [10][11][12]. In an attempt to identify genetic alterations associated with EGFR mutations, we used a genome-wide SNP assay covering 50,000 SNP loci to screen for regions of allelic imbalance (amplified or LOH regions) in a panel of 13 EGFR mutated ADC and 11 non-EGFR mutated ADC.
Then, in a second part, we focused on cell cycle genes that were found to be differentially involved between groups and screened a large series of 98 NSCLC for genetic alterations at CCND1, CCNE1, CDKN2A and RB1. Alterations were studied according to other known mutations (EGFR, ERBB2, BRAF, KRAS, TP53 and STK11). This work led to the characterization of two different models of oncogenic cooperation, one linked to smoking and the other not. Moreover, we identified four chromosome regions at 14q and 7p specifically amplified in EGFR mutated ADC.

Patients and methods
Patients with primary lung cancers were enrolled in this study according to French laws and have been previously described [4]. Briefly, patients had surgery for non-small cell lung cancer, received no neoadjuvant treatment and were managed at the Georges Pompidou European Hospital in Paris, France, from 2003 to 2004. All tumors but numbers 134, 135, 177 and 246 had been characterized for mutations in EGFR, KRAS, BRAF, ERBB2, ERBB3, PIK3CA, TP53 and STK11 [4]. STK11 mutations have not been previously published. Patient characteristics are summarized in Table 3. DNAs were extracted after pulverization in liquid nitrogen and proteinase K digestion using the QIAamp tissue kit (Qiagen, Les Ulis, France). Twenty-four DNAs (13 with classic EGFR mutations, 11 without) and 6 non-tumor DNAs were selected for SNP array analysis. All tumors selected were from women.

Single Nucleotide Polymorphism Array Analysis
The GeneChip® Mapping 50K-Xba array was used for this analysis. Preparation of DNA targets, labelling, hybridization, washing, staining and scanning were done according to the manufacturer's instructions (Affymetrix, UK) by Partner-Chip (Evry, France).
Data were analyzed using the Copy Number Analyser for Affymetrix GeneChip Mapping (CNAG 2.0) software [13]. This software uses a set of normal reference individuals and does not require a paired normal sample to perform the analysis. As references, we randomly selected 18 independent subjects from the Mapping 100K HapMap Trio Dataset provided by Affymetrix.
Data from CNAG were exported to the aCGH software (an R package), which was used for plotting and for statistical comparison of EGFR mutated and non-mutated tumors.

Quantitative PCR experiments
Validation of homozygous deletions (CDKN2A) and regions with focal amplification (CCND1 and CCNE1) was done on the 24 tumors screened by SNP array and extended to a total of 98 NSCLC (Table 2). Human serum albumin (HSA) was used as the reference gene. DNA concentrations were determined using an ND-1000 spectrophotometer (Nanodrop technology) and were normalized to 12.5 ng/µl. Real-time quantitative PCR using TaqMan probes was performed on an ABI Prism 7900 sequence detection system (Applied Biosystems, Courtaboeuf, France) with the software program SDS 2.0 (Applied Biosystems). Each assay was run on a 384-well plate; tumor DNAs, normal controls (n = 6) and no-template controls were run in triplicate for CCND1, CCNE1, CDKN2A and HSA. Primers and probes were described elsewhere [14]. Homozygous deletions at DCC, DSG2 and DSC3 as well as copy number changes of the HDAC9, TWIST1 and ITGB8 region were validated by real-time PCR for the 24 tumors screened by SNP array. Primers and probes were designed with the Primer Express 2.0 software program (Applied Biosystems) (see Additional file 1). All primers were purchased from Operon (Cologne, Germany) and probes from Applied Biosystems.
The PCR mix consisted of ABsolute™ QPCR MIX 1× (ABgene, Courtaboeuf, France), primers at 300 nM, probe at 200 nM, H2O and 25 ng of DNA template in a final volume of 10 µl. Cycling conditions were denaturation at 95°C for 15 min followed by 40 cycles of 95°C for 15 sec and 60°C for 15 sec. Quantification was done by normalizing the results to those of HSA. The normalized amount of gene in tumor samples was determined by designating the average ΔCt of the 6 non-tumor tissues as calibrator. The value 2 × 2^(-ΔΔCt) represented an estimate of the gene copy number in tumor tissues.
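As an illustration of the quantification described above, the following minimal sketch computes the estimated copy number 2 × 2^(-ΔΔCt) from replicate Ct values. The function names and the Ct values in the usage comment are hypothetical and are not taken from the study; the sketch assumes one Ct cycle per doubling of template, which the formula itself implies.

```python
from statistics import mean


def delta_ct(ct_target, ct_reference):
    """Return the ΔCt of one sample: mean target Ct minus mean reference (HSA) Ct.

    Both arguments are lists of replicate Ct values (e.g. triplicates).
    """
    return mean(ct_target) - mean(ct_reference)


def estimate_copy_number(tumor_dct, normal_dcts):
    """Estimate gene copy number in a tumor as 2 * 2 ** (-ΔΔCt).

    The calibrator is the average ΔCt of the non-tumor samples, so a tumor
    with the same ΔCt as the calibrator is assigned 2 copies.
    """
    calibrator = mean(normal_dcts)
    ddct = tumor_dct - calibrator
    return 2.0 * 2.0 ** (-ddct)


# Hypothetical usage: a tumor whose target amplifies one cycle earlier
# (relative to HSA) than the non-tumor calibrator is estimated at 4 copies.
# tumor = delta_ct([24.0, 24.0, 24.0], [25.0, 25.0, 25.0])   # ΔCt = -1.0
# estimate_copy_number(tumor, [0.0] * 6)                     # 4.0
```

With this convention, a homozygous deletion yields an estimate close to 0 and a single-copy loss an estimate close to 1.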

STK11 Mutations screening
Exons 1 to 9 were screened by direct sequencing. Primers used for the amplification and sequencing of each exon and intron-exon junctions and PCR conditions are available upon request.

Statistical analysis
Fractional allelic loss (FAL) and fractional allelic amplification (FAA) were calculated for each tumor as the number of chromosome arms with loss of heterozygosity or amplification, respectively, divided by the number of chromosome arms tested (41). Mean FAL and FAA were compared using Student's t-test. Qualitative variables were compared using the chi-square test or, when necessary, Fisher's exact test. All tests were performed using STATA 7.0 (StataCorp LP, College Station, TX). The aCGH software was used to test for meaningful differences in focal chromosome alterations between EGFR mutated and non-mutated tumors. The false discovery rate (FDR) was used to assess p values. The FDR represents the expected percentage of false positives among the claimed positives and estimates the global error in multiple testing situations. Therefore, p values were adjusted for the number of tests performed.
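The two quantities defined above can be sketched in a few lines. The text does not state which FDR procedure was applied, so the Benjamini-Hochberg step-up adjustment is used here purely as an assumption; the function names and the arm labels in the usage comment are likewise hypothetical.

```python
def fractional_allelic_index(altered_arms, arms_tested=41):
    """Return FAL or FAA: number of distinct altered arms / arms tested.

    `altered_arms` lists the chromosome arms showing LOH (for FAL) or
    amplification (for FAA) in one tumor; 41 arms were tested in the study.
    """
    return len(set(altered_arms)) / arms_tested


def benjamini_hochberg(p_values):
    """Return FDR-adjusted p values (Benjamini-Hochberg, assumed procedure).

    Each p value is scaled by m / rank, then a running minimum taken from
    the largest p value downwards keeps the adjusted values monotone.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical usage: a tumor with LOH on three arms has FAL = 3/41.
# fractional_allelic_index(["9p", "13q", "18q"])
```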

Array global analysis
Mapping genome-wide chromosomal alterations in EGFR mutated lung ADC (n = 13) versus non-mutated ones (n = 11) had two objectives: first, a global comparison of allelic imbalances; second, a targeted analysis of specific loci to identify genes implicated in the oncogenesis of one subtype or the other. Note that, in an effort to homogenize both groups, the non-EGFR mutated ADC were all KRAS mutated. Fractional allelic loss (FAL) and fractional allelic amplification (FAA) were calculated as the number of chromosome arms with LOH or amplified loci, respectively, divided by 41 chromosome arms.

Array targeted analysis
Differential analysis between EGFR mutated and KRAS mutated tumors using the aCGH package [15] showed that one region at 14q (p = 0.027) and three regions at 7p (p = 0.032, p = 0.042, p = 0.043) were significantly more frequently amplified in the EGFR group (Figure 1C). Detailed statistical analysis is shown as supplementary data (see Additional file 2). Genes located in these regions are given in Table 1. The gene located at 14q encodes a MAM domain protein. The MAM domain is present in many cell surface proteins and is thought to be involved in cell-cell adhesion, protein-protein interactions and signal transduction; whether this protein could be linked to carcinogenesis remains to be studied [16]. Among the genes located at 7p21.1, HDAC9, TWIST1 and ITGB8 are potential targets. Gene copy numbers at 7p21.1 were validated by quantitative PCR using HDAC9 and ITGB8 probes in the 24 tumors. Seven cell lines (Calu6, H460, A549, H1299, H1650, H1975 and H358) were also tested for copy number changes using the same probes: both cell lines with EGFR mutation (H1650 and H1975) were found amplified, versus one out of five without EGFR mutation (H358). EGFR is located at 7p11.2. Among tumors with EGFR mutations, one had no copy number alteration at 7p, 9 showed concomitant 7p11.2 and 7p21.1 amplifications (>3) and 3 had a localized amplification (2 at 7p21.1 and 1 at 7p11.2). EGFR amplifications were not statistically related to EGFR mutation, but 5/13 tumors in the EGFR group versus 1/11 in the KRAS group had an estimated copy number > 4.
Regardless of statistical difference between the two groups, focal amplifications and homozygous deletions are of special interest as they may indicate oncogenes or tumor suppressors. Recurrent regions of deletion and focal amplification were defined as segments of at least 5 SNP loci in more than 2 tumors. Homozygous deletions were identified at chromosome regions 2q36.3, 9p21.1, 12q13.13, 18q12.1 and 18q21.2. The 9p21.1 locus contains CDKN2A and the 18q21.1 locus contains DCC; both are well-known tumor suppressors. Genes in the 12q13.13 and 2q36.3 regions have not been linked to cancer up to now, and the 18q12.1 region contains a cluster of genes coding for desmosomal proteins. All homozygous deletions were found in the EGFR mutated group (Table 2). For the 24 tumors, quantitative PCR was run to estimate gene copy number at CDKN2A, DSG3, [...]
Figure 1. DNA copy number alterations by SNP array analysis. Fraction of the samples with copy number amplification of at least three copies (green) and copy number reduction (red) across all chromosome SNPs, in the EGFR non-mutated/KRAS mutated group (A) and the EGFR mutated/KRAS non-mutated group (B). (C) Statistical comparison of both groups showing regions of amplification statistically linked to EGFR mutated tumors (black arrows). False discovery rate (FDR) has been used to estimate global error for multiple testing situations.
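The recurrence criterion used in the targeted analysis (segments of at least 5 SNP loci altered in more than 2 tumors) can be sketched as follows. This is one possible reading of the criterion, not the procedure actually used in the study: it assumes per-SNP boolean calls (deleted or amplified) ordered along one chromosome, and it counts a segment as recurrent when at least 3 tumors carry the alteration at every SNP of a run of 5 or more consecutive loci. The function name and input layout are hypothetical.

```python
def recurrent_segments(calls_by_tumor, min_snps=5, min_tumors=3):
    """Return (start, end) index pairs of recurrent altered segments.

    `calls_by_tumor` is a list with one entry per tumor; each entry is a
    list of booleans, one per SNP locus in chromosomal order, True when the
    locus is altered in that tumor. Indices in the result are inclusive.
    """
    if not calls_by_tumor:
        return []
    n_snps = len(calls_by_tumor[0])
    # Number of tumors carrying the alteration at each SNP locus.
    counts = [sum(tumor[i] for tumor in calls_by_tumor) for i in range(n_snps)]

    segments = []
    start = None
    # A trailing 0 acts as a sentinel that closes a run reaching the last SNP.
    for i, count in enumerate(counts + [0]):
        if count >= min_tumors:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_snps:
                segments.append((start, i - 1))
            start = None
    return segments
```

Because the count is taken locus by locus, the reported segment is the region shared by the tumors, which may be shorter than the alteration seen in any single tumor.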

Cell cycle related genes and lung carcinogenesis
The presence of CDKN2A homozygous deletion was restricted to the EGFR group and concerned 3 tumors out of 13, while LOH at this locus was present in 4 EGFR mutated tumors versus 2 non-mutated ones. Furthermore, since LOH at RB1 was restricted to EGFR mutated tumors, we hypothesized that alterations of the G1-S checkpoint could differ between EGFR mutated and non-mutated tumors. Differences at CDKN2A and RB1 did not reach statistical significance at the array level; however, they deserved to be confirmed in a larger series of NSCLC and extended to other key regulators such as cyclins. Because LOH at the RB1 locus clustered in the EGFR group, we screened the entire RB1 coding sequence and exon-intron boundaries for alterations in the subgroup of 13 EGFR mutated tumors. As no mutation was identified in this subgroup, RB1 sequencing was not done on the entire series. Then, CDKN2A homozygous deletions and CCNE1 and CCND1 amplifications were analyzed by real-time quantitative PCR in a series of 98 NSCLC including the 24 tumors previously typed by SNP array. All tumors had been characterized for EGFR, ERBB2, PIK3CA, BRAF, KRAS and STK11 mutations. Briefly, 13 and 6 NSCLC had CCND1 or CCNE1 amplification, respectively, and 8 tumors had homozygous deletion at the CDKN2A locus, including the 3 previously found in the array analysis (Table 3).

Relation between cyclin amplification and clinicopathological parameters
A significant association was found between CCND1 amplification and tobacco exposure (p = 0.023), and TP53 mutations were linked to CCNE1 (p = 0.006) and CCND1 (p = 0.048) amplifications (Table 4). One tumor showed simultaneous amplification of both cyclins.

Relation between CDKN2A homozygous deletion and clinicopathological parameters
CDKN2A homozygous deletions were significantly associated with EGFR mutations (p = 0.002) and absence of tobacco exposure (p = 0.012). One tumor had CDKN2A homozygous deletion and CCNE1 amplification (

Discussion
In lung cancer, EGFR inhibitors have been shown to be efficient in tumors with activating mutations of the receptor. This molecular alteration defines a subgroup of patients with specific clinicopathological features, the most striking being that patients are mostly non-smokers. Understanding the pathway that drives lung carcinogenesis in non-smokers is therefore of major interest. The present work represents the first comparative study of genome-wide allelic imbalance between EGFR mutated lung ADC and KRAS mutated ADC. In our work, EGFR mutated tumors were from non-smoking women with ADC; the control group was selected among smoking women with ADC and without EGFR mutations.
As most of the control tumors had KRAS mutations, we chose to consider for the SNP array only tumors with KRAS mutations in order to have a homogeneous control group. Moreover, it was already suggested that KRAS and EGFR mutations were hallmarks of tobacco and non-tobacco induced lung carcinogenesis, respectively [17]. Different reports showed that SNP array technology provides the opportunity to assess DNA copy number and LOH through the entire genome [12,18-20]. The overall pattern of alterations seen in this study is consistent with previous lung ADC studies. As already noted, frequent chromosome gains are found at 1q, 5p, 7p/q, 8q and 14q and losses at 6q, 8p, 9p, 13q, 18q and 19p [11,20-22]. However, new regions of homozygous deletion were found in EGFR mutated tumors, especially at 18q, involving the DCC gene and a cluster of genes coding for desmosomal proteins. New regions of focal amplification at 7p and 14q that encode potential oncogenes have also been delineated [22].
Using this technology, we showed that genome-wide allelic imbalance patterns differ between tumors from non-smokers and those from smokers. Indeed, more LOH and amplified regions were found in the EGFR mutated group when considering either all regions > 5 SNPs or entire chromosome arm losses or amplifications. This has already been suggested in a previous work using microsatellite markers, which showed that tumors in non-smokers had more alterations than those of smokers [23]. The mechanisms leading to this increased chromosomal instability are not yet understood. In this series, TP53 mutations were equally distributed in both groups; therefore the impact of TP53 on genome stability cannot be discussed. Other mechanisms, such as repair defects, could be involved in the increased chromosomal instability in non-smokers.
In parallel to DNA assays, RNA expression profiles also demonstrated differences between ADC from smokers and from non-smokers [24,25]. In a recent paper dealing with expression profiling of epidermal growth factor receptor/KRAS pathway activation in lung cancer, two groups of ADC were distinguished, one a bronchial type and the other an alveolar type [26]. Unsupervised classification failed to detect any specific group of tumors that had EGFR mutations, but the authors showed 26 genes preferentially expressed in the EGFR mutated group. Among them, three genes (EGFR, PTK7 and HMOX2) were located in focal regions of amplification in our series. Finally, a genetic classification of lung ADC by CGH array showed that EGFR mutated tumors could be individualized as a specific cluster [27]. Altogether, these results tend to show that lung ADC in non-smokers forms a distinct disease, at least from a molecular point of view. SNP arrays allow the identification of many loci with either LOH or amplification, some of which may represent background alterations with no specific involvement in the carcinogenesis process.
The identification of regions that are critical for tumor cell proliferation is of major importance. To go further in the identification of alterations linked to EGFR mutations, we focused on genes implicated in cell cycle regulation and validated the SNP array information in a large series of NSCLC. Previous genome analysis using SNP arrays also showed CDKN2A homozygous deletions and CCNE1 amplifications in lung cancers [12]. We found a link between CDKN2A homozygous deletion and non-smoking status, as already suggested by Kraunz et al. [28]. Moreover, in our series CDKN2A deletion was associated with EGFR mutation and possibly with activation of the EGFR/PI3K/AKT/mTOR transduction pathway. Indeed, 7 out of 8 tumors with CDKN2A homozygous deletion had EGFR, STK11 or PIK3CA mutations. Activation of this signal transduction pathway can lead to enhanced transcription of cell cycle genes such as CCND1, and it is not surprising that no cyclin amplification was observed in this group. For this group of tumors, it seems that proliferation depends on activation of the cell cycle through EGFR/PI3K/AKT/mTOR signaling combined with inactivation of the cell cycle inhibitor CDKN2A, which prevents cells from downregulating proliferation. The CDKN2A locus at 9p21 encodes two genes: one inhibits CDK4 mediated RB phosphorylation (CDKN2A/p16) and the other binds MDM2, leading to TP53 stabilization (p14/ARF). The qPCR experiments reported here analyzed homozygous deletions at 9p21 and therefore co-deletion of both genes. A recent paper showed that the p14/ARF protein was frequently down regulated in lung cancers with EGFR mutation or in tumors with ERBB2 mutations. As in our work, down regulation of the CDKN2A/p14/ARF locus could be linked to PI3K/AKT/mTOR activation, as both EGFR and ERBB2 activate this pathway [29].
However, in light of our results, it is surprising that in their experiments CDKN2A expression measured by immunohistochemistry remained positive when p14/ARF expression was extinguished. This suggests that p14/ARF could be down regulated independently of CDKN2A by mutation or promoter hypermethylation in EGFR mutated tumors. Promoter hypermethylation has indeed been shown to turn off the CDKN2A locus in cancer, but CDKN2A hypermethylation was linked to heavy smoking and squamous cell cancer, which represents a different subgroup of tumors [30].
Here we suggest that homozygous deletion of the CDKN2A/p14/ARF locus could be an alternative mechanism to down regulate cell cycle inhibitors in ADC from non-smoking patients. In our series, the Illumina GoldenGate Assay for methylation was used to quantify CDKN2A methylation in the 24 tumors analyzed by SNP array. No difference in methylation status was found between EGFR mutated tumors, KRAS mutated tumors and non-tumor tissues (data not shown). Two other papers reported a negative correlation between EGFR mutation and CDKN2A methylation status in NSCLC [31,32]. In contrast, cyclin amplifications were related to tobacco exposure and associated with TP53 mutations. In this case, the cell cycle is activated through direct amplification of cyclins, and TP53 inactivating mutations leave cells unable to repress proliferation. An association between CCND1 and TP53 was already suggested by protein expression analysis [33].
Although these models cannot be generalized, since other mechanisms such as mutations, epigenetic alterations and protein overexpression can activate oncogenes or inactivate tumor suppressors, our work highlights two different carcinogenesis pathways in NSCLC. One is tobacco independent and driven by CDKN2A inactivation and EGFR mutations; the other is smoking dependent and driven by cyclin amplification and TP53 mutation (Figure 2).
Figure 2. Two possible models of oncogenic cooperation in smokers and non-smokers.

Conclusion
Although this work concerned a limited series of tumors, we focused on a comparative analysis and showed that patterns of genome-wide genetic alterations differ between ADC with and without EGFR mutation. More alterations and a higher frequency of large alterations (deletion or amplification of entire chromosome arms) were found in the EGFR mutated group, suggesting that carcinogenesis pathways are different. Indeed, for a subset of tumors, a specific involvement of cell cycle genes and an oncogenic cooperation between EGFR mutations and CDKN2A homozygous deletion were identified.