Genetic lineages of undifferentiated-type gastric carcinomas analysed by unsupervised clustering of genomic DNA microarray data

Background It is suspected that early gastric carcinoma (GC) is a dormant variant that rarely progresses to advanced GC. We previously demonstrated that the dormant variant of tubular adenocarcinomas (TUBs) of the stomach is characterized by loss of MYC and gain of TP53, and the aggressive variant by gain of MYC and/or loss of TP53. The aim of this study is to determine whether this is also the case in undifferentiated-type GCs (UGCs) of different genetic lineages: one with a layered structure (LS+), derived from early signet ring cell carcinomas (SIGs), and the other, mostly poorly differentiated adenocarcinomas, without LS but with a minor tubular component (TC), dedifferentiated from TUBs (LS−/TC+). Methods Using 29 surgically resected stomachs with 9 intramucosal and 20 invasive UGCs (11 LS+ and 9 LS−/TC+), 63 genomic DNA samples of mucosal and invasive parts and corresponding reference DNAs were prepared from formalin-fixed, paraffin-embedded tissues with laser microdissection, and were subjected to array-based comparative genomic hybridization (aCGH), using 60K microarrays, and subsequent unsupervised hierarchical clustering. Of 979 cancer-related genes assessed, we selected genes with mean copy numbers significantly different between the two major clusters. Results Based on similarity in genomic copy-number profile, the 63 samples were classified into two major clusters. Clusters A and B, which were rich in LS+ UGCs and LS−/TC+ UGCs, respectively, were discriminated on the basis of 40 genes. The aggressive pattern was more frequently detected in LS−/TC+ UGCs (20/26; 77%) than in LS+ UGCs (17/37; 46%; P = 0.0195), whereas no dormant pattern was detected in any of the UGC samples. Conclusions In contrast to TUBs, copy number alterations of MYC and TP53 exhibited an aggressive pattern in LS+ SIG at early and advanced stages, indicating that early LS+ UGCs inevitably progress to advanced GC. Cluster B (enriched in LS−/TC+) exhibited more frequent gain of driver genes and a more frequent aggressive pattern than cluster A, suggesting a potentially worse prognosis in UGCs of cluster B.


Background
Gastric carcinomas (GCs) have been classified histologically into intestinal, diffuse and unclassified types by Lauren [1], and the unclassified type was further divided into solid and mixed types by Carneiro [2]. Undifferentiated-type gastric carcinoma (UGC) according to the Japanese classification [3] mostly overlaps with poorly differentiated GC, which comprises not only the diffuse type, including signet ring cell carcinoma (SIG), but also the solid type and the mixed type with a minor tubular component (TC).
Recently it has been proposed that advanced diffuse-type GC may derive from either early diffuse-type or intestinal-type GC. Well-differentiated tubular adenocarcinoma (TUB) can transform into poorly differentiated adenocarcinoma (POR) after the silencing of cell adhesion-related genes including CDH1 [4,5]. Carneiro's mixed-type carcinomas may thus overlap with dedifferentiated TUBs. It has been reported that the survival rate of patients with mixed-type GCs was significantly lower than that of patients with GCs of other types [2], whereas the survival rate of early GC patients with SIG was higher than that of GC patients without SIG [6]. Thus UGCs may be divided into subgroups with different prognoses. Recently, a mass-screening program for neuroblastomas [7-9] was suspended in Japan because a discontinuous genetic lineage was observed between the early- and late-presenting neuroblastomas. Screening-negative, late-presenting (≥1 year) neuroblastomas exhibited near-diploidy with terminal 1p deletion, whereas screening-positive neuroblastomas in infants exhibited near-triploidy without 1p deletion [10,11]. To perform a comparable subgrouping of UGCs, we have classified UGCs based on the continuity of genetic lineages as well as the expression of morphological lineage markers.
Our lineage analysis using chromosomal comparative genomic hybridization (CGH) was based on distinctive morphological lineage markers. A layered structure (LS) represents an incipient phase of SIG development [12] and is commonly retained even at an advanced stage in the human stomach. In tumour regions with LS, the mode of cell proliferation resembles that in the normal gastric mucosa, and it is believed that tumour cells remain confined to the mucosa as long as they grow to form the LS [13]. Our lineage analyses confirmed that POR with LS was derived from intramucosal SIG, whereas POR without LS and with a minor TC (< 30%) was derived from TC [14,15]. However, the TC was not always derived from early TUB but could also be derived from SIG, whereas LS was scarcely ever derived from TUB [15]. Therefore, as a morphological lineage marker, LS may take priority over TC. In addition, some UGCs have neither LS nor TC because of secondary loss of these markers, which prompted us to adopt array CGH (aCGH) and unsupervised cluster analyses of the aCGH data to classify UGCs solely on the basis of similarity in the genomic copy number profile.
In differentiated-type gastric carcinomas (DGCs), our recent aCGH-based lineage analyses revealed two genetic lineages: one with copy-number loss of MYC and copy-number gain of TP53 (MYC− and TP53+), a dormant pattern, and the other with copy-number gain of MYC and/or copy-number loss of TP53 (MYC+ and/or TP53−), an aggressive pattern. The dormant pattern accounted for 70% of intramucosal carcinoma samples and half of the intramucosal part samples of invasive carcinomas. The invasive parts of invasive carcinomas mostly exhibited the aggressive copy number alteration (CNA) pattern. When the intramucosal part of an advanced cancer was dormant, the lineage was discontinuous between the mucosal and invasive parts. Therefore, the MYC−/TP53+ and MYC+ and/or TP53− CNA patterns may be signatures of dormant and aggressive TUBs, respectively [16].
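The dormant and aggressive patterns defined above amount to a simple decision rule on the copy-number status of MYC and TP53. The following is a minimal sketch of that rule, assuming each gene has already been called as "gain", "loss" or "neutral"; the function name `classify_pattern` and the label "other" for samples matching neither pattern are illustrative assumptions, not part of the original analysis.

```python
def classify_pattern(myc: str, tp53: str) -> str:
    """Classify a sample from the copy-number calls of MYC and TP53.

    Each argument is assumed to be "gain", "loss" or "neutral".
    - aggressive: MYC gain and/or TP53 loss (MYC+ and/or TP53-)
    - dormant:    MYC loss and TP53 gain    (MYC- and TP53+)
    - other:      neither pattern (label chosen here for illustration)
    """
    if myc == "gain" or tp53 == "loss":
        return "aggressive"
    if myc == "loss" and tp53 == "gain":
        return "dormant"
    return "other"


# Hypothetical calls, for illustration only
examples = {
    "sample_1": classify_pattern("loss", "gain"),
    "sample_2": classify_pattern("gain", "neutral"),
    "sample_3": classify_pattern("neutral", "neutral"),
}
```

Because the aggressive conditions (MYC gain, TP53 loss) and the dormant conditions (MYC loss, TP53 gain) are mutually exclusive, the order of the two tests does not change the result.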
In the present study, genomic DNA samples from the mucosal and invasive parts of early and advanced UGCs were prepared and subjected to gene copy-number analyses using aCGH, followed by unsupervised cluster analysis of the aCGH data. Based on these results, we examined the relationship between morphological and genetic lineage markers and identified several useful lineage marker genes for UGC.

Methods
The Institutional Review Board on Medical Ethics at Shiga University of Medical Science approved this study on the condition that the UGC samples used were anonymous. Written informed consent was not required because this retrospective study used archival samples.

Tissue samples
This study included 29 surgically resected, buffered formalin-fixed, paraffin-embedded UGCs: 20 with LS in at least part of the tumour (LS+; 9 intramucosal tumours and 11 invasive tumours) and 9 without LS but containing a small TC (LS−/TC+; all invasive tumours) (Table 1). TC was defined as a well or moderately differentiated adenocarcinoma component comprising ≤ 30% of the entire tumour [15]. All samples were selected from GC cases diagnosed in our department from 1997 to 2011. The mean age was 57.6 years (range, 48-79) for patients with intramucosal LS+ UGCs, 60.2 years (range, 48-79) for patients with invasive LS+ UGCs and 62.2 years (range, 50-75) for patients with invasive LS−/TC+ UGCs. The macroscopic classification was determined according to the Japanese Classification of Gastric Cancer with TNM staging [3].

LS evaluation
LS was defined as in a previous study [17]. In brief, LS+ regions had small carcinoma cells confined to the stroma at the gland-neck level that gradually differentiated to signet ring cells in the superficial (and deep) lamina propria (Figure 1a). The absence of LS in intramucosal regions of the tumour was defined by four patterns: 1) contact of small carcinoma cells to the muscularis mucosae in SIG, 2) mucinous adenocarcinoma, 3) POR and 4) the presence of a TC (Figure 1b-f).

Laser microdissection and DNA preparation
Tumour tissue samples were obtained from 5-μm-thick tissue sections using an LMD6000 laser microdissection system (Leica Microsystems, Wetzlar, Germany). For invasive cancers, DNA samples were obtained from both the intramucosal and invasive parts. For each sample, cancer tissues were obtained from an area >6 mm², in which cancer cells accounted for ≥70% of the total cell count. Tissue samples were digested in 200 μg/ml proteinase K solution for approximately 72 hours at 37.0°C, and genomic DNA was extracted with phenol/chloroform.

Whole genome amplification
Sample DNA was amplified using the GenomePlex Whole Genome Amplification Kit (WGA2 Kit; Sigma, St. Louis, USA) [18]. For some DNA samples that could not be sufficiently amplified, the WGA5 Kit (Sigma) was employed.

Array CGH
An oligo CGH microarray (60K, 60-mer) (Agilent, Santa Clara, USA) was used in this study, according to the manufacturer's instructions. In brief, the amplified tumour and control DNA samples were non-enzymatically labelled with Cy5 and Cy3, respectively, using the Genome DNA ULS Labelling Kit (Agilent) and competitively hybridized to the microarray. The hybridized array images were captured using a DNA microarray scanner (Agilent), and the fluorescence intensity of the tumour and control at each probe dot was calculated by Feature Extraction Ver.9.5.3 (Agilent). The array data were normalized using Genomic Workbench software Ver.5.0 (Agilent). The positions of oligomers are based on the Human Genome February 2009 assembly (hg19). Copy-number gains and losses were defined as values of the base 2 logarithm of the tumour to reference signal intensity ratio (T/R) greater than 0.3219 and less than −0.3219, respectively.
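The gain/loss definition above can be written as a short thresholding rule. The sketch below is illustrative only: the function name `call_copy_number` and the "neutral" label are assumptions, and the input is taken to be already normalized tumour and reference intensities. Numerically, 0.3219 corresponds to log2(1.25), so the rule is equivalent to a T/R ratio above 1.25 (gain) or below 0.8 (loss).

```python
import math

# Threshold taken from the text; numerically close to log2(1.25)
LOG2_THRESHOLD = 0.3219


def call_copy_number(tumour_signal: float, reference_signal: float,
                     threshold: float = LOG2_THRESHOLD) -> str:
    """Return "gain", "loss" or "neutral" for one probe.

    Signals are assumed to be positive, normalized intensities.
    """
    log2_ratio = math.log2(tumour_signal / reference_signal)
    if log2_ratio > threshold:
        return "gain"
    if log2_ratio < -threshold:
        return "loss"
    return "neutral"


# Hypothetical intensities, for illustration only
calls = [call_copy_number(t, 1.0) for t in (1.3, 1.2, 0.75)]
```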

Cluster analysis
To subtype UGC samples based on genomic profile similarity, an unsupervised hierarchical cluster analysis was applied across 63 samples from 29 UGC cases using the Cluster 3.0 and TreeView software programs. The clustering algorithm was set to complete linkage clustering using an uncentered correlation. To enable unsupervised cluster analysis, we reduced the number of probes, without bias, from around 60,000 to several thousand. For this purpose, we selected large genes because the greater number of corresponding probes resulted in an improved signal-to-noise ratio of the representative gene copy numbers. The unsupervised strategy enabled us to set an internal standard to validate clustering results: the copy number profiles of samples from the same tumour should be more similar to each other than to any copy number profile from another tumour, because the gene alterations arising in the process of carcinogenesis are largely common among the samples from the same tumour.
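For readers unfamiliar with these settings, the following is a minimal pure-Python sketch of complete linkage clustering with an uncentered correlation. It is not the Cluster 3.0 implementation: the function names, the use of 1 minus the uncentered correlation as the distance, and the toy profiles are all assumptions made for illustration.

```python
import math
from itertools import combinations
from typing import Dict, List


def uncentered_correlation(x: List[float], y: List[float]) -> float:
    """Correlation computed without subtracting the means (cosine similarity)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    if norm == 0:
        raise ValueError("profile with all-zero values has no defined correlation")
    return dot / norm


def distance(x: List[float], y: List[float]) -> float:
    # Assumed distance: 1 - uncentered correlation
    return 1.0 - uncentered_correlation(x, y)


def complete_linkage(profiles: Dict[str, List[float]],
                     n_clusters: int = 2) -> List[List[str]]:
    """Agglomerate samples until n_clusters remain.

    Complete linkage: the distance between two clusters is the largest
    pairwise distance between their members.
    """
    clusters = [[name] for name in profiles]
    while len(clusters) > n_clusters:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            d = max(distance(profiles[a], profiles[b])
                    for a in clusters[i] for b in clusters[j])
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # i < j, so index i stays valid
        del clusters[j]
    return clusters


# Hypothetical gene-level log2 T/R profiles of two tumours, two samples each
toy_profiles = {
    "T1_mucosa": [0.5, 0.4, -0.1, 0.0],
    "T1_invasive": [0.6, 0.5, 0.0, -0.1],
    "T2_mucosa": [-0.4, 0.0, 0.5, 0.6],
    "T2_invasive": [-0.5, -0.1, 0.4, 0.7],
}
clusters = complete_linkage(toy_profiles, n_clusters=2)
```

In this toy input the two samples of each tumour end up in the same cluster, which is the behaviour the internal standard described above is meant to check.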

Statistical analyses
Differences in contingency tables were assessed for statistical significance using Fisher's exact test. A P value < 0.05 (two-sided) was considered statistically significant. Welch's t test was used to evaluate the difference in mean DNA copy number for each probe between two clusters of samples. The Bonferroni correction was used to correct for multiple comparisons.
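As an illustration of these tests, the sketch below implements a two-sided Fisher's exact test for a 2×2 table, the Welch t statistic with its Welch-Satterthwaite degrees of freedom, and a Bonferroni-adjusted significance level, using only the Python standard library. The function names are assumptions; the 2×2 counts are the aggressive-pattern frequencies reported in this paper (20/26 vs. 17/37), and using the 2756 probe loci as the number of comparisons is an assumption about how the correction was applied. Converting the t statistic to a P value requires the t distribution, which is not shown here.

```python
from math import comb, sqrt
from statistics import mean, variance


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables (with the same
    margins) that are no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x: int) -> float:
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_observed * (1 + 1e-9))


def welch_t(x, y):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    vx, vy = variance(x) / len(x), variance(y) / len(y)
    t = (mean(x) - mean(y)) / sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Per-test significance level after Bonferroni correction."""
    return alpha / n_tests


# Aggressive vs. non-aggressive samples: LS-/TC+ (20 vs. 6), LS+ (17 vs. 20)
p_value = fisher_exact_two_sided(20, 6, 17, 20)

# Assumed number of comparisons: one test per probe locus
per_probe_alpha = bonferroni_alpha(0.05, 2756)
```

The value of `p_value` can be compared with the P value reported in the Results for the same counts.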

Results

Samples analysed with array CGH
Tissue samples were excised from 29 archived GC specimens by laser microdissection. The tissue sample population included 11 regions (from 9 intramucosal SIGs), of which 9 regions were LS+ and the other two LS−; 26 regions (from 11 LS+ invasive UGCs), of which 10 were LS+ mucosal regions, 8 were LS− mucosal regions and 8 were invasive regions; and 26 regions (from 9 LS−/TC+ invasive UGCs), of which 9 were intramucosal POR, 9 were intramucosal TC and 8 were invasive regions.

Genome wide copy number alterations
A plot of the genetic aberration penetrance for all chromosomes is shown for LS+ UGCs and LS−/TC+ UGCs in Figure 2a and Figure 2b, respectively. Copy-number gains and losses were more common in LS−/TC+ UGCs than in LS+ UGCs. The most frequent copy-number gains were detected at 3q26 (7/63 samples), 5p15 (8/63), 8p23 (9/63), 8q24 (7/63) and 12p12 (6/63), while the most frequent copy-number losses were found at 7q36 (5/63) and 12p12 (5/63). Copy-number alterations (CNAs) common to all the samples from the same tumour were called stemline changes [14] and were considered to occur at the earliest stage of tumourigenesis and to be inherent to the tumour lineage. Stemline gains of 3q26 were detected in 2/20 cases of invasive LS+ UGCs and in none of the invasive LS−/TC+ UGCs. In contrast, stemline gains of 5p15, 8p23 and 12p12 were detected in 2/9 cases of invasive LS−/TC+ UGCs but in no case of invasive LS+ UGCs. No stemline losses were detected in any of the UGC cases.
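Because stemline changes are defined as the CNAs shared by every sample of a tumour, they can be obtained as a set intersection. The sketch below assumes each sample is represented as a set of (locus, call) pairs; the function name and the example calls are hypothetical.

```python
from typing import Dict, Set, Tuple

Cna = Tuple[str, str]  # (locus, "gain" or "loss")


def stemline_changes(samples: Dict[str, Set[Cna]]) -> Set[Cna]:
    """Return the CNAs present in all samples of one tumour."""
    cna_sets = list(samples.values())
    if not cna_sets:
        return set()
    return set.intersection(*cna_sets)


# Hypothetical calls for the mucosal and invasive parts of one tumour
tumour = {
    "mucosal": {("5p15", "gain"), ("8p23", "gain"), ("7q36", "loss")},
    "invasive": {("5p15", "gain"), ("8p23", "gain")},
}
shared = stemline_changes(tumour)
```

Here the 7q36 loss is found only in the mucosal sample, so it is not counted as a stemline change.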

Impartial selection of genes reflecting the whole genome profile
To classify UGC samples based on the overall similarity in the profile of gene copy number changes, we used unsupervised hierarchical cluster analysis. For this purpose, it was necessary to reduce the number of gene probes used in the cluster analysis from 60K to several thousand. The reduced number of genes should still reflect the whole genome profile if impartially selected. To fulfil these conditions, we selected genes based solely on the size of genes (the numbers of corresponding probes). After repeated trials of cluster analyses using genes of various minimum sizes (or probe numbers per gene), we observed that the CNA profiles of most samples from the same tumour were clustered more closely together than with any sample from another UGC case when we analysed only genes with 3 or more probes per gene: a total of 5019 genes.
Figure 4 Array CGH data of MYC and TP53 in LS+ UGCs and LS−/TC+ UGCs. LS+ UGCs are divided into intramucosal cancers and invasive cancers. Numerals mean the base 2 logarithm of the test/reference signal intensity ratios of array CGH data. Significant gains and losses are indicated with red and green, respectively. The samples marked with and without grey margin are included in cluster B and cluster A, respectively, in Figure 3.
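The size-based selection can be sketched as a filter on the number of probes per gene. In the example below, the function name, the probe-to-gene mapping and the use of the mean probe value as the representative gene copy number are assumptions; the text specifies only the minimum of 3 probes per gene.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict


def gene_level_profile(probe_values: Dict[str, float],
                       probe_to_gene: Dict[str, str],
                       min_probes: int = 3) -> Dict[str, float]:
    """Keep genes covered by at least min_probes probes.

    Each retained gene is summarized by the mean log2 T/R of its probes
    (an assumed summary, chosen here for illustration).
    """
    by_gene = defaultdict(list)
    for probe, value in probe_values.items():
        gene = probe_to_gene.get(probe)
        if gene is not None:
            by_gene[gene].append(value)
    return {gene: mean(values) for gene, values in by_gene.items()
            if len(values) >= min_probes}


# Hypothetical probes and genes, for illustration only
probe_values = {"p1": 0.25, "p2": 0.5, "p3": 0.75, "p4": -0.2, "p5": -0.4}
probe_to_gene = {"p1": "GENE_A", "p2": "GENE_A", "p3": "GENE_A",
                 "p4": "GENE_B", "p5": "GENE_B"}
profile = gene_level_profile(probe_values, probe_to_gene)
```

Only GENE_A, which has three probes, is retained; GENE_B is dropped because it has two.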

Classification of UGC using hierarchical cluster analysis
We applied an unsupervised two-dimensional hierarchical clustering algorithm to the 63 DNA samples from 29 UGCs. The samples were classified into two major clusters, A and B, based on similarity in the genome profile (Figure 3). The aggressive pattern (MYC+ and/or TP53−) was detected in 20/26 samples (77%) of LS−/TC+ UGCs and in 17/37 samples (46%) of LS+ UGCs (Figure 4). Therefore, the aggressive pattern was more frequently detected in invasive LS−/TC+ UGCs than in LS+ UGCs (P = 0.0195). The dormant pattern (MYC− and TP53+) was not detected in any of the UGC samples, even those from intramucosal GCs (Figure 4).

Copy number alterations of genes other than MYC or TP53
As mentioned above, 5p15 was one of the most frequent gain sites in invasive LS−/TC+ UGCs (8/26; 30.8%), but no gain at this site was detected in any of the 37 intramucosal and invasive LS+ UGC samples (Figure 2). The target genes located at this locus may include the telomerase reverse transcriptase gene (TERT) because a TERT gain was more frequently detected in invasive LS−/TC+ UGCs than in intramucosal and invasive LS+ UGCs (16/26 vs. 1/37, P < 0.0001) (Figure 5). In contrast, losses of TERT were detected in 4/37 samples of intramucosal and invasive LS+ UGCs (10.8%) but not in invasive LS−/TC+ UGCs. Welch's t test was performed to compare the mean T/R ratio between the samples in cluster A and those in cluster B at each of the 2756 probe loci of 979 cancer-related genes (Table 2). Most of the log2 T/R ratios of the 43 distinguishing gene probes were of opposite sign between clusters A and B, with greater absolute values in cluster B (Figure 5).
Figure 5 Array CGH data of genes other than MYC and TP53 with significantly different T/R ratio between clusters A and B. UGCs are divided into clusters A and B that were determined in Figure 3. The heat map indicates the base 2 logarithm of the test/reference signal intensity ratios of array CGH data. Gains and losses are indicated with red and green, respectively.

Discussion
Based on chromosomal CGH analysis, we have reported that there are two distinct UGC lineages: the LS+ lineage derived from early SIG and LS−/TC+ lineage dedifferentiated from TUB [15]. The former is characterized by LS and the latter by a small TC. However, there are also UGCs without these morphological lineage markers.
In the present study, we classified UGC based on similarity in the whole genome copy number profile among samples using unsupervised hierarchical cluster analysis and examined the correlation between this gene-based classification and morphological lineage markers. Using 5019 large genes and aCGH data from 63 DNA samples from 29 UGCs, we confirmed that most of the samples examined from the same tumour were clustered more closely together than with samples from any other tumour, thus fulfilling the criteria for our internal standard. On the basis of this observation, we performed an unsupervised two-dimensional hierarchical cluster analysis. All the samples were classified into two major clusters A and B (Figure 3). Cluster A was rich in LS+ UGCs, whereas cluster B was rich in LS−/TC+ UGCs. This difference was statistically significant (P = 0.0001), which indicates that the classification by the presence or absence of LS and TC is well correlated with the genomic-profile-based classification and validates LS+ and LS−/TC+ as lineage markers.
All the intramucosal LS+ UGCs were included in cluster A, suggesting that most UGCs in cluster A were derived from intramucosal SIG, and that the LS−/TC+ UGCs in cluster A may have secondarily lost LS. The LS−/TC+ UGCs in cluster A may also be derived from SIG, as suggested by chromosomal CGH studies [15]. In contrast, LS in advanced LS+ UGCs in cluster B (A107, A108 and A110) was virtually indistinguishable morphologically from LS in cluster A but showed a different genomic constitution. This may be a kind of phenocopy; a fraction of LS+ UGCs were considerably similar in genomic profile to LS−/TC+ UGCs. Although LS exhibits regular cell proliferation and differentiation and a superficially spreading dormant growth [13], these findings suggest that LS itself is not a marker of persistent tumour dormancy but has the potential to progress to an advanced stage with a prognosis as poor as that for LS−/TC+ UGCs. This situation may resemble that in chronic myeloid leukaemia, in which blastic transformation occurs after a dormant phase of well-retained cellular differentiation.
Most UGCs exhibited the aggressive genomic pattern (TP53− and/or MYC+), including 55% of intramucosal LS+ UGCs, an incidence comparable to that in invasive UGCs. The dormant pattern (MYC− and TP53+) was not detected in any of the UGC samples, even in intramucosal UGCs. These intramucosal UGCs are distinct from early DGCs, in which 70% are of the dormant type [16]. Therefore, TP53 and MYC are less useful as prognostic markers for UGCs than for DGCs.
To explore other genes important for differentiation of genetic lineage and for UGC prognosis, we first compared the profiles of chromosomal copy-number alterations (CNAs) between LS+ and LS−/TC+ UGCs. As shown in Figure 2, CNAs detected in LS+ tumours but not in LS−/TC+ tumours include 3q26 gain, a locus likely to include SKIL because the average SKIL copy number was greater in LS+ tumours than in LS−/TC+ tumours (P = 0.0060). SKIL encodes the SnoN protein, which is proto-oncogenic by antagonizing cytostatic responses of TGF-β [28,29] and anti-oncogenic by activating p53 [30]. The CNAs with the opposite pattern (present in LS−/TC+ tumours but not in LS+ tumours) were gains at 5p15, 8p23 and 12p12. The target genes at 5p15 and 12p12 include TERT and KRAS, respectively, because gains of TERT and KRAS were more frequently detected in invasive LS−/TC+ UGCs than in intramucosal and invasive LS+ UGCs (P < 0.0001 and P = 0.0032, respectively). No target gene was detected at 8p23.
Our second approach to identify lineage-specific CNAs was a screening of genes (from 979 cancer-related genes) that showed significantly different mean T/R ratios between the samples of clusters A and B. We selected 40 genes that were significantly different between the clusters (using the t test after Bonferroni correction), of which 6 were related to enhanced tumour growth and 8 to invasion/metastasis (Table 2). As shown in Figure 5, genes that drive tumourigenesis were more common in cluster B and showed larger amplitude CNAs. Thus, UGCs in cluster B may be more dependent on oncogenic genomic alterations and less on environmental and epigenetic alterations than those in cluster A. The possible drivers of tumour growth screened included KIT, TERT and RAS family genes. KIT encodes a receptor tyrosine kinase that is activated by stem cell factor binding and initiates numerous signal transduction pathways linked with the processes of apoptosis, proliferation and tumourigenesis [31]. RAS family genes encode small GTPases that play a key role in the transduction of signals from receptor kinases to the pathways of various cellular processes [32]. TERT encodes the telomerase catalytic subunit, which not only plays an important role in cellular immortalization by telomere elongation [33,34] but also activates cell proliferation [35]. The possible drivers of invasion and metastasis screened include ETS1 and Ephrin receptor genes. ETS1 encodes a transcription factor, the Ets1 proto-oncoprotein, that promotes invasiveness and is an indicator of poor outcome in epithelial cancers through regulation of MMP1, MMP3, MMP9, uPA, VEGF and VEGF receptor expression [36]. Ephrin receptor genes, EPH39B, A7, A5 and A10 genes encode the ephrin receptor with tyrosine kinase activity that affects tumour growth, invasiveness, angiogenesis and metastasis [37].
In the gene name column of Table 2, "*" indicates genes related to tumour growth and "**" those related to invasion and metastasis.
There were no significant differences in the mean copy number of CDH1 and its transcriptional repressor genes (SNAI1, SNAI2, ZEB1, ZEB2, TWIST1, etc.) between clusters A and B, although these genes were reportedly associated with a poorly differentiated phenotype and poor clinical outcome [38]. However, these genes may still participate in UGC tumourigenesis through epigenetic silencing [39].
We are now extending this study to validate, by quantitative PCR, the UGC-associated genes indicated by aCGH and to correlate their genomic copy number with gene expression and prognosis. Thereafter, using quantitative PCR analyses instead of aCGH, similar analyses should be applied to a greater number of tumour cases with known outcomes.

Conclusions
Unsupervised cluster analyses of aCGH data of multiple samples from early and advanced UGCs have demonstrated that early UGCs, including LS+ types in which polarity of cell proliferation and differentiation is well retained, have aggressive potential. Therefore, eradication of UGCs at early stages may contribute to better patient survival. In addition, it was observed that the two UGC lineages, one derived from early SIG and the other from TUB, have different genomic copy-number alteration profiles, resulting in different sets of genes contributing to tumourigenesis. The latter lineage from TUB may be more dependent on genomic copy-number alterations and have a poorer outcome than UGCs derived from SIG.