An array CGH based genomic instability index (G2I) is predictive of clinical outcome in breast cancer and reveals a subset of tumors without lymph node involvement but with poor prognosis

Background: Despite entering complete remission after primary treatment, a substantial proportion of patients with early stage breast cancer will develop metastases. Prediction of such an outcome remains challenging despite the clinical use of several prognostic parameters. Several reports indicate that genomic instability, as reflected in specific chromosomal aneuploidies and variations in DNA content, influences clinical outcome, but no precise definition of this parameter has yet been established.
Methods: To explore the prognostic value of genomic alterations present in primary tumors, we performed a comparative genomic hybridization study on BAC arrays with a panel of breast carcinomas from 45 patients with metastatic relapse and 95 others, matched for age and axillary node involvement, without any recurrence after at least 11 years of follow-up. Array-CGH data were used to establish a two-parameter index representing the global level of aneusomy by chromosomal arm and the number of breakpoints throughout the genome.
Results: Application of appropriate thresholds allowed us to distinguish three classes of tumors highly associated with metastatic relapse. Applied with the same thresholds to a published set of tumors, this index confirmed its prognostic significance, with a hazard ratio of 3.24 (95% CI: 1.76-5.96; p = 6.7 × 10^-5) for the poor-prognosis group relative to the intermediate group. The high prognostic value of this genomic index is related to its ability to identify a specific group of breast cancers, mainly of luminal type and without axillary node involvement, that show very high genetic instability and poor outcome. Indirect transcriptomic validation was obtained on independent data sets.
Conclusion: Accurate evaluation of genetic instability in breast cancers by a genomic instability index (G2I) helps to identify specific tumors with a previously unexpected, very poor prognosis.


Background
Despite entering complete remission after primary treatment, a substantial proportion of patients with early stage breast cancer will progress to metastatic relapse, sometimes after a delay of many years [1]. This evolution led to the concept of cell dormancy, in which the metastatic process results from the migration of individual cells capable of forming a new tumor site, even after a long latency [2]. This model, which implies heterogeneity of metastatic potential among the cells of a primary tumor, gained renewed interest with the hypothesis that cancer stem cells capable of generating such secondary sites exist [3].
Eliminating these cells is the objective of adjuvant therapy, which is given after optimal local treatment. The efficacy of this therapeutic strategy is well established [4], but accurately identifying the patients who need adjuvant treatment requires appropriate prognostic factors, which have not yet been clearly established. The main conventional prognostic factor in early breast carcinoma is the staging of axillary node involvement, which reflects the ability of cancer cells to spread and the extent of invasion [5]. This criterion is not fully accurate in predicting patient outcome, however, since 25% of patients without axillary lymph node invasion develop metastatic relapse by ten years [6]. Among the many other factors that have been tested, several have proven prognostic value, such as tumor size, histological grade [7], peritumoral vascular emboli, and the expression of steroid hormone receptors [8]. With the advent of gene expression profiling and the identification of five intrinsic breast cancer subtypes [9][10][11], prognosis in breast cancer is now considered within each molecular subtype. Subsequent gene expression studies have identified prognostic transcriptomic profiles that appear relevant for predicting short-term relapse, specifically in estrogen receptor positive breast cancer [12,13].
The ability of gene signatures derived from bulk tumors to predict metastatic relapse is difficult to reconcile with the model proposing that rare tumor stem cells mediate metastasis [14]. One must therefore assume that the various published prognostic signatures reflect an intrinsic characteristic shared by cancer cells rather than a specific biological property such as the ability to migrate and to form colonies outside the primary site [15]. Indeed, most of the proposed prognostic signatures reflect increased expression of proliferation genes, one of the hallmarks of cancer [16]. Because another hallmark of cancer is loss of genetic stability, and because gene expression signatures linked to chromosomal instability have shown some predictive value for metastatic relapse in various kinds of cancer [17,18], we explored by array-CGH analysis the prognostic value of genomic alterations in a series of breast carcinomas with known outcome after 11 years of follow-up, and confirmed the main results on publicly available sets of tumors.

Patient samples
Tumor samples were obtained from the tumor bank of "Institut Bergonié" and came from 135 patients diagnosed with invasive ductal carcinoma who underwent surgical resection as first treatment between 1989 and 1992. The study was performed in accordance with the rules of Institut Bergonié's clinical research committee. All patients consented to the use of their samples for research purposes, in compliance with the French law on tumor banks (law no. 2004-800). Forty-five tumor samples with metastatic relapse and ninety samples without metastatic relapse were selected, with a minimum follow-up of 131 months (11 years). Tumors in the two groups were matched for patient age at diagnosis (< or > 55 years) and for axillary lymph node involvement (Table 1). Mean patient age across both groups was 55 years (range: 29-77 years).
Clinicopathological characteristics of the tumors are given in Additional file 1. Patients with tumors without axillary lymph node involvement received only local treatment (lumpectomy and radiotherapy, or mastectomy with or without radiotherapy), whereas patients with tumors with lymph node involvement received adjuvant therapy, either chemotherapy or hormone therapy, according to the procedures in use at the time.

Array CGH
Sample preparation
A fragment of tumor tissue was snap frozen in liquid nitrogen immediately after surgical removal and stored at −140°C in the tumor bank of "Institut Bergonié". After grinding in liquid nitrogen, DNA was purified using a standard organic solvent extraction method.

Statistical considerations
Clustering of genome copy number profiles
Samples were clustered on the basis of their "gain, normal, loss" (GNL) data using agglomerative hierarchical clustering (described in Additional file 2). The number of groups (n = 6) was chosen qualitatively, by considering the shape of the clustering dendrogram and the homogeneity of the chromosomal rearrangements within each cluster.
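The clustering step can be illustrated with a minimal, dependency-free Python sketch. The distance (fraction of clones with a different GNL call) and the average-linkage merging criterion are assumptions chosen for illustration; the distance and linkage actually used are those described in Additional file 2. GNL calls are encoded here as +1 (gain), 0 (normal) and -1 (loss), and the function names are hypothetical.

```python
from itertools import combinations


def hamming(p, q):
    """Fraction of clones whose GNL call (+1 gain, 0 normal, -1 loss) differs."""
    return sum(a != b for a, b in zip(p, q)) / len(p)


def average_linkage(profiles, n_clusters):
    """Naive agglomerative clustering with average linkage (assumed criterion).

    Starts with one cluster per tumor and repeatedly merges the pair of clusters
    with the smallest mean pairwise distance until n_clusters remain.
    Returns the clusters as sorted lists of profile indices.
    """
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > n_clusters:
        best = None
        for (i, ci), (j, cj) in combinations(enumerate(clusters), 2):
            d = sum(hamming(profiles[a], profiles[b]) for a in ci for b in cj)
            d /= len(ci) * len(cj)
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # i < j, so index j is still valid
        del clusters[j]
    return sorted(sorted(c) for c in clusters)


if __name__ == "__main__":
    # Toy GNL profiles over five clones (illustrative values, not study data).
    toy = [
        [1, 1, 0, -1, -1],
        [1, 1, 0, -1, 0],
        [0, 0, 0, 0, 0],
        [0, 0, 0, 0, 1],
    ]
    print(average_linkage(toy, 2))
```

In the study the dendrogram was cut into six groups after visual inspection; in this sketch the number of groups is simply passed as `n_clusters`. The naive pairwise search is cubic in the number of tumors, which is acceptable for a few hundred samples.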

Genomic instability index (G2I)
The proposed score is based on two items: (i) the overall level of genomic alteration (denoted $A$) and (ii) the number of altered genomic regions (denoted $N$). By applying a set of appropriate thresholds to these two items, we define three groups with genomic scores 1, 2 and 3, characterized by an increasing level of genomic perturbation. For a given sample $i$, let $A_i$ and $N_i$ be the computed values of $A$ and $N$, respectively, and let $a_1$, $a_2$, $n_1$ and $n_2$ be the thresholds:
- if $A_i < a_1$ and $N_i < n_1$, then genomic score = 1 (low level of perturbation);
- if $A_i > a_2$ and $N_i > n_2$, then genomic score = 3 (high level of perturbation);
- otherwise, genomic score = 2 (average level of perturbation).
The calculation of $A$ and $N$, as well as the estimation of the thresholds $a_1$, $a_2$, $n_1$ and $n_2$, are described in Additional file 2. The R script that reproduces the results is provided as supplemental data (Additional file 3).
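The three-way grading rule translates directly into code. The Python function below is a minimal sketch of the rule only; it is not the authors' R script (Additional file 3), and the function and argument names are hypothetical. The Results report grade 1 as an overall alteration level below 48% with at most 42 altered regions, and grade 3 as an alteration level above 35% with at least 65 altered regions; because N is an integer count, these correspond to n1 = 43 and n2 = 64 under the strict inequalities of the rule.

```python
def g2i_score(a_i: float, n_i: int, a1: float, a2: float, n1: int, n2: int) -> int:
    """Genomic instability index grade for one tumor.

    a_i: overall level of genomic alteration (A), as a fraction in [0, 1].
    n_i: number of altered genomic regions (N).
    a1, n1: upper bounds (strict) defining grade 1.
    a2, n2: lower bounds (strict) defining grade 3.
    """
    if a_i < a1 and n_i < n1:
        return 1  # low level of perturbation
    if a_i > a2 and n_i > n2:
        return 3  # high level of perturbation
    return 2      # average level of perturbation


# Thresholds restated for strict inequalities (assumed reading of the Results:
# A < 0.48 and N <= 42 for grade 1; A > 0.35 and N >= 65 for grade 3).
A1, A2, N1, N2 = 0.48, 0.35, 43, 64

if __name__ == "__main__":
    for a, n in [(0.20, 30), (0.50, 90), (0.60, 40), (0.30, 80)]:
        print(a, n, g2i_score(a, n, A1, A2, N1, N2))
```

Although a2 is lower than a1, grades 1 and 3 cannot overlap with these values, because grade 1 requires fewer than 43 altered regions and grade 3 more than 64. A tumor such as (A = 0.60, N = 40) falls into grade 2: it fails the A criterion for grade 1 and the N criterion for grade 3.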

Predictive analysis
A univariate logistic regression model was used to estimate the odds ratios between the G2I classes and metastatic relapse, as well as for the classical prognostic parameters. Factors significant at p < 0.05 in univariate analysis were entered into a maximum likelihood logistic regression model in ascending order.

Validation
An external validation was performed using publicly available BAC array CGH data from 168 invasive ductal carcinomas of the breast [21]. This set of tumors, which consists only of node-negative breast cancers, includes 57 cases with metastatic relapse and 111 tumors without metastatic or locoregional recurrence after a follow-up of at least 5 years (median follow-up: 130 months; range: 71-210). The array CGH data were obtained on six distinct BAC arrays similar to the one used in the present study. Application of the G2I to this set of tumors using the previously defined thresholds is described in Additional file 2.
Transcriptomic signature of the G2I-3 tumors
To identify genes differentially expressed between G2I-1/2 and G2I-3 tumors, we applied Welch's t-tests (t.test function, R package stats) to the RMA log2 single-intensity expression data, with a threshold of 5 × 10^-3 on p values, which yielded 300 probe sets (corresponding to 222 unique EntrezGene symbols). Samples were then clustered on the basis of this signature using agglomerative hierarchical clustering.
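The probe-set selection can be sketched in dependency-free Python. The authors used R's t.test; this re-implementation is an illustrative assumption that computes Welch's t statistic, the Welch-Satterthwaite degrees of freedom and a two-sided p-value from Student's t distribution, using the identity P(|T| > |t|) = I_x(df/2, 1/2) with x = df / (df + t²), where I is the regularized incomplete beta function (evaluated by continued fraction). The data layout (a dictionary mapping probe identifiers to per-sample log2 intensities) and all function names are hypothetical.

```python
import math


def _nz(v, tiny=1e-300):
    """Guard against division by zero in the continued fraction."""
    return tiny if abs(v) < tiny else v


def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 / _nz(1.0 - qab * x / qap)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))  # even step
        d = 1.0 / _nz(1.0 + aa * d)
        c = _nz(1.0 + aa / c)
        h *= d * c
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))  # odd step
        d = 1.0 / _nz(1.0 + aa * d)
        c = _nz(1.0 + aa / c)
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def welch_t_test(x, y):
    """Two-sided Welch's t-test; returns (t, df, p).

    Assumes at least two values per group and a non-zero combined standard error.
    """
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    sx = sum((v - mx) ** 2 for v in x) / (nx - 1) / nx  # squared standard error, group x
    sy = sum((v - my) ** 2 for v in y) / (ny - 1) / ny  # squared standard error, group y
    t = (mx - my) / math.sqrt(sx + sy)
    df = (sx + sy) ** 2 / (sx ** 2 / (nx - 1) + sy ** 2 / (ny - 1))
    p = betai(df / 2.0, 0.5, df / (df + t * t))  # P(|T| > |t|), T ~ Student t(df)
    return t, df, p


def select_probes(expr, is_g2i3, alpha=5e-3):
    """Keep probe sets whose Welch p-value (G2I-3 vs G2I-1/2) is below alpha."""
    selected = []
    for probe, values in expr.items():
        g3 = [v for v, flag in zip(values, is_g2i3) if flag]
        rest = [v for v, flag in zip(values, is_g2i3) if not flag]
        if welch_t_test(g3, rest)[2] < alpha:
            selected.append(probe)
    return selected


if __name__ == "__main__":
    flags = [True] * 5 + [False] * 5
    toy = {  # illustrative log2 intensities, not study data
        "up": [5.0, 5.2, 5.1, 5.3, 5.2, 1.0, 1.1, 0.9, 1.2, 1.0],
        "flat": [1.0, 2.0, 3.0, 1.0, 2.0, 2.0, 1.0, 3.0, 2.0, 1.0],
    }
    print(select_probes(toy, flags))
```

In practice a library routine would normally be used instead, for example R's `t.test` (Welch by default) or SciPy's `scipy.stats.ttest_ind` with `equal_var=False`; the hand-written version only makes the computation explicit.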

Comparison of four prognostic molecular signatures in three independent datasets
The molecular signature derived from the genomic instability index (G2I) was compared with three well-known prognostic signatures: the Amsterdam signature [22], the genomic grade index (GGI) of Sotiriou [23], and the intrinsic gene set used by Sorlie et al. to identify their five molecular subtypes [11]. The four signatures were applied to independent datasets using an approach inspired by Fan et al. [24] and described in Additional file 2. The comparison was performed in three independent datasets corresponding to i) this study, ii) the Rotterdam study [25] and iii) the Loi study [26].

Unsupervised clustering of array-CGH data identifies six groups of tumors
To identify broad patterns of large scale genomic rearrangement, we performed unsupervised clustering based on the "gain, normal, loss" (GNL) profile of each tumor (Figure 1). The clustering of the tumors into six main groups was driven mainly by gains or losses of whole chromosomal arms, particularly on chromosomes 1, 7, 8, 11, 16, 17 and 20. The dominant changes in each group are more readily seen in whole genome plots showing the cumulative changes at each locus (Figure 2A). The groups are labeled according to the clusters in Figure 1 and are described below.
"Cluster a" comprises tumors without recurrent changes affecting any particular chromosome. The only copy number change seen in more than 60% of cases was loss of 17p13 (Additional file 1 and Figure 2A-a). Copy number variations involving small genomic regions can be observed, sometimes in large numbers within the same tumor, but no specific change recurs from one tumor to another.
"Cluster b" comprises tumors with gains of the long arm of chromosome 1 and losses of the long arm of chromosome 16, a common rearrangement frequently linked to the well-known unbalanced translocation t(1;16). The only other recurrent rearrangements were losses of 8p and 11q, observed in nearly 50% of the tumors (Additional file 1 and Figure 2A-b).
"Cluster c" comprises tumors with two chromosomal rearrangements, gain of 1q and gain of the entire chromosome 7, which were present in 80% of the tumors in this cluster. To our knowledge, the association of these two rearrangements has not previously been reported in breast cancer. Other common changes were loss of 12p13 and gain of 8q (Additional file 1 and Figure 2A-c).
"Cluster d" comprises tumors characterized by combined rearrangements of chromosomes 8 and 16, with loss of the entire 8p and 16q arms and gain of the 8q and 16p arms in nearly 100% of cases. Other chromosomal rearrangements, such as loss of 6q, 13q and 17p, are less frequently associated. Interestingly, some rearrangements affecting small regions are observed with high frequency in this group of tumors: gains of 5p14, 12q13, 15q22 and 17q11.2, and loss of 12p13, as in cluster c tumors (Additional file 1 and Figure 2A-d). Genes located within these small rearranged genomic regions are listed in Additional file 1.
"Cluster e" comprises tumors with a more complex pattern involving numerous chromosomal arms and regions within arms. The main rearrangements were loss of 1p, with a more frequently deleted region at 1pter, gains of 1q and 8q, and losses of 11q and 16q. Within some regions of gain or loss observed in 50% of the tumors, a narrower segment showed a higher frequency of rearrangement. Specifically, these were: gain of the entire chromosome 5 with a specific gain at 5p14; loss of 6q with a specific loss at 6q16; gain of 12q with a specific gain at 12q21; gain of 16p with a specific gain at 16p13; gain of 17q with a specific gain at 17q11; and gain of 20q with a specific gain at 20q13.2. Moreover, two small regions showed specific rearrangements: gain of 4q35 and loss of 12p13, as in the two previous clusters (Additional file 1 and Figure 2A-e). Genes located at these specific loci are listed in Additional file 1.
"Cluster f" comprises tumors with a highly rearranged pattern. The largest recurrent changes were gains of 16p and 20q, but most changes involved much smaller genomic regions scattered throughout the genome. The large number of rearrangements precluded a detailed description, but similar genomic regions appear to be involved, since at least 74 loci show a genomic loss in more than 80% of the tumors in this cluster (Figure 2A-f). This pattern of extreme rearrangement is suggestive of a specific DNA breakage syndrome or DNA repair defect.
Patient outcome did not differ markedly among the first five clusters, although clusters a, b and d showed a slightly better prognosis than clusters c and e (Figure 2B). Conversely, patients in cluster f had a very poor outcome: ten of the twelve tumors in this group gave rise to metastatic relapse during follow-up (Figure 2B).

Amplicons were most common in cluster f
Defining amplicons as regions with a copy number above three for at least two contiguous clones, we found that 64 tumors contained at least one amplicon. A total of 90 distinct regions were amplified, involving all chromosomes except chromosomes 2, 9 and 13. The number of amplicons per tumor ranged from one (13 tumors) to seventeen (one tumor), with a mean of five amplicons for these 64 tumors. Amplicon size ranged from a few kilobases, containing one or a few genes (for example the 6q25 amplification and ADR1, or the 14q24.3 amplification and FOS; Additional file 1), to tens of megabases. As expected, the classic breast cancer amplicons were the most common, including the CCND1 amplicon at 11q13 in 18 tumors and the ERBB2 amplicon at 17q12 in 17 tumors, followed by the 8p12 and 20q amplicons in 11 tumors (Table 2).
The number of amplicons per tumor differed more markedly between clusters. On average, there were only two amplicons per tumor in clusters a-e, but five in cluster f (Table 2). The increase in low-level copy number changes in cluster f was thus accompanied by a corresponding increase in amplicons.
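The amplicon definition used here (copy number above three for at least two contiguous clones) amounts to a run-length scan along each chromosome. The Python sketch below assumes per-clone copy-number estimates ordered by genomic position within a single chromosome; how absolute copy numbers were derived from the array ratios is not described in this section, and the function name is hypothetical.

```python
def find_amplicons(copy_numbers, threshold=3.0, min_clones=2):
    """Return (start, end) clone-index pairs (end exclusive) for every run of at
    least `min_clones` contiguous clones whose copy number exceeds `threshold`.

    `copy_numbers` must be ordered by position along ONE chromosome, so that
    runs never cross a chromosome boundary.
    """
    runs, start = [], None
    for i, cn in enumerate(copy_numbers):
        if cn > threshold:
            if start is None:
                start = i  # a candidate amplified run begins
        else:
            if start is not None and i - start >= min_clones:
                runs.append((start, i))
            start = None
    # Close a run that extends to the last clone of the chromosome.
    if start is not None and len(copy_numbers) - start >= min_clones:
        runs.append((start, len(copy_numbers)))
    return runs


if __name__ == "__main__":
    # Illustrative copy numbers for eight clones (not study data).
    print(find_amplicons([2, 4, 5, 2, 4, 2, 3.5, 6]))
```

For a whole genome, the function would be called once per chromosome and the resulting runs pooled; a single clone above three copies (index 4 in the toy example) is not counted as an amplicon.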

An array CGH-based index of genomic instability is predictive of clinical outcome
Given the evidence of an association between a highly rearranged genome (at the level of copy number variation, breakpoints and gene amplification) and clinical outcome, we built an index of genetic instability based on two parameters derived from the array CGH GNL status.
The first parameter corresponds to the fraction of the genome altered. It is the mean, over chromosome arms, of the proportion of clones showing a loss or gain on each arm. This parameter ranges from 0.004 to 0.73, with a mean value of 0.28.
The second parameter, the number of altered genomic regions, reflects the number of breakpoints within the genome. It is the total number of genomic regions whose copy number status differs from that of the neighboring regions. To reduce the number of artifacts, we use a "local score" calculation to assign a common status (i.e. gain, normal or loss) to each genomic segment (see Additional file 2). The number of altered regions ranges from 19 to 129 (mean: 64.7). As shown in Figure 3A, the 135 tumors form a cloud of points with a very weak correlation between the two parameters. Tumors with chromosomal aneusomies lie predominantly in the lower right quadrant, while tumors with numerous small rearrangements lie in the upper left quadrant (Figure 3B). When relapse status is mapped onto each tumor (dark points in Figure 3A), the two tumor populations (i.e. with and without relapse) overlap widely in the central region, but tumors in the lower left quadrant have a lower risk of relapse than tumors in the upper right quadrant, thus defining three populations of tumors.
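The two parameters can be sketched in Python from GNL calls encoded as +1 (gain), 0 (normal) and -1 (loss). The first function follows the definition of A given above (mean over arms of the per-arm proportion of gained or lost clones). For N, the sketch assumes that an altered region is a maximal run of consecutive clones sharing the same gain or loss call within a chromosome, and that the calls have already been smoothed by the local score procedure of Additional file 2, which is not reproduced here. Function names and the data layout are hypothetical.

```python
from statistics import mean


def fraction_altered(arm_calls):
    """A: mean over chromosome arms of the proportion of clones called gain or loss.

    arm_calls maps an arm label (e.g. "1q") to its ordered GNL calls (+1, 0, -1).
    Every arm is weighted equally, whatever its number of clones.
    """
    return mean(sum(c != 0 for c in calls) / len(calls) for calls in arm_calls.values())


def altered_regions(chrom_calls):
    """N (assumed definition): number of maximal runs of identical gain or loss
    calls, counted within each chromosome and summed over chromosomes."""
    n = 0
    for calls in chrom_calls.values():
        previous = 0
        for c in calls:
            if c != 0 and c != previous:
                n += 1  # a new gained or lost segment starts at this clone
            previous = c
    return n


if __name__ == "__main__":
    # Toy calls (not study data): 1p partly lost, 1q gained, 16q partly lost.
    arms = {"1p": [0, 0, -1, -1], "1q": [1, 1, 1, 1], "16q": [-1, -1, 0, 0]}
    chroms = {"1": arms["1p"] + arms["1q"], "16": arms["16q"]}
    print(fraction_altered(arms), altered_regions(chroms))
```

Averaging per arm, rather than dividing the total number of altered clones by the number of clones on the array, keeps densely covered arms from dominating A.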
To define three grades of genomic instability, we adjusted the thresholds for the two parameters so as to best discriminate tumors according to outcome (see Additional file 2 for details).
Tumors in the low risk region (G2I-1, for Genomic Instability Index grade 1), with an overall level of genomic alteration below 48% and a number of altered regions ≤ 42, relapsed in one case out of 19. Tumors in the high risk region (G2I-3, for Genomic Instability Index grade 3), with an overall level of genomic alteration above 35% and a number of altered regions ≥ 65, relapsed in 21 out of 28 cases. In univariate analysis, the difference in risk of relapse between G2I-1 and G2I-2 tumors was of borderline significance (odds ratio: 0.16 [0.02-1.2], p = 0.08), whereas that between G2I-2 and G2I-3 tumors was highly significant (odds ratio: 8.5 [3.2-22.6], p < 0.001). Similar results were obtained in multivariate analysis adjusted for the Nottingham Prognostic Index (NPI) (Table 3). The contribution of each CGH cluster to the three G2I groups is shown in Table 4, and examples of array CGH profiles from the four quadrants of the scatter plot are provided in Figure 3B.
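As a worked check of the univariate figures, the odds ratios can be recomputed from the 2 × 2 tables implied by the reported counts: 1 of 19 G2I-1 tumors and 21 of 28 G2I-3 tumors relapsed, so the 88 G2I-2 tumors (Table 2) account for the remaining 45 - 1 - 21 = 23 of the 45 relapses. With a single categorical predictor and G2I-2 as the reference, univariate logistic regression reproduces the sample odds ratio, and its Wald interval equals the log odds ratio (Woolf) interval computed below. The function name is hypothetical.

```python
import math


def odds_ratio_wald(events_a, total_a, events_b, total_b, z=1.959964):
    """Odds ratio of relapse in group a versus reference group b, with a Wald
    (Woolf) 95% confidence interval and a two-sided p-value on the log scale."""
    a, b = events_a, total_a - events_a  # group a: relapsed, not relapsed
    c, d = events_b, total_b - events_b  # reference group: relapsed, not relapsed
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower, upper = math.exp(log_or - z * se), math.exp(log_or + z * se)
    p = math.erfc(abs(log_or) / se / math.sqrt(2))  # = 2 * (1 - Phi(|log_or| / se))
    return math.exp(log_or), lower, upper, p


# Relapse counts: G2I-1 1/19, G2I-2 23/88 (45 - 1 - 21), G2I-3 21/28.
G2I_3_VS_2 = odds_ratio_wald(21, 28, 23, 88)
G2I_1_VS_2 = odds_ratio_wald(1, 19, 23, 88)

if __name__ == "__main__":
    for label, (or_, lo, hi, p) in [("G2I-3 vs G2I-2", G2I_3_VS_2),
                                    ("G2I-1 vs G2I-2", G2I_1_VS_2)]:
        print(f"{label}: OR {or_:.2f} [{lo:.2f}-{hi:.2f}], p = {p:.3g}")
```

Rounded to the precision used in the text, the computation gives 8.5 [3.2-22.6] with p < 0.001 for G2I-3 versus G2I-2, and 0.16 [0.02-1.2] with p = 0.08 for G2I-1 versus G2I-2; the single relapse in the G2I-1 group explains the very wide interval.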

Validation of the G2I on an independent data set
To validate the G2I on independent data, we analyzed 168 breast cancers without axillary lymph node involvement for which BAC array CGH data were available [21]. In this dataset, 57 patients developed metastases, while 111 others showed no metastatic or locoregional recurrence after at least 5 years of follow-up (median follow-up: 10.8 years). Using the previously defined thresholds, the G2I predicted clinical outcome with a p-value of 1.08 × 10^-5 (log-rank test): 74% of the tumors scored as G2I-3 developed metastases, compared with only 16% of those in the G2I-1 group (Figure 4A). The ten-year metastasis-free survival (Figure 4B), analyzed with the log-rank test, showed a highly significant difference between groups.
Table 2 (partial)
Group                           All      a        b       c        d       e        f       G2I-1    G2I-2    G2I-3
Total tumors                    135      34       25      16       13      35       12      19       88       28
Tumors with amplicons, n (%)    64 (47)  14 (41)  8 (32)  10 (63)  7 (54)  17 (49)  8 (67)  3 (16)   41 (47)  20 (71)

Comparison of the G2I and the array CGH clusters with classical prognostic parameters
The tumors used in this study were matched for age and axillary lymph node involvement, but not for other factors such as tumor size, histological Scarff-Bloom-Richardson (SBR) grade or hormone receptor (HR) status. A search for correlations between the G2I and classical prognostic factors showed no correlation with histological tumor size, steroid hormone receptor status or Nottingham Prognostic Index (NPI) (Table 5). There was a correlation with the intrinsic classification, although basal tumors belong mainly to the G2I-2 group (Table 6), and with Mib1 status, suggesting a link with proliferation (Table 5).
Interestingly, the G2I-3 remains associated with relapse with respect to the G2I-2 group in the following subgroups: tumors smaller than 20 mm (OR: 5. This last result reflects the higher proportion of node negative tumors in the G2I-3 group than in the G2I-2 group (71% and 49% respectively; Additional file 4). Overall, tumors with grade 3 genetic instability were mainly luminal and lacked axillary lymph node involvement, but had a very high risk of metastatic relapse.

Correlation with TP53 mutations
Because of the link between p53 and genome stability, we searched for alterations of the TP53 gene directly by DNA sequencing and indirectly by immunohistochemistry (IHC) for increased p53 protein expression. Point mutations were detected in 31 tumors (20 missense and 11 truncating mutations), and increased p53 protein expression was detected by IHC in 28 tumors, with a good correlation between missense mutation and protein expression (Table 7).
The presence of a TP53 alteration (either a mutation or increased protein detection) was correlated with the G2I (p = 0.0003; Table 5). No TP53 alteration was detected in G2I-1 tumors, whereas TP53 alterations were found in 54% of G2I-3 tumors (Table 2). TP53 alterations were observed in all CGH clusters, with a frequency of 20-30% of tumors in clusters a to d compared with 58% in the highly unstable cluster f (Table 2). The expected pattern of TP53 alterations was seen with respect to the intrinsic classification: mutations or increased protein expression were found in 70%, 57%, 42% and 15% of HER2-enriched, basal-like, luminal B and luminal A tumors, respectively.

Genomic rearrangements specific to G2I-3 tumors
Tumors in the G2I-3 group showed a high level of genetic alteration, with a large number of small regions showing copy number variation, mainly losses. Some alterations were recurrent and specific to this group, suggesting possible selection for these rearrangements. Additional file 4 shows the frequencies of gain and loss for each clone in the three G2I groups. Genomic regions showing significantly more gains or losses in G2I-3 tumors than in the two other groups (p ≤ 10^-4), with a frequency of ≥ 50%, are listed in Additional file 1. Six regions on chromosomes 12, 16, 17 and 20 show specific gains in G2I-3 tumors, and a further 49 regions show specific genomic losses. Most regions contained multiple genes, but a few were small enough to allow the identification of potential driver genes, which are listed in Additional file 1.
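This section does not state which test produced the p ≤ 10^-4 criterion. A one-sided Fisher's exact test on each clone's 2 × 2 table (altered or not, G2I-3 or other) is one plausible choice and is assumed in the Python sketch below, together with the ≥ 50% frequency filter. The data layout and function names are hypothetical; the group sizes in the example (28 G2I-3 tumors versus 19 + 88 = 107 others) come from the study, but the clone counts are illustrative.

```python
from math import comb


def fisher_greater(k_in, n_in, k_out, n_out):
    """One-sided Fisher's exact p-value for an alteration being more frequent in
    the first group (k_in altered out of n_in) than in the second (k_out of n_out).

    Sums the hypergeometric probabilities of tables at least as extreme as observed.
    """
    k_total, n_total = k_in + k_out, n_in + n_out
    tail = sum(comb(n_in, x) * comb(n_out, k_total - x)  # comb returns 0 if k > n
               for x in range(k_in, min(n_in, k_total) + 1))
    return tail / comb(n_total, k_total)


def g2i3_specific(clones, n_g3, n_other, p_max=1e-4, min_freq=0.5):
    """Return clone ids altered in at least `min_freq` of G2I-3 tumors and
    significantly more often than in the other tumors (p <= p_max).

    clones maps a clone id to (altered count in G2I-3, altered count in others).
    """
    return [cid for cid, (k3, ko) in clones.items()
            if k3 / n_g3 >= min_freq and fisher_greater(k3, n_g3, ko, n_other) <= p_max]


if __name__ == "__main__":
    toy = {"cloneA": (20, 10), "cloneB": (10, 30)}  # illustrative counts
    print(g2i3_specific(toy, n_g3=28, n_other=107))
```

With thousands of clones tested, a stringent threshold such as 10^-4 acts as a rough guard against multiple testing; the same scan would be run separately for gains and for losses.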

A gene expression signature specific to G2I-3 tumors
High quality RNA was available for 46 of the 135 tumors: fifteen belonged to the G2I-3 group, 29 to the G2I-2 group and two to the G2I-1 group. We hybridized cDNA from these tumors to Affymetrix U133 Plus 2.0 GeneChips. Supervised analysis defined a signature of 300 probe sets showing differential expression between 14 of the 15 G2I-3 tumors and the rest (Additional file 4). The list of genes whose over- or under-expression is specific to G2I-3 tumors is provided in Additional file 1. The genes in this signature are not specifically linked to cell proliferation or to any DNA repair system. Several of the genes over-expressed in G2I-1 + 2 tumors are involved in signal transduction, in particular the hedgehog, VEGF and MAPK pathways (Additional file 1). The genes best distinguishing G2I-2 from G2I-3 tumors are not specifically located in rearranged genomic regions (Additional file 1). However, several over-expressed genes belonging to this signature are located in genomic regions specifically gained in G2I-3 tumors, such as JAG1 at 20p12.2 or RPN2, C20orf117 and DHX35 at 20q11.23. Conversely, under-expressed genes are located in specifically lost regions, such as CD109 at 6q13, ELOVL4 at 6q14.1, C9orf46, KIAA1432 and CDC37L1 at 9p24.1, and LAMA1 and DLGAP1 at 18p11.31.
To test whether this signature has independent prognostic value, we compared it with three previously published prognostic signatures in three independent data sets, including ours. The results are summarized in Figure 5. The Amsterdam signature [22], the genomic grade index (GGI) [23] and the intrinsic gene set [11] were all able to split the tumors into two groups according to outcome in the three sets of tumors. As expected, the G2I signature gave the best results for our own data set (p = 5.4 × 10^-6), compared with the Amsterdam, GGI and intrinsic signatures (p = 0.8, p = 0.32 and p = 0.002, respectively).
The G2I transcriptomic signature also showed higher prognostic value than the three other signatures in the Rotterdam study [25]. In the Loi study [26], it showed higher prognostic value (p = 6.4 × 10^-4) than the Amsterdam and intrinsic signatures (p = 0.015 and p = 0.004, respectively), but the GGI signature, which had been trained on these tumors, gave the best results (Figure 5). We conclude that genomic instability is an important marker of poor prognosis, whether it is assessed directly with CGH data or indirectly with gene expression data.

Discussion
Array CGH analysis of breast carcinomas, on both BAC and oligonucleotide arrays, had previously highlighted the genomic heterogeneity of these tumors. The most widely used classification distinguishes three classes of tumors. The first, characterized by only a few rearrangements, is called "simplex" [1,27] or "1q/16q" [28,29]; the second, called "complex sawtooth" [27,30] or "complex" [28,29], is characterized by a large number of rearrangements, including breakpoints and copy number variations affecting very small genomic segments. The third, called "complex firestorm" [27,30] or "mixed amplifier" [28,29], is characterized by gene amplification, with high-level copy number changes restricted to small genomic regions. Some tumors can indeed be assigned to one of these classes of genomic profile: for example, tumor number 83 in our series showed a simplex profile with a 1q gain and 11q and 16q losses as its sole rearrangements, tumor number 43 showed a typical complex sawtooth profile, and tumor number 100 a mixed amplifier profile. However, a large number of tumors in our series showed intermediate patterns and could not be assigned to a specific class. For example, tumor number 7 showed a relatively flat profile with several amplicons on chromosomes 6 and 17, and tumor number 47 showed a profile intermediate between simplex and complex sawtooth. In fact, three kinds of genomic rearrangement, related to different kinds of genetic instability, are detectable by array CGH methodologies: i) whole chromosome or whole chromosome arm aneusomies, related respectively to mitotic missegregation or centromeric rearrangement; ii) DNA breakpoints with repair defects, resulting in copy number variation for short genomic segments; and iii) gene amplification. These three kinds of genomic rearrangement co-occur to varying degrees within a single tumor and vary continuously in intensity from one tumor to another.
Thus, a true classification based on genomic alteration criteria remains difficult to implement. The results obtained here suggest that it is possible to distinguish between two groups of tumors. One group shows gains or losses of entire chromosomes or entire chromosomal arms but lacks breakpoints within the affected regions. This group corresponds to tumors from clusters b to e, which are characterized by combinations of specific rearranged chromosomal arms. The second group corresponds to tumors from clusters a and f, for which it is not possible to identify a copy number variation affecting an entire chromosome arm, either because of a flat profile (cluster a) or because of a huge number of DNA breakpoints (cluster f). To take this distinction into account, we constructed a genomic index based on two parameters representing these two kinds of alteration, which places the tumors along a continuous distribution of increasing alteration (Figure 3). Adverse outcome was observed for the most highly rearranged genotypes, corresponding mainly to tumors from clusters e and f.
The transcriptomic intrinsic classification of breast cancer [10] has prompted searches for correlations between the Sorlie classes and specific genomic profiles. It was indeed possible to correlate the luminal A class with the simplex profile, the luminal B and Her2-enriched classes with the amplifier profile, and the basal-like class with the complex sawtooth profile [1,28,31]. Moreover, a new classification into six classes taking these correlations into account was recently proposed [32]. A similar correlation was found here between the immunohistochemical intrinsic classes and genomic profile. Both the G2I-1 group and cluster b (1q gain, 16q loss) are mainly composed of luminal A tumors (84% and 80%, respectively). Most tumors of the luminal B and Her2-enriched classes show gene amplification (79% and 90%, respectively). Other results, conversely, are more surprising. Although the percentage of luminal A tumors decreases progressively from the G2I-1 to the G2I-2 and G2I-3 groups (84%, 62.5% and 43%, respectively), the finding that 12 luminal A tumors belong to the G2I-3 group was unexpected. Similarly, it is surprising that seven of the twelve cluster f tumors belong to the luminal A class. Seven tumors belong to the basal-like class. Only one of them appears in cluster f and in the G2I-3 group. The six other basal-like tumors all belong to the G2I-2 group and to array CGH cluster a. This cluster, without any specific chromosomal aneusomy, in fact contains two subgroups (Figure 1). The first (right branch) contains tumors with a flat profile, mainly of the luminal A class. The second (left branch), which includes the six basal-like tumors, contains tumors without whole chromosome or chromosome arm aberrations but with copy number changes affecting small genomic regions that differ from one tumor to another.
This profile corresponds to the previously described subtype of high grade ER-negative tumors with a low genomic instability index [33]. The fact that six of the seven basal-like tumors did not show metastatic relapse probably reflects a sampling effect due to the small number of cases. It therefore appears that some breast carcinomas of luminal A and luminal B phenotype, showing marked genetic instability with a large number of DNA breakpoints, frequent TP53 mutations and frequent gene amplification, are characterized by very poor outcome.
The prognostic value of genomic alterations in breast cancer has often been reported. Cytogenetic analysis had previously shown a correlation between the unbalanced der(1;16) and good prognosis [34], whereas homogeneously staining regions or gene amplifications were correlated with poor outcome [35,36]. These results were confirmed by array CGH approaches showing associations between gene amplification in Her2-enriched and luminal B tumors and poor prognosis [28,31,37,38], or between 16q loss in luminal A tumors and good prognosis [39]. Subsequently, copy number variation in various genomic regions was shown to be related to outcome, such as loss on chromosome arms 19 and 18q [40], or more complex signatures including several regions, either distinct for ER positive and ER negative tumors [41] or common to these two kinds of tumors [21]. The measurement of genetic instability has been less well documented. A signature of chromosomal instability was inferred from transcriptomic data as functional aneuploidy, related to a clear deviation in the expression of contiguous genes from the same loci [17]. Applied to four different published sets of breast cancers, this signature was highly predictive of outcome [17]. The fraction of the genome altered (FGA), calculated as the number of probes affected by gain or loss relative to the total number of probes on the array [42], was shown to correlate with the classification proposed by Jonsson et al., in which a higher FGA was observed for the "basal complex" and "luminal complex" tumor types than for the others [32]. The FGA corrected for tumor cellularity, named "genome instability index" (GII), failed to show such a correlation but identified a subtype of basal-like tumors with low instability [33]. In association with a three chromosomal region predictor, the CGH classifier proposed by Gravier et al. in node negative breast cancer used a measure of genomic complexity corresponding, after segmentation, to the total number of segmental alterations along the genome, with a threshold of 11. Using this single parameter, the prediction of metastatic relapse was highly significant (p = 0.00056) [21]. Recently, an array CGH-based score of genomic complexity called CAAI (Complex Arm Aberration Index) was shown to have overall independent prognostic power [43]. All these data indicate that the type and the level of genetic instability are major determinants of outcome in breast cancer. These characteristics are probably established very early during tumor development, conserved at late stages and common to all tumor cells. They can therefore be detected in a primary tumor, even if only some cell clones acquire metastatic potential. The same explanation could be offered for the prognostic significance of transcriptomic signatures obtained from primary tumors, which have been shown to be mainly related to the proliferative activity of the tumors [44].
From a clinical point of view, it is interesting to note that the prognostic value of the G2I is independent of the other major prognostic factors, except TP53 mutation (Table 5). A weak correlation is also found with other genomic alterations (in particular, the presence of amplicons), with the intrinsic classification and with the Mib1 index, but not with classical clinicopathological parameters (Table 5). Moreover, the G2I retains strong predictive value in subclasses of tumors with variable outcomes, such as small tumors, SBR grade 2 and 3 tumors, hormone receptor positive tumors and tumors in the moderate NPI class. These data support an independent prognostic value for the G2I, but evaluating its benefit in clinical practice will require a better definition of the thresholds used to define the groups and validation on an unselected, population-based set of tumors. These investigations are currently in progress. The main result concerns the strong predictive value of the G2I in tumors without axillary lymph node involvement: 80% of G2I-3 node negative tumors (16 out of 20) relapsed, whereas only 16% of G2I-1 and G2I-2 node negative tumors (9 out of 55) did so (OR: 17.5 [4.6-66.7], p < 0.001). This information could have major implications for the indication of adjuvant therapies. The paradox of a poor outcome for tumors that show no evidence of lymphatic dissemination at the time of local treatment may suggest that these highly unstable tumors are not lymphophilic but instead spread hematogenously.