CDCOCA: A statistical method to define complexity dependence of co-occurring chromosomal aberrations

Kumar, Nitin; Rehrauer, Hubert; Cai, Haoyang; Baudis, Michael

doi:10.1186/1755-8794-4-21

Research article
Open access
Published: 03 March 2011

BMC Medical Genomics volume 4, Article number: 21 (2011)

Abstract

Background

Copy number alterations (CNA) play a key role in cancer development and progression. Since more than one CNA can be detected in most tumors, frequently co-occurring CNA may point to cooperating cancer-related genes. Existing methods for evaluating co-occurrence have not considered the overall heterogeneity of CNA per tumor. Because many samples are genetically highly unstable, this results in a preferential detection of frequent changes, with limited specificity for each association.

Method

We hypothesize that in cancer some linkage-independent CNA may display a non-random co-occurrence, and that these CNA could be of pathogenetic relevance for the respective cancer. We also hypothesize that the statistical relevance of co-occurring CNA may depend on the sample-specific CNA complexity. We verify our hypotheses with a simulation-based algorithm, CDCOCA (complexity dependence of co-occurring chromosomal aberrations).

Results

Application of CDCOCA to example data sets identified co-occurring CNA arising on a low-complexity background, which otherwise went unnoticed. Identification of cancer-associated genes in these co-occurring changes can provide insights into cooperative genes involved in oncogenesis.

Conclusions

We have developed a method to detect associations of regional copy number abnormalities in cancer data. Along with finding statistically relevant CNA co-occurrences, our algorithm points towards a generally low specificity for the co-occurrence of regional imbalances in CNA-rich samples, which may have a negative impact on pathway modeling approaches that rely on frequent CNA events.

Background

Genetic alterations are an absolute requirement for malignant neoplasias in humans [1, 2]. Both the kind of genetic alterations and their order of occurrence are important for cancer development and progression [3]. In addition to sequential event models, large-scale analyses of genomes from patients' tumors have shown that multiple genetic abnormalities can promote the development of one cancer entity [4]. Alterations in cancer genomes range from subtle sequence changes (e.g. point mutations), through structural alterations with functional impact on the coding sequence (e.g. generation of fusion genes by chromosomal translocations), to regional or whole-chromosome copy number abnormalities (see e.g. [5–7]).

Through a gene dosage effect, genomic copy number alterations (CNA) may lead to insufficient expression of tumor suppressors or to overexpression of proto-oncogenes. Recurrent CNA have been identified in nearly all cancer entities [8–10]. Comparative Genomic Hybridization (CGH) [11, 12] is a genome-wide CNA screening technology that has been widely applied over the last two decades. Building on the reverse in situ hybridization principle developed for chromosomal CGH [13], genomic microarray technology (aCGH; [14, 15]) now uses intensity values from up to millions of short DNA sequences to derive regional copy number estimates.

Large data sets from copy number screening experiments should provide a powerful resource for oncogenomic data mining studies. In contrast to expression data, copy number data arise from the projection of discrete values into the experimental space. As such, (a)CGH data can be reduced to the minimal information of segmental status (gain/loss/normal) and genomic position. This facilitates efforts to integrate data across large numbers of experimental series derived from diverse tumor entities. So far, most of these efforts have been descriptive [10, 16] or have aimed at defining disease-specific genomic patterns and useful pattern descriptors ("markers", e.g. [17]). Other publications have attempted to reconstruct the relation and temporal order of oncogenetic events [18–20].

For some cancer types, such as subsets of colorectal adenocarcinoma, the presence of a limited number of genetic events, including several CNA, is critical for cancer development [21]. Other neoplasias such as chronic lymphocytic leukemia (CLL) display a paucity of CNA, which however may be correlated with patient survival [22]. These examples illustrate that the presence of certain CNA is not a chance phenomenon, but may either be necessary for cancer development or give a selective edge to affected clones. Previous publications have tried to address the cooperative nature of co-occurring CNA [23, 24]. So far, these approaches have not considered the high variability in CNA complexity among individual malignant tumors. Here, we develop an algorithm, CDCOCA, for the analysis of co-occurring oncogenomic CNA events that considers the genomic complexity of the individual samples. We apply our approach to detect co-occurring CNA events in real-world example data sets. Furthermore, we compare the results from CDCOCA to a previously published method [23] (which we call "analysis 3" in this paper) and to a modified version of CDCOCA that does not include the adjustment for genomic complexity ("CICOCA").

Methods

Data

Annotated copy number and associated data were selected from our Progenetix (a)CGH database ([25]: http://www.progenetix.net; status as of 2010-03-01). For model development and testing, we chose one hematopoietic (mantle cell lymphoma, MCL) and one solid tumor entity (bladder carcinoma, BLCA) due to their overall intermediate genomic complexity, without consideration of their previously established genomic imbalance profiles or CNA subset analyses.

For analysis, copy number status data were determined for 320 genomic intervals based on the corresponding cytogenetic bands. Sex chromosomes were removed due to possible bias in some of the published series (e.g. their use as normalization control in (a)CGH experiments), resulting in 303 genomic intervals. For analysis by CDCOCA/CICOCA, the gain and loss status of each genomic interval was considered separately, leading to a data matrix with 606 categories. Only genomic intervals showing a change in at least one sample were considered for analysis, resulting in a CDCOCA/CICOCA input matrix with 593 categories for BLCA and 571 for MCL. For analysis 3, the original data matrix containing 303 genomic intervals was used. As a surrogate score for genomic complexity, a case-specific score was calculated by adding up each type of genomic imbalance (gain and/or loss) occurring on a chromosomal arm [26].
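A minimal Python sketch of this arm-level complexity score follows. We read "adding each type of genomic imbalance occurring on a chromosomal arm" as counting a gain and a loss at most once per arm; that reading, the `(arm, status)` input format and the function name are our assumptions, not the R code of the CDCOCA package.

```python
from collections import defaultdict


def complexity_score(calls):
    """Arm-level CNA complexity score for one sample (illustrative sketch).

    `calls` is a hypothetical input format: an iterable of (arm, status)
    pairs such as ("8q", 1), with status +1 = gain, -1 = loss, 0 = normal.
    Each imbalance type (gain or loss) counts at most once per arm, so an
    arm carrying both a gain and a loss contributes 2.
    """
    types_per_arm = defaultdict(set)
    for arm, status in calls:
        if status != 0:
            types_per_arm[arm].add(status)
    return sum(len(types) for types in types_per_arm.values())


# Example: 1p loss (two bands), 1q gain, 8p loss, 8q with gain and loss.
calls = [("1p", -1), ("1p", -1), ("1q", 1), ("8p", -1),
         ("8q", 1), ("8q", -1), ("9p", 0)]
print(complexity_score(calls))  # 5
```

Counting imbalance types per arm, rather than per band, keeps a single large arm-level change from inflating the score with every band it covers.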

From here on, we use the term "genomic interval" to refer to genomic interval status. A gain/loss association on the same chromosome (e.g. -1p and +1q) is referred to as a "bidirectional" change. The modified structure of the data matrices is exemplified in Table 1. Any gain or loss status of a genomic interval is represented by the value 1.
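The construction of the binary gain/loss matrix can be sketched as below, assuming the interval calls are available as an array with +1 for gain, -1 for loss and 0 for no change. The input encoding, the `+`/`-` category names and `to_binary_matrix` are illustrative choices, not the Progenetix data format.

```python
import numpy as np


def to_binary_matrix(status, labels):
    """Split ternary interval calls into separate gain and loss categories.

    `status` is an (n_samples, m_intervals) array with +1 = gain,
    -1 = loss and 0 = no change; `labels` names the m intervals.
    Returns the binary matrix and its column names, keeping only
    categories altered in at least one sample.
    """
    status = np.asarray(status)
    gains = (status == 1).astype(int)
    losses = (status == -1).astype(int)
    binary = np.hstack([gains, losses])            # n x 2m categories
    names = ["+" + lab for lab in labels] + ["-" + lab for lab in labels]
    keep = binary.sum(axis=0) > 0                  # drop never-altered categories
    kept_names = [name for name, k in zip(names, keep) if k]
    return binary[:, keep], kept_names


status = [[1, -1, 0],
          [0, -1, 0]]
D, names = to_binary_matrix(status, ["1p", "1q", "2p"])
print(names)       # ['+1p', '-1q']
print(D.tolist())  # [[1, 1], [0, 1]]
```

With 303 intervals this yields 606 candidate categories before the never-altered ones are dropped, matching the dimensions described above.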

Table 1 Binary matrix derived from CGH data.

Model

Let D be the data matrix of dimension n × m, where n is the number of samples and m is the number of genomic intervals. $D_{ij} = 1$ if a CNA is present in genomic interval j in sample i, else $D_{ij} = 0$. $F_{j}$ denotes the number of samples having a CNA at genomic interval j and is given by $F_{j} = \sum_{i = 1}^{n} D_{i j}$. $P_{w} = (P_{w}^{1}, \ldots, P_{w}^{n})$ is the vector of probability weights given to the samples. The prior probability weight of any sample r is defined as the number of CNAs in sample r divided by the total number of CNAs across all samples:

P_{w}^{r} = \frac{\sum_{j = 1}^{m} D_{r j}}{\sum_{i = 1}^{n} \sum_{j = 1}^{m} D_{i j}}

Simulation of any genomic interval j is achieved by redistributing its CNA status over all samples. For genomic interval j, we define $D_{j}^{*} = (D_{j}^{*1}, \ldots, D_{j}^{*n})$ as the corresponding vector representing the CNA status of the simulated data. $D_{j}^{*}$ is obtained such that $F_{j}^{*} \approx F_{j}$.
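The sample weights and the redistribution of a single genomic interval can be sketched in Python as follows, assuming D is a 0/1 NumPy array with samples in rows. The helper names are hypothetical; the sequential draw-and-renormalize loop mirrors steps 3a to 3d of the CDCOCA algorithm, and with it $F_{j}^{*} = F_{j}$ holds exactly.

```python
import numpy as np


def sample_weights(D):
    """Prior weights P_w^r: CNAs in sample r divided by all CNAs in D."""
    row_totals = np.asarray(D).sum(axis=1)
    return row_totals / row_totals.sum()


def simulate_interval(F_j, weights, rng):
    """Redistribute F_j CNA calls over the samples (illustrative sketch).

    Samples are drawn one at a time without replacement, each draw with
    probability proportional to the remaining (renormalized) weights, so
    CNA-rich samples are more likely to receive a simulated change.
    Assumes F_j does not exceed the number of samples with non-zero
    weight, which holds for any column of D.
    """
    w = np.asarray(weights, dtype=float)
    remaining = list(range(len(w)))          # N = (1, ..., n)
    sim = np.zeros(len(w), dtype=int)        # D_j^*
    for _ in range(F_j):
        w = w / w.sum()                      # renormalize remaining weights
        idx = rng.choice(len(remaining), p=w)
        r = remaining.pop(idx)               # N = N[-r]
        sim[r] = 1                           # D_j^{*r} = 1
        w = np.delete(w, idx)                # drop the weight of sample r
    return sim


D = np.array([[1, 1, 0],
              [1, 0, 0],
              [0, 0, 0]])
weights = sample_weights(D)                  # [2/3, 1/3, 0]
F_0 = int(D[:, 0].sum())                     # 2
rng = np.random.default_rng(1)
print(simulate_interval(F_0, weights, rng))  # [1 1 0]
```

In the toy example the third sample carries no CNA, so its weight is zero and it can never receive a simulated change; the outcome is therefore fixed regardless of the random seed.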

Overlap between two genomic intervals is computed using Jaccard's index [27]. Jaccard's index takes a value between 0 and 1, where 1 represents a perfect overlap and 0 no overlap. The Jaccard's index between any two genomic intervals j and k is computed as

J_{j k} = \frac{N_{j k}^{11}}{N_{j k}^{10} + N_{j k}^{01} + N_{j k}^{11}}

where $N_{j k}^{11}$ is the number of samples with a CNA in both genomic intervals j and k,

$N_{j k}^{10}$ is the number of samples with a CNA in genomic interval j but not in k, and

$N_{j k}^{01}$ is the number of samples with a CNA in genomic interval k but not in j.

The overlap obtained after permutation is denoted $J_{j k}^{*}$. The frequency of a co-occurrence is computed as

F_{j k} = \frac{N_{j k}^{11}}{n}

where $F_{j k}$ is the frequency of co-occurrence of genomic intervals j and k, $N_{j k}^{11}$ is the number of samples having a change in both genomic intervals j and k, and n is the total number of samples in the data.
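Both overlap measures reduce to counting samples in a pair of binary columns. A short sketch, assuming 0/1 vectors over the same samples; returning 0 for two never-altered intervals is our own convention, since the definition above leaves that case open.

```python
import numpy as np


def jaccard(x, y):
    """Jaccard index J_jk = N11 / (N10 + N01 + N11) of two 0/1 vectors."""
    x = np.asarray(x, dtype=bool)
    y = np.asarray(y, dtype=bool)
    n11 = np.sum(x & y)
    n10 = np.sum(x & ~y)
    n01 = np.sum(~x & y)
    denom = n11 + n10 + n01
    # Convention (our assumption): two never-altered intervals score 0.
    return float(n11 / denom) if denom > 0 else 0.0


def cooccurrence_frequency(x, y):
    """F_jk = N11 / n, the fraction of samples altered in both intervals."""
    x = np.asarray(x, dtype=bool)
    y = np.asarray(y, dtype=bool)
    return float(np.sum(x & y) / x.size)


gain_8q = [1, 1, 0, 1, 0]
loss_8p = [1, 0, 0, 1, 1]
print(jaccard(gain_8q, loss_8p))                 # 0.5
print(cooccurrence_frequency(gain_8q, loss_8p))  # 0.4
```

Unlike $F_{jk}$, the Jaccard index ignores samples altered in neither interval, so it is not inflated by a large number of unaffected samples.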

CDCOCA Algorithm

Let S be the number of simulations and C the counter measuring the number of times the expected (i.e. permuted) overlap is greater than or equal to the observed overlap.

1. Initialize C = 0.
2. Calculate Jaccard's overlap $J_{j k}$ between genomic intervals j and k.
3. For genomic interval j, simulate the data to obtain $D_{j}^{*}$ as follows, starting from N = (1, ..., n) and the prior weights $P_{w}$:
   a. Draw one sample index r from N using the weights $P_{w}^{i}$, so that samples with higher weights have a higher probability of receiving a change in the simulation, and set $D_{j}^{* r} = 1$.
   b. Update N = N[-r], i.e. remove r from N.
   c. Remove the weight of sample r, $P_{w} = P_{w}[-r]$, and renormalize the remaining weights, $P_{w}^{i} = P_{w}^{i} / \sum_{i} P_{w}^{i}$.
   d. Repeat steps 3a to 3c $F_{j}$ times to obtain the simulated vector $D_{j}^{*}$.
4. For genomic interval k, simulate the data as in step 3 to obtain $D_{k}^{*}$.
5. Compute Jaccard's overlap $J_{j k}^{*}$ between $D_{j}^{*}$ and $D_{k}^{*}$; if $J_{j k}^{*} \geq J_{j k}$, increase the counter, C = C + 1.
6. Repeat steps 3, 4 and 5 S times.
7. After S permutations (S = 5000 in our case), calculate the p value as $p = \frac{C}{S}$.

The p value obtained in step 7 represents the probability that two CNAs co-occur at least as strongly as observed purely by chance, given the overall CNA load of each sample. A low p value cut-off therefore helps to enrich for CNAs that occur together even in less heterogeneous samples.
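Combining the pieces above, the seven steps for one pair of intervals can be sketched as a self-contained Python function. This is an illustrative reimplementation under stated assumptions (0/1 NumPy matrix, samples in rows, hypothetical function names), not the published R package; it makes no attempt at efficiency, and a full analysis would call it for every pair of categories.

```python
import numpy as np


def _jaccard(x, y):
    n11 = np.sum(x & y)
    union = np.sum(x | y)                 # N10 + N01 + N11
    return n11 / union if union > 0 else 0.0


def _simulate(F, weights, rng):
    # Weighted draws without replacement, renormalizing after each draw
    # (steps 3a to 3d).
    w = weights.copy()
    remaining = list(range(len(w)))
    sim = np.zeros(len(w), dtype=bool)
    for _ in range(F):
        w = w / w.sum()
        idx = rng.choice(len(remaining), p=w)
        sim[remaining.pop(idx)] = True
        w = np.delete(w, idx)
    return sim


def cdcoca_pvalue(D, j, k, S=5000, seed=None):
    """Permutation p value p = C / S for intervals j and k of a 0/1 matrix D."""
    D = np.asarray(D).astype(bool)
    rng = np.random.default_rng(seed)
    row_totals = D.sum(axis=1)
    weights = row_totals / row_totals.sum()     # P_w, complexity weights
    observed = _jaccard(D[:, j], D[:, k])       # step 2
    F_j, F_k = int(D[:, j].sum()), int(D[:, k].sum())
    C = 0                                       # step 1
    for _ in range(S):                          # step 6
        sim_j = _simulate(F_j, weights, rng)    # step 3
        sim_k = _simulate(F_k, weights, rng)    # step 4
        if _jaccard(sim_j, sim_k) >= observed:  # step 5
            C += 1
    return C / S                                # step 7


# Deterministic toy case: both intervals are altered in the only two
# CNA-carrying samples, so every simulation reproduces the observed
# overlap and p = 1.
D = [[1, 1], [1, 1], [0, 0], [0, 0]]
print(cdcoca_pvalue(D, 0, 1, S=100, seed=0))  # 1.0
```

Because p = C/S, the p values are resolved in steps of 1/S; with S = 5000 that step is 0.0002.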

Results and Discussion

We here propose a methodology named CDCOCA (Complexity dependence of co-occurring chromosomal aberrations) that defines highly correlated pairs of CNA in cancer samples while correcting for the overall degree of genomic instability.

We determine CNA complexity based on the number of segmental CNA in a sample while accounting for variations introduced through different resolutions and/or segmentation algorithms [10]. A sample is called "CNA complex" if it has acquired a high number of CNA, and conversely "CNA simple" if a low number of segmental imbalances have been detected. In Figure 1 the distribution of copy number complexities is presented for data from selected tumor entities, extracted from the Progenetix database.

The performance of CDCOCA depends on the number of tumor samples, the number of genomic intervals and the number of iterations. CDCOCA produces a matrix of p values for all possible associations in the data matrix, which are then used to enrich for associations dependent on sample complexity. The algorithm is implemented in the R statistical framework and is available as the R package "CDCOCA" from the Progenetix website [25].

We applied the CDCOCA algorithm to bladder carcinoma (BLCA) and mantle cell lymphoma (MCL) copy number data, considering gains and losses for each interval as separate events. The readout of the analyses consisted of the p values obtained after randomization for all observed associations in both cancers, after 5000 permutations each. We used Jaccard's index to calculate the overlap between genomic intervals [27]. Figures 2 and 3 show the log of the p values plotted against the log of Jaccard's index; for simplicity, p values are plotted for only 4 chromosomal changes. Using CDCOCA, we observed that most of the genetic associations have a very low Jaccard's overlap and arise from genetic changes occurring in CNA complex samples (hence high p values). Associations with high Jaccard's indices and low p values represent CNA with a high probability of specific co-occurrence (i.e. frequent co-occurrence independent of high sample CNA complexity).

Our results show that most of the CNA in both cancers arise on a background of multiple and extended CNA. A description of the total number of genetic associations in both cancer types is beyond the scope of the current analysis. However, with CDCOCA we are able to focus on a defined set of statistically relevant, specific changes.

To assess the performance of our method relative to other models, we compared CDCOCA to a modified version, "CICOCA" (see supplement), and to a previously published method [23]. Neither of the latter algorithms includes an estimate of sample complexity, and both primarily identify associations with a high frequency. CICOCA and analysis 3 use different methods to compute overlap, resulting in slightly different but overall concordant results.

With CICOCA, a high number of co-occurring changes remained after applying the p value cut-off (Figures 1 and 2 in Additional file 1). In contrast, introducing the complexity estimation leads to a focus on changes arising on a low-complexity background (Figures 4 and 5). With analysis 3 (Figures 3 and 4 in Additional file 1), a very low number of associations was obtained in our sample data set; as expected, these involved only highly frequent changes. Most of the CNA obtained by analysis 3 (Figures 3 and 4 in Additional file 1) were also detected using CDCOCA (Figures 4 and 5) and CICOCA (Figures 1 and 2 in Additional file 1). CICOCA and analysis 3 can be used to describe frequent associations, while CDCOCA additionally allows testing the specificity of associations and applying thresholds accordingly. Compared to frequency-based thresholding, one advantage of CDCOCA is its independence from arbitrary frequency cut-off values: the algorithm scores every association, and the resulting p value assigns a statistical significance to each association that is independent of its frequency in the data but takes the complexity of the samples into account.

Bladder carcinoma

An overview of the most frequent genomic imbalances in urinary BLCA can be found in e.g. [10]. The most frequent gains in BLCA include regions on 1q, 5p, 8q, 17, 19 and 21q, while the most frequent losses occur on 2q, 4, 5q, 6q, 8p, 9 and 13q (Figures 1 and 3 in Additional file 1 and Figure 4, barplot). Due to the high degree of aneuploidy in BLCA, the CNA data are highly complex (Figure 4, matrix plot), resulting in a very high number of total associations (Table 2).

Table 2 Statistics of associations in BLCA

A large proportion of associations combine a low frequency with a high Jaccard's index (Figures 4 and 6, matrix plots). We applied a p value cut-off of 0.02, resulting in a false discovery rate (FDR) of 27.5%. At this cut-off, 75% of intra-chromosomal associations passed the threshold, confirming the correlation between genetic linkage and involvement in CNA events. Table 2 compares the results of all three analyses. For simplicity, we limit the display to the 100 most frequent inter-chromosomal changes obtained after the p value cut-off.
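The text does not state how the FDR was estimated for a given p value cut-off. One common estimator, used here purely as an assumption, divides the number of associations expected to pass under a uniform null distribution (number of tests × cut-off) by the number that actually pass. The sketch also computes the fraction of intra-chromosomal associations passing the cut-off; the `+8q23`-style labels and all helper names are hypothetical.

```python
import re

import numpy as np


def estimated_fdr(pvalues, cutoff):
    """FDR estimate m * t / R at p value cut-off t (assumed estimator)."""
    p = np.asarray(pvalues, dtype=float)
    passed = int(np.sum(p <= cutoff))             # R, associations passing
    if passed == 0:
        return 0.0
    return min(1.0, p.size * cutoff / passed)     # m * t / R, capped at 1


def chromosome(label):
    """Chromosome of a hypothetical label such as '+8q23' or '-13q14'."""
    return re.match(r"[+-]?(\d+|X|Y)", label).group(1)


def intra_pass_fraction(pairs, pvalues, cutoff):
    """Fraction of intra-chromosomal pairs whose p value passes the cut-off."""
    intra = [p for (a, b), p in zip(pairs, pvalues)
             if chromosome(a) == chromosome(b)]
    if not intra:
        return float("nan")
    return float(np.mean(np.asarray(intra) <= cutoff))


pairs = [("+8q23", "-8p21"), ("+8q23", "+3q26"), ("-5q31", "+5p15")]
pvals = [0.01, 0.01, 0.5]
print(estimated_fdr(pvals, 0.02))               # ~0.03 (3 * 0.02 / 2)
print(intra_pass_fraction(pairs, pvals, 0.02))  # 0.5
```

Under this estimator, lowering the cut-off reduces the expected number of false positives faster than the number of passing associations only when true associations are concentrated at small p values.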

According to CDCOCA, specific pairs of genomic imbalances in bladder carcinoma include concurrent "bidirectional" losses on 8p and gains on 8q (Figure 7). In the comparative analysis, gains involving chromosome 8q were detected with all three methods (Figures 5 and 6 in Additional file 1 and Figure 7). However, with CDCOCA the frequent co-occurrence of these CNA on a background of low genomic complexity became more apparent. This observation may point to an early appearance of these CNA during tumorigenesis, with a possible role as cancer-initiating events. While gains on distal 8q are the most consistent copy number change in epithelial neoplasias, with MYC considered a predominant target, deletions on 8p23.3 have recently been associated with aggressive clinical behavior in BLCA [28]. Another observation concerned concurrent gains on 5p and losses on 5q, which were also associated with losses on chromosome 4q and distal 6q (6q22). These co-occurrences are shown in Figures 5 and 6 in Additional file 1 and in Figure 7. Although one may assume that "bidirectional" changes involving both chromosomal arms are based on simple cytogenetic events, e.g. isochromosome formation, the limitation of this pattern to distinct chromosomes points to an evolutionary advantage of accumulating both gains and losses for the malignant clone. Other event pairs obtained by CDCOCA include gains on 8q23 along with gains on 3q, as well as gains on 20q11 with losses on 18q23.

The abundance of 8p losses, 8q gains, 5q losses, 5p gains, 3q gains and 4q losses points to the importance of these CNA in the tumors carrying them. Genes from TGF-beta receptor signaling (blue triangles) and cellular apoptotic pathways (red triangles) located in the co-occurring changes are shown in Figure 7. The presence of genes from the same pathways in co-occurring CNA points to a possible cooperative action of these genes. CDC23 (5q31), CASP6 (4q25) and PMAIP1 (18q21) are among the TGF-receptor cascade genes with a well-established role in cancer [29, 30]. Other possible targets for genetic cooperation include PMSD2, PAK2, BCL2L1 and FNTA. Genes from apoptotic signaling pathways mapped to these regions include CDC23 (5q31), SMAD2 (18q21), SMAD4 (18q21) and SMAD7 (18q21), which have been shown to be defective in several cancer entities [31]. A possible target on 5p is SKP2, whose loss has been shown to cause cellular senescence [32]. On 5q, loss-of-function mutations, including copy number losses, of both APC and MCC have been associated with a variety of epithelial neoplasias [33–36].

Mantle cell lymphoma

For MCL, an overall p value distribution similar to that of BLCA was observed (Figure 3). The most common CNA in MCL included gains on chromosomes 3q, 6p, 7p and 8q, while the most common losses involved regions on 6q, 8p, 9, 11q and 13q (Figures 7 and 8 in Additional file 1 and Figure 5).

A p value cut-off of 0.04, giving an FDR of 30%, was applied with CDCOCA (Table 3 and Figure 8). About 80% of intra-chromosomal associations passed this threshold, representing approx. 50% of all post cut-off associations. The 100 strongest associations obtained with CDCOCA are shown in Figure 9. As in BLCA, CDCOCA detected losses on 8p together with gains on 8q, an association not reported by the other analyses. Also, only CDCOCA selected groups of co-occurrences involving low-frequency CNA (e.g. associations involving gains of 7p, 6p, 12p and 18q). Other changes, such as losses of the frequently affected 13q together with gains of the less frequently affected 7q, were among the top 100 events with CDCOCA but not with CICOCA or analysis 3 (Figures 7 and 8 in Additional file 1).

Table 3 Statistics of associations in MCL

As candidate targets, TNF-signaling genes (red triangles) and T-cell receptor signaling genes (blue triangles) are marked at their corresponding band locations in Figure 9. The roles of genes such as MDM2 (12q15), TNFRSF1A (12p13) and MALT1 (18q21) in neoplastic transformation and/or progression are well established [37–39]. Other examples of cancer-relevant genes mapping to these regions are STAT2 (12q13) and STAT3 (17q) [40, 41].

Conclusions

We have developed a method, CDCOCA, to define the complexity dependence of co-occurring CNA in cancer samples. In contrast to previously published methods [23] and to a modified algorithm that does not include the complexity adjustment step, CDCOCA does not simply focus on the most frequent co-occurrences of regional genomic copy number changes in cancer entities. Here, we determine statistically relevant co-occurring CNA by accounting for the CNA "background noise" introduced, for example, by chromosome-scale imbalances (e.g. isochromosomes, chromosomal aneuploidy). In theory, this procedure should highlight specific but comparatively rare CNA events.

As indicated by our analysis of BLCA and MCL, two unrelated cancer entities with overall intermediate copy number complexity, the relevant CNA associations in many specimens are obscured by the large number and/or extent of regional CNA. When correcting for genomic background heterogeneity, most of the associations involving highly recurrent CNA were removed. This indicates that many high-frequency changes may be related to overall genomic instability and therefore cannot unambiguously be assigned a causative role in oncogenesis. Especially given the large number of genes affected by complex genomic imbalances, some of the cancer type specific CNA patterns may represent an epiphenomenon of disturbed genomic maintenance processes rather than the expression of copy number dependent target gene modifications.

However, when accounting for the overall complexity, CNA associations may point towards connected events and/or preferred pathways activated during carcinogenesis. Based on our CNA associations, we found multiple genes from single well-defined cancer pathways to be affected in subsets of samples. Alteration of more than one gene in a pathway may potentiate the effect on pathway function and be responsible for a specific clonal phenotype.

CDCOCA should prove to be a powerful tool for defining mutual associations at the gene level and for gaining insights into cellular mechanisms relevant for oncogenesis. Although we applied our method to CGH data at band resolution, there is no practical obstacle to its use with segmented data from high-resolution genomic array experiments. In fact, such data should facilitate a gene-centric analysis and automatic integration with functional data sources.

References

Futreal P, Coin L, Marshall M, Down T, Hubbard T, Wooster R, Rahman N, Stratton M: A census of human cancer genes. Nat Rev Cancer. 2004, 4 (3): 177-83. 10.1038/nrc1299.
Article CAS PubMed PubMed Central Google Scholar
Stratton M, Campbell P, Futreal P: The cancer genome. Nature. 2009, 458 (7239): 719-24. 10.1038/nature07943.
Article CAS PubMed PubMed Central Google Scholar
Kinzler K, Vogelstein B: Lessons from hereditary colorectal cancer. Cell. 1996, 87 (2): 159-70. 10.1016/S0092-8674(00)81333-1.
Article CAS PubMed Google Scholar
Hanahan D, Weinberg R: The hallmarks of cancer. Cell. 2000, 100: 57-70. 10.1016/S0092-8674(00)81683-9.
Article CAS PubMed Google Scholar
Lengauer C, Kinzler K, Vogelstein B: Genetic instabilities in human cancers. Nature. 1998, 396 (6712): 643-9. 10.1038/25292.
Article CAS PubMed Google Scholar
Stallings R: Origin and functional significance of large-scale chromosomal imbalances in neuroblastoma. Cytogenet Genome Res. 2007, 118 (2-4): 110-5. 10.1159/000108291. [Copyright (c) 2007 S. Karger AG, Basel.]
Article CAS PubMed Google Scholar
Weir BA, Woo MS, Getz G, Perner S, Ding L, Beroukhim R, Lin WM, Province MA, Kraja A, Johnson LA, Shah K, Sato M, Thomas RK, Barletta JA, Borecki IB, Broderick S, Chang AC, Chiang DY, Chirieac LR, Cho J, Fujii Y, Gazdar AF, Giordano T, Greulich H, Hanna M, Johnson BE, Kris MG, Lash A, Lin L, Lindeman N, Mardis ER, McPherson JD, Minna JD, Morgan MB, Nadel M, Orringer MB, Osborne JR, Ozenberger B, Ramos AH, Robinson J, Roth JA, Rusch V, Sasaki H, Shepherd F, Sougnez C, Spitz MR, Tsao MS, Twomey D, Verhaak RGW, Weinstock GM, Wheeler DA, Winckler W, Yoshizawa A, Yu S, Zakowski MF, Zhang Q, Beer DG, Wistuba II, Watson MA, Garraway LA, Ladanyi M, Travis WD, Pao W, Rubin MA, Gabriel SB, Gibbs RA, Varmus HE, Wilson RK, Lander ES, Meyerson M: Characterizing the cancer genome in lung adenocarcinoma. Nature. 2007, 450 (7171): 893-8. 10.1038/nature06358.
Article CAS PubMed PubMed Central Google Scholar
Myllykangas S, Himberg J, Bohling T, Nagy B, Hollmen J, Knuutila S: DNA copy number amplification profiling of human neoplasms. Oncogene. 2006, 25 (55): 7324-32. 10.1038/sj.onc.1209717.
Article CAS PubMed Google Scholar
Coe B, Lockwood W, Girard L, Chari R, Macaulay C, Lam S, Gazdar A, Minna J, Lam W: Di erential disruption of cell cycle pathways in small cell and non-small cell lung cancer. Br J Cancer. 2006, 94 (12): 1927-35. 10.1038/sj.bjc.6603167.
Article CAS PubMed PubMed Central Google Scholar
Baudis M: Genomic imbalances in 5918 malignant epithelial tumors: an explorative meta-analysis of chromosomal CGH data. BMC Cancer. 2007, 7: 226-10.1186/1471-2407-7-226.
Article PubMed PubMed Central Google Scholar
Kallioniemi A, Kallioniemi O, Sudar D, Rutovitz D, Gray J, Waldman F, Pinkel D: Comparative genomic hybridization for molecular cytogenetic analysis of solid tumors. Science. 1992, 258 (5083): 818-21. 10.1126/science.1359641.
Article CAS PubMed Google Scholar
du Manoir S, Speicher M, Joos S, Schrock E, Popp S, Dohner H, Kovacs G, Robert-Nicoud M, Lichter P, Cremer T: Detection of complete and partial chromosome gains and losses by comparative genomic in situ hybridization. Hum Genet. 1993, 90 (6): 590-610.
Article CAS PubMed Google Scholar
Joos S, Scherthan H, Speicher M, Schlegel J, Cremer T, Lichter P: Detection of amplified DNA sequences by reverse chromosome painting using genomic tumor DNA as probe. Hum Genet. 1993, 90 (6): 584-9. 10.1007/BF00202475.
Article CAS PubMed Google Scholar
Solinas-Toldo S, Lampel S, Stilgenbauer S, Nickolenko J, Benner A, Dohner H, Cremer T, Lichter P: Matrix-based comparative genomic hybridization: biochips to screen for genomic imbalances. Genes Chromosomes Cancer. 1997, 20 (4): 399-407. 10.1002/(SICI)1098-2264(199712)20:4<399::AID-GCC12>3.0.CO;2-I.
Article CAS PubMed Google Scholar
Pinkel D, Albertson D: Comparative genomic hybridization. Annu Rev Genomics Hum Genet. 2005, 6: 331-54. 10.1146/annurev.genom.6.080604.162140.
Article CAS PubMed Google Scholar
Beroukhim R, Mermel C, Porter D, Wei G, Raychaudhuri S, Donovan J, Barretina J, Boehm J, Dobson J, Urashima M, Henry KM, Pinchback R, Ligon A, Cho Y, Haery L, Greulich H, Reich M, Winckler W, Lawrence M, Weir B, Tanaka K, Chiang D, Bass A, Loo A, Hoffman C, Prensner J, Liefeld T, Gao Q, Yecies D, Signoretti S, Maher E, Kaye F, Sasaki H, Tepper J, Fletcher J, Tabernero J, Baselga J, Tsao M, Demichelis F, Rubin M, Janne P, Daly M, Nucera C, Levine R, Ebert B, Gabriel S, Rustgi A, Antonescu C, Ladanyi M, Letai A, Garraway L, Loda M, Beer D, True L, Okamoto A, Pomeroy S, Singer S, Golub T, Lander E, Getz G, Sellers W, Meyerson M: The landscape of somatic copy-number alteration across human cancers. Nature. 2010, 463 (7283): 899-905. 10.1038/nature08822.
Article CAS PubMed PubMed Central Google Scholar
Liu J, Ranka S, Kahveci T: Markers improve clustering of CGH data. Bioinformatics. 2007, 23 (4): 450-7. 10.1093/bioinformatics/btl624.
Article CAS PubMed Google Scholar
Hoglund M, Frigyesi A, Sall T, Gisselsson D, Mitelman F: Statistical behavior of complex cancer karyotypes. Genes Chromosomes Cancer. 2005, 42 (4): 327-41. 10.1002/gcc.20143. [(c) 2005 Wiley-Liss, Inc.]
Article PubMed Google Scholar
Desper R, Jiang F, Kallioniemi O, Moch H, Papadimitriou C, Schaffer A: Distance-based reconstruction of tree models for oncogenesis. J Comput Biol. 2000, 7 (6): 789-803. 10.1089/10665270050514936.
Article CAS PubMed Google Scholar
Gerstung M, Baudis M, Moch H, Beerenwinkel N: Quantifying cancer progression with conjunctive Bayesian networks. Bioinformatics. 2009, 25 (21): 2809-15. 10.1093/bioinformatics/btp505.
Article CAS PubMed PubMed Central Google Scholar
Vogelstein B, Fearon E, Hamilton S, Kern S, Preisinger A, Leppert M, Nakamura Y, White R, Smits A, Bos J: Genetic alterations during colorectal-tumor development. N Engl J Med. 1988, 319 (9): 525-32. 10.1056/NEJM198809013190901.
Article CAS PubMed Google Scholar
Dohner H, Stilgenbauer S, Benner A, Leupolt E, Krober A, Bullinger L, Dohner K, Bentz M, Lichter P: Genomic aberrations and survival in chronic lymphocytic leukemia. N Engl J Med. 2000, 343 (26): 1910-6. 10.1056/NEJM200012283432602.
Article CAS PubMed Google Scholar
Bredel M, Scholtens D, Harsh G, Bredel C, Chandler J, Renfrow J, Yadav A, Vogel H, Scheck A, Tibshirani R, Sikic B: A network model of a cooperative genetic landscape in brain tumors. JAMA. 2009, 302 (3): 261-75. 10.1001/jama.2009.997.
Article CAS PubMed PubMed Central Google Scholar
Klijn C, Bot J, Adams D, Reinders M, Wessels L, Jonkers J: Identification of networks of co-occurring, tumor-related DNA copy number changes using a genome-wide scoring approach. PLoS Comput Biol. 2010, 6: e1000631-10.1371/journal.pcbi.1000631.
Article PubMed PubMed Central Google Scholar
Baudis M, Cleary ML: Progenetix.net: an online repository for molecular cytogenetic aberration data. Bioinformatics. 2001, 17 (12): 1228-9. 10.1093/bioinformatics/17.12.1228.
Article CAS PubMed Google Scholar
Boerma E, Siebert R, Kluin P, Baudis M: Translocations involving 8q24 in Burkitt lymphoma and other malignant lymphomas: a historical review of cytogenetics in the light of todays knowledge. Leukemia. 2009, 23: 225-234. 10.1038/leu.2008.281.
Article CAS PubMed Google Scholar
Tan PN, Steinbach M, Kumar V: Introduction to data mining. 2005, Bosotn, MA, USA: Addison Wesley
Google Scholar
Eguchi S, Yamamoto Y, Sakano S, Chochi Y, Nakao M, Kawauchi S, Furuya T, Oga A, Matsuyama H, Sasaki K: The loss of 8p23.3 is a novel marker for predicting progression and recurrence of bladder tumors without muscle invasion. Cancer Genet Cytogenet. 2010, 200: 16-22. 10.1016/j.cancergencyto.2010.03.007. [2010 Elsevier Inc. All rights reserved.]
Article CAS PubMed Google Scholar
Wang Q, Moyret-Lalle C, Couzon F, Surbiguet-Clippe C, Saurin J, Lorca T, Navarro C, Puisieux A: Alterations of anaphase-promoting complex genes in human colon cancer cells. Oncogene. 2003, 22 (10): 1486-90. 10.1038/sj.onc.1206224.
Article CAS PubMed Google Scholar
Loro L, Johannessen A, Vintermyr O: Loss of BCL-2 in the progression of oral cancer is not attributable to mutations. J Clin Pathol. 2005, 58 (11): 1157-62. 10.1136/jcp.2004.021709.
Article CAS PubMed PubMed Central Google Scholar
Maliekal T, Antony M, Nair A, Paulmurugan R, Karunagaran D: Loss of expression, and mutations of Smad 2 and Smad 4 in human cervical cancer. Oncogene. 2003, 22 (31): 4889-97. 10.1038/sj.onc.1206806.
Article CAS PubMed Google Scholar
Lin HK, Chen Z, Wang G, Nardella C, Lee SW, Chan CH, Yang WL, Wang J, Egia A, Nakayama KI, Cordon-Cardo C, Teruya-Feldstein J, Pandolfi PP: Skp2 targeting suppresses tumorigenesis by Arf-p53-independent cellular senescence. Nature. 2010, 464 (7287): 374-9. 10.1038/nature08815.
Article CAS PubMed PubMed Central Google Scholar
Groden J, Thliveris A, Samowitz W, Carlson M, Gelbert L, Albertsen H, Joslyn G, Stevens J, Spirio L, Robertson M, al et: Identification and characterization of the familial adenomatous polyposis coli gene. Cell. 1991, 66 (3): 589-600. 10.1016/0092-8674(81)90021-0.
Article CAS PubMed Google Scholar
Kinzler K, Nilbert M, Vogelstein B, Bryan T, Levy D, Smith K, Preisinger A, Hamilton S, Hedge P, Markham A, al et: Identification of a gene located at chromosome 5q21 that is mutated in colorectal cancers. Science. 1991, 251 (4999): 1366-70. 10.1126/science.1848370.
Article CAS PubMed Google Scholar
Nishisho I, Nakamura Y, Miyoshi Y, Miki Y, Ando H, Horii A, Koyama K, Utsunomiya J, Baba S, Hedge P: Mutations of chromosome 5q21 genes in FAP and colorectal cancer patients. Science. 1991, 253 (5020): 665-9. 10.1126/science.1651563.
Ashton-Rickardt P, Wyllie A, Bird C, Dunlop M, Steel C, Morris R, Piris J, Romanowski P, Wood R, White R, et al: MCC, a candidate familial polyposis gene in 5q.21, shows frequent allele loss in colorectal and lung cancer. Oncogene. 1991, 6 (10): 1881-6.
Trauzold A, Roder C, Sipos B, Karsten K, Arlt A, Jiang P, Martin-Subero J, Siegmund D, Muerkoster S, Pagerols-Raluy L, Siebert R, Wajant H, Kalthoff H: CD95 and TRAF2 promote invasiveness of pancreatic cancer cells. FASEB J. 2005, 19 (6): 620-2.
Sugano N, Suda T, Godai T, Tsuchida K, Shiozawa M, Sekiguchi H, Yoshihara M, Matsukuma S, Sakuma Y, Tsuchiya E, Kameda Y, Akaike M, Miyagi Y: MDM2 gene amplification in colorectal cancer is associated with disease progression at the primary site, but inversely correlated with distant metastasis. Genes Chromosomes Cancer. 2010, 49 (7): 620-9.
Dierlamm J, Penas EM, Bentink S, Wessendorf S, Berger H, Hummel M, Klapper W, Lenze D, Rosenwald A, Haralambieva E, Ott G, Cogliatti S, Moller P, Schwaenen C, Stein H, Loffer M, Spang R, Trumper L, Siebert R: Gain of chromosome region 18q21 including the MALT1 gene is associated with the activated B-cell-like gene expression subtype and increased BCL2 gene dosage and protein expression in diffuse large B-cell lymphoma. Haematologica. 2008, 93 (5): 688-96. 10.3324/haematol.12057.
Konnikova L, Simeone M, Kruger M, Kotecki M, Cochran B: Signal transducer and activator of transcription 3 (STAT3) regulates human telomerase reverse transcriptase (hTERT) expression in human cancer and primary cells. Cancer Res. 2005, 65 (15): 6516-20. 10.1158/0008-5472.CAN-05-0924.
He B, Reguart N, You L, Mazieres J, Xu Z, Lee A, Mikami I, McCormick F, Jablons D: Blockade of Wnt-1 signaling induces apoptosis in human colorectal cancer cells containing downstream mutations. Oncogene. 2005, 24 (18): 3054-8. 10.1038/sj.onc.1208511.
Shannon P, Markiel A, Ozier O, Baliga N, Wang J, Ramage D, Amin N, Schwikowski B, Ideker T: Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Research. 2003, 13: 2498-2504. 10.1101/gr.1239303.

Pre-publication history

The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1755-8794/4/21/prepub

Acknowledgements

NK is supported through a grant from the Krebsliga Schweiz (Swiss Cancer League). Haoyang Cai is supported through a grant from the China Scholarship Council.

Author information

Authors and Affiliations

Institute of Molecular Life Sciences, University of Zurich, Winterthurerstrasse 190, Zurich, Switzerland
Nitin Kumar, Haoyang Cai & Michael Baudis
Functional Genomics Center Zurich, University of Zurich, Winterthurerstrasse 190, Zurich, Switzerland
Hubert Rehrauer

Corresponding author

Correspondence to Michael Baudis.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

NK, MB and HR conceived and designed the experiments; NK implemented the software; NK and MB analyzed the data; HC and MB contributed data. All authors read and approved the final manuscript.

Electronic supplementary material

12920_2010_217_MOESM1_ESM.PDF

Additional file 1: CICOCA: A method to define complexity independence of co-occurring chromosomal aberrations. This additional file describes CICOCA, the statistical method against which CDCOCA is compared. As described in the main text, CICOCA aims at finding co-occurring chromosomal associations independently of sample complexity. Besides the description of CICOCA, the file contains all additional figures referred to in the paper, together with a detailed description of each. (PDF 2 MB)

Rights and permissions

Open Access This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Kumar, N., Rehrauer, H., Cai, H. et al. CDCOCA: A statistical method to define complexity dependence of co-occuring chromosomal aberrations. BMC Med Genomics 4, 21 (2011). https://doi.org/10.1186/1755-8794-4-21

Received: 26 July 2010
Accepted: 03 March 2011
Published: 03 March 2011
DOI: https://doi.org/10.1186/1755-8794-4-21