CNAReporter: a GenePattern pipeline for the generation of clinical reports of genomic alterations
- Yuri Kotliarov1,
- Serdar Bozdag1,
- Hangjiong Cheng2,
- Stefan Wuchty†1,
- Jean-Claude Zenklusen†1 and
- Howard A Fine†1
© Kotliarov et al; licensee BioMed Central Ltd. 2010
Received: 12 August 2009
Accepted: 9 April 2010
Published: 9 April 2010
Genomic copy number alterations are widely associated with a broad range of human tumors and offer the potential to be used as a diagnostic tool. Especially in the emerging era of personalized medicine, medical informatics tools that allow the fast visualization and analysis of a patient's genomic alterations for diagnostic and potential treatment purposes are becoming increasingly important.
We developed CNAReporter, a software tool that allows users to visualize SNP-specific data obtained from Affymetrix arrays and generate PDF-reports as output. We combined standard algorithms for the analysis of chromosomal alterations, utilizing the widely applied GenePattern framework. As an example, we show genome analyses of two patients with distinctly different CNA profiles using the tool.
Glioma subtypes, characterized by different genomic alterations, are often treated differently but can be difficult to differentiate pathologically. CNAReporter offers a user-friendly way to visualize and analyse genomic changes of any given tumor genomic profile, thereby leading to an accurate diagnosis and patient-specific treatment.
Genomic copy number alterations are widely associated with a broad range of human diseases. In general, tumors have genomic abnormalities that are largely characterized by copy number alterations. Specifically, amplifications, deletions and allelic imbalances are hallmarks of human gliomas [3, 4]. Such genomic data offer important biological insights into the pathogenesis of the disease and might serve as a valuable clinical and diagnostic tool. By classifying patients into more homogeneous tumor groups, genomic alteration data might also allow the enrichment of patient subpopulations with genetic targets that are more likely to respond to specific molecularly targeted therapy.
Copy number analysis for a single patient usually includes several major steps: (1) raw data processing with normalization and calculation of log2 ratios of probe intensities in the tumor compared to either a reference sample or a reference set, with the resulting values representing copy numbers; (2) smoothing and segmentation of copy numbers, followed by selection of areas of copy number alteration (CNA); (3) if available, analysis of loss of heterozygosity (LOH); and (4) visualization of the CNA/LOH profile.
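Step (1) can be illustrated with a minimal sketch. It assumes already normalized, summarized intensities and a diploid reference; the function names and example intensities are hypothetical, and the simple relation CN = 2 × 2^(log2 ratio) is an idealization rather than the calibration used by any particular tool.

```python
import math


def log2_ratio(tumor_intensity, reference_intensity):
    """Log2 ratio of tumor to reference probe intensity for one SNP."""
    return math.log2(tumor_intensity / reference_intensity)


def copy_number_from_log2(ratio, reference_ploidy=2.0):
    """Idealized copy number implied by a log2 ratio (diploid reference assumed)."""
    return reference_ploidy * 2.0 ** ratio


# Hypothetical normalized intensities for three SNPs.
tumor = [1200.0, 600.0, 2400.0]
reference = [1200.0, 1200.0, 1200.0]

ratios = [log2_ratio(t, r) for t, r in zip(tumor, reference)]
copy_numbers = [copy_number_from_log2(x) for x in ratios]
```

In this toy example, equal intensities give a log2 ratio of 0 (two copies), a halved intensity gives -1 (one copy) and a doubled intensity gives +1 (four copies). Real array data are noisier and compressed, which is why smoothing and segmentation follow.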
Many tools for the analysis of copy number profiles have been developed by the scientific community. They are often freely available and cover all steps of the analysis; examples include Affymetrix CNAT, CNAG, dChipSNP, ArrayFusion, the Perl-based PennCNV, and several R/Bioconductor packages such as aroma.affymetrix and SNPchip. Most of these software packages, however, require that the user has substantial bioinformatics knowledge and computer/programming skills. The output is generally an interactive browser of genomic profiles and/or exported figures and text files. With an ever-increasing demand for patient-specific genomic data by clinical researchers and clinicians, there is a great need for analysis tools and output formats that individuals without computational expertise can use to generate such information.
Our goal, therefore, was to create an easy-to-use tool for non-expert users who want a "snapshot" of the genomic profile of a particular tumor/tissue sample from the raw microarray data, delivered in an easy-to-understand printable format suitable for clinical trial study charts or medical records. Based on the widely used GenePattern framework, we developed CNAReporter, a reporting tool that interprets experimental measurements from high-resolution Human Mapping GeneChip arrays (Affymetrix Inc., Santa Clara, CA). Providing statistical treatment of such data, CNAReporter determines and annotates regions of genomic alterations in a sample and summarizes the results in a printable PDF file. Specifically, we show the usefulness of CNAReporter as a clinical tool that supports the accurate diagnosis and treatment of patients with primary brain tumors.
Data, Workflow and User Interface
We designed CNAReporter as a GenePattern 3.0 pipeline, consisting of two modules (Figure 1). From paired input data, the module GenerateAffyCNTfiles calculates copy numbers (CN), provides smoothed CN profiles and calculates the LOH status of genomic locations. All results are stored in standard Affymetrix CNT files. The module was written in Perl as a wrapper for the platform-specific binary executables of the Affymetrix DevNet Tools copy-number pipeline, allowing the on-the-fly generation of all intermediate files. Details about the corresponding algorithms can be found in .
Reading generated CNT-files, the module GenerateAlterationReport determines CNA and LOH areas. The final output is a printable PDF-file that provides a table of altered genomic areas and graphic visualizations as a genomic profile and chromosome plot (Figure 1). The module was implemented in MATLAB (The Mathworks, Inc., Natick, MA) requiring the Bioinformatics Toolbox to create chromosome plots and using a Perl script to generate the final PDF report.
CNAReporter provides a user interface to a standard GenePattern pipeline, allowing the input of the aforementioned Affymetrix-specific files. Advanced options include the selection of thresholds for the detection of CNA and LOH as well as the ability to plot genomic profiles for individual chromosomes.
CNAReporter runs on all platforms that GenePattern, MATLAB and Affymetrix tools support, including Windows, Mac, Linux and Sun Solaris. Currently supported arrays include Affymetrix 500K, 100K and 10K human mapping arrays. Since the latest SNP5 and SNP6 arrays require different algorithms to estimate copy numbers, their support will be added in the future.
Determination of copy numbers
Copy numbers are calculated using the Affymetrix Copy Number Analysis Tool (CNAT 4), a set of command line programs. We allow filtering out SNPs with large PCR fragment length (MaxFragSize parameter, 600 bp by default) to support samples with partially degraded DNA. For standard fresh-frozen samples this parameter can be set to 0 to include all SNPs. After probe-level normalization and summarization, the calculated log2-transformed ratios are used to estimate raw copy numbers (CN). Using a Gaussian approach, raw SNP profiles are smoothed (>500 kb window by default) and segmented by a Hidden Markov Model approach [18, 19]. Raw and smoothed copy numbers are saved in an Affymetrix-based CNT file.
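The fragment-length filter and the Gaussian smoothing step can be sketched as follows. This is only an illustration of the general idea, not the CNAT 4 implementation: the record layout (`pos`, `frag_len`, `cn`), the function names and the use of the window size directly as the kernel standard deviation are all assumptions made for the example.

```python
import math


def filter_by_fragment_size(snps, max_frag_size=600):
    """Keep SNPs whose PCR fragment length is at most max_frag_size (bp).

    A value of 0 disables the filter and keeps all SNPs, as described in the text.
    """
    if max_frag_size == 0:
        return list(snps)
    return [s for s in snps if s["frag_len"] <= max_frag_size]


def gaussian_smooth(positions, values, bandwidth_bp=500_000):
    """Gaussian kernel-weighted average of values by genomic distance.

    Assumption: the bandwidth is used directly as the kernel standard deviation.
    """
    smoothed = []
    for p in positions:
        weights = [math.exp(-0.5 * ((p - q) / bandwidth_bp) ** 2) for q in positions]
        total = sum(weights)
        smoothed.append(sum(w * v for w, v in zip(weights, values)) / total)
    return smoothed


# Hypothetical SNP records on one chromosome.
snps = [
    {"pos": 1_000_000, "frag_len": 450, "cn": 2.0},
    {"pos": 1_200_000, "frag_len": 900, "cn": 2.1},  # removed by the default filter
    {"pos": 1_400_000, "frag_len": 580, "cn": 3.5},
    {"pos": 1_600_000, "frag_len": 300, "cn": 2.0},
]

kept = filter_by_fragment_size(snps)
smoothed_cn = gaussian_smooth([s["pos"] for s in kept], [s["cn"] for s in kept])
```

The smoothed profile pulls the isolated high value at 1.4 Mb toward its neighbors, which is the intended effect before segmentation: single-SNP outliers are damped while sustained shifts across many SNPs survive.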
Determination of loss of heterozygosity (LOH)
LOH calls for each SNP are determined by comparing the corresponding genotype calls in the tumor and the germline sample, provided that the SNP is heterozygous in the reference sample. Specifically, we use the LOH algorithm as implemented in CNAT 4. LOH values are also segmented using a Hidden Markov Model and saved in an Affymetrix-based CNT file.
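The paired comparison described above can be illustrated with a simplified per-SNP rule. The genotype encoding ("AA", "AB", "BB", "NoCall") and the function name are assumptions for the example; the actual CNAT 4 algorithm is more elaborate and is followed by HMM segmentation.

```python
def loh_call(germline_genotype, tumor_genotype):
    """Simplified per-SNP LOH call from paired genotypes.

    Returns True (LOH), False (heterozygosity retained) or None (uninformative).
    Only SNPs that are heterozygous in the germline sample are informative.
    """
    if germline_genotype != "AB":
        return None  # homozygous or missing germline call: cannot show LOH
    if tumor_genotype in ("AA", "BB"):
        return True  # one allele lost in the tumor
    if tumor_genotype == "AB":
        return False  # both alleles retained
    return None  # tumor genotype missing


# Hypothetical paired genotype calls for five SNPs.
germline = ["AB", "AB", "AA", "AB", "BB"]
tumor = ["AA", "AB", "AA", "NoCall", "BB"]
calls = [loh_call(g, t) for g, t in zip(germline, tumor)]
```

Because only germline-heterozygous SNPs carry information, a large fraction of SNPs yields no call, which is one reason LOH values are segmented over many neighboring SNPs rather than interpreted individually.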
Using the copy numbers and LOH calls that characterize the tumor sample at individual SNPs, CNAReporter applies three threshold parameters to define areas of CNA. Increasing the absolute values of these parameters makes detection more conservative, thereby decreasing the number of false-positive areas but increasing the possibility of missing real changes. In addition, neighboring LOH areas are combined if they are within a certain threshold distance of each other (set by the LOHMergeThreshold parameter, 2 Mbp by default). Lowering this parameter may cause LOH areas to split because of errors in genotyping calls. Default thresholds were determined empirically from the analysis of other brain tumor samples.
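The merging of neighboring LOH areas can be sketched as a standard interval merge. The sketch assumes that all areas lie on a single chromosome and are given as (start, end) positions in base pairs; the function name and example coordinates are hypothetical, and only the 2 Mbp default comes from the text.

```python
def merge_loh_areas(areas, merge_threshold_bp=2_000_000):
    """Merge LOH areas on one chromosome whose gap is at most merge_threshold_bp.

    areas: iterable of (start, end) tuples in base pairs.
    """
    merged = []
    for start, end in sorted(areas):
        if merged and start - merged[-1][1] <= merge_threshold_bp:
            # Gap to the previous area is small enough: extend that area.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(a) for a in merged]


# Hypothetical LOH areas; the first two are 1.5 Mbp apart.
areas = [(1_000_000, 3_000_000), (4_500_000, 6_000_000), (10_000_000, 12_000_000)]

merged_default = merge_loh_areas(areas)
merged_strict = merge_loh_areas(areas, merge_threshold_bp=1_000_000)
```

With the default threshold the first two areas are reported as one region from 1 to 6 Mbp, whereas a 1 Mbp threshold leaves all three areas separate. This shows how lowering the parameter can fragment what may be a single LOH event interrupted by a few erroneous genotype calls.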
Tissue from fresh frozen tumor specimens and resultant data were collected under an NCI-IRB (FWA # 00005897/IRB# 00000001)-approved protocol (NCI#:02C0140). Informed consent was obtained from each patient and documented in the medical records. Specimens and data were de-identified to comply with patients' privacy rules.
We introduced CNAReporter, a user-friendly, integrated tool that allows the quick analysis and visualization of chromosomal alterations. In particular, CNAReporter provides detailed high-quality reports of genomic alterations in a printable format, allowing our application to be used as a standard tool for clinical diagnostics and decision-making. Specifically, we use CNAReporter routinely for the analysis of genomic alterations in brain tumor tissues, potentially allowing us to make objective tumor subtype diagnoses, stratify patients into biologically more homogeneous tumor subgroups for clinical trials and select patient-specific treatments based on objective genomic data.
We designed CNAReporter as a pipeline in the GenePattern environment, a well-known open-source web-based framework that supports multiple platforms. Once properly installed on a server, GenePattern does not require additional software to be installed on the user's computer and can be accessed from any site with only a web browser. The GenePattern framework provides security, job management, a uniform interface, relative ease of customization and integration with other developers' tools. GenePattern already has several modules for SNP analysis in its repository, such as the preprocessing module SNPFileCreator (which does not implement paired analysis) and GISTIC for the discovery of chromosomal aberrations in multi-sample datasets. We believe our tool would be a significant addition to this suite. Owing to its open architecture, CNAReporter can easily be developed further as open-source software and integrated into other systems for genomic analysis.
Availability and Requirements
CNAReporter with installation and usage instructions as well as all required files can be downloaded from http://gforge.nci.nih.gov/projects/cnareport.
The program is available for Linux/Unix, Mac and Windows operating systems, and requires MATLAB 2007b (or later) with the Bioinformatics Toolbox, Perl 5 (including CPAN libraries) and GenePattern 3.0 (or later).
CN: (DNA) copy number
CNA: copy number alteration
LOH: loss of heterozygosity
SNP: single nucleotide polymorphism
(M)bp: (Mega) base pairs.
This research was supported by the Intramural Research Program of the NIH, National Cancer Institute.
- Weber BL: Cancer genomics. Cancer Cell. 2002, 1 (1): 37-47. 10.1016/S1535-6108(02)00026-0.
- Albertson DG, Collins C, McCormick F, Gray JW: Chromosome aberrations in solid tumors. Nat Genet. 2003, 34 (4): 369-376. 10.1038/ng1215.
- Kotliarov Y, Steed ME, Christopher N, Walling J, Su Q, Center A, Heiss J, Rosenblum M, Mikkelsen T, Zenklusen JC, et al: High-resolution global genomic survey of 178 gliomas reveals novel regions of copy number alteration and allelic imbalances. Cancer Res. 2006, 66 (19): 9428-9436. 10.1158/0008-5472.CAN-06-1691.
- Beroukhim R, Getz G, Nghiemphu L, Barretina J, Hsueh T, Linhart D, Vivanco I, Lee JC, Huang JH, Alexander S, et al: Assessing the significance of chromosomal aberrations in cancer: methodology and application to glioma. Proc Natl Acad Sci USA. 2007, 104 (50): 20007-20012. 10.1073/pnas.0710052104.
- CNAT 4.0: Copy Number and Loss of Heterozygosity Estimation Algorithms for the GeneChip® Human Mapping 10/50/100/250/500K Array Set. [http://www.affymetrix.com/support/technical/whitepapers/cnat_4_algorithm_whitepaper.pdf]
- Nannya Y, Sanada M, Nakazaki K, Hosoya N, Wang L, Hangaishi A, Kurokawa M, Chiba S, Bailey DK, Kennedy GC, et al: A robust algorithm for copy number detection using high-density oligonucleotide single nucleotide polymorphism genotyping arrays. Cancer Res. 2005, 65 (14): 6071-6079. 10.1158/0008-5472.CAN-05-0465.
- Lin M, Wei LJ, Sellers WR, Lieberfarb M, Wong WH, Li C: dChipSNP: significance curve and clustering of SNP-array-based loss-of-heterozygosity data. Bioinformatics. 2004, 20 (8): 1233-1240. 10.1093/bioinformatics/bth069.
- Yang TP, Chang TY, Lin CH, Hsu MT, Wang HW: ArrayFusion: a web application for multi-dimensional analysis of CGH, SNP and microarray data. Bioinformatics. 2006, 22 (21): 2697-2698. 10.1093/bioinformatics/btl457.
- Wang K, Li M, Hadley D, Liu R, Glessner J, Grant SF, Hakonarson H, Bucan M: PennCNV: an integrated hidden Markov model designed for high-resolution copy number variation detection in whole-genome SNP genotyping data. Genome Res. 2007, 17 (11): 1665-1674. 10.1101/gr.6861907.
- Bengtsson H, Irizarry R, Carvalho B, Speed TP: Estimation and assessment of raw copy numbers at the single locus level. Bioinformatics. 2008, 24 (6): 759-767. 10.1093/bioinformatics/btn016.
- Scharpf RB, Ting JC, Pevsner J, Ruczinski I: SNPchip: R classes and methods for SNP array data. Bioinformatics. 2007, 23 (5): 627-628. 10.1093/bioinformatics/btl638.
- Reich M, Liefeld T, Gould J, Lerner J, Tamayo P, Mesirov JP: GenePattern 2.0. Nat Genet. 2006, 38 (5): 500-501. 10.1038/ng0506-500.
- Komura D, Shen F, Ishikawa S, Fitch KR, Chen W, Zhang J, Liu G, Ihara S, Nakamura H, Hurles ME, et al: Genome-wide detection of human copy number variations using high-density DNA oligonucleotide arrays. Genome Res. 2006, 16 (12): 1575-1584. 10.1101/gr.5629106.
- Di X, Matsuzaki H, Webster TA, Hubbell E, Liu G, Dong S, Bartell D, Huang J, Chiles R, Yang G, et al: Dynamic model based algorithms for screening and genotyping over 100 K SNPs on oligonucleotide microarrays. Bioinformatics. 2005, 21 (9): 1958-1963. 10.1093/bioinformatics/bti275.
- BRLMM: an Improved Genotype Calling Method for the GeneChip® Human Mapping 500K Array Set. [http://www.affymetrix.com/support/technical/whitepapers/brlmm_whitepaper.pdf]
- Affymetrix DevNet Tools. [http://www.affymetrix.com/partners_programs/programs/developer/tools/devnettools.affx]
- Jacobs S, Thompson ER, Nannya Y, Yamamoto G, Pillai R, Ogawa S, Bailey DK, Campbell IG: Genome-wide, high-resolution detection of copy number, loss of heterozygosity, and genotypes from formalin-fixed, paraffin-embedded tumor tissue using microarrays. Cancer Res. 2007, 67 (6): 2544-2551. 10.1158/0008-5472.CAN-06-3597.
- Fridlyand J, Snijders AM, Pinkel D, Albertson DG, Jain AN: Hidden Markov models approach to the analysis of array CGH data. Journal of Multivariate Analysis. 2004, 90: 132-153. 10.1016/j.jmva.2004.02.008.
- Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, et al: Bioconductor: open software development for computational biology and bioinformatics. Genome Biol. 2004, 5 (10): R80. 10.1186/gb-2004-5-10-r80.
- UCSC Genome Browser. [http://genome.ucsc.edu]
The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1755-8794/3/11/prepub
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.