CLImATHET: detecting subclonal copy number alterations and loss of heterozygosity in heterogeneous tumor samples from wholegenome sequencing data
 Zhenhua Yu^{1},
 Ao Li^{1, 2}Email author and
 Minghui Wang^{1, 2}
DOI: 10.1186/s1292001702554
© The Author(s). 2017
Received: 17 December 2015
Accepted: 8 March 2017
Published: 15 March 2017
Abstract
Background
Copy number alterations (CNA) and loss of heterozygosity (LOH) represent a large proportion of genetic structural variations of cancer genomes. These aberrations are continuously accumulated during the procedure of clonal evolution and patterned by phylogenetic branching. This invariably results in the emergence of multiple cell populations with distinct complement of mutational landscapes in tumor sample. With the advent of nextgeneration sequencing technology, inference of subclonal populations has become one of the focused interests in cancerassociated studies, and is usually based on the assessment of combinations of somatic singlenucleotide variations (SNV), CNA and LOH. However, cancer samples often have several inherent issues, such as contamination of normal stroma, tumor aneuploidy and intratumor heterogeneity. Addressing these critical issues is imperative for accurate profiling of clonal architecture.
Methods
We present CLImATHET, a computational method designed for capturing clonal diversity in the CNA/LOH dimensions by taking into account the intratumor heterogeneity issue, in the case where a reference or matched normal sample is absent. The algorithm quantitatively represents the clonal identification problem using a factorial hidden Markov model, and takes an integrated analysis of read counts and allele frequency data. It is able to infer subclonal CNA and LOH events as well as the fraction of cells harboring each event.
Results
The results on simulated datasets indicate that CLImATHET has high power to identify CNA/LOH segments, it achieves an average accuracy of 0.87. It can also accurately infer proportion of each clonal population with an overall Pearson correlation coefficient of 0.99 and a mean absolute error of 0.02. CLImATHET shows significant advantages when compared with other existing methods. Application of CLImATHET to 5 primary triple negative breast cancer samples demonstrates its ability to capture clonal diversity in the CAN/LOH dimensions. It detects two clonal populations in one sample, and three clonal populations in one other sample.
Conclusions
CLImATHET, a novel algorithm is introduced to infer CNA/LOH segments from heterogeneous tumor samples. We demonstrate CLImATHET’s ability to accurately recover clonal compositions using tumor WGS data without a match normal sample.
Keywords
Copy number alteration Loss of heterozygosity Wholegenome sequencing Intratumor heterogeneity Hidden Markov model Bayesian information criterionBackground
Cancer is a disease that develops through accumulation of various genetic variability [1]. The clonal principles of tumor progression states that once initialization of a single founder cell is activated, various factors associated with carcinogenesis permit the descendants of the founder cell to resist apoptosis and undergo clonal growth, accompanying with acquirement of genomic alterations through multiple rounds of clonal expansion, and ultimately to invade neighboring tissues and metastasize to distant organs [1–5]. A tumor is thus heterogeneous and a mixture of multiple cell populations, and each population is characterized by a distinct complement of genomic aberrations. Genomic aberrations consist of somatic mutations such as singlenucleotide variations (SNV), copy number alterations (CNA), loss of heterozygosity (LOH), and more complicated structure variations (SVs) including inversion, translocation and etc., of which CNA and LOH are two frequently observed features of cancer genomes, and accurate detection of these aberrations is a crucial step to identify cancercausing genes [6, 7].
Cancerassociated studies have been greatly promoted by continuous advances in experimental technologies for landscape of cancer genomes [8–12]. With the advance of sequencing technologies, highthroughput DNA sequencing presents an unprecedented advantage in deconvolving intratumor heterogeneity and detecting subclonal aberrations compared with arraybased technologies. Wholegenome sequencing (WGS) of tumor samples is now a generally adopted approach for comprehensive analysis of structural and nucleotidelevel aberrations that underpin tumor progression [13]. However, analysis of WGS data is generally complicated by several issues. For example, tumor sample is usually impure and mixture of cancerous and normal cells [9, 14]. The fraction of cancerous cells is usually represented as tumor purity. Low tumor purity will significantly diminish sequencingderived signals, making it complicated to distinguish between aberrant and normal regions. Another intractable issue associated with cancer genomes is aneuploidy, caused by deletions or duplications of genomic segments or entire chromosomes in cancer genomes, and the average copy number of cancer genome is usually unknown [15–17]. One other more complex issue is the intratumor heterogeneity that results from the ongoing subclonal evolution [18], and the underlying clonal architecture is usually unavailable when dealing with patients’ tumor samples.
In contrast to related approaches in the literature, our method presents several distinct characteristics. APOLLOH [13] is designed to infer LOH from tumor sequence data, however it does not take into account the issue of tumor heterogeneity. PurBayes [20] uses somatic singlenucleotide variants derived from paired tumornormal samples to estimate tumor cellularity and subclonality, and is only applicable to diploid tumor samples due to no integration of tumor ploidy and CNA information. THetA [21] is designed to infer cancer subclones by addressing a maximum likelihood mixture decomposition problem based on paired tumornormal samples, but it cannot automatically determine the number of underlying subclonal populations. A recently developed method, TITAN [22], is designed to jointly call CNA and LOH segments from WGS data of tumornormal paired samples. TITAN introduces a delicate factorial HMM to infer tumor subclones and performs an exhaustive search for the number of clonal populations up to 5 to find the optimal model structure. Our approach differs from TITAN in two aspects: 1) CLImATHET uses only single tumor sample and no reference or matched normal sample is required, and 2) CLImATHET learns the model structure under a BIC. Another method called OncoSNPSEQ [23] accounts for intratumor heterogeneity by employing a nonclustering approach to model the distinct cell populations [22].
We perform a comprehensive evaluation of CLImATHET using simulated and real WGS data. By quantitative benchmarking on simulated datasets, we demonstrate that CLImATHET has the capacity to accurately infer cellularity of clonal clusters and subclonal CNA/LOH segments, and shows significant advantages when compared with existing approaches. We apply CLImATHET to 5 primary triple negative breast cancer (TNBC) samples to show its ability to identify clonal diversity in the CNA/LOH dimensions.
Methods
CLImATHET pipeline
The pipeline of CLImATHET is illustrated in Fig. 1a. The inputs to the model are extracted from WGS data using our previously published tool DFExtract [19]. Our analysis covers more than 2.6 million known SNPs along the whole genome. Copy number data of N SNPs is represented by read counts d _{1:N }, meanwhile B allele frequency is represented by Ballele read depth b _{1:N } and total read depth T _{1:N }. Following the procedures adopted in CLImAT [19], read counts are obtained by counting the reads within a 1000bp window centered at each SNP, and further processed to correct GCcontent and mappability bias. For B allele frequency, quantile normalization of read depths is automatically performed to eliminate allelic bias based on selection of optimal threshold by a grid search. CLImATHET then jointly analyze d _{1:N }, b _{1:N } and T _{1:N } using an integrated HMM, and iteratively examine the model complexity using BIC for different number of clonal clusters, and finally output clonal/subclonal CNA and LOH segments as well as the cellularity of each clonal cluster.
The statistical models in CLImATHET
It is intractable to precisely depict the genomewide aberration status of tumors containing multiple subclones, therefore we adopt a simple assumption that the observed signals at a genomic locus are generated from underlying three types of cell populations: normal (nontumor) cells, tumor cells with normal genotype, and tumor cells harboring the aberration event of interest (Fig. 1b). Thus, cell populations can be ultimately divided into two parts at a genomic locus: one with normal genotype and relative abundance of (1–β), and another harbors the aberration event with cellularity of β. To avoid extensive local parameters, we further assume all the aberration events can be clustered into a finite number of K groups, of which the kth group corresponds to the kth clonal population with respective cellularity of β _{ k } (β _{ 1 } ˂β _{ 2 } ˂…˂β _{ K }), and tumor purity is equal to β _{ K }. Based on these assumptions, the model parameters can be effectively inferred by borrowing statistical approaches.
The meanings of all other parameters involved in above formulas are the same as the ones described in CLImAT, and we do not go into detail here.
Given the number of clonal clusters K, the EM algorithm is implemented as follows: 1) start with initial parameters θ ^{ 0 } = (π ^{ 0 },A ^{ 0 },β ^{ 0 },λ ^{ 0 },p ^{ 0 }) and calculates the jointposterior probability γ _{ i,c,k }, 2) update θ ^{ 1 } = (π ^{ 1 },A ^{ 1 },β ^{ 1 },λ ^{ 1 },p ^{ 1 }) using Newton–Raphson method, 3) repeat steps (1) and (2) until a specified number of iterations are reached or the convergence criterion is met. The converged parameters \( \widehat{\theta}=\left(\widehat{\pi},\widehat{A},\widehat{\beta},\widehat{\lambda},\widehat{p}\right) \) in the last iteration of the training process will be output as the optimal estimators. Furthermore, we perform a grid search of the initial parameters θ ^{ 0 } = (π ^{ 0 },A ^{ 0 },β ^{ 0 },λ ^{ 0 },p ^{ 0 }) to find the globally optimal solution.
The iteration is stopped once the BIC difference is greater than zero. After the model learning is finished, aberration information including copy number, tumor genotype, aberration type and clonal cluster for each SNP can be inferred from the hidden state with the maximum posterior probability. At the same time, segmentation of all SNPs based on hidden states is performed to output clonal/subclonal CNA and LOH segments.
Implementation of CLImATHET
An efficient implementation of CLImATHET by using C/Matlab is freely available at GitHub [27], our previously published tool DFExtract [19] is used to prepare input files for CLImATHET.
Datasets
Real dataset
WGS data from 5 primary TNBC samples described in a previous study [28] are analyzed in this study. Each sample was sequenced at approximately 30× coverage on the Life/ABI SOLID sequencing platform.
Simulated dataset
To simulate tumors containing multiple clones, we first define one normal genome (denoted as n) and four tumor genomes (denoted as a, b, c and d), and then generate mixtures by mixing five genomes at predefined proportions. As illustrated in Figure S1 (Additional file 2), a is the main clone and the genome is generated by defining a number of segments along the reference, meanwhile each segment is specified by a copy number state including total copy number and major allele copy number. Genome b and c are constructed by introducing new aberrations into genome a, and genome d corresponds to the subclones deriving from the second clonal expansion based on b. We use a normal sample HCC1954BL downloaded from CGHub [29] to generate sequencing data of the simulated genomes by following these steps: 1) For each segment of the genome, reads aligned to the region are randomly and repeatedly sampled from BAM file of the sample HCC1954BL according to the copy number of the segment, 2) nucleotide sequences of the sampled reads are properly modified to match BAF values of the SNPs within each segment, and 3) the processed reads are merged to generate BAM files using SAMtools [30]. We generate mixtures of genomes by sampling reads from the BAM files at predefined proportions, and reads are sampled to 30× coverage at all mixtures. By this way, 20 simulated tumor samples (Table S2, Additional file 3) are generated to evaluate the performance of CLImATHET in detecting clonal and subclonal CNA and LOH segments.
For each simulated sample, the underlying cellularity of clonal clusters are computed based on the relationship between the tumor genomes making up the mixture. For example, the mixture “a_010b_030n_060” is made up by genomes a, b and n with respective proportions of p_{a} = 0.1, p_{b} = 0.3 and p_{n} = 0.6, thus there will be two clonal clusters with the underlying cellularity equal to 0.3 (p_{b}) and 0.4 (p_{a} + p_{b}); similarly, the mixture “a_020b_035c_025n_020” will yield three clonal clusters with cellularity of 0.35 (p_{b}), 0.25 (p_{c}) and 0.8 (p_{a} + p_{b} + p_{c}), while the mixture “a_010b_010d_025n_055” contains three clonal clusters with cellularity of 0.25 (p_{d}), 0.35 (p_{b} + p_{d}) and 0.45 (p_{a} + p_{b} + p_{d}).
Competitive methods
Two advanced methods, OncoSNPSEQ [23] and CLImAT [19], are adopted to make comparison between the performance of CNA and LOH detection algorithms. When running OncoSNPSEQ on the simulated samples, the simulated SNP sites are used to prepare the input file during the preprocessing step, and we run the main program with the option “chr_range [1:22], −normalcontamination, −maxnormalcontamination 0.8, −seqtype illumina” on homogeneous samples, and with an additional option “tumourheterogeniety” on heterogeneous samples. As OncoSNPSEQ generally output multiple solutions per sample, and in this case we select the one associated with the maximum likelihood as the optimal solution. For CLImAT, we use the default configuration on all samples.
Performance evaluation
For the simulated samples, copy number and genotype profiles of all segments predefined in simulation experiment are used as the golden standard for evaluation. Accuracy is calculated for each method by comparing the predictions with the ground truth. We consider a segment to be accurately identified only if any predicted segment covers the 75% size of the segment and have exact matching of total and major copy number with the segment. Accuracy is defined as the proportion of accurately identified segments among all segments.
Results
In this section, we perform a comprehensive assessment of CLImATHET on both simulated and real datasets in terms of inferring cellularity of clonal clusters and identifying CNA/LOH segments.
Results on simulated data
We employ a simulation study to evaluate the accuracy of our estimates of cellularity of clonal clusters, tumor purity and predictions of CNA/LOH segments. Our simulated data is generated based on a real normal sample HCC1954BL downloaded from CGHub as described in the Methods. We run CLImATHET, OncoSNPSEQ and CLImAT to infer CNA/LOH segments and associated cellularity.
Results on real data
We also examine CLImATHET on 5 primary triple negative breast cancer samples, which are sequenced at approximately 30× coverage and also assayed by Affymetrix SNP6.0 array [28]. ASCAT [31] is a widely used software to analyze SNParray data, therefore we use it to infer the tumor purities.
Tumor purity estimated by CLImAT, APOLLOH, CLImATHET and ASCAT for TNBC samples
Sample  CLImAT  APOLLOH^{a}  CLImATHET  ASCAT 

SA220  0.44  0.80  0.62  0.62 
SA223  0.60  0.65  0.66  0.69 
SA224  0.50  0.79  0.52  0.55 
SA225  0.76  0.86  0.75  0.74 
SA227  0.47  0.70  0.50  0.43 
For samples SA220, SA224 and SA225, the measures of goodness of fit returned by ASCAT are 0.95, 0.94 and 0.98 respectively, indicating these samples are probably homogeneous. Accordingly, CLImATHET analysis of these samples shows that there is also no significant evidence for existence of subclonal events, and the inferred cellularity of the single clonal population is 0.62, 0.52 and 0.75 respectively, which are in good concordance with the tumor purities predicted by CLImAT and ASCAT (Table 1). The copy number estimation results of these samples are shown in Figure S24 (Additional file 2).
Discussion
CLImATHET is a novel statistical method for inferring subclonal CNA and LOH segments from WGS data of heterogeneous tumor samples. It is developed based on our previously published algorithm CLImAT, and take into account the issue of intratumor heterogeneity. Compared with CLImAT that only handles homogeneous tumor samples, CLImATHET can efficiently deal with both homogeneous and heterogeneous tumor samples. The read counts and read depths data generated from multiple cell populations is quantitatively represented, and proper emission models are adopted in the HMM. Furthermore, we integrate a BIC module into the CLImATHET framework to effectively determine the underlying number of subclonal populations. These features enable CLImATHET’s advantages in handling complex WGS data. First, the appropriate decomposition of mixed signals improves the CNA/LOH detection performance. Second, CLImATHET is more sensitive to the aberration events represented in minor cell populations when compared to existing methods as shown in simulated study. Third, CLImATHET is able to accurately infer the cellularity each subclonal cluster. Finally, the evaluation results on both simulated and real WGS data demonstrate the advantages of our algorithm.
CLImATHET also has limitations due to its adopted modeling assumptions. The assumption that only one aberrant genotype exists at each genomic locus will not hold if distinct subclonal populations have different aberrant genotypes at the same locus. However, it is difficult to distinguish among multiple subclones that have variable genotypes. The emission models of CLImATHET need to be extended to account for these situations, and the joint analysis of read counts and read depths signals may output multiple solutions. Identifying distinct haplotypes harboring linked mutations might be a good way to identify multiple mutations at the same locus in different populations.
Conclusions
In summary, we demonstrate that CLImATHET represents remarkable advantages in inferring subclonal CNA/LOH segments from tumor WGS data. The high performance of CLImATHET benefits from delicate representation of copy number profiles of distinct cell populations, and efficient statistical methods for inference of the global parameters. We expect that CLImATHET will complement the arsenal of bioinformatics tools developed for investigating the role of tumor tumourigenesis and progression.
Abbreviations
 BAF:

B allele frequency
 BIC:

Bayesian information criterion
 CNA:

Copy number alterations
 EM:

Expectation maximization
 HMM:

Hidden Markov model
 LOH:

Loss of heterozygosity
 NB:

Negative binomial
 SNV:

Singlenucleotide variants
 SV:

Structure variations
 TNBC:

Triple negative breast cancer
 WGS:

Wholegenome sequencing
Declarations
Acknowledgements
This manuscript was prepared using a limited access dataset obtained from British Columbia Cancer Agency Branch (BCCA) and does not necessarily reflect the options or views BCCA.
Funding
This work was supported by grants from National Natural Science Foundation of China (61571414, 61471331 and 31100955).
Availability of data and materials
Our tool can be download from https://github.com/USTCHILAB/CLImATHET.
Authors’ contributions
AL conceived the study. ZY designed and implemented the CLImATHET algorithm, and analyzed the results. ZY wrote the paper, AL and MW improved the manuscript. All authors approved the final manuscript for publication.
Competing interests
The authors declare that they have no competing interests.
Consent for publication
Not applicable.
Ethics approval and consent to participate
This study was approved by the Research Ethics Board of University of Science and Technology of China.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
Authors’ Affiliations
References
 Nowell PC. The clonal evolution of tumor cell populations. Science. 1976;194(4260):23–8.View ArticlePubMedGoogle Scholar
 Greaves M, Maley CC. Clonal evolution in cancer. Nature. 2012;481(7381):306–13.View ArticlePubMedPubMed CentralGoogle Scholar
 Yates LR, Campbell PJ. Evolution of the cancer genome. Nat Rev Genet. 2012;13(11):795–806.View ArticlePubMedPubMed CentralGoogle Scholar
 Stratton MR, Campbell PJ, Futreal PA. The cancer genome. Nature. 2009;458(7239):719–24.View ArticlePubMedPubMed CentralGoogle Scholar
 Hanahan D, Weinberg RA. Hallmarks of cancer: the next generation. Cell. 2011;144(5):646–74.View ArticlePubMedGoogle Scholar
 BentiresAlj M, Gil SG, Chan R, Wang ZC, Wang Y, Imanaka N, Harris LN, Richardson A, Neel BG, Gu H. A role for the scaffolding adapter GAB2 in breast cancer. Nat Med. 2006;12(1):114–21.View ArticlePubMedGoogle Scholar
 Hicks J, Krasnitz A, Lakshmi B, Navin NE, Riggs M, Leibu E, Esposito D, Alexander J, Troge J, Grubor V. Novel patterns of genome rearrangement and their association with survival in breast cancer. Genome Res. 2006;16(12):1465–79.View ArticlePubMedPubMed CentralGoogle Scholar
 Park PJ. Experimental design and data analysis for array comparative genomic hybridization. Cancer Invest. 2008;26(9):923–8.View ArticlePubMedGoogle Scholar
 Li A, Liu Z, LezonGeyda K, Sarkar S, Lannin D, Schulz V, Krop I, Winer E, Harris L, Tuck D. GPHMM: an integrated hidden Markov model for identification of copy number alteration and loss of heterozygosity in complex tumor samples using whole genome SNP arrays. Nucleic Acids Res. 2011;39(12):4928–41.View ArticlePubMedPubMed CentralGoogle Scholar
 Metzker ML. Sequencing technologiesthe next generation. Nat Rev Genet. 2010;11(1):31–46.View ArticlePubMedGoogle Scholar
 McCarroll SA, Kuruvilla FG, Korn JM, Cawley S, Nemesh J, Wysoker A, Shapero MH, de Bakker PI, Maller JB, Kirby A. Integrated detection and populationgenetic analysis of SNPs and copy number variation. Nat Genet. 2008;40(10):1166–74.View ArticlePubMedGoogle Scholar
 Shendure J, Ji H. Nextgeneration DNA sequencing. Nat Biotechnol. 2008;26(10):1135–45.View ArticlePubMedGoogle Scholar
 Ha G, Roth A, Lai D, Bashashati A, Ding J, Goya R, Giuliany R, Rosner J, Oloumi A, Shumansky K. Integrative analysis of genomewide loss of heterozygosity and monoallelic expression at nucleotide resolution reveals disrupted pathways in triplenegative breast cancer. Genome Res. 2012;22(10):1995–2007.View ArticlePubMedPubMed CentralGoogle Scholar
 Carter SL, Cibulskis K, Helman E, McKenna A, Shen H, Zack T, Laird PW, Onofrio RC, Winckler W, Weir BA. Absolute quantification of somatic DNA alterations in human cancer. Nat Biotechnol. 2012;30(5):413–21.View ArticlePubMedPubMed CentralGoogle Scholar
 Mitelman F. Recurrent chromosome aberrations in cancer. Mutat Res Rev Mutat Res. 2000;462(2):247–53.View ArticleGoogle Scholar
 Storchova Z, Pellman D. From polyploidy to aneuploidy, genome instability and cancer. Nat Rev Mol Cell Biol. 2004;5(1):45–54.View ArticlePubMedGoogle Scholar
 Storchova Z, Kuffer C. The consequences of tetraploidy and aneuploidy. J Cell Sci. 2008;121(23):3859–66.View ArticlePubMedGoogle Scholar
 Navin N, Krasnitz A, Rodgers L, Cook K, Meth J, Kendall J, Riggs M, Eberling Y, Troge J, Grubor V. Inferring tumor progression from genomic heterogeneity. Genome Res. 2010;20(1):68–80.View ArticlePubMedPubMed CentralGoogle Scholar
 Yu Z, Liu Y, Shen Y, Wang M, Li A. CLImAT: accurate detection of copy number alteration and loss of heterozygosity in impure and aneuploid tumor samples using wholegenome sequencing data. Bioinformatics. 2014;30(18):2576–83.View ArticlePubMedPubMed CentralGoogle Scholar
 Larson NB, Fridley BL. PurBayes: estimating tumor cellularity and subclonality in nextgeneration sequencing data. Bioinformatics. 2013;29(15):1888–9.View ArticlePubMedPubMed CentralGoogle Scholar
 Oesper L, Satas G, Raphael BJ. Quantifying tumor heterogeneity in wholegenome and wholeexome sequencing data. Bioinformatics. 2014;30(24):3532–40.View ArticlePubMedPubMed CentralGoogle Scholar
 Ha G, Roth A, Khattra J, Ho J, Yap D, Prentice LM, Melnyk N, McPherson A, Bashashati A, Laks E. TITAN: inference of copy number architectures in clonal cell populations from tumor wholegenome sequence data. Genome Res. 2014;24(11):1881–93.View ArticlePubMedPubMed CentralGoogle Scholar
 Yau C. OncoSNPSEQ: a statistical approach for the identification of somatic copy number alterations from nextgeneration sequencing of cancer genomes. Bioinformatics. 2013;29(19):2482–4.View ArticlePubMedGoogle Scholar
 Dempster AP, Laird NM, Rubin DB. Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B Methodol. 1977;1977:1–38.Google Scholar
 Rabiner LR. A tutorial on hidden Markov models and selected applications in speech recognition. Proc IEEE. 1989;77(2):257–86.View ArticleGoogle Scholar
 Ypma TJ. Historical development of the NewtonRaphson method. SIAM Rev. 1995;37(4):531–51.View ArticleGoogle Scholar
 CLImATHET. https://github.com/USTCHILAB/CLImATHET. Accessed 8 Dec 2015.
 Shah SP, Roth A, Goya R, Oloumi A, Ha G, Zhao Y, Turashvili G, Ding J, Tse K, Haffari G. The clonal and mutational evolution spectrum of primary triplenegative breast cancers. Nature. 2012;486(7403):395–9.PubMedGoogle Scholar
 TCGA Mutation/Variation Calling Benchmark 4 at GDC. https://gdc.cancer.gov/resourcestcgausers/tcgamutationcallingbenchmark4files. Accessed 12 Mar 2017.
 Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R. The sequence alignment/map format and SAMtools. Bioinformatics. 2009;25(16):2078–9.View ArticlePubMedPubMed CentralGoogle Scholar
 Van Loo P, Nordgard SH, Lingjærde OC, Russnes HG, Rye IH, Sun W, Weigman VJ, Marynen P, Zetterberg A, Naume B. Allelespecific copy number analysis of tumors. Proc Natl Acad Sci. 2010;107(39):16910–5.View ArticlePubMedPubMed CentralGoogle Scholar