Loss of heterozygosity: what is it good for?

Background: Loss of heterozygosity (LOH) is a common genetic event in cancer development, and is known to be involved in the somatic loss of wild-type alleles in many inherited cancer syndromes. The wider involvement of LOH in cancer is assumed to relate to unmasking a somatically mutated tumour suppressor gene through loss of the wild-type allele.
Methods: We analysed 86 ovarian carcinomas for mutations in 980 genes selected on the basis of their location in common regions of LOH.
Results: We identified 36 significantly mutated genes, but these could only partly account for the quanta of LOH in the samples. Using our own and TCGA data we then evaluated five possible models to explain the selection for non-random accumulation of LOH in ovarian cancer genomes:
1. Classic two-hit hypothesis: high frequency biallelic genetic inactivation of tumour suppressor genes.
2. Epigenetic two-hit hypothesis: biallelic inactivation through methylation and LOH.
3. Multiple alternate-gene biallelic inactivation: low frequency gene disruption.
4. Haplo-insufficiency: single copy gene disruption.
5. Modified two-hit hypothesis: reduction to homozygosity of low penetrance germline predisposition alleles.
We determined that while high-frequency biallelic gene inactivation under model 1 is rare, regions of LOH (particularly copy-number neutral LOH) are enriched for deleterious mutations and increased promoter methylation, while copy-number loss LOH regions are likely to contain under-expressed genes suggestive of haploinsufficiency. Reduction to homozygosity of cancer predisposition SNPs may also play a minor role.
Conclusion: It is likely that selection for regions of LOH depends on its effect on multiple genes. Selection for copy number neutral LOH may better fit the classic two-hit model whereas selection for copy number loss may be attributed to its effect on multi-gene haploinsufficiency. LOH mapping alone is unlikely to be successful in identifying novel tumour suppressor genes; a combined approach may be more effective.
Electronic supplementary material: The online version of this article (doi:10.1186/s12920-015-0123-z) contains supplementary material, which is available to authorized users.


Background
Cancer cells undergo multiple genetic and epigenetic hits in the development of tumorigenic phenotypes, including somatic point mutations, increases in copy number, gene deletions, gene rearrangements, translocations and promoter hypermethylation [1]. These random events are selected for due to their effect on oncogenes, where the aberration activates the gene to promote tumorigenesis (e.g. KRAS, MYC), and on tumour suppressor genes (TSGs), where the genetic or epigenetic aberration is inactivating (e.g. TP53, PTEN), since the normal function of these genes is to restrict tumorigenic potential.
Loss of heterozygosity (LOH) is a common genetic event in many cancer types, so-called because of the early observations of a change in polymorphic markers from a heterozygous state in the germline to an apparently homozygous state in the tumour DNA [2]. LOH is a general term that encompasses both LOH with copy number loss (CNL-LOH) and copy number neutral LOH (CNN-LOH). In CNL-LOH all or part of a chromosome is deleted. CNN-LOH originates either through a homologous recombination event ("gene conversion"), or because the retained chromosome was duplicated either before or after the LOH event. LOH is strongly associated with loss of the wild-type allele in individuals who have an inherited cancer predisposition syndrome and carry a germline mutation in genes such as RB1 in retinoblastoma or BRCA1 in breast and ovarian cancer [2,3]. This "second hit" hypothesis was initially proposed by Knudson based on his observations of the incidence of familial retinoblastoma [4] and has been widely accepted as a mechanism for the complete inactivation of tumour suppressor genes, both in a germline context and in the sporadic cancer context where the first hit is a somatic event, such as mutation of TP53. As a consequence, mapping of common regions of minimal LOH has historically been a popular strategy to pursue the identification of novel TSGs without the need for segregation data from large cancer families. However, such analyses have generally been unsuccessful, leading to speculation that the approach is technically and conceptually flawed [5], and even to doubt as to whether there is any selective advantage to LOH events. Nonetheless, we previously used SNP mapping arrays to analyse LOH in ovarian carcinomas of diverse histological subtypes, with the rationale that the newer methodology would at least overcome some of the previous technical issues with LOH analyses [6]. 
We mapped a number of minimal regions of LOH containing tumour suppressor gene candidates, including regions of homozygous deletion encompassing genes such as MAP2K4 [7]. Advances in massively parallel sequencing have enabled the current study, in which we report targeted sequencing of 980 candidate tumour suppressor genes in 86 ovarian carcinomas. Most of these have matched SNP array data, enabling assessment of the importance of LOH in the selection for somatic mutations in ovarian cancer. We evaluated a number of different histological subtypes, since these have different etiologies and causative genes.

Ethics statement
Accrual and use of patient material for this study was approved by the following Human Research

Ovarian tumour cohort
A tumour cohort (n = 86) comprising a variety of histological subtypes including serous (n = 45), endometrioid (n = 28), mucinous (n = 7) and clear cell (n = 6) was obtained through the Australia Ovarian Cancer Study, the Peter MacCallum Cancer Centre Tissue Bank, or from patients presenting to hospitals in the south of England [8]. The majority of tumour DNA samples were needle microdissected to ensure a cancer epithelial cell component of greater than 70 %; other samples were processed from tissue where the reference haematoxylin and eosin stained section showed >70 % tumour epithelial cells.
Matching peripheral blood samples were also collected from patients at time of tumour collection and used as a source of germline DNA for somatic mutation detection. Details of the cohort are listed in Additional file 1: Table S1.

Library preparation, target enrichment and sequencing
Library preparation was performed as previously described [9] following the Illumina genomic DNA library preparation protocol (Illumina, San Diego, CA) using an input of 200 ng of tumour or matched normal lymphocyte DNA. Seven custom multiplexing adapters compatible with Illumina single-end sequencing were used and indexed DNA samples were pooled equally prior to PCR enrichment. A boutique exon capture (SureSelect, Agilent Technologies, Santa Clara, CA) was used to enrich for coding exons of candidate tumour suppressor genes (n = 980, Additional file 1: Tables S2 and S3) and known cancer genes (TP53, BRCA1, BRCA2) according to the recommended protocol. Capture probes were designed using default parameters in eArray (Agilent Technologies).
Sequencing of target-enriched DNA libraries was performed using an Illumina GAIIx, generating 75 bp single-end sequence reads. Image analysis and base calling were performed using the Genome Analyser Pipeline v1.5-1.7. Sequence reads were aligned to the human reference genome (GRCh37/hg19 assembly) using BWA [10] and any remaining unmapped reads were aligned with Novoalign [11]. The mean coverage for bases within target regions was 70-fold and 92 % had at least 10-fold coverage. Alignment was followed by local realignment with GATK [12]. Point mutations and insertions/deletions (indels) were identified using GATK and Dindel [13] respectively, and annotated according to Ensembl release 56. Sequence variants were called as somatic alterations only when (i) the variant was not called in the matched normal sample or identified as a germline alteration in another tumour/normal pair; (ii) the variant was not seen in ≥2 independent reads in the matched normal sample following manual inspection of sequence reads using the Integrative Genomics Viewer [14]; and (iii) the variant was identified in bi-directional sequence reads.
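The three somatic-calling criteria above can be sketched as a simple filter. This is a minimal illustration only: the `Candidate` record and its field names are hypothetical and do not come from the pipeline used in the study, and criterion (ii) involved manual inspection in the original work, which a read count can only approximate.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical per-variant summary; field names are illustrative."""
    called_in_matched_normal: bool  # variant called in the patient's own normal sample
    germline_in_other_pair: bool    # called as germline in another tumour/normal pair
    normal_variant_reads: int       # independent variant reads seen in the matched normal
    forward_reads: int              # tumour reads supporting the variant, forward strand
    reverse_reads: int              # tumour reads supporting the variant, reverse strand


def is_somatic(v: Candidate) -> bool:
    # (i) not called in the matched normal, nor germline in another pair
    if v.called_in_matched_normal or v.germline_in_other_pair:
        return False
    # (ii) not seen in two or more independent reads in the matched normal
    if v.normal_variant_reads >= 2:
        return False
    # (iii) supported by reads in both directions
    return v.forward_reads > 0 and v.reverse_reads > 0


example = Candidate(False, False, 1, 12, 9)
print(is_somatic(example))
```

A single variant read in the normal sample is tolerated here because the stated cut-off is two or more independent reads.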
A selection of variants that met the above criteria for a somatic mutation (n = 202) was subjected to validation by conventional PCR amplification and bidirectional capillary electrophoresis on the ABI3130 Genetic Analyser using BigDye Terminator v3.1 sequencing chemistry (Applied Biosystems, Foster City, CA).

SNP arrays and loss of heterozygosity
Affymetrix SNP Mapping array data was obtained for the 86 sequenced cases, 54 by SNP6 arrays (GSE19539, [15]), 26 by 500 K arrays (previously published in [6]), and six previously unreported low-grade endometrioid cases. Affymetrix SNP6 CEL files, HM27 methylation array data (level 3), Agilent expression array data (level 3) and somatic mutation data from 266 tumors generated by The Cancer Genome Atlas (TCGA) were downloaded from the TCGA Data Portal. LOH was detected as described previously in Partek Genomics Suite (Partek, St Louis, MO), using allele-specific copy number that compared the tumour genotype to the matching normal genotype, and evaluated the copy number at heterozygous alleles [6]. The "min" allele had to have a value of <0.5 copies to be called LOH, thus excluding regions of allelic imbalance where at least one copy of both alleles was present. The results published here are in whole or part based upon data generated by The Cancer Genome Atlas pilot project established by the NCI and NHGRI. Information about TCGA and the investigators and institutions who constitute the TCGA research network can be found at http://cancergenome.nih.gov.
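The LOH calling rule can be illustrated with a short sketch. Only the <0.5 copy cut-off on the "min" allele comes from the text; the split into CNL-LOH and CNN-LOH by total copy number, and the `neutral_total` value of 1.5, are assumptions made here for illustration and are not the settings used in Partek Genomics Suite.

```python
LOH_MIN_ALLELE_MAX = 0.5  # from the text: the "min" allele must be < 0.5 copies


def is_loh(min_allele_cn: float) -> bool:
    """LOH call at a germline-heterozygous SNP from allele-specific copy number."""
    return min_allele_cn < LOH_MIN_ALLELE_MAX


def loh_type(min_allele_cn: float, max_allele_cn: float,
             neutral_total: float = 1.5) -> str:
    """Classify a locus; neutral_total is a hypothetical cut-off, not from the paper."""
    if not is_loh(min_allele_cn):
        # Covers allelic imbalance where both alleles are still present
        return "no LOH"
    total = min_allele_cn + max_allele_cn
    return "CNN-LOH" if total >= neutral_total else "CNL-LOH"


print(loh_type(0.1, 1.9))  # retained allele duplicated
print(loh_type(0.1, 1.0))  # one copy lost
print(loh_type(0.8, 1.2))  # imbalance only
```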

Results and discussion
A candidate TSG screen in ovarian cancer: selection of genes from LOH regions
Candidate ovarian tumour suppressor genes (n = 980) were selected for analysis on the basis of their location in frequent regions of LOH or deletion (Additional file 1: Tables S2 and S3) from our previously published SNP array analysis of 122 primary ovarian carcinomas of various histologies [6]. The regions, located on 20 different chromosome arms, were chosen using three criteria. Firstly, minimal overlapping regions of LOH were included if they were detected in greater than 35 % of all ovarian carcinomas analysed. Secondly, subtype-specific minimal overlapping regions of LOH were included if they were detected in >35 % of cases of that subtype (4 of 9 clear cell carcinomas, 5 of 12 low-grade endometrioid carcinomas (grades 1 and 2), 6 of 16 mucinous carcinomas and 23 of 64 high-grade serous/endometrioid carcinomas (grades 2 and 3 for the serous subtype, grade 3 for endometrioid carcinomas)). Finally, all homozygous deletions within frequent regions of LOH, along with the overlapping portion of all recurrent homozygous deletions, were included. This gene list included genes with well established roles in cancer such as CDKN2A and PTEN, but for the purposes of this analysis they were included in the "candidate" LOH genes. In addition, the known ovarian cancer genes TP53, BRCA1 and BRCA2 were included despite lying outside the minimal regions of LOH.
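As a check on the subtype-specific criterion, the smallest number of cases that exceeds 35 % of each subtype group can be computed directly. The function name is ours; the group sizes are those quoted in the text, and the results coincide with the case counts given there.

```python
def min_cases_over(n_cases: int, percent: int = 35) -> int:
    """Smallest case count that is strictly greater than `percent` % of n_cases.

    Integer arithmetic avoids floating-point rounding at exact boundaries.
    """
    return n_cases * percent // 100 + 1


group_sizes = {
    "clear cell": 9,
    "low-grade endometrioid": 12,
    "mucinous": 16,
    "high-grade serous/endometrioid": 64,
}
thresholds = {name: min_cases_over(n) for name, n in group_sizes.items()}
print(thresholds)
# {'clear cell': 4, 'low-grade endometrioid': 5, 'mucinous': 6,
#  'high-grade serous/endometrioid': 23}
```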
A candidate TSG screen in ovarian cancer: correlation of mutations with LOH
A targeted mutation screen was conducted on the 86 ovarian cancer cases including high-grade serous and endometrioid, low-grade endometrioid, clear cell and mucinous subtypes. Somatic coding mutations were detected in both candidate (561 variants in 366 genes, Additional file 2: Table S4) and known cancer genes (58 TP53 mutations in 56 cases and two mutations in BRCA1). Eighty-nine genes had two or more nonsynonymous mutations. The classic two-hit hypothesis would predict that driver genes have homozygous, deleterious mutations in samples with LOH. With respect to deleterious mutation status this was certainly true for TP53 and BRCA1, where a high proportion of somatic mutations were truncating (25/58 and 2/2, respectively) compared to an overall truncating mutation frequency of 13 % (72/561). In addition, among the 53 cases with TP53 or BRCA1 somatic mutations where SNP data was available, 50 (94 %) showed LOH of the wild-type allele. This was in sharp contrast with the other candidate genes, where only 181/520 showed LOH of the wild-type allele (35 %); in particular, there was no significant difference in LOH of the wild-type allele between non-synonymous mutations (134/381 with LOH, 35.2 %) and synonymous mutations (47/139 with LOH, 33.8 %). The overall frequency of non-synonymous compared to synonymous mutations was 73 % (411/561) for the candidate TSGs, but 100 % of mutations in known cancer genes were non-synonymous (60/60). This difference in ratio suggests that the majority of mutations in candidate TSGs from LOH regions are likely to be passenger events, since this rate might be expected without any strong positive selection [16]. The lack of difference in LOH between synonymous and non-synonymous mutations also implies that there is limited selection for homozygosity for the majority of gene mutations.
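The comparison of wild-type allele LOH between non-synonymous (134/381) and synonymous (47/139) mutations can be reproduced with a 2x2 test. The text does not state which test was used, so the two-sided Fisher's exact test below is an assumption, written with the standard library only.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables that are no more
    likely than the observed one, with the margins held fixed.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    cutoff = prob(a) * (1 + 1e-9)  # small tolerance for floating-point ties
    return min(1.0, sum(p for p in map(prob, range(lo, hi + 1)) if p <= cutoff))


# Rows: non-synonymous, synonymous; columns: with LOH, without LOH
p_value = fisher_exact_two_sided(134, 381 - 134, 47, 139 - 47)
print(f"p = {p_value:.3f}")
```

A p-value well above 0.05 here is consistent with the statement that the two classes do not differ in LOH of the wild-type allele.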

Significance analysis of recurrently mutated gene candidates
Within the list of mutated genes, we applied a number of filters to assess whether any genes could function as tumor suppressors under either a one-hit or two-hit mechanism. Firstly, significantly mutated genes were identified using the MuSiC algorithm [17], which determines the significance of the observed mutation rate of each gene based on the background mutation rate in the sample cohort. Three known ovarian cancer genes (TP53, PTEN and CDKN2A) were identified by all three tests (convolution, likelihood ratio and Fisher's combined p-value tests) with a false discovery rate (FDR) of less than 0.10. At this FDR the genes DNAH9, LINGO1, MEF2C, SAMD11, STARD5, ZNRF4 and ZNF287 were also identified, although each was supported only by the likelihood ratio test.
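To illustrate what an FDR cut-off of 0.10 means in practice, the sketch below applies the Benjamini-Hochberg adjustment to a list of per-gene p-values. This is a generic illustration: we assume, but have not confirmed, that it resembles the correction applied inside MuSiC, and the gene labels and p-values are invented placeholders rather than results from this study.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Placeholder genes and p-values, for illustration only
raw = {"GENE_A": 0.001, "GENE_B": 0.02, "GENE_C": 0.03, "GENE_D": 0.5}
q = dict(zip(raw, benjamini_hochberg(list(raw.values()))))
significant = [gene for gene, qval in q.items() if qval < 0.10]
print(significant)
```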
Finally, genes recurrently targeted by inactivating mutations were identified. Mutations with overtly deleterious consequences were considered for this analysis, including nonsense and essential splice site mutations, frameshift indels and gene deletions. Although missense amino acid changes and in-frame indels can also negatively impact gene function, interpreting these mutations in the absence of functional validation is challenging. Sixteen genes were identified where more than half of their mutations would be considered clearly deleterious, including seven known ovarian cancer genes (PTEN, CDKN2A, MAP2K4, PIK3R1, RB1, FANCA and BRCA1). These three analyses identified 36 genes as possible tumour suppressors (Table 1), and it was notable that seven well characterised tumour suppressors were identified by at least two of the three methods (BRCA1, TP53, RB1, PTEN, CDKN2A, FANCA, and MAP2K4), although others were only identified by one method (NF1, PIK3R1). In contrast, 22 of the 27 (81 %) novel/less well characterized genes were identified by only 1 method, indicating that regions of LOH are not strongly enriching for novel genes with classic tumor suppressor gene characteristics.

Loss of heterozygosity: what is it good for?
From the data above it appears that we did not identify dominant, very frequently mutated novel genes where selection for a classic two-hit tumour suppressor gene was apparent. So what, if anything, is the LOH for? We considered five possibilities (Fig. 1) and assessed each in turn.
1. Classic two-hit hypothesis: high frequency biallelic genetic inactivation of TSG
2. Epigenetic two-hit hypothesis: biallelic inactivation through methylation and LOH
3. Multiple alternate gene biallelic inactivation: low frequency gene disruption
4. Haplo-insufficiency: single copy gene disruption
5. Modified two-hit hypothesis: reduction to homozygosity of predisposition alleles

Classic two-hit hypothesis: high frequency biallelic genetic inactivation of TSG
This mechanism is demonstrably true for many known tumour suppressor genes, with TP53 being a clear example of a gene functioning as a classical TSG in ovarian cancer [18,19]. However, from our data and large published studies such as TCGA, it is clear that novel genes with a high frequency of biallelic mutations are exceedingly rare and cannot explain the bulk of the observed LOH. For example, 8p undergoes LOH in >40 % of ovarian carcinomas, but no gene in this region is mutated at a frequency higher than 3 % in our or any other study, although homozygous deletion can target, for example, CSMD1 in 11 % of cases [20]. It remains a possibility, however, that genes not represented on our targeted or exome sequencing platforms, for example long non-coding RNAs, could still be the target of such LOH.

Epigenetic two-hit hypothesis: high frequency biallelic inactivation through methylation and LOH
Somatic gene mutation is not the only mechanism of biallelic inactivation. Some TSGs, for example MLH1, can be inactivated through a combination of LOH and promoter hypermethylation. This methylation can be acquired somatically or may be a consequence of imprinting. We assessed this possibility using TCGA ovarian cancer methylation data. Globally, we observed that there was no enrichment for methylation in regions of LOH: in samples with LOH at a locus, on average 12.7 % of genes were strongly methylated (probe value of >0.75), whereas 13.65 % of genes were strongly methylated when there was no LOH (Fig. 2a). CNL-LOH was less likely to have strongly methylated genes than CNN-LOH (12.3 % vs 12.9 %, p < 0.0001, Chi-squared test). However, when we analysed the X chromosome separately, we found that samples with any LOH were more likely to have low methylation levels (45.3 % of genes had a probe value of <0.25, compared to 35.5 % in samples without LOH, p < 0.0001, Chi-squared test).
Detection of methylation is challenging from both technical and biological perspectives. Tumour and cell type heterogeneity may influence the degree of methylation detected, so we also took an alternative approach where we used the methylation array data to test whether there were genes that were more strongly methylated in samples with LOH compared to samples without LOH. Using a multiple testing corrected p-value threshold of 2.2 × 10⁻⁶, there were 1584/22374 (7 %) methylation probes that were significantly differentially methylated. Interestingly, 28 % of these significant probes were located on the X chromosome and indeed 51 % of all probes on the X chromosome were significantly differentially methylated, with lower average levels of methylation in samples with LOH compared to samples without LOH. On the autosomes, the outcome was reversed: 50.3 % of the statistically significant probes had a fold-change difference in mean methylation of >1.5, while only 1.1 % had a fold-change difference of <0.75 (Fig. 2b). Thus, for the X chromosome it appears there is selection for retaining the active copy, perhaps because loss of this copy would be cell lethal as an effective homozygous inactivation of the chromosome. In contrast, for the autosomes there appears to be selection for increased methylation by LOH. We also evaluated whether there was any difference by the type of LOH. For those genes occurring in a region of LOH with at least 20 % frequency, we determined whether a probe was in a CNL-LOH enriched locus (>66 % of samples with LOH also had CN loss) or a CNN-LOH enriched locus (>66 % of samples with LOH were CNN). Of the CNL-enriched probes, 11.3 % were significantly differentially methylated, compared to 21.7 % of CNN-enriched probes (p < 0.0001, Chi-squared test, Fig. 2c). These data would support a model whereby differential methylation is more commonly selected for in regions of CNN-LOH than CNL-LOH.
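Two of the rules in this paragraph lend themselves to a small sketch. The threshold of 2.2 × 10⁻⁶ is numerically what a Bonferroni correction at α = 0.05 over 22,374 probes would give; the text does not name the correction, so that reading is our assumption. The locus classification follows the 20 % and 66 % cut-offs stated above, with function names and the "mixed" and "not assessed" labels introduced here for illustration.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


def locus_loh_class(n_loh: int, n_cnl: int, n_samples: int,
                    min_freq: float = 0.20, enrich: float = 0.66) -> str:
    """Label a locus by the dominant type of LOH.

    n_loh: samples with LOH at the locus; n_cnl: those that also show
    copy-number loss (the rest are treated as copy-number neutral).
    """
    if n_samples == 0 or n_loh / n_samples < min_freq:
        return "not assessed"  # LOH frequency below 20 %
    frac_cnl = n_cnl / n_loh
    if frac_cnl > enrich:
        return "CNL-enriched"
    if 1 - frac_cnl > enrich:
        return "CNN-enriched"
    return "mixed"


print(f"{bonferroni_threshold(0.05, 22374):.2e}")
print(locus_loh_class(n_loh=100, n_cnl=80, n_samples=266))
```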

Multiple alternate gene biallelic inactivation: low frequency gene disruption
Another possibility is that particular loci harbour multiple TSGs, but individual tumours only require one to be inactivated and the gene targeted can differ from tumour to tumour. If this is the case, then locating the TSGs by mapping overlapping regions of LOH would incorrectly flag the interval between two TSGs as the likely location of the TSG; in effect, the peak LOH regions may not be the most likely places to find the targeted gene(s). To evaluate this possibility, we used TCGA data to see whether regions of LOH were enriched for somatic mutations on a sample-by-sample basis. Cases with both somatic exome and SNP array data were used (n = 266). There were 13,148 coding somatic mutations, of which 29.7 % were located within a region of LOH in the sample where it was observed. The average overlap of all the genes assayed with regions of LOH per sample was 35.5 %. Thus, somatic mutations are, if anything, under-represented in regions of LOH (Binomial test p < 0.0001). Given that most of these mutations are likely to be passengers, we evaluated whether this was true for non-synonymous or overtly deleterious mutations (nonsense, frameshift, essential splice site). For deleterious mutations, 38.2 % were in regions of LOH (p = 0.035, Binomial test), whereas only 25.2 % of the non-synonymous mutations were in regions of LOH, similar to the 22.2 % observed for synonymous mutations. The signal for deleterious mutations was substantially reduced if TP53 was excluded (34.9 % of deleterious and 28.5 % of other non-synonymous mutations had LOH). Thus, regions of LOH are slightly enriched for deleterious mutations, but not for other non-synonymous or silent mutations.
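The under-representation of mutations in LOH regions can be illustrated with a one-sided binomial calculation. This is a simplified pooled version and may differ from how the original test was set up: the count of mutations in LOH is reconstructed here from the rounded 29.7 %, and the expected proportion is taken as the 35.5 % average overlap.

```python
from math import exp, lgamma, log


def log_binom_pmf(k: int, n: int, p: float) -> float:
    """Natural log of the binomial probability P(X = k)."""
    return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            + k * log(p) + (n - k) * log(1 - p))


def binom_lower_tail(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), summed in log space for stability."""
    logs = [log_binom_pmf(i, n, p) for i in range(k + 1)]
    peak = max(logs)
    return min(1.0, exp(peak) * sum(exp(v - peak) for v in logs))


n_mutations = 13148
in_loh = round(0.297 * n_mutations)  # reconstructed from the rounded percentage
p_under = binom_lower_tail(in_loh, n_mutations, 0.355)
print(in_loh, p_under < 0.0001)
```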
We then evaluated whether there was a difference in mutation frequency in CNN versus CNL regions of LOH on a case by case basis as above (Fig. 3). Excluding TP53, there were fewer mutations in regions of CNL-LOH than would have been expected based on the overall percentage of the exome affected (19.8 % of mutations were in CNL-LOH regions, whereas 26.8 % of the exome was affected by CNL-LOH, p < 0.0001, Binomial test). The difference was less striking when considering overtly deleterious mutations only (24.6 % vs 26.8 %, p = 0.09, Binomial test). For CNN-LOH, the overall difference was small (8.8 % of mutations vs 8.7 % of the exome affected by CNN-LOH, p = 0.7, Binomial test); however, there were more deleterious mutations than expected. It is possible, therefore, that mutations are seen less often in CNL-LOH regions simply as a consequence of decreased DNA dosage. The enrichment of deleterious mutations in CNN-LOH regions, however, suggests the presence of positive selection for mutations in TSGs.
Fig. 1 Models of LOH. Boxes = genes; "X" = inactivating mutation; A, B = alternative alleles of a single nucleotide polymorphism. In the top panels, the black line on the graph represents the overall frequency of LOH observed in tumour samples across the chromosome, while the red bars are the frequency of mutation in a particular gene. Thus, for the classic two-hit model, the frequency of mutation is similar to the frequency of LOH, while in the low frequency model, the frequency of LOH is higher than the mutation rate, because each sample is mutated in a different gene. In the bar graphs below, at left, the red bars represent the frequency of the A allele that is retained in samples with LOH at the locus; thus, the risk locus (*) has a higher proportion of the risk allele (A) retained after LOH compared to a non-risk locus, where the A and B alleles are equally retained. At right, the graph represents the average reduction in expression of a gene in samples with LOH, compared to samples without LOH; genes in LOH regions show a reduction in expression.

Haplo-insufficiency: single copy gene disruption
We and others have shown that loss of a single gene copy can reduce gene expression [6,21]. A recent study showed that regions of copy number loss are enriched for tumour-suppressor genes [22], but that each gene might have a limited effect on its own. Chromosome complementation studies, where all or part of a chromosome is introduced into cell lines with LOH of that chromosome via microcell-mediated monochromosome transfer, have frequently been able to show reduction in tumorigenicity of the cell line thus complemented [23,24], but have only rarely been able to implicate a single gene responsible [25,26]. Thus, haplo-insufficiency of multiple genes, each with a small effect, could contribute to the non-random pattern of LOH observed in ovarian cancer, especially for chromosomal regions that are weighted towards CNL-LOH such as 8p and X, rather than CNN-LOH, such as 17.
We previously observed a correlation between the percentage of genes under-expressed and the percentage of cases with CNL-LOH, as opposed to CNN-LOH, in a region-wise comparison of LOH vs. no LOH [6]. In an analysis of TCGA data, we compared the expression of … (Fig. 4). This result supports the idea that chromosomal regions with CNL-LOH may contain genes where loss of a single copy results in reduced gene expression and a selective advantage to the cell. In contrast, chromosomal regions with little copy number loss may contain essential genes for which haplo-insufficiency is cell lethal.

Modified two-hit hypothesis: reduction to homozygosity of predisposition alleles
In familial cancer predisposition syndromes, it is common for the remaining wild-type allele to be lost by LOH; for example, BRCA1 pathogenic variants are usually reduced to homozygosity in breast and ovarian carcinomas [3,27]. However, common low-penetrance risk alleles could also be targeted by LOH, leading to an enhancement of their cancer-promoting role. We assessed this using nine SNP loci identified in the iCOGs study [28,29] as predisposing to all ovarian cancer types or high-grade serous ovarian cancer. Two of these SNPs were present on the Affymetrix SNP 6.0 array; the remainder were represented by SNPs in linkage disequilibrium (r² > 0.7, from HapMap [30]). Where possible, up to four linked SNPs were evaluated.
For each SNP, we identified cases that were heterozygous in their normal DNA and, among those with LOH of the region, determined what proportion were homozygous for the risk allele in the tumour DNA, using TCGA and our own data (n = 364). Interestingly, two SNPs at 10p12 linked to the risk allele rs1232180 were found to be significantly more likely to have lost the non-risk allele than the risk allele (Table 2). Some other SNPs linked to a risk allele also showed significantly non-random loss of the non-risk allele, but the data were not consistent across all SNPs examined at the locus (e.g. 17q21, 3q25 and 9p22). It is not clear whether these discrepancies could be due to technical variation in the SNP calling; alternative methods may be required to assess this possibility. The remainder of the SNPs were not significant; however, several are uncommon, limiting the power of the analysis. Thus, it is possible that some LOH may be selected for through the phenotypic effect of reduction to homozygosity of predisposition alleles.
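Whether the risk allele is retained more often than chance can be framed as an exact binomial test against an expected proportion of 0.5, since either allele should be equally likely to be lost under random LOH. The text does not specify the test used for Table 2, so this is an assumed approach, and the counts below are invented for illustration rather than taken from the table.

```python
from math import comb


def binom_two_sided(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial test: sum of outcomes no more likely than k."""
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    cutoff = probs[k] * (1 + 1e-9)  # small tolerance for floating-point ties
    return min(1.0, sum(q for q in probs if q <= cutoff))


def risk_allele_retention_test(n_risk_retained: int, n_nonrisk_retained: int) -> float:
    """Test among germline-heterozygous cases with LOH at the SNP."""
    n = n_risk_retained + n_nonrisk_retained
    return binom_two_sided(n_risk_retained, n)


# Illustrative counts only: 30 tumours retained the risk allele, 10 the other
p_retention = risk_allele_retention_test(30, 10)
print(f"p = {p_retention:.4f}")
```

With rare risk alleles the number of informative heterozygous cases is small, which is one reason the power of such a test is limited.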

Conclusion
The broader relevance of LOH in cancer has been debated for some time [5,31], although many of the criticisms stemmed from technical issues that are being overcome by newer methodologies. Our initial assumption for this study was that we would detect high-frequency mutated genes in the minimal peak regions of LOH we had defined by LOH mapping using these newer methodologies; i.e. a classic two-hit model. However, the biology of LOH does not support this assumption and with large-scale tumour studies it is now possible to explore the many possibilities for the functional significance of this genetic event, as summarised in Table 3. We suggest that the non-random patterns of LOH detected in cancer are a result of multiple different mechanisms operating to affect multiple genes, which may differ from tumour to tumour yet collectively play a role in the development of the tumorigenic phenotype. It is worth noting the differences in CNL-LOH versus CNN-LOH, with the latter appearing more relevant for selection of deleterious mutations and methylation, in contrast to global changes in gene expression. Identifying the specific driver genes targeted in a particular cancer remains a challenge given the multiple possible reasons for selection of an LOH event.

Table 2 footnotes
a SNPs in bold are those named in the GWAS iCOG publication [29]. All others are linked as indicated by the R-squared value of >0.7. If the minor allele is the risk allele named, it is assumed that this will also be the case for the linked SNP.
b Minor = risk allele is the less frequent allele in the population. A, B = risk allele corresponds to the "A" or "B" allele respectively in the Affymetrix array nomenclature. NA = not on Affymetrix SNP6 array.
c Het = Number of cases where germline and tumour are heterozygous, hom = cases where germline is heterozygous, AA, BB = germline is heterozygous, tumour is homozygous for A or B respectively, NC = no call in either tumour or germline. N = total number.
d % LOH is the number of individuals with loss of one allele divided by the total number of heterozygous individuals as measured at that SNP, i.e. not the overall % of LOH that could be determined from all cases using a wider genetic window. This may therefore include regions of extreme allelic imbalance (e.g. likely for 8q24).