Comprehensive genomic characterization of breast tumors with BRCA1 and BRCA2 mutations

Background: Germline mutations in the BRCA1 and BRCA2 genes predispose carriers to breast and ovarian cancer, and there remains a need to identify the specific genomic mechanisms by which cancer evolves in these patients. Here we present a systematic genomic analysis of breast tumors with BRCA1 and BRCA2 mutations.
Methods: We analyzed genomic data from breast tumors, with a focus on comparing tumors with BRCA1/BRCA2 gene mutations with common classes of sporadic breast tumors.
Results: We identify differences between BRCA-mutated and sporadic breast tumors in patterns of point mutation, DNA methylation and structural variation. We show that structural variation disproportionately affects tumor suppressor genes and identify specific driver gene candidates that are enriched for structural variation.
Conclusions: Compared to sporadic tumors, BRCA-mutated breast tumors show signals of reduced DNA methylation, more ancestral cell divisions, and elevated rates of structural variation that tend to disrupt highly expressed protein-coding genes and known tumor suppressors. Our analysis suggests that BRCA-mutated tumors are more aggressive than sporadic breast cancers because loss of the BRCA pathway causes multiple processes of mutagenesis and gene dysregulation.
Electronic supplementary material: The online version of this article (10.1186/s12920-019-0545-0) contains supplementary material, which is available to authorized users.

In this work we combine newly generated sequencing data with previous datasets, and perform an in-depth integrative analysis of genomic and epigenomic data in order to achieve better insights into the mechanism underlying tumor formation in individuals with BRCA gene mutations. Our aim here is to characterize the genomic variation in BRCA-mutated tumors and understand whether and how they are different from common classes of sporadic breast tumors. We present novel results on the differences in point mutation, DNA methylation, and structural variation in BRCA1/2 mutated tumors, and identify specific genes including known tumor suppressors that are frequently damaged by structural variation in these tumors. Mutational signatures are patterns of point mutations in the genome created by specific mutagenic processes, e.g., a chemical mutagen or a defect in a DNA repair enzyme 17 . If BRCA1/2-mutated tumors evolve via distinct point mutation-causing processes, they may possess unusual mutational signatures. We therefore analyzed whether the BRCA1/2-mutated tumors have a different pattern of mutational signatures from the remaining breast tumors.

Results
A previous study 11 applied a widely used approach 17 for extracting mutational signatures from genomic data to the dataset of 560 breast tumors, resulting in 12 mutational signatures. Notably, the resulting signatures are very dense, and many are also very similar to each other. While some have been linked to known mutational processes in breast cancers, others still have no known etiology 19 . This may be due to the fact that this framework extracts as many signatures as required to improve the fit to the data, without testing whether these signatures perform well at fitting unseen data. This can be expected to result in a high number of signatures that potentially overfit the data. For these reasons, we wished to use a more principled approach that incorporates biological knowledge, as well as statistical methods to prevent overfitting.
We recently developed SparseSignatures 20 , a novel framework to identify mutational signatures.
This method incorporates a background model representing the pattern of mutations caused in the normal course of cell division by DNA replication errors, a signature that we assume is present in all tumors.
The background signature is fixed and additional signatures are discovered while incorporating a LASSO constraint to ensure that the signatures are sparse, producing a more biologically accurate and interpretable solution. SparseSignatures also applies a repeated bi-cross-validation strategy 20 to select the number of signatures. This allows us to avoid overfitting by selecting a number of signatures that not only fit the data used to discover them but are also capable of predicting unseen data points.
We applied this approach to 555 breast tumors (we removed 5 tumors with <1000 mutations as previously described 20). We discovered 8 mutational signatures in addition to the background (Figure 2a, Supplementary Table 1). These signatures are statistically strongly supported and most of them are related to known mutagenic mechanisms. Signatures 1 and 2 are associated with defective DNA mismatch repair 21. Signature 3 is a pattern of elevated TT>GT point mutations, highest in a CTT context. Signature 4 is similar to the previously described 11 'Signature 18', which has recently been associated with DNA damage caused by reactive oxygen species 18. Signatures 5 and 7 are associated with deregulation of APOBEC cytidine deaminases 19. Signature 6 is caused by deamination of methylated cytosines at CpG sites into thymine. Finally, Signature 8 is a relatively dense pattern characterized by an elevated rate of C>A, C>G and T>A mutations.
It is notable that despite finding fewer signatures, our solution still provides a better fit to the data (MSE = 364.345) than the previous solution 11 with 12 signatures (MSE = 1118.703). Along with providing a better fit to the data, our discovered signatures are sparser, more clearly differentiated from each other, and lack background noise (Supplementary Table 2).
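As a rough illustration of how such a goodness-of-fit comparison can be computed, the sketch below reconstructs a mutation count matrix as exposures times signatures and reports the mean squared error against the observed counts. This is a minimal sketch, not the code used in this study: the function names are hypothetical, and it assumes the error is averaged over all patient-by-category cells of the count matrix.

```python
def reconstruct(exposures, signatures):
    """Predicted counts = exposures (patients x K) times signatures (K x categories)."""
    n_categories = len(signatures[0])
    n_signatures = len(signatures)
    return [
        [sum(row[k] * signatures[k][j] for k in range(n_signatures))
         for j in range(n_categories)]
        for row in exposures
    ]


def mean_squared_error(observed, predicted):
    """Average squared difference over every cell of the two matrices."""
    total = 0.0
    n_cells = 0
    for obs_row, pred_row in zip(observed, predicted):
        for obs, pred in zip(obs_row, pred_row):
            total += (obs - pred) ** 2
            n_cells += 1
    return total / n_cells


# Toy data (made up for illustration): 2 patients, 2 signatures, 3 categories.
toy_signatures = [[0.5, 0.5, 0.0], [0.0, 0.0, 1.0]]
toy_exposures = [[10.0, 4.0], [2.0, 6.0]]
toy_observed = [[6.0, 5.0, 4.0], [1.0, 1.0, 4.0]]
toy_mse = mean_squared_error(toy_observed, reconstruct(toy_exposures, toy_signatures))
```

Applying the same function to two competing sets of signatures and exposures on the same count matrix gives directly comparable values, which is the sense in which a lower MSE indicates a better fit.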
We do not find two signatures described in the previous study: the highly dense, flat 'Signature 3' and 'Signature 8'. While our Signature 8 bears some similarities to the previous 'Signature 3', it is considerably sparser and shows stronger nucleotide preferences, which may be due to our explicit separation of the background signature, which prevents it from being confounded with other signatures. We also do not find a signature similar to the previous 'Signature 30'. Compared to sporadic tumors, a higher fraction of mutations is attributed to Signature 8 in both BRCA1 and BRCA2-mutated tumors. While the etiology of this signature is uncertain, it is not simply indicative of BRCA mutation, as many sporadic triple-negative tumors also have a similarly high contribution from Signature 8. In general, the mutational signature profiles of sporadic triple-negative tumors are very close to those of BRCA1/2-mutated tumors, indicating similar underlying mutagenic processes.

BRCA tumors have lower levels of CpG methylation.
SparseSignatures also calculates the exposure values for each signature, i.e. the number of mutations originating from each signature in each patient (Supplementary Table 3). On average, the background signature (representing DNA replication errors) contributes more mutations than any other signature. The higher number of point mutations in the BRCA-mutated tumors, compared to sporadic tumors, is reflected in a higher exposure to the background signature, suggesting that these tumors have gone through more cell divisions (Figure 2b); in addition, the BRCA-mutated tumors also show higher exposure to all the discovered signatures except for Signature 6, which is underrepresented in BRCA-mutated tumors (Figure 2c). This signature is caused by DNA CpG methylation and subsequent deamination of methylated cytosine to thymine, leading to C>T mutation. The ratio of Signature 6 exposure to background signature exposure is significantly lower in both BRCA1 and BRCA2 mutated tumors compared to sporadic tumors (p = 2 × 10^-21 and 1 × 10^-9 respectively; Supplementary Figure 1). Taking the background signature exposure as an indicator of cell division, this suggests that BRCA1/2-mutated tumors may have lower CpG methylation.
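The normalization described above can be sketched as follows: for each tumor, the Signature 6 exposure is divided by the background exposure, and the resulting ratios are summarized per group. This is only an illustrative sketch. The record layout, key names and numbers are hypothetical, and the statistical test behind the reported p-values is not reproduced here.

```python
from statistics import median


def sig6_to_background_ratio(exposures):
    """Signature 6 exposure per unit of background exposure (a proxy for
    CpG-deamination mutations per cell division)."""
    return exposures["signature_6"] / exposures["background"]


def median_ratio_by_group(tumors):
    """Median Signature 6 / background ratio for each tumor group."""
    ratios = {}
    for tumor in tumors:
        ratio = sig6_to_background_ratio(tumor["exposures"])
        ratios.setdefault(tumor["group"], []).append(ratio)
    return {group: median(values) for group, values in ratios.items()}


# Made-up exposure values, for illustration only.
toy_tumors = [
    {"group": "BRCA1", "exposures": {"background": 4000.0, "signature_6": 400.0}},
    {"group": "BRCA1", "exposures": {"background": 5000.0, "signature_6": 1000.0}},
    {"group": "sporadic", "exposures": {"background": 2000.0, "signature_6": 800.0}},
    {"group": "sporadic", "exposures": {"background": 1000.0, "signature_6": 600.0}},
]
toy_medians = median_ratio_by_group(toy_tumors)
```

Dividing by the background exposure is what separates "fewer Signature 6 mutations per division" from "fewer divisions overall", which is why the ratio rather than the raw exposure is compared between groups.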
As DNA methylation data is not available for this dataset, we tested whether DNA methylation is lower in BRCA1/2-mutated tumors in a cohort of 682 breast cancers and 82 normal breast tissue samples from The Cancer Genome Atlas 22. This dataset included 20 tumors with inactivating germline or somatic mutations in BRCA1 and 13 with inactivating germline or somatic mutations in BRCA2. We found that global CpG methylation levels are indeed significantly reduced in BRCA1-mutated tumors compared to all classes of sporadic tumors as well as normal tissue samples in the same dataset (Figure 2d; p(BRCA1-mutated vs. sporadic) = 3 × 10^-4; p(BRCA1-mutated vs. normal tissue) = 3 × 10^-5). On the other hand, there was no significant difference between BRCA1-mutated and sporadic tumors in the methylation level of the 3081 CpA sites measured on the same platform (Supplementary Figure 2). We did not observe a significant difference in methylation levels between BRCA2-mutated and sporadic tumors. However, we note the low number of samples in this analysis.

BRCA1-mutated tumors have elevated tandem duplications and interchromosomal translocations.
We obtained whole-genome sequencing data for 67 of the 560 tumor samples 11. We used SvABA 24 to identify somatic indels and structural variants in these tumor genomes.
SvABA is a newly developed indel and structural variant caller that uses genome-wide local assembly to obtain superior sensitivity and specificity to previous methods. After filtering the variant calls (see Methods), we identified a total of 7,234 high-confidence somatic indels and 19,684 high-confidence somatic structural variants in the 81 tumor genomes. We then compared BRCA1/2-mutated tumors against sporadic tumors. We included the 2 tumors with somatic BRCA2 inactivation along with those showing germline BRCA2 inactivation. We found that both BRCA1 and BRCA2-mutated tumors had significantly more indels (p = 1.63 × 10^-5 for BRCA1 and p = 1.37 × 10^-3 for BRCA2) and structural variants (p = 5.12 × 10^-7 for BRCA1 and p = 0.029 for BRCA2) per tumor than the sporadic tumors.
We next searched for specific genes enriched for indels or structural variant breakpoints in the BRCA1/2-mutated tumors, using a Poisson test. The null model here is that breakpoints are randomly distributed throughout the genome, and we identify protein-coding genes that have significantly more breakpoints than expected from their size. We identified 11 genes enriched for indels/structural variant breakpoints: NME7, KLHL8, EFNA5, PTEN, DHX32, ETV6, RB1, ARGLU1, TP53, P4HB, and RUNX1 (Table 1). After correcting the length of each gene to take into account its copy number in each
Structural variant breakpoints are distributed non-uniformly across the genome.
In our set of 46 BRCA1/2-mutated tumor samples, only about 39% of the breakpoints disrupt known genes. While this fraction is significantly higher than expected by chance, we also wanted to test whether there are larger regions of the genome, including non-coding regions, that are enriched for breakpoints. These would include breakpoints for variants that span across whole genes, as well as those that affect gene expression by disrupting regulatory regions of the genome.
We divided the genome into 10-Mb long bins, overlapping by 5 Mb. We then combined all the high-confidence indels and structural variants collected from all the BRCA1/2-mutated tumors. We tested whether these tumors are enriched for indel/structural variant breakpoints in each bin using a Poisson test, with the null model being that breakpoints are distributed uniformly across the genome. We found 48 bins that had a Bonferroni-corrected p-value of less than 0.05 (Figure 3d). All of these regions were disrupted by at least one indel or structural variant in at least 50% of BRCA1/2-mutated tumors. After correcting the number of bases in each bin to account for copy number changes, 28 bins remained significantly enriched (Bonferroni-corrected p<0.05). These bins are located on chromosomes 3, 5, 6, 8, 10, 11, 12, and 18, and several of them overlap with each other. Their coordinates are listed in Supplementary Table 5.
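The binning and enrichment test described above can be sketched in a few lines. This is an illustrative sketch under stated assumptions rather than the pipeline used in this study: the function names are hypothetical, the test is taken to be a one-sided Poisson upper-tail test with an expected count proportional to bin length, and the copy-number correction is not included.

```python
import math


def make_bins(chrom_length, bin_size=10_000_000, step=5_000_000):
    """Windows of bin_size bp starting every step bp (10 Mb bins, 5 Mb overlap)."""
    bins = []
    start = 0
    while start < chrom_length:
        bins.append((start, min(start + bin_size, chrom_length)))
        if start + bin_size >= chrom_length:
            break
        start += step
    return bins


def count_breakpoints(positions, bins):
    """Number of breakpoints falling in each half-open window [start, end)."""
    return [sum(1 for p in positions if start <= p < end) for start, end in bins]


def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam), lam > 0."""
    if k <= 0:
        return 1.0

    def pmf(i):
        return math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))

    if k > lam:
        # Sum the tail directly; terms shrink monotonically once i > lam.
        total, i = 0.0, k
        while True:
            term = pmf(i)
            total += term
            if term <= total * 1e-17:
                break
            i += 1
        return min(1.0, total)
    # For k <= lam the tail is large, so 1 - lower tail is accurate enough.
    return max(0.0, 1.0 - sum(pmf(i) for i in range(k)))


def enriched_bins(bin_counts, bin_lengths, total_breakpoints, genome_length, alpha=0.05):
    """Indices and Bonferroni-corrected p-values of bins with excess breakpoints.

    Null model: breakpoints are uniform, so a bin of length L expects
    total_breakpoints * L / genome_length of them.
    """
    n_tests = len(bin_counts)
    hits = []
    for index, (count, length) in enumerate(zip(bin_counts, bin_lengths)):
        expected = total_breakpoints * length / genome_length
        p_adjusted = min(1.0, poisson_upper_tail(count, expected) * n_tests)
        if p_adjusted < alpha:
            hits.append((index, p_adjusted))
    return hits
```

Because adjacent windows share half their sequence, a single dense cluster of breakpoints will typically flag two neighbouring bins, which is consistent with several significant bins overlapping each other.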
Validation of interchromosomal translocations using 10X Genomics.
Our analysis above, as well as previous studies 11,24, highlights the importance of structural variation in the evolution of BRCA-mutated cancers. However, short-read sequencing is not ideal for accurate detection of large structural variants due to the limited read length. 10X Genomics is a linked-read technology, which uses barcodes to identify short fragments that originate from the same large molecules. It thus provides long-range information based on short-read sequencing, offering improved resolution and detection of structural variants 30. To validate our findings on structural variants, we sequenced additional DNA from 3 tumors with BRCA1 germline mutations using 10X Genomics sequencing. In addition, we sequenced genomic DNA from 1 BRCA2-mutated tumor and 12 sporadic triple-negative tumors from the same study 22. We used GROC-SVs 30 to identify structural variants in these genomes.
We reported above a novel finding that BRCA1 mutated tumors have unusually high numbers of interchromosomal translocations. We were able to confirm several of these translocations using 10X sequencing in the 3 BRCA1-mutated samples, providing independent validation of our findings. Further, although the sample size is too small for a statistical test, we observed that these BRCA1-mutated samples had more translocations on average than the sporadic tumors (Supplementary Table 6).
Although structural variants are normally classified into simple categories (such as duplications, deletions, and translocations), recent studies have revealed that some tumor genomes also contain a large number of complex structural variants (CSVs) that cannot be explained by a simple end-joining or recombination event 31 . In our short-read data, we observe that 16% of structural variants are accompanied by a short insertion at the breakpoint; the occurrence of such insertions is not significantly different in BRCA1/2-mutated tumors. However, larger CSVs composed of multiple rearrangements cannot be detected by short reads. The use of 10X read clouds and GROC-SVs allows us to resolve larger complex events, since the read clouds span multiple breakpoints.
Using GROC-SVs, we detected two complex structural variants in the sample T65 which has a germline BRCA1 mutation: a complex rearrangement on chromosome 11 (Supplementary Figure 5a) and a rearrangement involving a translocation between chromosomes 1 and 2 (Supplementary Figure 5b). The mechanisms that give rise to such complex variants are still uncertain, but our observations suggest that these may play a role in the evolution of BRCA-mutated tumors. Further studies are required to ascertain whether BRCA-mutated tumors differ from sporadic breast tumors in the number and type of complex structural variants, as has been characterized for simple structural variants.

Discussion
Tumors carrying mutations in the BRCA1 and BRCA2 genes, particularly in BRCA1, have more point mutations than sporadic breast tumors, which is not explained by their larger genome size owing to copy number alterations. If the increased number of mutations in BRCA samples was a function of more cell divisions, we would expect this to be explained by higher exposure to the background signature. We do see higher exposure to the background signature in these tumors, indicating that they have passed through more cell divisions. However, we also see more mutations attributed to other mutagenic processes, particularly Signature 5 (APOBEC dysregulation leading to C>G mutations) and Signature 8, whose etiology is unknown. This indicates that more cell division may not be the only factor contributing to the higher mutational burden of BRCA1/2-mutated tumors, and that other mutagenic processes are also elevated.
Although BRCA1/2-mutated tumors have a higher exposure to the background signature, they do not have a higher exposure to Signature 6, which represents deamination of methylated cytosines at CpG sites. Under conditions of constant DNA methylation, we would expect the exposure values for these two signatures to be proportional to each other. The disproportionately low contribution of Signature 6 to BRCA1/2-mutated tumors suggests a global reduction in methylation levels, which is confirmed by an analysis of TCGA data for BRCA1 tumors. If true, the reduced methylation could cause dysregulation of gene expression and altered binding of gene regulatory proteins. An altered methylation state is also indicative of dedifferentiation of a tumor, and may be linked to the fact that these tumors have undergone more cell divisions.
It is notable that BRCA1/2-mutated tumors do not appear to possess any unique mutational signatures, suggesting an absence of unique point mutational processes that arise from the BRCA gene mutations. (Even Signature 8, which is elevated in BRCA1/2-mutated tumors, has a high contribution to triple-negative tumors in general.) Instead, BRCA1 and BRCA2 mutated tumors display a clearly distinct profile of structural variants. We confirm previous findings 11,25 related to tandem duplications and deletions, and also find that BRCA1 mutations are associated with an increased number of interchromosomal translocations, which to our knowledge has not been shown before.
The functional relevance of structural variants in BRCA1/2 mutated tumors is shown by their enrichment in protein-coding genes, particularly genes with high expression in breast tissue. We identified 11 genes that are enriched for indels and structural variant breakpoints in BRCA1/2-mutated tumors; these include well-known tumor suppressors such as TP53 and RB1, showing that in BRCA1/2-mutated tumors, structural variants may carry out the same roles that are more likely to be fulfilled by point mutations in sporadic tumors. We also find additional genes which are candidates for indel/SV-specific driver genes in BRCA1/2 mutated tumors; these frequently damaged genes may have links to the specific biology of tumors with BRCA mutations.

Conclusions
Overall, our study suggests that BRCA1/2-mutated tumors are comparatively more aggressive than sporadic breast cancers because loss of the BRCA pathway(s) causes a perfect storm of mutagenic processes and gene dysregulation. First, reduced DNA methylation is consistent with a propensity to deregulate and dedifferentiate, and the resulting larger number of cell divisions causes a greater point mutational burden. Second, other point-mutagenic processes that occur in sporadic breast tumors and may be linked to the tissue of origin, such as APOBEC dysregulation, are also active. Finally, and crucially, loss of double-strand break repair elevates structural variation rates, so that driver genes that are hard to functionally affect with point mutations are disrupted at a higher rate than in sporadic tumors.

Methods
Preprocessing data for mutational signature extraction. Point mutations occurring in a genome can be divided into 96 categories based on the base being mutated, the base it is mutated into and its two flanking bases. We therefore represent the dataset of 560 patients from Nik-Zainal et al. 11
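To make the 96 categories concrete, the sketch below enumerates them and maps a single substitution, together with its flanking bases, to its category. It assumes the usual convention for reaching 96 rather than 192 classes: a mutation whose reference base is a purine is read on the complementary strand, so every category has a C or T reference base. The function names and the 'A[C>T]G' label format are illustrative choices, not taken from the original pipeline.

```python
from itertools import product

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def mutation_categories():
    """All 96 categories: 6 substitution types x 16 flanking-base contexts."""
    categories = []
    for ref in "CT":  # pyrimidine reference bases only
        for alt in BASES:
            if alt == ref:
                continue
            for left, right in product(BASES, repeat=2):
                categories.append(f"{left}[{ref}>{alt}]{right}")
    return categories


def categorize(left, ref, alt, right):
    """Category label for one substitution with its 5' and 3' flanking bases."""
    if ref in "AG":
        # Read the event on the opposite strand: complement every base and
        # swap the flanks, since the strand runs in the other direction.
        left, right = right.translate(COMPLEMENT), left.translate(COMPLEMENT)
        ref = ref.translate(COMPLEMENT)
        alt = alt.translate(COMPLEMENT)
    return f"{left}[{ref}>{alt}]{right}"
```

Counting how many mutations in each patient fall into each category then gives one 96-element count vector per patient, which is the form of input that signature extraction operates on.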

Implementation of SparseSignatures.
In our analysis, we repeated the bi-cross-validation procedure 300 times and considered numbers of signatures ranging from 3 to 10 and values of λ ranging from 0.05 to 0.15.
Each library was then sequenced on one lane with a paired-end 150bp run using the Illumina HiSeqX platform to obtain 30x genomic coverage.
Sequencing data analysis. BWA 32 v0.7.12 was used to align short-read sequencing data to the human genome. Longranger 33 v1.3 was used to align 10X genomics data.
Structural variant calling. BAM files were generated as described above, and for the publicly available data, we downloaded BAM files from the ICGC data portal (https://dcc.icgc.org). We ran SvABA on all BAM files using the default parameters 17 .
Variants with length >=50 bp, as well as interchromosomal translocations, were defined as Structural Variants (SVs) while smaller variants were defined as indels. High-confidence SVs and indels were obtained by selecting variants with (1)
Data on replication timing was obtained from Carithers et al. 36. Genomic regions were divided into early-replicating, mid-replicating and late-replicating categories such that a third of the genome for which data was provided was included in each category.
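The size rule used above to separate structural variants from indels can be written as a small function. This is a minimal sketch of the stated definition only; the function name and argument layout are hypothetical, and the high-confidence filters are not reproduced.

```python
SV_MIN_LENGTH = 50  # bp, threshold stated in the text


def classify_variant(chrom1, chrom2, length=None):
    """Return 'SV' or 'indel' for a variant joining chrom1 and chrom2.

    Interchromosomal events are always SVs, whatever their length field says.
    Intrachromosomal events are SVs when they span at least 50 bp.
    """
    if chrom1 != chrom2:
        return "SV"
    if length is None:
        raise ValueError("length is required for intrachromosomal variants")
    return "SV" if length >= SV_MIN_LENGTH else "indel"
```

For example, `classify_variant("chr1", "chr1", 49)` gives `"indel"`, while a 50 bp deletion or any event joining two different chromosomes gives `"SV"`.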
Expression levels in normal human breast tissue were obtained from GTEx 28.

Declarations Ethics approval and consent to participate
Genomic DNA from 14 BRCA1/2-mutated tumors and their matched normal samples was sequenced in this study, as well as genomic DNA from 12 sporadic tumor samples. These samples were obtained as detailed in Telli et al. 23. This protocol was approved by the institutional review board at Stanford University. Informed consent was obtained from all patients.

Consent for publication
Not Applicable.