We used massively parallel RNA sequencing to interrogate the transcriptomes of BRCA1-mutated breast cancer cell lines and tumors for putative gene fusions. In addition to previously described gene fusions, we identified three novel in-frame fusions: MTAP-PCDH7 in SUM149PT, WWC1-ADRBK2 in HCC3153 and ADNP-C20orf132 in one primary tumor. Only the latter two were confirmed by RT-PCR and Sanger sequencing.
Gene fusions can adversely affect an organism by deregulating the normal expression and disrupting the function of genes. There are two main ways in which this occurs. First, the active domain of one gene is joined with a regulatory enhancer or promoter of another gene, causing an upregulation of the active domain and leading to oncogenesis. Second, a hybrid or chimeric gene fusion is formed such that characteristics from both genes are active. Interestingly, for both WWC1-ADRBK2 and ADNP-C20orf132, we observed discordant expression delineated at the predicted breakpoint region of each gene. In both cases, expression of the 3' partner gene was markedly higher than in samples that were negative for the gene fusion of interest. This suggests that they may represent examples of the first mechanism. Gene fusions are also known to be associated with CNVs. While we observed some previously reported CNVs near the selected fusion genes, no conclusions could be drawn on their functional relationship.
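The discordant expression described above can be illustrated with a minimal sketch. It assumes per-exon expression values (for example, normalized read counts) listed in transcript order and a known index for the first exon downstream of the predicted breakpoint. The function name, the pseudocount and the input layout are hypothetical choices made for illustration; this is not the procedure used in this study.

```python
import math
from statistics import mean


def breakpoint_log2_ratio(exon_expr, breakpoint_index, pseudocount=1.0):
    """Return log2(mean downstream expression / mean upstream expression).

    exon_expr: per-exon expression values in 5'->3' transcript order.
    breakpoint_index: index of the first exon downstream of the breakpoint.
    pseudocount: hypothetical smoothing constant that avoids division by zero.
    """
    if not 0 < breakpoint_index < len(exon_expr):
        raise ValueError("breakpoint must split the exons into two non-empty groups")
    upstream = mean(exon_expr[:breakpoint_index])
    downstream = mean(exon_expr[breakpoint_index:])
    return math.log2((downstream + pseudocount) / (upstream + pseudocount))


# A 3' partner gene whose exons after the breakpoint are more highly expressed.
fused = [1.0, 1.0, 1.0, 7.0, 7.0, 7.0]
# The same gene in a fusion-negative sample, with uniform expression.
unfused = [3.0, 3.0, 3.0, 3.0, 3.0, 3.0]

print(breakpoint_log2_ratio(fused, 3))
print(breakpoint_log2_ratio(unfused, 3))
```

A large positive ratio in the candidate sample, together with a ratio near zero in fusion-negative samples, would be consistent with expression of the 3' portion being driven by the 5' partner. Any cutoff for calling a ratio discordant would have to be chosen empirically.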
None of the predicted gene fusions were found to be recurrent in any of the other samples that were sequenced. This raises the question of whether these fusions represent driver mutations that directly contribute to tumorigenesis or are passenger effects that have minor or no consequence. Driver gene fusions are typically recurrent, such as BCR-ABL, ETV6-NTRK3, and TMPRSS2-ERG in prostate cancer, and consequently they are ideal targets for therapeutic intervention. Since we were unable to detect any of our novel fusions in our screening of additional BRCA1-mutated, BRCA2-mutated or BRCA1/2-unrelated breast cancers, they may represent non-recurrent passenger mutations. However, more experimental studies will be required to elucidate their functional role. Moreover, many fusions that have been reported in the literature are rare and have a low recurrence rate. Hence, such low-frequency gene fusions found in cancer may still be worth noting. If they can be observed in a patient, such private mutations can potentially be used as part of a personalized treatment program. For example, Leary et al. recently demonstrated the ability to identify patient-specific genomic rearrangements as biomarkers in solid tumors using massively parallel sequencing. Indeed, our identification of ADNP-C20orf132 in a primary tumor represents one example of a private biomarker which may be used to track the status of the patient. Therefore, the use of sequencing-based approaches will be vital for advancing our understanding of tumors and cataloguing all known genetic abnormalities.
For this study, we focused our analysis on a collection of breast cancer samples with BRCA1 mutations. Tumors of this type may possess a distinctive, possibly unique, expression signature, despite their resemblance to basal-like and triple-negative breast cancers [51, 52]. Hence, there has been great interest in elucidating the molecular mechanisms in BRCA1 cancers to identify potential biomarkers and drug targets. For example, genes that tumor cells rely on more heavily as a result of the loss of BRCA1 function can be targeted for inhibition, resulting in cell death. This synthetic lethal relationship has led to findings of potential drug targets that are sensitive to inhibition in BRCA1 tumors, such as mitogen-activated protein kinase and poly (ADP-ribose) polymerase (PARP). In the case of the latter, a PARP inhibitor, olaparib (AstraZeneca), has already been developed and has undergone successful clinical trials. Knowledge of the role of gene fusions in BRCA1 breast cancers, however, is limited. In a study by Stephens et al., gene fusions were identified in breast cancer genomes, but none of them were recurrent. We hypothesized that mutations in BRCA1 may increase the frequency of chromosomal aberrations due to defects in the DNA repair and NHEJ pathways. This, in turn, could result in the expression of novel gene fusions that can be observed at the transcriptome level. As discussed above, our analysis of a limited number of samples did not reveal strong evidence of gene fusions as major contributors to the development of BRCA1 breast cancers. However, this does not discount other genomic instabilities and lesions that arise from BRCA1 mutations and are not detected by RNA-Seq. For example, studies have shown that BRCA1 has a role in centrosome function and the organization of chromosomes [55, 56].
We expect future studies to involve analyzing a greater number of samples by massively parallel sequencing at the genomic and transcriptomic level, allowing for a more powerful and comprehensive interrogation of breast cancer.
We demonstrate the merits of using RNA-Seq to discover gene fusions. In particular, we note that our method to examine discordant expression between exons is related to a previously described approach to predict gene fusions using exon arrays. However, while candidate fusion genes can be identified based on discordant exon expression, it can be difficult to determine which pair of genes is involved in the fusion. A sequencing-based approach can overcome this by additionally identifying reads that map across the exon-exon fusion junction. Initially, we explored using a strategy based on SE reads. Maher et al. previously described an approach that used longer reads (> 250 bp) followed by short reads (35 bp) to find gene fusions. Because we initially had only short 50 bp SE reads for HCC1937, a major challenge was to identify reads that partially aligned to two different genes. For example, a typical partial alignment may involve finding matching sequences less than 50% of the length of the read (in this case, < 25 bp) that map to a gene. Given the large number of genes to be searched, it is likely that many of the shorter sequences will be matched in a non-specific manner. We mitigated this by filtering for matches that occur at or near the boundary of exons. Another approach was the use of PE reads, which alleviated the reliance on finding junction-spanning reads in the initial step. Instead, we first systematically searched for paired reads that fully mapped to the genome but not at the expected distance or orientation, and then searched for individual reads mapping across the predicted fusion junction. We also note two caveats to our approach. First, we identified candidate gene fusions based on RefSeq annotation for well-characterized content. As a result, some non-RefSeq genes may have been missed.
Second, lowly expressed gene fusion transcripts are generally more difficult to detect when read coverage at the junction site is insufficient and no discordant PE reads support them. Thus, we focused on gene fusion candidates with adequate read coverage for further validation. In general, we found that the paired read property of PE reads allowed us to identify candidates with greater effectiveness than using only SE reads.
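The first step of the PE strategy, finding pairs that map at an unexpected distance or orientation, can be sketched as follows. This is a simplified illustration rather than the pipeline used here: the `Mate` record, the `classify_pair` function and the `max_insert` cutoff are hypothetical, mates are assumed to be sequenced in the forward/reverse orientation, and a real cutoff for RNA-Seq data mapped to the genome would need to allow for introns.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mate:
    chrom: str   # chromosome the mate aligned to
    pos: int     # leftmost aligned genomic position
    strand: str  # '+' or '-'


def classify_pair(m1, m2, max_insert=500):
    """Label a fully mapped read pair as concordant or as a type of discordant.

    max_insert is a hypothetical upper bound on the distance between mates.
    """
    if m1.chrom != m2.chrom:
        return "interchromosomal"
    # Order the mates by position so orientation can be checked left to right.
    left, right = sorted((m1, m2), key=lambda m: m.pos)
    if (left.strand, right.strand) != ("+", "-"):
        return "orientation"
    if right.pos - left.pos > max_insert:
        return "distance"
    return "concordant"


pairs = [
    (Mate("chr1", 100, "+"), Mate("chr1", 300, "-")),
    (Mate("chr5", 100, "+"), Mate("chr22", 200, "-")),
    (Mate("chr1", 100, "+"), Mate("chr1", 100000, "-")),
    (Mate("chr1", 100, "+"), Mate("chr1", 300, "+")),
]
for a, b in pairs:
    print(classify_pair(a, b))
```

Pairs labelled as anything other than concordant would then nominate candidate gene pairs, and individual reads spanning the predicted junction would be searched for in a second step.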