This analysis examined a consecutive series of 376,159 individuals who received clinical testing with a hereditary pan-cancer panel test at our Clinical Laboratory Improvement Amendments- and College of American Pathologists-approved laboratory (Myriad Genetic Laboratories, Inc.) from September 2013 through May 2017. All individuals provided written informed consent for clinical testing, and the data presented here were de-identified for analysis. No additional information was obtained from patients or healthcare providers for this analysis.
Genetic testing
The hereditary pan-cancer panel test used in this analysis has been analytically validated and described in detail previously [9]. Panel genes included APC, ATM, BARD1, BMPR1A, BRCA1, BRCA2, BRIP1, CDH1, CDK4, CDKN2A (p16INK4a and p14ARF), CHEK2, EPCAM, GREM1, MLH1, MSH2, MSH6, MUTYH, NBN, PALB2, PMS2, POLD1, POLE, PTEN, RAD51C, RAD51D, SMAD4, STK11, and TP53. Testing included sequencing and LR analysis of all genes except POLD1 and POLE, for which only sequencing of the exonuclease domains was performed. For EPCAM and GREM1, only LR analysis was performed. The panel became available in September 2013 and included all listed genes except POLD1, POLE, and GREM1, which were added in July 2016. For testing, germline DNA from blood or saliva samples was processed through a PCR-based target-enrichment strategy for NGS. Genomic DNA was sonicated and dispersed in oil into picoliter-sized aqueous droplets, which were merged with a custom dropletized target enrichment primer library. The resulting microdroplet emulsion was subjected to PCR amplification. Emulsion PCR products were purified and subjected to secondary PCR to incorporate NGS adaptors and indexes for individual sample tracking. Indexed samples from 96 individuals were pooled and loaded onto massively parallel next-generation sequencers for 2 × 150 base paired-end reads.
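The gene-by-gene assay coverage described above can be summarized as a small lookup table. The following Python sketch is illustrative only: the names `PANEL_GENES`, `analyses_for`, and `genes_on_panel` are hypothetical and are not part of the laboratory's software. It encodes only what the text states about which genes received sequencing, LR analysis, or both, and which genes were added in July 2016.

```python
# Illustrative encoding of the panel composition described in the text.
# All identifiers here are hypothetical, chosen for this sketch.
PANEL_GENES = [
    "APC", "ATM", "BARD1", "BMPR1A", "BRCA1", "BRCA2", "BRIP1", "CDH1",
    "CDK4", "CDKN2A", "CHEK2", "EPCAM", "GREM1", "MLH1", "MSH2", "MSH6",
    "MUTYH", "NBN", "PALB2", "PMS2", "POLD1", "POLE", "PTEN", "RAD51C",
    "RAD51D", "SMAD4", "STK11", "TP53",
]

SEQUENCING_ONLY = {"POLD1", "POLE"}   # exonuclease domains only
LR_ONLY = {"EPCAM", "GREM1"}          # large rearrangement analysis only
ADDED_JULY_2016 = {"POLD1", "POLE", "GREM1"}


def analyses_for(gene):
    """Return the analyses performed for a panel gene as a tuple of labels."""
    if gene not in PANEL_GENES:
        raise ValueError(f"{gene} is not on the panel")
    if gene in SEQUENCING_ONLY:
        return ("sequencing",)
    if gene in LR_ONLY:
        return ("LR",)
    return ("sequencing", "LR")


def genes_on_panel(year, month):
    """Return the genes tested at a given date (panel launched September 2013)."""
    if (year, month) < (2013, 9):
        return []
    if (year, month) < (2016, 7):
        return [g for g in PANEL_GENES if g not in ADDED_JULY_2016]
    return list(PANEL_GENES)
```

For example, `analyses_for("EPCAM")` returns `("LR",)`, and `genes_on_panel(2014, 1)` returns the 25 genes tested before the July 2016 additions.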
NGS dosage analysis
Quantitative dosage analysis of NGS data was performed to determine copy number abnormalities indicative of deletions or duplications in exon and promoter regions. The analytic accuracy and reproducibility of NGS dosage analysis using in-house-developed review software were characterized previously [9] and approved by the New York State Department of Health prior to clinical use. Pseudogene reads were circumvented through primer design and alignment filters for NGS data analysis. For PMS2 exon 9, exons 11–15, and their flanking regions, this approach was supplemented by dosage quantification involving previously defined paralogous sequence variants (PSVs) between PMS2 and PMS2CL, its highly homologous pseudogene. Approximately 2000 NGS amplicons were used to interrogate coding exons and limited flanking intron regions of tested genes.
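As a rough illustration of the general idea behind read-depth dosage analysis, the Python sketch below normalizes each amplicon's read count to the sample total and compares it with the median of a set of reference samples. It is not the in-house review software described above: the normalization scheme, the function names `dosage_ratios` and `call_dosage`, and the thresholds of 0.7 and 1.3 are all assumptions made for this example. A heterozygous deletion is expected to give a ratio near 0.5 and a heterozygous duplication a ratio near 1.5.

```python
from statistics import median


def dosage_ratios(sample_counts, reference_counts):
    """Return per-amplicon dosage ratios for one sample.

    sample_counts: dict mapping amplicon name -> read count.
    reference_counts: list of such dicts for reference samples.
    Each count is converted to a fraction of the sample's total reads,
    then divided by the median reference fraction for that amplicon.
    """
    def normalize(counts):
        total = sum(counts.values())
        return {amp: n / total for amp, n in counts.items()}

    sample = normalize(sample_counts)
    refs = [normalize(r) for r in reference_counts]
    return {
        amp: frac / median(r[amp] for r in refs)
        for amp, frac in sample.items()
    }


def call_dosage(ratio, del_max=0.7, dup_min=1.3):
    """Label a ratio; the cut-offs are illustrative assumptions only."""
    if ratio <= del_max:
        return "deletion"
    if ratio >= dup_min:
        return "duplication"
    return "normal"


# Toy data: ten amplicons, one with half the expected reads.
amplicons = [f"amp{i}" for i in range(10)]
references = [{a: depth for a in amplicons} for depth in (100, 150, 200)]
sample = {a: 100 for a in amplicons}
sample["amp0"] = 50

ratios = dosage_ratios(sample, references)
calls = {a: call_dosage(r) for a, r in ratios.items()}
```

In this toy data, `amp0` is labeled a deletion and the remaining amplicons are labeled normal. In practice, any such call is only putative until confirmed by an orthogonal method.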
Confirmatory LR analysis
The LRs discussed herein affected regions ranging from a few hundred bases to several kilobases. All samples positive for a putative LR on NGS were confirmed using procedures previously validated with positive and negative controls [9]. Most often, LRs were confirmed through targeted microarray-CGH and/or MLPA analysis. In some instances, LR findings on NGS were confirmed solely by a second confirmatory NGS result. For microarray-CGH, approximately 9600 probes interrogated coding exons, limited flanking intron regions, and promoters. Microarray probe design was optimized to avoid known pseudogene regions and included the use of flanking intronic probes in certain genes. Probe signals were analyzed using laboratory-developed software that compared the ratio of bound sample DNA to that of a differentially labeled reference DNA to identify regions of altered copy number. The amplitude of probe clusters was analyzed to elucidate the nature of the LR. MLPA was run using the SALSA MLPA gene-specific kits according to the manufacturer's specifications (MRC Holland, Amsterdam, Netherlands).
A targeted PCR assay was used to confirm LRs detected initially by NGS. Targeted PCR uses primer pairs that span a specific region involved in the putative LR, generally a breakpoint. The assay used long-range PCR amplification conditions when needed, depending on the size of the affected region. Mutation-specific PCR products were amplified, visualized using gel electrophoresis, and further characterized by downstream Sanger sequencing analysis. In addition to analyzing relative copy number to detect deletions and duplications, the NGS assay included custom amplicons designed to detect a previously characterized inversion of MSH2 exons 1–7 [10]. The inversion was detected on NGS using mutation-specific components that did not amplify the inversion directly but served as a screening tool to trigger additional confirmatory work via targeted PCR.
For the PMS2 gene, part of which is highly homologous to the PMS2CL pseudogene, putative LRs in exons outside the pseudogene region were tested using NGS dosage analysis and confirmed using MLPA. However, putative deletions or duplications that were contained entirely within exon 9 or exons 11–15 were confirmed using PMS2- and PMS2CL-specific sequencing analysis and/or locus-specific, long-range PCR.
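The PMS2 confirmation routing described above can be written as a short decision function. This Python sketch is an illustration only; the function name `pms2_confirmation` and the returned strings are hypothetical. The text does not say how an LR spanning both pseudogene and non-pseudogene exons was handled, so the sketch flags that case rather than guessing.

```python
# Exons of PMS2 named in the text as lying within the pseudogene-homologous
# region: exon 9 and exons 11-15.
PSEUDOGENE_REGION_EXONS = {9, 11, 12, 13, 14, 15}


def pms2_confirmation(affected_exons):
    """Return the confirmation method for a putative PMS2 LR.

    affected_exons: iterable of exon numbers involved in the putative LR.
    """
    affected = set(affected_exons)
    if not affected:
        raise ValueError("at least one affected exon is required")
    if affected <= PSEUDOGENE_REGION_EXONS:
        # LR contained entirely within exon 9 or exons 11-15.
        return ("PMS2- and PMS2CL-specific sequencing and/or "
                "locus-specific long-range PCR")
    if affected.isdisjoint(PSEUDOGENE_REGION_EXONS):
        # LR entirely outside the pseudogene region.
        return "MLPA"
    # Mixed case: not described in the text, so no method is assumed.
    return "not specified in the text"
```

For example, `pms2_confirmation([13, 14])` routes to PMS2- and PMS2CL-specific analysis, whereas `pms2_confirmation([5, 6])` routes to MLPA.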
Putative single-exon deletions required additional scrutiny, as true single-exon deletions and artifactual single-exon deletions can look the same on amplification-based NGS. The presence of a heterozygous variant under a primer can cause the artifactual appearance of a deletion due to PCR allele drop-out. Therefore, the sequence under the primers for the affected exon was screened for the presence of any heterozygous sequence variants. Retroelement (RE) insertions were detected by the NGS assay when they manifested through a reduced amplicon copy number. If the individual carried no heterozygous variants in the relevant region, additional studies were employed to distinguish between a possible deletion and an RE insertion, as detailed in Fig. 1. A confirmatory assay, such as targeted microarray-CGH, typically was used next. If microarray-CGH revealed a full exon deletion, the result was reported as such. Cases in which the exon appeared partially deleted on microarray-CGH were characterized further using long-range PCR and Sanger sequencing of the product to identify breakpoints. If microarray-CGH results were discordant with the initial NGS LR results (i.e., negative), long-range PCR and Sanger sequencing were performed to determine whether the LR was actually an RE insertion, which is often not detected by microarray-CGH.
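The follow-up steps for a putative single-exon deletion can be sketched as a decision function. This Python example paraphrases the paragraph above and is not a reproduction of Fig. 1; the names `ArrayResult` and `triage_single_exon_deletion` are hypothetical. The text does not spell out what was done when a heterozygous variant was found under a primer, so that branch is an assumption and only flags a possible allele drop-out artifact.

```python
from enum import Enum
from typing import Optional


class ArrayResult(Enum):
    """Possible microarray-CGH outcomes mentioned in the text."""
    FULL_EXON_DELETION = "full exon deletion"
    PARTIAL_EXON_DELETION = "partial exon deletion"
    NEGATIVE = "negative"


def triage_single_exon_deletion(
    het_variant_under_primer: bool,
    array_result: Optional[ArrayResult] = None,
) -> str:
    """Return the next step for a putative single-exon deletion seen on NGS."""
    if het_variant_under_primer:
        # Assumption: the text only notes that such a variant can mimic
        # a deletion through PCR allele drop-out.
        return "possible allele drop-out artifact"
    if array_result is None:
        return "run confirmatory assay (typically targeted microarray-CGH)"
    if array_result is ArrayResult.FULL_EXON_DELETION:
        return "report full exon deletion"
    if array_result is ArrayResult.PARTIAL_EXON_DELETION:
        return "long-range PCR and Sanger sequencing to identify breakpoints"
    # Microarray-CGH negative, discordant with the NGS result.
    return "long-range PCR and Sanger sequencing to test for an RE insertion"
```

For example, `triage_single_exon_deletion(False, ArrayResult.NEGATIVE)` returns the long-range PCR and Sanger sequencing step used to check for an RE insertion.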
Assessment of pathogenicity
Long-range PCR and sequencing of the mutant product were used to define deletion breakpoints or RE insertion points for the purpose of variant classification. Additional investigation was conducted to characterize an LR precisely in every case where variant classification could be impacted. In general, LRs were classified as deleterious mutations (DM) or suspected deleterious mutations (SDM) and thus considered pathogenic based on disruption or loss of critical gene regions, or consensus splice junction removal that is likely to produce abnormal RNA splicing. For instance, RE insertions into critical domains within coding regions were considered pathogenic. Alternatively, in-frame deletions or REs that insert within a non-critical region or a region of unknown function were classified as variants of uncertain significance (VUS).
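The general classification rule stated above can be condensed into a minimal Python sketch. It is a deliberate simplification: the class `LargeRearrangement`, its two boolean fields, and the function `classify_lr` are hypothetical, and the sketch does not attempt to separate DM from SDM, because the text does not give the criteria for that distinction. Treating every remaining LR as a VUS is likewise an assumption based on the examples given in the text.

```python
from dataclasses import dataclass


@dataclass
class LargeRearrangement:
    """Minimal, hypothetical description of a characterized LR."""
    disrupts_critical_region: bool
    removes_consensus_splice_junction: bool


def classify_lr(lr: LargeRearrangement) -> str:
    """Apply the general rule from the text to one LR."""
    if lr.disrupts_critical_region or lr.removes_consensus_splice_junction:
        # Considered pathogenic; DM versus SDM is not resolved here.
        return "DM/SDM"
    # Assumption: LRs in non-critical regions or regions of unknown
    # function, such as in-frame deletions there, fall through to VUS.
    return "VUS"
```

For example, an RE insertion into a critical coding domain would be passed as `LargeRearrangement(True, False)` and returned as `"DM/SDM"`.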