Cohort constructions
The Mayo Clinic Benign Breast Disease (BBD) Cohort includes 13,455 women, ages 18–85 who underwent benign biopsies at Mayo Clinic between 1967 and 2001. Women who had been diagnosed with invasive or in situ BC before or within six months of biopsy or have undergone risk-reducing mastectomy or breast reduction surgery prior to biopsy were excluded. Among this cohort, a frequency-matched (by age and year of biopsy) case–control sample was selected, where cases were defined as those women with BBD who subsequently went on to develop either ER+ or ER− BC within 16 years, and controls were defined as women with BBD who had not developed BC after at least 16 years of follow-up. Index benign biopsies were screened from women with ER− or ER+ BC and corresponding controls, matched on age at biopsy, year of biopsy, and length of follow-up time/time to BC diagnosis. After determining tissue block availability, adequate DNA amount and quality, and adequate sequencing quality, our final sample set for association analysis included 42 ER+ cases, 36 ER− cases, and 42 controls diagnosed between 1969 and 2001. Demographic and clinical characteristics were compared across groups using Pearson chi-square tests for categorical variables and ANOVA tests for continuous variables.
DNA extraction and sequencing
DNA extraction and sequencing were performed as previously reported [17]. In brief, DNA was extracted from ten micrometer sections of FFPE or fresh frozen tissue using the GeneRead DNA FFPE kit (Qiagen, Germantown, MD, US). After extraction, DNA was quantified using Qubit™ dsDNA BR Assay (ThermoFisher Scientific, Waltham, MA, USA) while quality was assessed using the Advanced Analytical Fragment Analyzer™ High Sensitivity Large Fragment Analysis kit which calculates fragment length and degradation.
The QIAseq Human Breast Cancer Targeted Panel, which targets 93 genes relevant in BC, was used to create libraries using 20–40 ng of DNA as previously reported [17], following Qiagen guidelines for FFPE DNA. Libraries were quantified and sequenced on an Illumina® HiSeq 4000 (Illumina, San Diego, CA, USA) paired end 150-bp.
To facilitate quality control and to provide confidence in results derived from archival FFPE tissue, a set of technical control samples was used, and included four fresh snap-frozen benign breast samples (with pathologic assessment of cryoH&E sections) and paired FFPE tissues from reduction mammoplasties, one CEPH control NA12891 (Coriell Institute for Medical Research, Camden, NJ, USA), one FFPE BBD sample not in the sample set, and a positive control sample from formalin fixed cell lines with 11 mutations at varying allelic frequencies (Horizon Diagnostics LLC, Columbus, GA, USA). Variants were located within the BRAF, cKIT, EGFR, KRAS, NRAS and PIK3CA genes.
DNA-seq alignment and variant calling
The Qiagen Data Portal was used for primary sequencing analysis of the samples [18]. The analysis steps included adapter trimming, coupling molecular tag (MT) sequence to the read IDs, alignment to the reference genome (GRCh37 build), and subsequent variant calling using smCounter, a molecular tag-aware variant calling algorithm. smCounter uses a Bayesian probabilistic model to identify variants and infer genotypes, which has been shown to detect low frequency variants with high sensitivity [19]. Sequenced reads with identical molecular tags were identified as PCR duplicates. Reads identified as PCR duplicates were collapsed to create a consensus read sequence. A molecular diversity score, defined as the proportion of molecular-tag coverage versus raw sequencing coverage (100 × MT-coverage/Raw-coverage), was calculated for each sample. For variant calling, the consensus sequence was compared to the reference genome and a prediction index of the alleles observed at the molecular tag level was calculated for every target position. A variant was called if the prediction index of an allele was higher than the pre-specified prediction index threshold, based on 8 reads per molecular tag [19]. The resulting variant calls were output in the standard VCF. After variant calling, initial variant filtering excluded likely false calls due to technical factors, including shallow molecular tag coverage, strand bias, presence in low complexity regions, and/or low base quality.
Sample QC and acceptance criteria
Samples with an average unique molecular tag (UMT) coverage of < 20 × and with genotyping call rate of SNPs < 80% were excluded. For each sample, variants were called as genotypes based on bins of allele frequencies of 0–0.2 (rare homozygote), 0.4–0.6 (heterozygote), and 0.8–1.0 (common homozygote), and the genotyping call rate was defined as the proportion of SNPs for which the genotype was called. The distribution of genotyping call-rate versus the mean UMT coverage is shown in Additional File 1: Fig. S1.
Sample identity was examined using Spearman correlation of the minor allele frequency of known SNPs across all samples to identify those emanating from the same individual. For quality control, 24 samples with two to 10 replicates were profiled independently, with a total number of 119 replicate-pairs. After strict variant filtering, correlation values of replicates were completely separable from unrelated samples; replicate samples had correlation close to 1.0 (all > 0.85), whereas unrelated samples had correlation centered around 0.6 (all < 0.85) (see Additional File 2: Fig. S2; Metadata on the final sample set are presented in Additional File 3: Table S1).
Additional variant filtering and population allele frequency concordance
As FFPE samples are known to be prone to variant artifacts, additional filtering was required before further analysis. Utilizing paired FFPE and fresh frozen samples that we studied previously with identical methods [17], false discovery rate (FDR) of variant-calling was approximated for seven mutation types (C>A, C>G, C>T, C > T at CpG, T > A, T>C and T>G) (see Additional File 4: Fig. S3). The empirical relationships between alternate allele frequency (AAF) and FDR were utilized to determine further filtering strategies. We defined three sets of variants based on liberal, classical or strict filtering of AAF. For the liberal set, variants with AAF < 0.05 were removed. For the classical set, C>T mutations (with the exception of SNPs annotated with an rsID) with AAF < 0.1 were removed, while other types of mutations with AAF < 0.05 were removed. For the strict set, all variants with AAF < 0.1 were removed.
After additional AAF-based filtering, variant frequencies in our cohort were compared with population allele frequencies to ensure that additional filtering strategies did not skew allele distribution of study samples. In particular, all the detected variants were annotated with population allele frequencies observed in the 1000 Genomes Project and the Exome Aggregation Consortium (ExAC) based on internally developed bioR annotation software [20], and were compared between the overlapping variants.
Gene-level association methods
To prioritize variants for association analyses, those meeting any of the following criteria were removed: 1. Observed in common populations with minor allele frequency (MAF) > 0.5%, according to Genome Aggregation Database (gnomAD) and Trans-Omics for Precision Medicine (TOPMED) studies [21, 22]. 2. A small number of variants (N = 44) variants that were common in BBD controls (average VMF over the controls > 0.05). 3. Defined as low functional impact by the CAVA bioinformatics annotation tool [23]. In the CAVA annotation process, medium and high impact variants were defined as essential splice bases, stop gain, frameshift, nonsynonymous variant, inframe indels, start codon, stop loss, and/or exon end (alters first or last three bases of exon).
The AAFs were summarized at each position, overall, and by group (ER+, ER−, control). Frequencies were also summarized for each gene and overall, across genes.
Gene-level analyses considered mutations in two ways: (1) as a continuous allele frequency, and (2) as a binary presence/absence of mutation variable. Gene-level analyses used logistic regression to compare the sum of continuous variant allele frequencies across variants in cases (all cases, as well as separately in ER+ and ER−) versus controls. Additionally, gene-level analyses of the binary presence/absence of mutations across a gene were conducted with SKAT-O. In addition to the default weighting based on variant allele frequencies, secondary analyses also implemented a more stringent variant weighting where each mutation was down-weighted according to its FDR, and hence C->T mutations were more heavily down-weighted compared to other mutation types. This resulted in four statistical analysis methods: SKAT-O, C-T down-weighted SKAT-O, logistic regression, and C-T down-weighted logistic regression; when combined with the three levels of variant AAF filtering (liberal, classic, and strict, defined above), this yielded 12 combinations of statistical method and filtering criteria.
All models were adjusted for relevant covariates, including epithelial percentage, histologic impression, patient’s age, year of biopsy (including a linear term for year, indicator for whether the biopsy was post-1992, year*1992 interaction due to an FFPE processing change), and SNP call rate. Sensitivity analyses were performed for the different variant QC criteria (strict, classical, and liberal). Primary analyses compared all cases (ER+ and ER−) to controls, but secondary analyses considered ER+ and ER− cases separately.
In order to assess statistical significance of the gene-level results, permutation tests were conducted. The sample labels were permuted 100 times (with the relationship between the covariates and group status preserved), and the gene-level analyses across the 12 combinations of variant filtering and statistical analysis method were performed. Empirical p-values were calculated from the distribution of two quantities: (1) the number of genes with p-value < 0.05 in 4 out of 12 methods (two-sided test) and (2) the number of genes with p-value < 0.05 and OR < 1 under the weighted logistic regression model with classical variant filtering (one-sided test).
Mutational signature analysis
Using the high-impact variants with classic AAF filtering, a set of custom Perl scripts was used to generate a mutation frequency table of all SNVs across each sample to assess the of mutation types being reported. This mutation frequency table was subsequently used to generate plots of mutation signatures for the variants using Perl and R scripts (code available from GitHub repository https://github.com/Liuy12/BBD_generead). Mutational spectrums and de-novo mutational signatures were identified using the MutationalPatterns package (version 1.2.1) [24]. De-novo signatures were extracted based on a non-negative matrix factorization (NMF) algorithm. Through consensus clustering, four stable de-novo mutational signatures were identified and compared with COSMIC mutational signatures v2 (n = 30) based on cosine similarity (Additional File 5: Fig. S4) [25]. Signature A (QGR signature) matches the patterns identified in samples processed with the FFPE DNA protocol used, the QIAGEN GeneRead DNA FFPE Kit ("QGR") [17]. Signature B (FFPE signature) matches patterns observed in FFPE samples but not in fresh-frozen samples. Signature C (Block year signature) was highly associated with the year of FFPE block creation. Signature D (Residual signature) had no clear correspondence with previously found mutational signatures. The estimated signatures are shown in Additional File 6: Table S2.
Immunohistochemistry analysis
In all BBD samples, expression of Ki67 and CD45 was assessed in up to 10 normal lobules using IHC. Immunostaining was performed using the following antibodies: CD45 (Abcam ab10559, 1:800), Ki67 (DAKO M7240, 1:100). Samples were deparaffinized with three changes of xylene, rehydrated in 95% ethanol and rinsed well in running distilled water. Slides were then placed in a preheated Antigen Retrieval solution (pH 6.0, DAKO) for 25 min and then cooled in the buffer for 25 min followed by a five-minute rinse in running distilled water. After the heat-inactivated epitope retrieval step, slides were placed on the DAKO Autostainer at room temperature for the following procedure. Sections were incubated with 3% H2O2 for five minutes to inactivate the endogenous peroxides and then incubated in the primary antibody at dilutions listed above for 60 min at room temperature. Sections were rinsed with Tris-buffered saline/Triton-X100 (TBST) wash buffer and incubated with the secondary antibody (Envision (+) anti-mouse labeled polymer (HRP, K4001) for Ki67 and Envision (+) anti-rabbit labeled polymer (HRP, K4003) for CD45) for 30 min. Slides were then rinsed with TBST wash buffer and sections were incubated in 3,3′-diaminobenzidine (DAB+) (K3467, DAKO) for five minutes, counterstained with Gills I hematoxylin for one minute, followed by a three-minute tap water rinse to blue sections, dehydrated through graded alcohols and cleared in three changes of xylene and mounted with permanent mounting media. Slides were scanned using the Aperio™ Scanscope XT Slide Scanner (Leica Biosystems, Buffalo Grove, IL, USA) with image acquisition and staining quantitation using Aperio™ ImageScope Software (Leica Biosystems, Buffalo Grove, IL, USA). Slides were scanned at resolution of 0.5 µm/pixel with no downstream image processing. Ki67 was measured as percent positive nuclei and CD45 was measured as an H score (a combined measure of the intensity and extent of staining) [26,27,28]. CD45 and Ki67 were assessed as continuous measures. Associations across groups were evaluated as Wilcoxon rank-sum tests, and associations with mutational burden (based on classical variant filtering) were evaluated with Spearman correlation.