### Autosomal mCNVs larger than 200 kb are detected in 65% of patients and cover the majority of the genome

As a first step toward measuring the impact of mCNVs on NIPS performance, we surveyed their frequency, size, and positional bias in 87,255 patient samples. Using a rolling-window z-score algorithm (see Methods), we identified mCNVs ≥200 kb. On average, patients had 1.07 autosomal mCNVs, and 65% of patients had at least one mCNV. There were 37% more deletions than duplications overall, but duplications were generally larger than deletions (median sizes 360 and 260 kb, respectively; Kruskal-Wallis H-test *p* < 0.05).

Chromosomes 13, 18, and 21 are commonly tested in NIPS, and mCNVs on these chromosomes pose the most direct risk for false positives. On these chromosomes, 2.1% of all patients had at least one duplication and 2.5% had at least one deletion with 4.5% having an mCNV of either type (Fig. 2a). On chromosome 21, deletions and duplications were observed at a similar frequency, yet mCNVs larger than 1 Mb were all duplications (21 duplications and no deletions, Fig. 2b, c). The high frequency of mCNVs on the commonly trisomic chromosomes illustrates why an NIPS strategy that results in no-calls for samples with mCNVs would be clinically inviable, as the rate of no-calls and invasive follow-up procedures would be unacceptably frequent.

We investigated the positional distribution of mCNVs to evaluate the previously published premise [13] that if mCNV positions were highly predictable, an algorithm could achieve robustness simply by masking out (or “blacklisting”) such regions. Indeed, we observed that mCNVs were not distributed uniformly (Fig. 2d). Hotspots of mCNVs were common, with some hotspots having an equal number of duplications and deletions, and others having an imbalanced ratio of the two. However, mCNVs were not constrained to hotspot regions, as they were observed across nearly all of the mappable portion of chromosome 21, with only about 14% of the chromosome having no observed mCNVs in our dataset (approximately 7% of chromosome 13 and 9% of chromosome 18 did not have mCNVs; Additional file 1: Figure S5). Though mCNV hotspots suggest that a blacklist approach could partially mitigate the impact of mCNVs, this strategy has drawbacks: either (1) many sites are blacklisted, which would impair sensitivity for aneuploidy detection or (2) few sites are blacklisted, after which many samples would retain mCNVs within the analyzed regions that could lower specificity. This result extends to NIPS assays that apply the blacklist at a biochemical level, e.g., by only targeting certain regions for sequencing [17, 18].

### The impact of mCNVs on z-scores observed in empirical data is recapitulated and supplemented with simulations

We next explored the impact of mCNVs on aneuploidy-calling fidelity as a function of mCNV size (Fig. 3). Empirically observed mCNVs rarely spanned ≥1% of a chromosome, which prohibited a statistically powered assessment of the impact of these large mCNVs. To overcome the sparsity of empirical data, we implemented simulations to systematically analyze the effects of maternal duplications on trisomy detection. To create a simulated sample harboring an mCNV of a given size and position, the bin-level copy-number data corresponding to the region of interest was scaled by an empirically derived factor in a euploid and mCNV-free sample (Fig. 3a, b). Simulated samples strongly resembled their observed counterparts, both at the level of bin profile (Fig. 3a) and the distribution of bin copy-number values (Fig. 3b). The bin copy number within simulated mCNVs was very slightly overdispersed compared to the bin copy numbers within detected patient mCNVs (Fig. 3b). The strong overlap between median z-scores for the empirical and simulated samples (Fig. 3c, thick gray and red lines, both for the “Simple” method) suggests that this dilation effect has a negligible impact on our results.

Maternal duplications exert an upward pressure on z-scores, and this effect was reproduced in our simulated data on autosomes (Fig. 3c, gray and red traces, respectively). Importantly, with the simulated data the effect was more readily observed, as the full size spectrum of potential mCNVs was modeled. Larger simulated duplications led to increasing positive shifts away from the expected median z-score of 0 for a euploid sample (Fig. 3c, red trace). The threat to the clinical performance of NIPS is that this bias toward higher z-scores contributes to false positives and lowers specificity. Indeed, the simulations suggest that the average sample harboring an mCNV spanning 2.4% or more of a chromosome would be expected to yield a false positive using the “Simple” approach (i.e., the median z-score exceeds 3).

### mCNV impact on z-scores can be reduced, but not eliminated, with outlier-robust algorithms

We sought to determine which algorithmic features in an NIPS analysis pipeline minimize the effect of mCNVs on z-scores. Our simulated samples were an ideal data set for this analysis, as the samples have both a “pre-mCNV” z-score (reflecting their original status as both euploid and free of mCNVs; see Methods) and a “post-mCNV” z-score calculated after introducing a modeled maternal duplication. The difference between the post- and pre-mCNV z-scores—which we term *∆z*_{dup}—is a direct measure of the effect of mCNVs on z-scores. A positive *∆z*_{dup} means the aneuploidy z-score was increased with the introduction of a simulated mCNV.

Six analysis strategies were tested on simulated samples with maternal duplications on chromosomes 21 (Fig. 4), 13 (Additional file 1: Figure S6), or 18 (Additional file 1: Figure S7). For each test of a strategy and a chromosome, we evaluated at least 10,000 simulated samples. As described in Methods and summarized in Additional file 1: Table S1, the strategies differ both in their approaches for calculating the central tendency (e.g., mean or median) and dispersion of bin copy-number values across a chromosome and in their filtering methods that determine which bins are used in those calculations. For each method, *∆z*_{dup} was plotted as a function of mCNV size (Fig. 4, middle panels), and these data were sampled to estimate how specificity falls as mCNVs grow (Fig. 4, right panels; see Methods).

The “Simple” approach (Fig. 4a) summarizes the bin copy-number values of a chromosome by the mean and standard deviation, without applying any mCNV-specific or nonspecific filters. As anticipated, this method was the most susceptible to false positives due to mCNVs; at the point where duplication size exceeded 1.3% of chromosome 21 (0.42 Mb, autosomal duplications of this size or greater observed in 13% of patients), the estimated specificity dropped below 95%, and duplications spanning more than approximately 10% of the chromosome always caused false positive results [3]. Methods using an alternative to the z-score while still using the mean and standard deviation in the analysis—such as employing a t-test [19]—would likely be similarly susceptible to mCNVs.

The “Robust” approach (Fig. 4b) improves upon the “Simple” strategy by replacing the mean with the median and estimating the standard deviation of bin copy-number values from their interquartile range, rather than calculating the standard deviation directly. The median and IQR are less susceptible to outlying bins than the mean and standard deviation; therefore, utilizing these values is expected to increase robustness to mCNVs. Indeed, this approach had smaller z-score deflections than the “Simple” strategy for mCNVs spanning <10% of the chromosome but was still suboptimal; specificity dropped below 95% for mCNVs spanning ≥3.6% (1.2 Mb) of chromosome 21, and our patient cohort contained 1168 samples (1.3%) with duplications in that size range (Fig. 2a).

The “Robust+Gaussian” approach (Fig. 4c) adds another layer of nonspecific outlier removal to the “Robust” approach by rejecting bins that fall far outside of a Gaussian fit to the bin copy-number data. This method performed better than both the “Simple” and “Robust” methods, but was susceptible to mCNVs spanning approximately 8.9% of chromosome 21 (2.9 Mb), at which point specificity dropped below 95%. As a consequence of more stringent filtering, the Robust+Gaussian method discards more bins relative to the previous strategies. This excess bin culling would reduce sensitivity because sensitivity of WGS-based NIPS is an increasing function of the number of bins [20].

### Directly accounting for mCNVs boosts specificity

We next considered strategies that specifically address mCNVs, positing that directed approaches would further boost specificity. The “Z-correction” method (Fig. 4d) first calculates a z-score for the chromosome—without removal of mCNV bins—and next subtracts a chromosome- and size-specific z-score offset determined via simulated samples analyzed with the “Robust” approach. In adjusting for mCNVs, this method assumes that the effect of mCNVs on z-score is determined by size and is reproducible across samples. This method performed better in aggregate compared to the previous approaches, as the median of *∆z*_{dup} remained near 0 even for large duplications. However, *∆z*_{dup} values were relatively highly dispersed for simulated duplications around >3% (1 Mb) in size, meaning that an mCNV would still cause large z-score deviations for some samples. The specificity for chromosome 21 dropped below 95% at duplication sizes of approximately 22% (7.0 Mb).

The “Value filtering” approach (Fig. 4e) operates on a simple premise: neutralize mCNVs by purging bins with high (>2.5) or low (<1.5) copy-number values prior to calculating the chromosome-wide average and dispersion. This method was robust to mCNVs that were not extremely large (<95% specificity for mCNVs larger than 27% of chromosome 21, or 8.7 Mb), but showed elevated variability in *∆z*_{dup} for all mCNV sizes relative to other strategies. The increased noise results from filtering out bins too aggressively, leaving fewer data points—and consequently more noise—for z-score calculation. Duplications are still expected to have some bins with copy-number values less than 2.5 but elevated compared to non-duplicated regions, which is likely why large duplications caused a positive *∆z*_{dup}. A variant of this method using cutoff values based on empirical bin copy-number values is shown in Additional file 1: Figure S4. This method showed the most variability in the fraction of bins retained after filtering (Additional file 1:Figure S4, right panels) compared to all other methods that were analyzed, suggesting that it could have a nontrivial and variable impact on aneuploidy sensitivity for samples with mCNVs, as sensitivity depends on the number of bins available for z-score calculation [20].

Finally, the “mCNV filtering” approach (Fig. 4f) performs a sample-specific exclusion of bins included in mCNVs. Treating each sample separately, chromosomes are scanned for the presence of mCNVs (see Methods) and then mCNV-spanning bins are excised prior to all downstream calculations. This method was the most robust to mCNVs compared to the others, with specificity dropping below 95% only for maternal duplications larger than 58% of chromosome 21 (18 Mb). Because the “mCNV filtering” method removes only the data that should be removed, it decreases z-score noise, retains high specificity, and has more consistent sensitivity compared to the “Value filtering” approach due to less noise in the number of bins retained (Additional file 1: Figure S4, right panels).

### mCNV filtering reduces mCNV-caused false-positive rate to fewer than 1 in 520,000

To evaluate the algorithmic strategies through a more clinically relevant lens, we calculated the expected frequency of false-positive aneuploidy calls resulting from mCNVs on chromosomes 13, 18, and 21 (see Methods). Using the measured relationship between duplication size and *∆z*_{dup} (Fig. 3), as well as the size and chromosome of the observed maternal duplications in over 56,000 NIPS samples (the 65% of the 87,255 sample cohort with mCNVs), we estimated the false-positive rate combined across the three chromosomes for each NIPS data-analysis strategy described earlier.

On average, mCNVs are predicted to cause a false-positive result of trisomy 13, 18, or 21 for 1 in 860 patients using the “Simple” approach. This false-positive rate is similar to the rates reported by laboratories prior to incorporating changes that mitigate the effect of mCNVs: in outcome studies, Chudova et al. reported 3 mCNV-caused false positives in 1914 patients (a rate of 1 in 640) [7], and Strom et al. reported 61 mCNV-caused false positives in 31,278 patients (a rate of 1 in 510) [6]. The “Simple” estimated false-positive rate is also consistent with aggregate statistics of NIPS specificity from meta-analyses over the time period when comparable methods were common [3].

Overall, mCNV-aware approaches (“Z-correction”, “Value filtering”, “mCNV filtering”) had higher specificity than mCNV-unaware approaches. All mCNV-aware approaches increased the pooled specificity for the three common trisomies such that the aggregate false-positive rate was fewer than 1 in 100,000 tests. Remarkably, relative to the “Simple” approach with one false positive expected for every 860 samples, the “mCNV filtering” approach is expected to incur only one mCNV-caused false positive for every 520,000 samples, representing a 600-fold reduction.

### mCNVs offer insight into clinical interpretation of novel fetal microdeletions

The high frequency and positional dispersion of CNVs across the genome (Fig. 2) was noteworthy in this ostensibly healthy pregnant population. We were curious about whether the landscape of maternal copy-number variation could inform the potential clinical impact of copy-number variation in the fetal genome. Such knowledge is important because WGS-based NIPS technology can detect novel fetal microdeletions on the order of 10 Mb [10], and it is not yet clear how to interpret the health implications of such variants.

We reasoned that the clinical consequences of a novel 10 Mb microdeletion would be less severe if there are deletions observed throughout the region in a healthy population. Therefore, we calculated the proportion of each autosomal, 10 Mb sliding window that was covered by at least three observed deletions in our mCNV dataset, termed the “deletion span” (Fig. 5a). We assumed that duplications are more likely to be benign than deletions and, therefore, calculated the corresponding duplication span for each region to serve as a proxy to control for CNV propensity. As schematized in Fig. 5a, a window with a high duplication span has several observed duplications covering most of the region, and a window with a low deletion span has deletions only in a few parts of the region. The number of observed mCNVs in a given window is not the sole determinant of the span; for example, a 10 Mb window that had a 200 kb deletion hotspot but no deletions elsewhere would have a small deletion span. Figure 5b shows span values as a function of position across chromosomes 4 and 5 (all other chromosomes in Additional file 1: Figure S8), and Fig. 5c compares deletion and duplication spans for all 10 Mb windows across autosomes. The two span measurements were significantly correlated (Pearson *r* = 0.73, *p* < 0.05), consistent with there being an intrinsic propensity for CNVs (deletions and duplications) that varies by position [21].

Based on the presumption that deletions are more likely to be pathogenic than duplications, we expected that a small deletion span, relative to the duplication span, would be a feature of pathogenic microdeletions. Therefore, we calculated the ratio of spans (“dup:del ratio”) and evaluated whether pathogenic microdeletions had elevated dup:del ratios. Figure 5d shows the histogram of the dup:del ratio for autosomal 10 Mb bins; it highlights five commonly screened pathogenic microdeletions (22q11.21, 5p15, 1p36.32–33, 4p16.2–3, and 15q11.2–13.1). Four of the five pathogenic microdeletions had a dup:del ratio in the 75th percentile or greater, but 22q11.21 had a nearly 1:1 dup:del ratio (10th percentile). These data suggest that a high dup:del ratio could be a common—but not ubiquitous—feature of pathogenic microdeletions.

We investigated the density of genes in a region as a secondary feature that could distinguish whether a deletion is pathogenic. Notably, based on gene density and dup:del ratio, each of the common pathogenic microdeletions was an outlier relative to typical 10 Mb windows in the genome in one feature, the other, or both (a result robust to the mCNV-count threshold used to define a span, Additional file 1: Figure S9, as well as resamplings of the study population, Additional file 1: Figure S10). Microdeletion 22q11.21 had only an intermediate dup:del ratio, as mentioned, but its gene density is very high. Microdeletion 5p15, by contrast, had the opposite: an elevated dup:del ratio (≥99th percentile) but approximately average gene density. Finally, microdeletions 1p36, 4p16, and 15q11 all had both high gene density and elevated dup:del ratio.

To expand the investigation to a larger number of known pathogenic microdeletions, we additionally analyzed expert-curated pathogenic deletions [22] ≥1 Mb in length from the International Collaboration for Clinical Genomics (ICCG, formerly ISCA). Nearly all such variants were outliers in one or both metrics (purple diamonds, Fig. 5e), consistent with the findings for common microdeletions. Two known pathogenic microdeletions (2p15p16.1 and 12q14) had low dup:del ratios (~ 1:1) and relatively low gene density, but they also had low values for both the duplication span and deletion span (≤10%; Additional file 1: Table S2). As such, low span values might represent cases in which the dup:del ratio alone is equivocal for interpreting novel microdeletions.

The above analyses suggest that outlying regions in the plot of gene density versus dup:del ratio are more likely to be pathogenic when deleted. To scrutinize this hypothesis, we tested its inverse, i.e., that deletion of non-outlying regions is benign. We observed multiple samples in our patient cohort with microdeletions ≥4 Mb, most of which we expected to be benign—or to have a mild or incompletely penetrant pathogenic phenotype—because of their presentation in expecting mothers. For all such microdeletions, we evaluated their respective gene densities, duplication spans, deletion spans, and dup:del ratios (yellow dots in Fig. 5e and Additional file 1: Figure S8; Additional file 2: Table S3). All but one of the regions directly supported our hypothesis because they were not outliers on either axis (Fig. 5e). We looked more deeply at the one variant that appeared to counter the hypothesis due to its very high dup:del ratio (yellow dot with arrow in Fig. 5e). Remarkably, this variant is a deletion of 13q34 that has recently been shown to be pathogenic, as it associates with intellectual disability and dysmorphism [23]. Therefore, rather than invalidate or weaken the hypothesis, the observed 13q34 microdeletion reinforces it.

Taken together, these observations suggest that parameterizing putative microdeletions on multiple biologically relevant axes, such as the two investigated here, could facilitate identification of pathogenic outliers and aid the clinical interpretation of novel fetal CNVs identified via NIPS.