Characterizing sensitivity and coverage of clinical WGS as a diagnostic test for genetic disorders

Background
Owing to its reduced cost and unique advantages, whole-genome sequencing (WGS) is likely to change the clinical diagnosis of rare and undiagnosed diseases. However, the sensitivity and breadth of coverage of clinical WGS as a diagnostic test for genetic disorders have not been fully evaluated.

Methods
Here, we measured the performance of WGS on NA12878, the YH cell line and the Chinese trios sequenced on the MGISEQ-2000 platform, assessing sensitivity, positive predictive value (PPV), and depth and breadth of coverage. We also compared the performance of whole-exome sequencing (WES) and WGS using NA12878. For the Chinese trios, sensitivity and PPV were estimated using a family-based trio design. We further developed a systematic WGS pipeline and applied it to 8 clinical cases.

Results
In general, the sensitivity and PPV of SNV/indel detection increased with mean depth and reached a plateau at a mean depth of ~40X in down-sampled NA12878 data. At a mean depth of 40X, the sensitivity for homozygous and heterozygous SNPs in NA12878 was > 99.25% and > 99.50%, respectively, and the corresponding PPVs were 99.97% and 98.96%. Homozygous and heterozygous indels showed lower sensitivity and PPV. Sensitivity and PPV did not reach 100% even at a mean depth of ~150X. We also observed substantial variation in the sensitivity of CNV detection across different tools, especially for CNVs smaller than 1 kb. In general, the breadth of coverage for disease-associated genes and CNVs increased with mean depth. Both sensitivity and coverage were better for WGS (~40X) than for WES (~120X). In the Chinese trios sequenced to a mean depth of ~40X, the sensitivity in the offspring was > 99.48% for SNPs and > 96.36% for indels, with PPVs of 99.86% and 97.93%, respectively. All 12 previously validated variants in the 8 clinical cases were detected by our WGS pipeline.

Conclusions
The current standard mean depth of 40X may be sufficient for SNV/indel detection and for identifying most CNVs. Clinical scientists would be well advised to determine the range of sensitivity and PPV for each class of variant for their particular WGS pipeline, which would help when interpreting and delivering clinical reports.

Supplementary Information
The online version contains supplementary material available at 10.1186/s12920-021-00948-5.
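The sensitivity and PPV figures reported in the abstract follow the standard definitions: sensitivity = TP / (TP + FN) and PPV = TP / (TP + FP), where true positives (TP), false negatives (FN) and false positives (FP) come from comparing calls against a truth set. The minimal Python sketch below illustrates these definitions, together with the fraction of reads one would keep when randomly down-sampling a library to a lower target depth. The function names are hypothetical, and the counts in the usage lines are illustrative only, not results from this study.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Share of truth-set variants that were called: TP / (TP + FN)."""
    return tp / (tp + fn)


def ppv(tp: int, fp: int) -> float:
    """Share of called variants that are in the truth set: TP / (TP + FP)."""
    return tp / (tp + fp)


def downsample_fraction(mean_depth: float, target_depth: float) -> float:
    """Fraction of reads to keep (uniformly at random) to reach target_depth."""
    if not 0 < target_depth <= mean_depth:
        raise ValueError("target depth must be positive and not exceed the mean depth")
    return target_depth / mean_depth


# Illustrative counts only (not data from this study).
print(f"sensitivity = {sensitivity(tp=995, fn=5):.4f}")
print(f"PPV         = {ppv(tp=995, fp=10):.4f}")
print(f"keep        = {downsample_fraction(150, 40):.3f}")  # ~150X down to 40X
```

Down-sampling by a read fraction assumes coverage scales roughly linearly with the number of reads, which is a simplification; the actual depth reached should still be measured after subsampling.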


Miss detection index (MDI)
Unlike SNP and indel detection, CNV detection is more complicated to evaluate. First, there is no perfect "gold standard" CNV dataset for benchmarking. Although some "gold standard" CNV call sets are widely used in published papers, they have not been validated by various methods, so some of their CNVs may be false positives or have wrong or low-resolution boundaries. Second, our results showed that CNV size may influence the sensitivity of CNV detection (Supplementary Figures 1-9). Third, we observed substantial variation in the sensitivity of CNV detection across different tools. Together, these factors make it difficult to assess the recommended depth for CNV detection in proband-only WGS. In this study, we introduced the concept of MDI to address these problems.
Using 3 CNV call sets (CNV call sets 1, 2 and 3) and the detection results of 3 CNV tools (CNVnator, BreakDancer and LUMPY), we defined an MDI value in this study. The MDI value for a specific mean depth (DP) is defined as the frequency with which that mean DP shows one of the "lowest" sensitivities across different CNV sizes in a CNV call set. Regardless of the CNV call set and CNV detection tool selected, MDI can be used to evaluate the recommended depth for CNV detection in proband-only WGS.
The MDI for mean depth i is $\mathrm{MDI}_i = M_i / N$, where $M_i$ is the number of times mean depth i shows one of the "lowest" sensitivities of CNV detection and $N$ is the total number of "lowest" sensitivities across all depths. To obtain qualified CNV sizes in a CNV call set for evaluation, each CNV size must fulfil several criteria. A CNV size is considered unqualified, and is excluded, if: 1) fewer than 3 mean depths have a detection rate more than 10% below the highest detection rate for that CNV size, because mean depth then has little influence on the detection rate; 2) more than 50% of the detection rates for that CNV size are 0; or 3) more than 95% of the detection rates for that CNV size are the same. For each qualified CNV size, the "lowest" detection rates are its 3 lowest detection rates. Because it takes CNV size into consideration, MDI can mitigate the influence of the choice of CNV tool. Moreover, because widely used "gold standard" CNV sets may contain false positives with wrong or low-resolution boundaries, MDI takes the top sensitivity across CNV tools as the detection ceiling, so it does not require all CNVs in the dataset to be true positives.
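The MDI calculation described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `is_qualified` and `compute_mdi` and the data layout are hypothetical; "more than 10% below the highest detection rate" is read as an absolute margin of 10 percentage points; criterion 3 is read as the share of the most frequent detection rate; and ties among the lowest rates are broken by the lower depth. The highest rate observed for a CNV size serves as its detection ceiling.

```python
from collections import Counter
from typing import Dict, List, Tuple

# One observation = (mean depth in X, detection rate in [0, 1]).
# Several tools may contribute observations for the same depth.
Observation = Tuple[int, float]


def is_qualified(obs: List[Observation], margin: float = 0.10,
                 min_below: int = 3, max_zero_frac: float = 0.50,
                 max_same_frac: float = 0.95) -> bool:
    """Apply the three exclusion criteria to the observations for one CNV size."""
    rates = [rate for _, rate in obs]
    ceiling = max(rates)  # detection ceiling: top rate across depths/tools
    # Criterion 1 (assumption: margin is absolute, in rate units): too few
    # rates fall clearly below the ceiling, so depth has little influence.
    if sum(1 for r in rates if ceiling - r > margin) < min_below:
        return False
    # Criterion 2: more than half of the rates are zero.
    if sum(1 for r in rates if r == 0.0) / len(rates) > max_zero_frac:
        return False
    # Criterion 3 (assumption: share of the most frequent rate): nearly all
    # rates are the same value.
    if Counter(rates).most_common(1)[0][1] / len(rates) > max_same_frac:
        return False
    return True


def compute_mdi(table: Dict[str, List[Observation]], k: int = 3) -> Dict[int, float]:
    """MDI_i = M_i / N, counting how often depth i is among the k lowest rates."""
    m: Counter = Counter()
    for obs in table.values():
        if not is_qualified(obs):
            continue  # unqualified CNV sizes contribute to neither M nor N
        # The k lowest detection rates for this CNV size; ties broken by lower depth.
        for depth, _ in sorted(obs, key=lambda o: (o[1], o[0]))[:k]:
            m[depth] += 1
    n = sum(m.values())
    return {depth: count / n for depth, count in sorted(m.items())} if n else {}


# Hypothetical detection rates for three CNV sizes (not data from Table S1).
table = {
    "size 1": [(10, 0.20), (20, 0.30), (30, 0.40), (40, 0.55), (50, 0.70), (60, 0.70)],
    "size 2": [(10, 0.0), (20, 0.0), (30, 0.0), (40, 0.0), (50, 0.50), (60, 0.60)],
    "size 3": [(10, 0.10), (20, 0.20), (30, 0.50), (40, 0.30), (50, 0.60), (60, 0.60)],
}
print(compute_mdi(table))
```

With these toy values, size 2 is excluded by criterion 2 (4 of 6 rates are 0), size 3 has the same three lowest sensitivities as CNV size 3 in the Table S1 example (10% at 10X, 20% at 20X, 30% at 40X), and the qualified sizes yield $M_{10X} = M_{20X} = 2$, $M_{30X} = M_{40X} = 1$ and $N = 6$, so the MDIs sum to 1.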
Table S1 gives an example of how to calculate the MDI value for a DP of 10X. CNV size 2 is unqualified because more than 50% of its detection rates are 0 (7/10). For CNV size 3, the "lowest" sensitivities are 10% (10X), 20% (20X) and 30% (40X). Across the qualified CNV sizes, $M_{10X} = 2$, $M_{20X} = 2$, $M_{30X} = 1$, $M_{40X} = 1$ and $N = 6$ (the total number of "lowest" sensitivities across all depths).
As a result, $\mathrm{MDI}_{10X} = 2/6 \approx 0.333$ and $\mathrm{MDI}_{30X} = 1/6 \approx 0.167$. The difference between the MDI in this example and that in the main article is that the main article used 3 CNV call sets (CNV call set 1, 2, 3) and the detection results of 3 CNV tools (CNVnator, BreakDancer