Skip to main content

Table 1 Significant findings of the second Critical Assessment of Data Privacy and Protection (CADDP) competition

From: Protecting genomic data analytics in the cloud: state of the art and opportunities

1. Certain important genome analysis tasks can already be protected on a large scale. As a prominent example, it was found that the state-of-the-art privacy-enhancing technologies (PET) can already support the calculation of Hamming distance (a widely used distance measure for genomic sequences) across genomes with 100,000 bases in a few minutes, even when the genomic data are fully encrypted or the computation is performed across two geographically distributed institutions. These indicate that it is realistic to use cloud or secure multiparty computation for certain genomic data analysis tasks, even when the cloud or each party are not fully trusted, while still maintaining sufficient protection of patient privacy and scalability of the computation.
2. Gaps remain for certain classes of biomedical computations. When it comes to more complicated computations (e.g., association tests for a GWAS), analyzing encrypted data on a commercial cloud involves a large computational and communication burden.
3. Narrowing the gap between data usefulness and privacy protection requires a joint effort from the biomedical and security communities. Cryptographers and computer scientists need to collaborate with biomedical researchers to move PET techniques closer to practice. We found that an approximation of the Edit distance computation on human genomes is very effective, significantly simplifying computation and allowing it to be calculated, securely, on a large scale.