Secure searching of biomarkers through hybrid homomorphic encryption scheme
BMC Medical Genomics volume 10, Article number: 42 (2017)
Abstract
Background
As genome sequencing technology develops rapidly, there has lately been an increasing need to keep genomic data secure even when stored in the cloud and still used for research. We are interested in designing a protocol for the secure outsourcing matching problem on encrypted data.
Method
We propose an efficient method to securely search for the position that matches the query data and to extract some information at that position. After decryption, only a small number of comparisons with the query information need to be performed in the plaintext state. We apply this method to find a set of biomarkers in encrypted genomes. The important feature of our method is that it encodes a genomic database as a single element of a polynomial ring.
Result
Since our method requires only a single homomorphic multiplication of the hybrid scheme for query computation, it has an advantage over previous methods in parameter size, computational complexity, and communication cost. In particular, the extraction procedure not only prevents leakage of database information that has not been queried by the user but also reduces the communication cost by half. We evaluate the performance of our method and verify that computation on large-scale personal data can be securely and practically outsourced to a cloud environment during data analysis. It takes about 3.9 s to search-and-extract the reference and alternate sequences at the queried position in a database of size 4M.
Conclusion
Our solution for finding a set of biomarkers in DNA sequences shows that cryptographic techniques have progressed to the point where they can support real-world genome data analysis in a cloud environment.
Background
The rapid development of genome sequencing technology gives us access to large genome datasets and looks poised to enable significant breakthroughs in medical research. While genomic data can be used for a wide range of applications including healthcare, biomedical research, and direct-to-consumer services, it has numerous distinguishing features, and its disclosure can violate personal privacy or lead to genetic discrimination [1–3]. Because of these potential privacy issues, it should be managed with care.
Various privacy-enhancing techniques based on cryptographic methods have been proposed as tools for the outsourced analysis of genomic data. Recently, it has been suggested that we can preserve privacy through homomorphic encryption (HE), which allows computations to be carried out on ciphertexts. Yasuda et al. [4] gave a practical solution to find the location of a pattern in a text by computing multiple Hamming distance values on encrypted data. Lauter et al. [5] gave a solution to privately compute the basic genomic algorithms used in genome-wide association studies.
Homomorphic encryption can be applied to privacy-preserving sequence comparison, but it is still impractical for the analysis of entire human genomes. For example, Cheon et al. [6] presented a protocol to compute the edit distance on homomorphically encrypted data, but it took about 27 s even on DNA sequences of length 8. It is not easy to efficiently approximate the edit distance over encrypted data, even when the distance to a public human DNA sequence is given [7]. This inefficiency comes from the difficulty of homomorphically evaluating an equality test: encrypting the inputs bitwise and computing over the encrypted bits incurs an expensive computation cost (at least linear in the data bit-length).
In this paper, we suggest an efficient method to securely search a set of biomarkers using a hybrid Ring-GSW homomorphic encryption scheme.
Problem setting
The iDASH (Integrating Data for Analysis, ‘anonymization’ and SHaring) National Center organizes the iDASH Privacy & Security challenge for secure genome analysis. This paper is based on a submission to Task 3 of the 2016 iDASH challenge: secure outsourcing of testing for genetic diseases on encrypted genomes. The goal of this task is to privately calculate the probability of genetic diseases by matching a set of biomarkers to encrypted genomes stored in a public cloud service. The requirement is that the entire matching process be carried out using homomorphic encryption, so that no information about the database or the query is revealed to the server during computation.
Suppose that the client has a Variant Call Format (VCF) file which contains genotype information such as chromosome number and position in the genome. It also contains some information for each position such as reference and alternate sequences, where each base is one of A, T, G, and C. The client encrypts the information using homomorphic encryption and the server calculates the exact match over the encrypted data. The outcome is the absence/presence of the specified biomarkers, that is, an encryption of 1 if matched; otherwise, an encryption of 0. Finally, the client decrypts the result with the secret key of the homomorphic encryption scheme.
Practical homomorphic encryption
Fully homomorphic cryptosystems allow us to homomorphically evaluate any arithmetic circuit without decryption. However, the noise of the resulting ciphertext grows during homomorphic evaluations, slightly with addition but substantially with multiplication. For efficiency reasons, for tasks which are known in advance, we use a more practical Somewhat Homomorphic Encryption (SHE) scheme, which evaluates functions up to a certain complexity. In particular, two techniques are used for noise management in SHE. One is the modulus-switching technique introduced by Brakerski, Gentry and Vaikuntanathan [8], which scales down a ciphertext during every multiplication operation and reduces the noise by the scaling factor. The other is a scale-invariant technique proposed by Brakerski, in which the same modulus is used throughout the evaluation process [9].
Let us denote by [ ·]_{ Q } the reduction modulo Q into the interval \((-Q/2,Q/2]\cap \mathbb {Z}\) of the integer or integer polynomial (coefficient-wise). For a security parameter λ, we choose an integer M=M(λ) that defines the Mth cyclotomic polynomial Φ _{ M }(X). For a polynomial ring \(\mathcal {R}=\mathbb {Z}[X]/ (\Phi _{M}(X))\), set the plaintext space to \(\mathcal {R}_{t}:= \mathcal {R}/t\mathcal {R}\) for some fixed t≥2 and the ciphertext space to \(\mathcal {R}_{Q}:= \mathcal {R}/Q\mathcal {R}\) for an integer Q=Q(λ). Let χ=χ(λ) denote a noise distribution over the ring \(\mathcal {R}\). We use the standard notation \(a \leftarrow \mathcal {D}\) to denote that a is chosen from the distribution \(\mathcal {D}\).
The basic scheme
The following is a description of basic homomorphic encryption scheme based on the hardness of (decisional) Ring Learning with Errors (RLWE) assumption, which was first introduced by Lyubashevsky et al. [10]. The assumption is that it is infeasible to distinguish the following two distributions. The first distribution consists of pairs (a _{ i },u _{ i }), where a _{ i } and u _{ i } are drawn uniformly at random from \(\mathcal {R}_{Q}\). The second distribution consists of pairs of the form (a _{ i },b _{ i })=(a _{ i },a _{ i } s+e _{ i }) where a _{ i } is uniformly random in \(\mathcal {R}_{Q}\) and s,e _{ i } are drawn from the error distribution χ. To improve efficiency for HE, we use sparse secret keys s with coefficients sampled from {0,±1} as in [11].

RLWE.ParamsGen(λ): Given the security parameter λ, choose an integer M, a modulus Q, a plaintext modulus t with t | Q, and a discrete Gaussian distribution χ _{ err }. Output params←(M,Q,t,χ _{ err }).

RLWE.KeyGen(params): On the input parameters, let N=ϕ(M) and choose a sparse random s from {0,±1}^{N}. Generate an RLWE instance (a,b)=(a,[−a s+e]_{ Q }) for e←χ _{ err }. We set the secret key sk←s and the public key pk←(a,b).

RLWE.Enc(m,pk): To encrypt \(m \in \mathcal {R}_{t}\), choose a small polynomial v and two Gaussian polynomials e _{0},e _{1} over \(\mathcal {R}\) and output the ciphertext
$$\begin{array}{ll} \mathsf{ct} & \leftarrow(c_{0},c_{1})\\ & =((Q/t) m,0) + (bv+e_{0}, av+e_{1}) \in \mathcal{R}_{Q}^{2}. \end{array} $$ 
RLWE.Dec(ct,sk): Given a ciphertext ct=(c _{0},c _{1}), output m←⌊(t/Q)·[c _{0}+s·c _{1}]_{ Q }⌉.

RLWE.Add(ct,ct ^{′}): Given two ciphertexts ct=(c _{0},c _{1}) and ct ^{′}=(c0′,c1′), the homomorphic addition is computed by ct _{ add }←([c _{0}+c0′]_{ Q },[c _{1}+c1′]_{ Q }).
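The algorithms above can be exercised with a toy model. The following is a minimal sketch, not the authors' implementation: it assumes a tiny ring dimension, uniform noise in {0,±1} in place of a discrete Gaussian, a dense rather than sparse ternary secret, and schoolbook multiplication modulo X^N+1. All function names are hypothetical, and the parameters offer no security.

```python
import random

def negacyclic_mul(a, b, Q):
    """Multiply two polynomials modulo X^N + 1 and modulo Q (schoolbook)."""
    N = len(a)
    out = [0] * N
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            k = i + j
            if k < N:
                out[k] = (out[k] + ai * bj) % Q
            else:  # X^N = -1, so the wrapped term changes sign
                out[k - N] = (out[k - N] - ai * bj) % Q
    return out

def small_poly(N, rng):
    """Toy stand-in for the error distribution: coefficients in {-1, 0, 1}."""
    return [rng.randint(-1, 1) for _ in range(N)]

def keygen(N, Q, rng):
    """Return (sk, pk) with pk = (a, b) and b = -a*s + e mod Q."""
    s = small_poly(N, rng)
    a = [rng.randrange(Q) for _ in range(N)]
    e = small_poly(N, rng)
    a_s = negacyclic_mul(a, s, Q)
    b = [(ei - x) % Q for ei, x in zip(e, a_s)]
    return s, (a, b)

def encrypt(m, pk, Q, t, rng):
    """ct = ((Q/t)*m, 0) + (b*v + e0, a*v + e1)."""
    a, b = pk
    N = len(a)
    v, e0, e1 = small_poly(N, rng), small_poly(N, rng), small_poly(N, rng)
    bv = negacyclic_mul(b, v, Q)
    av = negacyclic_mul(a, v, Q)
    c0 = [((Q // t) * mi + x + y) % Q for mi, x, y in zip(m, bv, e0)]
    c1 = [(x + y) % Q for x, y in zip(av, e1)]
    return c0, c1

def decrypt(ct, s, Q, t):
    """m = round((t/Q) * [c0 + s*c1]_Q), computed with integer rounding."""
    c0, c1 = ct
    c1s = negacyclic_mul(c1, s, Q)
    phase = [(x + y) % Q for x, y in zip(c0, c1s)]
    return [((t * p + Q // 2) // Q) % t for p in phase]

def add(ct, ct2, Q):
    """Homomorphic addition: component-wise sum modulo Q."""
    return tuple([(x + y) % Q for x, y in zip(u, w)] for u, w in zip(ct, ct2))

# Toy parameters (assumed for illustration only).
rng = random.Random(0)
N, Q, t = 16, 2**16, 16
sk, pk = keygen(N, Q, rng)
m1 = [rng.randrange(t) for _ in range(N)]
m2 = [rng.randrange(t) for _ in range(N)]
ct1, ct2 = encrypt(m1, pk, Q, t, rng), encrypt(m2, pk, Q, t, rng)
print(decrypt(ct1, sk, Q, t) == m1)
```

Decryption succeeds here because the accumulated noise e·v+e _{0}+e _{1}·s has coefficients of magnitude at most 2N+1, far below Q/(2t) for these toy parameters.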
Throughout this paper, we assume that the integer M is a power of two so that N=M/2 and Φ _{ M }(X)=X ^{N}+1. We adapt the conversion and modulus-switching techniques of [12]. The conversion algorithm changes an RLWE encryption of \(m=\sum _{i} m_{i} X^{i}\) into an LWE encryption of its constant term m _{0}, and the modulus switching reduces the ciphertext modulus Q down to q while preserving the message. We note that an LWE ciphertext is represented as a vector in \(\mathbb {Z}_{q}^{N+1}\) for some modulus q, and decryption is done by taking the inner product of the ciphertext and the secret key vector.

RLWE.Conv(ct): Given a ciphertext ct=(c _{0},c _{1}) with \(c_{0}=\sum _{i} c_{0,i}X^{i}\) and \(c_{1}=\sum _{i} c_{1,i}X^{i}\), output the vector ct ^{′}=(c _{0,0},c _{1,0},−c _{1,N−1},…,−c _{1,1}).

LWE.ModSwitch(ct): Given a ciphertext \(\mathsf {ct}\in \mathbb {Z}_{Q}^{N+1}\), output the vector \(\mathsf {ct}'\leftarrow \lfloor {(q/Q)\cdot \mathsf {ct}}\rceil \in \mathbb {Z}_{q}^{N+1}\).
An RLWE ciphertext ct=(c _{0},c _{1}) has the decryption structure of the form c _{0}+c _{1}·s=(Q/t)·m+e, and its constant term is
$$c_{0,0} + c_{1,0}\, s_{0} - \sum_{i=1}^{N-1} c_{1,N-i}\, s_{i} = (Q/t)\cdot m_{0} + \tilde{e} \pmod{Q}, $$
where \(s=\sum _{i} s_{i}X^{i}\) and \(\tilde {e}\) denotes the constant term of e. It can be represented as an inner product of the vector (c _{0,0},c _{1,0},−c _{1,N−1},−c _{1,N−2},…,−c _{1,1}) and the desired LWE secret key \(\vec s=(1,s_{0},\dots,s_{N-1})\). Hence the output of the conversion algorithm can be seen as an LWE encryption of m _{0}. It is also easy to check that if \(\mathsf {ct}\in \mathbb {Z}_{Q}^{N+1}\) satisfies \(\langle {\mathsf {ct}},{\vec s}\rangle =(Q/t)\cdot m+e \pmod {Q}\), then the output of the LWE.ModSwitch algorithm satisfies \(\langle {\mathsf {ct}'},{\vec s}\rangle =(q/t)\cdot m+e' \pmod q\) for some e ^{′}≈(q/Q)·e. These techniques were proposed for efficient bootstrapping [12], but they play totally different roles in our application. Finally, an LWE ciphertext of modulus q can be decrypted by \(\vec s\) as follows.

LWE.Dec(ct,sk): Given a ciphertext \(\mathsf {ct}\in \mathbb {Z}_{q}^{N+1}\), output the value \(m\leftarrow \lfloor {(t/q)\cdot [\langle {\mathsf {ct}},{\vec s}\rangle ]_{q}}\rceil \).
If \(\langle {\mathsf {ct}},{\vec s}\rangle =(q/t)\cdot m+e \pmod q\) for some small enough e, it returns the correct message m modulo t. More precisely, the decryption procedure works if |te/q|<1/2.
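The conversion, modulus-switching, and LWE decryption steps can be checked numerically. The sketch below is an illustration under assumed toy parameters (N=16, Q=2^16, q=2^10, t=16) and hypothetical function names. For brevity it builds the RLWE ciphertext directly from the secret key so that c _{0}+c _{1}·s=(Q/t)·m+e holds.

```python
import random

def negacyclic_mul(a, b, Q):
    """Multiply two polynomials modulo X^N + 1 and modulo Q (schoolbook)."""
    N = len(a)
    out = [0] * N
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            k = i + j
            if k < N:
                out[k] = (out[k] + ai * bj) % Q
            else:
                out[k - N] = (out[k - N] - ai * bj) % Q
    return out

def rlwe_conv(ct):
    """Return (c_{0,0}, c_{1,0}, -c_{1,N-1}, ..., -c_{1,1})."""
    c0, c1 = ct
    N = len(c1)
    return [c0[0], c1[0]] + [-c1[N - i] for i in range(1, N)]

def lwe_modswitch(ct, Q, q):
    """Scale every entry by q/Q and round to the nearest integer modulo q."""
    return [((q * (c % Q) + Q // 2) // Q) % q for c in ct]

def lwe_dec(ct, s_vec, q, t):
    """m = round((t/q) * <ct, s_vec> mod q), reduced modulo t."""
    phase = sum(c * s for c, s in zip(ct, s_vec)) % q
    return ((t * phase + q // 2) // q) % t

# Toy parameters and a directly constructed ciphertext (assumptions).
rng = random.Random(1)
N, Q, q, t = 16, 2**16, 2**10, 16
s = [rng.randint(-1, 1) for _ in range(N)]
m = [rng.randrange(t) for _ in range(N)]
e = [rng.randint(-2, 2) for _ in range(N)]
c1 = [rng.randrange(Q) for _ in range(N)]
c1s = negacyclic_mul(c1, s, Q)
c0 = [((Q // t) * mi + ei - x) % Q for mi, ei, x in zip(m, e, c1s)]

s_vec = [1] + s                       # LWE key (1, s_0, ..., s_{N-1})
lwe = rlwe_conv((c0, c1))             # N + 1 entries instead of 2N
lwe_small = lwe_modswitch(lwe, Q, q)  # same message, smaller modulus
recovered = lwe_dec(lwe_small, s_vec, q, t)
print(recovered == m[0])
```

The converted ciphertext has N+1 entries instead of 2N coefficients, which is where the halving of the output size comes from. Rounding in the modulus switch adds an error of at most (N+1)/2 for a ternary key, which stays below q/(2t) in this toy setting.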
The Ring-GSW scheme
Gentry et al. [13] suggested a fully homomorphic encryption based on the LWE problem, where the message is encrypted as an approximate eigenvalue of a ciphertext. Ducas and Micciancio [12] described its RLWE variant. The RGSW symmetric encryption scheme consists of the following algorithms.

RGSW.ParamsGen(·),RGSW.KeyGen(·): Use the same parameter params and secret key s with the basic RLWE scheme. Additionally set the decomposition base B _{ g } and exponent d _{ g } satisfying \({B}_{\mathsf {g}}^{{d}_{\mathsf {g}}}\ge Q\).

RGSW.Enc(m,sk): To encrypt \(m \in \mathcal {R}_{t}\), pick a vector \(\mathbf {a}\in \mathcal {R}_{Q}^{2{{d}_{\mathsf {g}}}}\) uniformly at random and an error vector \(\mathbf {e} \in \mathcal {R}^{2{{d}_{\mathsf {g}}}} \simeq \mathbb {Z}^{2{{d}_{\mathsf {g}}} \cdot N}\) from the discrete Gaussian distribution χ of parameter ς, and output the ciphertext
$$\mathsf{CT} \leftarrow [\mathbf{b}, \mathbf{a}] + m \mathbf{G} \in \mathcal{R}_{Q}^{2{{d}_{\mathsf{g}}} \times 2}, $$
where b=−a·s+e and the gadget matrix is \(\mathbf {G}= \left (\mathbf {I} ~\Vert ~ {B}_{\mathsf {g}}\mathbf {I} ~\Vert ~ \ldots ~\Vert ~ {B}_{\mathsf {g}}^{{{d}_{\mathsf {g}}}-1}\mathbf {I}\right)^{T} \in \mathcal {R}_{Q}^{2{{d}_{\mathsf {g}}} \times 2}\) for the 2×2 identity matrix I.
Let \(\mathsf {WD}_{{B}_{\mathsf {g}}}(\cdot)\) be the decomposition with the base B _{ g }, which multiplies the dimension of its input vector by d _{ g }. The RGSW encryption of m satisfies \(\mathsf {CT} \cdot (1,\mathsf {s})= m\cdot \left (1,\mathsf {s},\dots,{B}_{\mathsf {g}}^{{{d}_{\mathsf {g}}}-1},{B}_{\mathsf {g}}^{{{d}_{\mathsf {g}}}-1}\mathsf {s}\right)+\mathsf {e}\). Roughly, m is an approximate eigenvalue of \(\mathsf {WD}_{{B}_{\mathsf {g}}}(\mathsf {CT})\) with respect to the eigenvector \(\left (1,\mathsf {s},\dots,{B}_{\mathsf {g}}^{{{d}_{\mathsf {g}}}-1},{B}_{\mathsf {g}}^{{{d}_{\mathsf {g}}}-1}\mathsf {s}\right)\). In [14], the hybrid multiplication between RGSW ciphertexts and RLWE ciphertexts is defined as follows.

Hybrid.Mult(CT,ct): Given an RGSW ciphertext \(\mathsf {CT}\in \mathcal {R}_{Q}^{2{{d}_{\mathsf {g}}}\times 2}\) and an RLWE ciphertext \(\mathsf {ct}\in \mathcal {R}_{Q}^{2}\) output the vector \(\mathsf {ct}'\leftarrow \mathsf {CT}^{T}\cdot {\mathsf {WD}_{{B}_{\mathsf {g}}}}(\mathsf {ct})\).
If CT and ct are RGSW and RLWE encryptions of m and m ^{′}, respectively, their multiplication ct ^{′} is a valid RLWE encryption of m m ^{′}. For convenience, we will denote Hybrid.Mult(CT,ct) algorithm by \(\boxdot \), i.e., \((\mathsf {CT},\mathsf {ct})\in \mathcal {R}_{Q}^{2{{d}_{\mathsf {g}}}\times 2}\times \mathcal {R}_{Q}^{2}\mapsto \mathsf {CT}\boxdot \mathsf {ct}\in \mathcal {R}_{Q}^{2}\).
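To make the hybrid multiplication concrete, here is a toy sketch of RGSW encryption, the base-B _{ g } decomposition, and Hybrid.Mult. It is not the authors' implementation: the function names are hypothetical, the digits are taken in [0,B _{ g }) (a centered digit set would also work), noise is uniform in {0,±1}, and the parameters N=16, Q=2^20, B _{ g }=32, d _{ g }=4, t=16 are chosen only so that the example runs quickly.

```python
import random

def negacyclic_mul(a, b, Q):
    """Multiply two polynomials modulo X^N + 1 and modulo Q (schoolbook)."""
    N = len(a)
    out = [0] * N
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            k = i + j
            if k < N:
                out[k] = (out[k] + ai * bj) % Q
            else:
                out[k - N] = (out[k - N] - ai * bj) % Q
    return out

def poly_add(a, b, Q):
    return [(x + y) % Q for x, y in zip(a, b)]

def small_poly(N, rng):
    return [rng.randint(-1, 1) for _ in range(N)]

def rlwe_enc_sym(m, s, Q, t, rng):
    """Symmetric RLWE encryption with c0 + c1*s = (Q/t)*m + e."""
    N = len(s)
    c1 = [rng.randrange(Q) for _ in range(N)]
    e = small_poly(N, rng)
    c1s = negacyclic_mul(c1, s, Q)
    c0 = [((Q // t) * mi + ei - x) % Q for mi, ei, x in zip(m, e, c1s)]
    return c0, c1

def rlwe_dec(ct, s, Q, t):
    c0, c1 = ct
    phase = poly_add(c0, negacyclic_mul(c1, s, Q), Q)
    return [((t * p + Q // 2) // Q) % t for p in phase]

def rgsw_enc(m, s, Q, B, d, rng):
    """Rows (b_k, a_k) of CT = [b, a] + m*G, where b = -a*s + e."""
    N = len(s)
    rows = []
    for k in range(2 * d):
        a = [rng.randrange(Q) for _ in range(N)]
        e = small_poly(N, rng)
        a_s = negacyclic_mul(a, s, Q)
        b = [(ei - x) % Q for ei, x in zip(e, a_s)]
        scaled = [(B ** (k // 2)) * mi % Q for mi in m]
        if k % 2 == 0:   # row of B^j * I hitting the first column
            b = poly_add(b, scaled, Q)
        else:            # row of B^j * I hitting the second column
            a = poly_add(a, scaled, Q)
        rows.append((b, a))
    return rows

def decompose(ct, B, d):
    """WD_B(ct): base-B digits of c0 and c1, interleaved like the rows of G."""
    c0, c1 = ct
    digits = []
    for j in range(d):
        digits.append([(x // B ** j) % B for x in c0])
        digits.append([(x // B ** j) % B for x in c1])
    return digits

def hybrid_mult(CT, ct, Q, B, d):
    """ct' = CT^T * WD_B(ct), an RLWE ciphertext of the product."""
    N = len(ct[0])
    r0, r1 = [0] * N, [0] * N
    for digit, (b, a) in zip(decompose(ct, B, d), CT):
        r0 = poly_add(r0, negacyclic_mul(digit, b, Q), Q)
        r1 = poly_add(r1, negacyclic_mul(digit, a, Q), Q)
    return r0, r1

rng = random.Random(2)
N, Q, t, B, d = 16, 2**20, 16, 32, 4
s = small_poly(N, rng)
msg = [rng.randrange(t) for _ in range(N)]
mono = [0] * N
mono[N - 3] = -1                      # X^{-3} = -X^{N-3} in Z[X]/(X^N + 1)
ct = rlwe_enc_sym(msg, s, Q, t, rng)
CT = rgsw_enc(mono, s, Q, B, d, rng)
product = rlwe_dec(hybrid_mult(CT, ct, Q, B, d), s, Q, t)
print(product[0] == msg[3])
```

The noise added by the multiplication is a sum of 2d _{ g } products of digit polynomials with fresh errors, so it grows with N·B _{ g }·d _{ g } rather than with Q. When the RGSW message is a monomial, as in this example, the noise already present in the RLWE ciphertext is only rotated.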
Methods
Privacy-preserving database searching and extraction
Let us consider a database of n tuples. Each tuple is a pair (d _{ i },α _{ i }) for i=1,…,n, where d _{ i } denotes a data tag in the domain \(\{0,1,\dots,\mathcal {T}-1\}\) and α _{ i } represents the corresponding value attribute in the plaintext space \(\mathbb {Z}_{t}\backslash \{0\}\). Note that all the tags must be distinct from each other. For instance, in the case of a personal information database, α _{ i } may be the age of the user whose identity number is d _{ i }.
Given a query tag d from the tag domain and a query value α from the plaintext space, the matching problem is to determine the existence of an index i such that (d,α)=(d _{ i },α _{ i }). Now consider the following simplified search query: select α _{ i } if there exists an index i such that d _{ i }=d; otherwise return zero (⊥). The purpose of this section is to store the database and carry out this search query on the public cloud. The server should learn nothing from the encrypted query, and no information other than the final result should be leaked to the user. Throughout this work, we use the semi-honest (honest-but-curious) adversary model, which is a standard assumption for the evaluation of homomorphic encryption.
Our main idea is the following encoding of the database, which is suitable for the efficient computation of equality test and extraction:
$$\mathsf{DB}(X) = \sum_{i=1}^{n} \alpha_{i} X^{d_{i}}. $$
The user encrypts this polynomial with the RLWE public-key encryption scheme and stores the ciphertext ct _{ DB } in the server. At the query phase, given a query tag d, the user encrypts the monomial X ^{−d} with the RGSW symmetric encryption scheme and sends the ciphertext CT _{ Q } to the server. We assume that the RGSW encryption scheme has the same secret key sk as the RLWE encryption scheme.
Given two ciphertexts CT _{ Q }←RGSW.Enc(X ^{−d}) and ct _{ DB }←RLWE.Enc(DB(X)), the server first performs their multiplication to obtain a ciphertext, denoted by \(\mathsf {ct_{mult}} = \mathsf {CT}_{\mathsf {Q}} \boxdot \mathsf {ct}_{\mathsf {DB}}\). It follows from the previous section that ct _{ mult } is a valid RLWE encryption of the polynomial
$$\mathsf{DB}(X)\cdot X^{-d} = \sum_{i=1}^{n} \alpha_{i} X^{d_{i}-d}. $$
Since we use the cyclotomic polynomial Φ _{ M }(X)=X ^{N}+1 of power-of-two degree, the polynomial ring \(\mathcal {R}\) has the property X ^{N}=−1, so that X ^{−d}=−X ^{N−d} for 0<d<N. Because every tag is smaller than N, the exponent d _{ i }−d has absolute value less than N, and a term with d _{ i }≠d reduces to a monomial ±X ^{j} with 0<j<N. Thus, for any tag d, the constant term of the polynomial DB(X)·X ^{−d} is α _{ i } if there is some index i satisfying d=d _{ i }, otherwise zero.
Now the server applies the RLWE.Conv algorithm on ct _{ mult } to compute an LWE encryption ct _{ conv } of this constant term. This conversion procedure not only prevents the leakage of information that has not been queried but also reduces the size of the output ciphertext by half. In addition, the (optional) modulus-switching procedure can be applied to get a ciphertext ct _{ res } with a smaller modulus and thereby reduce the communication cost. Finally, the user decrypts this LWE ciphertext and gets the desired value α _{ i } or zero (⊥). Algorithm 1 summarizes the procedure of secure search-and-extraction.
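The algebra behind the search can be verified on plaintexts, with encryption left out entirely. The sketch below encodes a small database as the coefficient vector of DB(X), multiplies it by X ^{−d} in \(\mathbb {Z}_{t}[X]/(X^{N}+1)\), and reads off the constant term. Function names and the sample values are hypothetical.

```python
def negacyclic_mul(a, b, modulus):
    """Multiply two polynomials modulo X^N + 1 and the given modulus."""
    N = len(a)
    out = [0] * N
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            k = i + j
            if k < N:
                out[k] = (out[k] + ai * bj) % modulus
            else:
                out[k - N] = (out[k - N] - ai * bj) % modulus
    return out

def encode_database(pairs, N, t):
    """Coefficients of DB(X) = sum_i alpha_i X^{d_i}; tags must be distinct."""
    coeffs = [0] * N
    for tag, value in pairs:
        if not 0 <= tag < N:
            raise ValueError("tag must be smaller than the ring dimension N")
        if not 0 < value < t:
            raise ValueError("value must be a nonzero element of Z_t")
        if coeffs[tag] != 0:
            raise ValueError("duplicate tag")
        coeffs[tag] = value
    return coeffs

def query_monomial(d, N):
    """Coefficients of X^{-d}, using X^{-d} = -X^{N-d} for 0 < d < N."""
    mono = [0] * N
    if d == 0:
        mono[0] = 1
    else:
        mono[N - d] = -1
    return mono

def search(db_coeffs, d, t):
    """Constant term of DB(X) * X^{-d}: the stored value, or 0 if absent."""
    product = negacyclic_mul(db_coeffs, query_monomial(d, len(db_coeffs)), t)
    return product[0]

# Hypothetical sample database with N = 16 and t = 16.
db = encode_database([(3, 7), (10, 5)], N=16, t=16)
print(search(db, 3, 16), search(db, 10, 16), search(db, 4, 16))  # 7 5 0
```

In the protocol itself the same product is computed under encryption by a single hybrid multiplication, and only the constant term is handed back to the user.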
Our method can be modified to support a secure comparison of data values using a hash (one-way) function. If hashed values of α _{ i } are used as polynomial coefficients, our method will return a hashed value of α _{ i } to the user instead of α _{ i }. The user may then check whether the resulting value and the hashed query value are the same without learning any other information about the database.
Comparison with related work
Equality test has traditionally been considered difficult to perform on homomorphic encryption because of its large circuit depth [7, 15, 16]. These works evaluate the equality test on each encrypted tuple of the database, so at least Ω(n) homomorphic operations are required for searching a database of size n. In addition, the method of Boneh et al. [17] does not protect the database information from the users, that is, the whole database can be recovered from the resulting ciphertext of a query. In contrast, our method is very efficient in parameter size and complexity since it requires only a single hybrid multiplication.
One limitation of this method is that the tags d _{ i } must be bounded by the ciphertext dimension N in order to construct the encoding polynomial DB(X). Since the dimension N has a significant influence on the performance of the HE scheme, too large a value of N makes the scheme impractical. In the next section, we will describe how to overcome this problem in the application to genomic data.
Secure searching of biomarkers
We return to our main goal of Task 3: secure outsourced matching of a set of biomarkers to encrypted genomes. We describe how to encode and encrypt the genotype information of a VCF file in order to apply the privacy-preserving database searching and extraction.
A VCF file contains multiple genotype information lines, each of which consists of a triple (ch _{ i },pos _{ i }, SNPs _{ i }) of chromosome number, position, and a sequence of SNP alleles. A chromosome identifier ch is one of 1 to 22, X, and Y. A nonnegative integer pos represents the reference position, with the first base having position 1, and SNPs is a reference or alternate sequence in {A,T,G,C}^{∗}. A query from the user is also a triple of the same form, and we aim to decide the absence/presence of this biomarker in the database file.
We represent the sex chromosomes X and Y as 0 and 23, respectively. Then we define an encoding function \(\mathcal {E}: \mathbb {Z} \times \mathbb {Z} \rightarrow \mathbb {Z}\) by
$$\mathcal{E}: (\mathsf{ch},\mathsf{pos}) \mapsto d = \mathsf{ch} + 24\cdot \mathsf{pos}. $$
In the following, we describe how to encode the SNPs. For convenience we set an upper bound on the length of SNPs: let n _{ SNP } be the maximal number of reference (or alternate) alleles to be compared between the query genome and the user genome in the target database. Each SNP is represented by two bits as
$$\mathrm{A} \mapsto 00, \quad \mathrm{T} \mapsto 01, \quad \mathrm{G} \mapsto 10, \quad \mathrm{C} \mapsto 11, $$
and then concatenated with each other. Next we pad a 1 to the left of the bit string in order to mark the starting position of the SNPs. Finally it is zero-padded into a binary string of length ℓ _{ SNP }=2·n _{ SNP }+1, and we convert it into an integer value, denoted by α _{ i }. If a single nucleotide variant at the given locus is not known, then it is encoded as the all-zero string. For example, ‘GC’ is encoded as the bit string 11011, which is represented as the integer 11011_{(2)}=27.
Now consider the case that we wish to encode the reference and alternate alleles together. Let \(\alpha _{i}^{\mathsf {ref}}\) and \(\alpha _{i}^{\mathsf {alt}}\) denote the integer encodings of n _{ SNP } reference alleles and n _{ SNP } alternate alleles, respectively. Then we define an encoding α _{ i } by the concatenation of two encodings, i.e., \(\alpha _{i}= 2^{\ell _{\mathsf {SNP}}} \cdot \alpha _{i}^{\mathsf {ref}}+ \alpha _{i}^{\mathsf {alt}}\) as an integer. Table 1 shows the format of database file and illustrates some examples of encoded genomic data.
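The allele encoding can be written out in a few lines. This is a minimal sketch with hypothetical function names. The codes G=10 and C=11 follow from the ‘GC’ example in the text, while A=00 and T=01 and zero-padding on the left (which leaves the integer value unchanged) are assumptions of this sketch.

```python
# Two-bit codes per base. G and C follow the 'GC' -> 11011 example;
# the codes for A and T are assumed.
BASE_BITS = {"A": "00", "T": "01", "G": "10", "C": "11"}

def encode_snps(seq, n_snp):
    """Encode up to n_snp bases as an integer below 2^(2*n_snp + 1).

    An empty (unknown) sequence is encoded as 0. Otherwise a leading 1
    marks where the allele string starts, and zero-padding on the left
    does not change the integer value.
    """
    if len(seq) > n_snp:
        raise ValueError("sequence longer than n_snp")
    if not seq:
        return 0
    bits = "1" + "".join(BASE_BITS[base] for base in seq)
    return int(bits, 2)

def encode_ref_alt(ref, alt, n_snp):
    """alpha = 2^l_SNP * alpha_ref + alpha_alt with l_SNP = 2*n_snp + 1."""
    l_snp = 2 * n_snp + 1
    return (encode_snps(ref, n_snp) << l_snp) + encode_snps(alt, n_snp)

print(encode_snps("GC", 2))  # 27
```

The leading 1 keeps sequences such as ‘A’ and ‘AA’ distinct, since without it both would map to the integer 0.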
A database file is encoded as a set of pairs (d _{ i },α _{ i }) for i=1,…,n such that \(d_{i}= \mathcal {E}(\mathsf {ch}_{i},\mathsf {pos}_{i})\) and α _{ i } is the encoded integer of the ith SNP allele string. The encodings d _{ i } and α _{ i } are then regarded as the data tag and the value attribute, respectively. The data user constructs a polynomial \(\mathsf {DB}(X)= \sum _{k} c_{k} X^{k}\) such that
$$c_{k} = \begin{cases} \alpha_{i} & \text{if } k=d_{i} \text{ for some } i, \\ 0 & \text{otherwise.} \end{cases} $$
The user encrypts the polynomial with the RLWE public-key encryption scheme as described above.
The query genes are also encoded as a pair of integers (d,α); however, only d is encrypted, using the RGSW symmetric encryption scheme. That is, the user encrypts the monomial X ^{−d}.
Results and discussion
In this section, we explain how to set the parameters and describe our optimization techniques for the implementation. We also present the results obtained with these techniques. The dataset was randomly selected from the Personal Genome Project. Our implementation is publicly available on GitHub [18].
How to set parameters
Since all the matching computation is performed on encrypted data in the cloud, the security against a semi-honest adversary follows from the semantic security of the underlying HE scheme. The security of the homomorphic encryption scheme relies on the hardness of the RLWE assumption. From the security analysis of [11], we derive a lower bound on the ring dimension, \(N \geq \frac {\lambda +110}{7.2}\cdot \log _{2} Q\), to get a λ-bit security level.
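As a quick arithmetic check of this bound, the sketch below evaluates it for λ=128 and Q=2^{32}, the values reported in the implementation section, and rounds up to the next power of two. The helper names are hypothetical.

```python
import math

def min_ring_dimension(security_bits, log2_q):
    """Smallest integer N with N >= (lambda + 110) / 7.2 * log2(Q)."""
    return math.ceil((security_bits + 110) / 7.2 * log2_q)

def next_power_of_two(x):
    """Smallest power of two that is at least x (for x >= 1)."""
    return 1 << (x - 1).bit_length()

bound = min_ring_dimension(128, 32)
print(bound, next_power_of_two(bound))  # 1058 2048
```

The result is consistent with the ring dimension N=2^{11} used in the experiments.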
Given the ciphertext modulus Q, the estimate of noise growth during evaluation [12] and the decryption condition give an upper bound on the plaintext modulus t that ensures the correctness of decryption after computation. We set t to the largest power-of-two integer less than this upper bound. If the encodings of the allele strings are too large, we divide them into smaller integers so that each of them is smaller than t. Then we repeat the algorithm to construct the corresponding polynomial for each integer.
Optimization techniques
As we mentioned before, the ring dimension N needs to be larger than the encoded integers d _{ i }. However, the encoded integers d _{ i } from VCF files have a bit size of about 32, while a dimension N with 11≤ log _{2} N≤16 is considered appropriate for implementations of HE schemes to achieve both security and efficiency. Hence direct application of our method to the VCF file would yield an impractical result.
For compression of the tag data and its re-randomization, we make use of a pseudorandom number generator H(·) which transforms a tag d _{ i } into a pair of two nonnegative integers \(d^{*}_{i}\) and \(d^{\dagger }_{i}\) less than N. Our implementation adopts SHA-3 and extracts log _{2} N=11 bits of the hashed value for each of \(d^{*}_{i}\) and \(d^{\dagger }_{i}\).
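One possible way to derive the two indices is sketched below with Python's standard `hashlib`. The text only states that SHA-3 is used and that 11 bits are extracted for each index; the choice of SHA3-256, the decimal-string serialization of the tag, and which bits are taken are assumptions of this sketch, and `hash_tag` is a hypothetical name.

```python
import hashlib

def hash_tag(d, log_n=11):
    """Map an integer tag d to two indices (d_star, d_dagger) below 2^log_n.

    Assumptions: the tag is serialized as its decimal string, SHA3-256 is
    the hash, and the two indices are the lowest two log_n-bit chunks.
    """
    digest = hashlib.sha3_256(str(d).encode("ascii")).digest()
    value = int.from_bytes(digest, "big")
    mask = (1 << log_n) - 1
    d_star = value & mask
    d_dagger = (value >> log_n) & mask
    return d_star, d_dagger

d_star, d_dagger = hash_tag(123456789)
print(0 <= d_star < 2048 and 0 <= d_dagger < 2048)
```

Both parties must apply exactly the same function, since the user who queries a tag d has to recompute H(d) to form the two encrypted monomials.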
We construct two polynomials
$$\mathsf{DB}^{*}(X)=\sum_{k=0}^{N-1} c^{*}_{k} X^{k} \quad \text{and} \quad \mathsf{DB}^{\dagger}(X)=\sum_{k=0}^{N-1} c^{\dagger}_{k} X^{k} $$
by Algorithm 2, which describes the procedure of database encoding for secure search of biomarkers. Note that for any 1≤i≤n and \(H(d_{i})=(d_{i}^{*},d_{i}^{\dagger })\in \{0,\dots,N-1\}^{2}\), the pair of constructed polynomials DB ^{∗} and DB ^{†} satisfies \(\alpha _{i} = c^{*}_{d_{i}^{*}} + c^{\dagger }_{d_{i}^{\dagger }}\).
Let \(\mathsf {ct}^{*}_{\mathsf {DB}}\) and \(\mathsf {ct}^{\dagger }_{\mathsf {DB}}\) denote the ciphertexts of the polynomials DB ^{∗} and DB ^{†}, respectively. Similarly, given the query encoding d, the user computes its randomized value H(d)=(d ^{∗},d ^{†}) and encrypts the two monomials \(X^{-d^{*}}\) and \(X^{-d^{\dagger }}\). We denote the ciphertexts by \(\mathsf {CT}^{*}_{\mathsf {Q}}\) and \(\mathsf {CT}^{\dagger }_{\mathsf {Q}}\). The server computes the hybrid multiplications to obtain the ciphertexts
$$\mathsf{ct}^{*}_{\mathsf{mult}} = \mathsf{CT}^{*}_{\mathsf{Q}} \boxdot \mathsf{ct}^{*}_{\mathsf{DB}}, \qquad \mathsf{ct}^{\dagger}_{\mathsf{mult}} = \mathsf{CT}^{\dagger}_{\mathsf{Q}} \boxdot \mathsf{ct}^{\dagger}_{\mathsf{DB}}. $$
Now let ct denote the ciphertext computed by the homomorphic addition of \(\mathsf {ct}^{*}_{\mathsf {mult}}\) and \(\mathsf {ct}^{\dagger }_{\mathsf {mult}}\). Finally, the server converts it into an LWE ciphertext and performs the modulus-switching procedure as described above. Algorithm 3 describes the procedure of secure search-and-extraction using our proposed optimization techniques.
Implementation results
Using the variable type int32_t together with the basic C++ standard libraries speeds up the implementation, so we set Q=2^{32} as the ciphertext modulus. We also set t=2^{11} as the modulus of the plaintext space to ensure the correctness of the output ciphertext. We take the following parameters for the gadget matrix G: B _{ g }=128 and d _{ g }=5, so that they satisfy the condition \({B}_{\mathsf {g}}^{{{d}_{\mathsf {g}}}} \geq Q\).
Each coefficient of the secret key sk is chosen at random from {0,±1} and we set 64 as the number of nonzero coefficients in the secret key. As in the work of [12], we considered the Gaussian distribution of standard deviation σ=1.4 to sample random error polynomials.
For the efficiency of homomorphic multiplication, we also used an optimized library for complex FFT, namely the Fastest Fourier Transform in the West (FFTW) [19]. That is, we use the complex primitive 2Nth root of unity rather than a primitive root in a prime field of order Q. We measured a running time of 0.804 s to set up the FFT environment at dimension 2N=2^{12}. The key generation of the two schemes takes about 0.247 ms in total.
Table 2 presents the time complexity and storage for the evaluation of secure searching of biomarkers. All the experiments were performed on a single Intel Core i5 processor running at 2.9 GHz. The chosen parameters provide a security level of λ=128 bits.
Conclusions
In this work, we suggested an efficient method to securely search for a query tag and extract the corresponding value from a database using the hybrid Ring-GSW homomorphic encryption scheme. We came up with a solution to the secure outsourced matching problem by using a polynomial encoding and extracting the desired value through the multiplication of an RGSW ciphertext and an ordinary RLWE ciphertext. We then applied this method to find a set of biomarkers in DNA sequences.
Our solution shows that cryptographic techniques have progressed to the point where they can support real-world genome data analysis in a cloud environment. A few interesting open problems remain. First, we only considered the semi-honest adversary model in this work. Other tools such as homomorphic authenticated schemes may lead to more efficient protocols in the malicious setting. Another issue is to support k multiple queries while keeping the running time and communication cost below k times those of a single query. We expect to obtain much faster performance by enabling a batching method.
References
1. Humbert M, Ayday E, Hubaux JP, Telenti A. Addressing the concerns of the Lacks family: quantification of kin genomic privacy. In: Proceedings of the 2013 ACM SIGSAC Conference on Computer & Communications Security. ACM: 2013. p. 1141–52.
2. Erlich Y, Narayanan A. Routes for breaching and protecting genetic privacy. Nat Rev Genet. 2014; 15(6):409–21.
3. Naveed M, Ayday E, Clayton EW, Fellay J, Gunter CA, Hubaux JP, Malin BA, Wang X. Privacy in the genomic era. ACM Comput Surv (CSUR). 2015; 48(1):6.
4. Yasuda M, Shimoyama T, Kogure J, Yokoyama K, Koshiba T. Secure pattern matching using somewhat homomorphic encryption. In: Proceedings of the 2013 ACM Cloud Computing Security Workshop. ACM: 2013. p. 65–76.
5. Lauter K, López-Alt A, Naehrig M. Private computation on encrypted genomic data. In: International Conference on Cryptology and Information Security in Latin America. Springer International Publishing: 2014. p. 3–27.
6. Cheon JH, Kim M, Lauter K. Homomorphic computation of edit distance. In: International Conference on Financial Cryptography and Data Security. Springer Berlin Heidelberg: 2015. p. 194–212.
7. Kim M, Lauter K. Private genome analysis through homomorphic encryption. BMC Med Inform Decis Mak. 2015; 15(Suppl 5):3.
8. Brakerski Z, Gentry C, Vaikuntanathan V. (Leveled) fully homomorphic encryption without bootstrapping. ACM Transactions on Computation Theory. 2014; 6(3):13.
9. Brakerski Z. Fully homomorphic encryption without modulus switching from classical GapSVP. In: Advances in Cryptology - CRYPTO. Springer Berlin Heidelberg: 2012. p. 868–86.
10. Lyubashevsky V, Peikert C, Regev O. On ideal lattices and learning with errors over rings. In: Advances in Cryptology - EUROCRYPT. Springer Berlin Heidelberg: 2010. p. 1–23.
11. Gentry C, Halevi S, Smart NP. Homomorphic evaluation of the AES circuit. In: Advances in Cryptology - CRYPTO. Springer Berlin Heidelberg: 2012. p. 850–67.
12. Ducas L, Micciancio D. FHEW: Bootstrapping homomorphic encryption in less than a second. In: Advances in Cryptology - EUROCRYPT. Springer Berlin Heidelberg: 2015. p. 617–40.
13. Gentry C, Sahai A, Waters B. Homomorphic encryption from learning with errors: Conceptually-simpler, asymptotically-faster, attribute-based. In: Advances in Cryptology - CRYPTO. Springer Berlin Heidelberg: 2013. p. 75–92.
14. Chillotti I, Gama N, Georgieva M, Izabachene M. Faster fully homomorphic encryption: Bootstrapping in less than 0.1 s. In: Advances in Cryptology - ASIACRYPT. Springer Berlin Heidelberg: 2016. p. 3–33.
15. Cheon JH, Kim M, Kim M. Search-and-compute on encrypted data. In: International Conference on Financial Cryptography and Data Security. Springer Berlin Heidelberg: 2015. p. 142–59.
16. Cheon JH, Kim M, Kim M. Optimized search-and-compute circuits and their application to query evaluation on encrypted data. IEEE Trans Inf Forensic Secur. 2016; 11(1):188–99.
17. Boneh D, Gentry C, Halevi S, Wang F, Wu DJ. Private database queries using somewhat homomorphic encryption. In: International Conference on Applied Cryptography and Network Security. Springer Berlin Heidelberg: 2013. p. 102–18.
18. Kim M, Song Y. Implementation of Secure Searching of Biomarkers. 2016. http://github.com/amedonis/HybridHE.
19. Frigo M, Johnson SG. The design and implementation of FFTW3. Proc IEEE. 2005; 93(2):216–31.
Acknowledgements
The authors would like to thank the referee for helpful comments. The authors would also like to thank the iDASH Secure Genome Analysis Contest organizers, in particular Xiaoqian Jiang and Shuang Wang, for running the contest and providing the opportunity to submit competing implementations for these important tasks.
Funding
Publication of this article has been funded by IT R&D program of MSIP/KEIT (No. B0717160098).
Availability of data and materials
Not applicable.
Authors’ contributions
MK, YS, and JC designed the baseline methods. MK and YS drafted the manuscript and conducted the experiment for the competition. JC guided the experimental design and provided detailed edits. All authors read and approved the final manuscript.
Competing interests
The authors declare that they have no competing interests.
Consent for publication
Not applicable.
Ethics approval and consent to participate
Not applicable.
About this supplement
This article has been published as part of BMC Medical Genomics Volume 10 Supplement 2, 2017: Proceedings of the 5th iDASH Privacy and Security Workshop 2016. The full contents of the supplement are available online at https://bmcmedgenomics.biomedcentral.com/articles/supplements/volume-10-supplement-2.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Additional information
From iDASH Privacy and Security Workshop 2016 Chicago, IL, USA. 11/11/2016
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
About this article
Cite this article
Kim, M., Song, Y. & Cheon, J.H. Secure searching of biomarkers through hybrid homomorphic encryption scheme. BMC Med Genomics 10 (Suppl 2), 42 (2017). https://doi.org/10.1186/s12920-017-0280-3
DOI: https://doi.org/10.1186/s12920-017-0280-3