Privacy-preserving approximate GWAS computation based on homomorphic encryption

Background One of three tasks in a secure genome analysis competition called iDASH 2018 was to develop a solution for privacy-preserving GWAS computation based on homomorphic encryption. The scenario is that a data holder encrypts a number of individual records, each of which consists of several phenotype and genotype data, and provide the encrypted data to an untrusted server. Then, the server performs a GWAS algorithm based on homomorphic encryption without the decryption key and outputs the result in encrypted state so that there is no information leakage on the sensitive data to the server. Methods We develop a privacy-preserving semi-parallel GWAS algorithm by applying an approximate homomorphic encryption scheme HEAAN. Fisher scoring and semi-parallel GWAS algorithms are modified to be efficiently computed over homomorphically encrypted data with several optimization methodologies; substitute matrix inversion by an adjoint matrix, avoid computing a superfluous matrix of super-large size, and transform the algorithm into an approximate version. Results Our modified semi-parallel GWAS algorithm based on homomorphic encryption which achieves 128-bit security takes 30–40 minutes for 245 samples containing 10,000–15,000 SNPs. Compared to the true p-value from the original semi-parallel GWAS algorithm, the F1 score of our p-value result is over 0.99. Conclusions Privacy-preserving semi-parallel GWAS computation can be efficiently done based on homomorphic encryption with sufficiently high accuracy compared to the semi-parallel GWAS computation in unencrypted state.


Motivation
Since genome analysis uses genomic data that are very sensitive and irreplacable, privacy on genomic data has come up to be one of the most important issues in genome analysis including GWAS. The usual privacy-preserving methodology in data analysis is anonymization, perturbation, randomization, and condensation [1]; however, those methods leverage the quality of data with privacy resulting in an inaccurate analysis to some extent. The dilemma regarding the balance between personal privacy and analytical efficiency has been resolved by applying many cryptographic primitives, while homomorphic encryption (HE) is noticed as one of the ultimate cryptographic solutions for privacy-preserving data analysis. Conceptually, HE is an encryption scheme which allows computations over encrypted data without decryption. HE not only fundamentally prevents the leakage of input data during the analysis phase, but also provides an accurate result of analysis since it preserves the original data intactly. However, HE causes a significant blowup of computational cost for analysis, and optimization and modification of algorithm for efficient computation in HE is the main problem of applying HE in data analysis.
Since 2014, there has been an annual biomedical privacy competition hosted by Integrating Data for Analysis, Anonymization and SHaring (iDASH), a national center for biomedical computing in the United States. One of three tasks in iDASH 2018 [2] was to develop a solution for privacy-preserving GWAS computation based on HE, and we participated in this competition with our delicately constructed algorithms.

Summary of results
In this study, we propose approximate HE algorithms for privacy-preserving GWAS computation. To be precise, we transform well-known Fisher scoring and semi-parallel GWAS algorithm into HE-friendly algorithms so that we can efficiently evaluate them in encrypted state. Note that the HE-friendly modified Fisher Scoring algorithm can be generally used for logistic regression, not only for GWAS.
The main challenges in transforming the semi-parallel GWAS algorithm (Algorithm 1) to an HE algorithm are complex matrix operations such as multiplication and inversion. Since matrix inversion in HE is complicated and costly, we substitute it by computation of the adjoint matrices and determinant. With this approach, the original Fisher scoring algorithm can be successfully modified to compute encrypted data efficiently. For the efficient computation of semi-parallel GWAS computation based on HE, moreover, we reduced the number of matrix multiplications as many as possible, and modified the original algorithm into an approximate version which requires much less computational cost. The details of our optimization methodologies are well described in "Our optimization methodology" section.
We exploited an approximate HE scheme HEAAN [3,4] with a publicly available library [5] for the implementation of our modified semi-parallel GWAS algorithm based on HE. The HE algorithm takes about 40 minutes for 245 samples each containing a binary phenotype, 3 covariates, and 14,841 SNPs on Linux with a 2.10GHz processor using 8 threads (4 cores).

Related works
Some works on HE-based genome analyses has been studied over the past years: Lauter et al. [6] studied application of HE on basic genomic algorithms such as the Pearson Goodness-of-Fit test, and Wang et al. [7] performed an exact logistic regression on small datasets based on HE. More recently, some solutions [8][9][10][11] submitted to task 3 of iDASH 2017 [12] competition dealt with training the logistic regression model of genomic data based on HE. Some works [13,14] studied the privacy-preserving GWAS based on HE, but they performed rather simple χ 2 test on quite small numbers of SNPs, which is quite different from iDASH 2018 task; logistic regression on large numbers of SNPs.
There are some other alternative tools to deal with privacy issues in real-world applications. One of them is a cryptographic tool called secure Multi-Party Computation (MPC). In MPC, computations are done online by multiple parties without revealing any information of the result to each party. There have been several remarkable works on privacy-preserving genome analysis based on MPC. In 2017, Jagadeesh et al. [15] proposed privacypreserving solutions for several genomic diagnose methods based on MPC with practical implementation over real patient data. Recently, Cho, Wu and Berger [16] successfully constructed a practical MPC-based protocol for privacy-preserving GWAS computation over large-scale genomic data. There have been several previous works [14,[17][18][19] on privacy-preserving GWAS computation based on MPC; however, they had some limitations to be applied in practice since they either require infeasible computational cost or greatly streamlined the task.
Meanwhile, in 2016, Chen et al. [20] proposed a hardware-based solution for privacy-preserving genome analysis with Software Guard Extensions (SGX) [21], the security-related instruction built in Intel CPU which allows secure computations in a private region which cannot be accessed without the private key. They implemented a SGX-based framework for secure transmission disequilibrium test on Kawasaki disease patients with high scalability on the number of SNPs.

Approximate homomorphic encryption scheme HEAAN
For privacy-preserving GWAS computation, we applied an HE scheme called HEAAN proposed by Cheon et al. [3,4], which supports approximate computation of real numbers in encrypted state. Efficiency of HEAAN in the real world has been proved by showing its application in various fields including machine learning [8,22,23] and cyber physical system [24]. The winning solution of iDASH competition in 2017 also applied HEAAN as an HE scheme for privacy-preserving logistic regression on genomic data.
At high-level, for a HEAAN ciphertext ct of some message polynomial m, the decryption process with a secret key sk is done as where e is a small error attached to the message polynomial m. Furthermore, for ciphertexts ct 1 and ct 2 of message polynomials m 1 and m 2 , the homomorphic evaluation algorithms C.Add and C.Mult satisfy i.e., addition and multiplication can be internally done even in encrypted state.
For formal definitions, let L be a level parameter, and q := 2 for 1 ≤ ≤ L. Let R := Z[ X] / X N + 1 for a power-of-two N and R q be a modulo-q quotient ring of R, i.e., R q := R/qR. The distribution χ key := HW(h) over R q outputs a polynomial of {−1, 0, 1}-coefficient having h number of non-zero coefficients, and χ enc and χ err denote the discrete Gaussian distribution with some prefixed standard deviation. We denote the rounding function by · , which outputs the nearest integer of a realnumber input. For a = Finally, [ ·] q denotes a component-wise modulo q operation on each element of R q . Note that those parameters N, L and h satisfying a certain security level can be determined by Albrecht's security estimator [25,26]. The scheme description of HEAAN is as following: • KeyGen(params).
-Sample s ← χ key . Set the secret key as sk ← (1, s). -Sample a ← U R q L and e ← χ err . Set the public key as pk ← (b, a) ∈ R 2 q L where b ←[ −a · s + e] q L .
-Sample a ← U R q 2 L and e ← χ err . Set the evaluation key as evk ← b , a ∈ R 2 • Enc pk (m). For a message m ∈ R, sample v ← χ enc and e 0 , e 1 ← χ err . Output the ciphertext ct = [v · pk + (m + e 0 , e 1 )] q L .
• Dec sk (ct). For a ciphertext ct = (c 0 , For more details of the scheme including the correctness and security analysis, we refer the readers to [3]. The above-mentioned scheme deals with message polynomial m in some integer polynomial ring R. To encrypt real (or complex) value, HEAAN use a (field) isomorphism τ : / X N + 1 , and then rounded off to an integer-coefficient polynomial. However, the naive rounding-off τ −1 ( m) can derive quite large relative error on the plaintext. To control the error, we round it off after scaling up by p bits for some integer p, i.e., 2 p · τ −1 ( m) , so that the relative error is reduced. Clearly, a decoding algorithm for m would be 2 −p · τ (m): • Ecd ( m; p). For m = (m 0 , ..., m N/2−1 ) in C N/2 and a precision bit p > 0, output a polynomial m ← 2 p · τ −1 ( m) ∈ R where the rounding · is done coefficient-wisely. • Dcd(m; p). For m ∈ R, output a plaintext vector To sum up, to encrypt a plaintext vector of real (complex) numbers m, we first encode m into m ← Ecd ( m; p) with a certain precision bit p, and then generate a ciphertext ct ← Enc pk (m) with the public key pk. Now consider ct 1 and ct 2 be ciphertexts of m 1 and m 2 in C N/2 . Since our encoding method scales each plaintext vector up by 2 p , the plaintext vector of a ciphertext ct ← C.Mult evk (ct 1 , ct 2 ) is (approximately) 2 p · m 1 m 2 , not m 1 m 2 , which will result in exponential growth of plaintexts. To deal with this problem, we adjust the scaling factor by the following procedure so-called rescaling: • RS → (ct). For a ciphertext ct ∈ R 2 q , output ct ← (q /q ) · ct q .
After the rescaling procedure ct mult ← RS ct , the plaintext vector of the output ct mult is (approximately) m 1 m 2 , and the ciphertext modulus q L is reduced by 2 p . As a result, the level parameter L should be carefully chosen according to the multiplicative depth of a target computation. In order to present our algorithm in a simple form, we will not describe these rescaling procedures for the rest of the paper, but we remark that in the actual use of HEAAN, there should be delicate consideration on scaling of message.
To deal with a plaintext vector of the form m ∈ C k having length K ≤ N/2 for some power-of-two divisor K of N/2, HEAAN encrypts m into a ciphertext of an N/2dimensional vector ( m|| · · · || m) ∈ C N/2 , where v|| w denotes the concatenation of two vectors v and w. This implies that a ciphertext of m ∈ C K can be understood as a ciphertext of ( m|| · · · || m) ∈ C K for powers-of-two K and K satisfying K ≤ K ≤ N/2 . Finally, the HEAAN scheme provides the rotation operation on plaintext slots, i.e., it enables us to securely obtain an encryption of the shifted plaintext vector m r , . . . , m N/2−1 , m 0 , . . . , m r−1 from an encryption of m 0 , . . . , m N/2−1 . It is necessary to generate an additional public information rk, called the rotation key. We denote the rotation operation as follows.
• Rot rk (ct; r). For the rotation key rk, output a ciphertext ct encrypting the (left) rotated plaintext vector of ct by r(> 0) positions as above example. If r < 0, it denotes the right rotation by (−r) positions.
We omit a subscript of each algorithm of HEAAN for convenience if it is obvious.

Matrix packing method and rotate function
In this subsection, we describe an encoding method to encrypt a matrix structure in a ciphertext which was also introduced in [8]. Consider an n × m matrix Z We first pad zeros to set the number of rows and columns to be powers-of-two, say n and m, and assume that log n + log m ≤ log(N/2). Then we pack the whole matrix in a single ciphertext ct Z in a column-by-column manner. As described above, the algorithm Rot (ct Z ; r) can shift the encrypted vector by r positions. In particular, we can perform row and column rotations of an encrypted matrix with this operation. When r = n · j, and the result will be the (left) column rotation of the encrypted matrix Z by j columns.
For the row rotation of an encrypted matrix, we use socalled masking approach. Consider n × m matrices M i and M i , where the first n − i rows of M i (resp. M i ) are filled with 1 (resp. 0) and the last i rows of M i (resp. M i ) are filled with 0 (resp. 1). Let msk i and msk i be the ciphertext of M i and M i , respectively.
Then, we mask them as ct 1 ← C.Mult (ct 1 , msk i ) and ct 2 ← C.Mult ct 2 , msk i . As a result, the output of C.Add (ct 1 , ct 2 ) is a ciphertext of upper row rotation of Z by i rows.
Those row and column rotations of an encrypted matrix are denoted as follows: • C.ColumnRot ct Z , j . For a ciphertext ct Z of a matrix Z and an integer j, output a ciphertext ct of left column rotation of Z by j columns.
and an integer j, output a ciphertext ct of upper row rotation of Z by i rows.

Semi-parallel GWAS algorithm
A naive application of GWAS analysis can be done by running a logistic regression for each SNP, which resulting in high computational cost since the number of SNPs can be usually hundred thousands or more. To overcome this problem, Sikorska et al. [27] proposed a semi-parallel GWAS algorithm which reduces the required computation time from 6 hours to 10-15 minutes using projections. Let n be the number of samples each of which consists of m (binary) SNP data and k covariate data. Then the whole SNP data and covariate data can be organized as an n × m matrix S and an n × k matrix X 0 , respectively. For k := k + 1, we define a matrix X as the concatenation of a vector whose components are 1 and X 0 , denoted by X = 1||X 0 . Let y be a target binary phenotype vector of length n. With these inputs, the semi-parallel GWAS algorithm outputs the m-dimensional vector − − → pval which indicates the p-value of each SNP with respect to the target phenotype. The detail of the algorithm is described in Algorithm 1.
The semi-parallel GWAS algorithm involves logistic regression on (X, y) in the first step, and Fisher Scoring [28] described in Algorithm 2 is one of the most highly efficient algorithm for logistic regression. In both algorithms, σ (x) := 1/(1 + exp(−x)) is called sigmoid function. For both algorithms, if the input of Algorithm 1 The Original Semi-Parallel GWAS Input: SNP matrix S ∈ {0, 1} n×m , covariate matrix X ∈ Q n×k , phenotype vector y ∈ Q n , and # iteration iter.
functions such as logarithm (log), division (/) and sigmoid (σ ) is a vector, then it means to apply the function component-wisely resulting in the output vector of the same length. The notation T in the superscript denotes the matrix transpose. The operation denotes the Hadamard (component-wise) multiplication of two vectors, and the notation diag(·) with an input of a square matrix means the diagonal vector of the input.

Our optimization methodology
The aim of this study is to construct an HE algorithm for privacy-preserving semi-parallel GWAS computation. Since non-polynomial operations such as matrix inverse or real number inversion is a challenging stuff in HE, we need to modify the original semi-parallel GWAS algorithm into HE-friendly form for efficiency. Moreover, the

Algorithm 2 FisherScoring
Input: Covariate matrix X ∈ Q n×k , phenotype vector y ∈ Q n , and # iteration iter.
super-large data size of GWAS requires too much computational cost in encrypted state, and this issue should be resolved. In this regard, we introduce our optimization methodology to the algorithm.

Modification of fisher scoring
The main obstacle of Fisher Scoring (Algorithm 2) is a matrix inversion for U = X T WX ∈ Q k×k . To overcome this problem, we exploit the adjoint matrix adj(U) and the determinant det(U) of U. For (i, j)-minor M i,j ∈ Q of U, adj(U) and det(U) are obtained from basic linear algebra: We express the inverse matrix U −1 as from which we obtain an iterative updating equation on β (t) as follows: Here one needs to compute the inverse of det(U), but this non-polynomial operation is rather expensive in HE.
The key observation on this equation is that the second term U −1 X T y − p (t) essentially converges to 0 as t → ∞ since β (t) converges to some point. From this, we may expect that the convergence would be still valid even when we neglect the term det(U) −1 and substitute it by some appropriate constant. Namely, we can modify the equation as for some constant α > 0. In practice, this approximate version of the Fisher scoring algorithm works quite well with slightly slower convergence rate.

Computing diag S * T WS * without S *
The main observation of this subsection is that computation of n × m matrix S * in Algorithm 1 is superfluous for obtaining p-values. Rather than computing S * , we directly compute det(U) · diag S * T WS * , which can be obtained without computing (matrix) inversion. To be precise, using the fact that S * = S − XU −1 V for V := X T WS, we get Based on this observation, we compute det(U) · diag S * T WS * by following: 1 Compute U = X T WX and V = X T WS.

Approximate computation of S * T W v *
We also take the main observation of the previous subsection so that we do not compute S * . From the definition of S * and v * , it holds that S * T W v * = S T W v * . Then we have the following equations: where the last approximation is valid since the term X T y − p is sufficiently close to the zero vector, which is resulted from the Fisher Scoring. For example, each component of X T y − p was approximately 10 −30 in our experiment with 4 iterations in Fisher scoring. Refer to "Dataset description" section for the specific description of datasets. Therefore, we compute det(U) · S T y − p which is a reliable approximation of det(U) · S * T W v * , in much less computational costs.

Our modified semi-parallel GWAS algorithm
To sum up all our algorithmic optimization techniques described in above, we have Algorithm 3 and 4, which are modified Fisher Scoring and semi-parallel GWAS algorithms, respectively.
In Algorithm 3, the constant α takes a similar role to the learning rate in gradient descent algorithm [29], which can be adjusted if necessary.
We remark that step 8 of Algorithm 4, a conversion procedure from the squared z-score z i to the p-value pval i , is done in unencrypted state. Namely, we decrypt the ciphertext of z i for 1 ≤ i ≤ m after step 7 so that z i 's are publicized. We stress that the squared z-score has exactly the same information as the p-value, so publishing squared z-scores does not leak any additional information more than publishing p-values.

Homomorphic evaluation of the modified semi-parallel GWAS algorithm
Upon HE-friendly algorithms discussed in the previous section, there still remain computational issues regarding more fundamental operations. Recall that HEAAN basically supports component-wise addition and multiplication along with data slot rotations. However, we encrypt the data matrix by column-by-column manner, and our algorithms include complex operations such as matrix multiplication, evaluation of the adjoint matrix, a sigmoid function, and so on. In this regard, we specify how we can deal with such operations efficiently, reducing the number of multiplications or the total depth of multiplications required which are the main bottleneck of HE.
Note that this section consists of rather technical contents related to HE, since it includes HE algorithms of all building blocks for Algorithm 3 and Algorithm 4. One can simply embrace the fact that every operation required in Algorithm 4 Modified Semi-Parallel GWAS Input: SNP matrix S ∈ {0, 1} n×m , covariate matrix X ∈ Q n×k , phenotype vector y ∈ Q n , # iteration iter, and constant α > 0. Output: p-value vector − − → pval = pval 1 , · · · , pval m ∈ Q m .

Algorithm 3 and Algorithm 4 can be efficiently done based on HE, if not really interested in the details.
Hereafter, [ a] k with an integer a denotes a residue number in [ 0, k − 1] modulo k. An n-dimensional vector a = (a 1 , · · · , a n ) is simply denoted by (a i ) 1≤i≤n , and an n × m matrix A having (i, j)-entry a i,j is denoted by [ a i,j ] 1≤i≤n,1≤j≤m . For both cases, if the size is obvious from context, we simply write a vector by (a i ), and a matrix by [ a i,j ] . Every vector and matrix in this section is assumed to be of size power-of-two, which is in line with our packing method introduced in "Matrix packing method and rotate function" section.

Adjoint matrix and determinant
In step 4 of Algorithm 3 and step 2 of Algorithm 4, we need to compute the adjoint matrix and the determinant of the matrix U =[ u i,j ] i,j = X T WX.
We recall Eqs. 1 and 2 for adjoint matrix and determinant. Given an encryption of U, denoted by C U , we generate (k − 1) 2 ciphertexts C i,j for 1 ≤ i, j ≤ k − 1 from C U , whose plaintext is an i-row (upper) rotation and j-column (left) rotation of U. We first consider the 0-th plaintext slot, i.e., the (0, 0)-position of the plaintext matrix, of the ciphertexts. Since every u a,b for 1 ≤ a, b ≤ k −1 is a (0, 0)entry of plaintext matrix for one and only one ciphertext C a,b , we can compute a ciphertext whose After computing the ciphertext of [ M i,j ] as above, we can obtain the ciphertext C adj of adj(U) by multiplying a ciphertext C sgn of (−1) i+j . Finally, the ciphertext C det of determinant det(U) is easily obtained from the homomorphic multiplication of C U and C adj , followed by log k rotations and summations.

Matrix multiplications
Let A =[ a i,j ] be an n × k matrix with n ≥ k and B =[ b i,j ] be an n × m matrix. We use an Algorithm 6 computing a ciphertext C A T B of A T B from ciphertexts C A and C B of A and B, which is inspired from the hybrid method by Juvekar et al. [30].
Indeed, our GWAS algorithm contains several matrix multiplications of the form A T DB for the diagonal matrix D. For this, we first compute B = DB by Expdiag 0 (D) B, and then obtain A T DB by computing A T B with the

Algorithm 6 C.MatMul(C A , C B )
Input: Ciphertexts C A of A ∈ Q n×k and C B of B ∈ Q n×m with n ≥ k, and masking ciphertexts {dmsk t } t Output: A ciphertext of A T B.
1: C M ← 0 2: for t = 0 to k − 1 do 3: for i = 0 to log k − 1 do 5: Mult (C t , C B ) , t)) 8: end for 9: for i = 0 to log(n/k) − 1 do 10: C M ← C.Add C M , C.RowRot C M , 2 i · k 11: end for 12: return C M above method. Note that this requires only one additional Hadamard multiplication.

Matrix-vector multiplications
By understanding a vector by an one column matrix, we can perform all matrix-vector multiplications in our algorithms, except X β that appears in step 5 of Algorithm 3.
In fact, we can also compute X β by changing Algorithm 6 a little bit. Recall that X =[ x i,j ] is an n · k matrix and β = (β i ) is a k-length vector, and it holds that n ≥ k. We now again compute k ciphertexts of diag t (X), and then compute whose algorithm is described by Algorithm 7. We remark that there is another simple method for matrix-vector multiplication that we use for c = S T y − p . For simplicity, let x = (x i ) := y − p. Then by rotating and summing all rows of S x|| · · · || x = s i,j · x i , we obtain a matrix having the same size with S and consisting of identical rows c = S T x. This requires only one Hadamard multiplication and log n rotations. However, strictly speaking, this resulting ciphertext is not a ciphertext of S T x, since it encrypts a matrix having c row-wisely, not column-wisely. Thus we can only use this simple method only for S T x, where this row-wise packing Algorithm 7 C.MatVecMul(C X , C β ) Input: Ciphertexts C X of X ∈ Q n×k and C β of β ∈ Q k and masking ciphertexts {dmsk t } t Output: A ciphertext of Xβ.

5:
C t ← C.Add C t , C.ColumnRot C t , 2 i 6: end for 7: does not matter after then. The detail of this algorithm is described in Algorithm 8.

Fast diag A T BA computations
To obtain diag A T BA , one can perform matrix multiplication followed by diagonal extraction, but it is obviously not optimal since this computes unnecessary entries of A T BC other than diagonal entries. Thus we use another method that only compute the diagonal entries.
Let A = a i,j be an n × m matrix. As an incremental step, we first consider diag A T DC where D = d i,j is an n × n diagonal matrix and C = c i,j is an n × m matrix. Then it holds that diag A T DC j = n−1 i=0 d i,i · a i,j · c i,j . Now, from an encryption of D, we compute an encryption of n × m matrix Expdiag 0 (D) and then by rotating and summing through all rows, we obtain a matrix consisting of identical rows diag A T DC . One can easily check that Algorithm 8 with input C A and C.Mult C diag(D) , C C ) exactly performs this computation, and then we omit the explicit algorithm. Note that this can be directly applied for diag S T WS computation of step 5 of Algorithm 4. Toward our goal diag A T BA with a full matrix B, we exploit the above diagonal-case method after decomposing B into diagonal matrices. Let B t be a diagonal matrix with the diagonal diag t (B) for 0 ≤ t ≤ n − 1, then it holds that Therefore, after obtaining encryptions of Expdiag t (B) and ρ t (A) from encryptions of B and A, we can directly apply the diagonal-case method on each diag(A T · B t · ρ t (A)) computation for 1 ≤ t ≤ n and finally obtain the encryption of diag A T BA .
Here we again remark that, since these methods use Algorithm 8, they also ruin the column-wise packing as we already pointed out. Hence after applying these methods, it would be hard to perform another matrix operation. Indeed, one can check that the diagonal extractions are required for step 5 of Algorithm 4, which is the last part of algorithm that uses matrix structure.

Approximate computation of sigmoid
Since the sigmoid function σ (x) = 1/(1 + exp(−x)) is not a polynomial, we exploit an approximate polynomial of the function to evaluate based on HE. Following the methodology of [8,22], we used least square approximation method over the interval [ −8, 8]. The approximate polynomials g(x) of degree 7 is computed as The maximal error between σ (x) and g(x) is approximately 0.032.

Inverse of real numbers
In step 7 of Algorithm 4, we need to compute the inverse of d i for 1 ≤ i ≤ m. To compute the inverse of real numbers, we exploit the Goldschmidt's division algorithm [31], which outputs an approximate value of the inverse through iterative polynomial evaluations. Refer to [32] for more details of the algorithm.

Results
In this section, we present the experimental results of our modified semi-parallel GWAS algorithm based on HEAAN with a publicly available library [5]. All experiments were implemented in C++ 11 standard, and performed on Linux with Intel Xeon CPU E5-2620 v4 at 2.10GHz processor with multi-threading (8 threads) turned on.

Dataset description
We used a dataset of 245 samples where each sample contains a binary phenotype, 3 covariates (height, weight, age), and 25,484 SNP data provided by iDASH 2018 competition. The dataset is divided into two sets named by iDash_Test and iDash_Eval each composed of 245 samples containing common phenotype and 3 covariates but different number of SNPs; 10,643 and 14,841 SNPs, respectively. We used iDash_Test to set optimal parameters, and iDash_Eval was used to evaluate our algorithm in the competition. Note that the first column of the covariate matrix X ∈ Q n×k is a vector of which all the components are 1. Therefore, the parameters are (n, m, k) = (245, 10643, 4) for iDash_Test and (n, m, k) = (245, 14841, 4) for iDash_Eval.

Experimental setting and parameter selection
We propose two HEAAN parameter sets achieving 128bit or higher security for two experiments denoted by Exp I and Exp II in Table 1. The security levels of HEAAN parameter sets were estimated with Albrecht's security estimator [25,26] of which inputs are the ring dimension N, the modulus Q, the Hamming weight h of a secret polynomial, and the error distribution χ err . Note that since the modulus of the evaluation key evk is 2 2L , the security of HEAAN is estimated with input N, Q = 2 2L , h, χ err .
Exp I is a streamlined version operating Algorithm 4 only until step 5; that is, it does not perform the last division process. Note that Exp II that includes the division process (step 7 of Algorithm 4) naturally requires larger L than Exp I.
We first set the scaling parameter p to be sufficiently large so that errors derived from HEAAN do not effect on significant bits of plaintexts. The level parameter L is chosen by considering the fact that p levels are consumed for each homomorphic multiplication. See "Approximate homomorphic encryption scheme HEAAN" section for specific definitions of HEAAN parameters. Besides thoese HEAAN parameters, one also needs to select an appropriate constant α > 0 in Algorithm 3. After experimenting on several values, we set α = 8. Note that the choice of α merely depends on the size of X, but not on S.

Experimental results and evaluation
We demonstrated our modified semi-parallel GWAS algorithm in encrypted state, and evaluated the accuracy   Fig. 1a shows that the accuracy of our algorithm increases with the number of iterations for Fisher scoring, where the data set iDash_Test is used. The Fig. 1b shows that the accuracy of Exp II, which includes a division procedure, is comparable to that of Exp I without the division, where the data set iDash_Eval is used. Comparing Exp I and Exp II, there exists a tradeoff between computational time and information leakage. The output of Exp I is the vector of squared statistics (z i ) 1≤i≤m which has exactly same information with the p-value vector pval, but it takes 20 minutes longer than Exp I. On the other hand, since Exp I outputs the numerator det(U) · c 2 i and the denominator d i (in Algorithm 4) separately, it leaks some information more than p-values. However, it still seems to be very hard to extract any important information of input data from the numerator and denominator.
For more concrete evaluation, we classified each SNP as positive or negative depending on whether the corresponding p-value is larger or smaller than the given threshold (e.g. 10 −2 , 10 −5 , or 10 −12 ). Then, the accuracy of our algorithm compared to the original algorithm can be checked by a well-known statistical measure called F 1 score. The F 1 score of our algorithm is calculated by regarding the positive SNPs classified by the original GWAS algorithm to be the correct positive samples. For the formal definition of F 1 score, we refer readers to [33].
The performance of our algorithm including the computation time and the F 1 score on each parameter set is described in Table 2, where iter denotes the number of iterations in Fisher Scoring, Comp. time denotes the running time of our algorithm in encrypted state, and TH denotes the threshold of p-values for classification.
As we have seen in Fig. 1, more iterations of Fisher scoring provides higher accuracy measured by higher F 1 score. Note that 4 iterations suffice to provide high F 1 score even in a very small threshold such as 10 −12 . Also, Exp II calculating approximate inverse in encrypted state provides almost similar F 1 score to Exp I without such approximation. It implies that the error of inverse approximation does not seriously impact the whole approximation. Furthermore, Exp II shows even higher F 1 score than Exp I due to the cancellation of errors from our algorithmic approximation and that from the inverse approximation.
For about 15,000 SNP data, our algorithm works in less than 40 minutes when we exclude step 7 of Algorithm 4 in encrypted state, or in about 60 minutes otherwise. We emphasize that each iteration of Fisher scoring takes about 3 minutes while the Goldschmidt's division algorithm takes less than 30 seconds. Exp II takes much longer time than Exp I due to the larger level parameter L.

Scalability
Our algorithm is executed and evaluated with about hundreds of samples each containing ten thousand SNPs, and 3 covariates which can be seen as a small-size data in usual GWAS analysis. We emphasize that our algorithm is highly scalable in the number of samples or SNPs, since we circumvent the naive execution of large-sized matrix operations through the proper algorithmic modification. To test the scalability of our algorithm in practice, we randomly generated 500 samples each of which consist of 3 covariates and 30,000 SNPs. Each column of the covariate matrix was uniform randomly generated in the interval [ 150,200], [ 40,100] and [ 20,80] considering height, weight and age, respectively. Each element of the SNP matrix was uniform randomly chosen as a binary matrix. The experiment Exp 1 on this random dataset encrypted with properly chosen hyperparameters iter = 4 and α = 2 −7 still showed quite accurate p-value result compared to the result obtained by running Algorithm 2 in unencrypted state within 2 h.

Fisher Scoring
Our HE-friendly modified Fisher scoring (Algorithm 3) works quite well in practice, but there still remains to obtain some theoretical results on the convergence of the algorithm with respect to the new parameter α. Furthermore, we should consider an error in every operation derived from HEAAN when homomorphically evaluate the algorithm. As a result, research on the convergence of the erroneous version of our modified Fisher scoring algorithm should be very interesting topic as a further work.
In addition, we note that our modified Fisher scoring algorithm can be generally used for logistic regression, not restricted to GWAS algorithm.

Conclusions
Interest on privacy-preserving genome data analysis based on HE has grown up very rapidly since the annual iDASH competition was launched, and GWAS is one of the most important technologies in this area which was also selected as one of three tasks in iDASH 2018 competition. Our HE-friendly modified semi-parallel GWAS algorithm was successfully implemented based on an approximate HE scheme HEAAN, and we could obtain the p-value result in about 30-40 minutes for 10,000-15,000 SNP data with sufficiently high accuracy compared to the result obtained in unencrypted state.