The GMW framework allows us to transform any computation expressed as a circuit into a privacy-preserving one secure against semi-honest adversaries. Therefore, we focus on designing a depth-optimized circuit with XOR, AND, and NOT gates that computes set-maximal matches. Then, using the generic protocol described in the previous section, we obtain a secure protocol for set-maximal matches that requires at most as many rounds of interaction as the depth of the circuit.
Problem definition
We give an efficient and depth-optimized algorithm for computing set-maximal matches. More specifically, the problem specification is as follows:
Input: A genomic database Y containing m sequences, each of size n, a query genomic sequence x of size n, and a threshold value t.
Output: A matrix M of size m×n such that the element M_{j,i} is equal to the length of the match between x and y_{j} ending at position i if the match is set-maximal with threshold t, and 0 otherwise.
We note that the output as described above leaks the positions of set-maximal matches. Because of the highly sensitive nature of genomic sequences, it is beneficial to hide even this information. Therefore, we slightly modify the above problem so that the list of set-maximal matches is revealed only after applying a random permutation to each row of the output.
Input: A genomic database Y containing m sequences, each of size n, a query genomic sequence x of size n, a threshold value t, and m permutations \((\pi _{k})_{k \in \{1,\dots, m\}}\).
Output: Let M be a matrix of size m×n such that the element M_{j,i} is equal to the length of the match between x and y_{j} ending at position i if the match is set-maximal with threshold t, and 0 otherwise. The output consists of the rows of M, each row k permuted according to π_{k}.
We observe that the output still reveals the lengths of set-maximal matches at each index. This information could also be hidden by applying a random permutation to all the entries of M, instead of to each row. However, this would reduce the applicability of the computation, since the per-row lengths appear central to certain applications.
In the secure protocol, the database Y and the permutations \((\pi _{k})_{k \in \{1,\dots, m\}}\) are the input of the server, the query x is the input of the client, and the threshold t is known to both parties. To avoid leakage to the server, the output of the protocol is selectively reconstructed only to the client.
Algorithm description
We describe our algorithm below; a sample execution is shown in Fig. 1.

1. We compute a matrix M of size m×n such that M_{j,i} is equal to the length of the match between the query x and the database entry y_{j} ending at position i (Fig. 1b).

2. We set M_{j,i} to 0 if the length of the match of x and y_{j} ending at position i is below the threshold t (Fig. 1c).

3. We compute a matrix L of size m×n such that L_{j,i} is 0 if there is a j′≠j such that \(\mathbf {M}_{j',i} > \mathbf {M}_{j,i}\), and 1 otherwise (Fig. 1d).

4. We compute a matrix K such that K_{j,i} is 0 if there is a j′ (possibly equal to j) such that \(\mathbf {L}_{j',i} = \mathbf {L}_{j',i+1} = 1\); namely, there exists a match that is extended to position i+1 (Fig. 1e).

5. We set M_{j,i}←M_{j,i}·K_{j,i}·L_{j,i} and permute each row M_{k} according to the permutation π_{k} (Fig. 1f).
After the steps described above, the matrix M contains the correct output:

The output M contains the correct length of matches computed in step 1.

The output M contains no matches of length less than t, since all such matches have been removed in step 2.

The output M contains no matches strictly contained in another match. If a match is contained in another, larger match, then either there is a match with an earlier starting point, a match with a later ending point, or both. In the first and third cases, this match is excluded in step 3, since there is another database entry with a larger value at the corresponding positions of M. In the second case, the match is excluded in step 4, since there is another extendable match at the corresponding positions.

The output M contains only locally maximal matches. After step 1, M contains the length of matches from their starting position, so it is not possible for a match to be extendable toward a previous position. Additionally, by step 4, M does not contain matches extendable toward the next position.
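Before turning to circuits, the five steps can be sanity-checked with a short plaintext sketch. This is not the secure protocol: there is no cryptography, the final permutation is omitted (it only hides positions), and the function name is ours.

```python
def set_maximal_matches(Y, x, t):
    """Plaintext reference for steps 1-5 (final permutation omitted)."""
    m, n = len(Y), len(x)
    # Step 1: M[j][i] = length of the match between x and Y[j] ending at i
    M = [[0] * n for _ in range(m)]
    for j in range(m):
        run = 0
        for i in range(n):
            run = run + 1 if x[i] == Y[j][i] else 0
            M[j][i] = run
    # Step 2: drop matches shorter than the threshold t
    M = [[v if v >= t else 0 for v in row] for row in M]
    # Step 3: L[j][i] = 1 iff no other entry has a longer match ending at i
    L = [[int(all(M[k][i] <= M[j][i] for k in range(m))) for i in range(n)]
         for j in range(m)]
    # Step 4: K[i] = 0 iff some maximal match at i extends to i + 1
    # (identical for every row j, so we keep a single vector)
    K = [int(not any(L[k][i] and i + 1 < n and L[k][i + 1] for k in range(m)))
         for i in range(n)]
    # Step 5: keep an entry only if it is maximal and not extendable
    return [[M[j][i] * K[i] * L[j][i] for i in range(n)] for j in range(m)]
```

For example, `set_maximal_matches(["GATC", "CATT"], "GATT", 2)` returns `[[0, 0, 3, 0], [0, 0, 0, 3]]`: each database entry retains exactly one set-maximal match of length 3.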
We now describe how to implement each of these steps using boolean circuits with AND, XOR, and NOT gates in a depth-optimized way.

Compute the length of matches: First, we compute an auxiliary matrix B^{(0)} such that \(\mathbf {B}^{(0)}_{j,i} = 1\) if x[i]=y_{j}[i]. Namely, \(\mathbf {B}^{(0)}_{j,i} = \lnot (\mathbf {x}[i] \oplus \mathbf {y}_{j}[i])\). We observe that B^{(0)} indicates whether a match has length at least one. Using B^{(0)}, we compute whether a match has length at least two by setting \(\mathbf {B}_{j,i}^{(1)} \leftarrow \mathbf {B}_{j,i}^{(0)} \wedge \mathbf {B}_{j,i-1}^{(0)}\). More generally, if B^{(k)} indicates whether a match has length at least 2^{k}, then it can be updated to indicate whether the length of a match is at least 2^{k+1} by setting \(\mathbf {B}_{j,i}^{(k+1)} \leftarrow \mathbf {B}_{j,i}^{(k)} \wedge \mathbf {B}_{j,i-2^{k}}^{(k)}\).
Concurrently, in each iteration we compute a bound on the length of each match by setting M^{(0)}=B^{(0)} and \(\mathbf {M}_{j,i}^{(k+1)} \leftarrow \mathbf {B}_{j,i}^{(k)} \cdot \mathbf {M}_{j,i-2^{k}}^{(k)} + \mathbf {M}_{j,i}^{(k)}\). We observe that this computation gives the actual length of a match if it is less than or equal to 2^{k} and returns the lower bound of 2^{k} otherwise. Hence, after ⌈log(n)⌉ iterations, we set M=M^{(⌈log(n)⌉)}.
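In plaintext, the doubling recursion looks as follows (out-of-range indices are treated as 0; the function name is ours). Invariant: after round k, M[i] = min(true match length ending at i, 2^k).

```python
import math

def match_lengths(x, y):
    """Match length ending at each position via the doubling recursion."""
    n = len(x)
    B = [int(a == b) for a, b in zip(x, y)]  # B^(0): match of length >= 1
    M = list(B)                              # M^(0): exact length if <= 1
    for k in range(math.ceil(math.log2(n))):
        s = 2 ** k
        # B^(k+1)[i] = B^(k)[i] AND B^(k)[i - 2^k]   (out of range -> 0)
        B_next = [B[i] & (B[i - s] if i >= s else 0) for i in range(n)]
        # M^(k+1)[i] = B^(k)[i] * M^(k)[i - 2^k] + M^(k)[i]
        M = [B[i] * (M[i - s] if i >= s else 0) + M[i] for i in range(n)]
        B = B_next
    return M
```

Here ordinary integer addition stands in for the depth-optimized adder used in the circuit.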
The addition is computed using a depth-optimized adder, whose AND-depth is proportional to the logarithm of the bit length of the numbers added [23]. Namely, the AND-depth of the adder is O(log(logn)). Overall, the AND-depth of the length computation is O(log(n) log(logn)).

Remove matches with length below the threshold: Let t_{k}≡t (mod 2^{k}) for \(k \in \left \{0, 1, \dots, \lceil \log (n)\rceil \right \}\) (so t_{0}=0) and let \((b_{\lceil \log (n)\rceil }, \dots, b_{1})\) be the bit decomposition of t, where b_{⌈log(n)⌉} is the most significant bit and b_{1} is the least significant bit. Let T^{(k)} be m×n matrices that indicate candidate matches; initially \(\mathbf {T}_{j,i}^{(0)} = 1\) for all i and j. As in the length computation, we define the auxiliary matrix B^{(0)} that initially indicates whether a database and a query position match or not. In each iteration, we update B^{(k)} as in the length computation; namely, we set \(\mathbf {B}_{j,i}^{(k)} \leftarrow \mathbf {B}_{j,i}^{(k-1)} \wedge \mathbf {B}_{j,i-2^{k-1}}^{(k-1)} \). In the k-th iteration, if b_{k}=1, we update \(\mathbf {T}^{(k)}_{j,i} \leftarrow \mathbf {B}_{j,i-t_{k-1}}^{(k-1)} \cdot \mathbf {T}_{j,i}^{(k-1)}\), and otherwise T^{(k)}=T^{(k−1)}. Finally, we set \(\mathbf {M}_{j,i} \leftarrow \mathbf {M}_{j,i} \cdot \mathbf {T}_{j,i}^{(\lceil \log (n) \rceil)}\).
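The thresholding recursion can be sketched in the same plaintext style (again, out-of-range indices are treated as 0, and `at_least_t` is a hypothetical name). The returned vector marks the positions i such that the t positions ending at i all match.

```python
def at_least_t(x, y, t):
    """Per position i: does a match of length >= t end at i?
    Splits the t preceding positions into power-of-two chunks,
    following the bit decomposition of t."""
    n = len(x)
    B = [int(a == b) for a, b in zip(x, y)]  # B^(k-1), starting at k = 1
    T = [1] * n                              # T^(0): every position a candidate
    t_prev = 0                               # t_{k-1} = t mod 2^(k-1)
    k = 1
    while t >> (k - 1):
        s = 1 << (k - 1)
        if (t >> (k - 1)) & 1:               # bit b_k of t
            # a chunk of length 2^(k-1) must end at position i - t_{k-1}
            T = [T[i] & (B[i - t_prev] if i >= t_prev else 0)
                 for i in range(n)]
            t_prev += s                      # t_k = t_{k-1} + 2^(k-1)
        # B^(k)[i] = B^(k-1)[i] AND B^(k-1)[i - 2^(k-1)]
        B = [B[i] & (B[i - s] if i >= s else 0) for i in range(n)]
        k += 1
    return T
```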
This computation intuitively splits the t positions preceding position i into parts whose sizes are increasing powers of two, and then iteratively checks whether each of these parts is a match. Splitting a number into increasing powers of two is equivalent to computing its bit decomposition. Even though the AND-depth of this step is O(log(n)), it can be performed in parallel to the length computation, where we also use the same auxiliary matrix B. Hence, this step does not increase the depth of the circuit.

Remove matches contained in other matches: We first compute the maximum length of a match for each position i across all the database entries, and then perform an equality check between M_{j,i} and the maximum for position i to compute each L_{j,i}. Each such maximum is computed using the D&C comparison circuit [24] in AND-depth O(log(logn) log(m)). The equality check can be implemented with a depth-optimized circuit of depth O(log(logn) log(m)), in which the comparison of the bits across the database entries is performed in parallel.

Remove extendable matches: A match between x and y_{j} is extendable at position i if B_{j,i+1}=1. So, in order to remove the extendable matches, we compute \(\mathbf {K}_{j,i}= \lnot \max _{j'}\left \{\mathbf {L}_{j',i} \wedge \mathbf {L}_{j',i+1}\right \}\), which indicates whether an extendable match exists. Finally, we update M_{j,i}←K_{j,i}∧L_{j,i}∧M_{j,i}. Using the D&C comparison, this operation requires AND-depth O(log(logn) log(m)).

Permute matches: Finally, to remove the information regarding the position of matches, we permute each row k of the matrix M according to a permutation π_{k}, which is given as an input. All permutations are performed in parallel and require AND-depth O(log(n)) using the Waksman permutation network [25].
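The per-position maximum used when removing contained matches is a tournament: pairs of values are compared in parallel, so m values need only ⌈log m⌉ rounds of comparisons. A plaintext sketch of this divide-and-conquer shape (the function name is ours, and Python's `max` stands in for the comparison circuit):

```python
def tournament_max(vals):
    """Maximum of a list via pairwise (tournament) comparisons:
    each while-iteration corresponds to one round of parallel
    comparisons, so the depth is ceil(log2(len(vals)))."""
    vals = list(vals)
    while len(vals) > 1:
        # compare disjoint pairs; all comparisons in one round are parallel
        nxt = [max(vals[i], vals[i + 1]) for i in range(0, len(vals) - 1, 2)]
        if len(vals) % 2:          # odd element gets a bye into the next round
            nxt.append(vals[-1])
        vals = nxt
    return vals[0]
```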
The total AND-depth of the above circuit is O(log(n)+ log(logn) log(m)). Since in practice n ≫ m, the depth can be taken to be O(log(n)).
We now present an optimization that reduces the output size and the computational cost of the permutations. Even though it does not improve the asymptotic complexity, it noticeably improves the experimental efficiency of the protocol.
Output size reduction and efficient permutations: Before the permutation step, the above circuit outputs a matrix M that contains all set-maximal matches with threshold t. We note that the threshold t guarantees that no two set-maximal matches end at positions that are less than t apart. In other words, in each row of the matrix M there is at most one nonzero element in every t positions. By combining every t columns of M into a single column equal to their XOR, we reduce the size of the output by a factor of t without losing the desired information about set-maximal matches.
After the output reduction, we need to permute each row of a matrix of size m×n/t. Therefore, the Waksman permutation network has depth only O(log(n/t)).
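A plaintext sketch of this column-folding step (hypothetical function name; XOR is applied to the integer match lengths, which is lossless because each window of t columns holds at most one nonzero entry):

```python
import functools
import operator

def reduce_output(M, t):
    """Fold every t consecutive columns of each row into their XOR,
    shrinking an m x n matrix to roughly m x n/t."""
    return [[functools.reduce(operator.xor, row[i:i + t], 0)
             for i in range(0, len(row), t)]
            for row in M]
```

For example, `reduce_output([[0, 3, 0, 0, 0, 0, 4, 0]], 4)` returns `[[3, 4]]`: the two set-maximal match lengths survive, while their exact column positions are dropped.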
Experimental evaluation
We have implemented our protocol for secure computation of set-maximal matches using the ABY framework to evaluate its efficiency. We use the network configuration included in ABY for the communication between the two parties and the default method for precomputing multiplication triples via oblivious transfer. Then, we build the computation circuit gate by gate and let ABY handle the sharing procedures and the secure computation. The problem has three parameters: n, the size of the genomic sequences; m, the size of the database; and t, the threshold of set-maximal matches. We run a series of simulations on a single machine with 16 GB RAM and a quad-core 2.8 GHz CPU that simulates both the server and the client to evaluate the efficiency of the protocol with respect to these parameters.
In our implementation, the server's input is the database, which determines the parameters n and m, and the threshold value t. Similarly, the client's input is a query of size n, the database size m, and the threshold value t. We note that, because of the nature of the GMW protocol, both parties need to know the values of all three parameters n, m, and t. We evaluate how our protocol scales as n increases for database sizes m=10,100,1000 and thresholds t=1 and t=2^{k}, where k is the bit length of n. Even though the threshold value is an input to the protocol, we plot only two values for clarity of exposition. These two values represent the best and worst cases in terms of efficiency: the threshold value 2^{k} maximizes the efficiency gain from the output reduction optimization, whereas with threshold value t=1 this optimization is not in use, so this value corresponds to the worst-case running time. In Fig. 2, we observe that for large enough values of n there is indeed an improvement in both the running time and the depth due to the output reduction technique.
Figure 2a, c, and e show the running time of the protocol for three different values of m. For small n and m, the running time is dominated by the precomputation phase, but when m and n are large enough, the running time for a given m depends almost linearly on n. We observe that for small values of n there is almost no difference in the running time between the two threshold values. In this case, the output is reduced by only a small factor when t=2^{k}, whereas the precomputation time increases, since the computation circuit needs to include the output reduction. Hence, for small values of n, the output reduction optimization yields no efficiency improvement.
In Fig. 2b, d, and f, we notice that the depth depends mainly on log(n), as expected from the analysis in the previous section.