 Methodology
 Open Access
 Published:
Regionbased interaction detection in genomewide casecontrol studies
BMC Medical Genomics volume 12, Article number: 133 (2019)
Abstract
Background
In genomewide association study (GWAS), conventional interaction detection methods such as BOOST are mostly based on SNPSNP interactions. Although single nucleotides are the building blocks of human genome, single nucleotide polymorphisms (SNPs) are not necessarily the smallest functional unit for complex phenotypes. Regionbased strategies have been proved to be successful in studies aiming at marginal effects.
Methods
We propose a novel regionregion interaction detection method named RRIntCC (regionregion interaction detection for casecontrol studies). RRIntCC uses the correlations between individual SNPSNP interactions based on linkage disequilibrium (LD) contrast test.
Results
Simulation experiments showed that our method can achieve a higher power than conventional SNPbased methods with similar typeIerror rates. When applied to two real datasets, RRIntCC was able to find several significant regions, while BOOST failed to identify any significant results. The source code and the sample data of RRIntCC are available at http://bioinformatics.ust.hk/RRIntCC.html.
Conclusion
In this paper, a new regionbased interaction detection method with better performance than SNPbased interaction detection methods has been proposed.
Background
Genomewide association study (GWAS) has served as an important tool to investigate the relationship between genomic variants and human traits [1]. The genetic variants investigated in GWAS are mainly single nucleotide polymorphisms (SNPs). SNPs are single nucleotide variants whose genotypes are not fixed in the population and exhibit diversities among different individuals. Most GWAS analysis protocols follow the singlelocus test procedures aimed at detecting the marginal effects of SNPs [2, 3]. However, it’s well recognized that genetic variants work synergistically through certain pathogenic pathways [4]. The interactions among SNPs are not guaranteed to be discovered by marginal effect detection, especially for SNPs with weak marginal effects but strong interaction effects [5]. Many methods have been developed to address this problem [4, 6], including PLINK [7], BOOST [5], MDR [8], ReliefF [9], BEAM [10], and LD contrast test [11].
An important problem in SNPSNP interaction detection is the stringent threshold when considering multiple testing correction. For marginal effect detection, a SNP can only be considered as significant when its corresponding pvalue is at the order of 10^{−8} (assuming we use Bonferroni correction). In SNPSNP interaction detection, the threshold has to go down further to the order of 10^{−14}. As a result, interactions with weak or moderate effect sizes might remain undiscovered.
In this paper, we proposed a regionbased interaction detection method to address this problem. Regionbased methods have been successful in marginal effect detection [12, 13]. The basic idea is to group the effects of nearby SNPs together and test their aggregation rather than investigating the elements separately. The benefit is twofolds: Firstly, the size and the number of regions are controllable. We can achieve the balance between the resolution of the results and the statistical significance threshold after Bonferroni correction. Secondly, the effect size might be enhanced by taking the whole region into account. SNPs are the basic genomic units. Neverthless, they are not necessarily the functional units of diseases. Different SNP mutations in a gene can all lead to changes of protein functions. Therefore, grouping different SNPs together provides a possible alternative.
To group different SNPSNP pairs together, the key is to quantitatively measure and account for the relationships between different SNPSNP pairs. To the best of our knowledge, no existing method is available to test regionregion interactions for casecontrol studies, where we only have two groups of people: healthy people (controls) and people with the investigated disease (cases). Although Ma et al. [14] proposed a regionbased interaction detection method to analyze continuous traits based on the linear regression model, it is not easy to extend their method to the casecontrol setting due to the difficulty of deriving the covariances of test statistics under the logistic regression model that is commonly used in casecontrol studies. In this paper, we use the LD contrast test method instead of the logistic regression in interaction detection. We derive the correlation coeffcients of the correpsonding SNPSNP interaction test statistics. Then we further extend regionbased methods to the casecontrol setting by accounting for the covariances between SNPbased test statistics. We name this method RRIntCC (regionregion interaction detection for casecontrol studies). Experiment results illustrate that RRIntCC achieves a higher power than conventional SNPSNP interaction detection methods at the same typeIerror rate.
Methods
Here we propose a novel regionbased interaction detection method for genomewide casecontrol studies that utilizes SNPbased interaction test statistics and their covariances. LD contrast test is adopted to measure SNPbased interaction effects. We derive the covariance of LD contrast test statistics, which enables a robust aggregation of SNPSNP interactions within a region pair. The determination of regions comes from gene definitions or BOOST results.
Genomic data formats
There are two alleles for almost every base pair (bp) position in the human genome, one from the maternal chromosome and the other from the paternal chromosome. A combination of the two alleles is denoted as a genotype of this bp position. SNPs are defined as the base pairs that could exhibit different genotype values in different individuals. Normally a SNP only has two possible allele values in the population, one major allele with a higher probability (denoted as B), and one minor allele (denoted as b). Correspondingly, there exist three genoytpes for a typical SNP, i.e., BB, Bb and bb, where Bb is called a heterogeneous genotype and the rest two are called homogeneous genotypes. GWAS uses microarrays to generate SNP genotype data. In SNP data analysis, we use 0/1/2, 0/1/1, and 0/0/1 for BB/Bb/bb as the encoding scheme for additive, dominant, and recessive genetic models, respectively. A more flexible strategy is to estimate the effects of three genotypes independently, at the price of an increased degree of freedom. Allele data could also be used for analysis, with 0/1 as the numerical values of major/minor alleles. However, statistical inference needs to be performed in advance to retrieve allele information from original genotype data, which is called haplotype phasing in the GWAS community. In this paper, we focus on the analysis of genotype data.
LD contrast test for SNP interaction detection
Current interaction detection methods are mainly based on the deviation from additive effect by assuming a linear or logistic regression model. Nevertheless, this approach is not necessarily the most powerful method due to the uncertainty of underpinning biochemical mechanisms. Linkage disequilibrium (LD) contrast test provides another valuable perspective to investigate this problem. Empirical studies have shown that LD contrast test can achieve higher power than logistic regression under certain disease models for casecontrol studies [6]. In this paper, LD contrast test is adopted to generate SNPbased interaction test statistics because of its clear statistical meaning and mathematical simplicity.
LD represents the statistical association between two genetic loci with allele values, defined as the deviation from the independence of two SNPs (A and B)
To avoid the ambiguity caused by haplotype phasing, composite LD (CLD) which only requires genotype data is commonly used to approximate LD. CLD is defined as [15]:
where the subscript and the superscript represent two gametes that are passed to offsprings and P denotes the probability of the specific gamete combination. CLD could be regarded as a simplified version of phasing to facilitate the analysis based on genotype data. The statistical properties of CLD have been well studied [16, 17]. One important fact is that CLD corresponds to the sample correlation coefficient \(\hat {r}\) of genotype values under the additive model,
where p=p(A),q=p(B),D_{A} and D_{B} represent HardyWeinberg disequilibriums, i.e. D_{A}=p_{AA}−p^{2}(A), D_{B}=p_{BB}−p^{2}(B). D_{A} and D_{B} are nearly 0 in GWAS datasets after quality control.
A similar result holds for the original LD and allele values,
Therefore, CLD could also be viewed as an approximation of LD by using the correlation coefficient of 0/1/2 genotype data under the addtive model to replace that of 0/1 allele values, at the price of implicitly conducting phasing with equal probabilities for twoallele combinations.
Suppose two SNPs work synergistically to contribute to the same pathways, they are less likely to be separated during recombination and will be inherited together to offsprings in the case group. As a result, the SNPSNP pattern should be different between patients and healthy people. Therefore, checking the difference of LD patterns between cases and controls provides an alternative way to detect interaction. LD contrast test was proposed to statistically test this difference [11]. The test statistic based on CLD has the following form:
which follows a 1df χ^{2} distribution under the null hypothesis that there is no LD difference between cases and controls.
Covariance between SNP interactions
The key issue in the aggregation of individual SNPSNP interaction effects is the correction of inflated effect sizes caused by the correlations among individual test statistics. The fact that LD is actually the sample covariance of two SNPs is leveraged to derive the correlation coefficients of LD contrast test statistics.
Suppose two SNP pairs (X,Y) and (U,V) have interactions with contrast LDs
The corresponding LD contrast test statistics read:
The covariance of the two test statistics reads:
In GWAS, it’s commonly assumed that population samples are independent. Under this assumption, we can derive the following theorems.
Theorem 1.
The covariance of contrast LDs can be decomposed into components from cases and controls separately,
Proof 1.
ΔLD is the difference of the two sample covariances in cases and controls. By the linear property of covariance, cov(ΔLD_{XY},ΔLD_{UV}) can be decomposed into four covariances of two sample covariances. Because individuals are assumed to be independent, the two terms with one sample covariance from cases and the other from controls are 0. Therefore, Theorem 1 holds. □
Theorem 2.
The covariance of sample covariances reads
where
Proof 2.
The covariance of sample covariances can be rewritten as
We consider the following four conditions. (1) i=j or k=l. (2) i≠j,i≠k,i≠l,j≠k,j≠l and k≠l. (3) i≠j and { i=k,j=l or i=l,j=k}. (4) i≠j,k≠l, and { i=k or i=l or j=k or j=l}. The basic covariance unit in (11) can be rewriten as
There are 2n^{3}−n^{2},n(n−1)(n−2)(n−3),2n(n−1) and 4n(n−1)(n−2) items for the four conditions respectively. We can further separate (12) into 16 components and calculate their values under different conditions. The derivation is straightforward. Our conclusion thus holds. □
Theorem 3.
The sample mean of (X−EX)(Y−EY)(U−EU)(V−EV)is an asympototically unbiased estimator of δ_{4},
Proof 3.
Equation (13) can be rewritten as
Again (14) can be separated into 16 components which are solvable under the independence assumption. The rest of the proof is omitted due to page limit. □
By integrating (813), the covariance of the LD contrast test statistics can be estimated. Note that the variance of the standardized LD contrast test statistic is approximately 1,
Therefore, the covariance of T_{XY} and T_{UV} can be reduced to the corresponding correlation coefficients,
The test statistic for regionbased interactions
To aggregate SNPSNP interaction test statistics, a minimum pvalue based method is adopted. In detail, we assume a multivariate normal distribution MVN(0,Σ) for the observed test statistics z_{i},i=1,2,...,k_{1}k_{2}, where k_{1} and k_{2} are the number of SNPs in the two regions. The covariance matrix Σ is estimated using (813).
Then the regionbased pvalue is defined as the probability that we observe a value that is larger than the largest absolute value of SNPSNP interaction test statistics under MVN(0,Σ). Denote the absolute value of the test statistic related to the minimum pvalue as T:
Then the pvalue for this regionregion interaction reads,
In this paper, We use the results of GBOOST [18], the GPU version of BOOST, to specify candidate regions. The regions could also be selected by checking potential pathogenic pathways or proteinprotein interaction networks.
Results
We conducted simulations under various settings to examine whether the proposed method can correctly control typeIerror rates and outperform SNPbased methods in terms of statistical power. To mimic real LD patterns, we picked all genotyped SNPs from two genomic regions (A and B) with intensive LD patterns in the dataset from Myocardial Infarction Genetics Consortium (MIGen) [19]. Region A is of size 157.874 kbp, located in chromosome 1, with 34 genotyped SNPs inside and 9 tag SNPs selected by haploview. Region B is of size 267.528 kbp, located in chromosome 3, with 50 genotyped SNPs and 10 tag SNPs.
We developed the software RRIntCC in C++. The source code of RRIntCC is available at http://bioinformatics.ust.hk/RRIntCC.html. The results of RRIntCC and SNPbased methods were compared for empirical power experiments. We further applied RRIntCC to MIGen and a renal complication dataset of type 2 diabetes (T2D) patients. RRIntCC reported several significant region pairs in both datasets while conventional SNPbased interaction detection tools failed to identify any SNP pairs.
TypeIError rate control
For typeIerror rate evaluation, we randomly selected 1000, 2000, 3000, 4000, and 5000 samples from MIGen dataset and maintained their genotype values to preserve the LD patterns. Phenotype values for the randomly picked samples were assigned using a Bernoulli distribution with equal probabilities for case and control disease status. 1000 simulations were run for each sample size to determine the empirical typeIerror rates under two commonly used significance levels, i.e. 0.05 and 0.01. We repeat the experiment 20 times to examine the robustness of empirical typeIerror rates. As shown in Fig 1, simulations of empirical typeIerror rates indicated that the results of RRIntCC are not inflated at given significance levels.
Empirical statistical power
For power evaluation, phenotype values were generated using the public software GWASimulator [20], which uses haplotype information to simulate LD structure and produces phenotype values according to preset disease prevalence, causal SNPs and interactions with certain effect sizes. In total, 12084 haplotypes of these two regions were generated by PLINK [7]. We performed 1000 simulations for 1000, 2000, 3000, 4000, and 5000 samples, respectively. Results of original LD contrast test (LDCont) and GBOOST were also given for comparison.
GWASimulator simulated genotypes of all SNPs in the two regions, while only the tag SNPs were analyzed. Even though nontag SNPs could be selected as causal SNPs, we can still observe interaction effects between tag SNPs due to LD between tag SNPs and nontag SNPs. We designed six experimental settings with different tag status and allele frequencies for the causal interacted SNP pair. The effect sizes were determined by the relative risk ratio. The increment of relative risk ratio by observing one disease allele was set as \(\sqrt {2}\), so that the ratios for genotype combinations 1/1, 1/2, 2/1, and 2/2 were 2, \(2\sqrt {2}, 2\sqrt {2}\), and 4, respectively. The results are summarized in Table 1. Under all settings, RRIntCC achieves a higher power than LDCont. GBOOST outperforms RRIntCC and LDCont when the MAFs of both causal SNPs are large. However, when the MAF of even one causal SNP goes down, the power of GBOOST drops dramatically and RRIntCC is the most powerful method under such settings. Even in the cases where both MAFs are large, RRIntCC is still valuable when sample size is small. The results support the use of our regionbased interaction detection method in GWAS studies, especially considering that GWAS datasets usually have quite limited sample sizes compared to the huge number of SNPs.
Experiment using real datasets
We applied our method to the dataset of Myocardial Infarction Genetics Consortium (MIGen) with 649370 genotyped SNPs and 2967/3075 cases/controls, and the renal complication dataset collected in Hong Kong with 1257031 SNPs and 882/2231 cases/controls. Current computation capability cannot support wholegenome interaction analysis using LD contrast test. Instead, GBOOST [18] was first used as probes to generate regionpairs qfor regionbased interaction analysis. We adopted 5×10^{−10} as a suggestive pvalue threshold to screen out SNP pairs that are unlikely to be associated. The remaining SNP pairs were clumped into regions with size 200 kbp, which is roughly the size of typical genes. After identifying the ranges of clumped regions, all genotyped SNPs in MIGen dataset were mapped into these regions. For computation efficiency, the maxmium number of SNPs in each region was set to be 31, so that the total number of SNPSNP interactions within each region pair was controlled below 1000. The choice of this number is arbitrary. In case that the real number of SNPs inside a region is larger than this limit, we randomly choose 31 SNPs to represent this region.
Table 2 lists the top four SNP pairs found by GBOOST in the MIGen dataset and their corrected familywise error rates (cFWER). None of them can pass the Bonferronicorrected pvalue threshold. Moreover, even the smallest pvalue is 100 times larger than the threshold. Table 3 lists the top four region pairs found by RRIntCC. One region pair, chr3: [177577480, 177777480] ∼ chr7: [81695481, 81895481], passes the Bonferronicorrected pvalue threshold. The second and third region pairs share the same region in chr3 and overlap in the region in chr20, which indicates that these two region pairs actually refer to only one region pair with size larger than the preset 200 kbp. Therefore, we further analyze the region interaction between chr3: [187498383, 187698383] with size 200 kbp and chr20: [39109460, 39444799] with size 335.339 kbp, leading to a cFWER of 0.0536. Multiple genes, including CACNA2D1, DGKG, AK057298, TOP1, BC035080, PLCG1, ZHX3, LPIN3, and EMILIN3, are located in these two region pairs. CACNA2D1 has been found to be involved in cardiomyopathy pathway [21, 22]. Besides, ZHX3 is reported to be associated with left ventricle wall thickness [23]. Both ZH3 and EMILIN3 are reported to be associated with resting heart rate [24]. The regions identified by RRIntCC might provide clues for factors affecting myocardial infarction risks.
We also applied GBOOST and RRIntCC to the renal complication dataset. GBOOST has no significant finding, while RRIntCC found one region pair, chr12: [103040398, 103240398] and chr15: [33102602, 33302602], with a cFWER of 0.00382. Two genes, PAH and FMN1, are involved in this region pair. Both PAH and FMN1 were reported to be related to kidney disorders [25][26], which implies a potentially target pathway for the study of renal complications in patients with T2D.
Discussion
There still remain several issues that could be improved in our method. First, the computation complexity of calculating the covariance matrix is O(n^{2}), which is unacceptable for whole genome analysis. Second, the genomic resolution has been sacrificed by replacing SNPs with regions. One potential remedy is to extend statistical fine mapping methods for interaction detection to determine the leading SNP pairs within the significant region pairs.
Conclusions
In this paper, we proposed a regionbased interaction detection method named RRIntCC. We derived the correlation coefficients between SNPSNP interaction test statistics by using LD contrast test. We aggregated SNPSNP interaction test statistics by assuming a multivariate normal distribution with the estimated covariance matrix to account for the potential intensive LD pattern within the regions. By using regionbased strategy, we reduced the total number of tests and were therefore able to use a less stringent Bonferronicorrected pvalue threshold. Simulation results support that our regionbased strategy outperforms SNPbased method in terms of statistical power at similar typeIerror rates.
Availability of data and materials
The data of Myocardial Infarction Genetics Consortium are publicly available from The database of Genotypes and Phenotypes (dbGaP), accession number: phs000294.v1.p1. The data of renal complication in T2D patients are are available from the Themebased Research Scheme (T12402/13N) of the Hong Kong Research Grant Council (RGC) but restrictions apply to the availability of these data, which were used under license for the current study, and so are not publicly available. Data are however available from the authors upon reasonable request and with permission of the Themebased Research Scheme (T12402/13N).
Abbreviations
 bp:

Base Pair
 cFWER:

Corrected familywise error rate
 CLD:

Composite linkage disequilibrium
 GWAS:

Genomewide association study
 LD:

Linkage disequilibrium
 MAF:

Minor allele frequency
 MIGen:

Myocardial infarction genetics consortium
 RRIntCC:

Regionregion interaction detection for casecontrol studies
 SNP:

Single nucleotide polymorphism
 T2D:

Type II diabetes
References
 1
Welter D, et al. The NHGRI GWAS Catalog, a curated resource of SNPtrait associations. Nucleic Acids Res. 2014; 42(D1):D1001–6.
 2
Burton PR, et al. Genomewide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature. 2007; 447(7145):661–78.
 3
Visscher PM, Brown MA, McCarthy MI, Yang J. Five years of GWAS discovery. Am J Human Genet. 2012; 90(1):7—24.
 4
Cordell HJ. Detecting gene–gene interactions that underlie human diseases. Nature Rev Genet. 2009; 10(6):392–404.
 5
Wan X, et al. BOOST: A fast approach to detecting genegene interactions in genomewide casecontrol studies. Am J Human Genet. 2010; 87(3):325–40.
 6
Hu JK, Wang X, Wang P. Testing gene–gene interactions in genome wide association studies. Genet Epidemiol. 2014; 38(2):123–34.
 7
Purcell S, et al. PLINK: a tool set for wholegenome association and populationbased linkage analyses. Am J Human Genet. 2007; 81(3):559–75.
 8
Ritchie MD, et al. Multifactordimensionality reduction reveals highorder interactions among estrogenmetabolism genes in sporadic breast cancer. Am J Human Genet. 2001; 69(1):138–147.
 9
Moore JH, White BC. Springer: Machine Learning and Data Mining in Bioinformatics; 2007, pp. 166–75.
 10
Zhang Y, Liu JS. Bayesian inference of epistatic interactions in casecontrol studies. Nature Genet. 2007; 39(9):1167–73.
 11
Zaykin DV, Meng Z, Ehm MG. Contrasting linkagedisequilibrium patterns between cases and controls as a novel associationmapping method. Am J Human Genet. 2006; 78(5):737–46.
 12
Li MX, Gui HS, Kwan JS, Sham PC. GATES: a rapid and powerful genebased association test using extended Simes procedure. Am J Human Genet. 2011; 88(3):283–93.
 13
Liu JZ, et al. A versatile genebased test for genomewide association studies. Am J Human Genet. 2010; 87(1):139–45.
 14
Ma L, Clark AG, Keinan A. Genebased testing of interactions in association studies of quantitative traits. PLoS Genet. 2013; 9(2):e1003321.
 15
Hill WG. Estimation of linkage disequilibrium in randomly mating populations. Heredity. 1974; 33(2):229–39.
 16
Wu X, Jin L, Xiong M. Composite measure of linkage disequilibrium for testing interaction between unlinked loci. Eur J Human Genet. 2008; 16(5):644–51.
 17
Schaid DJ. Linkage disequilibrium testing when linkage phase is unknown. Genetics. 2004; 166(1):505–12.
 18
Yung LS, Yang C, Wan X, Yu W. GBOOST: a GPUbased tool for detecting gene–gene interactions in genome–wide case control studies. Bioinformatics. 2011; 27(9):1309–10.
 19
Kathiresan S, et al. Genomewide association of earlyonset myocardial infarction with single nucleotide polymorphisms and copy number variants. Nature Genet. 2009; 41(3):334–41.
 20
Li C, Li M. GWAsimulator: a rapid wholegenome simulation program. Bioinformatics. 2007; 24(1):140–2.
 21
Cataldo S, Annoni GA, Marziliano N. The perfect storm?Histiocytoid cardiomyopathy and compound CACNA2D1 and RANGRF mutation in a baby. Cardiol Young. 2015; 25(1):174–6.
 22
Bourdin B, et al. Functional characterization of CaV α2δ mutations associated with sudden cardiac death. J Biol Chem. 2015; 290(5):2854–69.
 23
Wild PS, et al. Largescale genomewide analysis identifies genetic variants associated with cardiac structure and function. J Clin Investigation. 2017; 127(5):1798–812.
 24
Eppinga RN, et al. Identification of genomic loci associated with resting heart rate and shared genetic predictors with allcause mortality. Nature Genet. 2016; 48(12):1557.
 25
LichterKonecki U, Hipke CM, Konecki DS. Human phenylalanine hydroxylase gene expression in kidney and other nonhepatic tissues. Mole Genet Metabol. 1999; 67(4):308–16.
 26
Dimitrov BI, et al. Genomic rearrangements of the GREM1FMN1 locus cause oligosyndactyly, radioulnar synostosis, hearing loss, renal defects syndrome and Cenani–Lenzlike nonsyndromic oligosyndactyly. J Med Genet. 2010; 47(8):569–74.
Acknowledgements
We would like to thank the anonymous reviewers for their helpful suggestions.
About this supplement
This article has been published as part of BMC Medical Genomics, Volume 12 Supplement 7, 2019: Selected articles from the 14th International Symposium on Bioinformatics Research and Applications (ISBRA18): medical genomics. The full contents of the supplement are available at url=https://bmcmedgenomics.biomedcentral.com/articles/supplements/volume12supplement7
Funding
This study is partially funded by the Themebased Research Scheme (project T12402/13N) of the Hong Kong Research Grant Council (RGC). We like to declare that RGC is not involved in the design of the study, collection, analysis, and interpretation of data, as well as in the writing of the manuscript.
Author information
Affiliations
Contributions
SZ designed the method, implemented the C++ package, performed statistical experiments and drafted the manuscript. WJ contributed to the design of the study and the design of the method. RCWM provided the data and contributed to the design of the study. WY conceived and supervised the study, and helped to draft the manuscript. All authors read and approved the final manuscript.
Corresponding author
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License(http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
About this article
Cite this article
Zhang, S., Jiang, W., Ma, R.C. et al. Regionbased interaction detection in genomewide casecontrol studies. BMC Med Genomics 12, 133 (2019). https://doi.org/10.1186/s1292001905837
Received:
Accepted:
Published:
Keywords
 GWAS
 Statistical interaction detection
 Regionbased method
 LD contrast test