Interaction among apoptosis-associated sequence variants and joint effects on aggressive prostate cancer

Background Molecular and epidemiological evidence demonstrate that altered gene expression and single nucleotide polymorphisms in the apoptotic pathway are linked to many cancers. Yet, few studies emphasize the interaction of variant apoptotic genes and their joint modifying effects on prostate cancer (PCA) outcomes. An exhaustive assessment of all the possible two-, three- and four-way gene-gene interactions is computationally burdensome. This statistical conundrum stems from the prohibitive amount of data needed to account for multiple hypothesis testing. Methods To address this issue, we systematically prioritized and evaluated individual effects and complex interactions among 172 apoptotic SNPs in relation to PCA risk and aggressive disease (i.e., Gleason score ≥ 7 and tumor stages III/IV). Single and joint modifying effects on PCA outcomes among European-American men were analyzed using statistical epistasis networks coupled with multi-factor dimensionality reduction (SEN-guided MDR). The case-control study design included 1,175 incident PCA cases and 1,111 controls from the prostate, lung, colo-rectal, and ovarian (PLCO) cancer screening trial. Moreover, a subset analysis of PCA cases consisted of 688 aggressive and 488 non-aggressive PCA cases. SNP profiles were obtained using the NCI Cancer Genetic Markers of Susceptibility (CGEMS) data portal. Main effects were assessed using logistic regression (LR) models. Prior to modeling interactions, SEN was used to pre-process our genetic data. SEN used network science to reduce our analysis from > 36 million to < 13,000 SNP interactions. Interactions were visualized, evaluated, and validated using entropy-based MDR. All parametric and non-parametric models were adjusted for age, family history of PCA, and multiple hypothesis testing. Results Following LR modeling, eleven and thirteen sequence variants were associated with PCA risk and aggressive disease, respectively. However, none of these markers remained significant after we adjusted for multiple comparisons. Nevertheless, we detected a modest synergistic interaction between AKT3 rs2125230-PRKCQ rs571715 and disease aggressiveness using SEN-guided MDR (p = 0.011). Conclusions In summary, entropy-based SEN-guided MDR facilitated the logical prioritization and evaluation of apoptotic SNPs in relation to aggressive PCA. The suggestive interaction between AKT3-PRKCQ and aggressive PCA requires further validation using independent observational studies.


Background
Prostate cancer (PCA) is the most frequently diagnosed cancer and the 2 nd leading cause of cancer-related deaths among men in the United States [1]. The American Cancer Society estimates that 26-29% of all new cancer cases and cancer-related deaths are attributed to PCA cancer. Well established PCA risk factors include older age, black race, and family history of PCA. However, other potential contributors of this disease may include lifestyle and genetic factors as well as imbalances within important biological pathways.
Genome wide association studies (GWAS), involving the evaluation of millions of SNPs within various biological pathways, has resulted in the detection of numerous PCA susceptibility loci [38]. However, most GWAS and PCA epidemiology studies place emphasis on individual SNP effects. Consequently, researchers often ignore the fact that complex diseases such as PCA are governed by complex gene-gene and gene-environment interactions within distinct biological pathways. Consequently, we sought to evaluate millions of interactions among apoptosis SNPs and their joint modifying impact on PCA risk and disease progression.
This report focuses on the impact of complex interactions among 172 apoptosis-related SNPs on PCA among 2,286 European-American male participants of the Cancer Genetic Markers of Susceptibility Study (CGEMS). The exhaustive assessment of all possible two-, three-, and four-way interactions among 172 SNPs, totaling more than 36 million SNP combinations is computationally burdensome. The statistical challenge stems from the prohibitive amount of data needed for multiple hypothesis testing. To address this issue, for the first time, we use statistical epistasis networks (SEN) guided multi-factor dimensionality reduction (MDR) to efficiently pre-process our genetic data, prior to modeling higher order interactions. As previously reported by Hu and co-workers (2011), SEN uses network science to generate a large connected cluster of pairwise interactions embedded within a genetic dataset [39]. The resulting network, consisting of a subset of high susceptibility SNPs, is used to guide SNP interactions using MDR. MDR is a rigorous statistical tool designed, in part, to evaluate main effects and complex interactions in relation to discrete outcomes [40][41][42][43][44][45][46][47][48]. This report may serve as a foundation for researchers interested in a state-of the art bioinformatics technique, namely SEN-guided MDR, which facilitates the logical prioritization of SNPs for gene-gene interaction analyses in relation to PCA susceptibility and prognosis.

Cancer Genetic Markers of Susceptibility (CGEMS) Population Description
CGEMS study participants consisted of middle aged non-Hispanic Caucasian men, ranging in age from 55 to 81. Relative to control subjects, men diagnosed with PCA were more likely to have a family history of the disease (11.4% versus 6.3%), PSA levels ≥ 4 ng/ml (48.5% versus 6.5%), and abnormal DRE exams (60.7% versus 46.2%) [data not shown]. Men with aggressive PCA had both a Gleason score ≥ 7 and tumor stage III/IV. Although non-aggressive PCA cases primarily had a Gleason score < 7 and tumor stage I/II, a small percentage (1.8%) of them had a Gleason score > 6 and a tumor stage I/II. Among cases, there were no significant differences in family history of disease or PSA levels between men diagnosed comparing men with aggressive and non-aggressive disease (data not shown). The vast majority of subjects with aggressive PCA had higher Gleason scores than subjects with non-aggressive PCA, although 9 of the 488 non-aggressive cases (1.08%) had a Gleason score greater than 6.

Single SNP Effects
We examined 172 apoptosis-related sequence variants in relation to PCA risk and disease aggressiveness, as shown in Tables 2 and 3. Age and family history of PCA did not modify the relationship between the
CARD8 (Caspase recruitment domain family, member 8) CARD family protein; involved in various pathways which regulate caspases or NFB; isoforms interact with caspases to signal apoptosis.

BCL2-associated X (BAX)
Forms a heterodimer with BCL2 and functions as an apoptotic activator involved mitochondrial release of cytochrome c.

BCL2-antagonist/killer 1 (BAK1)
Induces apoptosis by increasing cytochrome c release; interacts with the TP53 after exposure to cell stress.

BCL2-associated agonist of cell death (BAD)
Forms heterodimers with BCLXL and BCL2 to reverse their death repressor activity.

BH3 interacting domain death agonist (BID)
Induced by CASP8; CASP8 cleaves the protein encoded by this gene, and the COOH-terminal part translocates to mitochondria, which triggers cytochrome c release.

BCL2-interacting killer (BIK)
Interacts with survival-promoting proteins to enhance programmed cell death.

PRKCD
Translocates into nucleus during apoptosis. Nuclear PRKCD regulates initiation of cytosolic apoptosis machinery, and subsequent caspase activation and DNA fragmentation.
Anti-apoptotic AKT3 Phosphorylate and inactivate BAD. Activates NFB via IB kinase regulation. Regulates cell signals in response to insulin and growth factors.

B-cell CL/lymphoma 2 (BCL2)
Blocks the release of pro-apoptotic cytochrome c and caspase activation.

BCL2-related protein A1 (BCL2A1)
Reduces cytochrome c release from mitochondria and blocks caspase activation.
Baculoviral IAP repeatcontaining 2 (BIRC2) (aka CIAP1); Inhibits apoptosis by binding to tumor necrosis factor receptor-associated factors TRAF1 and TRAF2.    Table 3. In addition, possession of IKBKE rs1539243 TT and PIK3CB rs500687 CC genotypes were moderately linked with a 44-56% reduction in aggressive PCA susceptibility (p ≤ 0.019). However, none of these loci remained significant after adjusting for multiple comparison bias.

Combined Gene Effects
We investigated the ability of gene-gene interactions to predict PCA outcomes using statistical epistasis networks (SEN) guided multifactor dimensionality reduction (MDR), information gain (IG) measure, and hierarchical interactions graphs (hIG). SEN was used to generate a network to visualize the genetic architecture of PCA outcomes, prior to an exhaustive search of SNP interactions. This approach uses information theory to build an epistasis network from the strongest pairwise SNP interactions. A network, consisting of vertices (i.e., SNPs) joined by edges (i.e., SNP pairs), enabled us to limit our MDR analysis to a subset of the most informative SNPs in relation to PCA outcomes. We were unable to build a statistically significant epistasis network for PCA susceptibility. However, the network for aggressive PCA was topographically significant (p < 0.05), given the results from 1000 permutated data sets. This network consisted of 91 vertices (i.e., individual SNP effects), 80 edges (i.e., pairwise interactions), and 18 total components (i.e., "sub-networks" of vertices and edges). The largest connected component of the aggressive PCA SEN diagram involved 24 vertices and 34 edges, as shown in Figure 1 andAdditional file 1. Consequently, MDR analysis was limited to < 13,000 interactions among 24 SNPs, instead of more than 36 million interactions for 172 SNPs. SEN-guided MDR revealed complex interactions between AKT3 rs12031994-PRKCQ rs571715 as well as AKT3 rs12031994-BID rs366542-PRKCQ rs571715 that were significantly associated with disease aggressiveness (p ≤ 0.009). However, the network and entropy graphs indicated these findings were mainly driven by AKT3 rs12031994. This is evident by the marker's large vertex size (i.e., large main effect) in the SEN graph ( Figure 1) and a lack of a strong pairwise interaction between AKT3 rs12031994-PRKCQ rs571715 in the entropy graph. The AKT3-PRKCQ interaction resulted in joint mutual information (I) of only 0.54% relative to the mutual information (I) of 0.87% for AKT3 alone. Therefore, prior to repeating MDR, we removed the AKT3rs12031994 marker. This resulted in the detection of a modest synergistic interaction between AKT3 rs2125230-PRKCQ rs571715 (p = 0.011), as shown in Table 4. Based on MDR, this model was the best two-factor model. It was selected 9 out of 10 times by MDR based on a cross-validation consistency (CVC) score of 90% and predictive accuracy of 56.3%. Furthermore, we observed a higher mutual information for the combined effect of AKT3 rs2125230-PRKCQ rs571715 (I = 0.66%), in contrast to AKT3 rs2125230 alone (I = 0.45%) or PRKCQrs571715 alone (I = 0.21%), as revealed in Figure 2.
To confirm the validity of the SEN results, MDR analysis was also conducted on all 172 SNPs (Figure 1 andAdditional file 1) as well as 148 loci that were not included in the main component of the network. As expected, we did not detect any significant two-, three-, or four-way interaction models (p ≥ 0.184).

Discussion
Previous genome-wide association studies (GWAS) have identified linkages about 40 PCA loci, including between PCA genetic alterations detected in the 8q24 region, βmicroseminoprotein (MSMB), and allele -8 of the microsatellite DG8S737 [38]. These studies have limited their scope to individual SNPs across the entire genome. Such assessments tend to ignore the genetic architecture of PCA that potentially involves complex interactions along key regulatory pathways. For the first time, the current study evaluates complex interactions among 172 apoptosis-related SNPs in relation to PCA risk and disease aggressiveness among 2,286 European-American men using SEN-guided MDR. Specifically, SEN was used to build a topographically significant aggressive PCA epistasis network, prior to evaluating complex interactions. This inferred epistasis network consisted of 24 SNPs and 34 pairwise interactions, and reduced MDR analysis from > 36 million to < 13,000 SNP interactions. Consequently, we observed a non-linear and modest interaction between AKT3 rs2125230-PRKCQ rs571715 in relation to aggressive PCA. This state-of-the-art bioinformatics technique facilitates the logical prioritization of SNPs for gene-gene interaction analyses in relation to complex diseases. Unfortunately, there are no published reports on the functional consequence of these two intronic SNPs in AKT3 and PRKCQ in relation to mRNA stability/expression protein expression/structure/function or PCA outcomes. However, we speculate that the AKT3 rs2125230 and PRKCQ rs571715 sequence variants, with minor allele frequencies ranging from 14.4-22.9 among men of European descent, may alter transcription regulation, leading to increased mRNA expression. Increased mRNA/protein expression AKT3 rs2125230 and PRKCQ rs571715 may cause: decreased apoptosis, an escape of transformed cells from programmed cell death, increased accumulation of genetic alterations, genomic instability, and ultimately an invasive PCA phenotype. Thus, in vitro and in vivo assays using (short hairpin RNAs) shRNAs or small interfering RNAs (siRNAs) are needed to elucidate the impact of AKT3-PRKCQ genetic alterations on protein expression, apoptosis capacity, and prostate tumorigenesis.
Although AKT3 and PRKCQ are involved in pro-survival pathways, their interaction is not fully understood. However, their interactions with other related protein kinases may offer biological clues on the mechanism by which AKT3 and PRKCQ synergistically influence aggressive PCA. For example, PRKCQ interacts with another AKT family member, AKT1, to activate NFB [50]. If the AKT3-PRKCQ axis has a similar function as other protein kinases, namely AKT1 and PRKCQ, then these pro-survival markers may synergistically activate NFB. As a result, activated NFB may enable the tumor to escape programmed cell death and progress toward an aggressive PCA phenotype.
We considered the strengths, limitations and future directions of the current study. Although SEN-guided MDR only identified a nominally significant network for disease aggressiveness in the current study, this approach overcomes the computational challenge of detecting all possible two-, three-and four-way SNP combinations involved in PCA progression. SEN, in the current study, was used to prioritize > 10 million possible interactions by focusing on a "sub-network" of informative SNPs in relation to aggressive PCA. Reduction of our genetic data set to the most informative markers improved the feasibility to detect interactions that may have otherwise remained undetected. Unfortunately, the genome-wide association studies (GWAS) database used for the current study did not include apoptosis-related sequence variants (e.g., TNF-308 rs1800629, TNFSF10 rs1131532, BCL2 -936 rs2279115) previously reported in published cancer epidemiology studies [52][53][54]. Future studies in our laboratory will focus on high-throughput targeted sequencing to evaluate the impact of novel and commonly reported sequence variants on PCA susceptibility and disease prognosis. In light of recent GWAS reports, it is tempting to assume that extremely large case-control study sets, involving tens of thousands subjects are required to evaluate millions of SNP interactions in relation to PCA outcomes. However, the current study had adequate statistical power to evaluate individual and SNP combination effects in relation to prostate cancer. In particular, MDR has 80% statistical power to evaluate all possible two-, three-, and four-way gene-gene interactions with as low as 200 cases and 200 controls [55]. MDR remains effective even in the presence of 5% genotyping errors and/or 5% missing data [55]. It is anticipated that 5% of the about 13,000 possible interactions among 24 apoptosis-related SNPs will result in approximately 650 significant relationships due to chance alone. However, MDR coupled with permutation testing adjust for multiple comparison bias. Given the low prediction accuracy affiliated with the interaction between AKT3 and PRKCQ, our study findings require replication within independent study sets. However, recent simulation studies demonstrate that even modest disparities in genotype allele frequencies among study participants of independent study sets may interfere with the capacity to replicate complex interactions [56]. Thus, to ensure reproducibility within future studies, we plan to select study sets with similar genetic architecture (i.e., ancestry identification markers and SNP genotypes) as CGEMS PCA case-control subjects.

Conclusion
In summary, we have identified a marginal interaction between two apoptosis-related SNPs linked to PCA disease aggressiveness using SEN-guided MDR. Future molecular and pre-clinical studies may help to clarify the functionality of these genetic variants and their role in PCA disease progression. Our state-of-the art bioinformatics approach will enable researchers to pre-process millions of SNP interactions using GWAS or cancer consortia genotype data. In order to replicate findings within independent study sets, researchers will have to ensure that comparative sub-populations have the same genetic architecture and ancestry backgrounds. Therefore, the application of SEN-guided MDR may ultimately help researchers identify and validate genegene and/or gene-environment interactions to serve as effective cancer prognostication tools.

Study population
The Prostate, Lung, Colon, and Ovarian (PLCO) Cancer Screening Trial is a randomized, well-designed, and multi-center investigation sponsored and coordinated by the National Cancer Institute (NCI) [57,58]. Between 1993 and 2001, the PLCO Trial recruited study participants aged 55 to 81 years old to evaluate the effect of screening on disease specific mortality, relative to standard care. Men randomized to the PCA arm of the trial, received annual PSA and DRE exams. For the current study, 2,286 European-American men were included in our study if they had GWAS data available through the Cancer Genetic Markers of Susceptibility (CGEMS) data portal. (http://cgems.cancer.gov/) [38]. The CGEMS study population consists of nationally available genetic data for nearly 500,000 sequence variants. Incident PCA cases (488 non-aggressive and 687 aggressive) were identified through various sources including: screening exams; reports from patients, physicians, or relatives; or linkage with the National Death Index or state cancer registries. Incident PCA cases were pathologically confirmed with aggressive (Gleason score ≥ 7 and tumor stage III/IV) or non-aggressive [Gleason score (< 7) and tumor stage I/II] disease. Controls (n = 1,111) were matched to cases identified based on age, time since initial screening, and year of blood draw using incidence density sampling. All participants signed informed consent documents approved by both the NCI and local institutional review boards. Access to clinical and background data collected through examinations and questionnaires was approved for use by the PLCO.

Gene selection
A panel of 73 apoptosis-associated genes was selected from published cancer epidemiology or molecular biology studies as well as pathway databases and tools, including Kyoto Encyclopedia of Genes and Genomes (KEGG, http:// www.genome.jp/kegg), Gene Ontology (GO), BioCarta (http://www.biocarta.com), ProteinLounge (http://www. proteinlounge.com), and Ingenuity (http://www.ingenuity. com) [59][60][61][62][63]. We searched PubMed for articles using the following keywords: [(prostate OR prostatic) AND (cancer OR neoplasms) AND apoptosis (gene variants OR single nucleotide polymorphisms OR targets) AND epidemiology]. Pathway tools, such as KEGG, BioCarta, ProteinLounge, and Ingenuity were used to visualize genegene and protein-protein interactions essential to regulating apoptosis [59][60][61]63]. The Entrez Gene database of the National Center for Biotechnology Information (NCBI, Figure 1 Visualization of the interaction among apoptosis-related sequence variants using Statistical Epistasis Network Modeling (SEN)-guided MDR. The Statistical Epistasis Network (SEN) was generated using a pairwise interaction strength cut-off or 1%. Overall, the model has 91 vertices (single SNPs), 80 edges (pairwise SNPs), and 18 components ("sub-networks" of vertices and edges). The largest connected component, shown in the center and spanning the entire length of this model, has 24 vertices and 34 edges. Each number in Figure 1 corresponds with a specific apoptosis-related SNP analyzed in the current study, as summarized in Additional file 2. http://www.ncbi.nlm.nih.gov) was used to ensure that selected targets were involved in the apoptotic pathway [64].

Criteria for SNP selection and data management
Prior to uploading our initial list (i.e., 73 apoptosis genes) into the CGEMS data portal, we secured the HUGO gene name equivalents for the targets of interest using NCBI Entrez Gene. SNP profile data was available for more than 1197 SNPs [38]. However, upon further investigation, many of these SNPs were eliminated if they were: not located within 2.5 kb of the 5' or 3' end of the gene; or detected within a gene that was not related to apoptosis. Among SNPs that were associated with apoptosis, we focused on gene regions located in coding, promoter, or "near gene" regions. "Near gene" regions were defined as 2.5 kb up-and downstream of the 5' or 3' ends of the selected genes, respectively. We excluded any sequence variants that had a minor allele frequency (MAF) < 5% as reported in the NCBI Entrez SNP [64]. In addition, one SNP was removed since its genotype frequency distribution among controls deviated substantially from the Hardy-Weinberg Equilibrium, p ≤ 0.005 (n = 1). Following data cleanup, 172 apoptotic SNPs were analyzed among 2,286 men of European descent (687 aggressive cases, 488 non-aggressive cases, and 1,111 controls).

Predicted function of selected SNPs
The SNPinfo (http://snpinfo.niehs.nih.gov/) webserver enabled us to annotate and/or predict the functional consequence of selected apoptosis variants, as summarized in Additional file 2 [65]. This server consists of several pipelines to predict whether alternative alleles of a SNP may alter one or more of the following: transcriptional regulation via transcription factor binding site (TFBS) activity; premature termination of amino-acid sequence (stop codons); the splicing pattern or efficiency at mRNA splice sites, exonic splicing enhancers (ESE) or silencers (ESS); protein structures or properties by changing single amino Figure 2 Interaction Entropy model. This graphical model, describes the percent entropy that is explained by each apoptosis-related SNP or pairwise combination within our study population. Positive percent entropy indicates information gain (IG) or synergy; whereas, negative percent indicates redundancy or lack of information gain (IG). Schematic coloration used in the visualization tools represents a continuum from synergy (i.e. non-additive) to redundancy. The colors range from red representing a high degree of synergy (positive information gain (IG)), orange a lesser degree, and gold representing independence and a midway point between synergy and redundancy. On the other hand, green and blue represent redundancy, which is not apparent in the current analysis. acids (i.e., non-synonymous SNPs); or mRNA transcription or protein translation by altering microRNA (miRNA) binding sites.

Statistical analysis for single gene effects
Univariate and multivariate analyses were used to evaluate apoptosis associated SNPs among men of European descent in relation to PCA outcomes. To assess whether possession of one or more apoptotic alleles influence the risk of developing PCA, we tested for significant differences in the distribution of homozygous major, heterozygous, or homozygous minor genotypes between cases and controls using the chi-square test of heterogeneity. A case-only analysis was used to examine the relationship between apoptosis-related alleles and aggressive PCA. We evaluated differences in the distribution and inheritance of apoptosis-related genes comparing men with aggressive disease (Gleason score ≥ 7 and tumor stage > 2) to those with non-aggressive disease (i.e., other PCA cases that did not have both Gleason score ≥ 7 and tumor stage > 2). The associations between PCA outcomes and selected polymorphic genes, expressed as odds ratios (ORs) and corresponding 95% confidence intervals (CIs), were estimated using unconditional multivariate logistic regression (LR) models adjusted for age and family history of PCA. LR analyses for genetic variants and PCA outcomes were conducted using the major or common genotype as the referent category. All chi-square test and LR analyses were conducted using SAS 9.2 (SAS Institute Inc., Cary, NC) and SVS software (Golden Helix, Inc., Bozeman, MT). Statistical significance was assessed using a p-value < 0.05. Adjustments for multiple comparisons were made using Bonferroni correction and permutation testing.

Statistical power
We performed calculations to determine the statistical power of our sample to detect significant relationships between apoptosis-related polymorphisms and PCA development. The expected risk estimates of our study were determined by specifying values for a number of parameters, including: an average minor allele frequency of at least 26.0%, NCI's estimate of PCA disease prevalence (19%); statistical power (80%); sample size (1,175 PCA cases and 1,111 disease-free individuals or 687 aggressive PCA cases and 488 non-aggressive cases). We assumed the outcome was in complete linkage disequilibrium with an apoptosis-predisposing variant (r 2 = 1.0). Based on the aforementioned parameters, we have > 80% power to detect genetic markers with odds ratios (ORs) of ≥ 1.4 (or 0.71 for protective effects) for PCA risk and ≥ 1.6 (or 0.62 for protective effects) for disease aggressiveness, assuming a codominant model with 1 degree of freedom (df). Power calculations were performed using Power for Genetic Association Version 2 Software [66].

Analysis of gene interactions using multi-factor dimensionality reduction (MDR)
To evaluate the single-and joint-modifying effects of 172 candidate apoptosis-related SNPs within a large dataset, such as CGEMS, is computationally challenging [67,68]. In order to overcome this problem, open-source and freely available MDR 2.0 software (htpp://http://www.epistasis. org) was used to analyze interactions among apoptotic sequence variants in relation to PCA outcomes [69]. MDR is a data reduction approach, designed to detect and characterize high-order interactions in case-control studies. With MDR, information from various genetic loci is categorized into high-risk or low-risk groups of disease. The resulting one-dimensional multi-locus genotype variable is then evaluated for its ability to classify and predict a disease outcome through cross-validation and permutation testing. MDR uses a 10-fold cross validation to estimate the testing accuracy of a model by leaving out 1/10 th of the data as an independent test set. The model is developed on 9/10 th of the data and then evaluated on the remaining test set. This process is repeated for each 1/10 th of the data and the resulting prediction accuracies are averaged.
In the current study, the model with the greatest cross validation consistency (i.e., CVC ≥ 8/10) and highest prediction accuracy [i.e., Average Testing Accuracy (ATA)] was selected as the best predictor of disease outcome. Accuracy is a function of the percentage of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) as defined as (TP + TN)/(TP + TN + FP + FN). ATAs are averaged across all 10 pieces of the data, in order to provide an estimate of the predictive ability of the loci in relation to the outcome of interest. We used crossvalidation consistency (CVC) to measure the degree to which the same best MDR model was selected across the 10 divisions of the data. Models with a CVC of ≥ 8/10 using a 10-fold cross-validation were considered more carefully. If the MDR model met the CVC criteria, we selected models that had the highest ATAs. The combination of CVC and permutation testing were used to control for multiple hypothesis testing. Permutation testing results were statistically significant at the ≤ 0.05 level.
The current version of MDR used in this study enabled the incorporation and adjustment of multiple covariates [70]. To remove the covariate effects [i.e., age-group and family history of PCA (yes or no)], we integrated two sampling methods (i.e., over-and under-sampling). This approach is computationally efficient, since it allows for the adjustment of multiple covariates without significantly increasing computational burden.

Visualization of interaction models using interaction entropy algorithms, hierarchical graphs and statistical epistasis network (SEN)
The interaction entropy algorithm, based on information theory, is a method to verify, visualize, and interpret combination effects identified by parametric (e.g., LR) and non-parametric (e.g., MDR) statistical test [43,[71][72][73][74]. Jakulin and Brakto (2003) have developed a metric to gauge whether the gain in information (i.e., information gain) about a class variable (i.e., ability to predict disease status) from a combination of two variables provides more information than each variable considered independently [72,73]. In the current study, measures of interaction were used to build interaction entropy graphs to visualize and interpret interactions among selected apoptosis-related markers in relation to PCA risk and disease progression. Individual and all possible pairwise loci were assigned a mutual information (I) percentage score in relation to disease risk or aggressiveness; whereby typical scores for genetic loci are < 5%. Pairwise SNP combinations were deemed important if the joint mutual information [I = (SNP 1 , SNP 2 ; disease status)] was greater than the total mutual information of each individual locus considered separately [I (SNP 1 ; disease status) + I (SNP 2 ; disease status)]. Interactions were further visualized using an interaction entropy graph, which uses a color-coding scheme to interpret interactions. Strong interacting factors, coded either red or orange, indicated high and medium levels of synergistic effects on outcomes, respectively. Weak interacting factors, coded either blue or green, denoted high or modest levels of redundancy between markers, respectively. Gold depicted independence and a midway point between synergy and redundancy. Design of interaction entropy graphs was accomplished using the Orange software [75].
An important limitation of any gene-gene interaction analysis method is the combinatorial nature of the problem. Exhaustive analysis of all possible two-, three-, fourway combinations of SNPs creates a computational burden and significant multiple testing problems. We addressed this issue by pre-processing the data using statistical epistasis network (SEN) modeling, which has been described elsewhere [39]. Briefly, this tool uses information theory, as previously described, to develop a network based on the amount of mutual information from SNP 1 , SNP 2 , and a class variable (i.e., disease status). Thus, it describes the pairwise interactive effect on a discrete outcome. An epistasis network, or graph, consists of a set of vertices (individual SNPs) and edges (pairwise interactions) that attach the vertices. In the current study, the statistical epistasis networks (SEN) were built by incrementally adding edges if their strengths were greater than a given threshold. Each vertex and each edge connecting two vertices were assigned a weight commensurate with the strength of main and interactive effects, respectively. The weight of a vertex is depicted by its size; whereby, a larger size is indicative of a higher main effect. Similarly, the thickness of the line between SNP pairs is directly proportional to the strength of their interaction.
We used SEN to visually summarize the strongest individual SNP effects and SNP-pairwise interactions. The topology of the graph, consisting of SNPs and their pairwise interactions, was used to guide MDR modeling. This resulted in a reduction in the total number of SNP combinations. For instance, we reduced the gene-gene interaction analysis from 172 to 24 SNPs that resided within the major connected component with the largest number of vertices (i.e., main effects). The derived major component plot was validated by comparing: (1) the number of vertices and edges among the various connected "sub-networks"; and (2) the size of the major component plot of real data versus randomized data that underwent 1000 permutations. The latter compared data using a pairwise interaction strength cut-off of 1% and significance level of 0.05. To ensure that no significant interactions were identified, non-informative SNPs outside the major component plot were also analyzed by MDR. In essence, SEN-guided MDR facilitated logical prioritization of SNPs for genegene interaction analysis in relation to PCA outcomes.

Additional material
Additional file 1: Table S1. Corresponding Gene and dbSNP IDs for apoptosis-related sequence variants depicted in the Statistical Epistasis Network Modeling Graph, presented in Figure 1.
Additional file 2: Table S2. Functional consequence of selected apoptosis-related Polymorphisms. Authors' contributions NAL: conceptualized the project, conducted the statistical analysis and initial MDR modeling, interpreted study results, and composed the initial manuscript draft. ENR, SY, LRK: Intellectually contributed toward the introduction and/or discussion. TH: performed statistical epistasis network modeling. JZ and JR: conducted MDR modeling and/or SEN-guided MDR analysis. GNB & LRK: co-supervised data-management and statistical/ bioinformatics analyses. LRK, DWH, GNB, JHM, KSK: served as mentors in the study design and implementation as well as data interpretation. All authors contributed to draft reviewing, editing, and approving the final manuscript draft. Submit your manuscript at www.biomedcentral.com/submit