In this study we presented the results of a candidate gene association study in childhood acute lymphoblastic leukemia, evaluated by traditional frequentist-based methods and by our newly developed BN-BMLA method.
According to the frequentist-based evaluation, among the 62 successfully genotyped SNPs in 19 genes, 6 SNPs in 2 genes were associated with increased susceptibility to ALL. However, as the SNPs within each gene were in LD, this corresponded to one signal in each gene, namely in ARID5B and IKZF1. Both associations were specific for B-cell ALL, and at all loci the effect on risk was dose-dependent, as the homozygous states were associated with higher risks.
ARID5B belongs to a family of transcription factors important in embryonic development, cell type–specific gene expression, and cell growth regulation. SNPs in the gene have been found to be associated with ALL in several previous genome-wide and candidate gene association studies and in different populations [3, 4, 6, 20]. In some studies the associations were restricted to B-hyperdiploid ALL and to males. In our study no such difference was found between boys and girls, and the SNPs were not associated with hyperdiploidy, but the important role of the gene in B-cell ALL susceptibility was confirmed. It must be added, however, that all of the associated SNPs in the ARID5B gene are intronic, and at present it is not known how they influence the risk of ALL.
SNPs in the IKZF1 gene were identified by independent genome-wide association studies in Caucasian children, although in some later studies and in some populations the association was not confirmed [3, 4, 21]. IKZF1 (encoding the lymphoid transcription factor IKAROS) is deleted in approximately 80% of Philadelphia chromosome–positive ALL cases with constitutively active BCR-ABL1 tyrosine kinase [22, 23]. Furthermore, Ikaros proteins are master regulators of lymphocyte development; thus IKZF1 is a good candidate gene for ALL. Earlier, one of the SNPs, rs4132601, which showed an association with increased susceptibility to ALL, was found to influence the expression level of the gene in an in vitro system in a dose-dependent fashion, with lower expression being associated with the risk alleles. Since human and mouse studies suggest that diminished expression of IKZF1 interrupts lymphocyte development, creating conditions that maintain the rapidly dividing lymphoblasts that characterize ALL, the lower expression associated with rs4132601 might contribute to the increased risk of the disease.
SNPs in the STAT3 gene were found to decrease the risk of hyperdiploid ALL. The signal transducer and activator of transcription protein 3, encoded by STAT3, has been identified as a regulator of cell survival after exposure to apoptotic signals [24–26]. STAT3 serves, among others, as a substrate for SYK tyrosine kinase. SYK is capable of associating with and phosphorylating STAT3 in human B-lineage leukemia/lymphoma cells challenged with oxidative stress. Inhibition of SYK with a small molecule drug candidate prevented oxidative stress-induced activation of STAT3 and overcame the resistance of human B-lineage leukemia/lymphoma cells to apoptosis. The decreased risk associated with the SNPs in the STAT3 gene in our study is consistent with the results of other studies; e.g. the rs12949918 SNP was found to be associated with decreased susceptibility to different malignancies, such as B-cell non-Hodgkin lymphoma and renal cell carcinoma. According to in vitro studies, the SNP affects STAT3 mRNA levels, with the minor allele being associated with lower STAT3 expression. At present it is not known how it might influence the risk of hyperdiploid ALL.
We also investigated whether the SNPs influence the survival of the patients. It must be noted, however, that the proportion of patients who died is lower in our study population than in the whole Hungarian ALL population (p < 0.001; see Methods); thus our results could be biased in this respect. In the frequentist-based evaluation, rs11667351 in the BAX gene showed the strongest association and gave the lowest p value (P = 1E-3), but it still did not reach the significance threshold (P ≤ 3.42E-4).
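The comparison against the corrected threshold can be sketched in a few lines. The two numeric values quoted in the text (P = 1E-3 and the threshold 3.42E-4) are taken as given. The text does not say how the threshold was derived, so the Bonferroni-style helper below, and in particular the values `ASSUMED_ALPHA` and `ASSUMED_N_TESTS`, are illustrative assumptions chosen only because they give a number close to the quoted threshold.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Return the per-test significance threshold alpha / n_tests."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def is_significant(p_value: float, threshold: float) -> bool:
    """A test counts as significant when its p value does not exceed the threshold."""
    return p_value <= threshold


# Values quoted in the text.
THRESHOLD_FROM_TEXT = 3.42e-4
P_BAX_RS11667351 = 1e-3

# Hypothetical values, NOT stated in the text: they merely reproduce
# a threshold close to the quoted one under a Bonferroni-style correction.
ASSUMED_ALPHA = 0.05
ASSUMED_N_TESTS = 146

if __name__ == "__main__":
    print(is_significant(P_BAX_RS11667351, THRESHOLD_FROM_TEXT))
    print(bonferroni_threshold(ASSUMED_ALPHA, ASSUMED_N_TESTS))
```

Under these assumptions the BAX association is nominally significant (P < 0.05) but does not pass the corrected threshold, which is the situation described above.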
Next, we evaluated our results with the BN-BMLA method. Earlier, we applied this method to the evaluation of a partial genome screening in asthma. In that study the frequentist-based method identified two genes for asthma susceptibility (FRMD6 and PTGDR), while the BN-BMLA identified 3 additional genes. When we analyzed the cause of this difference, it turned out that the other 3 genes were indirectly associated with asthma risk, i.e. through different gene-gene interactions. Then, as BN-BMLA is also capable of analyzing multiple targets, we included additional phenotypic target variables in the analysis. In this case the BN-BMLA identified 3 additional genes. The SNPs in these genes influenced the susceptibility to asthma through other target variables, such as rhinitis, IgE, or eosinophil levels. As all of these phenotypic variables are strongly associated with asthma, an association with them might also cause an association with the disease. Associations of the latter kind are called transitive associations.
In the present study we could not include additional target variables in the evaluation, as there are no known phenotypic characteristics, also present in controls, that significantly change the risk of ALL. The BN-BMLA confirmed the association of SNPs in ARID5B and IKZF1 with B-cell ALL with high posterior probability. In addition, as explained in the detailed characterization of association relations, the results of the analysis gave further information about the nature of the relations between the SNPs and the disease. In this case no strongly relevant interactions were found, but the analysis suggested several weak interactions. For example, in the case of ALL susceptibility, rs10821936 in ARID5B and rs17405722 in STAT3 showed a weak interaction, and in the T-cell lineage sample group, gender showed a weak interaction with three SNPs in three genes (Figure 4). Interestingly, male gender increased the risk of T-cell ALL, as is also known from the scientific literature, but carrying an allele of rs1143684 in the NQO2 gene slightly decreased the risk. The BN-BMLA is also capable of calculating the redundancy of variables. For example, in the present study the rs10821936 and rs4509706 SNPs in the ARID5B gene showed moderate redundancy, i.e. their effects are interchangeable. It is also an important finding that there are no interactions or redundancies between the two most relevant genes, ARID5B and IKZF1. This confirms the results of other studies in which no interactions were found between the two genes.
Several interactions were detected in hyperdiploid ALL (Figure 4 C). Although the number of patients in this group is relatively low, and thus the results must be treated with some reservations, most of these interactions are biologically plausible. For example, in our analysis a strong interaction was found among SNPs in the NOTCH1, STAT1, STAT3 and BCL2 genes. It is known from the scientific literature that STAT3 is activated in the presence of active Notch. Notch-IC stable transfectants increased STAT1-dependent transcription in response to IFN-gamma. In a zebrafish model of human NOTCH1-induced T-cell leukemia, the leukemia onset was dramatically accelerated when the transgenic fish was crossed with another line overexpressing the zebrafish Bcl2 gene, indicating synergy between the Notch pathway and the Bcl2-mediated antiapoptotic pathway [31–33]. All of these results suggest that the pathways represented by these genes overlap and that there are interactions between them.
Analyzing the effects of the variables on the survival of the patients with BN-BMLA resulted in two SNPs in two genes with strong relevance. The relevance of the rs11667351 SNP in the BAX gene was convincing (posterior probability > 0.75), while that of rs1040356 in the CEBPA gene was moderate. As can be seen in Figure 6, the SNP in the BAX gene was also involved in an interaction with a SNP in the BCL2 gene. This interaction is biologically plausible, as there are many known interactions between the products of the two genes. For example, BCL2 prevents BAX/BAK oligomerization, and BCL2 binds to and inactivates BAX. There are also data, although inconclusive, about the role of BAX in the relapse of children with ALL. High levels of BAX protein were associated with an increased probability of relapse in one study, while in another study both BAX expression levels and the BAX/BCL2 ratio were significantly lower in samples at relapse than in samples at initial diagnosis [35, 36]. In our study, children homozygous for the minor allele of the rs11667351 SNP had a very poor survival rate (40%, Additional file 5 and Additional file 6). In one study, this variant was associated with lower BAX mRNA levels in lymphocytes. It must be added, however, that the number of patients homozygous for the minor allele of rs11667351 was very low in our population.
The protein encoded by CEBPA is a transcription factor that can bind as a homodimer to certain promoters and enhancers. Its mutations have been implicated in acute myelogenous leukemia with favorable prognosis. In our study minor allele carriers had a better survival rate than major allele homozygotes, suggesting a certain concordance between the two observations.
The BN-BMLA also detected known connections between risk groups, lineage and survival, but also revealed some possible interactions between certain variables like gene-gene, lineage-gene or lineage-gender (Figure 6).
Evaluating the effect of SNPs on the survival rate of the patients resulted in some discrepancies between the two methods. The frequentist-based method detected only nominally significant associations, which, according to the accepted rules, have to be rejected, while the BN-BMLA found strong, convincing relevance, especially in the case of the BAX gene. It is generally accepted that frequentist-based methods cannot properly handle the multiple testing problem. To avoid type I errors, the frequentist methods are sometimes too conservative and unable to detect weak effects or interactions. The findings of the BN-BMLA are biologically plausible, but additional studies are needed to confirm these results.
In the present paper we show the ability of BN-BMLA to evaluate a candidate gene association study. As can be seen from the results, the advantage is not that the BN-BMLA can detect more relevant variables, but that Bayesian networks offer a rich language for the detailed representation of types of relevance, including direct and indirect aspects. Additionally, Bayesian statistics offer an automated and normative solution for the multiple hypothesis testing problem. In this study we could not utilize the full potential of the BN-BMLA, since we could not include multiple targets that are also present in controls, we did not have data from different sources (e.g. from gene expression analysis), and we did not include data from other databases or a priori knowledge in our evaluation. Furthermore, the investigated population was relatively small. Childhood acute lymphoblastic leukemia is a relatively rare disease, with an incidence of 50–70 cases per year in Hungary. In this respect the 543 children with ALL in this study can be regarded as a large population. However, in a gene association study in which 62 SNPs are investigated, it is very difficult to detect weak effects or gene-gene interactions in a population of this size. Still, the BN-BMLA was able to reveal, besides the strongly relevant ARID5B and IKZF1 polymorphisms, several possible interactions, and indicated their possible types. According to our studies in larger, artificial datasets, if there are real complex interactions among the variables, the method is able to reveal a complex network of interactions, significantly more complex than that in Figure 1 in this study.
The Bayesian statistical framework allows the calculation of posteriors over a wide range of hypotheses, such as the strong relevance of variables, pairs of variables, triplets of variables, etc. [9–13]. This is an advantage of the Bayesian framework, because it allows the selection of an appropriate level of complexity of hypotheses, which is not possible in the traditional hypothesis testing approach. Furthermore, this Bayesian global relevance analysis method provides posteriors, which are direct statements about hypotheses; thus it can also be used to construct probabilistic data-analytic knowledge bases in genetic association studies to support complex querying, off-line meta-analysis, and fusion with background knowledge.
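The idea of posteriors over variables, pairs, and triplets can be illustrated with a minimal sketch. It assumes, hypothetically, that some sampler has already produced a list of candidate sets of strongly relevant variables drawn from the posterior; the posterior of any subset is then estimated as the fraction of sampled sets that contain it. The function name, the variable names, and the sample data are invented for illustration and do not represent the BN-BMLA implementation.

```python
from collections import Counter
from itertools import combinations


def subset_posteriors(sampled_sets, size):
    """Estimate, for each variable subset of the given size, the fraction
    of sampled relevance sets that contain the whole subset."""
    if not sampled_sets:
        raise ValueError("at least one sampled set is required")
    counts = Counter()
    for relevant in sampled_sets:
        # Sorting gives every subset one canonical tuple key.
        for combo in combinations(sorted(relevant), size):
            counts[combo] += 1
    total = len(sampled_sets)
    return {combo: count / total for combo, count in counts.items()}


# Hypothetical posterior samples: each set lists the variables that were
# strongly relevant to the target in one sampled model.
samples = [
    frozenset({"snp_a", "snp_b"}),
    frozenset({"snp_a"}),
    frozenset({"snp_a", "snp_c"}),
    frozenset({"snp_b"}),
]

single = subset_posteriors(samples, 1)  # posteriors of single variables
pairs = subset_posteriors(samples, 2)   # posteriors of variable pairs

if __name__ == "__main__":
    print(single)
    print(pairs)
```

The same function called with `size=3` would give posteriors for triplets, which shows how the level of hypothesis complexity can be chosen after the sampling has been done.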
We offer the BN-BMLA method for academic purposes. The tool is available at a public website .