Skip to main content

c-Diadem: a constrained dual-input deep learning model to identify novel biomarkers in Alzheimer’s disease



Alzheimer’s disease (AD) is an incurable, debilitating neurodegenerative disorder. Current biomarkers for AD diagnosis require expensive neuroimaging or invasive cerebrospinal fluid sampling, thus precluding early detection. Blood-based biomarker discovery in Alzheimer’s can facilitate less-invasive, routine diagnostic tests to aid early intervention. Therefore, we propose “c-Diadem” (constrained dual-input Alzheimer’s disease model), a novel deep learning classifier which incorporates KEGG (Kyoto Encyclopedia of Genes and Genomes) pathway constraints on the input genotyping data to predict disease, i.e., mild cognitive impairment (MCI)/AD or cognitively normal (CN). SHAP (SHapley Additive exPlanations) was used to explain the model and identify novel, potential blood-based genetic markers of MCI/AD.


We developed a novel constrained deep learning neural network which utilizes SNPs (single nucleotide polymorphisms) and microarray data from ADNI (Alzheimer’s Disease Neuroimaging Initiative) to predict the disease status of participants, i.e., CN or with disease (MCI/AD), and identify potential blood-based biomarkers for diagnosis and intervention. The dataset contains samples from 626 participants, of which 212 are CN (average age 74.6 ± 5.4 years) and 414 patients have MCI/AD (average age 72.7 ± 7.6 years). KEGG pathway information was used to generate constraints applied to the input tensors, thus enhancing the interpretability of the model. SHAP scores were used to identify genes which could potentially serve as biomarkers for diagnosis and targets for drug development.


Our model’s performance, with accuracy of 69% and AUC of 70% in the test dataset, is superior to previous models. The SHAP scores show that SNPs in PRKCZ, PLCB1 and ITPR2 as well as expression of HLA-DQB1, EIF1AY, HLA-DQA1, and ZFP57 have more impact on model predictions.


In addition to predicting MCI/AD, our model has been interrogated for potential genetic biomarkers using SHAP. From our analysis, we have identified blood-based genetic markers related to Ca2+ ion release in affected regions of the brain, as well as depression. The findings from our study provides insights into disease mechanisms, and can facilitate innovation in less-invasive, cost-effective diagnostics. To the best of our knowledge, our model is the first to use pathway constraints in a multimodal neural network to identify potential genetic markers for AD.


Alzheimer’s disease (AD) is the most common form of dementia, characterized by a gradual loss of cognition and memory. AD is expected to affect around 78 million older adults by 2030 [1]. Diagnosis of probable or possible AD is based on symptom presentation and neuropsychological testing according to NINCDS-ADRDA (National Institute of Neurological and Communicative Disorders and Stroke and the Alzheimer’s Disease and Related Diseases Association) criteria [2]. Although neuroimaging and biofluid-based biomarkers exist for diagnosis in living patients, they are invasive, not widely accessible and not amenable for definitive diagnosis [3]. On the other hand, blood tests are less invasive, facilitate screening and early diagnosis, and confer significant cost benefits [3]. Therefore, blood-based biomarker discovery has become a key area of clinical research in AD [4].

Disease-related genes are commonly identified using genome-wide association studies (GWAS) [5], in which several million single nucleotide polymorphisms (SNPs) are tested for association with a specific trait or disease. While APOE ε4 has been established as a significant risk factor for AD susceptibility, AD is considered a complex trait in which a combination of genetic and environmental factors influences disease pathology [5]. Several genes, including CR1, PICALM, ABCA7, HLA-DRB5/HLA-DRB1, and SLC24A4/RIN3 have been identified as AD susceptibility loci by GWAS, and are associated with inflammation, immune response, lipid metabolism and intracellular trafficking [5]. Nevertheless, AD-associated variants (except APOE ε4) have small effect sizes and are not sufficient to explain a large component of genetic heritability, which is estimated to contribute up to 80% of the observed disease phenotype [6].

Machine learning (ML) methods have greater power to capture interactions between single variants and genes, which may explain heritability to a greater extent. For instance, Segura et al. [7] showed that tree-based methods prioritized SNPs located in genes PVRL2, TOMM40, APOE, and APOC1. Genomic profiles showed interactions between specific SNPs in both UK BioBank and the Alzheimer’s Disease Neuroimaging Initiative (ADNI) datasets. The telescopic ML-based GWAS strategy proposed by Squillario et al. [8] integrated results at SNP, gene, and pathway levels of information. The authors identified TOMM40 and GRM7 as strongly associated with APOE ε4 status at the SNP, gene and pathway levels, thus providing support for cumulative polygenetic susceptibility to AD. Given that AD pathology is associated with systemic changes reflected in other parts of the body [9,10,11], we hypothesize that ML models are better suited to identify blood-based markers for AD.

Furthermore, recently published artificial intelligence (AI) models have focussed on interpretability, such as the deep learning framework proposed by Qui et al. [12], which provides a disease probability map as an intermediate output after training a fully convolutional network on magnetic resonance imaging (MRI) data. The disease probability map is fed into a multilayer perceptron (MLP) for binary classification along with clinical information such as age, gender and mini-mental state examination (MMSE) scores. Deep learning models in cancer have incorporated biological information in the form of pathway constraints from KEGG (Kyoto Encyclopedia of Genes and Genomes) [13] and Reactome [14] to increase prediction performance and enhance interpretability. Moreover, several studies [15,16,17,18] have employed SHapley Additive exPlanations (SHAP) [19], to understand model prediction and derive insights into disease markers and pathology.

Therefore, in our study, we present c-Diadem, a constrained dual-input AI model for Alzheimer’s disease which incorporates KEGG pathway constraints to accurately predict disease status. We used genomic and expression data from the Alzheimer’s Disease Neuroimaging Inititative (ADNI) [20] for training, validation, and testing. We further explain and interpret the model using SHAP to identify novel, potential genetic markers of AD.


We have summarized our data preprocessing, model development and interpretation steps in a flow diagram in Fig. 1. The datasets and methods have been described in detail in the following sections.

Fig. 1
figure 1

Flow diagram showing an overview of data preprocessing, model development and interpretation

Alzheimer’s Disease Neuroimaging Initiative (ADNI)

Data used in the preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database ( The ADNI was launched in 2003 as a public–private partnership, led by Principal Investigator Michael W. Weiner, MD. The primary goal of ADNI has been to test whether serial magnetic resonance imaging (MRI), positron emission tomography (PET), other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of mild cognitive impairment (MCI) and early AD. For up-to-date information, see In addition to MRI and PET neuroimaging of patients at regular intervals, ADNI has collected and analyzed whole blood samples for genotyping and gene expression analysis. Table 1 provides a summary of the genotyping data provided by ADNI. Blood gene expression profiling was conducted using Affymetrix Human Genome U219 Array for 744 samples in the ADNI2 and ADNI-GO (ADNI-Grand Opportunity) phases [20].

Table 1 ADNI genotyping data summary

Data preprocessing

Genotyping data for ADNI1, ADNI2/ADNIGO and ADNI3 are available for 1877 participants in total. Out of 1877 participants, 626 participants from the ADNIGO/ADNI2 phase have also provided whole blood samples for the gene expression assay. Thus, these 626 patients were selected for our study and their genotyping data was used as the genomic data source for our model. All SNPs have been identified using the hg18 build and therefore, the input data does not contain any missing information. SNP data for the selected patients were downloaded in PLINK binary format, consisting of.bed,.bim and.fam files. The.bed file is the primary representation of genotype calls of biallelic variants. The.bim file accompanies the.bed file and provides extended variant information, i.e., SNP IDs, base-pair coordinates, and the minor and major alleles. The.fam file provides sample information, including parent IDs and phenotype. After minor allele frequency (MAF) filtering, we utilized the.bim file to identify the minor allele and encoded SNPs using additive representation (i.e., 0 = homozygous dominant, 1 = heterozygous, 2 = homozygous recessive). The dbSNP ID was used as the unique identifier for SNPs. Then, coding SNPs (which occur in coding sequences, or CDS) were mapped to their corresponding gene loci. Input values represent the aggregate additive value of all SNPs mapped to the coding sequences of individual genes. The gene expression data merged with the SNP data matrix represents model inputs. For the development of a constrained model, KEGG pathway data was used to populate the constraints matrix.

We applied a train/test/validation split of 56–30-14%. The dataset consists of 212 CN, 317 MCI and 97 AD samples. Thus, the datasets were balanced using target stratification and the Synthetic Minority Oversampling Technique (SMOTE). SMOTE boosted model performance compared to other oversampling techniques such as ADASYN (adaptive synthetic), SVMSMOTE (support vector machines SMOTE) and borderline SMOTE (oversampling limited to borderline cases). Moreover, SMOTE has been used previously in predicting Alzheimer’s disease using MRI images [21]. Therefore, SMOTE was the preferred strategy for addressing data imbalance in our study. Finally, sample labels were converted to 2 × 1 scalars using one-hot encoding.

AI model design

We used Python v3.8 with the Functional API of keras v2.4.3 to design and build the constrained and unconstrained neural networks. The constrained model topology is shown in Fig. 1a. The input data for our model includes genotyping data for 5188 mapped genes and gene expression data for 19,403 genes. The SNP data and gene expression data were provided as separate inputs. The edges between the genotyping input layer and the pathway layers were constrained using prior biological information from KEGG pathway datasets [22]. KEGG data from the Molecular Signatures database (MSigDB) [23] was used to create constraints encoded as a binary weights matrix which sets all non-existent connections among the genes and pathways to zero. Therefore, the edges from the input genes to the unrelated pathways were set to 0 and remained constant during training. The genes were mapped to 186 KEGG pathways, which form the second layer of the constrained model. Then, the pathway features were transformed using a 1D convolutional layer (kernel size 12) before being concatenated with the gene expression input. The concatenated data was passed through a batch normalization layer and three hidden layers. The output layer contains two nodes representing the binary outcomes of CN and MCI/AD. Softmax activation was used to convert the output into the respective CN and MCI/AD probabilities, with the higher probability used for classification of disease status. The hyperparameters are summarised in Table 2.

Table 2 Model hyperparameters

Performance evaluation

The predictive performance of the model was evaluated on the validation and test sets using area under the curve (AUC), accuracy, precision, recall and F1 score. The best-performing weights were chosen using a callback on validation loss. The AUC, accuracy, precision, and recall were calculated for each epoch, for both validation and training. The formulae for the performance metrics are provided below.

$$\begin{array}{c}Accuracy=\frac{TP+TN}{TP+FN+TN+FP}\\\begin{array}{c}Precision=\frac{TN}{TN+FP}\\Recall=\frac{TP}{TP+FN}\\F1score=2\cdot\frac{Precision\cdot Sensitivity}{Precision+Sensitivity}\end{array}\end{array}$$

Here, true positives (TP) refer to the number of correctly predicted MCI/AD cases. True negatives (TN) refer to the number of correctly predicted CN cases. False positives (FP) refer to the CN cases incorrectly predicted as MCI/AD. False negatives (FN) refer to MCI/AD cases incorrectly predicted as CN.

Model interpretation with SHAP

Model interpretation is essential to gain user trust and overcome the ‘black box’ reputation of deep learning models. Lundberg and Lee [19] proposed SHAP values as a unified measure of feature importance, computed using game theory. SHAP scores were computed using the Python shap package (v0.39.0) to identify genes which could potentially be considered biomarkers for diagnosis with prodromal and advanced Alzheimer’s. The top twenty selected genes were then analyzed using STRING (Search Tool for the Retrieval of Interacting Genes/Proteins) [24] to identify enriched pathways and their role in Alzheimer’s disease.

Statistical analysis

Differences in clinical features between CN and MCI/AD subjects were analyzed using statistical tests for significance. We performed the analysis of variance (ANOVA) test on the age of onset and years of education. We used the Chi-square test for differences in the proportion of male and female participants and the presence of the APOE ε4 allele. Student’s t-tests were utilized for the neuropsychological test scores, namely the MMSE (Mini-Mental State Examination) and CDRSB (Clinical Dementia Rating – Sum of Boxes) scores.


We developed a constrained deep learning binary classifier which was trained and tested on genotyping and gene expression data from 626 ADNI participants, with a train-test-validation split of 56–30-14%. The data was imbalanced with 212 CN participants and 414 patients with MCI/AD. Therefore, we applied target stratification and SMOTE to balance the training dataset. The inputs for our model include 5188 genes mapped from SNPs and gene expression data for 19,403 genes.

Clinical cohort characteristics

The clinical characteristics of the CN and MCI/AD are summarized in Table 3. The CN and MCI/AD groups differ significantly in terms of mean age (p = 0.001) and proportion of female participants (p = 0.015). Both CN and MCI/AD groups have similar years of education. We also observe a higher proportion of MCI/AD patients with the APOE ε4 allele compared to CN subjects (p < 0.001). As expected, MCI/AD patients show significantly worse performance in neuropsychological tests such as CDRSB (p < 0.001) and MMSE (p < 0.001).

Table 3 Clinical cohort characteristics

Model development

The constrained deep learning model is a parsimonious feed-forward neural network with 14 layers, which uses SNPs and microarray data from ADNI. The topology of the model is depicted in Fig. 2a. The model was used to predict whether a given patient was cognitively normal (CN) or was in the AD spectrum (either the prodromal stage of MCI or advanced AD). The SNP inputs were constrained using KEGG pathway information before concatenation with gene expression data and passed through dense, hidden layers.

Fig. 2
figure 2

Model topology and performance. a Model topology for unconstrained and constrained model. Pathway constraints are used for the constrained model on the SNP input data. b)Receiver-operating characteristic (ROC) curves show increased performance (area under the curve (AUC) = 0.70) for constrained model. TPR and FPR stand for True Positive Rate and False Positive Rate respectively

The model was trained with the early stopping callbacks on validation loss monitoring to prevent overfitting. The maximum number of epochs was set at 80 with a default batch size of 32. We also developed an unconstrained network with 9 layers, including an embedding layer of size (186, 3), depicted in Fig. 2a, for comparison.

Model performance

We evaluated the ability of the constrained model to classify the patients as MCI/AD or CN. Our model, c-Diadem, achieved an accuracy of 69% and an AUC of 70% on the test dataset. The inclusion of KEGG pathway constraints increased model accuracy and AUC, as shown in Fig. 2b. The performance metrics of our model have been compared with other classifiers in Table 4. Our model shows a high F1 score of 0.69 compared to the SNP-only deep model (F1 score = 0.53). The F1 score includes both the sensitivity and specificity of the model. With SNPs data, our model shows an accuracy of 0.64 and an AUC of 0.67 (F1 score = 0.64). The accuracy and AUC of the model improve with the inclusion of gene expression data. Therefore, combining SNPs and gene expression inputs helps our model significantly outperform previous models (accuracy = 0.69, AUC = 0.70, F1 score = 0.69). An AUC cut-off of 0.70 or more indicates a moderate-to-high predictive ability for models of dementia risk [25].

Table 4 c-Diadem performance metrics compared with current models

Feature importance using SHAP

To determine the relative importance of genetic features (genotyping and gene expression data), we computed SHAP values in the constrained model for the respective inputs. Figure 3 provides SHAP values for the top twenty features which have the highest impact on model classification. Our results show that SNPs in PRKCZ, PLCB1 and ITPR2 are considered important for prediction of disease status (both MCI and AD). On the other hand, the expression of HLA-DQB1, EIF1AY, HLA-DQA1, and ZFP57 has more predictive value compared to the expression of other genes. The genes selected by SHAP may be considered as potential biomarkers.

Fig. 3
figure 3

SHAP results based on ADNI (a) genotyping and (b) gene expression data, computed using the constrained model

We analyzed the interaction network formed by SHAP-identified genes using the STRING database. From a total of 40 genes, 34 genes were used to construct the interaction network. Some genes did not have a representative protein and so were not included in the network (LOC100653194, EIF1AY, HLA-DRB4, ENSG00000206239, MGC39372, LOC100505946). The resulting network had significantly more edges than expected (protein–protein interactions (PPI) enrichment p-value = 0.00247) which indicates biological relationships among the genes. The network was further clustered into subnetworks using k-means clustering. The subnetwork with the most significant enrichment (p = 5.56 × 10–6) was associated with pathways involved in the release of Ca2+ ions into the cytosol (p = 0.0016), as well as pathways commonly associated with long-term depression (false discovery rate (FDR) = 3.77 × 10–5) and salivary secretion (FDR = 4.55 × 10–6). The results of the pathway enrichment analysis have been as summarized in Table 5. The interaction network with clusters (colored red, green and blue) is represented in Fig. 4.

Table 5 Pathway enrichment for gene markers by SHAP
Fig. 4
figure 4

STRING network showing colored clusters


Clinical diagnosis through blood sampling would be preferable to cerebrospinal fluid (CSF) sampling, as it is less invasive for patients and amenable for longitudinal monitoring. However, there are several challenges to testing for conventional biomarkers in blood samples. One major issue in blood sampling as a diagnostic tool is the high background of plasma proteins such as albumin and immunoglobulin against which extremely low levels of amyloid-β and tau proteins would need to be quantitated. The low levels of these proteins may be further subject to metabolization and clearing by physiological processes [29]. Moreover, α-synuclein is highly expressed by red blood cells, precluding its use as a blood-based biomarker [30]. Thus, we have implemented a constrained AI model to probe genomic and gene expression data from ADNI for alternative blood-based biomarkers.

Recent studies indicate that the development of Alzheimer’s disease is associated with systemic changes in the neuronal environment reflected in other parts of the body. Studies in peripheral blood mononuclear cells (PBMCs) from amnestic MCI and AD patients show differential expression of senescence markers, such as cell cycle blockade (p16 and p53), DNA damage response (γH2AX) and proinflammatory IL-6/-8 mRNA levels [9]. Garfias et al. [10] have reported significantly higher levels of activated lymphocytes in AD patients. Moreover, a gene expression analysis of PBMC samples in the AddNeuroMed cohort [31] identified DEGs significantly enriched in pathways related to T cell and neutrophil activation in immune response, lymphocyte differentiation, protein serine/threonine kinase activity, GTPase and DNA transcription factor binding [11, 32].

Our findings indicate that the number of SNPs located in ITPR1, PLCB1, DMD, ITPR2 and RYR3 genes are indicative of the dysregulation of processes related to Ca2+ sequestration and release. Increased calcium loads in the cytosol can lead to the formation of mitochondrial pore complexes and consequently, apoptosis. Increased Ca2+ influx and mitochondrial sequestration is observed in CA1 (cornu ammonis 1) neurons in the hippocampus, which are said to be selectively vulnerable to neurodegeneration in Alzheimer’s disease [33]. The Ca2+ influx is facilitated by NR2B overexpression and lower levels of calcium-binding proteins (CBPs) among other factors [34]. Using our model, we found markers of a key event (Ca2+ homeostasis dysregulation) in data extracted from patient blood samples. We also observe that pathways related to long-term depression are enriched in our study. Depression is a known risk factor for cognitive decline [35] and up to 50% of AD patients are known to suffer depression [36]. Moreover, from the selected genes, HLA-DQA1 has been reported previously as a risk factor in late-onset Alzheimer’s disease using GWAS [37] and differential expression analysis [38].

It is important to note that our model is limited by the pathway constraints derived from our dataset of choice, KEGG. The genotyping input data of our model has been populated by coding SNPs. Coding SNPs were utilized as they are easily mapped to genes and their respective pathways. Nonetheless, we believe our model’s performance can be enhanced by the inclusion of non-coding SNPs as well as other types of -omics data. We also recognize that the potential genetic biomarkers identified by our study require experimental validation. We hope to explore biomarkers at different stages of Alzheimer’s by refining and training the model to distinguishing between MCI and AD patients in a future study. Refining the model and addressing limitations to enhance performance will help emphasis the validity of our findings.


The development of alternative biomarkers in Alzheimer’s are foundational for developing less-invasive diagnostics as well as breakthroughs in drug development. Therefore, we have developed a constrained, explainable deep learning model incorporating biological information to accurately predict the disease status of patients. Further, we used SHAP to identify potential biomarkers, which are associated with pathways known to be dysregulated in Alzheimer’s. Therefore, our method can be used to drive research in drug development and diagnostics for Alzheimer’s disease and other dementias.

Availability of data and materials

The models are made available on GitHub at The data that support the findings of this study are available from ADNI, but restrictions apply to the availability of these data, which were used under license for the current study, and so are not publicly available.



Alzheimer’s disease


Alzheimer’s Disease Neuroimaging Initiative


Area under the curve


Artificial intelligence


Calcium-binding proteins


Cerebrospinal fluid


Clinical Dementia Rating – Sum of Boxes


Cognitively normal


Cornu ammonis 1


Deep neural network


Differentially expressed genes


False discovery rate


False Positive Rate


Gene Ontology


Genome-wide association studies


Kyoto Encyclopedia of Genes and Genomes


Machine learning


Magnetic resonance imaging


Mild cognitive impairment


Mini-mental state examination


Molecular Signatures database


Multilayer perceptron


National Institute of Neurological and Communicative Disorders and Stroke and the Alzheimer’s Disease and Related Diseases Association


Not available


Peripheral blood mononuclear cell


Positron emission tomography


Protein-protein interactions


Random Forest


Receiver-operating characteristic


Search Tool for the Retrieval of Interacting Genes/Proteins


SHapley Additive exPlanations


Single nucleotide polymorphisms


Synthetic Minority Oversampling Technique


True Positive Rate


  1. Gauthier S, Rosa-Neto P, Morais JA, Webster C. World Alzheimer Report 2021: Journey through the diagnosis of dementia. London: Alzheimer’s Disease International; 2021.

    Google Scholar 

  2. McKhann GM, Knopman DS, Chertkow H, Hyman BT, Jack CR Jr, Kawas CH, et al. The diagnosis of dementia due to Alzheimer’s disease: recommendations from the National Institute on Aging-Alzheimer’s Association workgroups on diagnostic guidelines for Alzheimer’s disease. Alzheimers Dement. 2011;7(3):263–9.

    Article  PubMed  PubMed Central  Google Scholar 

  3. Padala SP, Newhouse PA. Blood-based biomarkers in Alzheimer’s disease: a mini-review. Metab Brain Dis. 2023;38(1):185–93.

    Article  PubMed  Google Scholar 

  4. Teunissen CE, Verberk IMW, Thijssen EH, Vermunt L, Hansson O, Zetterberg H, et al. Blood-based biomarkers for Alzheimer’s disease: towards clinical implementation. Lancet Neurol. 2022;21(1):66–77.

    Article  CAS  PubMed  Google Scholar 

  5. Reitz C. Genetic diagnosis and prognosis of Alzheimer’s disease: challenges and opportunities. Expert Rev Mol Diagn. 2015;15(3):339–48.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  6. Raghavan N, Tosto G. Genetics of Alzheimer’s disease: the importance of polygenic and epistatic components. Curr Neurol Neurosci Rep. 2017;17(10):78.

    Article  PubMed  PubMed Central  Google Scholar 

  7. Segura AM, Bini G, Orth FD, Samaras E, Kassis M, Aisopos F, et al. Machine learning methods applied to genotyping data capture interactions between single nucleotide variants in late onset Alzheimer’s disease. Alzheimers Dement (Amst). 2022;14(1):e12300.

    Article  Google Scholar 

  8. Squillario M, Abate G, Tomasi F, Tozzo V, Barla A, Uberti D, et al. A telescope GWAS analysis strategy, based on SNPs-genes-pathways ensemble and on multivariate algorithms, to characterize late onset Alzheimer’s disease. Sci Rep. 2020;10(1):12063.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  9. Salech F, SanMartín CD, Concha-Cerda J, Romero-Hernández E, Ponce DP, Liabeuf G, et al. Senescence markers in peripheral blood mononuclear cells in amnestic mild cognitive impairment and Alzheimer’s Disease. Int J Mol Sci. 2022;23(16):9387.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  10. Garfias S, Tamaya Domínguez B, Toledo Rojas A, Arroyo M, Rodríguez U, Boll C, et al. Peripheral blood lymphocyte phenotypes in Alzheimer and Parkinson’s diseases. Neurologia (Engl Ed). 2022;37(2):110–21.

    Article  CAS  PubMed  Google Scholar 

  11. Sood S, Gallagher IJ, Lunnon K, Rullman E, Keohane A, Crossland H, et al. A novel multi-tissue RNA diagnostic of healthy ageing relates to cognitive health status. Genome Biol. 2015;16(1):185.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  12. Qiu S, Joshi PS, Miller MI, Xue C, Zhou X, Karjadi C, et al. Development and validation of an interpretable deep learning framework for Alzheimer’s disease classification. Brain. 2020;143(6):1920–33.

    Article  PubMed  PubMed Central  Google Scholar 

  13. Zhao L, Dong Q, Luo C, Wu Y, Bu D, Qi X, et al. DeepOmix: a scalable and interpretable multi-omics deep learning framework and application in cancer survival analysis. Comput Struct Biotechnol J. 2021;19:2719–25.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  14. Elmarakeby HA, Hwang J, Arafeh R, Crowdis J, Gang S, Liu D, et al. Biologically informed deep neural network for prostate cancer discovery. Nature. 2021;598:348–52.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  15. El-Sappagh S, Alonso JM, Islam SMR, Sultan AM, Kwak KS. A multilayer multimodal detection and prediction model based on explainable artificial intelligence for Alzheimer’s disease. Sci Rep. 2021;11:2660.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  16. Bogdanovic B, Eftimov T, Simjanoska M. In-depth insights into Alzheimer’s disease by using explainable machine learning approach. Sci Rep. 2021;12:6508.

    Article  CAS  Google Scholar 

  17. Hernandez M, Ramon-Julvez U, Ferraz F, with the ADNI Consortium. Explainable AI toward understanding the performance of the top three TADPOLE Challenge methods in the forecast of Alzheimer’s disease diagnosis. PLoS One. 2022;17(5):e0264695.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  18. Lombardi A, Diacono D, Amoroso N, Biecek P, Monaco A, Bellantuono L, et al. A robust framework to investigate the reliability and stability of explainable artificial intelligence markers of Mild Cognitive Impairment and Alzheimer’s Disease. Brain Inform. 2022;9(1):17.

    Article  PubMed  PubMed Central  Google Scholar 

  19. Lundberg SM, Lee S-I. A unified approach to interpreting model predictions. In: Proceedings of the 31st International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc.; 2017. p. 4768–77.

    Google Scholar 

  20. Saykin AJ, Shen L, Yao X, Kim S, Nho K, Risacher SL, et al. Genetic studies of quantitative MCI and AD phenotypes in ADNI: Progress, opportunities, and plans. Alzheimers Dement. 2015;11(7):792–814.

    Article  PubMed  PubMed Central  Google Scholar 

  21. Liu Y, Tang K, Cai W, Chen A, Zhou G, Li L, et al. MPC-STANet: Alzheimer’s Disease Recognition Method Based on Multiple Phantom Convolution and Spatial Transformation Attention Mechanism. Front Aging Neurosci. 2022;14:918462.

  22. Kanehisa M, Goto S. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res. 2000;28(1):27–30.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  23. Liberzon A, Subramanian A, Pinchback R, Thorvaldsdóttir H, Tamayo P, Mesirov JP. Molecular signatures database (MSigDB) 3.0. Bioinformatics. 2011;27(12):1739–40.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  24. Szklarczyk D, Gable AL, Nastou KC, Lyon D, Kirsch R, Pyysalo S, et al. The STRING database in 2021: customizable protein-protein networks, and functional characterization of user-uploaded gene/measurement sets. Nucleic Acids Res. 2021;49(D1):D605–12.

    Article  CAS  PubMed  Google Scholar 

  25. Hou XH, Feng L, Zhang C, Cao XP, Tan L, Yu JT. Models for predicting risk of dementia: a systematic review. J Neurol Neurosurg Psychiatry. 2019;90(4):373–9.

    Article  PubMed  Google Scholar 

  26. Lee T, Lee H. Prediction of Alzheimer’s disease using blood gene expression data. Sci Rep. 2020;10(1):3485.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  27. Venugopalan J, Tong L, Hassanzadeh HR, Wang MD. Multimodal deep learning models for early detection of Alzheimer’s disease stage. Sci Rep. 2021;11:3254.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  28. Oriol JDV, Vallejo EE, Estrada K, Taméz Peña JG, Alzheimer’s Disease Neuroimaging Initiative. Benchmarking machine learning models for late-onset Alzheimer’s disease prediction from genomic data. BMC Bioinformatics. 2019;20(1):709.

    Article  Google Scholar 

  29. Blennow K, Zetterberg H. Biomarkers for Alzheimer’s disease: current status and prospects for the future. J Intern Med. 2018;284(6):643–63.

    Article  CAS  PubMed  Google Scholar 

  30. Lashley T, Schott JM, Weston P, Murray CE, Wellington H, Keshavan A, et al. Molecular biomarkers of Alzheimer’s disease: progress and prospects. Dis Model Mech. 2018;11(5):dmm031781.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  31. Lovestone S, Francis P, Kloszewska I, Mecocci P, Simmons A, Soininen H, et al. AddNeuroMed–the European collaboration for the discovery of novel biomarkers for Alzheimer’s disease. Ann N Y Acad Sci. 2009;1180:36–46.

    Article  CAS  PubMed  Google Scholar 

  32. Wang X, Wang D, Su F, Li C, Chen M. Immune abnormalities and differential gene expression in the hippocampus and peripheral blood of patients with Alzheimer’s disease. Ann Transl Med. 2022;10(2):29.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  33. Padurariu M, Ciobica A, Mavroudis I, Fotiou D, Baloyannis S. Hippocampal neuronal loss in the CA1 and CA3 areas of Alzheimer’s disease patients. Psychiatr Danub. 2012;24(2):152–8.

    PubMed  Google Scholar 

  34. Muddapu VR, Dharshini SAP, Chakravarthy VS, Gromiha MM. Neurodegenerative Diseases - Is Metabolic Deficiency the Root Cause? Front Neurosci. 2020;14:213.

    Article  PubMed  PubMed Central  Google Scholar 

  35. Hakim A. Perspectives on the complex links between depression and dementia. Front Aging Neurosci. 2022;14:821866.

    Article  PubMed  PubMed Central  Google Scholar 

  36. Chi S, Yu JT, Tan MS, Tan L. Depression in Alzheimer’s disease: epidemiology, mechanisms, and management. J Alzheimers Dis. 2014;42(3):739–55.

    Article  PubMed  Google Scholar 

  37. Zhang X, Zou M, Wu Y, Jiang D, Wu T, Zhao Y, et al. Regulation of the Late Onset Alzheimer’s Disease Associated HLA-DQA1/DRB1 Expression. Am J Alzheimers Dis Other Demen. 2022;37:15333175221085066.

    Article  PubMed  Google Scholar 

  38. Panitch R, Hu J, Xia W, Bennett DA, Stein TD, Farrer LA, et al. Blood and brain transcriptome analysis reveals APOE genotype-mediated and immune-related pathways involved in Alzheimer disease. Alzheimers Res Ther. 2022;14(1):30.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

Download references


Data used in preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database ( As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. A complete listing of ADNI investigators can be found at:

About this supplement

This article has been published as part of BMC Medical Genomics Volume 16 Supplement 2, 2023: The 21st International Conference on Bioinformatics (InCoB2022): medical genomics. The full contents of the supplement are available online at


This work is supported by Khalifa University of Science and Technology under Award no. FSU-2021–005. Data collection and sharing for this project was funded by the Alzheimer’s Disease Neuroimaging Initiative (ADNI) (National Institutes of Health Grant U01 AG024904) and DOD ADNI (Department of Defense award number W81XWH-12–2-0012). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: AbbVie, Alzheimer’s Association; Alzheimer’s Drug Discovery Foundation; Araclon Biotech; BioClinica, Inc.; Biogen; Bristol-Myers Squibb Company; CereSpir, Inc.; Cogstate; Eisai Inc.; Elan Pharmaceuticals, Inc.; Eli Lilly and Company; EuroImmun; F. Hoffmann-La Roche Ltd and its affiliated company Genentech, Inc.; Fujirebio; GE Healthcare; IXICO Ltd.; Janssen Alzheimer Immunotherapy Research & Development, LLC.; Johnson & Johnson Pharmaceutical Research & Development LLC.; Lumosity; Lundbeck; Merck & Co., Inc.; Meso Scale Diagnostics, LLC.; NeuroRx Research; Neurotrack Technologies; Novartis Pharmaceuticals Corporation; Pfizer Inc.; Piramal Imaging; Servier; Takeda Pharmaceutical Company; and Transition Therapeutics. The Canadian Institutes of Health Research is providing funds to support ADNI clinical sites in Canada. Private sector contributions are facilitated by the Foundation for the National Institutes of Health ( ). The grantee organization is the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer’s Therapeutic Research Institute at the University of Southern California. ADNI data are disseminated by the Laboratory for Neuro Imaging at the University of Southern California.

Author information

Authors and Affiliations




AMAS acquired funding, conceptualized the study, and supervised the project. SJ performed the literature review, designed the study, pre-processed data, built the models, analyzed the data, generated visualizations, and wrote the manuscript. AMAS provided expert opinion and verified the methods. Both SJ and AMAS reviewed the manuscript. Alzheimer’s Disease Neuroimaging Initiative (ADNI) provided data used in preparation of this article. Please see acknowledgements for more details about ADNI.

Corresponding author

Correspondence to Aamna AlShehhi.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Data used in preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database ( As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. A complete listing of ADNI investigators can be found at:

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit The Creative Commons Public Domain Dedication waiver ( applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Jemimah, S., AlShehhi, A. & for the Alzheimer’s Disease Neuroimaging Initiative. c-Diadem: a constrained dual-input deep learning model to identify novel biomarkers in Alzheimer’s disease. BMC Med Genomics 16 (Suppl 2), 244 (2023).

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: