
Prediction of metabolic syndrome using machine learning approaches based on genetic and nutritional factors: a 14-year prospective-based cohort study

Abstract

Introduction

Metabolic syndrome is a chronic disease associated with multiple comorbidities. Over the last few years, machine learning techniques have been used to predict metabolic syndrome. However, studies incorporating demographic, clinical, laboratory, dietary, and genetic factors to predict the incidence of metabolic syndrome in Koreans are limited. In the present study, we propose a genome-wide polygenic risk score for the prediction of metabolic syndrome, along with other factors, to improve the prediction accuracy of metabolic syndrome.

Methods

We developed seven models to predict the incidence of metabolic syndrome at year 14 using the dataset from the Korean Genome and Epidemiology Study (KoGES) Ansan and Ansung: Cox multivariable regression, deep neural network (DNN), support vector machine (SVM), stochastic gradient descent (SGD), random forest (RAF), Naïve Bayes (NBA) classifier, and AdaBoost (ADB).

Results

Of the 5,440 participants, 2,120 were considered to have new-onset metabolic syndrome. The AUC values of the model that included sex, age, alcohol intake, energy intake, marital status, education status, income status, smoking status, dried laver intake, and the genome-wide polygenic risk score (gPRS) Z-score based on 344,447 SNPs (p-value < 1.0) were the highest for RAF (0.994 [95% CI 0.985, 1.000]) and ADB (0.994 [95% CI 0.986, 1.000]).

Conclusions

Incorporating both gPRS and demographic, clinical, laboratory, and seaweed data led to enhanced metabolic syndrome risk prediction by capturing the distinct etiologies of metabolic syndrome development. The RAF- and ADB-based models predicted metabolic syndrome more accurately than the NBA-based model for the Korean population.


Introduction

Metabolic syndrome is currently defined as a constellation of metabolic abnormalities, including decreased high-density lipoprotein cholesterol, central obesity, hypertension, and elevated serum triglycerides [1]. In a meta-analysis of data up to 2021, the global prevalence of metabolic syndrome varied from 12.5% to 31.4%, depending on the diagnostic criteria used [2]. In Korea, the prevalence of metabolic syndrome increased from 27.1% in 2001 to 33.2% in 2020 [3]. Metabolic syndrome has significant effects on the development of cardiovascular diseases and type 2 diabetes [4] and has become a major public health challenge. Early detection of individuals at high risk for metabolic syndrome is therefore essential not only for preventing the syndrome but also for reducing its associated complications. To this end, prediction models for metabolic syndrome based on machine learning and deep learning techniques have been developed using decision tree algorithms [5,6,7], tree-based random forest [8,9,10], extreme gradient boosting (XGBoost) [11, 12], the Gaussian naïve Bayes model [11], artificial neural networks [13, 14], logistic regression [15], and support vector machine (SVM) [7, 16] with various features, including genetic and clinical data. Recently, body mass index, waist circumference, waist-to-height ratio, waist-to-hip ratio, and systolic and diastolic blood pressure were found to be the most predictive variables in an Iranian cohort study that used support vector machine algorithms; the resulting models achieved 78.4% and 63.5% accuracy and 81.2% and 75.3% sensitivity for men and women, respectively [16]. Furthermore, adding dietary features, such as total vegetables, legumes, dairy products, percent polyunsaturated fatty acids, percent protein, and percent added sugars, improved the accuracy of the model in women by 3.7% [16].

Dietary factors play a significant role in the development of metabolic syndrome [17,18,19]. Seaweed, a part of the traditional diet, contains compounds that may reduce the prevalence of metabolic syndrome. These compounds include polysaccharides, peptides, pigments, vitamins A, B, C, and E, dietary fiber, ω-3 fatty acids, and essential amino acids [20, 21]. Consumption of 4–6 g/day of seaweed was associated with a low prevalence of metabolic syndrome in a randomized double-blinded placebo-controlled trial [22]. Genetic factors are also associated with metabolic syndrome. Heritability estimates for metabolic syndrome range from 10% to 30% [23,24,25], indicating that metabolic syndrome is partially heritable. A systematic review suggested an association between metabolic syndrome and SNPs in the FTO, TCF7L2, IL6, APOA5, APOC3, and CETP genes [26].

Our study adopted a comprehensive framework integrating a broad spectrum of variables, including dietary factors such as seaweed intake, genetic predisposition, and demographic and clinical biomarkers. This combination is in line with the recent shift in metabolic syndrome research towards more complex, precise, and data-intensive models that aim to capture the multifaceted nature of the disease [27]. However, the utility of incorporating genomic and seaweed data into metabolic syndrome risk prediction has not yet been demonstrated. This study aimed to develop machine learning-based predictive models for metabolic syndrome by incorporating demographic, genetic, and clinical factors together with seaweed intake status.

Materials and methods

Source and study participants

This study included a prospective community cohort from the Korean Genome and Epidemiology Study (KoGES) Ansan and Ansung. The KoGES investigated the lifestyle and nutritional factors influencing the prevalence and occurrence of chronic diseases in Koreans. The baseline survey was conducted between 2001 and 2002. A total of 10,030 individuals between the ages of 40 and 69 years in the Ansung and Ansan regions were enrolled and followed up every 2 years. This study used data from 2001–2002 to 2015–2016 (7th follow-up). Among the 10,030 participants at baseline, the following were excluded from the study: individuals without SNP rs6950857 data (n = 1,217); those without data on age, smoking, drinking status, metabolic equivalent of task (MET), body mass index (BMI), region, education, income, and laver and sea mustard/kelp intake (n = 3,700); and those without person-years data or with metabolic syndrome at baseline (n = 134). A total of 4,979 participants (2,548 men and 2,431 women) were included in this study (Fig. 1). The Institutional Review Board (IRB) of Inha University approved the use of these data on February 18, 2022 (IRB number: 220215–1 A).

Assessment of seaweed intake

A semi-quantitative food frequency questionnaire (FFQ) was used to obtain dietary information. The FFQ included 106 food types commonly consumed by Koreans, together with standard portion sizes, and was used to calculate the average intake frequency and amount per year. Sea mustard/kelp, laver, and total seaweed intakes were measured to determine seaweed consumption. Seaweed intake per serving was multiplied by the average daily intake frequency to determine the average daily seaweed intake. Food intake frequency was recorded in nine categories: never; 1 or 2–3 times per month; 1–2, 3–4, or 5–6 times per week; and 1, 2, or 3 times per day. One serving of sea mustard or kelp was equivalent to a bowl of soup, and one serving of laver was equivalent to a sheet of laver. The average daily consumption of sea mustard, kelp, and laver was summed to determine total seaweed consumption. Participants were classified into quintiles according to their daily average seaweed intake.
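The intake calculation described above (serving size multiplied by daily frequency, summed over seaweed types, then split into quintiles) can be sketched in Python as follows. This is an illustrative sketch only: the conversion of each frequency category to servings per day (category midpoints, a 30-day month, a 7-day week), the serving weights in grams, and all function names are assumptions introduced here, not values or code from the study.

```python
# Hypothetical conversion of the nine FFQ frequency categories to servings per day.
# Midpoints, a 30-day month, and a 7-day week are assumptions for illustration.
FREQ_PER_DAY = {
    "never": 0.0,
    "1/month": 1 / 30,
    "2-3/month": 2.5 / 30,
    "1-2/week": 1.5 / 7,
    "3-4/week": 3.5 / 7,
    "5-6/week": 5.5 / 7,
    "1/day": 1.0,
    "2/day": 2.0,
    "3/day": 3.0,
}

# Placeholder serving weights in grams (not taken from the study).
SERVING_G = {"laver": 2.0, "sea_mustard_kelp": 10.0}


def daily_intake(item, freq_category):
    """Average daily intake (g) = serving size x average daily frequency."""
    return SERVING_G[item] * FREQ_PER_DAY[freq_category]


def total_seaweed(responses):
    """Sum the daily intake over all seaweed items answered by one participant."""
    return sum(daily_intake(item, cat) for item, cat in responses.items())


def quintile_labels(values):
    """Assign labels 1..5 by rank so that the groups are of near-equal size.

    Ties are broken by input order, which is a simplification.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for rank, i in enumerate(order):
        labels[i] = rank * 5 // len(values) + 1
    return labels
```

For example, `total_seaweed({"laver": "2/day", "sea_mustard_kelp": "never"})` returns the intake for a participant who eats two sheets of laver per day and no sea mustard or kelp, under the placeholder serving weights.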

Polygenic risk scores

Genomic DNA was extracted from peripheral whole-blood samples. Imputed genotype data were generated from the Korea Biobank Array (Korean Chip, KCHIP, Seoul, South Korea), which was previously used to study genetic drivers of diseases in the Korean population [28]. Independent SNPs were selected using the following exclusion criteria: monomorphic SNPs, SNPs with three or more alleles, minor allele frequency < 0.01, Hardy-Weinberg equilibrium p-value < 0.001, and missing genotype frequency > 20%. In this study, we used SNPs identified in a GWAS to calculate individual genome-wide polygenic risk scores (gPRS). There are three genetic models: additive, dominant, and recessive. In calculating the polygenic risk score, A refers to the risk allele and B refers to the reference allele. The BB genotype was assigned a weight of 0. Under the additive model, AB and AA were assigned weights of 1 and 2, respectively. Under the dominant model, both AA and AB were assigned a weight of 1. The additive model was applied when the HR for metabolic syndrome was higher for two alleles than for one allele. The dominant model was applied when the HR for metabolic syndrome was higher for one allele than for two alleles. The recessive model assumes that the trait effect is related to the presence of both minor alleles. However, there were no cases in which the recessive model was applied in this study. Thus, the polygenic risk score for each participant was a weighted summation of the contributions of significant SNPs using the additive and dominant models. We used seven different cutoff points for p-values (< 0.0001, < 0.001, < 0.01, < 0.05, < 0.1, < 0.2, and < 1.0) to select SNPs, yielding seven sets. As a result, 23 SNPs were selected based on p-value < 0.0001, 364 SNPs based on p-value < 0.001, 3,397 SNPs based on p-value < 0.01, 17,637 SNPs based on p-value < 0.05, 34,652 SNPs based on p-value < 0.1, 69,684 SNPs based on p-value < 0.2, and 344,447 SNPs based on p-value < 1.0. The weight for each SNP was the log(HR) of its association with metabolic syndrome. The Z-score of the gPRS for each individual was then calculated by subtracting the mean gPRS from the individual's gPRS and dividing the difference by the standard deviation of the gPRS.
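The scoring scheme above (genotype coding under the additive or dominant model, log(HR) weights, p-value thresholding, and Z-score standardization) can be illustrated with a short Python sketch. The function names, the 0/1/2 encoding of BB/AB/AA, and the use of the sample standard deviation are assumptions made for illustration; the study's own implementation is not shown in the text.

```python
import math
from statistics import mean, stdev


def genotype_weight(risk_allele_count, model):
    """Code a genotype (0 = BB, 1 = AB, 2 = AA) under the chosen genetic model."""
    if model == "additive":
        return risk_allele_count            # BB = 0, AB = 1, AA = 2
    if model == "dominant":
        return 1 if risk_allele_count >= 1 else 0   # BB = 0, AB = AA = 1
    raise ValueError(f"unsupported model: {model}")


def select_snps(p_values, cutoff):
    """Indices of SNPs whose association p-value is below the cutoff."""
    return [i for i, p in enumerate(p_values) if p < cutoff]


def gprs(genotypes, hazard_ratios, models):
    """Weighted sum over SNPs: log(HR) x coded genotype."""
    return sum(
        math.log(hr) * genotype_weight(g, m)
        for g, hr, m in zip(genotypes, hazard_ratios, models)
    )


def z_scores(scores):
    """Standardize scores: (score - mean) / SD (sample SD assumed here)."""
    mu = mean(scores)
    sd = stdev(scores)
    return [(s - mu) / sd for s in scores]
```

In practice, `select_snps` would be applied once per p-value cutoff to give the seven SNP sets, and `gprs` followed by `z_scores` would be run on each set to give one Z-score per participant per cutoff.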

Demographic and lifestyle characteristics and biochemical measurements

We included eight risk factors that are known to affect metabolic syndrome: sex, age, alcohol intake, total energy intake, marital status, education, income, and smoking status. Next, we applied univariate Cox regression to extract significant variables associated with metabolic syndrome (p < 0.05). We then applied forward stepwise variable selection to minimize the Akaike Information Criterion (AIC), which selected the following variables: dried laver intake, BMI, HbA1c, history of hypertension, r-GTP, RBC, insulin, sleep hours, ALT, WBC, BUN, C-reactive protein, and albumin levels.

Machine learning algorithms

For feature selection, we followed a five-step process: (1) We selected variables to include in the model: epidemiological variables (sex, age, alcohol intake, total caloric intake, marital status, education status, income status, smoking status, and dried laver intake) as well as PRS. Demographic and lifestyle variables were chosen based on previous literature associated with metabolic syndrome [29,30,31]. (2) We performed univariable Cox regression for each remaining variable and extracted only those with p-value <0.05. (3) We included variables selected in step 1, then applied forward stepwise variable selection to variables from step 2, repeating until an optimal model minimizing AIC was created. (4) Among variables selected in step 3, we excluded those causing multicollinearity (variables with variance inflation factor (VIF) > 5.0). (5) We constructed the final model with remaining features. Based on these features, we developed seven models to predict metabolic syndrome incidence at year 14: Cox multivariable regression (using R package ‘survival’ and ‘coxph’ function with Breslow method for tie handling), deep neural network (using ‘neuralnet’ package with 5-10 hidden layers, 0.25 threshold, and 1,000,000 stepmax), SVM (using ‘e1071’ package and ‘svm’ function with options: kernel=“radial”, degree=3, gamma=if (is.vector(x)) 1 else 1/ncol(x), coef0=0, cost=1, nu=0.5), stochastic gradient descent (SGD with step size=0.1 and tau step size=0.5), random forest (RAF; using ‘randomForest’ package with 500 trees), Naïve Bayes (using ‘e1071’ package and ‘naiveBayes’ function), and AdaBoost (using ‘JOUSBoost’ package). For each model, we calculated AUC, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), DeLong’s test p-value, and F1-score.
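The performance metrics listed above follow from the confusion matrix and from the ranking of predicted scores. The sketch below shows one way to compute them in plain Python; it is illustrative only, uses hypothetical function names, and is not the R code used in the study. The AUC is computed here as the proportion of case/non-case pairs in which the case receives the higher score, with ties counted as one half.

```python
def _ratio(num, den):
    """Safe division; returns NaN when the denominator is zero."""
    return num / den if den else float("nan")


def metrics(y_true, y_pred):
    """Sensitivity, specificity, PPV, NPV, and F1 from binary labels (1 = case)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    sens = _ratio(tp, tp + fn)
    ppv = _ratio(tp, tp + fp)
    return {
        "sensitivity": sens,
        "specificity": _ratio(tn, tn + fp),
        "ppv": ppv,
        "npv": _ratio(tn, tn + fn),
        "f1": _ratio(2 * ppv * sens, ppv + sens),
    }


def auc(y_true, scores):
    """AUC as the probability that a random case outscores a random non-case."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return _ratio(wins, len(pos) * len(neg))
```

The pairwise AUC loop is quadratic in the number of participants, which is acceptable for a sketch; a rank-based formula gives the same value more efficiently for large cohorts.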

Statistical analyses

To compare the general characteristics of individuals according to metabolic syndrome status, categorical data were presented as frequency (%) and compared using Fisher’s exact test, and continuous data were presented as mean and standard deviation and compared using the Wilcoxon rank-sum test. Hazard ratios (HRs) and 95% confidence intervals (CI) for the relationship between genotypes and metabolic syndrome, the relationship between seaweed consumption and metabolic syndrome, and the effects of seaweed and PRS interaction on metabolic syndrome were computed using a multivariable Cox proportional hazards model. Statistical analyses were performed using R language version 4.2.2 (R Foundation for Statistical Computing, Vienna, Austria) and the T&F program ver. 4.0 (YooJin BioSoft, Goyang, Republic of Korea).
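As a reminder of how an HR and its 95% CI are obtained from a fitted Cox model, the following Python sketch converts a regression coefficient and its standard error into the reported quantities. The function name and the Wald-type interval with z = 1.96 are assumptions for illustration; the text does not state which interval method the software used.

```python
import math


def hazard_ratio_ci(beta, se, z=1.96):
    """Return (HR, lower, upper), where HR = exp(beta) and CI = exp(beta -/+ z * SE)."""
    return (
        math.exp(beta),
        math.exp(beta - z * se),
        math.exp(beta + z * se),
    )
```

For a standardized predictor such as a gPRS Z-score, the resulting HR is interpreted per one standard deviation increase in the score.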

Results

The characteristics of the 5,440 participants (61% men and 39% women) are presented in Table 1. Of these, 2,120 were considered to have new-onset metabolic syndrome and 3,320 were considered normal. The differences in the characteristics of the variables between the normal and new-onset metabolic syndrome groups are shown in Table 1. Significant differences were observed in all variables except sex, metabolic equivalent of task (MET), drinking status, alcohol intake, smoking status, oral diabetes medication, myocardial infarction diagnosis, congestive cardiac failure diagnosis, coronary artery disease diagnosis, gastritis/stomach ulcer diagnosis, and allergic disease diagnosis.

Table 1 General characteristics of study participants

Seaweed intake and the intake of other nutrients were examined according to new-onset metabolic syndrome status (Table 2). Total seaweed intake and frequency of sea mustard/kelp consumption did not differ significantly according to new-onset metabolic syndrome status. The frequency of laver consumption differed according to new-onset metabolic syndrome status (p = 0.033). The intake of fat, carbohydrates, calcium, vitamin B2, retinol, and cholesterol differed significantly according to new-onset metabolic syndrome status (all p-values < 0.05).

Table 2 Seaweed and nutrient intake characteristics of study participants

The biochemical characteristics according to the status of new-onset metabolic syndrome in the study participants are presented in Table 3. There were significant differences in all variables (p < 0.05), except for creatinine, total protein, sodium, potassium, and chloride levels.

Table 3 Biochemical characteristics of study participants

gPRS Z-scores were significantly and independently associated with the incidence of metabolic syndrome. We used seven different cut-off points for p-values (< 0.0001, < 0.001, < 0.01, < 0.05, < 0.1, < 0.2, < 1.0). As the p-value cut-off was relaxed from the most stringent, < 0.0001, to the least stringent, < 1.0, the number of SNPs included increased, and the HR (95% CI) increased significantly from 1.355 (1.288, 1.426) to 6.798 (6.282, 7.357) (Supplementary Table 1).

The results of the multivariable Cox proportional hazards regression analysis using epidemiological variables and gPRS Z-scores in relation to the incidence of metabolic syndrome are presented in Table 4. Female sex (compared with male sex), current smoking (compared with non-smoking), and a previous history of hypertension were associated with an increased incidence of metabolic syndrome. Age, total energy intake, BMI, HbA1c, r-GTP level, RBC count, WBC count, and BUN level were positively associated with the incidence of metabolic syndrome.

Table 4 Multivariable Cox proportional hazards regression analysis using epidemiological variables and genome-wide polygenic risk score (gPRS) Z-scores in relation to the incidence of metabolic syndrome

The area under the ROC curve (AUC), sensitivity, specificity, PPV, NPV, and F1 scores for the predictive performance of the metabolic syndrome prediction models are shown in Table 5. Model 1 included sex, age, alcohol intake, energy intake, marital status, education status, income status, smoking status, and dried laver intake; its AUC for RAF was 0.989 (95% CI: 0.980, 0.998). Models 2 to 8 each added one gPRS Z-score to the variables of model 1. In model 2, which added gPRS Z-score 1 based on SNPs with p-value < 0.0001, the AUC was the highest for RAF (0.990 [95% CI: 0.981, 0.999]). In model 3, which added gPRS Z-score 2 based on SNPs with p-value < 0.001, the AUC for RAF decreased slightly (0.987 [95% CI: 0.977, 0.998]). In model 4, which added gPRS Z-score 3 based on SNPs with p-value < 0.01, the AUC for RAF was 0.990 (95% CI: 0.981, 0.999). In model 5, which added gPRS Z-score 4 based on SNPs with p-value < 0.05, the AUC for RAF was 0.993 (95% CI: 0.985, 1.000). In model 6, which added gPRS Z-score 5 based on SNPs with p-value < 0.1, the AUC for RAF was 0.994 (95% CI: 0.986, 1.000). In model 7, which added gPRS Z-score 6 based on SNPs with p-value < 0.2, the AUC for RAF was 0.992 (95% CI: 0.982, 1.000). In model 8, which added gPRS Z-score 7 based on SNPs with p-value < 1.0, the AUC values were the highest for RAF (0.994 [95% CI: 0.985, 1.000]) and ADB (0.994 [95% CI: 0.986, 1.000]).

Table 5 Results on predictive performance metrics for metabolic syndrome prediction analysis using machine learning algorithms based on epidemiological, nutritional and genome-wide polygenic risk score (gPRS)

The density of SNPs around metabolic syndrome-associated genes is presented using a Circos plot (Fig. 1). A histogram of the data density for metabolic syndrome status based on the gPRS value is shown in Fig. 2. The distribution of gPRS in patients with metabolic syndrome was more left-skewed than that in those without metabolic syndrome across the thresholds, from the gPRS based on SNPs with p-value < 0.0001 to the gPRS based on all SNPs (p-value < 1.0), which indicated that the gPRS of patients with metabolic syndrome was larger than that of the non-metabolic syndrome participants.

Fig. 1

Circos plot representing the density of single nucleotide polymorphisms (SNPs) for metabolic syndrome. The outer track shows the density of tagged SNPs for the genome-wide polygenic risk score (gPRS), and the inner track shows the p-values of the corresponding SNPs. Green dots represent metabolic syndrome-associated SNPs with p-value < 0.001, blue dots represent metabolic syndrome-associated SNPs with p-value < 0.0001, and red dots represent metabolic syndrome-associated SNPs with p-value < 0.00001. gPRS, genome-wide polygenic risk score; SNPs, single nucleotide polymorphisms

Fig. 2

Comparison of the distribution of genome-wide polygenic risk score (gPRS) Z-scores considering metabolic syndrome occurrence at 168 months. The figure presents density distributions of gPRS Z-scores for individuals with (red) and without (green) metabolic syndrome across seven p-value thresholds. Single nucleotide polymorphisms (SNPs) were selected based on varying p-value thresholds (p < 0.0001, p < 0.001, p < 0.01, p < 0.05, p < 0.1, p < 0.2, and p < 1.0 (all SNPs)). As the p-value threshold becomes less stringent (from p < 0.0001 to p < 1.0 (all SNPs)), the distinction between distributions for individuals with and without metabolic syndrome becomes increasingly pronounced. gPRS, genome-wide polygenic risk score; SNPs, single nucleotide polymorphisms

Discussion

The present study is the first to develop a predictive model for metabolic syndrome by incorporating demographics, clinical biomarkers, genetic information, and seaweed intake status, using various machine learning techniques. In our study, random forest machine learning analysis demonstrated the best model performance for predicting metabolic syndrome. The random forest model provided a balanced combination of good interpretability and performance, with the highest F1-score, indicating that the model was good at identifying both positive and negative cases. The AdaBoost machine learning analysis also showed high performance (AUC = 0.994) in predicting metabolic syndrome. Different techniques have been used for the development of metabolic syndrome prediction models, such as decision tree [6], SVM [7], Light Gradient Boosting Machine [10], and naïve Bayes classification [32]. Other studies have used variables on sociodemographic attributes; clinical, laboratory, and lifestyle characteristics; and genetic information (10 polymorphisms), but few models incorporate extensive genetic information using Z-scores from the PRS, as in our study.

There was a reasonable improvement in the predictive performance when the Z-scores of the PRS were sequentially incorporated using different p-value cut-off points, ranging from p < 0.0001 to all SNPs (p < 1.0). The status of metabolic syndrome was classified more clearly when the gPRS was constructed using all SNPs than when it was constructed using SNPs with a p-value of < 0.0001. This suggests that the improvement in the performance of metabolic syndrome risk prediction gained by using more genetic information, through the incorporation of all SNPs, applies to the current study. The gPRS is closely associated with metabolic syndrome, suggesting that genetic information may contribute to improvements in predicting metabolic syndrome.

This study showed that gPRS enhanced the accuracy of metabolic syndrome risk prediction, and our findings substantiate the value of gPRS in the prediction of metabolic syndrome risk. Better prediction ability was achieved when the PRS was combined with previously known metabolic syndrome risk factors, including demographic, lifestyle, and clinical factors. It remains uncertain whether the improvement in the performance of a metabolic syndrome risk prediction model that concurrently uses demographics, lifestyle factors, and genetic and clinical information would be applicable to an independent cohort of non-European ethnicity. Accurately forecasting who is likely to develop metabolic syndrome enables health officials to identify high-risk individuals early. This early identification creates an opportunity to implement preventive actions, such as recommending diet changes or promoting healthy habits, to delay or prevent the onset of metabolic syndrome.

In this study, we found that consumption of dried laver was inversely associated with the incidence of metabolic syndrome in Korean adults. In parallel with our findings, in a randomized double-blind placebo-controlled trial, consumption of 4–6 g of seaweed per day was associated with a low prevalence of metabolic syndrome [22]. In the Korean Multi-Rural Communities Cohort Study, dietary seaweed consumption was inversely associated with the incidence of metabolic syndrome in postmenopausal women [33]. The authors explained that dietary seaweed has a high binding affinity for estrogen [34], and it has been reported that seaweed supplementation lowers serum estradiol levels in U.S. premenopausal women with menstrual dysfunction [35].

The present study had several limitations. First, our study was derived from the Korean population only; thus, the predictive models for metabolic syndrome may not be generalizable to other races and ethnicities. Second, our findings were not validated outside the study population. It is important to validate and replicate the predictive ability of the model in different cohorts. Third, as features with fewer than 30% missing responses were included in the analysis, valuable information may have been lost, which could result in a model that lacks robustness. Finally, we did not investigate molecular pathways and mechanisms because of the large number of SNPs in the prediction models. Despite these limitations, this study has several strengths. First, we incorporated genetic information, demographic attributes, and clinical, laboratory, and lifestyle factors with seaweed intake to predict the incidence of metabolic syndrome. Second, we used diverse machine learning approaches and compared the performance of each model in predicting metabolic syndrome. Third, we used relatively long-term (14-year) prospective cohort data to predict the incidence of metabolic syndrome.

In conclusion, we developed a predictive model for metabolic syndrome by incorporating demographic, clinical, laboratory, dietary, and genetic factors in a Korean population. Genetic factors, along with other lifestyle factors, contribute to the etiology of metabolic syndrome, and the Z-score based on gPRS improved the predictability of metabolic syndrome with machine learning approaches.

Data availability

The data underlying the results of our study are not publicly available owing to KoGES data policy. Data are available from the Division of Biobank, Korea National Institute of Health, Korea Disease Control and Prevention Agency for researchers who meet the criteria for access to confidential data.

Abbreviations

gPRS: Genome-wide polygenic risk score

SNPs: Single nucleotide polymorphisms

COX: Cox multivariable regression

DNN: Deep neural network

SVM: Support vector machine

SGD: Stochastic gradient descent

RAF: Random forest

NBA: Naïve Bayes

ADB: AdaBoost

FFQ: Food frequency questionnaire

AUC: Area under the ROC curve

PPV: Positive predictive value

NPV: Negative predictive value

References

  1. Zimmet P, Magliano D, Matsuzawa Y, Alberti G, Shaw J. The metabolic syndrome: a global public health problem and a new definition. J Atheroscler Thromb. 2005;12(6):295–300.


  2. Noubiap JJ, Nansseu JR, Lontchi-Yimagou E, Nkeck JR, Nyaga UF, Ngouo AT, Tounouga DN, Tianyi F-L, Foka AJ, Ndoadoumgue AL, et al. Geographic distribution of metabolic syndrome and its components in the general adult population: a meta-analysis of global data from 28 million individuals. Diabetes Res Clin Pract. 2022;188:109924.


  3. Park D, Shin MJ, Després JP, Eckel RH, Tuomilehto J, Lim S. 20-year trends in metabolic syndrome among Korean adults from 2001 to 2020. JACC Asia. 2023;3(3):491–502.


  4. Wilson PWF, D’Agostino RB, Parise H, Sullivan L, Meigs JB. Metabolic syndrome as a precursor of cardiovascular disease and type 2 diabetes mellitus. Circulation. 2005;112(20):3066–72.


  5. Yu C-S, Lin Y-J, Lin C-H, Wang S-T, Lin S-Y, Lin SH, Wu JL, Chang S-S. Predicting metabolic syndrome with machine learning models using a decision tree algorithm: retrospective cohort study. JMIR Med Inf. 2020;8(3):e17110.


  6. Shin H, Shim S, Oh S. Machine learning-based predictive model for prevention of metabolic syndrome. PLoS ONE. 2023;18(6):e0286635.


  7. Karimi-Alavijeh F, Jalili S, Sadeghi M. Predicting metabolic syndrome using decision tree and support vector machine methods. ARYA Atheroscler. 2016;12(3):146–52.


  8. Worachartcheewan A, Shoombuatong W, Pidetcha P, Nopnithipat W, Prachayasittikul V, Nantasenamat C. Predicting metabolic syndrome using the random forest method. Sci World J. 2015;2015(1):581501.


  9. Szabo de Edelenyi F, Goumidi L, Bertrais S, Phillips C, MacManus R, Roche H, Planells R, Lairon D. Prediction of the metabolic syndrome status based on dietary and genetic parameters, using Random Forest. Genes Nutr. 2008;3:173–6.


  10. Daniel Tavares L, Manoel A, Henrique Rizzi Donato T, Cesena F, André Minanni C, Miwa Kashiwagi N, Paiva da Silva L, Amaro E, Szlejf C. Prediction of metabolic syndrome: a machine learning approach to help primary prevention. Diabetes Res Clin Pract. 2022;191:110047.


  11. Kim J, Mun S, Lee S, Jeong K, Baek Y. Prediction of metabolic and pre-metabolic syndromes using machine learning models with anthropometric, lifestyle, and biochemical factors from a middle-aged population in Korea. BMC Public Health. 2022;22(1):664.


  12. Xiaoxue W, Zijun W, Shichen C, Mukun Y, Yi C, Linqing M, Wenpei B. Risk prediction model of metabolic syndrome in perimenopausal women based on machine learning. Int J Med Inf. 2024;188:105480.


  13. Eyvazlou M, Hosseinpouri M, Mokarami H, Gharibi V, Jahangiri M, Cousins R, Nikbakht HA, Barkhordari A. Prediction of metabolic syndrome based on sleep and work-related risk factors using an artificial neural network. BMC Endocr Disord. 2020;20(1):169.


  14. Murguía-Romero M, Jiménez-Flores R, Méndez-Cruz AR, Villalobos-Molina R. Predicting metabolic syndrome with neural networks. In: Advances in Artificial Intelligence and Its Applications: 12th Mexican International Conference on Artificial Intelligence, MICAI 2013, Mexico City, Mexico, November 24–30, 2013, Proceedings, Part I. Springer; 2013. p. 464–72.

  15. Zhang H, Chen D, Shao J, Zou P, Cui N, Tang L, Wang X, Wang D, Wu J, Ye Z. Machine learning-based prediction for 4-year risk of metabolic syndrome in adults: a retrospective cohort study. Risk Manag Healthc Policy. 2021;14:4361–8.


  16. Mohseni-Takalloo S, Mozaffari-Khosravi H, Mohseni H, Mirzaei M, Hosseinzadeh M. Metabolic syndrome prediction using non-invasive and dietary parameters based on a support vector machine. Nutr Metab Cardiovasc Dis. 2024;34(1):126–35.


  17. de Oliveira EP, McLellan KCP, Vaz de Arruda Silveira L, Burini RC. Dietary factors associated with metabolic syndrome in Brazilian adults. Nutr J. 2012;11(1):13.


  18. Andersen CJ, Fernandez ML. Dietary strategies to reduce metabolic syndrome. Rev Endocr Metab Disord. 2013;14(3):241–54.


  19. Salas-Salvadó J, Guasch-Ferré M, Lee C-H, Estruch R, Clish CB, Ros E. Protective effects of the Mediterranean diet on type 2 diabetes and metabolic syndrome. J Nutr. 2016;146(4):S920–7.


  20. Rajapakse N, Kim SK. Nutritional and digestive health benefits of seaweed. Adv Food Nutr Res. 2011;64:17–28.


  21. Kumar SA, Brown L. Seaweeds as potential therapeutic interventions for the metabolic syndrome. Rev Endocr Metab Disord. 2013;14(3):299–308.


  22. Teas J, Baldeon ME, Chiriboga DE, Davis JR, Sarries AJ, Braverman LE. Could dietary seaweed reverse the metabolic syndrome? Asia Pac J Clin Nutr. 2009;18(2):145–54.

  23. Bosy-Westphal A, Onur S, Geisler C, Wolf A, Korth O, Pfeuffer M, Schrezenmeir J, Krawczak M, Müller MJ. Common familial influences on clustering of metabolic syndrome traits with central obesity and insulin resistance: the Kiel obesity prevention study. Int J Obes (Lond). 2007;31(5):784–90.

  24. Henneman P, Aulchenko YS, Frants RR, van Dijk KW, Oostra BA, van Duijn CM. Prevalence and heritability of the metabolic syndrome and its individual components in a Dutch isolate: the Erasmus Rucphen Family study. J Med Genet. 2008;45(9):572–7.

  25. Bellia A, Giardina E, Lauro D, Tesauro M, Di Fede G, Cusumano G, Federici M, Rini GB, Novelli G, Lauro R, et al. The Linosa Study: epidemiological and heritability data of the metabolic syndrome in a Caucasian genetic isolate. Nutr Metab Cardiovasc Dis. 2009;19(7):455–61.

  26. Povel CM, Boer JMA, Reiling E, Feskens EJM. Genetic variants and the metabolic syndrome: a systematic review. Obes Rev. 2011;12(11):952–67.

  27. Ojurongbe TA, Afolabi HA, Oyekale A, Bashiru KA, Ayelagbe O, Ojurongbe O, Abbasi SA, Adegoke NA. Predictive model for early detection of type 2 diabetes using patients’ clinical symptoms, demographic features, and knowledge of diabetes. Health Sci Rep. 2024;7(1):e1834.

  28. Moon S, Kim YJ, Han S, Hwang MY, Shin DM, Park MY, Lu Y, Yoon K, Jang HM, Kim YK, et al. The Korea Biobank array: design and identification of coding variants associated with blood biochemical traits. Sci Rep. 2019;9(1):1382.

  29. Buckland G, Salas-Salvadó J, Roure E, Bulló M, Serra-Majem L. Sociodemographic risk factors associated with metabolic syndrome in a Mediterranean population. Public Health Nutr. 2008;11(12):1372–8.

  30. Dallongeville J, Cottel D, Ferrieres J, Arveiler D, Bingham A, Ruidavets JB, Haas B, Ducimetiere P, Amouyel P. Household income is associated with the risk of metabolic syndrome in a sex-specific manner. Diabetes Care. 2005;28(2):409–15.

  31. Park HS, Oh SW, Cho S-I, Choi WH, Kim YS. The metabolic syndrome and associated lifestyle factors among South Korean adults. Int J Epidemiol. 2004;33(2):328–36.

  32. Choe EK, Rhee H, Lee S, Shin E, Oh SW, Lee JE, Choi SH. Metabolic syndrome prediction using machine learning models with genetic and clinical information from a nonobese healthy population. Genomics Inf. 2018;16(4):e31.

  33. Park J-K, Woo HW, Kim MK, Shin J, Lee Y-H, Shin DH, Shin M-H, Choi BY. Dietary iodine, seaweed consumption, and incidence risk of metabolic syndrome among postmenopausal women: a prospective analysis of the Korean Multi-rural communities Cohort Study (MRCohort). Eur J Nutr. 2021;60(1):135–46.

  34. Li L, Li L, Shi H, Chen P, Qi B, Yang X. Adsorption effect of dietary fibers from seaweeds on estrogens. Chin J Mar Drugs. 1994.

  35. Skibola CF. The effect of Fucus vesiculosus, an edible brown seaweed, upon menstrual cycle length and hormonal status in three pre-menopausal women: a case report. BMC Complement Altern Med. 2004;4:1–8.

Acknowledgements

This study was conducted with bioresources from the National Biobank of Korea, Korea Disease Control and Prevention Agency (KBN-2020-016). We would like to thank YooJinBioSoft (http://www.yoojinbiosoft.com) for the statistical analysis of this study.

Funding

This research was part of a project titled “Efficacy/standardization technology development of marine healing resources and its life cycle safety” funded by the Ministry of Oceans and Fisheries, Republic of Korea (grant no. 20220027). This work was also supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (RS-2024-00340086).

Author information

Contributions

DS: Conceptualization, Methodology, Formal Analysis, Investigation, Resources, Writing – Original Draft, Writing – Review & Editing, Project Administration, Funding Acquisition.

Corresponding author

Correspondence to Dayeon Shin.

Ethics declarations

Ethics approval and consent to participate

This study involving human participants was conducted in accordance with the ethical principles of the Declaration of Helsinki. The Institutional Review Board (IRB) of Inha University approved the use of these data on February 18, 2022 (IRB number: 220215-1A). Written informed consent was obtained from all participants.

Consent for publication

Not applicable.

Competing interests

The author declares no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1

Supplementary Material 2

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Shin, D. Prediction of metabolic syndrome using machine learning approaches based on genetic and nutritional factors: a 14-year prospective-based cohort study. BMC Med Genomics 17, 224 (2024). https://doi.org/10.1186/s12920-024-01998-1

  • DOI: https://doi.org/10.1186/s12920-024-01998-1

Keywords