Blood cell parameters and risk of nonalcoholic fatty liver disease: a comprehensive Mendelian randomization study

Background: Nonalcoholic fatty liver disease (NAFLD) is on the rise globally, and past research suggests a significant association with various blood cell components. Our goal was to explore the potential association between whole blood cell indices and NAFLD risk using Mendelian randomization (MR).
Methods: We analyzed data from 4,198 participants in the 2017–2018 National Health and Nutrition Examination Survey to investigate the link between blood cell indicators and NAFLD. We assessed the association using weighted quantile sum regression and multivariate logistic regression. Additionally, two-sample Mendelian randomization was employed to infer causality between 36 blood cell indicators and NAFLD.
Results: Multivariate logistic regression identified 10 NAFLD risk factors. Weighted quantile sum regression revealed a positive correlation (p = 6.03e-07) between total blood cell indices and NAFLD, with hemoglobin and lymphocyte counts as key contributors. Restricted cubic spline analysis found five indicators with significant nonlinear correlations with NAFLD. Mendelian randomization showed a notable association between reticulocyte count and NAFLD using the inverse-variance weighted method.
Conclusions: Hematological markers pose an independent NAFLD risk, with a positive causal link found for reticulocyte count. These results emphasize the importance of monitoring NAFLD and of further investigating the specific underlying mechanisms.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12920-024-01879-7.


Introduction
Nonalcoholic fatty liver disease (NAFLD) is the hepatic manifestation of metabolic dysfunction, encompassing liver pathologies such as simple steatosis, steatohepatitis, fibrosis, and cirrhosis [1,2]. The global prevalence of NAFLD rose from 25.3% in 1990-2006 to 38.0% in 2016-2019 and is expected to increase considerably with the growing and aging population [3]. Therefore, it is crucial to increase NAFLD surveillance to avoid the societal burdens associated with the disease and its progression [4]. However, NAFLD has a complex etiology involving many risk factors and genetic susceptibilities that are not fully understood [5]. An urgent need exists for a deeper understanding of the causality and varying effect sizes of different NAFLD risk factors. NAFLD onset and progression are linked to metabolic, inflammatory, genetic, and environmental factors, resulting in prolonged immune system activation and mild inflammation, which contribute to the chronic organ inflammation strongly correlated with NAFLD [6]. Recently, the nomenclature of NAFLD has been updated to metabolic dysfunction-associated steatotic liver disease (MASLD), which requires the presence of at least 1 of 5 cardiometabolic risk factors, to provide a more comprehensive definition and a better understanding of its pathogenesis [7].
Blood components and associated cytokines play vital roles in the diagnosis and progression of metabolic diseases. While previous studies have identified correlations between specific blood cell components and NAFLD, the overall involvement of blood cells in NAFLD needs further elucidation [8][9][10]. No clear correlation has been established between blood cells overall and NAFLD development, and the role of specific blood cell markers in NAFLD pathogenesis remains unclear. Importantly, a close connection exists between circulating inflammatory factors and blood cells [11]. Studies have highlighted the significant presence of inflammatory factors in NAFLD [8,11]. However, large-sample, multidimensional studies are needed to establish a causal link among these components.
The National Health and Nutrition Examination Survey (NHANES) evaluates health and nutrition status in the USA and is widely used in observational studies. Such studies face challenges like bias and confounding. Mendelian randomization (MR) simulates a randomized controlled trial, minimizing biases by leveraging the random assignment of genes, which is unaffected by population or environment [12]. Combining MR with NHANES enhances research reliability, allowing a more precise evaluation of the impact of exposures on outcomes while considering confounders. This contributes to improved public health and quality of life.
This study investigated the correlation between blood cell indices and NAFLD using two methods: observational analysis with NHANES data and Mendelian randomization. Multivariate logistic regression and weighted quantile sum regression assessed their influence, while a restricted cubic spline model examined potential nonlinear relationships. Despite observational limitations, MR provided comprehensive evidence for a potential causal link between blood cell indicators and NAFLD.
Sedentary status was categorized by sitting time. Hypertension diagnosis followed the 2017 American Heart Association recommendations, considering antihypertensive medication use and questionnaire responses. Diabetes diagnosis adhered to American Diabetes Association criteria: glycated hemoglobin > 5.7% or 6.5%, fasting blood glucose > 100/126 mg/dL, physician-diagnosed diabetes, or impaired glucose tolerance from the questionnaire.

WQS regression and RCS analysis
WQS regression is a statistical model for the multiple regression of high-dimensional data sets that is commonly used for environmental exposures [18]. We utilized the gWQS package to assess the overall impact of blood cells on NAFLD and performed logistic regression to analyze the results. Additionally, RCS analysis was conducted to explore nonlinear correlations between blood cell components and NAFLD [19]. Most regression models assume a linear relationship between independent and dependent variables. The RCS, an extension of the regression spline, relaxes this assumption between the knots while remaining linear in the two intervals at each end of the range of the independent variable. To assess the correlation for factors influencing NAFLD onset, RCS analysis was conducted using functions from the rms package.
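To make the tail-linearity property of the RCS concrete, the following is a minimal Python sketch of a restricted cubic spline basis in the commonly used truncated-power form. It is an illustration only and not the code used in this study: the function name `rcs_basis`, the knot positions, and the scaling by the squared knot range are assumptions made here.

```python
def rcs_basis(x, knots):
    """Return [x, s_1(x), ..., s_{k-2}(x)] for a restricted cubic spline.

    Truncated-power form with k knots. Each nonlinear term is zero below
    the first knot and linear above the last knot, so the fitted curve is
    linear in both tails. The knots passed in are illustrative assumptions.
    """
    if len(knots) < 3:
        raise ValueError("a restricted cubic spline needs at least three knots")
    t = sorted(float(v) for v in knots)

    def cube_plus(u):
        # Truncated cubic: (u)_+^3
        return max(u, 0.0) ** 3

    span = t[-1] - t[-2]           # distance between the last two knots
    scale = (t[-1] - t[0]) ** 2    # optional scaling to keep terms comparable to x
    terms = [float(x)]
    for j in range(len(t) - 2):
        s = (cube_plus(x - t[j])
             - cube_plus(x - t[-2]) * (t[-1] - t[j]) / span
             + cube_plus(x - t[-1]) * (t[-2] - t[j]) / span)
        terms.append(s / scale)
    return terms
```

With four knots the function returns three columns (the linear term plus two nonlinear terms), which would then enter a logistic regression as ordinary covariates.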

MR analysis
Summary statistics were retrieved from OpenGWAS (https://gwas.mrcieu.ac.uk/), comprising GWAS summary data for blood cells, circulating cytokines and growth factors, and NAFLD [20][21][22]. GWAS data for routine blood indices came from 173,480 individuals of European descent in the UK Biobank and the INTERVAL study conducted by Astle et al. The analysis encompassed 36 indicators associated with red blood cells, white blood cells, platelets, and other related factors. GWAS data for 41 peripheral circulating cytokines and growth factors, adjusted for age, sex, and BMI, were sourced from the Ahola-Olli et al. study, involving 8293 Finnish individuals in the Cardiovascular Risk Study of Young Finns and FINRISK studies (FINRISK 2002 and FINRISK 1997) [21]. GWAS data for NAFLD were acquired from the FinnGen Biobank, which comprises 218,792 samples of European ancestry [20]. All participants provided informed consent in the corresponding original studies. All data used in this work are publicly available from studies with relevant participant consent and ethical approval. Ethical approval from an institutional review board was not necessary for the present study because only publicly available summary-level data were used.
Instrumental variables (IVs) were selected based on the following criteria: p < 5e-08, r² < 0.001, and a distance of more than 10,000 kb between variants. Single nucleotide polymorphisms (SNPs) with p > 5e-08 in NAFLD outcome variables were excluded after data integration. Valid IVs must satisfy three assumptions. First, the genetic variant must be strongly associated with the exposure of interest; that is, it should significantly influence the level or presence of the risk factor being studied, and a variant without a strong association with the exposure cannot be a reliable instrument. Second, the genetic variant must be independent of confounders: it must not be associated with any factor that could influence both the exposure and the outcome. Third, the genetic variant must affect the outcome only through the exposure, which is often referred to as the assumption of no pleiotropy. There should be no direct pathway from the genetic variant to the outcome that bypasses the exposure; any such pathway would violate this assumption and potentially lead to biased results.
The TwoSampleMR R package (version 0.5.6) in R (version 4.3.1) was used to conduct all MR analyses, including the exposure-mediator and exposure-outcome two-sample analyses [23]. Multivariable MR then assessed the outcome with the significant exposures and the mediator variables included, and mediating effects were analyzed using these results. Weak IVs (F-statistic < 10) were identified and excluded from subsequent analysis. Heterogeneity and sensitivity analyses also used the TwoSampleMR package. Inverse-variance weighted (IVW) estimates that were significant (p < 0.05) without evidence of pleiotropy were considered causal. Multivariable Mendelian randomization (MVMR), an extension of MR, determined the effects of multiple exposures on NAFLD using correlated genetic variants. Direct effects of blood cell indicators, circulating cytokines, and growth factors on NAFLD were obtained. The mediating effect of the blood cell indicators → circulating cytokines and growth factors → NAFLD pathway was determined. The effect of blood cells on circulating cytokines and growth factors was analyzed using the equation:

Baseline characteristics of study participants
We

Association between blood cells and NAFLD
To depict the association between the

WQS analysis and RCS analysis of blood cells for NAFLD
We conducted a WQS analysis to validate the relationship and contribution of the 14 significant blood cell parameters (shown in Table 1) to NAFLD. The unadjusted WQS model showed a positive association (p = 1.83e-15) between total blood cell indices and NAFLD (Fig. 2B). After adjusting for significant baseline covariates (hypertension, sedentary behavior, BMI, race, sex, age, and diabetes) and then for all covariates, the positive association remained in both adjusted models (Fig. 2D, F; p = 1.36e-06 and p = 6.03e-07). LBXHGB (hemoglobin) and LBDLYMNO (lymphocyte number) consistently contributed the most. Nonlinear relationships were assessed using RCS for the 14 parameters, five of which (LBXRDW, LBDEONO, LBDLYMNO, LBXPLTSI, and LBXWBCSI) showed significant nonlinear correlations with NAFLD (Fig. 3).
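The idea of a WQS index, and of individual parameters "contributing" to it through their weights, can be sketched in a few lines of Python. This is a simplified illustration and not the gWQS procedure: the weights are supplied by hand rather than estimated by bootstrap, the quantile cut points use a simple nearest-rank rule, and the function names, sample values, and weights are all hypothetical.

```python
from bisect import bisect_right

def quantile_scores(values, q=4):
    """Score each value 0..q-1 according to the quantile bin it falls in."""
    ordered = sorted(values)
    n = len(ordered)
    # Cut points at the 1/q, 2/q, ... positions (nearest-rank rule, an assumption).
    cuts = [ordered[min(n - 1, (n * i) // q)] for i in range(1, q)]
    return [bisect_right(cuts, v) for v in values]

def wqs_index(exposures, weights, q=4):
    """Weighted sum of quantile scores; weights must be non-negative and sum to 1."""
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    scored = {name: quantile_scores(vals, q) for name, vals in exposures.items()}
    n = len(next(iter(scored.values())))
    return [sum(weights[name] * scored[name][i] for name in scored) for i in range(n)]

# Hypothetical values for two of the parameters named in the text.
example_exposures = {
    "LBXHGB": [12.1, 13.0, 13.8, 14.2, 14.9, 15.3, 15.9, 16.4],
    "LBDLYMNO": [2.9, 1.2, 1.8, 2.1, 1.5, 3.3, 2.4, 2.7],
}
example_weights = {"LBXHGB": 0.6, "LBDLYMNO": 0.4}  # illustrative, not estimated
example_index = wqs_index(example_exposures, example_weights)
```

The resulting index is a single covariate per participant that would enter a logistic regression on NAFLD status; a parameter with a larger weight moves the index more for the same change in quantile.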

MR analysis
To explore the causal link between blood cells (detailed information on IVs is shown in Table S4) and NAFLD, we conducted an MR analysis using blood cell-related indicators as the exposure, NAFLD as the outcome, and circulating cytokines and growth factors as mediators. Two-sample MR with SNPs as IVs, utilizing the IVW method, revealed a significant positive causal relationship between reticulocyte count and NAFLD (IVW, OR = 1.361, 95% CI: 1.065-1.740, p = 0.014) (complete results in Table S1).
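As an internal consistency check, the reported odds ratio and confidence interval can be converted back to a log-odds effect, a standard error, and a Wald p-value. The Python sketch below assumes a 95% interval that is symmetric on the log scale and a normal approximation, neither of which is stated explicitly in the text; the function name `wald_from_or` is hypothetical.

```python
import math

def wald_from_or(or_point, ci_low, ci_high, z_crit=1.959964):
    """Recover beta, SE, z, and a two-sided p-value from an OR and its 95% CI."""
    beta = math.log(or_point)
    # Width of the CI on the log scale is 2 * z_crit * SE.
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = beta / se
    # Two-sided normal tail probability: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, z, p

beta, se, z, p = wald_from_or(1.361, 1.065, 1.740)
print(f"beta={beta:.3f} se={se:.3f} z={z:.2f} p={p:.3f}")
```

Under these assumptions the recovered p-value rounds to 0.014, in agreement with the value reported for the IVW estimate.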
The MR-Egger, weighted median, simple mode, and weighted mode methods also showed directional coherence (Fig. 4A). Heterogeneity assessment and funnel plot analysis suggested no observed heterogeneity (p for the Q statistic = 0.125) and correct IV selection (Fig. 4C). Egger's method indicated no pleiotropy (Table 3). The Steiger directionality test and leave-one-out test supported the absence of reverse causality and the stability of the IVW results (Table S2). Univariable Mendelian randomization and MVMR revealed that circulating cytokines and growth factors were not significant mediators (complete results in Table S3).
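The quantities discussed above (the F-statistic screen for weak instruments, the IVW estimate, and the Q statistic for heterogeneity) can be illustrated with a short Python sketch working from per-SNP summary statistics. It implements a plain fixed-effect IVW with first-order weights, which may differ in detail from what the TwoSampleMR package computes; the function names and all SNP values are invented for illustration.

```python
import math

def f_statistic(beta_exposure, se_exposure):
    """Approximate per-SNP F-statistic as the squared z-score of the exposure effect."""
    return (beta_exposure / se_exposure) ** 2

def ivw_fixed_effect(snps, f_min=10.0):
    """Fixed-effect IVW estimate and Q statistic from per-SNP summary statistics."""
    # Drop weak instruments (F below the threshold, 10 by convention).
    kept = [s for s in snps if f_statistic(s["bx"], s["se_x"]) >= f_min]
    if not kept:
        raise ValueError("no instruments pass the F-statistic threshold")
    # Wald ratio per SNP and its first-order inverse-variance weight.
    ratios = [s["by"] / s["bx"] for s in kept]
    weights = [(s["bx"] / s["se_y"]) ** 2 for s in kept]
    total = sum(weights)
    beta = sum(w * r for w, r in zip(weights, ratios)) / total
    se = math.sqrt(1.0 / total)
    # Q statistic: weighted squared deviations of the ratios from the pooled estimate.
    q = sum(w * (r - beta) ** 2 for w, r in zip(weights, ratios))
    return {"n_snps": len(kept), "beta": beta, "se": se,
            "or": math.exp(beta), "q": q, "q_df": len(kept) - 1}

# Hypothetical summary statistics; the last SNP has F = 4 and is excluded.
example_snps = [
    {"bx": 0.10, "se_x": 0.010, "by": 0.030, "se_y": 0.010},
    {"bx": 0.08, "se_x": 0.010, "by": 0.020, "se_y": 0.012},
    {"bx": 0.12, "se_x": 0.015, "by": 0.040, "se_y": 0.015},
    {"bx": 0.02, "se_x": 0.010, "by": 0.010, "se_y": 0.020},
]
example_result = ivw_fixed_effect(example_snps)
```

A large Q relative to its degrees of freedom would indicate that the per-SNP ratios disagree more than sampling error allows, which is the heterogeneity that the funnel plot displays visually.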

Discussion
In this study, we investigated the correlation between various blood cell indicators and NAFLD, combining a 2017-2018 NHANES cross-sectional study with a two-sample MR analysis. Ours is the first study to explore this correlation by integrating NHANES data and extensive genetic analysis. Ultimately, we established a connection between blood cell indicators and an increased risk of NAFLD.
The inflammatory response is associated with NAFLD based on the hypothesis of multiple parallel hits that connects intestinal and adipocyte cells [24]. Elevated white blood cell counts and hemoglobin levels are independently correlated with NAFLD, which might reflect subclinical, low-grade systemic inflammation [25]. Platelets act as mediators of thrombosis and play a crucial role in NAFLD progression as they promote a pro-thrombotic and pro-inflammatory environment [26]. Increased binding of leukocytes and platelets, potentially through mechanisms such as neutrophil extracellular traps, may exacerbate inflammation and contribute to the development and progression of NAFLD [27]. Blood cell indicators could be promising biomarkers for NAFLD monitoring in the clinic. However, the overall connection between blood cell parameters, the significance of each indicator, and whether a causal relationship exists with NAFLD remain unclear.
In our observational study, WQS analyses of 14 variables showed a significant positive correlation with NAFLD incidence. Hemoglobin and lymphocyte counts carried most of the weight, indicating their significant influence. Lymphocytes are a very important class of immune-related cells, and the finding of a nonlinear relationship highlights the significance and complexity of lymphocytes in NAFLD, which may be closely related to the different subtypes of lymphocytes and their secreted factors [28]. Specifically, interleukin-6 possesses both inflammation-promoting and inflammation-suppressing attributes [29], although the explicit mechanism requires further study. Given its significance, the lymphocyte count may be an appropriate indicator in routine blood examination to monitor the course and progression of NAFLD. Observational studies may be biased by unmeasured or uncontrolled factors that affect the accuracy of results and may reflect reverse causation, so it is important to explore the relationship between blood cell indicators and NAFLD risk from a genetic perspective.
In the MR analysis, no causal link was found between routine blood cell indicators and NAFLD. However, a surprising positive association was observed with reticulocyte count. To establish a robust correlation, we selected IVs strongly linked to hematological markers from the GWAS summary data that met the genome-wide significance threshold (p < 5e-08) after linkage disequilibrium was removed, ensuring more reliable and independent outcomes [30]. The pleiotropy and heterogeneity examinations indicated that the IVs influenced the onset of NAFLD through reticulocytes. However, the study did not establish a mechanism for how reticulocytes could cause NAFLD or liver-related metabolic disorders. Circulating cytokines and growth factors were not found to be mediators, suggesting unknown reticulocyte mediators in NAFLD pathophysiology. Therefore, further investigation is needed to understand these mechanisms.
The main advantage of this study is the use of the NHANES database and MR analysis to explore the association between blood cell parameters and NAFLD risk. This combination of methods improves the reliability of the findings. Our MR analyses, which included extensive data, provided sufficient statistical power to estimate the correlation between blood cell indicators and NAFLD. However, a few limitations of our study should be acknowledged. Information on NAFLD was obtained from the questionnaire and the controlled attenuation parameter (CAP). Although CAP is a useful non-invasive tool for assessing NAFLD and is more applicable to large-scale population studies, it has limitations compared with biopsy, especially in clarifying the severity of NAFLD. Due to limitations in study sample size, variable details, and data access, lipid-lowering medications and alcohol consumption were excluded as covariates, making it impossible to adequately assess their impact in this study and limiting the exclusion of certain alcohol-related diseases, such as alcohol-induced chronic pancreatitis and alcoholic fatty liver. Although diabetes was adjusted for as a covariate in our study sample, the strong association between diabetes mellitus and NAFLD cannot be disregarded. Future studies will therefore include larger samples for a more comprehensive analysis. NHANES data included individuals with missing blood cell information, which may have introduced selection bias, although the consistency of the distribution of participants suggested otherwise. RCS analyses explored nonlinear associations, but limited statistical power might have masked substantial connections. Our findings were based on European  MR studies may oversimplify relationships and not capture full causality for complex traits influenced by many genetic and environmental factors. Although we took particular care to use a variety of statistical methods, including heterogeneity analyses, Egger's method for detecting pleiotropy, and sensitivity analyses, to increase the robustness of the results, it is difficult to thoroughly address all hypothetical challenges, and residual confounding or pleiotropic effects may still exist. The study concluded that a positive causal connection exists between reticulocyte count and the incidence of NAFLD, but it only identified concurrent phenomena related to liver disease [31] and inflammatory bowel diseases [32]; the mechanism of the interaction has not been elucidated, and further prospective and mechanistic studies are required for validation.

Conclusions
Our study suggests that certain hematological markers may be independent risk factors for NAFLD, although they lack evidence of a causal link with its incidence. A positive causal relationship was identified between reticulocyte count and NAFLD occurrence. These findings lay a foundation for monitoring NAFLD pathogenesis, but the detailed mechanisms underlying this correlation require further investigation.
Biobank and the INTERVAL study conducted by Astle et al. The analysis encompassed 36 indicators associated with red blood cells, white blood cells, platelets, and other related factors. Adjusted for age, sex, and BMI, GWAS data for 41 peripheral circulating cytokines and growth factors were sourced from the Ahola-Olli et al. study, involving 8293 Finnish individuals in the Cardiovascular Risk Study of Young Finns and FINRISK studies

Fig. 2 WQS analysis of blood cells for nonalcoholic fatty liver disease. A: Unadjusted impact of blood cells on NAFLD; B: Unadjusted association between blood cell WQS index and NAFLD; C: Partially corrected impact of blood cells on NAFLD; D: Partially corrected association between blood cell WQS index and NAFLD; E: Fully corrected impact of blood cells on NAFLD; F: Fully corrected association between blood cell WQS index and NAFLD. (WQS: Weighted Quantile Sum)

Fig. 3 Restricted cubic spline analysis between blood cells and NAFLD. A: Restricted cubic spline analysis of LBXRDW and NAFLD; B: Restricted cubic spline analysis of LBDEONO and NAFLD; C: Restricted cubic spline analysis of LBDLYMNO and NAFLD; D: Restricted cubic spline analysis of LBXPLTSI and NAFLD; E: Restricted cubic spline analysis of LBXWBCSI and NAFLD. (LBXWBCSI: White blood cell count, LBDEONO: Eosinophils number, LBDLYMNO: Lymphocyte number, LBXRDW: Red cell distribution width, LBXPLTSI: Platelet count SI)

Fig. 4 MR result of Reticulocyte count on NAFLD. A: Causal Forest Plot of Reticulocyte count on NAFLD; B: Causal Scatter Plot of Reticulocyte count on NAFLD; C: Heterogeneity Analysis Funnel Plot of Reticulocyte count on NAFLD. (IVW: Inverse-variance weighted)

Statistical analysis
R software (version 4.3.1) was employed for all data computations and statistical analyses. Continuous variables are expressed as medians (interquartile ranges) and were tested using the Wilcoxon rank-sum test. Categorical variables, presented as frequencies, underwent analysis using the chi-square test. Weight calculation utilized WTINT2YR, and the survey package was employed for weighted logistic and linear regressions. Significance for p was assumed if not otherwise specified.

Table 2
Multivariate logistic regression of blood cells on NAFLD (all participants)

Table 3
Heterogeneity and Pleiotropy test between Reticulocyte count and NAFLD