New susceptibility alleles associated with severe coronary artery stenosis in the Lebanese population

Coronary artery disease (CAD) is the narrowing or blockage of the coronary arteries. It is closely associated with numerous genetic and environmental factors that have been extensively evaluated in various populations. In recent studies, severe phenotypes have been strongly linked to genetic risk factors. This study investigated the association of clinical, demographic, and genetic factors with severe coronary artery stenosis phenotypes in our population by comparing 1734 individuals with severe coronary stenosis (≥ 50% in coronary vessels) to 757 controls with no evidence of stenosis on angiography. We performed generalized linear model (GLM) genome-wide association studies to evaluate three stratification models and their associations with characteristics of the clinical disease. In model 1, patients were not stratified. In model 2, patients were stratified based on presence or absence of CAD family history (FxCAD). In model 3, patients were stratified by young age of CAD onset. Eight single nucleotide polymorphisms (SNPs) were significantly associated with severe CAD phenotypes in the various models (p < 5 × 10⁻⁷): four of these SNPs were associated with severe CAD and the other four were specifically significant for young CAD patients. While these SNPs were not previously reported in association with CAD, six of them are present in genes that have already been linked to coronary disease. In conclusion, this study presents new genetic factors associated with severe stenosis and highlights different risk factors associated with a young age at diagnosis of CAD.

Family and twin studies estimated that the heritability of CAD ranges between 40 and 60% [8]. One locus (9p21) was shown to be associated with CAD, with a 30% increased risk per copy of the risk allele [9][10][11][12]. Additional loci have been identified in different populations, mainly of European descent, via analyses of groups with significantly larger sample sizes [13,14]. In 2015, the CARDIoGRAMplusC4D Consortium published a genome-wide association study (GWAS) meta-analysis of 185,000 CAD cases and controls. This analysis investigated 6.7 million common variants as well as 2.7 million low-frequency variants and identified 10 novel loci associated with CAD [15].
Our study aims to compare a group of patients having more than 50% stenosis in any coronary vessel to control subjects with no angiographic evidence of stenosis in these vessels. By studying the two extremes of this condition, we aim to unravel strong genetic and environmental contributors to CAD. Our study revisits the significance of the previously mentioned risk factors of CAD in the Lebanese population and explores novel genomic loci.

Patient description
Patients were recruited at two major hospitals in Lebanon between May 2007 and June 2010 as part of a multi-center cross-sectional study for the FGENT-CARD Consortium (https://cordis.europa.eu/project/rcn/85024/factsheet/en) [16]. The degree of stenosis in selected coronary arteries was assessed by attending cardiologists and assigned a score as a percentage. In the entire sample population, four vessels were visualized and evaluated by angiography: the left main coronary artery, the left anterior descending artery, the right coronary artery, and the circumflex artery. The angiography results led to the classification of individuals according to the level of stenosis in the visualized coronary arteries. The date of the first workup cardiac catheterization, coronary artery bypass graft (CABG) and/or percutaneous transluminal coronary angioplasty (PTCA) was used to determine the age at CAD diagnosis.
Cardiologists performing the coronary angiography collected a 20 mL blood sample from the peripheral arterial access site of patients. Genomic DNA was extracted using a standard phenol chloroform extraction procedure. Trained healthcare professionals collected further data on the socio-demographic background of all patients (Additional file 1).
The epidemiological factors were determined from patients' charts or status at the time of enrollment. Positive smoking was defined as the smoking status of the patient at the time of enrollment. Type 2 diabetes mellitus (T2DM) was mainly based on the patient's medical records and confirmed by medication intake, such as insulin or oral hypoglycemic drugs, at the time of enrollment. Hypertension and hyperlipidemia were also diagnosed by a physician according to guidelines at the time of presentation and confirmed by the prescription of anti-hypertensive and hypolipidemic drugs prior to the time of enrollment.
Consanguinity was evaluated with three categories of relationship. The first category (c1) represented individuals whose parents are first cousins, the second category (c2) represented individuals whose parents are second cousins, and the third category (c3) represented individuals whose parents have a more distant relationship. In addition, family history of CAD and consanguinity status were combined into four groups to evaluate the age at diagnosis across the combinations of these two factors. The groups were generated as follows: 'f0c0' for patients with neither evident family history nor consanguinity, 'f0c1' for patients with first-cousin consanguinity but no family history, 'f1c0' for patients with family history of CAD but no significant consanguinity, and 'f1c1' for patients with both family history and first-cousin consanguinity.
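The four combined groups can be illustrated with a short sketch. The function name and the boolean inputs are hypothetical, and the sketch assumes that only first-cousin consanguinity counts as 'c1' for grouping purposes, which is how the group definitions above read.

```python
def subgroup_label(family_history: bool, first_cousin_parents: bool) -> str:
    """Build the combined label, e.g. 'f1c0' for a positive family
    history of CAD without first-cousin consanguinity.

    Assumption: more distant consanguinity is grouped with 'c0'.
    """
    return f"f{int(family_history)}c{int(first_cousin_parents)}"


# Enumerate the four possible groups in the order used in the text.
labels = [subgroup_label(fh, fc) for fh in (False, True) for fc in (False, True)]
print(labels)  # ['f0c0', 'f0c1', 'f1c0', 'f1c1']
```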

Study population
The initial cohort consists of 7710 individuals. In this study, however, we included the 2643 individuals who corresponded to the phenotypic selection of severe CAD and controls and for whom GWAS data were available. Previous analyses of the population showed consistency in the associations across the population stratifications, with a possible increase in variability due to the small sample size [12].

Selection of cases and controls
Subjects were assigned as cases or controls according to their stenosis levels. Cases were patients with at least 50% obstruction in any of the coronary arteries visualized and controls were patients with no stenosis [17]. Out of the 2643 subjects, we identified 1734 patients as cases and 757 subjects as controls making a total study population of 2491.
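The assignment rule can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name and the example subjects are hypothetical, and it assumes that subjects with some stenosis below 50% belong to neither group.

```python
from typing import Optional, Sequence


def assign_status(stenosis_pct: Sequence[float]) -> Optional[str]:
    """Classify one subject from per-vessel stenosis percentages.

    Returns 'case' if any vessel is at least 50% stenosed, 'control' if
    every vessel shows 0% stenosis, and None otherwise (assumed to be
    excluded from the case-control comparison).
    """
    if max(stenosis_pct) >= 50:
        return "case"
    if all(p == 0 for p in stenosis_pct):
        return "control"
    return None


# Hypothetical subjects: (left main, LAD, right coronary, circumflex)
subjects = {
    "A": (0, 70, 20, 0),
    "B": (0, 0, 0, 0),
    "C": (0, 30, 10, 0),
}
status = {sid: assign_status(v) for sid, v in subjects.items()}
print(status)  # {'A': 'case', 'B': 'control', 'C': None}

# Counts reported in the text.
n_cases, n_controls = 1734, 757
print(n_cases + n_controls)  # 2491
```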

Statistical analysis
The association of the different variables with the disease was evaluated by the Pearson chi-square test, adjusted through logistic regression, and the comparison of means was done through independent t-tests, with a significance threshold of p < 0.05. For every binomial variable, the risk estimate was evaluated by the odds ratio (OR) with 95% confidence intervals. The odds ratios in the logistic regression were adjusted for standard risk factors for coronary artery disease, including age, sex, BMI, type 2 diabetes mellitus, hypertension, hyperlipidemia, smoking and family history of CAD, which were previously established to be related to CAD and young CAD [18]. All means, frequencies, and standard deviations were generated using IBM-SPSS [19].
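For a single binomial variable, the unadjusted versions of these quantities can be computed directly from a 2x2 table. The sketch below uses the Wald (Woolf) interval on the log odds ratio and the uncorrected Pearson chi-square statistic; it does not reproduce the covariate adjustment through logistic regression described above, and the counts are hypothetical.

```python
import math


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Unadjusted odds ratio with a Wald confidence interval.

    2x2 table layout:
        a = exposed cases,    b = unexposed cases
        c = exposed controls, d = unexposed controls
    z = 1.96 corresponds to a 95% interval.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


def pearson_chi2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square statistic (1 degree of freedom, no continuity correction)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Hypothetical counts, not taken from the study tables.
or_value, ci_low, ci_high = odds_ratio_ci(60, 40, 40, 60)
chi2 = pearson_chi2(60, 40, 40, 60)
print(f"OR = {or_value:.2f}, 95% CI = [{ci_low:.2f}-{ci_high:.2f}], chi2 = {chi2:.1f}")
```

An interval that excludes 1 corresponds to an association that is significant at the matching level, which is how the OR and CI pairs in the results can be read.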

Genome wide association study
DNA from the study population was genotyped with the Illumina Human610-Quad BeadChip or the Illumina Human660W-Quad BeadChip, and the genotype data were pruned in PLINK [20,21] to select valid genomes and autosomal regions. A total of 1745 subjects with a genotyping success rate of more than 97% were used for the analyses.
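The per-subject genotyping success filter can be sketched as below. This is an illustration of the idea only, not the PLINK procedure used in the study; the genotype encoding (0, 1 or 2 copies of an allele, with None for a failed call), the function names and the sample data are assumptions.

```python
from typing import Dict, List, Optional

Genotype = Optional[int]  # 0, 1 or 2 allele copies; None marks a failed call


def call_rate(genotypes: List[Genotype]) -> float:
    """Fraction of SNPs with a successful genotype call for one subject."""
    called = sum(g is not None for g in genotypes)
    return called / len(genotypes)


def filter_by_call_rate(
    samples: Dict[str, List[Genotype]], min_rate: float = 0.97
) -> Dict[str, List[Genotype]]:
    """Keep subjects whose call rate is strictly above min_rate."""
    return {sid: g for sid, g in samples.items() if call_rate(g) > min_rate}


# Hypothetical data: 100 SNPs per subject.
samples = {
    "S1": [0] * 99 + [None],      # 99% call rate, kept
    "S2": [1] * 95 + [None] * 5,  # 95% call rate, dropped
}
kept = filter_by_call_rate(samples)
print(sorted(kept))  # ['S1']
```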
Three different models, using different combinations of factors previously proven significant in the study population, were evaluated with the 'snpStats' package [22] in R 3.5.3 [23].
Model #1 includes the basic epidemiological determinants for coronary artery disease: sex, T2DM, hyperlipidemia, and hypertension.
Model #2 includes all factors included in Model #1 in addition to family history of CAD. Family history (Fx) was established for the patients if any first- or second-degree family member was clinically diagnosed with stenosis in coronary arteries.
Model #3 includes all the factors included in Model #2 in addition to Young age at diagnosis. Young age at diagnosis of CAD (YADCAD) was added as a variable to the population and was assigned to patients that were diagnosed at an age one standard deviation (SD) lower than the mean age of the population [18].
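The YADCAD definition amounts to a cutoff of the mean minus one standard deviation. With the summary values reported in the results for severe CAD patients (mean 60.9 years, SD 11.1), this gives about 49.8 years, which is in line with the 49-year threshold stated there; how the authors rounded the cutoff is not specified. The sketch below applies the same rule to hypothetical ages and assumes the sample (n - 1) standard deviation.

```python
import statistics
from typing import List, Sequence


def young_onset_threshold(ages: Sequence[float]) -> float:
    """Mean age at diagnosis minus one sample standard deviation."""
    return statistics.mean(ages) - statistics.stdev(ages)


def flag_young_onset(ages: Sequence[float]) -> List[bool]:
    """True for each patient diagnosed below the mean - 1 SD cutoff."""
    cutoff = young_onset_threshold(ages)
    return [age < cutoff for age in ages]


# Hypothetical ages at diagnosis, not study data.
ages = [42, 48, 55, 60, 61, 63, 66, 70, 72, 75]
flags = flag_young_onset(ages)
print(flags)

# Cutoff implied by the summary statistics reported in the results.
reported_cutoff = round(60.9 - 11.1, 1)
print(reported_cutoff)  # 49.8
```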

SNP identification and annotation
The R package 'qqman' was used to plot the p values generated from the generalized linear model (GLM) representing the association of SNPs with extreme stenosis [24]. The Manhattan plot was annotated with a significance threshold of p = 5 × 10⁻⁷ for the autosomes only. The quantile-quantile (Q-Q) plot was generated to show appreciable deviation from the expected logarithmic p values for the significant nucleotide variations in the three different models.
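The quantities behind both plots are simple transformations of the p values. The sketch below flags SNPs under the 5 × 10⁻⁷ threshold and builds the expected and observed −log10(p) pairs of a Q-Q plot under a uniform null. The SNP names and p values are hypothetical, and the plotting position (i − 0.5)/n is one common convention rather than a statement about what 'qqman' computes internally.

```python
import math
from typing import Iterable, List, Tuple

GWAS_THRESHOLD = 5e-7  # significance threshold stated in the text


def neg_log10(p: float) -> float:
    """Transform a p value to the -log10 scale used on both plots."""
    return -math.log10(p)


def qq_points(p_values: Iterable[float]) -> List[Tuple[float, float]]:
    """Return (expected, observed) -log10(p) pairs, smallest p first.

    Expected p values under the null are taken as (i - 0.5) / n for the
    i-th smallest of n observed p values.
    """
    observed = sorted(p_values)
    n = len(observed)
    return [
        (neg_log10((i - 0.5) / n), neg_log10(p))
        for i, p in enumerate(observed, start=1)
    ]


# Hypothetical association results.
p_values = {"snp_a": 2e-8, "snp_b": 3e-4, "snp_c": 0.2, "snp_d": 0.7}
hits = [snp for snp, p in p_values.items() if p < GWAS_THRESHOLD]
print(hits)  # ['snp_a']

points = qq_points(p_values.values())
```

A point whose observed value lies well above its expected value, as for the smallest p value here, is what appears as an upward departure from the diagonal.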
The list of annotated SNPs that were reported to be significant was cross-referenced against several genome databases, such as the Human GRCh38/hg38 assembly from the University of California, Santa Cruz [25,26] and the "1000 Genomes Browser" from the National Center for Biotechnology Information [27,28]. Results were reported based on the most recent version of the genome database: Genome Reference Consortium Human Genome build 38 [29].

Linkage disequilibrium calculation
Linkage disequilibrium (LD) for the most significant SNPs in the association models was evaluated with the LDlink application from the National Institutes of Health [30]. Both r² and D′ were extracted for pairs of SNPs in the European (EUR) population of the database, which includes Utah Residents from North and West Europe (CEU), Tuscans in Italy (TSI), Finnish in Finland (FIN), British in England and Scotland (GBR), and the Iberian population in Spain (IBS).
LD was calculated between the three significant SNPs from association models 1 and 2 and one variant previously associated with CAD in the Lebanese population.

Population and subgroup descriptive statistics
Our study consisted of 2491 participants (Additional file 2), with 757 (30.3%) individuals having no stenosis and 1734 (69.5%) diagnosed with more than 50% stenosis in at least one of their major coronary arteries. A total of 254 (14.6%) affected individuals were categorized as young for CAD diagnosis, with a mean age of 44.4 years (± 4.4) compared to 64.2 years (± 8.8) for affected patients older than 49 years, the latter being the threshold between the two categories.
The overall mean age of onset in severe CAD patients was 60.9 (± 11.1) years compared with the control population mean age of 57.6 (± 11.5) years, with a significance of p = 4.5 × 10⁻⁷. Within the affected population, the mean age at diagnosis for subgroup 1 'f0c0' was 63.3 years, compared to 62.2 years for subgroup 2 'f0c1', 59.8 years for subgroup 3 'f1c0' and 59.5 years for subgroup 4 'f1c1'. Even though the difference in the age of onset between the four subgroups did not reach statistical significance, the mean age of onset decreased across positive family history and consanguinity subgroups (Fig. 1). This trend shows a probable correlation of the age at diagnosis with positive family history and consanguinity.
Family history showed a significant association with the disease phenotype, with a p value of 0.002 and an OR of 1.4 (CI = [1.1-1.8]). In addition, 64% of the general population identified a direct family member with CAD and 71.5% of the affected population reported a family history of CAD (Table 1). The direct effect of consanguinity was not statistically significant (p = 0.43). This might be because consanguinity was hard to evaluate in our population owing to social bias (Table 2).
Hypertension had a very strong association with the phenotype of the patients, with a p value < 0.001 and an odds ratio of 2.0 (CI = [1.6-2.5]) (Table 1).
In addition to the standard epidemiological factors evaluated in CAD, we considered young age at diagnosis as another stratification factor. This additional filtering aims at identifying factors that play a role in the earlier expression of the disease. The results yielded a significant correlation of the disease with smoking. The mean age at diagnosis for the young cases was 44.4 (± 4.41) years and was significantly lower than that of the general severe CAD population (p < 0.05). In the YADCAD group, 74% had a positive current smoking status (p = 8.07 × 10⁻¹⁰) (Table 3). T2DM and hypertension showed a negative correlation with this group (Table 3). More than 70% of the YADCAD population had no history of diabetes (p = 0.00038). Likewise, hypertension was less frequent in the young diagnosed population, with only 43.9% of this population being diagnosed as hypertensive (p = 7.9 × 10⁻¹⁴) (Table 3). In contrast, family history showed a significant positive correlation with this category of the affected population, with 77.6% of the YADCAD population having a positive family history of CAD (p = 0.000002) (Table 3).

Linkage disequilibrium
In the first model, three of the reported SNPs are present on the same chromosome: rs9368648, rs9391637, and rs9295937. LD was calculated in pairs, and for all three pairs r² and D′ were equal to 1. The remaining SNPs reported as significant in the three models are not on the same chromosome, and thus LD was not studied for them (Table 5).
LD was also calculated between the three above-mentioned SNPs and rs9349379, a locus in the PHACTR1 gene that was recently reported in association with the CAD phenotype in the Lebanese population [32]. D′ was 1 for all three pairs, even though the r² values were low. For the LD with rs9368648, rs9391637, and rs9295937, the r² values were 0.000743385, 0.00124474 and 0.00124341, respectively (Table 5).
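A D′ of 1 together with a very low r² is possible because the two measures are scaled differently: D′ divides the disequilibrium coefficient D by its maximum attainable value given the allele frequencies, whereas r² divides D² by the product of all four allele frequencies. The sketch below uses the standard definitions with hypothetical haplotype and allele frequencies, which are not those of the SNPs in this study.

```python
from typing import Tuple


def ld_stats(p_ab: float, p_a: float, p_b: float) -> Tuple[float, float, float]:
    """Return (D, |D'|, r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a, p_b: frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b
    # The largest |D| compatible with the allele frequencies depends on the sign of D.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


# Hypothetical case: a rare allele A (1%) that only ever occurs on
# haplotypes carrying a common allele B (50%).
d, d_prime, r2 = ld_stats(p_ab=0.01, p_a=0.01, p_b=0.5)
print(f"D = {d:.4f}, D' = {d_prime:.2f}, r2 = {r2:.4f}")
```

In this example one of the four possible haplotypes is never observed, so D′ reaches 1, while the large difference in allele frequencies keeps r² near 0.01.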

Discussion
In this study, we evaluated the severe CAD phenotype in the Lebanese population with its unique characteristics and genetic heritage [33]. Our results deviated from previous reports of CAD in the Lebanese population. This deviation likely results from the different and more stringent criteria employed for the selection of cases and controls, as well as the various models used in the genome-wide association analyses.
The present study population included patients who were referred to cardiovascular consultation for different presentations and were assessed in different hospitals in Lebanon. This variability in clinical diagnosis could be considered a limitation. In addition, atherosclerosis was assumed to be the cause of coronary vessel stenosis, which could also be considered a limitation of the study. Moreover, the association of consanguinity was not completely evaluated, as some population bias may have interfered with the reporting of this variable.
The standard epidemiological determinants of CAD showed significant correlation in our population. In addition, the population stratification by the positive/negative consanguinity and family history factor showed a trend of correlation with the age at diagnosis. Although the difference between the mean ages of the four categories is not statistically significant, there was a sequential age decrease from the category with no family history or consanguinity to the category positive for both conditions. This stratified trend is relevant to the Lebanese population where consanguinity is considered high compared to other societies [32].
Table 4 Summary table for the SNPs significant for each of the generalized linear models (GLMs). The significance level was set at 5 × 10⁻⁷ for each of the three models, and the nucleotide locations and annotations were based on the human genome reference GRCh38/hg38.
The GWAS of the first two models showed genome-wide significant CAD susceptibility SNPs common to both models. These SNPs had not been reported in the CAD literature. The most significant SNP in both models, rs9368648, is an intergenic variant on chromosome 6. This variant is 2 kb downstream from HCG21. HCG21 is responsible for the expression of human leukocyte antigen 21 (HLA), which has been linked to the inflammatory process involved in coronary disease and other vascular diseases. The second most significant variant, rs9391637, on chromosome 6, is an intronic variant of MUCL3, which is responsible for the expression of the MUCL3 protein and acts as an enhancer for genes related to CAD and mean platelet volume, with no definitive linkage to the diseases [34]. The third significant variant for both models 1 and 2 is rs9295937. This polymorphism is on chromosome 6, in the pseudogene NAPGP2. This region of the genome has already been linked to congenital heart defects and conotruncal heart defects [35], which are related to abnormalities in the outflow of blood from the heart and may play a significant role in the blood flow of major vessels [36]. The three above-mentioned SNPs are also in strong linkage disequilibrium, which reinforces the importance of this region of chromosome 6 in coronary disease. Within this region, seven variants have previously been associated with CAD, including one in PHACTR1 that was associated with CAD in the Lebanese population in a prior study [13,37]. Linkage disequilibrium between the three most significant SNPs in this study and the variant previously reported in PHACTR1 generated low r² values.
These low values underscore the genetic variability between populations and the need to have susceptibility allele studies independently replicated in numerous populations [38].
The variant rs17005877 is an intergenic variant on chromosome 12 that was significant only in the first model. It is located between the PAWR and PPP1R12A genes, and its association with CAD has never been reported before. PAWR, located 72.6 kb upstream of rs17005877, is responsible for the regulation of the WT1 pro-apoptotic pathway involved in cardiovascular differentiation and disease [39]. PPP1R12A, located 9.7 kb downstream of rs17005877, is involved in the expression of light-chain myosin phosphatase. This protein belongs to the myosin targeting subunit (MYPT) family, which is involved in the regulation of light-chain myosin phosphatase, recently implicated in the aberrant contractility associated with atherosclerosis [40][41][42].
Young age at diagnosis was used to stratify the patient population and to investigate the factors that may be linked to early-onset CAD. The third model identified four novel CAD susceptibility SNPs. The first, rs6778944, is an intergenic variant on chromosome 3, 44 kb away from a pseudogene, RNU6-348P, not previously reported in CAD patients. The second, rs12593069 on chromosome 15, is intronic to PCSK6. PCSK6 is involved in lipid metabolism, and variations in this gene have previously been associated with atherosclerosis [43]. PCSK9, another gene of the same family, has been reported in several GWAS as significantly associated with hyper-LDL-cholesterolemia in CAD [13,37,43]. The third SNP, rs785529 on chromosome 8, is 60 kb away from DLC1, a gene that is responsible for Rho-GTPase activity and is highly expressed in fat tissues [44]. DLC1 was recently linked to congenital heart disease in the Chinese population [45]. In addition, the general functions of Rho-GTPase are closely related to cardiovascular disease in the context of vessel contraction, oxidative stress, and inflammation, and Rho-GTPase is being targeted as a potential treatment for cardiovascular disease in general [46,47]. The last significant SNP in our model, rs2343305 on chromosome 10, is an intronic variant of TSPAN14. This tetraspanin gene has its highest expression in fat tissues and is related to the interaction of platelets with endothelial cells. It may interact with the inflammatory pathway of atherosclerosis, which has a major role in CAD, but it has never previously been reported as directly associated with the disease [44,48].
In our population, stratification validated the importance of YADCAD and helped us elaborate on the uniqueness of this group, which showed distinctive epidemiological factors and loci for severe CAD. The replication of the association with different factors showed variations in the genes correlated with the severe phenotype of the disease.
Overall, the analysis of significant SNPs for severe CAD in the young population revealed new loci related to the disease, some of which belong to gene families already mentioned in the CAD literature, namely those involved in lipid metabolism and atherosclerosis-related inflammation.

Conclusion
This study evaluated how the affected CAD population differs by age of onset, both in epidemiological determinants and in genome-wide variations. Young CAD patients should be considered separately, as the epidemiological determinants of the disease vary and even the genomic associations differ. Continued exploration of clinical presentations and genetic variations is required to better understand the distinctiveness of young-age CAD, and the public health community should adapt treatment and prevention to younger populations and their specific risk factors compared to the general population.