Association of SMAD7 genetic markers and haplotypes with colorectal cancer risk

Purpose: Colorectal cancer (CRC) is one of the most common cancers worldwide and has a high mortality rate. In Iran, the incidence of colorectal cancer has increased over the last three decades, which underscores the need for early diagnosis. Genetic factors play an influential role in its etiology, along with conventional risk factors such as age, diet and lifestyle. Genome-wide association studies (GWAS) have shown significant associations between SMAD7 gene variants and CRC risk. This study aimed to assess the association of selected polymorphisms and haplotypes of this gene with the risk of colorectal cancer.

Methods and materials: This study was designed as a case–control association study. After ethical approval and informed consent were obtained, blood samples were collected from 209 patients with colorectal cancer and 200 healthy controls, and DNA was extracted. Four variants (rs4939827, rs34007497, rs8085824 and rs8088297) were genotyped using the ARMS-PCR method.

Results: SMAD7 rs4939827 was associated with colorectal cancer risk in the recessive and co-dominant models [TT/CT + CC: OR = 2.90, 95% CI (1.38–6.09), p = 0.005; CC + TT/CT: OR = 1.66, 95% CI (1.00–2.75), p = 0.01]. Haplotype analysis indicated that some SNP combinations, including the four-SNP haplotypes T-T-C-C and T-C-C-A, were significantly associated with CRC risk.

Conclusion: Based on the identified association of SMAD7 gene variations and haplotypes with colorectal cancer risk in our population, genetic variation in this gene region may play a role in CRC development. These data may shed light on the genetic predisposition to CRC, which involves different pathways including TGF-β signaling.

Supplementary Information: The online version contains supplementary material available at 10.1186/s12920-021-01150-3.


Introduction
Cancer is a heterogeneous disease and one of the most significant health problems in developed, industrialized and developing countries [1]. Colorectal cancer is the most common malignancy among gastrointestinal malignancies, accounting for 13% of all digestive tract malignancies [2]. According to epidemiological reports, 60% of CRCs occur in developed countries; however, the incidence in these countries is continuously decreasing. This contrasts with developing countries, particularly in Asia, where the incidence has been increasing over the past two decades [3]. Among cancers in men and women, CRC ranks third in incidence and fourth in mortality [4]. In Iran, colorectal cancer incidence has increased over the past three decades, and CRC is reported as the third and fourth most common cancer in males and females, respectively [5].
Genome-wide association studies (GWAS) have identified several risk loci associated with CRC, some of which are involved in the TGF-β signaling pathway. Within this pathway, one of the most common polymorphisms associated with CRC risk is located in the SMAD7 gene [6].
It has been reported that the transforming growth factor beta (TGF-β) signaling pathway controls processes such as tumor initiation, cell proliferation, invasion and metastasis. This pathway has been considered both a tumor suppressor pathway and a promoter of tumor invasion and progression [6]. SMAD proteins act as intracellular signal transducers that mediate TGF-β signaling downstream of its receptors [7]. SMAD7 is an inhibitory SMAD that binds the TGF-β type I receptor and thereby acts as a negative regulator of the TGF-β signaling pathway. Consequently, SMAD7 activity can markedly reduce TGF-β signaling, which may lead to increased cancer risk [8,9].
One of the main problems in CRC is late diagnosis; therefore, there is a growing need for molecular studies that lead to the identification of biomarkers for early diagnosis [10]. Given the multifactorial nature of colorectal cancer, identifying the genetic and environmental factors involved in its pathogenesis would improve our understanding of the disease and support its better management [11]. Molecular variations related to a particular disorder may offer a new perspective for preventive strategies, and advances in genome characterization could lead to progress in personalized medicine. Single nucleotide polymorphisms (SNPs) have been shown to be appropriate markers for predicting susceptibility to various diseases, including colorectal cancer, and can be considered a route toward personalized medicine [12].
Several studies have reported that the effect of risk alleles on CRC risk differs across populations, owing to differences in allele frequencies, linkage disequilibrium (LD) structure, and genetic and environmental backgrounds. Therefore, identifying these variations could be a valuable tool for risk prediction in our population [13,14]. In the present study, we conducted a case–control study to investigate the potential association of SMAD7 gene polymorphisms with the risk of colorectal cancer in an Iranian population. Moreover, we assessed the association of these markers with clinical features and tumor characteristics.

Study populations
The study was performed in accordance with the Declaration of Helsinki and the relevant guidelines of the institutional ethics committee, and was approved by the ethics committee of Mashhad University of Medical Sciences (ethical approval number IR.MUMS.SM.REC.1396.234). The inclusion criterion was a diagnosis of sporadic CRC at age ≥ 40 years. The exclusion criteria were: (1) familial CRC or a strong family history of colorectal cancer; (2) known genetic susceptibility syndromes (e.g., Lynch syndrome, FAP); and (3) inability to provide informed consent. In this case–control study, a total of 409 Iranian individuals were enrolled between 2016 and 2018. The case group included 209 patients with sporadic CRC confirmed by colonoscopy and pathology, and the control group consisted of 200 healthy individuals with normal colonoscopy results. A 5-ml peripheral venous blood sample was collected from each subject (all cases and controls) after written informed consent was obtained.

SNP selection, DNA isolation and genotyping
In our study, polymorphisms were selected based on several criteria. First, we chose SNPs whose association had been validated in numerous GWAS, indicating a strong association with colorectal cancer risk in different populations. Second, we selected SNPs located in the same region so that haplotype analysis could be performed to examine the combined effect of these polymorphisms. Third, we selected markers with an acceptable minor allele frequency (MAF) and heterozygosity to achieve the highest possible study power. Based on these criteria, we selected four polymorphisms to obtain a more comprehensive picture of a possible association of the SMAD7 gene with CRC. Three of them (rs4939827, rs34007497 and rs8085824) were previously reported to be associated with colorectal cancer and are located in intronic regions, while rs8088297 is located in the 3′UTR. The MAFs of all SNPs were > 5%, and average heterozygosity was more than 0.3. Information on the SNPs was obtained from the National Center for Biotechnology Information (NCBI) SNP database (www.ncbi.nlm.nih.gov/SNP, dbSNP build 156).
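The two numeric selection thresholds can be related to each other. Assuming that "heterozygosity" means the expected heterozygosity of a biallelic SNP under Hardy–Weinberg equilibrium, H = 2p(1 − p) for minor allele frequency p, a MAF just above 5% gives H ≈ 0.095, and for a single SNP H > 0.3 requires a MAF of roughly 0.18 or more. The minimal Python sketch below encodes both filters; the function names are hypothetical, and the SNP labels and frequencies in the usage lines are placeholders, not data from this study.

```python
import math

def expected_heterozygosity(maf: float) -> float:
    """Expected heterozygosity 2p(1 - p) of a biallelic SNP under Hardy-Weinberg equilibrium."""
    if not 0.0 <= maf <= 0.5:
        raise ValueError("minor allele frequency must lie in [0, 0.5]")
    return 2.0 * maf * (1.0 - maf)

def min_maf_for_heterozygosity(h: float) -> float:
    """Smallest MAF p whose expected heterozygosity 2p(1 - p) reaches h (0 <= h <= 0.5)."""
    # Solve 2p^2 - 2p + h = 0 and keep the root lying in [0, 0.5].
    return (1.0 - math.sqrt(1.0 - 2.0 * h)) / 2.0

def passes_filters(maf: float, min_maf: float = 0.05, min_het: float = 0.3) -> bool:
    """True if a SNP exceeds both the MAF and the expected-heterozygosity thresholds."""
    return maf > min_maf and expected_heterozygosity(maf) > min_het

# Hypothetical candidate SNPs and MAFs, for illustration only.
candidates = {"snp_a": 0.04, "snp_b": 0.12, "snp_c": 0.35}
selected = [name for name, maf in candidates.items() if passes_filters(maf)]
print(selected)  # ['snp_c']
print(round(min_maf_for_heterozygosity(0.3), 3))  # 0.184
```

In this toy example, snp_b passes the MAF filter but is dropped because 2 × 0.12 × 0.88 ≈ 0.21 falls below 0.3, which shows that the heterozygosity criterion is the stricter of the two.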
Peripheral blood was collected from each participant in EDTA-coated tubes and stored at −80 °C. Genomic DNA was extracted by the standard salting-out method, and its quality was confirmed by electrophoresis on a 1% agarose gel containing Green Viewer. DNA concentration and purity were also evaluated using an Epoch™ Microplate Spectrophotometer (BioTek Instruments Inc., Winooski, VT, USA). The amplification-refractory mutation system (ARMS) PCR method was applied as the main genotyping technique. Primers were designed using the Primer1 and WASP online tools (www.primer1.soton.ac.uk/primer1.html, www.bioinfo.biotec.or.th/WASP/). Primer specifications and PCR set-up protocols are presented in Table 1. PCR reactions were performed in a 10 µl total volume comprising 1 µl of DNA (200 ng), 5 µl of Taq DNA Polymerase 2× Master Mix (Ampliqon) and primers (10 μM stocks) as follows: 0.3 µl of each primer for rs34007497 and rs8085824; 1.3 µl of the forward primer, 1.6 µl of the reverse primer and 1 µl of each control primer for rs4939827; and 1 µl of each primer for rs8088297.
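The reaction recipe above implies final in-reaction concentrations through the dilution relation C_final = C_stock × V_added / V_total. The short Python sketch below applies it to the listed volumes; it assumes every primer stock is 10 µM as stated and that the reaction is brought to exactly 10 µl (the water volume is not given in the text). The helper name is hypothetical.

```python
def final_concentration(stock: float, volume_added_ul: float, total_volume_ul: float = 10.0) -> float:
    """Final concentration after adding `volume_added_ul` of stock to a `total_volume_ul` reaction.

    Uses C1 * V1 = C2 * V2; the result has the same unit as `stock`.
    Assumes the reaction is made up to exactly `total_volume_ul`.
    """
    if volume_added_ul < 0 or total_volume_ul <= 0:
        raise ValueError("volumes must be non-negative and the total volume positive")
    return stock * volume_added_ul / total_volume_ul

# Primer volumes (µl of a 10 µM stock) per 10 µl reaction, as listed in the text.
primer_volumes_ul = {
    "rs34007497 / rs8085824, each primer": 0.3,
    "rs4939827, forward primer": 1.3,
    "rs4939827, reverse primer": 1.6,
    "rs4939827, each control primer": 1.0,
    "rs8088297, each primer": 1.0,
}

for label, vol in primer_volumes_ul.items():
    print(f"{label}: {final_concentration(10.0, vol):.2f} µM")

# 5 µl of a 2x master mix in 10 µl gives a 1x final mix; 200 ng DNA in 10 µl is 20 ng/µl.
print(f"Master mix: {final_concentration(2.0, 5.0):.1f}x")
print(f"Template DNA: {200 / 10.0:.0f} ng/µl")
```

Because the total volume is fixed at 10 µl, each microlitre of a 10 µM primer stock contributes 1 µM to the final reaction, so the listed volumes read directly as final concentrations in µM.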

Statistical analysis
Deviation from Hardy–Weinberg equilibrium (HWE) was assessed using the chi-square test, which was also used to compare allele and genotype frequencies between cases and controls. Odds ratios (ORs) and corresponding 95% confidence intervals (95% CIs) were calculated to estimate the risk conferred by the risk alleles. Results were also adjusted for potential confounders, such as age, sex and BMI, using logistic regression analysis. Age and BMI were compared between the two groups (patients and healthy controls) using Student's t-test. Associations between the selected SNPs and CRC risk were evaluated under three genetic models (dominant, recessive and allelic). In addition, haplotype analysis was performed using PHASE v2.1.1 software, and LD was calculated with 2LD software version 1.00. All statistical analyses were conducted using IBM SPSS Statistics for Windows, version 16.0 (IBM, USA). p < 0.05 was considered statistically significant.
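As an illustration of the first step, the chi-square test for Hardy–Weinberg equilibrium compares observed genotype counts with those expected from the estimated allele frequency (p²n, 2pqn and q²n) and has one degree of freedom for a biallelic SNP. The Python sketch below is a generic implementation, not the SPSS procedure used in the study; it relies on the fact that for 1 df the upper-tail p-value equals erfc(√(χ²/2)), so only the standard library is needed. The function name and the genotype counts in the example are hypothetical.

```python
import math

def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> tuple[float, float]:
    """Chi-square goodness-of-fit test for Hardy-Weinberg equilibrium at a biallelic SNP.

    Returns (chi2, p_value) with 1 degree of freedom
    (3 genotype classes - 1 - 1 estimated allele frequency).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper-tail probability of a chi-square variable with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical control-group genotype counts (AA, AB, BB), for illustration only.
chi2, p = hwe_chi_square(60, 80, 60)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```

With these made-up counts the allele frequency is 0.5, the expected counts are 50, 100 and 50, and the heterozygote deficit gives χ² = 8 with p below 0.01, which would flag a departure from HWE.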

Baseline characteristics
A total of 209 colorectal cancer cases and 200 controls were included in this study. The frequency distributions of baseline characteristics and histological parameters are presented in Table 2. In the healthy control group, the numbers of males and females were equal. There were no significant differences in the distribution of sex (p = 0.36) or age (p = 0.19) between cases and controls. In contrast, BMI differed significantly between cases and controls (p < 0.001). Regarding tumor grade, 11.0% of patients had well-differentiated tumors, while 42.5% and 31.1% had moderately and poorly differentiated tumors, respectively. In the unadjusted analysis, the other three polymorphisms (besides rs4939827) were also significantly associated with colorectal cancer. rs34007497 was associated with disease risk in the recessive genetic model (GG/CC + CG, p = 0.01). rs8085824 was associated with colorectal cancer in the dominant (CC + CT/TT, p = 0.003), recessive (CC/CT + TT, p = 0.04) and allelic (T/C, p = 0.003) genetic models. Furthermore, rs8088297 was associated with colorectal cancer risk in the recessive (CC + AC/AA: OR = 2.33, p = 0.006), co-dominant (AC/CC + AA, p = 0.004) and allelic (A/C, p = 0.01) genetic models. However, after adjustment for age, sex and BMI as confounders, none of these three polymorphisms remained associated with colorectal cancer risk. Electrophoresis results of the polymorphism genotyping are shown in Fig. 1.
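The genetic models referred to above collapse the three genotypes of a SNP into a 2 × 2 case–control table. The Python sketch below shows one common convention (dominant: risk-allele carriers vs. non-carriers; recessive: risk-allele homozygotes vs. all others; allelic: risk vs. reference allele counts), followed by a crude odds ratio, a Woolf (log-based) 95% CI and a 1-df Pearson chi-square p-value. It yields unadjusted estimates only, the function names are hypothetical, and the genotype counts in the example are made up rather than taken from the study.

```python
import math

def two_by_two(case: tuple[int, int, int], control: tuple[int, int, int],
               model: str) -> tuple[int, int, int, int]:
    """Collapse genotype counts (ref/ref, ref/risk, risk/risk) into a 2x2 table (a, b, c, d).

    a = exposed cases, b = unexposed cases, c = exposed controls, d = unexposed controls.
    """
    def split(g):
        rr, rk, kk = g
        if model == "dominant":   # risk-allele carriers vs. reference homozygotes
            return rk + kk, rr
        if model == "recessive":  # risk homozygotes vs. all other genotypes
            return kk, rr + rk
        if model == "allelic":    # risk-allele count vs. reference-allele count
            return rk + 2 * kk, 2 * rr + rk
        raise ValueError(f"unknown model: {model}")
    a, b = split(case)
    c, d = split(control)
    return a, b, c, d

def odds_ratio(a: int, b: int, c: int, d: int) -> tuple[float, float, float, float]:
    """Crude OR, Woolf 95% CI and 1-df Pearson chi-square p-value (no continuity correction)."""
    if 0 in (a, b, c, d):
        raise ValueError("zero cell; use a continuity correction or an exact test")
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    lo, hi = (math.exp(math.log(or_) + z * se) for z in (-1.96, 1.96))
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return or_, lo, hi, math.erfc(math.sqrt(chi2 / 2.0))

# Hypothetical genotype counts (ref/ref, ref/risk, risk/risk); not the study's data.
cases, controls = (60, 100, 49), (70, 100, 30)
for m in ("dominant", "recessive", "allelic"):
    print(m, [round(x, 3) for x in odds_ratio(*two_by_two(cases, controls, m))])
```

Which allele counts as the "risk" allele is a choice made by the analyst; covariate adjustment (age, sex, BMI) requires a regression model rather than a single 2 × 2 table.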

Association between polymorphisms and the colorectal cancer risk
Haplotype analysis showed that the T-C haplotype of rs8085824-rs34007497, as well as the T-C and C-T haplotypes of rs8085824-rs4939827, were associated with CRC risk. In addition, the C-C haplotype of rs8085824-rs4939827 conferred a decreased risk of CRC. For the three-SNP haplotypes of rs8085824-rs34007497-rs4939827, the distributions of C-C-T, C-C-C and T-C-C differed between patients and healthy controls. Also, the A-T-C and A-C-T haplotypes of rs8088297-rs8085824-rs4939827 were associated with an elevated risk of CRC. Detailed results are shown in Table 4. Among the four-SNP haplotypes, T-T-C-C, T-C-C-A and T-C-C-C were associated with risk of CRC. No other haplotype showed a significant association with CRC risk (Additional file 1, haplotype distribution, Table S1).
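Haplotype frequencies must be estimated because, for individuals heterozygous at two loci, unphased genotypes do not reveal which alleles lie on the same chromosome. The study used PHASE v2.1.1 (a Bayesian method) and 2LD; the Python sketch below instead uses the simpler expectation–maximization (EM) algorithm for two biallelic SNPs, a different but widely used technique, and then derives the LD measures D′ and r² from the estimated haplotype frequencies. Genotypes are coded as the number of copies of allele "1" at each locus, and the function names and example genotypes are hypothetical.

```python
from collections import Counter

def em_two_locus(genotypes, n_iter=200, tol=1e-10):
    """Estimate haplotype frequencies for two biallelic SNPs by EM.

    genotypes: list of (g1, g2), each 0, 1 or 2 = copies of allele '1' at that SNP.
    Returns a dict over haplotypes '00', '01', '10', '11' (first digit = SNP 1).
    """
    counts = Counter(genotypes)
    n_chrom = 2 * len(genotypes)
    freq = {h: 0.25 for h in ("00", "01", "10", "11")}  # uniform starting values
    for _ in range(n_iter):
        hap_counts = dict.fromkeys(freq, 0.0)
        for (g1, g2), k in counts.items():
            if g1 == 1 and g2 == 1:
                # E-step for double heterozygotes: phase is 00/11 or 01/10.
                w_cis = freq["00"] * freq["11"]
                w_trans = freq["01"] * freq["10"]
                total = w_cis + w_trans
                share = w_cis / total if total > 0 else 0.5
                for h in ("00", "11"):
                    hap_counts[h] += k * share
                for h in ("01", "10"):
                    hap_counts[h] += k * (1.0 - share)
            else:
                # At most one heterozygous locus: both haplotypes are determined.
                alleles1 = [0] * (2 - g1) + [1] * g1
                alleles2 = [0] * (2 - g2) + [1] * g2
                for x, y in zip(alleles1, alleles2):
                    hap_counts[f"{x}{y}"] += k
        # M-step: expected haplotype counts divided by the number of chromosomes.
        new_freq = {h: c / n_chrom for h, c in hap_counts.items()}
        converged = max(abs(new_freq[h] - freq[h]) for h in freq) < tol
        freq = new_freq
        if converged:
            break
    return freq

def ld_stats(freq):
    """Return (D, D', r^2) from two-SNP haplotype frequencies (D' is signed)."""
    p1 = freq["10"] + freq["11"]  # frequency of allele 1 at SNP 1
    q1 = freq["01"] + freq["11"]  # frequency of allele 1 at SNP 2
    d = freq["11"] - p1 * q1
    if d >= 0:
        d_max = min(p1 * (1 - q1), (1 - p1) * q1)
    else:
        d_max = min(p1 * q1, (1 - p1) * (1 - q1))
    d_prime = d / d_max if d_max > 0 else 0.0
    denom = p1 * (1 - p1) * q1 * (1 - q1)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2

# Hypothetical genotypes (copies of allele '1' at SNP 1, SNP 2); not study data.
sample = [(0, 0)] * 40 + [(1, 1)] * 30 + [(2, 2)] * 10 + [(1, 0)] * 15 + [(0, 1)] * 5
freqs = em_two_locus(sample)
print({h: round(f, 3) for h, f in freqs.items()})
print([round(x, 3) for x in ld_stats(freqs)])
```

For association testing, frequencies would then be estimated in cases and controls (or compared with a likelihood-ratio test); that step is not shown here. Mapping "0"/"1" back to nucleotides (e.g. C/T) gives labels of the form used in the haplotype results.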

Discussion
TGF-β signaling plays an important role in homeostasis, cell differentiation and tumor suppression [15,16]. TGF-β is a pleiotropic cytokine with a dual function in cancer development: it acts as a tumor suppressor in the early stages and as a tumor promoter in the late stages [17]. TGF-β production is increased in various tumor types, including CRC [18]. Regulation of TGF-β signaling by SMAD7 is critical for preserving gastrointestinal homeostasis. As an intracellular antagonist of TGF-β signaling, SMAD7 binds to the receptor complex and blocks the initiation of downstream signaling [19,20]. SMAD7 interacts with the activated TGF-β type I receptor, thereby blocking the phosphorylation and activation of SMAD2; this prevents the formation of SMAD2/SMAD4 complexes and their entry into the nucleus, ultimately inhibiting their binding to transcription factors and the expression of target genes [6]. SMAD7 expression has been shown to influence CRC progression. High-level expression of SMAD7 mRNA in CRC cell lines was reported to boost cell growth and prevent apoptosis through a mechanism dependent on suppression of the TGF-β signaling pathway [21]. Several studies have examined genetic variations within the SMAD7 gene (on 18q21). Broderick et al. performed a genome-wide association study which revealed that SMAD7 intron 3 was highly polymorphic, harboring one of the associated SNPs, rs4939827 [22]. However, most association studies have been performed in European Caucasian populations. Because CRC incidence and the allele distribution of SNPs differ across populations, this study investigated four candidate variants to examine their association with CRC risk in an Iranian population. We also evaluated the association of these polymorphisms with clinicopathological characteristics such as age, sex, tumor location, tumor grade and family history.
The three polymorphisms rs4939827 (C > T), rs34007497 (C > G) and rs8085824 (C > T) are located in the intronic region of the SMAD7 gene. Our results indicate a significant difference in the genotype distribution of rs4939827 between cases and controls. Most previous studies have also shown an association between this SNP and the risk of CRC [22][23][24]. In our study, the rs4939827 TT genotype was associated with an increased risk of CRC. These results are in line with a meta-analysis by Huang Y. et al. of 37 studies on rs4939827 (48,751 cases and 61,529 controls), which found that this SNP was associated with a 15% increase in colorectal cancer risk [25]. Although in our study, we did not find any significant association between the above-mentioned polymorphisms and CRC susceptibility, even when stratifying [26]. Regarding function, none of the four SNPs in this study has a known specific function. Initially, we found a statistically significant association of rs34007497, rs8085824 and rs8088297 with the risk of CRC in our study.
However, after adjustment for confounders (age, sex and BMI) in the different genetic models, the associations of these three polymorphisms did not remain significant. Therefore, these three markers do not appear to be independently associated with colorectal cancer risk.
rs8088297 (A > C) is located in the 3′UTR of the SMAD7 gene. The 3′ and 5′ UTRs of mRNAs play regulatory roles, and sequence alterations in these regions can affect mRNA folding. In addition, SNPs in 3′UTRs can alter microRNA target recognition by disrupting sequence complementarity [27]. Super-enhancers are large enhancer clusters that define cell identity; in several studies, nearly the entire human SMAD7 gene has been classified as a super-enhancer in colon crypts, sigmoid colon and fetal intestine [28].
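The complementarity argument can be illustrated with a toy check. A microRNA typically recognizes its target through its seed region (nucleotides 2–8 from the 5′ end), so a 3′UTR carries a canonical 7-mer site where the reverse complement of the seed occurs, and an allele that changes a base inside that site can abolish the match. The Python sketch below uses made-up miRNA and UTR sequences and a deliberately simplified seed-match rule (no G:U wobble, 3′ compensatory pairing or site-context scoring); it is not an analysis of rs8088297, and the function names are hypothetical.

```python
# RNA complement table (sequences are written 5'->3' with U; a DNA UTR would use T).
COMPLEMENT = str.maketrans("ACGU", "UGCA")

def seed_site(mirna: str) -> str:
    """mRNA target-site sequence (5'->3') complementary to miRNA nucleotides 2-8."""
    seed = mirna[1:8]
    return seed.translate(COMPLEMENT)[::-1]

def has_seed_match(utr: str, mirna: str) -> bool:
    """True if the UTR contains a perfect 7-mer match to the miRNA seed."""
    return seed_site(mirna) in utr

def allele_effect(utr_ref: str, pos: int, alt: str, mirna: str) -> tuple[bool, bool]:
    """Seed-site presence for the reference and the alternative allele at 0-based `pos`."""
    utr_alt = utr_ref[:pos] + alt + utr_ref[pos + 1:]
    return has_seed_match(utr_ref, mirna), has_seed_match(utr_alt, mirna)

# Made-up sequences, for illustration only.
mirna = "UCAGUCAAGGACUUAGCAUCGA"
utr = "AAGCUUGACUGAAUCC"
print(seed_site(mirna))                    # UUGACUG
print(allele_effect(utr, 7, "C", mirna))   # (True, False)
```

In this toy case, the substitution at position 7 falls inside the 7-mer site, so the reference sequence matches the seed while the alternative allele does not.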
Haplotype-based association analysis has several advantages over the single-SNP approach. First, the loss of information attributable to bi-allelic rather than multi-allelic loci can be overcome by assessing haplotypes rather than single loci. Second, haplotype-based methods are regarded as more powerful than those based on single markers. Third, haplotype-based methods can potentially detect associations arising from cis-interactions among nearby SNPs. In light of these benefits, we performed a haplotype analysis [29]. We found a significant association of several haplotype combinations [(rs8085824-rs34007497-rs4939827); (rs8088297-rs8085824-rs4939827); (rs8088297-rs8085824-rs34007497); (rs8085824-rs34007497); (rs8085824-rs4939827); (rs4939827-rs34007497-rs8085824-rs8088297)] with CRC risk. These associations, which remained significant even after adjustment for confounders (age, sex, BMI), may reflect the combined effect of multiple SNPs on disease risk. Therefore, the haplotype analysis further suggests that these variants may act in combination to confer some degree of risk.
As mentioned before, we selected several polymorphisms at different positions in the SMAD7 gene to obtain an overview of the whole gene. One of the strengths of this study is the evaluation of several markers across the gene together with haplotype analysis, which can increase study power. To the best of our knowledge, no previous study has examined haplotypes of this gene in relation to CRC. Moreover, replication of the association of rs4939827 with CRC in our population strengthens the evidence for the effect of this genetic risk factor on CRC, which may be used in clinical risk assessment, especially in direct-to-consumer genetic testing.