Clinically relevant combined effect of polygenic background, rare pathogenic germline variants, and family history on colorectal cancer incidence
BMC Medical Genomics volume 16, Article number: 42 (2023)
Background and aims
Summarised in polygenic risk scores (PRS), the effect of common, low penetrant genetic variants associated with colorectal cancer (CRC), can be used for risk stratification.
To assess the combined impact of the PRS and other main factors on CRC risk, 163,516 individuals from the UK Biobank were stratified as follows: 1. carriers status for germline pathogenic variants (PV) in CRC susceptibility genes (APC, MLH1, MSH2, MSH6, PMS2), 2. low (< 20%), intermediate (20–80%), or high PRS (> 80%), and 3. family history (FH) of CRC. Multivariable logistic regression and Cox proportional hazards models were applied to compare odds ratios and to compute the lifetime incidence, respectively.
Depending on the PRS, the CRC lifetime incidence for non-carriers ranges between 6 and 22%, compared to 40% and 74% for carriers. A suspicious FH is associated with a further increase of the cumulative incidence reaching 26% for non-carriers and 98% for carriers. In non-carriers without FH, but high PRS, the CRC risk is doubled, whereas a low PRS even in the context of a FH results in a decreased risk. The full model including PRS, carrier status, and FH improved the area under the curve in risk prediction (0.704).
The findings demonstrate that CRC risks are strongly influenced by the PRS for both a sporadic and monogenic background. FH, PV, and common variants complementary contribute to CRC risk. The implementation of PRS in routine care will likely improve personalized risk stratification, which will in turn guide tailored preventive surveillance strategies in high, intermediate, and low risk groups.
Colorectal cancer (CRC) is the fourth leading cancer-related cause of death worldwide. Major established exogenous risk factors are summarized as Western lifestyle . However, an inherited disposition contributes significantly to the disease burden since up to 35% of interindividual variability in CRC risk has been attributed to genetic factors [2, 3].
Around 5% of CRC occur on the basis of a monogenic, Mendelian condition (hereditary CRC), in particular Lynch syndrome (LS) and various gastrointestinal polyposis syndromes. Here, predisposing rare, high-penetrance pathogenic variants (PV, constitutiona/germline variants) result in a considerable cumulative lifetime risk of CRC and a syndrome-specific spectrum of extracolonic tumors. The autosomal dominant inherited LS is by far the most frequent type of hereditary CRC with an estimated carrier frequency in the general population of 1:300–1:500 [4,5,6]. It is caused by a heterozygous germline PV in either of the mismatch repair (MMR) genes MLH1, MSH2, MSH6 or PMS2 or, in few cases, by a large germline deletion of the EPCAM gene upstream of MSH2. The most frequent Mendelian polyposis syndrome is the autosomal dominant Familial Adenomatous Polyposis (FAP) caused by heterozygous germline PV in the tumor suppressor gene APC, followed by the autosomal recessive MUTYH-associated polyposis (MAP) which is based on biallelic germline PV of the base excision repair gene MUTYH [7, 8]. However, even in such monogenic conditions, the inter- and intrafamilial penetrance and phenotypic variability is striking, pointing to modifying exogenous or endogenous factors. Heterozygous (monoallelic) MUTYH germline PV may be associated with a slightly increased CRC risk [9, 10]; the carrier frequency in northern European populations is estimated to be 1:50–1:100 .
Approximately 20–30% of CRC cases are characterized by a suspicious, but unspecific familial clustering of CRC (familial CRC). Around 25% of CRC cases occur before 50 years of age (early-onset CRC); in around one quarter of those a hereditary type (mainly LS) has been identified . Although further high-penetrance candidate genes have been proposed [12,13,14], the majority of familial and early-onset cases cannot be explained by monogenic subtypes and instead are supposed to result from a multifactorial/polygenic etiology including several moderate-/intermediate penetrance risk variants and shared environmental/lifestyle factors. A positive family history (FH) in first- and second-degree relatives increases the risk of developing CRC by 2- to ninefold [15, 16], which underpins the hypothesis of shared genetic and non-genetic risk factors.
A variety of models to predict CRC risk has been developed and evaluated, which include clinical data, FH, lifestyle factors, and genetic information .For more than a decade, genome-wide association studies (GWAS) in large unselected CRC cohorts identified an increasing number of common, low-penetrance risk variants, mainly single nucleotide polymorphisms (SNPs), which are significantly associated with CRC risk [18,19,20,21]. Each SNP risk allele individually contributes only little to CRC risk (OR 1.05 to 1.5), however, summarised in quantitative polygenic risk scores (PRS), the combined effect might explain a substantial fraction of CRC risk variability and can identify individuals at several times lower and greater risk than the general population [22,23,24].
As such, it is expected that the genetic background defined by the common risk variants may not only influence the occurrence of late-onset sporadic cases, but also modulate the risk of familial, early-onset, and hereditary CRC . Recent studies demonstrated that high PRS values are associated with an increased risk of CRC and other common cancers in the general population up to an order of magnitude that is almost similar to hereditary tumor syndromes [26, 27].
Based on these data, it can be hypothesized, that the identification of common genetic CRC risk variants not only provides deep insights into the biological mechanisms and pathways of tumorigenesis, but could improve personalized risk stratification for sporadic, familial/early-onset, and hereditary CRC in the future by the implementation of SNP-based PRS screening in routine patient care, which will in turn guide tailored preventive strategies in high, moderate, and low risk groups.
However, even if previous studies provide promising results for a clinical benefit of a PRS-based personalized risk stratification, the impact of common risk factors and their interplay with high-penetrance variants and other unspecified factors, captured partly by the FH, still has to be improved and validated in additional patient cohorts.
In the present work, we compare the prevalence and the lifetime risk of CRC among 163,516 individuals from a population-based European repository (UK Biobank, UKBB). Individuals were stratified according to three major risk factors 1) their carrier status of rare, high-penetrance.
UK Biobank (UKBB) genetic and phenotypic data were used in this study. UKBB is a long-term prospective population-based cohort study that has recruited volunteers mostly from England, Scotland, and Wales, with over 500,000 participants aged 40 to 69 years at the time of recruitment. For each participant, extensive phenotypic and health-related data is available; genotyping data is accessible for 487,410 samples, and exome sequencing data is available for 200,643 people. All participants gave written consent, and the dataset is available for research. UKBB provided follow-up information by linking health and medical records .
CRC cases were defined based on self-reported code of 1022 or 1023 (in data field 20,001), or ICD-10 code of C18.X or C20.X, D01.[0,1,2], D37.[4, 5], or ICD-9 of 153.X or 154.[0,1] (in hospitalization records). Control samples were those that had no previous diagnosis of any cancer. The study includes people of all ethnicities. Outliers for heterozygosity or genotype missing rates, putative sex chromosome aneuploidy, and discordant reported sex versus genotypic sex were excluded. Only individuals (n = 200,643) who had both genotyping and whole-exome sequencing (WES) data were considered. If the genetic relationship between individuals was closer than the second degree, defined as kinship coefficient > 0.0884 as computed by the UK Biobank, we removed one from each pair of related individuals (cases were retained if exist).
We used ANNOVAR  to annotate the VCF files from the 200,643 WES samples. The Genome Aggregation Database (gnomAD)  were used to retrieve variant frequencies from the general population. We focused on rare PV for hereditary CRC (Lynch syndrome, polyposis) and considered the same variant filtering approach that was used in a recent study aiming at selecting rare PV . The following inclusion criteria were used: (1) only APC, MUTYH, MLH1, MSH2, MSH6, PMS2 variants in protein-coding regions were included since PV in other genes associated with hereditary CRC are too rare or even absent in the study population; (2) allele frequency (AF) < 0.005 in at least one ethnic subpopulation of gnomAD; (3) not annotated as “synonymous,” “non-frameshift deletion” and “non-frameshift insertion”; (4) annotated as “pathogenic” or “likely pathogenic” based on ClinVar . We did not include MUTYH in the pooled analysis since no biallelic (i.e. high penetrance) case was identified in the cohort; however, we included the heterozygous (monoallelic) carriers in the single gene analysis to compare the effect size with the other genes.
Polygenic risk scores (PRS)
We applied a previously validated PRS for CRC with 95 variants to calculate the PRS . The PRS was estimated using the PLINK 2.0  scoring function through UKB genotype data. To reduce PRS distributions variance among genetic ancestries, we used a previous approach . We used the first four ancestry principal components (PCs) to fit a linear regression model to predict the PRS across the full dataset (pPRS ~ PC1 + PC2 + PC3 + PC4). Adjusted PRS (aPRS) were calculated by subtracting pPRS from the raw PRS and used for the subsequent analysis.
In addition, we calculated the PRS using 140 SNPs  and another PRS based on 50 SNPs that were replicated in the meta-analyzed GWAS after excluding UKBB samples . Thus, in total three PRS models were computed: (1) 95 SNPs (95 PRS); (2) 140 SNPs (140 PRS); (3) 50 SNPs (50 PRS).
Individuals were divided into groups depending on (1) carrier status of PV, (2) PRS, and (3) FH. For FH, we considered participants’ reports of CRC in their parents and siblings (data fields: 20,110, 20,107, 20,111). For PRS, individuals were assigned into three groups: low (< 20% PRS), intermediate (20–80% PRS), and high (> 80% PRS) where the definition of a high PRS (above the 80th percentile) corresponding to OR > = 2.
We conducted both an analysis specific to single genes and a combined analysis (i.e., carriers of PV in APC, MLH1, MSH2, MSH6 and PMS2). First, we estimated the OR for each carrier group based on a logistic regression adjusting for age at recruitment, sex, CRC screening status, and the first four ancestry PCs. Afterwards, we additionally incorporated interactions between PV carriers and FH with PRS by introducing an interaction term within the logistic regression model.
We calculated the lifetime risk by age 75 from carrier status of rare PV and the PRS and hazard ratios (HRs) based on a Cox proportional hazards model. Individual’s age served as the time scale, representing the time to event, for observed cases (age at diagnosis), and censored controls (age at last visit); age 0 was used as index time. Carrier status, PRS category, FH, age, sex, CRC screening status, and the first four ancestry PCs were incorporated in the model, and adjusted survival curves were produced. The information about FH and CRC screening is based on interviews at the time of study recruitment. However, information about the timing or result of CRC screening was not available. We therefore included this information as a binary covariate to account for both effects.
Model performance was assessed via the area under the receiver operating characteristic curve (AUC), Nagelkerke's Pseudo-R2, and the C-index for time-to-event data. R 3.6.3 with the corresponding add-on packages survival and survminer was used for all statistical analyses.
Stratification of UKBB individuals for CRC prevalence, FH, and PV carrier status
We identified 1,902 CRC cases (894 prevalent cases and 1,008 incident cases) among the 163,516 UKBB individuals that retained after exclusion criteria, with a mean age at diagnosis of 60.9 years. The remaining 161,614 individuals with no previous diagnosis of any cancer were considered as controls, with a mean age of 56.9 years at last visit (Table 1). The European population represents 92% of the analyzed cohort.
The fraction of individuals with a positive FH of CRC is significantly higher in cases (19%) compared to controls (11%) (OR = 1.95 [1.73–2.19], P < 0.01) and ranges between 9 and 23% in the subgroups (Table 2). There is a significantly higher proportion of individuals with a FH of CRC not only among carriers of PV in the selected cancer susceptibility genes (OR = 1.96 [1.72–2.20], P < 0.01), but also among non-carriers with high PRS (OR = 1.60 [1.31–1.94], P < 0.01).
In the analyzed CRC susceptibility genes APC, MLH1, MSH2, MSH6, PMS2, we identified 399 heterozygous carriers of 111 PV. They were present in 30 (1.57%) cases and 369 (0.23%) controls, which is in line with published data. A list of the considered variants and annotations is shown in Additional file 1: Table S1, a summary of the number of PV carriers per gene is provided in Additional file 2: Table S2. No individual with a homozygous PV was identified. In other known genes associated with hereditary CRC (BMPR1A, POLE, POLD1, RNF43, SMAD4, STK11), the number of (L)P variant carriers was extremely low or no variant carrier was present at all, so that these genes were not considered in the analysis.
PRS distribution within the UKBB cohort
CRC PRS follow a normal distribution both regarding raw and PC-adjusted PRS (Additional file 2: Fig. S1) and is significantly higher in cases compared to controls, regardless of which PRS model is used (95 PRS, 140 PRS or 50 PRS), the PRS is significantly higher in cases compared to controls (Additional file 2: Fig. S2). The OR for 50 PRS (1.74 [1.57–1.92]) is slightly lower than that of 95 PRS (1.98 [1.79–2.19]), or 140PRS (1.92 [1.74–2.12]); that might be due to overfitting.
Since we included only individuals with both genotyping and WES data, we investigated the distribution of the PRS and age in the whole cohort and compared it to the subcohort with WES data. Density plots show that the distribution of PRS and age was similar between both groups (Additional file 2: Fig. S3).
The prevalence of CRC according to PRS percentiles demonstrates that values in the extreme right tail of the PRS distribution are associated with a non-linear increase of CRC risk, whereas in the left tail a less evident non-linear decrease can be observed (Additional file 2: Fig. S4). This supports the hypothesis of using PRS to stratify individuals into risk classes (i.e., low, intermediate, and high risk) according to a liability threshold model.
Interplay between PV and PRS
There was no overlap between the selected rare high penetrance PV and the common SNPs used for PRS calculation, and thus, the PRS represents an additional genetic signal. Notably, the PRS distributions showed that the mean of PRS is significantly higher in affected carriers compared to unaffected carriers (P < 0.01) (Additional file 2: Fig. S5).
We assessed how CRC risk is influenced by PRS and carrier status for PV in high penetrant CRC susceptibility genes (APC, MLH1, MSH2, MSH6, PMS2) by calculating the ORs for CRC across groups compared to non-carriers with intermediate PRS as reference group. Non-carriers with a low or high PRS are estimated to have a 0.5-fold or 2.1-fold change in the odds for CRC, respectively. We observed that the PRS also alters the penetrance of PV in susceptibility genes considerably as PV carriers with high PRS had four times higher OR than carriers with low PRS (OR = 17.5 and 3.9, respectively; Fig. 1A; and corresponding HR in Additional file 2: Table S3). We did not observe a significant interaction between PV carrier status and PRS (p = 0.87). In addition, we performed a sensitivity analysis including only the incident cases (n = 1,008). We observed the same trend, that PRS provides an OR risk gradient in the general population and among carriers of pathogenic variants in CRC susceptibility genes (Additional file 2: Table S4).
The high PRS, which is by definition present in 20% of the non-carriers, is associated with an almost doubled CRC risk (Fig. 1A, Table 2). Since the vast majority (97.9%) of non-carriers are controls (= healthy), almost the same percentage results if only healthy non-carriers are considered. We performed the same analysis using the 140 PRS and 50 PRS. All the three PRS models had comparable performance in the UKBB cohort (Additional file 2: Fig. S6).
Similarly, the lifetime cancer risk analysis shows a combined impact of PV and PRS: Among carriers, the estimated cumulative incidence by age 75 increased from 40% in case of a low PRS to 74% in case of a high PRS compared to 6% to 22% for non-carriers (Fig. 1B, Additional file 2: Table S3).
Inclusion of family history on cancer risk stratification
Taking individuals with no FH and intermediate PRS as a reference, both FH and PRS are associated with a higher CRC risk (Fig. 1C, Additional file 2: Table S5). The CRC risk for individuals having low PRS and no FH (OR 0.6) is five times lower than for individuals having both positive FH and high PRS (OR 3.1). We did not observe a significant interaction between FH status and PRS (p = 0.12). Noteworthy, individuals without FH and high PRS and individuals with FH and intermediate PRS both have similar CRC risks with an OR of around 2, whereas the CRC risk of individuals having low PRS even in the context of a FH is decreased compared to the reference group.
Among individuals with FH, the cumulative CRC incidence by age 75 increases threefold from 8% in case of a low PRS to 26% in case of a high PRS (Fig. 1D). Noteworthy, the cumulative CRC incidence of individuals with a positive FH and an intermediate PRS is lower (16%) than for individuals with negative FH and a higher PRS category (21%), respectively.
The full model integrating PRS, FH, and PV status shows that the CRC risk is strongly influenced by PRS in all groups (Fig. 2, Additional file 2: Table S6). Considering the non-carriers with no FH and intermediate PRS group as reference, the CRC OR in low PRS is 0.6 for non-carriers with no FH, while it is estimated more than 60 times higher (OR 40) for carriers with FH and high PRS (Fig. 2A). The corresponding cumulative CRC incidences are 6% and 98%, respectively (Fig. 2B). Although all PV carriers showed a significantly increased CRC risk, both the PRS and FH modify these risks considerably: depending on the FH and PRS, the OR in PV carriers vary between 4 and 40 and the cumulative incidence between 35 and 98%. Despite the CRC screening status is a key predictor for CRC risk (Additional file 2: Fig. S7), the main findings of the analysis were maintained irrespective of the screening status.
PRS improved model discrimination over carrier status and FH of CRC in first-degree relatives. The AUC derived from PRS (0.688) was higher compared to those derived using FH (0.654) and carrier status (0.646). The full model including PRS, carrier status, and FH improved the AUC (0.704) in risk prediction by 1.6%, 5%, and 5.8%, respectively, and was also better than any combination of two factors (Table 3, Additional file 2: Fig. S8a). We also performed an analysis in which age and sex were excluded. The AUCs demonstrate that the PRS still has a high discriminative power for CRC risk prediction (Additional file 2: Fig. S8b).
The impact of polygenic risk in single gene mutation carriers
The gene-specific analysis revealed a strong variability in risk conferred by rare heterozygous PV in the different genes. The largest effect sizes are attributable for MLH1 and APC, those for MSH2 and MSH6 are a bit less, while the effect size for PMS2 is considerably lower (Fig. 3). When heterozygous MUTYH variants are included in this analysis, the risks are very similar to the PMS2-related risks. Both the PMS2 and heterozygous MUTYH risks show a broad overlap with the non-carrier risks, while there is no overlap between the risks of non-carriers and those with PVs in MLH1, MSH2, MSH6, and APC.
We estimated how PRS and FH influence CRC prevalence among PV carriers in each of the five susceptibility genes (Additional file 2: Table S7). Despite the different effect sizes, the PRS and FH modifies the relative risk across all genes; however, the effect of PRS and FH is conversely related to the penetrance of the gene with the smallest effects in MLH1 PV carriers.
As for the overall analysis, in the gene-specific analysis a positive FH, a PV in a cancer risk gene, and a high PRS are associated with an increased CRC risk. As such, an individual with a low-penetrance PMS2 PV, but high PRS and/or positive FH ends up with an estimated CRC risk similar to a MSH6 PV carrier without FH and/or low PRS (Additional file 2: Fig. S9, Table S7).
Recent studies demonstrated that the polygenic background, defined as PRS based on disease-associated SNPs, modifies the risks for several cancers of the general population including CRC considerably, both in terms of age at onset and cumulative lifetime risks [12, 23, 27, 36,37,38]. In line with this, the risk alleles of those SNPs are found to also accumulate in unexplained familial and early-onset CRC cases [25, 39]. Whereas a low polygenic burden decreases the CRC risk down to one quarter on average, individuals with a high PRS (> 80%) doubles and those with a very high PRS (99%) almost quadruplicate their risk and thus, reach a CRC risk in an order of magnitude almost comparable to carriers of hereditary CRC with low PRS . In a pervious study, Jia et al. found that the risk of CRC is significantly associated with its PRS: Compared with individuals in the lowest PRS quintile those in the highest quintile had a greater than threefold risk (during a 5.8-year follow-up period). Hazard Ratios estimated with the middle quintile as the reference resulted in a risk between 0.56 and 1.71, a threefold risk in those in the top 1% of PRS, and a 70% reduced CRC risk for individuals in the bottom 1% of the PRS .
To extend these studies on how the CRC prevalence is influenced by genetic susceptibility using, we used the sufficiently larger, more robust dataset of the most recent UKBB cohort, incorporate the family history (FH) as an additional factor for risk stratification, and include a single gene analysis. We considered both the genetic component driven by rare high-penetrance PV associated with hereditary CRC and common low-penetrance variants captured by the PRS.
Firstly, our results confirm that the polygenic background strongly modulates CRC risk in the general population. Compared to the average polygenic burden, individuals with a low (< 20%) or high (> 80%) PRS are estimated to have a 0.5-fold or 2.1-fold change in the odds for CRC, respectively. The additional time-to-event analysis revealed a corresponding cumulative lifetime risk of 6% and 22% by age 75. Hence, when the PRS is included in risk calculation, around 20% of healthy individuals of the general population with no FH of CRC have a doubled CRC risk, which is similar to those with a first degree relative affected by CRC . These so far unknown and otherwise unrecognisable at-risk individuals might need surveillance 10–15 years earlier than usually recommended . On the other hand, the around 20% of individuals with low PRS and no FH might need less surveillance than the general population due to a considerably lowered risk, while even those with low PRS and positive FH might not need a more intense surveillance than the general population.
A concern in evaluating CRC PRS using 95 or 140 SNPs  in UKBB studies is that the calculation is based on summary statistics derived from a GWAS meta-analysis that included findings from the UKBB. Previous studies have also used 95 or 140 SNPs, but it is uncertain if this could result in overfitting of models. A recent study  addressed this issue using stringent inclusion criteria, only including 50 SNPs that reached GWAS significant (p < 5 × 10–8) in the meta-analysis after excluding UKBB samples. The effect sizes from meta-analysis of these 50 SNPs were then used to conduct the 50 PRS. The slightly lower OR of the 50 PRS in the present study compared to the 95 and 140 PRS might be due to overfitting; however, by comparing the PRS calculations, we could show that all three PRS models had a comparable performance in the UKBB cohort (Additional file 2: Figs. S2 and S6).
It is well known that among patients with hereditary CRC syndromes, the age of onset and cumulative CRC incidence is very heterogeneous, even within PV carriers of the same family. The estimated gene-specific, individual CRC lifetime risks of LS patients with MLH1 or MSH2 PV can be lower than 10% but as high as 90–100% in a considerable fraction. In the past, the analysis of modifying effects based on common CRC-associated variants in LS and other high-risk groups has been restricted to selected cohorts and small subsets of SNPs [42, 43]. A recent study demonstrated that the polygenic background also substantially influences the CRC risk in LS using UKBB data, even though the ORs for CRC risks could only be predicted due to the small sample sizes . In the present work, ORs could be calculated directly from the model since over three times more UKBB individuals have been included with six times more CRC cases, and five times more PV carriers.
So secondly, we were able to show that the PRS modifies the CRC risks not only in the general population considerably, but also in carriers of a MMR gene PV identified in the general population. For the first time we demonstrated, that this is also true for APC PV. Depending on the PRS, the cumulative CRC lifetime incidence in PV carriers ranged between 40 and 74%, and thus, the PRS is able to explain parts of the interindividual variation in CRC risk among PV carriers.
However, the single-gene analysis revealed heterogeneous effects across genes and therefore the modifying role of the polygenic background should be framed within the absolute risk attributable to individual genes. As expected, the effect of the PRS seems to be relevant in particular in less penetrant CRC risk genes such as PMS2 where the OR ranges between 0.94 and 5.43 respectively (Additional file 2: Table S6). This is in line with findings in moderate breast cancer risk genes such as CHEK2, PALB2 and ATM [44,45,46] and suggests that PRS inclusion in risk stratification may in particular be relevant to prevent excess of surveillance measures in PV carriers of those genes.
In addition, our results provide evidence that the inclusion of FH can further and independently improve the risk stratification in both carriers and non-carriers. Including PRS and FH in risk assessment, the cumulative CRC lifetime incidence ranged between 8 and 26%, and in PV carriers between 30 and 98%, and thus, outperformed the consideration of a single risk factor. This suggests that familial clustering points to additional risk factors besides those captured by common low-risk SNPs (PRS) and rare PV [47, 48]. These might be common and rare structural genetic alterations including copy number variants, rare non-coding variants, or other intermediate and low-impact risk variants not included routinely in PRS models, and non-genetic contributors such as environmental/lifestyle factors.
Only few PRS studies considered the FH. In line with our results, Jenkins et al. found no correlation between SNP-based and FH-based risks and an improved risk stratification when both PRS and FH are considered . In the analyses by Jia et al., the AUC derived from PRS (0.609) was substantially higher compared to the one derived using FH (0.523). Adding PRS and FH of cancer in first-degree relatives improved the model’s discriminatory performance (AUC 0.613) [17, 49]. Our AUC calculations point in the same direction with a higher AUC (0.704) when all three risk factors (PRS, FH, carrier status) are considered.
Interestingly and in apparent contrast to our results and those of others, a study using 826 European-descent carriers of PV in the DNA MMR genes MLH1, MSH2, MSH6, PMS2, and EPCAM (i.e. LS carriers) from the Colon Cancer Family Registry (CCFR) did not find evidence of an association between the PRS and CRC risk, irrespective of sex or mutated gene, although an almost identical set of SNPs was used for PRS calculations . A reason which might partly explain different risk estimates between studies using individuals from a population-based repository such as the UKBB and those using curated clinical data registries, where patients/families with suspected hereditary disease are included (e.g. the CCFR), is a potentially different risk composition across cohorts recruited in different ways (recruitment bias). That way, a familial clustering of CRC might reflect the existence of several genetic and non-genetic risk factors as outlined above, which are not captured by the PRS and which may superimpose the polygenic impact.
In particular, the composition of cases and controls is different between the Jenkins et al. study on the one hand and the Fahed et al. and present study on the other hand. In the Jenkins et al. study, obviously both cases (i.e., PV carriers with CRC) and controls (healthy PV carriers) derived from the same LS families, while the UKBB controls are PV carriers not apparently related to the PV cases. This is also reflected by the different ratio between cases and controls (7.5% CRC cases among PV carriers in the present study, but 61% in the Jenkins et al. study). Hence, the controls in the Jenkins et al. study are relatives of the cases and thus, it is likely that they share parts of the polygenic background and other risk factors of their affected relatives (cases) to a certain extent which may explain the observed missing effect of the PRS. The comparison between population-based and registry-based predictions indicates that the study design and recruitment strategy may strongly influence the results and conclusions. Consequently, the application of PRS in clinical practice should consider the familial background and ascertainment of the patient.
Our data analyses provide evidence that the PRS acts as a relevant risk modifier for CRC among both the general population and population-based PV carriers in genes causing hereditary CRC. The findings of us and others qualify the PRS as important component of risk stratification and resulting risk-adapted surveillance strategies in terms of age of onset and frequency. Given the risk distribution across PRS groups, the PRS can define a considerable proportion of the general population at a CRC risk level which is considered sufficient for a more or a less intensive surveillance. Importantly, the non-carriers with high PRS are a much larger target group compared to PV carriers and thus might generate an even higher preventive effect form a healthcare perspective. A small group of non-carriers with positive FH and high PRS even has CRC risks almost in the same order of magnitude as LS carriers without additional risk factors and thus may need similar intensive surveillance measures.
According to these findings, there should be a potential benefit for both the general population and at-risk individuals carrying PV, from the inclusion of PRS in healthcare prevention policies, as risk-stratified surveillance improves early disease detection and prevention. A recent study demonstrated that individuals with a higher genetic risk benefited more substantially from preventive measures than those with a lower risk: CRC screening was associated with a significantly reduced CRC incidence and more than 30% reduced mortality among individuals with a high PRS high PRS [51, 52]. Preliminary calculations indicate that polygenic-risk-stratified CRC screening could become cost-effective under certain conditions including an AUC value above 0.65 which was reached in our analyses .
Based on the striking different penetrance between individual hereditary CRC genes, very recent guidelines start to recommend a more gene-specific surveillance intensity in LS and polyposis [54, 55]. Given the strong modifying effect, the inclusion of additional risk factors will result in a more appropriate, clinically relevant risk stratification. Our results demonstrate that a combined risk assessment including FH and PRS will likely improve precise risk estimations and tailored preventive measures not only in the general population, but also in patients with hereditary disease.
Our study has some limitations. Firstly, there is evidence of a “healthy volunteers” selection bias of the UKBB population (UKBB participants tend to be healthier than the general population), and thus the results might not be completely generalizable in terms of effect sizes . Secondly, we cannot exclude that few carriers of APC PV who were classified as controls, are affected by a polyposis but have not been recognized as such or did not develop CRC due to intensive surveillance and/or prophylactic surgery, so that the calculated CRC risk of APC PV might be slightly underestimated. As in other similar studies, the presence of colorectal polyps could not be considered due to the lack of appropriate data. Thirdly, to increase the power of the analysis, our risk assessment was based solely on genetic variants and FH and did not include other risk factors. Previous studies on UKBB cohorts showed that lifestyle modifiable risk factors play a pivotal role in cancer prevalence, and a shared lifestyle within families could influence FH with the disease [49, 57]. That might explain the partly independent association of the FH and the genetic risk. Finally, although we performed the analysis on the whole UKBB cohort, we could not test the risk stratification generalizability across different populations due to the limited sample size. PRS could be biased towards the European population as PRS was constructed based on European reference GWAS. Thus, these PRS might be a worse predictor in non-European or admixed individuals, as previously discussed in different studies .
In conclusion, we show the important role of PRS and FH on CRC risk in both the general population and population-based carriers of a monogenic predisposition for CRC. The combined effect of common variants can strongly alter the age-related penetrance and life-time risk of CRC. Thus, the PRS represents an additional, independent stratification level to cancer risk besides the FH and lifestyle factors and likely increase the accuracy of risk estimation. Consequently, PRS can define a relevant proportion within the general population as a risk group, which should be considered as subjects for more intense surveillance measures, and in addition point to a striking risk variability even among carriers of hereditary CRC, which requires more personalized, risk-adapted surveillance strategies. As expected, the modifying effect of the PRS seems to be relevant in particular for moderate penetrant CRC risk genes. When important modifiers such as polygenic background, FH, and non-genetic factors are included in risk assessment, the dichotomous risk division between sporadic and hereditary CRC will be partly replaced by a more continuous risk distribution.
Availability of data and materials
Genome-wide genotyping data, exome-sequencing data, and phenotypic data from the UK Biobank are available upon successful project application (http://www.ukbiobank.ac.uk/about-biobank-uk/). Restrictions apply to the availability of these data, which were used under license for the current study (Project ID: 52,446). Summary statistics are available from the Polygenic Score Catalog (email@example.com): https://www.pgscatalog.org/publication/PGP000170/
Carr PR, Weigl K, Edelmann D, Jansen L, Chang-Claude J, Brenner H, et al. Estimation of absolute risk of colorectal cancer based on healthy lifestyle, genetic risk, and colonoscopy status in a population-based study. Gastroenterology. 2020;159:129–38.
Lichtenstein P, Holm NV, Verkasalo PK, Iliadou A, Kaprio J, Koskenvuo M, et al. Environmental and heritable factors in the causation of cancer–analyses of cohorts of twins from Sweden, Denmark, and Finland. N Engl J Med. 2000;343:78–85.
Czene K, Lichtenstein P, Hemminki K. Environmental and heritable causes of cancer among 9.6 million individuals in the Swedish Family-Cancer Database. Int J Cancer. 2002;99:260–6.
Win AK, Jenkins MA, Dowty JG, Antoniou AC, Lee A, Giles GG, et al. Prevalence and penetrance of major genes and polygenes for colorectal cancer. Cancer Epidemiol Biomarkers Prev. 2017;26:404–12.
Biller LH, Syngal S, Yurgelun MB. Recent advances in Lynch syndrome. Fam Cancer. 2019;18:211–9.
Grzymski JJ, Elhanan G, Morales Rosado JA, Smith E, Schlauch KA, Read R, et al. Population genetic screening efficiently identifies carriers of autosomal dominant diseases. Nat Med. 2020;26:1235–9.
Talseth-Palmer BA. The genetic basis of colonic adenomatous polyposis syndromes. Hered Cancer Clin Pract. 2017;15:5.
Kanth P, Grimmett J, Champine M, Burt R, Samadder NJ. Hereditary colorectal polyposis and cancer syndromes: a primer on diagnosis and management. Am J Gastroenterol. 2017;112:1509–25.
Vogt S, Jones N, Christian D, Engel C, Nielsen M, Kaufmann A, et al. Expanded extracolonic tumor spectrum in MUTYH-associated polyposis. Gastroenterology. 2009;137:1976–85.
Win AK, Dowty JG, Cleary SP, Kim H, Buchanan DD, Young JP, et al. Risk of colorectal cancer for carriers of mutations in MUTYH, with and without a family history of cancer. Gastroenterology. 2014;146:1208–11.
Stoffel EM, Koeppe E, Everett J, Ulintz P, Kiel M, Osborne J, et al. Germline genetic features of young individuals with colorectal cancer. Gastroenterology. 2018;154:897–905.
Chubb D, Broderick P, Dobbins SE, Frampton M, Kinnersley B, Penegar S, et al. Rare disruptive mutations and their contribution to the heritable risk of colorectal cancer. Nat Commun. 2016;7:11883.
Schubert SA, Morreau H, de Miranda NFCC, van Wezel T. The missing heritability of familial colorectal cancer. Mutagenesis. 2020;35:221–31.
Yurgelun MB, Kulke MH, Fuchs CS, Allen BA, Uno H, Hornick JL, et al. Cancer susceptibility gene mutations in individuals with colorectal cancer. J Clin Oncol. 2017;35:1086–95.
Brenner H, Hoffmeister M, Haug U. Family history and age at initiation of colorectal cancer screening. Am J Gastroenterol. 2008;103:2326–31.
Butterworth AS, Higgins JP, Pharoah P. Relative and absolute risk of colorectal cancer for individuals with a family history: a meta-analysis. Eur J Cancer. 2006;42:216–27.
McGeoch L, Saunders CL, Griffin SJ, Emery JD, Walter FM, Thompson DJ, et al. Risk prediction models for colorectal cancer incorporating common genetic variants: a systematic review. Cancer Epidemiol Biomarkers Prev. 2019;28:1580–93.
Huyghe JR, Bien SA, Harrison TA, Kang HM, Chen S, Schmit SL, et al. Discovery of common and rare genetic risk variants for colorectal cancer. Nat Genet. 2019;51:76–87.
Schmit SL, Edlund CK, Schumacher FR, Gong J, Harrison TA, Huyghe JR, et al. Novel common genetic susceptibility loci for colorectal cancer. J Natl Cancer Inst. 2019;111:146–57.
Lu Y, Kweon SS, Tanikawa C, Jia WH, Xiang YB, Cai Q, et al. Large-scale genome-wide association study of East Asians identifies loci associated with risk for colorectal cancer. Gastroenterology. 2019;156:1455–66.
Law PJ, Timofeeva M, Fernandez-Rozadilla C, Broderick P, Studd J, Fernandez-Tajes J, et al. Association analyses identify 31 new risk loci for colorectal cancer susceptibility. Nat Commun. 2019;10:2154.
Thomas M, Sakoda LC, Hoffmeister M, Rosenthal EA, Lee JK, van Duijnhoven FJB, et al. Genome-wide modeling of polygenic risk score in colorectal cancer risk. Am J Hum Genet. 2020;107:432–44.
Hsu L, Jeon J, Brenner H, Gruber SB, Schoen RE, Berndt SI, et al. A model to determine colorectal cancer risk using common genetic susceptibility loci. Gastroenterology. 2015;148:1330–9.
Frampton MJ, Law P, Litchfield K, Morris EJ, Kerr D, Turnbull C, et al. Implications of polygenic risk for personalised colorectal cancer screening. Ann Oncol. 2016;27:429–34.
Mur P, Bonifaci N, Díez-Villanueva A, Munté E, Alonso MH, Obón-Santacana M, et al. Non-lynch familial and early-onset colorectal cancer explained by accumulation of low-risk genetic variants. Cancers. 2021;13:3857.
Frampton M, Houlston RS. Modeling the prevention of colorectal cancer from the combined impact of host and behavioral risk factors. Genet Med. 2017;19:314–21.
Khera AV, Chaffin M, Aragam KG, Haas ME, Roselli C, Choi SH, et al. Genome-wide polygenic scores for common diseases identify individuals with risk equivalent to monogenic mutations. Nat Genet. 2018;50:1219–24.
Bycroft C, Freeman C, Petkova D, Band G, Elliott LT, Sharp K, et al. The UK Biobank resource with deep phenotyping and genomic data. Nature. 2018;562:203–9.
Yang H, Wang K. Genomic variant annotation and prioritization with ANNOVAR and wANNOVAR. Nat Protoc. 2015;10:1556–66.
Karczewski KJ, Francioli LC, Tiao G, Cummings BB, Alföldi J, Wang Q, et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature. 2020;581:434–43.
Fahed AC, Wang M, Homburger JR, Patel AP, Bick AG, Neben CL, et al. Polygenic background modifies penetrance of monogenic variants for tier 1 genomic conditions. Nat Commun. 2020;11:3635.
Landrum MJ, Lee JM, Riley GR, Jang W, Rubinstein WS, Church DM, et al. ClinVar: public archive of relationships among sequence variation and human phenotype. Nucleic Acids Res. 2014;42:D980-985.
Chang CC, Chow CC, Tellier LC, Vattikuti S, Purcell SM, Lee JJ. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience. 2015;4:7.
Khera AV, Chaffin M, Zekavat SM, Collins RL, Roselli C, Natarajan P, et al. Whole-genome sequencing to characterize monogenic and polygenic contributions in patients hospitalized with early-onset myocardial infarction. Circulation. 2019;139:1593–602.
Briggs SEW, Law P, East JE, Wordsworth S, Dunlop M, Houlston R, et al. Integrating genome-wide polygenic risk scores and non-genetic risk to predict colorectal cancer diagnosis using UK Biobank data: population based cohort study. BMJ. 2022;379:e071707.
Torkamani A, Wineinger NE, Topol EJ. The personal and clinical utility of polygenic risk scores. Nat Rev Genet. 2018;19:581–90.
Jenkins MA, Makalic E, Dowty JG, Schmidt DF, Dite GS, MacInnis RJ, et al. Quantifying the utility of single nucleotide polymorphisms to guide colorectal cancer screening. Future Oncol. 2016;12:503–13.
Jia G, Lu Y, Wen W, Long J, Liu Y, Tao R, et al. Evaluating the utility of polygenic risk scores in identifying high-risk individuals for eight common cancers. JNCI Cancer Spectr. 2020;4:21.
Archambault AN, Su YR, Jeon J, Thomas M, Lin Y, Conti DV, et al. Cumulative burden of colorectal cancer-associated genetic variants is more strongly associated with early-onset vs late-onset cancer. Gastroenterology. 2020;158:1274–86.
Fuchs CS, Giovannucci EL, Colditz GA, Hunter DJ, Speizer FE, Willett WC. A prospective study of family history and the risk of colorectal cancer. N Engl J Med. 1994;331:1669–74.
Rex DK, Boland CR, Dominitz JA, Giardiello FM, Johnson DA, Kaltenbach T, et al. Colorectal cancer screening: recommendations for physicians and patients from the U.S. Multi-Society Task Force on Colorectal Cancer. Am J Gastroenterol. 2017;112:1016–30.
Wijnen JT, Brohet RM, van Eijk R, Jagmohan-Changur S, Middeldorp A, Tops CM, et al. Chromosome 8q23.3 and 11q23.1 variants modify colorectal cancer risk in Lynch syndrome. Gastroenterology. 2009;136:131–7.
Talseth-Palmer BA, Wijnen JT, Brenne IS, Jagmohan-Changur S, Barker D, Ashton KA, et al. Combined analysis of three Lynch syndrome cohorts confirms the modifying effects of 8q23.3 and 11q23.1 in MLH1 mutation carriers. Int J Cancer. 2013;132:1556–64.
Hassanin E, May P, Aldisi R, Spier I, Forstner AJ, Nöthen MM, et al. Breast and prostate cancer risk: the interplay of polygenic risk, rare pathogenic germline variants, and family history. Genet Med. 2022;24:576–85.
Mars N, Widén E, Kerminen S, Meretoja T, Pirinen M, Della Briotta Parolo P, et al. The role of polygenic risk and susceptibility genes in breast cancer over the course of life. Nat Commun. 2020;11:6383.
Gao C, Polley EC, Hart SN, Huang H, Hu C, Gnanaolivu R, et al. Risk of breast cancer among carriers of pathogenic variants in breast cancer predisposition genes varies by polygenic risk score. J Clin Oncol. 2021;39:2564–73.
Jenkins MA, Win AK, Dowty JG, MacInnis RJ, Makalic E, Schmidt DF, et al. Ability of known susceptibility SNPs to predict colorectal cancer risk for persons with and without a family history. Fam Cancer. 2019;18:389–97.
Biller LH, Horiguchi M, Uno H, Ukaegbu C, Syngal S, Yurgelun MB. Familial burden and other clinical factors associated with various types of cancer in individuals with lynch syndrome. Gastroenterology. 2021;161:143–50.
Kachuri L, Graff RE, Smith-Byrne K, Meyers TJ, Rashkin SR, Ziv E, et al. Pan-cancer analysis demonstrates that integrating polygenic risk scores with modifiable risk factors improves risk prediction. Nat Commun. 2020;11:6084.
Jenkins MA, Buchanan DD, Lai J, Makalic E, Dite GS, Win AK, et al. Assessment of a polygenic risk score for colorectal cancer to predict risk of lynch syndrome colorectal cancer. JNCI Cancer Spectr. 2021;5:22.
Choi J, Jia G, Wen W, Long J, Shu XO, Zheng W. Effects of screenings in reducing colorectal cancer incidence and mortality differ by polygenic risk scores. Clin Transl Gastroenterol. 2021;12:e00344.
Stanesby O, Jenkins M. Comparison of the efficiency of colorectal cancer screening programs based on age and genetic risk for reduction of colorectal cancer mortality. Eur J Hum Genet. 2017;25:832–8.
Naber SK, Kundu S, Kuntz KM, Dotson WD, Williams MS, Zauber AG, et al. Cost-effectiveness of risk-stratified colorectal cancer screening based on polygenic risk: current status and future potential. JNCI Cancer Spectr. 2020;4:86.
Monahan KJ, Bradshaw N, Dolwani S, Desouza B, Dunlop MG, East JE, et al. Guidelines for the management of hereditary colorectal cancer from the British Society of Gastroenterology (BSG)/Association of Coloproctology of Great Britain and Ireland (ACPGBI)/United Kingdom Cancer Genetics Group (UKCGG). Gut. 2020;69:411–44.
Seppälä TT, Dominguez-Valentin M, Sampson JR, Møller P. Prospective observational data informs understanding and future management of Lynch syndrome: insights from the Prospective Lynch Syndrome Database (PLSD). Fam Cancer. 2021;20:35–9.
Tyrrell J, Zheng J, Beaumont R, Hinton K, Richardson TG, Wood AR, et al. Genetic predictors of participation in optional components of UK Biobank. Nat Commun. 2021;12:886.
Saunders CL, Kilian B, Thompson DJ, McGeoch LJ, Griffin SJ, Antoniou AC, et al. External validation of risk prediction models incorporating common genetic variants for incident colorectal cancer using UK biobank. Cancer Prev Res. 2020;13:509–20.
Martin AR, Kanai M, Kamatani Y, Okada Y, Neale BM, Daly MJ. Clinical use of current polygenic risk scores may exacerbate health disparities. Nat Genet. 2019;51:584–91.
UK Biobank analyses were conducted via application 52446 using a protocol approved by the Partners HealthCare Institutional Review Board. CM and EH are supported by the BONFOR-program of the Medical Faculty, University of Bonn (O-147.0002). This study was supported (not financially) by the European Reference Network on Genetic Tumour Risk Syndromes (ERN GENTURIS)—Project ID No 739547. ERN GENTURIS is partly co-funded by the European Union within the framework of the Third Health Programme “ERN-2016—Framework Partnership Agreement 2017–2021”. DRB and PM are supported by the FNR INTER INTER/DFG/21/16394868. This research was also supported by the Instituto de Salud Carlos III and co-funded by European Social Fund—ESF investing in your future—(grants CM19/00099 and PID2019-111254RB-I00) and from the European Union’s Horizon 2020 research and innovation program under the EJP RD COFUND-EJP Nº 825575.
Open Access funding enabled and organized by Projekt DEAL. No funding was obtained for this study.
Ethics approval and consent to participate
Consent for publication
No potential conflicts (financial, professional, or personal) relevant to the manuscript.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Hassanin, E., Spier, I., Bobbili, D.R. et al. Clinically relevant combined effect of polygenic background, rare pathogenic germline variants, and family history on colorectal cancer incidence. BMC Med Genomics 16, 42 (2023). https://doi.org/10.1186/s12920-023-01469-z
- Colorectal cancer
- Family history
- Hereditary cancer
- Polygenic risk
- Risk stratification