Text Box 1 Human genetic variation and disease

From: Whole genome sequencing in clinical practice

The human genome is composed of 3.2 billion base pairs of DNA, organized into 23 pairs of chromosomes [2,3,4]. In addition to the nuclear genome, there is also a small amount of maternally inherited DNA located in the mitochondria. Only about 1.5% of the genome sequence consists of protein-coding exons [41]. The remaining 98% or more of the genome is made up of non-coding regions, which include regulatory elements, repetitive DNA sequences, and other functional elements [42]. While there is general consensus that we have about 20,000 protein-coding genes, the size of the proteome is still debated [41, 43]. Moreover, a number of non-coding RNAs, such as microRNAs and long non-coding transcripts, are also produced, but their number and biological significance are, with few exceptions, uncertain [44].

Like any other species, humans are under constant selection, and genetic variation is an integral part of evolution. We continuously acquire adaptive germ cell mutations as well as neutral and disease-causing variants [45]. Mutations result from radiation, environmental stress factors and deficient DNA repair [46], and they locate to all parts of the genome [47], albeit with varying frequency. On average, a human genome accumulates about 75 mutations per generation [45]. Dominantly inherited variations leading to lactase persistence have, for example, allowed adult northern Europeans to digest milk [48], and the caspase-12 gene is polymorphic for a stop codon that makes carriers more resistant to severe sepsis. We can also observe how the Black Death shaped genetic diversity around particular immune loci such as ERAP2 and CTLA4, highlighting how natural selection may have played a role in present-day susceptibility to chronic inflammatory and autoimmune disease [49]. Finally, it is clear that genes encoding transcription factors and RNA-binding proteins that are essential for fetal development are subject to strong selective pressure, as illustrated by the low or entirely absent occurrence of loss-of-function variants in these genes [30].

In genetic terms, humans are 99.9% identical to each other. The remaining 0.1% of our genome, corresponding to ~3,000,000 simple variants, distinguishes us from one another. Among these, ~45,000 (1.5%) are found in protein-coding exons [2]. In addition, numerous larger variations, such as copy number variations (CNVs) and other structural variations (SVs), may contribute to our genetic diversity [50, 51]. From a medical perspective, this genetic variation significantly influences individual susceptibility to disease and its development. The impact extends to pharmaceutical side effects and clinical outcomes, underscoring the integral role of genome sequencing in personalized medicine.
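The figures quoted above can be checked with simple arithmetic. The sketch below uses only the numbers given in the text; the variable names are illustrative, and the exonic estimate assumes, as a simplification, that simple variants are spread evenly across the genome.

```python
# Figures taken from the text; variable names are illustrative only.
GENOME_SIZE_BP = 3_200_000_000   # ~3.2 billion base pairs
FRACTION_DIFFERING = 0.001       # the 0.1% that differs between individuals
SIMPLE_VARIANTS = 3_000_000      # rounded figure quoted in the text
EXONIC_FRACTION = 0.015          # ~1.5% of the genome is protein-coding exons

# 0.1% of 3.2 billion positions, which the text rounds to ~3,000,000 variants.
differing_positions = round(GENOME_SIZE_BP * FRACTION_DIFFERING)

# Simplifying assumption: variants are evenly distributed, so ~1.5% are exonic.
exonic_variants = round(SIMPLE_VARIANTS * EXONIC_FRACTION)

print(f"{differing_positions:,}")  # 3,200,000
print(f"{exonic_variants:,}")      # 45,000
```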

Both rare (<1% minor allele frequency) and common variants (>1% minor allele frequency) contribute to the risk of developing a disease, and they can sometimes interact with each other in complex ways. From a diagnostic point of view, this is one of the major challenges for the current interpretation of WGS data. Common variants are typically associated with a small increase in disease risk, but because they are so common, they can have a significant impact on the population as a whole. At the individual level, the presence of numerous common variants may generate a significant risk for a particular disease, and their cumulative effect is captured by the current polygenic risk scores (PRS). Rare disease-associated variants, on the other hand, with few exceptions occur at a much lower frequency in the population, often in far less than 1% of individuals. Most of the rare variants that are considered in diagnostics locate to the coding exons and alter or reduce the function of the encoded proteins. In families they exhibit a Mendelian segregation pattern, but they may also occur as de novo variants. During the past decade, genome-wide association studies (GWAS) have associated thousands of common variants with various diseases and traits, and in the same period a series of large-scale sequencing studies have started to identify rare-variant associations [52,53,54]. A surprising finding has been that, for a particular trait, common and rare variants appear to be mechanistically convergent [55]. The relative contribution of rare variants to the total genetic burden may be small, but rare variants may serve to improve the fundamental understanding of disease pathogenesis and define possible targets of treatment.
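The cumulative effect of common variants mentioned above is commonly summarized as a weighted sum: each variant's risk-allele count (0, 1 or 2) is multiplied by an effect size estimated in a GWAS, and the products are added up. The following is a minimal sketch of that idea only. The function name, variant identifiers and effect sizes are hypothetical and not taken from the article, and real PRS pipelines add steps such as variant selection and population normalization that are not shown here.

```python
def polygenic_risk_score(dosages, effect_sizes):
    """Return the weighted sum of risk-allele counts.

    dosages: maps variant ID -> number of risk alleles carried (0, 1 or 2).
    effect_sizes: maps variant ID -> per-allele weight (e.g. a log odds ratio).
    Variants missing from `dosages` are treated as 0 copies (an assumption).
    """
    return sum(weight * dosages.get(variant, 0)
               for variant, weight in effect_sizes.items())


# Hypothetical per-allele weights for three made-up variants.
effect_sizes = {"variant_A": 0.10, "variant_B": 0.05, "variant_C": -0.02}

# Hypothetical genotype of one individual, expressed as risk-allele counts.
dosages = {"variant_A": 2, "variant_B": 1, "variant_C": 0}

score = polygenic_risk_score(dosages, effect_sizes)
print(round(score, 4))
```

Each variant contributes only a small amount to the score, which mirrors the point made in the text: individually weak common variants can add up to a meaningful risk when many are present.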