Skip to main content
  • Research article
  • Open access
  • Published:

Inflammation and neuronal gene expression changes differ in early versus late chronic traumatic encephalopathy brain



Our understanding of the molecular underpinnings of chronic traumatic encephalopathy (CTE) and its associated pathology in post-mortem brain is incomplete. Factors including years of play and genetic risk variants influence the extent of tau pathology associated with disease expression, but how these factors affect gene expression, and whether those effects are consistent across the development of disease, is unknown.


To address these questions, we conducted an analysis of the largest post-mortem brain CTE mRNASeq whole-transcriptome dataset available to date. We examined the genes and biological processes associated with disease by comparing individuals with CTE with control individuals with a history of repetitive head impacts that lack CTE pathology. We then identified genes and biological processes associated with total years of play as a measure of exposure, amount of tau pathology present at time of death, and the presence of APOE and TMEM106B risk variants. Samples were stratified into low and high pathology groups based on McKee CTE staging criteria to model early versus late changes in response to exposure, and the relative effects associated with these factors were compared between these groups.


Substantial gene expression changes were associated with severe disease for most of these factors, primarily implicating diverse, strongly involved neuroinflammatory and neuroimmune processes. In contrast, low pathology groups had many fewer genes and processes implicated and show striking differences for some factors when compared with severe disease. Specifically, gene expression associated with amount of tau pathology showed a nearly perfect inverse relationship when compared between these two groups.


Together, these results suggest the early CTE disease process may be mechanistically different than what occurs in late stages, that total years of play and tau pathology influence disease expression differently, and that related pathology-modifying risk variants may do so via distinct biological pathways.

Peer Review reports


Chronic traumatic encephalopathy (CTE) is a progressive neurodegenerative disease that is caused at least in part by exposure to repeated traumatic head impacts most commonly observed in the brains of athletes of contact sports and combat veterans. CTE pathology is associated with many clinical symptoms, including behavioral, personality and mood changes as well as memory and cognitive function deficits. Upon autopsy, CTE is diagnosed by the presence of neurofibrillary tangles found around blood vessels and in the depths of the cortical sulcus.

The primary risk factor for developing CTE is the duration of exposure to repetitive head impacts (RHI), which can be measured as total years of play for contact sport athletes [1]. The neurological symptoms of CTE often only manifest decades after players have retired, suggesting the effects of RHI set in motion a progression of pathological events ultimately leading to disease. However, not all individuals with a similar degree of exposure will go on to develop CTE, and while some genetic evidence suggests specific genes are associated with increased risk of disease, the mechanisms underlying the disease process are currently unknown. It is also currently unknown whether early disease states differ from those that are coincident with severe pathology.

Our previous work showed that two genetic risk variants in the Apolipoprotein E (APOE) and Transmembrane Protein 106B (TMEM106B) genes also influence the frequency and severity of CTE. The association of the APOE e4 allele with increased Alzheimer’s Disease (AD) risk is well documented [2] and APOE e4 has also been shown to be associated with increased severity of CTE pathology [3]. Variants in TMEM106B are associated with increased neuroinflammation in aging [4], frontotemporal lobar degeneration (FTLD)-TDP [5], and with AD [6, 7] and our prior study suggested a protective role of the TMEM106B variant rs3173615 in CTE [8]. While prior research has provided some information about the functional roles of APOE [9] and to a lesser extent TMEM106B [10,11,12,13] in both normal and disease contexts, the role they play in the development of CTE is poorly understood.

To address these knowledge gaps, we profiled whole transcriptome gene expression of prefrontal cortex (Brodmann Area 9) by mRNASeq in a cohort of 66 CTE and 10 RHI controls. The CTE samples were subdivided into 13 low (CTE-L, Stages I & II) and 53 high (CTE-H, Stages III & IV) pathology groups using the McKee staging criteria [14, 15], with the goal of identifying molecular changes associated with early versus late disease. The RHI controls are individuals who experienced a similar level of repetitive head impacts exposure as the disease groups but displayed no CTE pathology upon autopsy. In this study, we sought to identify genes and biological processes associated with differences between CTE and RHI controls, repetitive head impacts exposure as measured by total years of play, amount of tau pathology using immunohistological quantification of Phospho-Tau (AT8), and between APOE and TMEM106B risk variant carrier groups.


Sample characteristics

Autopsy participants with a history of RHI exposure were drawn from the from the Understanding Neurologic Injury and Traumatic Encephalopathy (UNITE) study. Inclusion criteria for UNITE include a history of contact sports participation, military service, or domestic violence [16]. Participants were excluded if they lacked fresh frozen prefrontal cortex, died from drug overdose, hanging, or gunshot wound to the head, or had motor neuron disease or other significant neurodegenerative disease in the absence of CTE. For this study, contact sports included American football (n=67), boxing (n=2), ice hockey (n=2), and professional wrestling (n=1). Years of consecutive contact sport participation was used as a proxy for RHI. Two participants without CTE had RHI from non-contact sports exposures (e.g. multiple traumatic brain injuries during military service) and were not included in the years of play analyses. Sample statistics are included in Table 5. This study has been approved by the VA Bedford Healthcare System and Boston University School of Medicine Institutional Review Boards.

CTE-L and CTE-H samples were categorized based on the McKee staging criteria [14, 15] Stages I & II and Stages III & IV, respectively. Briefly, CTE stages I and II are characterized by patchy and focal neurofibrillary tangles and tau-positive processes present around blood vessels and within the sulcal depths primarily within the frontal lobes, but also involving temporal and parietal lobes in stage II. CTE stages III and IV are characterized by more widespread tau pathology that extends to the gyral crest in multiple cortical lobes and involves the medial temporal lobe and subcortical structures. This stratification was used to model early versus late disease stages by noting that CTE-L subjects were younger at time of death on average than CTE-H, and the model assumes that CTE-L individuals would have progressed to CTE-H had they lived longer. RHI subjects were selected as athletes with a similar age at death as the CTE-L subjects and a similar level of repetitive head impacts exposure as the CTE subjects but exhibiting no CTE pathology (McKee Stage 0) upon autopsy. Total years of play was collected from family members at the time of brain donation and cross-checked with available public records for American football players [16]. AT8 histology measurements were collected using digital pathology analyses via an Aperio slide scanner as previous described [17]. The raw AT8 measures were observed to be log-normal by inspection and therefore were log transformed for all downstream analyses. For this study, subjects with no APOE4 allele were classified as APOE 0, and those with at least one APOE4 allele were classified as APOE 1. Additionally, subjects with a G in TMEM106B rs3173615 were classified as TMEM 0 and those with homozygous C were classified as TMEM 1.

Sample processing, RNA extraction, and mRNA sequencing

Grey matter tissue was dissected from the cortical ribbon of the Brodmann Area 9 gyral crest. Total RNA was extracted using the Promega Maxwell RSC simplyRNA Tissue Kit (Cat No. AS1340) according to the manufacturer’s protocol. The integrity and quality of RNA were verified by an Agilent 2100 Bioanalyzer using RNA 600 Nano Chips (Cat No. 5067-1511). Only cases with a RNA Integrity Number (RIN) of 4 or higher using an Agilent Bioanalyzer instrument were selected for study. Paired end poly-A selected mRNA sequencing libraries with a targeted library size of 80M reads were generated using a Kapa Stranded mRNA-Seq Kit according to the manufacturer instructions and sequenced on an Illumina MiSeq instrument by the Boston University Microarray and Sequencing Core. Samples were sequenced in four batches due to study size and were distributed among batches as to avoid any confounding of status, age at death, RIN, or other important experimental variables.

mRNA-Seq analysis

A custom analytical pipeline was developed to analyze the sequencing data. Sequencing reads adapter- and quality-trimmed using trimmomatic [18] and assessed for high quality using FastQC [19] and multiqc [20]. Trimmed reads were aligned against the human reference genome build GRCh38 with Gencode v34 [21] gene annotation using STAR [22]. Aligned reads were quantified to the gene level using the HTSeq package? and the Gencode v34 gene annotation. For each analysis, genes with abundance estimates less than 0 in at least 50% samples within each group were filtered out.

Differential expression (DE) analyses were conducted separately for case status, total years of play, and AT8 with the DESeq2 [23] package. The five DE analyses are listed and summarized in Table 1. Group comparisons of CTE-L and CTE-H were performed separately with all RHI samples. For total years of play, AT8, APOE and TMEM106B, CTE-L and RHI samples were grouped together to increase sample size, as modeling continuous variables on the groups separately had insufficient statistical power to detect meaningful associations. CTE-H samples were analyzed as a group for these variables. Three samples (RHIN_0052, CTES_0064, CTES_0079) were filtered out of the years of play analysis due to extreme values (> 30 years) that drove spurious DE results. One sample, CTES_0068 was filtered out of the APOE and TMEM106B analysis due to missing TMEM106B value. All analyses included age at death, RIN, and sequencing batch in the model as covariates in addition to the variable of interest. Gene associations were considered significant if they attained a false discovery rate (FDR) of less than 0.1.

Table 1 DE models and result statistics

Differential expression results from each analysis were subject to preranked gene set enrichment against the Gene Ontology annotation [24] curated in the c5 MsigDB gene set database [25] using the fgsea [26] package. This gene set enrichment strategy uses ranked fold change for all genes irrespective of significance and compares this ranked list against the genes annotated to each gene set. The intuition of this analysis is that if the genes related to a biological pathway or process are collectively increased or decreased when comparing disease versus control, this suggests that biological pathway or process is involved in disease. The analysis produces a statistic (Normalized Enrichment Score, NES) and associated p-value with each gene set that may be positive or negative if the genes within that gene set are highly or lowly ranked, respectively. Since gene sets may have genes that either enhance or inhibit the overall activity of a pathway or process, we cannot interpret a positive or negative NES to mean the function of pathway or process overall is increased or decreased, just that the genes within the process are coordinately differentially expressed. GO categories were considered significant if they attained an FDR of less than 0.1. To aid in interpretation of GO terms, each enriched term was manually categorized by the investigators into one of a set of high level biological categories a priori. Categories include blood brain barrier (BBB), extracellular matrix (ECM)/membrane, cell cycle, cytoskeleton, development, immune/inflammation, ion homeostasis, metabolism/mitochondria, neuron, protein processing, signaling, transcription/translation, and an "other" category for terms that were not easily categorized. The immune/inflammation and neuron categories were further subcategorized due to the large number of enriched terms and to aid in further interpretation. All custom analysis was performed with python, R, snakemake [27] and Jupyter Lab [28] software.

qPCR validation

Delta CT expression values for an a priori set of 33 genes known to be implicated in CTE (full gene list in Additional file 2: Table S2) in a set of 54 overlapping BA9 brain samples (37 CTE-H, 9 CTE-L, and 8 RHI) that overlapped with the samples presented in this study were analyzed for differential expression. Linear regression models were constructed for each gene modeling qPCR expression values as a function of each variable of interest separately adjusting for RIN and age at death, consistent with the DE models conducted with the mRNASeq data. Genes were marked as concordant between the qPCR and DE analyses if the direction of effect agreed (i.e. both either positive or negative log2 fold change), irrespective of significance in either dataset. Concordant/discordant genes were tabulated into confusion matrices for each analysis and Fisher’s exact test was applied to each to assess the likelihood the degree of concordance could occur by chance using a right-tailed p-value. Separately, Spearman correlation was computed for qPCR versus mRNASeq DE log2 fold change values, to assess overall agreement in ranked effect size.

Data and code availability

Raw and processed read data have been deposited into GEO under accessions GSE157330. All results, analysis, and figure code for this project are available on an Open Science Framework project located at


Distributions of these key sample characteristics are depicted in Figure 1.

Fig. 1
figure 1

Sample characteristics for the variables examined in this study. AC Distribution of total years of play, age at death, and log AT8 histochemical quantification, respectively, for RHI, CTE-L, and CTE-H sample groups. D, E Distribution of age at death for each sample group broken out into risk allele groups for APOE and TMEM106B, respectively. F, G Distribution of log AT8 for each sample group as in D and E

Five separate differential expression (DE) and subsequent gene set enrichment analyses (GSEA) using Gene Ontology (GO) annotations were conducted in this study. Each analysis sought to identify genes whose expression was associated with the different comparisons of interest in the same set of samples as described in Table 1. Specifically, separate DE models were conducted corresponding to case versus RHI control, number of years of play as a continuous variable, amount of tau pathology as measured by AT8 histochemistry, possession of one or more APOE e4 risk alleles, and possession of the TMEM106B risk allele (i.e. recessive model homozygous C for rs3173615). Samples were stratified into low (CTE-L, Stages I & II) and high (CTE-H, Stages III & IV) pathology groups and analyzed for each model separately. Table 1 contains sample count information and summary statistics for the five DE models reported in this study.

CTE versus RHI

Figure 2 depicts results from the case status DE models and subsequent gene set enrichment analysis. DE genes were primarily decreased in disease compared with RHI for both sample groups (Figure 2A, B). There was relatively consistent agreement in the direction of effect for genes when comparing the two disease groups with RHI controls, and the common FDR significant DE genes were all changed in the same directions (marked genes in Figure 2C, listed in Table 2). However, there was little agreement in the direction of effect for gene set enrichment, where only four GO terms were significantly altered at FDR < 0.1 in both analyses, and three of those show an opposite direction of effect (Figure 2D, E). GO terms that trend toward significance (i.e. nominal p-value < 0.05) similarly showed substantial discordance in direction of effect for some GO terms (Figure 2E), where processes related to immune response, inflammation, blood brain barrier, extracellular matrix/membrane, and metabolism were increased in high stage disease but decreased in low stage disease. Processes increased in both were related to protein processing, metabolism, neuronal functions, and metal ion homeostasis, while those decreased in both involve mostly ribosomal processes and transcription/translation (Figure 2E).

Fig. 2
figure 2

DE statistics for case versus RHI comparisons. Early and late stage CTE versus RHI controls showed general concordance in direction of DE genes, but mixed agreement on the biological process level. A, B) Distribution of log2 fold change for DE genes with FDR < 0.1 for both CTE-H versus RHI and CTE-L versus RHI, respectively. C Log2 fold change values for DE genes with FDR < 0.1 in either CTE-H or CTE-L analyses. D Normalized Enrichment Scores (NES) from Gene Set Enrichment Analysis (GSEA) of GO terms from the c5 MSigDB curated GO annotation at FDR < 0.1. Gene sets significant in both analyses are highlighted and colored based on concordance (i.e. same or different direction of effect). E Hierarchically clustered heatmap of NES for enriched GO terms from D that have nominal p-value less than 0.05 in both CTE-H versus RHI and CTE-L versus RHI. Diamond and X markers correspond to GO terms significant at FDR < 0.1 in both from D. GO term names are colored red if the direction of effect is discordant between analyses. NS—GO namespace of corresponding term: BP—biological process, CC—cellular component, MF—molecular function

Table 2 Common significant genes for CTE-L and CTE-H vs RHI from Fig. 2C

To understand which processes were unique to CTE-H or CTE-L, we filtered the GO terms to include only those with FDR

< 0.05 in one analysis and nominal p-value > 0.25 in the other. This strategy identified 290 and 4 GO terms that were strongly enriched in CTE-H versus RHI and CTE-L versus RHI, respectively. The numbers of these uniquely enriched GO terms as well as the 41 with nominal p-value < 0.05 in both analyses depicted in Figure 2E organized by category are in Table 3. Immune and inflammation processes have the highest number of enriched GO terms and were represented in both analyses, with 15 terms in common between them. All biological categories were implicated by both analyses except for cytoskeleton, apoptosis, and signaling terms, which were only identified when comparing CTE-H and RHI.

Table 3 Counts of enriched GO terms grouped by high level category uniquely significant in CTE-L, in CTE-H, or implicated by both sample groups

To better summarize the large number of significantly enriched GO terms, we manually categorized GO terms found to be significant in any analysis into 13 high level categories, as depicted in Figure 3 (see Additional file 1: S1 for a complete categorization table). This categorization strategy enables concise comparison of different biological processes between groups, including direction of effect. The clustered heatmap of Normalized Enrichment Scores (NES) for GO terms significant at FDR < 0.1 in either CTE-H versus RHI or CTE-L versus RHI depicted in Figure 3A shows that there were terms which trend in both concordant and discordant directions of effect, and there is no obvious consistency in the concordance pattern from the perspective of categories. When the significant terms were grouped by category as in Figure 3B, immune/inflammatory processes appear as the most frequently increased in CTE-H versus RHI, while these processes appear decreased in CTE-L versus RHI.

Fig. 3
figure 3

Enriched GO terms for case versus control. Detailed GO term enrichment of early and late CTE versus RHI showed concordant neuronal processes and opposite direction of effect for inflammatory processes. A Heatmap of normalized enrichment scores (NES) for enriched GO terms. Row color bar represents significant pathways respective of columns (salmon colored bars to right of dendrogram indicate corresponding NES is FDR < 0.1) Rightmost color bar represents the GO category as listed in the legend. B Number of significant GO terms from A grouped by category. Terms with positive, negative NES (red, blue in A respectively) are plotted as bars to the right and left of 0, respectively. C Significant GO terms from the neuron category in B grouped by subcategory. D Significant GO terms from the immune/inflammation category in B grouped by subcategory

Due to their relevance to this disease context, the immune/inflammation and neuron categories were further divided into subcategories based on their biological role (Figure 3C, D). From Figure 3C, increased neuronal development terms comprise most of the significant terms between CTE-H and RHI, while a small number of increased synaptic processes were implicated by CTE-L versus RHI. Increased innate immune processes were the most numerous in the severe CTE group but were closely followed by immune cell migration, cytokine-related inflammation, and apoptotic processes, while the few significant terms in CTE-L versus RHI suggest decreased phagocytosis, innate immune, antigen presentation, and adaptive immune processes (Figure 3D).

Association with total years of play

We next sought to identify genes associated with exposure as measured by total years of play stratified into low (CTE-L+RHI) and high (CTE-H) pathology sample groups as a model of early versus late changes associated with head impact exposure. We chose to group CTE-L and RHI samples together in this analysis for three reasons. First, a goal of this study is to identify potential early changes in result of exposure to repetitive head trauma. Although CTE-L and RHI are pathologically distinct, they represent a similar level of exposure in years of play and age at death (Figure 1A, B), and therefore we hypothesize that similar genes are involved. Second, the brain tissue was taken from the gyral crest of the prefrontal cortex, which tends to be relatively spared of pathology in early disease compared with late, thereby avoiding most effects caused directly by pathology that likely influence the CTE-H samples. Last, we wanted to maximize our statistical power to detect differences with the total years of play variable by combining into a larger sample size.

Figure 4A, B compare DE genes and enriched GO terms for genes associated with total years of play, respectively. We note that there were no significant DE genes at FDR < 0.1 for either analysis (Fig. 4A, Table 1), and unlike disease versus RHI DE genes, there is no apparent relationship between the direction of effect of these genes between CTE-L+RHI and CTE-H, and only 3 genes were nominally significant in both analyses (Figure 4A). GO term enrichment, on the other hand, identified many significant enriched terms at FDR < 0.1 that show consistently similar direction of effect of enriched terms, and all terms that are significant in both analyses are increased (Figure 4C). Most of these common GO terms are related to immune response and inflammation, but unlike with comparison of disease versus RHI, all are increased in both CTE-L+RHI and CTE-H (Figure 4D). With the exception of neuronal processes in CTE-L+RHI, all implicated GO terms are positively associated with years of play in both sample groups.

Fig. 4
figure 4

DE and GSEA statistics for genes associated with years of play. There were no FDR significant DE genes in either group but many significant gene sets in common that showed complete concordance in direction of effect between RHI + CTE-L and CTE-H. A Log2 fold change estimates of genes associated with years of play for CTE-L + RHI (Low) and CTE-H (High) sample groups. As no genes were significant after adjusting for multiple hypotheses, genes with nominal p-value less than 0.01 are plotted. B Gene set enrichment analysis of GO terms using log2 fold change. Gene sets are Concordant if they are significant at FDR < 0.1 in both comparisons and are modified in the same direction. C Clustered heatmap of enriched GO terms for both CTE-H and CTE-L + RHI associated with total years of play at nominal p-value < 0.05 in both. D GO terms from C grouped by category. E Neuron category terms. F Immune/Inflammation category terms

Association with Tau pathology

We next sought to identify genes associated with tau protein aggregation as measured by immunohistological quantification of phospho-tau (AT8). AT8 immunostaining values were log transformed to attain normally distributed values. We again group CTE-L and RHI together for reasons described above. Although the amount of AT8 differs between CTE-L and RHI, the amount of tau pathology is more similar between CTE-L and RHI than CTE-H (Figure 1C, note log scale).

A substantial number of DE genes were associated with log AT8 at FDR < 0.1 for both CTE-L+RHI and CTE-H groups, and a large number of genes were significantly associated with both (Figure 5A). Strikingly, nearly all significant DE genes show an opposite direction of effect when comparing CTE-L+RHI and CTE-H. This inverse relationship is also observed in the enriched GO terms induced by the DE genes (Figure 5B), where all common enriched terms are discordant in direction of effect. The inverse relationship is also visible in Figure 5C, where immune/inflammation processes are primarily up in CTE-H but trend down in CTE-L+RHI, and the reverse is true of many other terms. This is shown clearly in Figure 5D, where many immune/inflammation terms are increased in association with AT8 in the CTE-H group, similar to the association seen with total years of play. However, CTE-H also shows an equally large number of neuronal terms that are decreased with increasing amounts of tau and, curiously, these processes appear to be positively associated with AT8 in the CTE-L+RHI group.

Fig. 5
figure 5

DE and GSEA statistics for genes associated with log AT8. There were many FDR significant DE genes and gene sets associated with AT8 that showed almost complete discordance in direction of effect between RHI + CTE-L and CTE-H. A Log2 fold change estimates of genes associated with AT8 for CTE-L + RHI (Low) and CTE-H (High) sample groups. B Gene set enrichment analysis of GO terms using log2 fold change. Gene sets were discordant if they were significant in both comparisons and are modified in opposite directions. C Clustered heatmap of enriched GO terms for both CTE-H and CTE-L + RHI associated with total years of play at nominal p-value < 0.05 in both. D GO terms from C grouped by category. E Neuron category terms. F Immune/Inflammation category terms

Risk variant effects

We next investigated how risk variants of APOE and TMEM106B affect low and high pathology groups. A dominant encoding for the APOE e4 allele was used to separate samples in each sample group, i.e. individuals with at least one e4 allele were included in the risk group. The TMEM106B SNP rs3173615 G allele has been shown to be protective [8]. Thus, individuals homozygous for the non-protective C allele were grouped to form the risk carriers (i.e. a recessive encoding for the non-protective allele). This encoding makes comparing the results of the two variants more straightforward, as association of genes and GO terms have the same interpretation with respect to the risk carriers.

DE analyses were conducted comparing the risk versus non-risk subjects within each of the CTE-L+RHI and CTE-H groups, resulting in four sets of DE results. Very few genes were significantly associated at FDR < 0.1, where only 2 met this significance level in the APOE comparisons and none for TMEM106B (see Table 1). However, 196 and 76 GO terms were significantly associated at FDR < 0.1 for CTE-L+RHI and CTE-H, respectively, with APOE risk, and 3 and 262 were significantly associated with these in TMEM106B risk. It is interesting to note that the number of significant GO terms is higher for APOE risk in the CTE-L+RHI group, which has the smallest number of samples, and also higher in the CTE-H group for TMEM106B risk, where the remaining two analyses yielded relatively few results.

We next compared the log2 fold change associated with the risk allele group of genes between variants and sample groups (Figure 6A–D). In principle, if the risk variants modulate disease risk using a common mechanism, the DE genes identified for the different risk genes and sample groups should show some similarity. Very few nominally significant genes overlapped between any of the comparisons, as illustrated in the Venn diagrams of Figure 6. However, similar to the inverse relationship observed between CTE-L+RHI and CTE-H in the AT8 comparison, we observed that the effect size of the overlapping genes was inverted between these groups for APOE risk carriers (Figure 6A scatter plot). The relationship was less consistent when comparing TMEM106B risk carriers between CTE-L+RHI and CTE-H, where there were both concordant and discordant directions of effect in the common genes (Figure 6B scatter plot). Curiously, an inverse relationship is also observed when comparing the APOE and TMEM106B risk carriers within the CTE-L+RHI group (Figure 6C scatter plot). There is no consistent relationship between the genes for the two risk carriers within CTE-H (Figure 6D scatter plot).

Fig. 6
figure 6

DE and GSEA statistics for genes and gene sets associated with APOE ε4 and TMEM106B risk alleles. Genes associated with risk alleles in both sample groups were largely disjoint, and gene sets associated with APOE and TMEM106B risk alleles were primarily found in CTE-L + RHI and CTE-H, respectively. A, B Venn diagram of nominally significant (p-value < 0.05) gene overlap and scatterplot of L2FC of CTE-L + RHI versus CTE-H sample groups for APOE (A) and TMEM106B risk allele (B), respectively. C, D Venn diagram of nominally significant (p-value < 0.05) gene overlap and scatterplot of L2FC of APOE ε4 versus TMEM106B risk allele forCTE-L + RHI (C) and CTE-H (D), respectively. E NES heatmap of GO terms significant at FDR < 0.1 in at least one condition. Bars on left of the heatmap indicate significance of corresponding NES in the heatmap columns respectively, GO IDs that were significant at FDR < 0.1 is in red. F, G GO pathways that were FDR < 0.1 grouped by category. Values left of zero corresponds to number of significant pathways with negative NES. Values right of zero corresponds to number of significant pathways with positive NES

Similar to the gene set enrichment analyses presented earlier, there was a mix of concordance and discordance in the direction of effect of GO terms implicated by the DE genes. The clustered heatmap in Figure 6E depicting Normalized Enrichment Scores for each analysis showed little consistent pattern of concordance between them, except to note that the sample groups clustered together more closely than the variant groups. The discordance in direction of effect for APOE variants across sample groups was also apparent (1st and 3rd column of heatmap), consistent with the inverted gene expression relationship from Figure 6A. More generally, the patterns observed in comparing concordance between pairs of columns in the heatmap are consistent with the scatter plots by inspection.

There was a surprising number of significantly enriched GO terms associated with APOE risk carriers in the CTE-L+RHI group in Figure 6F. No other analysis conducted by this study found so many significant results in this sample group, which was notable considering this was the sample group with the smallest sample size. Many GO terms in various categories, most notably neuron, are negatively associated with APOE risk carriers, meaning neuronal gene expression overall was decreased in CTE-L+RHI individuals who have the APOE risk allele. In contrast to the results from APOE, the pattern of biological processes implicated by TMEM106B were more consistent with other comparisons made in this study, namely that CTE-H exhibited the majority of the significantly associated GO terms which were primarily composed of decreased neuron and increased immune/inflammation categories (Figure 6G).

Validation of associations with qPCR

To provide orthogonal validation of these findings, we examined previously generated qPCR expression quantification on an a priori set of 33 genes known to be implicated in CTE (full gene list in Additional file 2: Table S2) in a subset of 54 participants (37 CTE-H, 9 CTE-L, and 8 RHI). These genes were chosen based on our previous investigations as being involved more generally in neurodegenerative disease processes and as such many were not significantly DE in our mRNASeq CTE analyses (see Additional file 3: Table S3). We therefore focused on comparing the direction of effect (i.e. log2 fold change) between the qPCR and DE genes to assess concordance and tabulated the results into confusion matrices found in Table 4.

Table 4 DE genes vs qPCR log2 fold changes for all five models reported in this study

The level of agreement across sample groups and models between qPCR and mRNASeq DE varied. Some comparisons showed very little agreement in the direction of effect, particularly the CTE-L versus RHI and total years of play models. Note from the mRNASeq analysis of total years of play above (Figure 4) that there were very few DE genes implicated and very little agreement on the direction of effect, which was consistent with the results of Table 4b. On the other hand, some comparisons show very high concordance, the most noteworthy being those for AT8 (Table 4c). Here the unexpected inverse relationship of CTE-L+RHI and CTE-H with AT8 (see Figure 5A) was also observed; note most genes are decreased and increased in CTE-L+RHI and CTE-H, respectively, and the Fisher’s Exact test for CTE-L+RHI attained significance and CTE-H trended toward significance. The Spearman correlation of log2 fold changes for AT8 are modest but significant at p-value < 0.05. The TMEM106B risk allele comparisons (Table 4e) also show substantial agreement. Overall, the concordance of results from qPCR and mRNASeq was remarkable, especially considering the genes were chosen independently.

Comparative analysis

To aid in summarizing results presented in this study, subpanels of GO category enrichment for each of the five primary analyses were included in Figure 7. These plots depict the number of significantly enriched GO terms grouped by high level category as in each analysis specific figures presented earlier. The subfigures have been annotated to emphasize several salient patterns observed across analyses. Increased immune/inflammation processes were implicated in all comparisons involving CTE-H except with APOE risk. CTE versus RHI and total years of play analyses (Figure7A, C) were nearly absent of neuronal processes, while in AT8 and TMEM106B risk comparisons, neuronal processes were substantially decreased for CTE-H (Figure7B, D). Comparisons with CTE-L had very few enriched GO categories except for APOE risk, where there was a substantial decrease in immune/inflammatory categories.

Fig. 7
figure 7

Enriched GO term categories for all five primary analyses presented in this work as found in previous figures. Increased inflammation was associated with all CTE-H analyses except APOE risk carriers, and decreased neuronal processes were only associated with AT8 and TMEM106B risk. A CTE versus RHI, B AT8, C total years of play, D TMEM106B risk carriers versus non-risk carriers, and E APOE risk carriers versus non-risk carriers. Dashed boxes and bolded text are annotated to aid in interpretation


This study presents the largest transcriptome-wide gene expression analysis of post-mortem human brain in CTE to date. We set out to identify gene expression patterns observed in low- (CTE-L) and high-stage (CTE-H) CTE as a model of early versus late disease as they relate to presence of pathology (i.e. case vs. RHI), the amount of repetitive head impacts exposure (i.e. total years of play), quantitative measures of tau pathology in affected brain tissue, and genetic variants known to influence CTE symptoms and severity. We showed that there were substantial gene expression effects in individuals with severe CTE across all of these axes while the comparisons involving RHI and CTE-L had fewer significant results. Case versus control comparisons yielded mixed concordance in the direction of effect of implicated pathways, while processes associated with total years of play were in good agreement between low compared with high pathology sample groups. In contrast, the processes associated with the amount of tau pathology had almost exactly opposite directions of effect between these groups. Furthermore, the DE genes associated with APOE and TMEM106B risk variants did not have a high degree of overlap and suggest distinct processes.

In every comparison except APOE risk carriers, substantial neuroimmune and neuroinflammatory processes were positively associated with conditions that increase risk of severe disease in CTE-H subjects. Processes in the innate immune response subcategory were the most numerous, but many different components of the immune response, including cytokine activity and apoptosis, were also well represented in these comparisons. While some inflammatory processes implicated when comparing CTE-H APOE risk carrier groups were consistent with these findings with disease, RHI, tau pathology, and TMEM106B risk, the small number of processes identified may suggest that the effects of APOE risk variants largely precede the development of severe disease. This idea is further supported by the finding that the comparison of APOE risk carriers in CTE-L+RHI samples produced many significantly enriched gene sets, whereas no other comparison implicated nearly as many for this sample group. Curiously, these processes, most notably immune/inflammation, appear reduced in APOE risk carriers relative to non-risk carriers in this sample group, forming an almost exact mirror image of the increased processes observed in CTE-H. This suggests the possibility that the APOE risk variant may impair these processes in early disease, which in turn might predispose risk carriers to developing more severe pathology over time than non-risk carriers. An alternative explanation might be that the APOE risk allele causes an aberrant increase in the inflammatory state of the brain which is then compensated by homeostatic mechanisms that are competent in early disease but become less effective over time, leading to dramatically increased inflammation in late stage disease.

A second noteworthy pattern we observed is that neuronal gene expression decreases with increasing tau pathology and with the presence of TMEM106B risk. It is interesting to note that comparatively few neuronal processes are implicated when comparing CTE-H and CTE-L to RHI or when examining gene expression associated with total years of play, and almost none were associated with APOE risk status. Synapse was the neuronal subcategory with the largest number of enriched gene sets across these comparisons, followed by neuronal development. With the exception of CTE-L versus RHI, synaptic genes are decreased with increasing disease risk factors, suggesting that synaptic density is reduced in affected tissues, as has been recently shown in mouse models of RHI [29]. TMEM106B risk, in particular, was associated with decreased expression of many neuronal process pathways in CTE-H, which is consistent with previous reports demonstrating an association with decreased neuronal density [30]. In addition, the previously demonstrated increased variation in synaptic density in CTE suggests that synapse turnover may be a feature of RHI and CTE and may be associated with greater variation in synapse related gene expression [31].

The amount of head injury exposure as measured by total years of play appears to have only a weak effect on individual gene expression across individuals, but the biological processes they implicate are relatively consistent when comparing exposure groups. A given individual’s exposure to repetitive head impacts may have occurred many years prior to death when these samples were collected, and while we adjusted for the effects of age at death as well as possible, this duration paired with diverse life experiences and disease progression could easily distort any consistent signal on the gene level. The consistency between sample groups on the biological process level is therefore noteworthy and suggests there may indeed be a common response to exposure detectable even many years after exposure (Table 5).

Table 5 Sample statistics

In contrast with years of play, tau pathology in the post-mortem samples captures an immediate condition of the brain when gene expression is measured. Indeed, the presence of tau pathology appears to have a strong effect on gene expression in both CTE-L+RHI and CTE-H sample groups. Thousands of genes and hundreds of enriched GO term gene sets are associated with AT8 immunopositivity, primarily implicating inflammatory and neuronal processes. The striking and nearly complete inverse relationship between both gene expression fold changes and gene set enrichment direction suggests a qualitatively different effect of pathology, or precursors to development of pathology, in individuals with low compared with high levels of exposure. Since the amount of tau pathology is relatively low but increased in CTE-L compared with RHI, this suggests a fundamentally different process affects these groups than in severe disease. With the exception of inflammation, development, and ECM/membrane, all biological processes trend as negatively associated with increasing amounts of tau. We recently found similar gene expression changes comparing sulcal versus gyral crest in dorsolateral prefrontal cortex in CTE subjects [32].

The concordant effects of years of play and the discordant effects of tau explain the mixed concordance observed when comparing case versus RHI. Because the amount of tau pathology and years of play vary among individuals in both low and high exposure groups, we may therefore interpret the case versus RHI comparison as a convolution of these disparate effects. Although years of play and pathology are highly correlated, this motivates considering these as separate effects that modulate disease expression, which may have important therapeutic implications.

The relative lack of strong gene expression signal in APOE risk carriers is somewhat surprising, considering evidence of the role this gene is thought to play in the development of tau pathology in CTE and AD. Also surprising is the lack of overlap of DE genes and the discordance of the direction of effect between APOE and TMEM106B risk carriers versus non-risk carriers. Risk alleles of both genes increase the risk of developing severe CTE pathology, but the low overlap of the DE genes and biological processes suggests that the molecular mechanisms underlying this increased risk are distinct. As in the AT8 comparison, there is an inverse relationship between the nominally significant DE genes comparing CTE-L+RHI versus CTE-H APOE risk groups as well as TMEM106B versus APOE in the CTE-L+RHI sample group. The similarity of expression profiles between tau and TMEM106B does not appear to be driven by the amount of tau pathology, as the distribution of tau in the TMEM106B risk carriers and non-carriers does not significantly differ. The reasons for this are unclear, but further suggest a different molecular process is at play when considering the effects of these genes in combination with tau and exposure.

The relatively small number of FDR significant DE and GSEA results for the CTE-L versus RHI and CTE-L+RHI analyses in the study is likely due in part to low statistical power on account of relatively small sample size, but, since the results are sparse, the influence of false positives is minimal. On the other hand, the relative paucity of results compared with severe disease comparisons may indicate that early molecular changes in the brain of those with CTE are in fact similar to those who experienced a similar level of repetitive trauma exposure but lack specific CTE pathology. This would suggest a model where the pathology itself is a driver of molecular changes in later disease stages, which is supported by the observation that gene expression in many genes is associated with the amount of tau pathology.

While our understanding of the antemortem precedents in the CTE brain remains in infancy, it is important to consider these findings within the broader context of basic biological and clinical studies in neurotrauma. Gene expression changes are strongly influenced by epigenetic alterations such as histone modifications and DNA methylation in response to traumatic brain injury [33, 34]. Basic research into these alterations in brain have been investigated in rodent models, and a few studies have examined antemortem human biofluids including blood and cerebrospinal fluid. No studies to date have investigated epigenetic changes in postmortem CTE brain. The lack of availability of antemortem brain tissue is a fundamental limitation of this research, but biofluid correlates of the disease processes shown in this study would provide strong evidence of the relevance of these findings and is the subject of future studies. Such correlates are badly needed, as they could inform diagnostic and prognostic assessments and therapeutic approaches where no reliable markers are currently known [35].


In conclusion, these data present compelling evidence of widespread gene expression changes in late stage CTE and less pronounced changes in early disease and those with repetitive head impacts exposure but without CTE pathology. Furthermore, these results suggest individuals with low exposure and little to no pathology experience a different set of molecular processes than those with late disease as well as a distinct association with tau pathology and genetic risk factors. Therapeutics and biomarkers developed as targets with late-stage signatures might not be effective for individuals early in the progression of disease. Future studies should endeavor to further characterize the active disease process in younger individuals with less exposure.

Availability of data and materials

The high throughput sequencing data used in this study is publicly available on Gene Expression Omnibus accession GSE193407. All code and processed files are accessible at



Chronic traumatic encephalopathy


Apolipoprotein E


Transmembrane protein 106B


Differential expression


Gene ontology


  1. Mez J, et al. Duration of american football play and chronic traumatic encephalopathy. Ann Neurol. 2020;87:116–31.

    Article  PubMed  Google Scholar 

  2. Farrer LA, et al. Effects of age, sex, and ethnicity on the association between apolipoprotein E genotype and Alzheimer disease: a meta-analysis. JAMA. 1997;278:1349–56.

    Article  CAS  PubMed  Google Scholar 

  3. Atherton K, Han X, Chung J, Cherry JD, Baucom Z, Saltiel N, Nair E, et al. Association of APOE genotypes and chronic traumatic encephalopathy. JAMA Neurol. 2022;79(8):787–96.

    Article  PubMed  PubMed Central  Google Scholar 

  4. Nelson PT, et al. Reassessment of risk genotypes (GRN, TMEM106B, and ABCC9 variants) associated with hippocampal sclerosis of aging pathology. J Neuropathol Exp Neurol. 2015;74:75–84.

    Article  CAS  PubMed  Google Scholar 

  5. Van Deerlin VM, et al. Common variants at 7p21 are associated with frontotemporal lobar degeneration with TDP-43 inclusions. Nat Genet. 2010;42:234–9.

    Article  PubMed  PubMed Central  Google Scholar 

  6. Jun G, et al. A novel alzheimer disease locus located near the gene encoding tau protein. Mol Psychiatry. 2016;21:108–17.

    Article  CAS  PubMed  Google Scholar 

  7. Satoh J-I, et al. TMEM106B expression is reduced in Alzheimer’s disease brains. Alzheimers Res Ther. 2014;6:1–14.

    Article  Google Scholar 

  8. Cherry JD, et al. Variation in TMEM106B in chronic traumatic encephalopathy. Acta Neuropathol Commun. 2018;6:115.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  9. Huang Y, Mahley RW. Apolipoprotein e: structure and function in lipid metabolism, neurobiology, and alzheimer’s diseases. Neurobiol Dis. 2014;72(Pt A):3–12.

    Article  CAS  PubMed  Google Scholar 

  10. Baggen J, et al. Genome-wide CRISPR screening identifies TMEM106B as a proviral host factor for SARS-CoV-2. Nat Genet. 2021;53:435–44.

    Article  CAS  PubMed  Google Scholar 

  11. Zhou X, et al. Loss of TMEM106B leads to myelination deficits: implications for frontotemporal dementia treatment strategies. Brain. 2020;143:1905–19.

    Article  PubMed  PubMed Central  Google Scholar 

  12. Nicholson AM, Rademakers R. What we know about TMEM106B in neurodegeneration. Acta Neuropathol. 2016;132:639–51.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  13. Tropea TF, et al. TMEM106B effect on cognition in Parkinson disease and frontotemporal dementia. Ann Neurol. 2019;85:801–11.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  14. McKee AC, et al. The spectrum of disease in chronic traumatic encephalopathy. Brain. 2013;136:43–64.

    Article  PubMed  Google Scholar 

  15. Alosco ML, et al. Characterizing tau deposition in chronic traumatic encephalopathy (CTE): utility of the McKee CTE staging scheme. Acta Neuropathol. 2020;140:495–512.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  16. Mez J, et al. Assessing clinicopathological correlation in chronic traumatic encephalopathy: rationale and methods for the UNITE study. Alzheimers Res Ther. 2015;7:62.

    Article  PubMed  PubMed Central  Google Scholar 

  17. Cherry JD, et al. Microglial neuroinflammation contributes to tau accumulation in chronic traumatic encephalopathy. Acta Neuropathol Commun. 2016;4:112.

    Article  PubMed  PubMed Central  Google Scholar 

  18. Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for illumina sequence data. Bioinformatics. 2014;30:2114–20.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  19. FastQC. Accessed: 2016–9–27.

  20. Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics 2016.

  21. Harrow J, et al. GENCODE: the reference human genome annotation for the ENCODE project. Genome Res. 2012;22:1760–74.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  22. Dobin A, et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2013;29:15–21.

    Article  CAS  PubMed  Google Scholar 

  23. Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15:550.

    Article  PubMed  PubMed Central  Google Scholar 

  24. The Gene Ontology Consortium. The gene ontology resource: 20 years and still GOing strong. Nucleic Acids Res. 2019;47:D330–8.

    Article  Google Scholar 

  25. Subramanian A, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A. 2005;102:15545–50.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  26. Korotkevich G, Sukhov V, Sergushichev A. Fast gene set enrichment analysis (2019). bioRxiv.

  27. Köster J, Rahmann S. Snakemake–a scalable bioinformatics workflow engine. Bioinformatics. 2012;28:2520–2.

    Article  PubMed  Google Scholar 

  28. Kluyver T, Ragan-Kelley B, Pérez F, Granger B, Bussonnier M, Frederic J, Kelley K, et al. Jupyter Notebooks – a Publishing Format for Reproducible Computational Workflows. In: Loizides F, Scmidt B, editors., et al., Positioning and Power in academic publishing: players agents and agendas. IOS Press; 2016. p. 87–90.

    Google Scholar 

  29. Sloley SS, et al. High-frequency head impact causes chronic synaptic adaptation and long-term cognitive impairment in mice. Nat Commun. 2021;12:2613.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  30. Li Z, Farias FHG, Dube U, Del-Aguila JL, Mihindukulasuriya KA, Fernandez MV, Ibanez L, et al. The TMEM106B FTLD-protective variant, rs1990621, is also associated with increased neuronal proportion. Acta Neuropathol. 2020;139(1):45–61.

    Article  CAS  PubMed  Google Scholar 

  31. Warling A, et al. Putative dendritic correlates of chronic traumatic encephalopathy: a preliminary quantitative golgi exploration. J Comp Neurol. 2021;529:1308–26.

    Article  CAS  PubMed  Google Scholar 

  32. Cherry JD et al. Differential gene expression in the cortical sulcus compared to the gyral crest within the early stages of chronic traumatic encephalopathy. Free Neuropathol 2021;2.

  33. Dagra A, Barpujari A, Bauer SZ, Olowofela BO, Mohamed S, McGrath K, Robinson C, Robicsek S, Snyder A, Lucke-Wold B. Epigenetics of Neurotrauma. Neurology. 2022;2(2):42–7.

    CAS  PubMed  Google Scholar 

  34. Zima L, West R, Smolen P, Kobori N, Hergenroeder G, Choi HA, Moore AN, Redell JB, Dash PK. Epigenetic modifications and their potential contribution to traumatic brain injury pathobiology and outcome. J Neurotrauma. 2022;39(19–20):1279–88.

    Article  PubMed  Google Scholar 

  35. Sriram S, Lucke-Wold B. Advances research in traumatic encephalopathy. Biomedicines. 2022.

    Article  PubMed  PubMed Central  Google Scholar 

Download references


We gratefully acknowledge all the individuals whose participation and contributions made this work possible.


This work was supported by the United States (U.S.) Department of Veterans Affairs, Veterans Health Administration, BLRD Merit Awards (I01BX005933, I01BX005161), Career Development Award (BX004349); National Institute of Aging (RF1AG054156); National Institute of Neurological Disorders and Stroke (U54NS115266, U01NS086659); National Institute of Aging Boston University AD Research Center (P30AG072978); and the Concussion Legacy Foundation. This work was also supported by unrestricted gifts from the Andlinger Foundation and WWE.

Author information

Authors and Affiliations



ATL—co-conceived, designed, and oversaw completion of the study, wrote manuscript. FA—performed all analyses, contributed to manuscript. NA—brain sample collection, curation, and pathology. JC—contributed pathology data, manuscript contributions. JM—critical manuscript and study feedback. AM—brain bank. TDS—co-conceived and oversaw completion of study, critical manuscript feedback. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Adam Labadorf or Thor D. Stein.

Ethics declarations

Ethics approval and consent to participate

This study has been approved by the VA Bedford Healthcare System and Boston University School of Medicine Institutional Review Boards. Written consent for donation of all brain tissue was provided by the next of kin contact, legally authorized representative, or the living subject themselves antemortem through the Brain Donation Registry. Consent included use of tissue for research purposes.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1

. Table S1. GO Term Category Annotations. Manual annotations of GO terms to high level categories and subcategories used in this study.

Additional file 2

. Table S2. qPCR expression values for validation genes. Delta CT values for genes from qPCR analysis for genes examined for validation of mRNASeq findings.

Additional file 3

. Table S3. Overlap of mRNASeq and qPCR genes for DE models. Summary table of genes significant that overlap in mRNASeq or qPCR for each DE modelconsidered in study.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit The Creative Commons Public Domain Dedication waiver ( applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Labadorf, A., Agus, F., Aytan, N. et al. Inflammation and neuronal gene expression changes differ in early versus late chronic traumatic encephalopathy brain. BMC Med Genomics 16, 49 (2023).

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: