We ascertained forty-four pediatric CMI patients in order to establish disease subtypes using biological data consisting of patient blood and dura whole genome expression profiles, as well as radiological data comprised of PF traits. Sparse k-means clustering analyses were performed using the biological and radiological data both individually and collectively. The latter analysis required us to extend the original sparse k-means method to accommodate multiple datasets from different sources. Identified subtypes were compared across analyses for class membership agreement. Subtypes were also fully characterized for better interpretability, which included the identification of enriched biological pathways, cranial base discriminating features, and, to a more limited extent, correlated clinical traits. All clustering analyses resulted in the identification of significant patient classes, with the dura and blood individual clustering analyses showing the strongest evidence. Further characterization of these classes led to the identification of several factors that may contribute to disease heterogeneity within a pediatric CMI population.
Prior to focusing on our specific findings, we present a general discussion surrounding the results obtained from both the standard and integrative sparse k-means clustering methods. First, limited class membership concordance was observed across most analyses. This was not particularly surprising as the source of each dataset differed substantially (blood vs. dura vs. PF trait). It does, however, suggest that the use of more readily available patient data, such as PF traits, would not be sufficient to identify subtypes established using biological data alone. In addition, the class membership agreement between the dura and blood clustering analyses was extremely low. While each of these analyses identified biological classes, the features that accounted for a large proportion of variation in the data not only differed between the analyses but led to the partitioning of patients into different classes. Specifically, the biological relationships observed among the patients were strongly dependent on the biological source used. While this could be due to our limited sample size or the particular clustering algorithm implemented, it could also be due to the fact that one or both of these tissues were not able to strongly identify true disease subtypes or that each provided us with very distinct information only relevant to the specific tissue assessed. Continued collection of patient samples and further investigation of results would be necessary to resolve these possibilities.
In addition to the individual dataset clustering, we performed integrative clustering. The purpose of these analyses was to place more weight on features within each dataset that act collectively to discriminate patient classes. For example, the Dura-Cranial analysis should identify genes that are relevant to cranial base morphology and able to, in combination with those PF traits, partition patients into distinct subtypes. While this was an attractive a priori approach, the findings did not provide significant insight into disease heterogeneity. In general, the integrative analyses were less significant than the individual clustering analyses. In addition, two out of the three analyses (Blood-Cranial and Blood-Dura-Cranial) did not produce a strong biological interpretation as evident from the lack of enrichment observed for any biological pathway. We thus focus our discussion on the individual clustering analyses, particularly on our two most significant findings: the dura and blood whole genome expression clustering results.
Sparse k-means clustering using blood whole genome expression data alone resulted in our most significant finding. Using the top 100 ranked genes, we identified at least nominally significant enrichment for five biological pathways, including the ribosome, spliceosome, proteasome, RNA degradation, and oxidative phosphorylation pathways. Interestingly, we observed a down-regulation of the ribosome (involved in protein synthesis), spliceosome (involved in splicing of pre-mRNA), and proteasome (involved in the degradation of proteins) pathways in the group of patients that also had older fathers, a smaller basion to reference line, a larger tentorium, and a flatter cranial base as indicated by a larger Boogaard’s angle (Figure
1). One potential link between paternal age and the ribosome comes from a study which examined DNA methylation in rat liver and germ cells and found that a region of the ribosomal DNA (rDNA) locus was preferentially hypermethylated with increased age in both the sperm and liver
. Hypermethylation of the rDNA may compromise function and affect protein synthesis
. While this study did not directly examine whether this age-dependent methylation change could be passed on to and remain in offspring, aberrant methylation was suggested as a potential mechanism contributing to paternal age related abnormalities in offspring
. In a study examining whole genome expression profiles from peripheral blood lymphocytes in children with ASD and controls, decreased variance in the distribution of gene expression levels in children with ASD as well as control individuals with older fathers was reported
. This decreased variance was suggested to be due to a global down-regulation of transcriptional regulation
. Thus, paternal age appears to be associated with global changes in transcriptional regulation and protein synthesis, which we may be detecting in our blood classes.
Sparse k-means clustering using dura whole genome expression alone resulted in our second most significant finding. Within class 2 we observed a reduction in area3, the supraoccipital bone, and the opisthion to reference line, as well as down-regulation of all genes identified in the enriched dorso-ventral axis formation pathway and over half of the genes identified in pathways in cancer. While these pathways are very general and don’t appear immediately relevant, several of the genes involved in these pathways have multiple functions, some of which are applicable to CMI biology.
Two of the genes identified within the dorso-ventral axis formation pathway, ETS1 and ETS2, are involved in osteoblast differentiation and the formation of bone
. While some studies have suggested that overexpression of ETS2 is associated with the craniofacial abnormalities observed in Down syndrome patients
, Hill and colleagues found that, in general, trisomy 16 mice (model for human trisomy 21, Down syndrome) with or without an extra copy of ETS2 produced comparable craniofacial abnormalities
. However, this study did observe a more severely shortened occipital bone in trisomy 16 mice with only two (16% reduction) versus three copies of ETS2 (4% reduction) when compared to euploid mice. This suggests a more complex interplay among genes present within the Down syndrome interval. Another potential link between ETS1/2 and cranial abnormalities comes from a recent report that found that reduced expression of Ets2 repressor factor (ERF) causes complex craniosynostosis characterized by premature fusion of multiple cranial sutures, craniofacial abnormalities, language delay, and CMI
. Although not one of our most significant findings, reduced expression of ERF was also observed in class 2 (adjusted p = 0.04). The third gene identified within the dorso-ventral axis formation pathway is NOTCH4. NOTCH4 does not appear to have a known, major role in bone formation; however other NOTCH genes are involved in the proliferation and maturation of chondrocytes
Extending our investigation of the biological differences between these two dura classes, we examined genes outside of these pathways and identified several genes that play key roles in biological processes related to endochondral ossification, which is the process by which bones in the cranial base are formed. Although these genes were not within our top 100 ranked genes identified from the clustering analysis, they are still among some of the most significantly differentially expressed genes between the two classes. Examples of these include the runt-related transcription factor 2 (RUNX2), runt-related transcription factor 3 (RUNX3), collagen, type II, alpha 1 (COL2A1), parathyroid hormone 1 receptor (PTH1R), and notch 1 (NOTCH1)
. With the exception of COL2A1, all of these genes were down-regulated in class 2 compared to class 1. Another strong biological candidate is transforming growth factor, beta receptor II (TGFBR2) which was expressed at significantly lower levels in class 2 patients. A previous study generated mice with a conditional deletion of TGFBR2 in COL2A1 expressing cells and found defects in the cranial base and vertebrae
. Two additional studies in mice demonstrated that the conditional inactivation of TGFBR2 specifically in mesoderm-derived cells results in defects in the supraoccipital bone and C1 vertebra and meningoencephalocele
, whereas TGFBR2 inactivation in neural crest cells leads to calvaria defects and cleft palate
. This report has particular relevance as a shortened supraoccipital bone and reduced TGFBR2 expression was observed in class 2 patients. Taken together, these findings implicate several strong biological candidates pertinent to endochondral ossification and suggest that the dura mater may be a reasonable tissue to examine for CMI.
Although encouraged by our findings, there are several important limitations to present. First, our relatively small sample size resulted in limited power for the clustering analysis and follow-up characterization. Obtaining clinical tissue samples from CMI patients is extremely challenging as they are not readily available thus greatly inhibiting the collection of a large sample size. Despite our limited power, we did identify significant underlying structure within the datasets, some of which appeared to be biologically relevant. However, many of the analyses would not have withstood a correction for multiple testing. As one of the primary purposes of this study was to generate hypotheses, we feel it appropriate to examine our results without such an adjustment noting that future work and continued ascertainment would be needed to validate any findings. Another limitation of our study relates to the acquisition of disease relevant patient tissue for gene expression analysis. Our study is not unique in its challenges to ascertain appropriate biological samples for a developmental disease, particularly with respect to developmental stage. Thus, we cannot disregard the implications of this when interpreting our findings. However, we did identify several strong biological candidates from the dura analysis providing additional support for its potential relevance. Additionally, due to the small size of the dura samples acquired and the fact that dura is largely comprised of collagen bundles, low RNA yield was obtained resulting in a limited number of samples with RNA remaining for validation experiments. Moreover, due to the unique nature of this study, we are currently unable to replicate our findings using an independent study population. While further validation and replication of our results is vital, our goal by reporting these initial findings is to encourage additional research in this understudied field and the generation of similar datasets to be used for replication and further investigation into the underlying disease biology. Finally, as the underlying biologic mechanism for CMI is currently unknown, it is unclear which tissue at this stage in development would be most biologically relevant. Thus, future studies using other tissue types, including bone and/or the pia-arachnoid layer may further elucidate the biology of CMI and perhaps correlate better with CMI symptoms or structure.