ASD, like other complex disorders such as diabetes and heart disease, is almost certainly associated with the effects of multiple genes as well as environmental factors. To date, no more than 20% of cases have been linked to structural genomic variants such as de novo CNVs, to de novo mutations, or to monogenic syndromic disorders. To further understand the heterogeneity of ASD genetic architecture as reflected in the blood transcriptome, we developed a novel approach using outlier statistics. To demonstrate the plausibility of this approach, we applied it to two independently collected data sets of ASD cases and controls. Only ~30% of cases shared molecular signatures, including neural development (29% of cases), NO signaling (29% of cases), and skeletal development (27% of cases). These pathways could not be identified with group comparison or gene-level outlier methods, and the cases significantly perturbed in each pathway were not identical. Overall, our approach identified 50% of cases but only 8% of controls as outliers in at least one of these pathways.
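The general idea of pathway-level outlier detection can be illustrated with a minimal sketch. This is not the statistic used in our analysis. It assumes a genes × samples expression matrix and works in three steps. First, it scores each gene in each sample with a robust z-score relative to controls (median and MAD). Next, it averages absolute z-scores over each pathway's genes. Finally, it flags a sample as a pathway outlier when its score exceeds the 95th percentile of control scores. The function names, the threshold, the pathway definitions, and the synthetic data are all hypothetical.

```python
import numpy as np

def robust_z(expr, control_mask):
    """Per-gene robust z-scores of every sample relative to controls.

    expr: genes x samples array; control_mask: boolean array over samples.
    """
    ctrl = expr[:, control_mask]
    med = np.median(ctrl, axis=1, keepdims=True)
    # 1.4826 * MAD approximates the SD for normally distributed data
    mad = 1.4826 * np.median(np.abs(ctrl - med), axis=1, keepdims=True)
    mad[mad == 0] = np.finfo(float).eps  # guard against genes with zero spread
    return (expr - med) / mad

def pathway_scores(z, pathways):
    """Mean absolute robust z over each pathway's genes -> pathways x samples."""
    return np.vstack([np.abs(z[idx, :]).mean(axis=0) for idx in pathways.values()])

def flag_outliers(scores, control_mask, quantile=0.95):
    """Flag samples whose pathway score exceeds the control quantile (hypothetical cutoff)."""
    thresh = np.quantile(scores[:, control_mask], quantile, axis=1, keepdims=True)
    return scores > thresh

# Synthetic example: 100 genes, 20 controls followed by 20 cases
rng = np.random.default_rng(0)
n_genes, n_ctrl, n_case = 100, 20, 20
expr = rng.normal(size=(n_genes, n_ctrl + n_case))
is_control = np.array([True] * n_ctrl + [False] * n_case)
expr[0:10, n_ctrl:n_ctrl + 10] += 4.0  # perturb pathway_A genes in 10 of the cases
pathways = {"pathway_A": np.arange(0, 10), "pathway_B": np.arange(10, 30)}

flags = flag_outliers(pathway_scores(robust_z(expr, is_control), pathways), is_control)
any_flag = flags.any(axis=0)  # outlier in at least one pathway
print("fraction of cases flagged:", any_flag[~is_control].mean())
print("fraction of controls flagged:", any_flag[is_control].mean())
```

Because each pathway is thresholded against the control distribution, only a small fraction of controls can be flagged per pathway. Cases perturbed in different pathways are flagged separately, which mirrors the non-overlapping outlier subgroups described above.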
To date, most emergent biological themes in ASDs have fallen into one of three categories: neuroanatomical, systems, and molecular and cellular. Neuroanatomical observations of altered brain growth patterns [61–63] and minicolumnopathy are the most reproducible clinical signatures of ASDs. Pathways affecting cellular proliferation, such as the PI3K-AKT-mTOR pathway, have been hypothesized to contribute to abnormal brain growth in ASD, but no concrete link between such pathways and brain growth patterns has yet been established. At the systems level, evidence has accumulated for functional alterations in white matter tracts [65–68] and for an overall imbalance between excitation and inhibition in the brain [69–72]. Cellular and molecular themes have converged on the function and structure of the synapse. Rare or de novo deleterious mutations have been found in ionotropic glutamate receptors [49, 73], voltage-gated sodium channels [49, 50, 74], and voltage-gated calcium channels [75, 76]. Neurexins and neuroligins are involved in neuronal adhesion and have been heavily implicated in ASDs by cytogenetic analysis, CNV studies [8, 9, 11, 49], and knockout mouse models [70, 72, 78]. Similarly, the candidate genes SHANK2 and SHANK3 encode scaffold proteins in the postsynaptic density. Other ASD candidate genes with protein products in the postsynaptic density include FMR1 and the associated genes MET, PTEN, TSC1, TSC2, and NF1, all of which are involved in translation. They also include genes involved in protein degradation, such as UBE3A, PARK2, RFWD2, FBXO40, and USP7. In summary, anatomical, physiological, mouse-model, and human genetic studies have implicated brain growth, white matter connectivity, synaptic transmission, and synaptic structure as promising biological themes in ASD.
In this context, our discovery of three pathways related to neural development (axonogenesis, neurite development, and neuron development, which we collapsed together for analysis) was notable. Defects in early neurodevelopmental processes such as neuronal survival, differentiation, migration, and synaptogenesis may cause neurobiological abnormalities in ASD. The NO signaling pathway contains genes involved in the NMDA-type glutamate receptor. It also contains genes in the calcium/calmodulin and NO-mediated second messenger systems, which regulate long-term potentiation and other activity-dependent developmental processes. Moreover, neurogenesis was dysregulated in a subgroup of cases from an independently collected cohort. Specifically, in the Simons data set we identified 20.8% of cases and 12.0% of unaffected family members as neurogenesis outliers.
Outlier samples also showed gene-level differences compared with non-outlier samples. By comparing outliers with non-outliers, we could identify differentially expressed genes that were specific to outlier subgroups. Among these, FEZ1 was recently shown to interact with DISC1, a susceptibility gene for schizophrenia and other mental disorders. That study showed that fasciculation and elongation protein zeta-1 (FEZ1) acts together with Disrupted in Schizophrenia 1 (DISC1) to regulate dendritic growth in the hippocampus of adult mice. DISC1 is also an ASD candidate gene: variation in DISC1, located at 1q42, was associated with autism in a Finnish cohort. FEZ1 was not differentially expressed between cases and controls overall in the TGen data set (P = 0.238). However, our heterogeneity-based approach detected differential expression of FEZ1 in a subset of cases (P = 1.85 × 10⁻⁷, q-value = 5.95 × 10⁻⁵). Similarly, SPON2, whose protein product spondin-2 was shown to direct the development of hippocampal neurons in rats, was highly over-expressed in neuron development outliers (P = 2.29 × 10⁻¹⁹, q-value = 4.48 × 10⁻¹⁵, differential expression rank = 1/21,184). This signal was diluted in the group-level comparison (P = 0.000946, q-value = 0.0430, differential expression rank = 292/21,184). Notably, we could recover most outlier cases from the distributions of SPON2 and FEZ1 alone (Figure 2B). Although gene-level analysis detected these two genes at FDR < 5%, neuron development was not overrepresented among the outlier genes overall, indicating that other genes in the pathway also played an important role.
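The outlier-versus-non-outlier comparison and the q-values and ranks reported above can be illustrated with a minimal sketch. The choice of test is an assumption. This sketch uses a per-gene Welch's t-test (`scipy.stats.ttest_ind` with `equal_var=False`) and Benjamini-Hochberg adjusted p-values as q-values. The q-values in our analysis may have been computed differently, for example with Storey's method. The function names and the synthetic data are hypothetical.

```python
import numpy as np
from scipy import stats

def bh_qvalues(p):
    """Benjamini-Hochberg adjusted p-values (monotone, capped at 1)."""
    p = np.asarray(p, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    # enforce monotonicity by taking running minima from the largest p-value down
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.minimum(ranked, 1.0)
    return q

def subgroup_de(expr, outlier_mask, rest_mask):
    """Per-gene Welch t-test of outlier cases vs a comparison group (genes x samples)."""
    t, p = stats.ttest_ind(expr[:, outlier_mask], expr[:, rest_mask],
                           axis=1, equal_var=False)
    q = bh_qvalues(p)
    # rank 1 = most significant gene, as in "differential expression rank = 1/21,184"
    rank = np.empty(p.size, dtype=int)
    rank[np.argsort(p)] = np.arange(1, p.size + 1)
    return t, p, q, rank

# Synthetic example: 200 genes, 10 outlier cases among 40 samples
rng = np.random.default_rng(1)
expr = rng.normal(size=(200, 40))
outliers = np.zeros(40, dtype=bool)
outliers[:10] = True
expr[0, outliers] += 4.0  # gene 0 over-expressed only in the outlier subgroup
t, p, q, rank = subgroup_de(expr, outliers, ~outliers)
print(f"gene 0: P = {p[0]:.2e}, q = {q[0]:.2e}, rank = {rank[0]}/{p.size}")
```

Restricting the comparison to the outlier subgroup keeps a strong, subgroup-specific shift from being averaged away across all cases. This is the dilution seen for SPON2 at the group level.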
Our method and the two data sets used in our study had several limitations. Because they are incomplete and general, the MSigDB pathway definitions describe the underlying biology of ASD only imperfectly. Nevertheless, we chose to use these definitions rather than data-driven pathways to avoid over-fitting. Clinical definitions of ASD are constantly changing and include a broad swath of individuals with heterogeneous disorders. While this heterogeneity motivated our analysis, misdiagnosis due to overly inclusive criteria may also have introduced false-positive outliers into our study. Genetically distinct cohorts may have been recruited for the two data sets, as samples were collected at two geographically distant study sites with different local ancestral structures. Although we tried to reduce technical variation such as batch effects in each data set, some technical artifacts may have remained. There will also inevitably be technical variability between the two genomic profiling facilities and microarray platforms. It is therefore unsurprising that we were not able to replicate all of our results. Specifically, we were unable to identify NO signaling and skeletal development signatures in the Simons cohort. Conversely, the RANKL pathway, which may be related to skeletal development, was the top-ranking outlier-enriched pathway in the Simons data set but was not significantly outlier-enriched in the TGen data set. Finally, because we used blood gene expression profiles as a surrogate for studying genomic alterations in a neurodevelopmental disorder, the difference in transcriptomic repertoire between blood and brain might have limited us to characterizing only 50% of samples. Because of these limitations, this study and its results should be considered exploratory: they show the potential benefits of a novel approach but are not conclusive.
A large number of samples from different cohorts and the integration of genetic and transcriptomic profiles are essential for identifying subgroups that may share clinical features, treatment responses, and prognostic characteristics. The alarming increase in ASD prevalence over the last few decades has been accompanied by an accumulation of genetic and genomic profiling data [42, 49, 50, 74, 81]. Yet the group difference between individuals with and without ASD is not obvious by any measure. Using an outlier-based approach, we characterized 50% of cases by specific genomic signatures. This approach could be strengthened by integrating other modalities of genomic data, such as whole-genome and whole-exome sequences. Looking farther into the future, true personalized medicine will only be achieved when individual genetic and genomic characteristics are combined with clinical and other phenotypic information.