Whole-exome sequencing is a powerful approach for establishing the etiological diagnosis in patients with intellectual disability and microcephaly

Clinical and genetic heterogeneity in monogenic disorders represents a major diagnostic challenge. Although particular clinical features may point to a specific cause in some cases, the majority of patients remain undiagnosed. Here, we investigated the utility of whole-exome sequencing as a diagnostic approach for establishing a molecular diagnosis in a highly heterogeneous group of patients with varying degrees of intellectual disability and microcephaly. Whole-exome sequencing was performed in 38 patients from 35 families, including three sib-pairs, in addition to or in parallel with the genetic analyses performed during the diagnostic work-up of the study participants. In ten of these 35 families (29 %), we found mutations in genes already known to cause a disorder in which microcephaly is a main feature. Two unrelated patients had mutations in the ASPM gene. In seven other patients, we found mutations in RAB3GAP1, RNASEH2B, KIF11, ERCC8, CASK, DYRK1A and BRCA2. In one of the sib-pairs, mutations were found in the RTTN gene. In seven of the ten families with an established etiological diagnosis, the disorder was recessively inherited. We demonstrate that whole-exome sequencing is a powerful tool for the diagnostic evaluation of patients with highly heterogeneous neurodevelopmental disorders such as intellectual disability with microcephaly. Our results confirm that autosomal recessive disorders are highly prevalent among patients with microcephaly.


Background
Microcephaly, a disproportionately small head size defined as an occipitofrontal circumference at or below −3 standard deviations (SD), is an important neurological sign usually associated with developmental delay and intellectual disability [1]. Microcephaly can be congenital or may develop after birth. It can occur as a more or less isolated finding or as one feature of a more complex syndrome. To date, according to the London Medical Database [2], more than 800 syndromes with microcephaly have been described, reflecting an enormous number of possible diagnoses. Microcephaly may stem from a wide variety of genetic or non-genetic conditions and is considered to be a consequence of abnormal brain development. Non-genetic causes include foetal alcohol exposure, perinatal infections, asphyxia or haemorrhage. Microcephaly can also occur in patients with inborn errors of metabolism, but these account for only a small proportion of cases [3]. In the majority of patients a genetic cause is suspected [4]. Chromosomal abnormalities may account for 15-20 % of patients [5]. In a retrospective study of 680 children with microcephaly [6], a specific genetic cause was detected in 15.3 %, with numerical chromosome aberrations accounting for 6.8 % and microdeletions/duplications and monogenic disorders for the remaining 8.5 %. Over the last few years, the number of genes related to microcephaly has been rising, primarily as a result of the increased use of whole-exome sequencing (for an overview see [7]). This clinical and genetic heterogeneity represents a major diagnostic challenge. Although additional clinical findings may point to a specific cause in some cases, the majority of patients with microcephaly remain undiagnosed.
Yet an early and specific diagnosis is important because it can provide relevant information about disease prognosis, appropriate medical or supportive care, reproductive consequences for parents and other family members, and prenatal diagnostic options. It may also preclude unnecessary further, and possibly invasive, diagnostic tests and evaluations.
The diagnostic work-up of microcephaly is extensive and includes brain imaging, metabolic and ophthalmologic evaluations, skin or muscle biopsies, karyotyping or microarray analysis, and mutation analyses of microcephaly genes. However, targeted approaches such as candidate gene testing by Sanger sequencing can examine only a few genes at a time, and their diagnostic yield is generally low. In contrast, because whole-exome sequencing does not require a priori knowledge of the gene or genes responsible for the disorder under investigation [8], it has proven to be a very effective technique for identifying new genetic causes of microcephaly [9,10].
In this study, we investigated the utility of whole-exome sequencing as a diagnostic approach for establishing a molecular diagnosis in a highly heterogeneous group of patients with microcephaly and intellectual disability.

Study subjects
We performed whole-exome sequencing in 38 patients, including three sib-pairs, with severe microcephaly and varied intellectual disabilities (Table 1). Patients were recruited, both prospectively and retrospectively, from the population of patients with intellectual disability referred to the genetics departments of the University Medical Centre Groningen and the Academic Medical Centre in Amsterdam. Informed consent was obtained from the participating families, and the study protocol was approved by the Ethics Committee of the University Medical Centre Groningen.
The clinical characteristics of all 38 patients are summarized in Table 1. The mean age of the patients was 10 years (range 0 to 57 years). The majority (87 %) were children; only five patients were over 18 years of age. Severe microcephaly was defined as an occipitofrontal head circumference at least 3 SD below the age-related mean according to Dutch national reference curves. The mean occipitofrontal head circumference Z-score in our cohort was −4.5 SD (range −3 to −8 SD). Consanguinity was reported for the parents of six patients (patients 16, 23, 27, 28, 29 and 35), including one of the three sib-pairs. Phenotype information was retrieved from medical records and provided by the referring physicians.
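The inclusion criterion reduces to a Z-score comparison. The sketch below assumes that the age-specific reference mean and SD have already been read off the growth curves; the Dutch reference values are not reproduced here, and the numbers in the example are hypothetical.

```python
def hc_z_score(head_circumference_cm: float, ref_mean_cm: float, ref_sd_cm: float) -> float:
    """Z-score of an occipitofrontal head circumference against a reference mean and SD."""
    return (head_circumference_cm - ref_mean_cm) / ref_sd_cm


def is_severe_microcephaly(z: float, cutoff: float = -3.0) -> bool:
    """Severe microcephaly as used in this study: a Z-score at or below -3 SD."""
    return z <= cutoff


# Hypothetical example: 40.0 cm measured against a reference mean of 46.0 cm (SD 1.5 cm).
z = hc_z_score(40.0, 46.0, 1.5)
print(z, is_severe_microcephaly(z))  # -4.0 True
```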
All patients had undergone genetic testing during their routine diagnostic work-up. In addition to whole-exome sequencing, this work-up may have included standard chromosome analysis, metabolic screening, or DNA sequencing of individual or multiple genes. On average, 2.8 DNA tests (range 0 to 9) were performed per family (Additional file 1: Table S1). The most frequently tested gene was ASPM, which had been analysed in seven (20 %) of the families.

Microarray analysis
SNP array analysis was performed in all patients prior to inclusion using the Illumina HumanCytoSNP-12 v2.1 DNA Analysis BeadChip, following the manufacturer's instructions (Illumina, San Diego, CA, USA). Normalized intensities and allelic ratios were analysed with GenomeStudio Data Analysis Software and the cnvPartition v3.1.6 algorithm to assess copy number variation. No causal copy number variants were found (data not shown). We also determined regions of homozygosity and, where possible, haplotypes shared within a family. For the siblings with consanguineous parents, shared homozygous stretches were detected by selecting regions in which both siblings carried the same homozygous alleles; for the sib-pair with a recessive disorder, we selected regions in which the siblings shared the same combination of alleles; and for the sib-pair with a dominant disorder, we selected regions in which they shared at least one allele at each SNP position. Within a single patient, stretches of homozygosity were detected by selecting regions that showed only homozygous alleles.
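The three sib-pair rules and the single-patient rule can be sketched as per-SNP tests followed by a search for consecutive runs. This is an illustration, not the GenomeStudio procedure: it assumes Illumina-style A/B genotype strings with no-calls already removed, and the minimum run length `min_snps` is a hypothetical parameter, since the text does not state a size threshold (real analyses would typically use genomic distance rather than SNP counts).

```python
from typing import Callable, List, Sequence, Tuple

# Illumina-style A/B genotype calls; no-calls are assumed to have been removed beforehand.
HOMOZYGOUS = {"AA", "BB"}


def shared_homozygous(g1: str, g2: str) -> bool:
    """Consanguineous sib-pair rule: both siblings carry the same homozygous genotype."""
    return g1 == g2 and g1 in HOMOZYGOUS


def identical_genotype(g1: str, g2: str) -> bool:
    """Recessive sib-pair rule: both siblings carry the same combination of alleles."""
    return g1 == g2


def shares_an_allele(g1: str, g2: str) -> bool:
    """Dominant sib-pair rule: the siblings have at least one allele in common."""
    return bool(set(g1) & set(g2))


def runs(mask: Sequence[bool], min_snps: int) -> List[Tuple[int, int]]:
    """(first, last) indices of consecutive True stretches spanning at least min_snps SNPs."""
    result, start = [], None
    for i, flag in enumerate(mask):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_snps:
                result.append((start, i - 1))
            start = None
    if start is not None and len(mask) - start >= min_snps:
        result.append((start, len(mask) - 1))
    return result


def shared_regions(sib1: Sequence[str], sib2: Sequence[str],
                   rule: Callable[[str, str], bool], min_snps: int) -> List[Tuple[int, int]]:
    """Apply a sib-pair rule at every SNP (both lists in the same SNP order) and return runs."""
    return runs([rule(a, b) for a, b in zip(sib1, sib2)], min_snps)


def homozygous_stretches(genotypes: Sequence[str], min_snps: int) -> List[Tuple[int, int]]:
    """Single-patient rule: stretches containing only homozygous genotypes."""
    return runs([g in HOMOZYGOUS for g in genotypes], min_snps)


# Hypothetical genotypes at six consecutive SNPs.
sib1 = ["AA", "AB", "BB", "BB", "AA", "AB"]
sib2 = ["AA", "AB", "BB", "BB", "AB", "BB"]
print(shared_regions(sib1, sib2, shared_homozygous, min_snps=2))   # [(2, 3)]
print(shared_regions(sib1, sib2, identical_genotype, min_snps=2))  # [(0, 3)]
print(shared_regions(sib1, sib2, shares_an_allele, min_snps=2))    # [(0, 5)]
print(homozygous_stretches(sib1, min_snps=2))                      # [(2, 4)]
```

The rules differ only in the per-SNP test, so the run search is shared; the dominant rule is the most permissive, which is why it returns the longest region in the example.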

Whole-exome sequencing
Library preparation was based on the SureSelect All Exon V4 bait protocol (Agilent Technologies, Inc., Santa Clara, CA, USA) and carried out on a PerkinElmer® SciClone NGS workstation (PerkinElmer, Inc., Waltham, MA, USA). Briefly, 3 μg of genomic DNA was randomly fragmented by ultrasound using a Covaris® instrument (Covaris, Inc., Woburn, MA, USA). Adapters were ligated to both ends of the resulting fragments. Fragments with an […] All lanes of sequence data were aligned to the human reference genome build b37, as released by the 1000 Genomes Project [11], using the Burrows-Wheeler Aligner [12], after which duplicate reads were marked. Using the Genome Analysis Toolkit (GATK) [13], reads were realigned around insertions and deletions detected in the sequence data and in the 1000 Genomes Project pilot [11], followed by base quality score recalibration. Throughout the process, data quality was assessed using Picard [14], GATK Coverage and custom scripts. SNPs and indels were called with the GATK Unified Genotyper and Pindel [15], respectively, and annotated with SnpEff [16]. This production pipeline was implemented on the MOLGENIS compute platform [17] for job generation, execution and monitoring. For each sample, a VCF file containing all variants was generated. A mean target coverage of 64x was achieved, with more than 82 % of targeted bases having at […] (Table S2). Sample swaps were excluded by a concordance check against the genotypes from the SNP array.
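The two quality metrics mentioned above, target coverage and array/exome concordance, can be illustrated with a short sketch. This is not the Picard, GATK or custom-script implementation. It assumes per-base depths over the targets are already available and that genotypes from both platforms have been converted to unordered nucleotide strings keyed by (chromosome, position); the depth threshold and any concordance cut-off are hypothetical, because the source does not preserve them.

```python
from typing import Dict, Optional, Sequence, Tuple


def coverage_summary(depths: Sequence[int], min_depth: int) -> Tuple[float, float]:
    """Mean per-base depth over the targets and the fraction of target bases with depth >= min_depth."""
    if not depths:
        raise ValueError("no target bases")
    mean_depth = sum(depths) / len(depths)
    fraction_covered = sum(d >= min_depth for d in depths) / len(depths)
    return mean_depth, fraction_covered


Site = Tuple[str, int]  # (chromosome, position)


def genotype_concordance(array_calls: Dict[Site, str],
                         exome_calls: Dict[Site, str]) -> Optional[float]:
    """Fraction of sites typed on both platforms whose (unordered) genotypes agree."""
    shared = array_calls.keys() & exome_calls.keys()
    if not shared:
        return None  # nothing to compare
    agree = sum(sorted(array_calls[s]) == sorted(exome_calls[s]) for s in shared)
    return agree / len(shared)


# Hypothetical per-base depths over a tiny target region.
mean_depth, fraction = coverage_summary([0, 10, 20, 30, 40], min_depth=20)
print(mean_depth, fraction)  # 20.0 0.6

# Hypothetical calls: one site agrees ("AG" vs "GA"), one disagrees, one is array-only.
array_calls = {("1", 100): "AG", ("1", 200): "CC", ("2", 50): "TT"}
exome_calls = {("1", 100): "GA", ("1", 200): "CT", ("3", 7): "AA"}
print(genotype_concordance(array_calls, exome_calls))  # 0.5
```

A sample whose concordance with its own array data is unexpectedly low would be a candidate swap; the cut-off used in the study is not given.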

Data interpretation
For data interpretation, we uploaded the VCF files to Cartagenia NGS Bench (Cartagenia, Leuven, Belgium) and filtered the variants. We removed variants covered by fewer than five sequence reads, as well as likely benign variants with an allele frequency above 2 % in dbSNP133, the 1000 Genomes Project, the Seattle exome database [18] or the GoNL database [19]. Variants that were also present in eight control samples exome-sequenced in the same sequencing run were considered artefacts and removed as well. The remaining variants were restricted to known genes related to the phenotype, using a filter based on selected Human Phenotype Ontology (HPO) terms [20] for microcephaly (HP:0011451, HP:0005484, HP:0000253), abnormalities of the nervous system (HP:0002011, HP:0000707) and abnormality of the head (HP:0000234) in the Cartagenia filter tree. Variants passing this filter were considered potentially related to the phenotype and were further filtered according to an inheritance model: first an autosomal recessive model, then an autosomal dominant or X-linked model. Variants with transcriptional or splice-site effects were considered potentially pathogenic. For the remaining variants, including those without a phenotypic match, we checked for their presence in the professional HGMD database (Biobase-international, Beverly, MA, USA) and manually assessed the relationship between the variants and possible phenotypes. The effect of potentially pathogenic variants was explored with the Alamut prediction program (Alamut, Rouen, France) or on the basis of information from available databases and literature.
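The filtering cascade can be sketched in plain Python. This is a hypothetical re-implementation, not the Cartagenia filter tree: the `Variant` fields are assumptions, the HPO-derived gene list is represented by a plain set of gene symbols, and the recessive model treats two heterozygous variants in one gene as a possible compound heterozygote without phasing (parental testing, as described under Sanger validation, would be needed to confirm it).

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Variant:
    gene: str
    chrom: str
    pos: int
    zygosity: str      # "het", "hom" or "hemi"
    depth: int         # reads covering the variant position
    max_pop_af: float  # highest allele frequency in the population databases (0.0 if absent)
    in_controls: bool  # also called in the control samples from the same sequencing run


def passes_basic_filters(v: Variant, min_depth: int = 5, max_af: float = 0.02) -> bool:
    """Drop low-coverage calls, common variants (AF above 2 %) and likely run artefacts."""
    return v.depth >= min_depth and v.max_pop_af <= max_af and not v.in_controls


def recessive_genes(variants: Iterable[Variant]) -> List[str]:
    """Genes with a homozygous variant or at least two (unphased) heterozygous variants."""
    by_gene = defaultdict(list)
    for v in variants:
        by_gene[v.gene].append(v)
    hits = []
    for gene, vs in by_gene.items():
        n_hom = sum(v.zygosity == "hom" for v in vs)
        n_het = sum(v.zygosity == "het" for v in vs)
        if n_hom >= 1 or n_het >= 2:
            hits.append(gene)
    return sorted(hits)


def dominant_or_x_linked_genes(variants: Iterable[Variant]) -> List[str]:
    """Genes with a heterozygous variant, or with any variant on chromosome X."""
    return sorted({v.gene for v in variants if v.zygosity == "het" or v.chrom == "X"})


def prioritise(variants: Iterable[Variant], phenotype_genes: Set[str]) -> Tuple[List[str], List[str]]:
    """Apply the basic filters and the phenotype gene list, then both inheritance models."""
    kept = [v for v in variants if passes_basic_filters(v) and v.gene in phenotype_genes]
    return recessive_genes(kept), dominant_or_x_linked_genes(kept)


# Hypothetical variants and gene list, for illustration only.
variants = [
    Variant("GENE_A", "1", 1000, "het", 30, 0.0, False),
    Variant("GENE_A", "1", 2000, "het", 25, 0.001, False),
    Variant("GENE_B", "2", 500, "hom", 40, 0.0, False),
    Variant("GENE_C", "3", 10, "het", 3, 0.0, False),     # fewer than 5 reads
    Variant("GENE_D", "4", 20, "het", 50, 0.15, False),   # common in population databases
    Variant("GENE_E", "X", 30, "het", 35, 0.0, False),
    Variant("GENE_F", "5", 40, "hom", 60, 0.0, True),     # seen in run controls
    Variant("GENE_G", "6", 50, "het", 45, 0.0, False),    # no phenotype match
]
phenotype_genes = {"GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E", "GENE_F"}
recessive, dominant_x = prioritise(variants, phenotype_genes)
print(recessive)   # ['GENE_A', 'GENE_B']
print(dominant_x)  # ['GENE_A', 'GENE_E']
```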

Validation of mutations by Sanger sequencing
Candidate pathogenic variants were validated by Sanger sequencing. When available, DNA samples from additional family members were used to study segregation of the confirmed variants. Sequencing was carried out using flanking intronic primers (primer sequences are available upon request). The forward primer was designed with a PT1 tail (5′-TGTAAAACGACGGCCAGT-3′) and the reverse primer with a PT2 tail (5′-CAGGAAACAGCTATGACC-3′). PCR was performed in a total volume of 10 μl containing 5 μl AmpliTaq Gold® Fast PCR Master Mix (Applied Biosystems, Foster City, CA, USA), 1.5 μl of each primer at a concentration of 0.5 pmol/μl (Eurogentec, Seraing, Belgium) and 2 μl genomic DNA at a concentration of 40 ng/μl. Samples were PCR amplified and sequenced according to our standard diagnostic protocols (available upon request).
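The reaction set-up can be checked with a little arithmetic, and the tails are simply prepended to the 5′ end of each gene-specific primer (presumably so that the tail sequences can serve as universal sequencing primer sites). The sketch below uses a hypothetical gene-specific primer; only the tail sequences and volumes come from the text. The listed components add up to exactly 10 μl and provide 0.75 pmol of each primer (0.075 μM final) and 80 ng of template DNA.

```python
from typing import Dict, Tuple

PT1_TAIL = "TGTAAAACGACGGCCAGT"  # 5' tail on forward primers
PT2_TAIL = "CAGGAAACAGCTATGACC"  # 5' tail on reverse primers


def tailed_primers(forward: str, reverse: str) -> Tuple[str, str]:
    """Prepend the PT1/PT2 tails to the 5' ends of gene-specific primers (all written 5'->3')."""
    return PT1_TAIL + forward, PT2_TAIL + reverse


def reaction_summary(total_ul: float = 10.0, master_mix_ul: float = 5.0,
                     primer_ul: float = 1.5, primer_pmol_per_ul: float = 0.5,
                     dna_ul: float = 2.0, dna_ng_per_ul: float = 40.0) -> Dict[str, float]:
    """Amounts in one PCR using the volumes and concentrations given in the text."""
    components_ul = master_mix_ul + 2 * primer_ul + dna_ul  # forward + reverse primer
    primer_pmol = primer_ul * primer_pmol_per_ul
    return {
        "components_ul": components_ul,            # should match total_ul
        "primer_pmol_each": primer_pmol,
        "primer_final_uM": primer_pmol / total_ul,  # pmol/ul is equivalent to uM
        "dna_ng": dna_ul * dna_ng_per_ul,
    }


# Hypothetical gene-specific primers, for illustration only.
fwd, rev = tailed_primers("GCTAGCTAGCTAGCTAGC", "CGATCGATCGATCGATCG")
print(fwd)
print(reaction_summary())
```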

Causality criteria
To be designated as causative, a candidate variant confirmed by Sanger sequencing had to occur in a gene in which mutations are known to cause a disorder with clinical features consistent with the patient's phenotype. In addition, each variant was either reported as deleterious in the literature or the HGMD database [21], classified as truncating or having a splice-site effect, or predicted to be deleterious by at least three prediction algorithms in Alamut. The presence of candidate pathogenic variants was checked in the 1000 Genomes Project, the Seattle exome database [18], the Exome Variant Server [22] and the Exome Aggregation Consortium (ExAC) database [23]. Furthermore, where possible, we investigated whether familial segregation was consistent with the expected mode of inheritance. Causative variants were then reported to the referring physicians.
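The criteria above can be read as a simple decision rule. The sketch below is one possible encoding, with hypothetical field names. It does not encode the population database check, because the text reports that presence was checked but gives no frequency cut-off. Segregation is treated as "checked where possible": a contradiction excludes the variant, while missing family data does not.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CandidateVariant:
    sanger_confirmed: bool
    gene_matches_phenotype: bool        # known disease gene with a consistent phenotype
    reported_deleterious: bool          # in the literature or HGMD
    truncating_or_splice: bool
    deleterious_predictions: int        # number of Alamut prediction algorithms calling it deleterious
    segregation_consistent: Optional[bool] = None  # None when relatives were not available


def is_causative(v: CandidateVariant, min_predictions: int = 3) -> bool:
    """Apply the causality criteria as described in the text (population frequency not encoded)."""
    if not (v.sanger_confirmed and v.gene_matches_phenotype):
        return False
    has_evidence = (v.reported_deleterious
                    or v.truncating_or_splice
                    or v.deleterious_predictions >= min_predictions)
    if not has_evidence:
        return False
    # Only an observed contradiction in segregation rules the variant out.
    return v.segregation_consistent is not False


# Hypothetical examples.
missense = CandidateVariant(True, True, False, False, 3, True)
weak_missense = CandidateVariant(True, True, False, False, 2, True)
truncating_no_parents = CandidateVariant(True, True, False, True, 0, None)
print(is_causative(missense), is_causative(weak_missense), is_causative(truncating_no_parents))
```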

Results
On average, 34,580 single-nucleotide variants and small insertions or deletions were found per patient by whole-exome sequencing (see Additional file 3: Table S3). A molecular diagnosis was established in 11 patients from 10 families, corresponding to a diagnostic yield of 29 %. Inheritance was autosomal recessive in seven families, autosomal dominant in two and X-linked in one. In three patients (patients 1, 26 and 32), a molecular diagnosis was reached through the routine diagnostic work-up while exome sequencing was being performed; whole-exome sequencing also identified all three of these diagnoses. In the remaining 25 families, we identified five possible candidate genes (data not shown), but for none of these candidates could a causal relationship with the patients' phenotype be proven. Mutations were detected in nine different previously identified microcephaly genes: ASPM, RAB3GAP1, RNASEH2B, KIF11, RTTN, ERCC8, CASK, DYRK1A and BRCA2 (Table 2). With one exception, none of these mutations was present in the homozygous or heterozygous state in the databases of benign variants. The exception was the RNASEH2B variant c.529G>A, which has been described before [24], is predicted to be pathogenic by Alamut, and was reported in the heterozygous state in the ExAC database at a frequency of 0.13 %.
We identified ASPM mutations, associated with primary microcephaly type 5 [25], in two unrelated families. Patient 6 had two novel ASPM mutations: a 4-bp deletion resulting in a frameshift, and a splice-site mutation. Sanger sequencing of the parents confirmed segregation. The ASPM mutations in patient 32, a stop mutation and a 1-bp insertion resulting in a frameshift, had already been detected by Sanger sequencing. In two patients with clinically recognizable syndromes (Warburg Micro syndrome and Aicardi-Goutières syndrome), mutations in RAB3GAP1 (patient 1) [26] and RNASEH2B (patient 26) [24], respectively, were identified both during the diagnostic work-up and by whole-exome sequencing. In RAB3GAP1, a homozygous 4-bp deletion was detected in exon 6. This deletion creates a frameshift starting at codon Thr159, and the new reading frame ends in a stop codon 18 positions downstream.
A frameshift mutation in the KIF11 gene was identified in patient 4, whose known microcephaly and retinopathy are consistent with the KIF11-associated phenotype. The mutation is a 1-bp insertion that shifts the reading frame and introduces a stop codon after eight amino acids. It was inherited from the mother, who also had microcephaly and mild learning problems but in whom no ophthalmologic abnormalities were found. The unaffected maternal aunt of patient 4 did not carry the mutation.
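How a small insertion or deletion leads to a premature stop codon "N positions downstream" can be shown by translating a reference and a mutant coding sequence. The sketch below uses a hypothetical toy sequence, not the RAB3GAP1 or KIF11 sequences, and reports the effect in a one-letter, HGVS-style short form, counting the first altered residue as position 1 of the new frame (the HGVS convention); whether the counts in the text follow exactly this convention is an assumption.

```python
from typing import Optional

# Standard genetic code, codons enumerated in T, C, A, G order at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate from the first codon up to and including the first stop ('*')."""
    protein = []
    for start in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[start:start + 3]]
        protein.append(aa)
        if aa == "*":
            break
    return "".join(protein)


def frameshift_effect(cds: str, pos: int, deleted: int = 0, inserted: str = "") -> Optional[str]:
    """Protein effect of deleting `deleted` bases at 0-based `pos` and/or inserting `inserted` there."""
    if (len(inserted) - deleted) % 3 == 0:
        raise ValueError("length change is a multiple of 3: not a frameshift")
    mutant = cds[:pos] + inserted + cds[pos + deleted:]
    ref_protein, mut_protein = translate(cds), translate(mutant)
    # First residue that differs between the reference and mutant proteins.
    first_diff = next((i for i, (r, m) in enumerate(zip(ref_protein, mut_protein)) if r != m), None)
    if first_diff is None:
        return None
    ref_aa, new_aa, codon = ref_protein[first_diff], mut_protein[first_diff], first_diff + 1
    if new_aa == "*":
        return f"p.{ref_aa}{codon}*"  # the first altered codon is itself a stop
    stop_index = mut_protein.find("*", first_diff)
    if stop_index == -1:
        return f"p.{ref_aa}{codon}{new_aa}fs*?"  # no stop within the available sequence
    return f"p.{ref_aa}{codon}{new_aa}fs*{stop_index - first_diff + 1}"


REF = "ATGGCCAAGCTGACCGATTACGGCTGA"  # hypothetical toy CDS encoding M A K L T D Y G *
print(translate(REF))                            # MAKLTDYG*
print(frameshift_effect(REF, 6, deleted=1))      # p.K3Sfs*2
print(frameshift_effect(REF, 6, inserted="T"))   # p.K3*
```

In the first example, deleting one base turns codon 3 (Lys) into Ser and the new frame hits a stop one codon later, hence `fs*2`; the one-letter form `p.K3Sfs*2` corresponds to `p.Lys3SerfsTer2` in three-letter HGVS notation.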
In one of the three sib-pairs (patients 8 and 9), we found compound heterozygous frameshift and missense mutations in the RTTN gene, which encodes the centrosome-associated protein Rotatin. The missense mutation was predicted to be deleterious by several programs (including PolyPhen, SIFT and MutationTaster).
In patient 16, we found a homozygous frameshift mutation in the ERCC8 gene, located within an 8-Mb region of homozygosity; the frameshift leads to a stop codon further downstream in the same exon. Sanger sequencing confirmed that both parents were heterozygous carriers of the mutation.
We further found a splice-site mutation in the CASK gene in patient 19; analysis of the parents confirmed that it had arisen de novo. The mutation is predicted to alter the splice site.
In patient 23, we identified a frameshift mutation in the DYRK1A gene that was not inherited from the mother. Because the father's DNA was not available for testing, we could not confirm a de novo origin, but the mutation is nonetheless highly likely to be pathogenic.
A homozygous mutation in the BRCA2 gene was detected in patient 24. Biallelic BRCA2 mutations cause Fanconi anaemia complementation group D1 [27], which is consistent with this patient's phenotype. Sanger sequencing of the parents confirmed segregation. Because both parents were heterozygous for the mutation, which is associated with an increased risk of cancer, both parental families were offered appropriate genetic counselling and mutation screening. To date, no other family members have been reported to have cancer.

Discussion
Overall, whole-exome sequencing established a molecular diagnosis in 29 % of the families with microcephaly in our study. This diagnostic yield is comparable to the 25 % yield of whole-genome sequencing reported in a study of patients with a wide range of suspected genetic disorders, mostly neurologic conditions [28]. Our results are also in line with other studies on the diagnostic utility of whole-exome sequencing in genetically highly heterogeneous disorders. Whole-exome sequencing had a higher diagnostic yield than Sanger sequencing in patients with deafness (44 % vs. 10 %), blindness (52 % vs. 25 %), mitochondrial diseases (16 % vs. 11 %) and movement disorders (20 % vs. 5 %) [29]. In a study of 188 probands from consanguineous families with two or more affected children, mutations in known genes were found in 27 % of the patients [8]. A similar diagnostic yield of approximately 25 % was obtained in a study of consanguineous families from Qatar [30].
We found mutations that are highly likely to be causative in nine different genes known to be associated with disorders in which microcephaly is an important phenotypic feature. ASPM mutations cause primary microcephaly type 5, the most common form of autosomal recessive primary microcephaly [31]. Although ASPM was the most frequently tested gene during the diagnostic work-up (Additional file 1: Table S1), it had not been tested in one of the two families with ASPM mutations. Mutations in the KIF11 gene cause autosomal dominant microcephaly with or without chorioretinopathy, lymphedema, or mental retardation [32], a phenotype consistent with the clinical features of the patient in whom we identified a KIF11 mutation. Homozygous missense mutations in the RTTN gene have recently been identified in patients with microcephaly, intellectual disability, epilepsy and bilateral polymicrogyria [33]. The p.H865R mutation identified in our family substitutes an amino acid located in the second Armadillo-like domain of Rotatin, a protein-protein interaction domain that is highly conserved among species. Sanger sequencing in both parents and two unaffected siblings confirmed that the mutations segregated with the clinical phenotype in this family, further supporting causality. In our patients, the type of cortical malformation differed and the microcephaly was more severe than in these published cases. Such heterogeneity in the distribution, severity and type of cortical malformations is known to occur with mutations in a single gene, as has been observed for GPR56 [33,34]. ERCC8 gene mutations cause Cockayne syndrome type A [35].
Although this diagnosis is, in retrospect, consistent with the clinical features of our patient, it had not been considered before whole-exome sequencing revealed the ERCC8 mutation. When the patient was re-evaluated, his facial features had changed substantially and had become more characteristic of Cockayne syndrome. Mutations in the CASK gene cause mental retardation with microcephaly and pontocerebellar hypoplasia [36]. This X-linked condition is seen more frequently in females than in males [37]. The severe intellectual disability, spasticity, seizures, absence of speech and prominent upper incisors reported in our CASK patient are known features of this disorder [38]. Unfortunately, no brain MRI was available for this patient, so it remains unknown whether she also had pontine or cerebellar hypoplasia. The absence of this information may also explain why this diagnosis had not been considered before whole-exome sequencing. DYRK1A mutations cause autosomal dominant mental retardation type 7, in which microcephaly is an important clinical finding [39,40]. DYRK1A mutations typically arise de novo [39], as was most likely the case in our patient. Biallelic BRCA2 mutations are known to cause Fanconi anaemia type D1 [41], and we found a homozygous mutation in this gene. Secondary microcephaly and hepatocellular carcinoma, as seen in our patient, are known features of Fanconi anaemia [27]. This particular BRCA2 mutation has been identified in families with hereditary breast cancer [42].
In our cohort, an autosomal recessive disorder was detected in 7 of the 10 families with an established diagnosis, whereas only eight of 35 families (23 %) had been suspected of having a recessive disorder (six consanguineous families and two families with affected siblings). This is consistent with the high empirical recurrence risk in sibs (approximately 15-20 %) of patients with severe primary microcephaly without an identified cause [43]. Our results suggest that autosomal recessive inheritance is considerably more frequent in patients with microcephaly than in patients with intellectual disability overall [28,44]. In a study of 100 patients with severe intellectual disability, for example, mostly de novo mutations with an assumed dominant effect were identified, and only one patient was found to have an autosomal recessive disorder [45].
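A rough calculation shows why a 15-20 % sib recurrence risk points to a large recessive fraction. This is a back-of-the-envelope estimate under our own simplifying assumption, not a result of the cited study: sib recurrence is attributed entirely to autosomal recessive cases, each with a 25 % risk, and all other cases are given negligible recurrence risk. With $f$ the fraction of recessive cases and $R$ the empirical sib recurrence risk:

```latex
R \approx \tfrac{1}{4}\, f
\quad\Longrightarrow\quad
f \approx 4R \in [\,4 \times 0.15,\; 4 \times 0.20\,] = [\,0.60,\; 0.80\,]
```

This range brackets the recessive fraction of 7/10 (70 %) among our diagnosed families. With only ten diagnosed families, and a recurrence estimate derived from a different patient population, the comparison is an illustration rather than a formal estimate.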
Our results confirm the clinical and genetic heterogeneity of microcephaly. Performing whole-exome sequencing early in the diagnostic work-up may avoid other, unnecessary diagnostic evaluations. Most patients in our study had undergone extensive diagnostic evaluations, which in the majority of cases had not led to a specific diagnosis. Neveling et al. calculated that when three or more genes per patient are investigated by Sanger sequencing, whole-exome sequencing is more cost-effective and has a higher diagnostic yield [29].