Identification of de novo mutations for ARID1B haploinsufficiency associated with Coffin–Siris syndrome 1 in three Chinese families via array-CGH and whole exome sequencing

Background Coffin–Siris syndrome (CSS) is a multiple malformation syndrome characterized by intellectual disability associated with coarse facial features, hirsutism, sparse scalp hair, and hypoplastic or absent fifth fingernails or toenails. CSS accounts for a small proportion of intellectual disability cases and can be caused by variants in at least twelve genes. The genetic background is quite heterogeneous, making it difficult for clinicians and genetic counselors to pinpoint the exact disease type. Methods Array-comparative genomic hybridization (array-CGH) and whole exome sequencing (WES) were applied to three trios affected by intellectual disability with clinical features similar to those of Coffin–Siris syndrome. Sanger sequencing was used to verify the detected single-nucleotide variants (SNVs). Results All three cases were female with a normal karyotype of 46,XX, born to healthy, non-consanguineous parents. A 6q25 microdeletion (arr[hg19]6q25.3(155,966,487–158,803,979) × 1) (2.84 Mb) (case 1) and two loss-of-function (LoF) mutations of ARID1B [c.2332 + 1G > A in case 2 and c.4741C > T (p.Q1581X) in case 3] were identified. All three pathogenic abnormalities were de novo, i.e., not inherited from the parents. After comparison of publicly available microdeletions containing ARID1B, four types of microdeletions leading to insufficient production of ARID1B were identified, namely deletions covering the whole region of ARID1B, deletions covering the promoter region, deletions covering the termination region and deletions covering enhancer regions. Conclusion Here we identified de novo ARID1B mutations in three Chinese trios. Four types of microdeletions covering ARID1B were identified. This study broadens current knowledge of ARID1B mutations for clinicians and genetic counselors.


Background
Coffin-Siris syndrome (CSS, OMIM#135900) is a rare congenital anomaly syndrome characterized by intellectual disability, growth deficiency, microcephaly, coarse facial features and hypoplastic nails of the fifth fingers and/or toes [1]. Most cases are sporadic, and the syndrome follows an autosomal dominant mode of inheritance. The global prevalence of the disease has been estimated at approximately 1:10,000-1:100,000 [2].
Here, using array-CGH and whole-exome sequencing (WES), we identified a 2.84 Mb 6q25 microdeletion in case 1 and two loss-of-function (LoF) variants in ARID1B (AT-rich interaction domain 1B), one in case 2 (c.2332 + 1G > A) and one in case 3 (c.4741C > T, p.Q1581X). All three abnormalities were de novo, i.e., not inherited from either parent. Because there are more than 10 types of Coffin-Siris syndrome, and many hospitalized patients with intellectual disability (ID) are children without distinct clinical phenotypes, it is difficult to identify the underlying genetic factors.
The combination of array-CGH and WES might be an efficient methodology to pinpoint the causal mutations.

Sample collection
This study was conducted in accordance with the Code of Ethics of the World Medical Association (Declaration of Helsinki) for experiments involving humans. This study was approved by the Ethical Committee of the Shenzhen Bao'an Women's and Children's Hospital. Written informed consent was obtained from each individual. The clinical phenotypes were compiled in Table 1.
Peripheral venous blood was collected from the three patients and their parents. Genomic DNA was extracted using the TIANamp Blood DNA Kit (DP348, Tiangen Biotech, Beijing, China) according to the manufacturer's instructions.

Array-comparative genomic hybridization (Array-CGH)
Array-CGH was performed using the Fetal DNA Chip (Version 1.2) designed by The Chinese University of Hong Kong (CUHK) [20,21]. The chip contains a total of 60,000 probes covering more than 100 diseases caused by known microduplications or microdeletions. It does not cover small chromosomal abnormalities, copy number polymorphisms, chimerism or chromosomal rearrangements [22]. The experimental procedures followed the standard Agilent protocol (Agilent Oligonucleotide Array-Based CGH for Genomic DNA Analysis, version 3.5). Hybridized slides were scanned with a SureScan High-Resolution Microarray Scanner (G2505B, Agilent Technologies, Santa Clara, CA, USA), and the image data were extracted and converted to text files using Agilent Feature Extraction software (Version 10.5.1.1). The data were graphed and analyzed using Agilent CGH Analytics software.
Only gains or losses that were encompassed by at least three consecutive oligomers on the array were considered. The clinical relevance of the observed chromosomal aberrations was then assessed for each of the regions and genes involved, based on the scientific literature and on databases: the DECIPHER database for known microdeletion and microduplication syndromes, and Online Mendelian Inheritance in Man (OMIM) for known disease-causing genes, gene functions, and inheritance patterns. DNA copy number alterations were considered possibly pathogenic when they involved regions known to be associated with microdeletion or microduplication syndromes.
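The "at least three consecutive oligomers" rule can be illustrated with a minimal sketch. The per-probe encoding ("gain", "loss", "normal") and the function name are assumptions made for illustration; this is not the Agilent CGH Analytics algorithm, only the run-length criterion stated above.

```python
from itertools import groupby

MIN_CONSECUTIVE_PROBES = 3  # threshold stated in the text


def call_segments(probe_states, min_probes=MIN_CONSECUTIVE_PROBES):
    """Return (state, first_index, last_index) for each qualifying run.

    probe_states: per-probe calls ordered by genomic position, each one of
    "gain", "loss" or "normal" (a hypothetical encoding for this sketch).
    """
    segments = []
    index = 0
    for state, run in groupby(probe_states):
        length = len(list(run))
        # Keep only aberrant runs supported by enough consecutive probes.
        if state != "normal" and length >= min_probes:
            segments.append((state, index, index + length - 1))
        index += length
    return segments


# Toy data: a 2-probe loss (ignored), a 3-probe loss (kept), a 2-probe gain (ignored).
calls = ["normal", "loss", "loss", "normal", "loss", "loss", "loss", "gain", "gain"]
print(call_segments(calls))
```

In this toy input only the three-probe loss survives the filter; shorter runs are treated as noise.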
Keywords: Haploinsufficiency, ARID1B, Coffin-Siris syndrome, SWI/SNF complex, Microdeletion, Loss-of-function

Whole exome sequencing (WES)
The sequencing depth was about 100× for each sample. Sequences were aligned to the human reference genome (UCSC hg19). ANNOVAR was applied to annotate the VCF file. Variants with a minor allele frequency (MAF) > 0.1% and synonymous single nucleotide variants (SNVs) were removed. Variants causing splicing, frameshift, stop gain or stop loss changes were kept for subsequent analysis. The location, type and conservation of the identified mutations were obtained from several public databases, such as the UCSC Genome Browser, NCBI dbSNP, NCBI ClinVar, 1000 Genomes and ExAC. The pathogenicity of the variants was evaluated according to the American College of Medical Genetics and Genomics (ACMG) guidelines [23], and the online tools PolyPhen-2 and SIFT were used for functional prediction. A position was called heterozygous if 25% or more of the reads supported the minor allele.
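The filtering thresholds described above can be sketched as follows. This is a minimal illustration, not the pipeline actually used: the `Variant` record, its field names, the effect labels and the toy variants are all hypothetical, and "minor allele" is interpreted here as the allele with fewer supporting reads at the site.

```python
from dataclasses import dataclass

MAX_MAF = 0.001            # 0.1 %, as stated in the text
MIN_MINOR_FRACTION = 0.25  # 25 % of reads, as stated in the text
KEPT_EFFECTS = {"splicing", "frameshift", "stopgain", "stoploss"}  # labels are assumptions


@dataclass
class Variant:
    name: str       # free-text label (hypothetical field)
    effect: str     # annotated consequence class
    maf: float      # minor allele frequency in population databases
    ref_reads: int  # reads supporting the reference allele
    alt_reads: int  # reads supporting the alternate allele


def passes_filter(v: Variant) -> bool:
    """Drop common variants, then keep only the consequence classes listed in the text."""
    if v.maf > MAX_MAF:
        return False
    return v.effect in KEPT_EFFECTS


def is_heterozygous(v: Variant) -> bool:
    """Call a site heterozygous when the less-supported allele has >= 25 % of reads."""
    total = v.ref_reads + v.alt_reads
    if total == 0:
        return False
    return min(v.ref_reads, v.alt_reads) / total >= MIN_MINOR_FRACTION


# Toy variants, invented for illustration only.
variants = [
    Variant("toy_stopgain", "stopgain", 0.0, 60, 50),
    Variant("toy_synonymous", "synonymous", 0.0, 55, 55),
    Variant("toy_common_splicing", "splicing", 0.02, 50, 50),
    Variant("toy_low_fraction", "frameshift", 0.0, 90, 10),
]
candidates = [v.name for v in variants if passes_filter(v) and is_heterozygous(v)]
print(candidates)
```

Only the rare stop-gain variant with balanced read support remains; the others fail on consequence class, population frequency or allele fraction.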

Protein interaction analysis
The 12 genes for CSS and 1 for NCBRS listed in OMIM were used as input to the STRING protein-protein interaction database (http://string-db.org/), which holds experimental, predicted and transferred interactions together with interactions obtained through text mining [24]. To select stronger interactors, network clustering was performed using the k-means algorithm (the number of clusters was set to 4).

Clinical and demographic characteristics of three cases

Case 1 (Family 1)
The girl was G2P2, born by cesarean section at full term. The birth weight was 2.95 kg. At 1 year 6 months old, the child was unable to speak, but medical advice was not sought. It was not until the age of nearly 2 that the child could unconsciously make a "Baba-Mama" sound. At 2 years 4 months old, she was still unable to walk without support. At present, she is 3 years and 5 months old, with a weight of 11.9 kg, a height of 88 cm, and a head circumference of 48.5 cm. The pedigree and some features are depicted in Fig. 1a and b. Family history: Her parents were healthy and non-consanguineous. Her mother was 30 years old and had a healthy lifestyle during pregnancy. The patient has a healthy 6-year-old sister.

Case 2 (Family 2)
The girl was G5P2A3, born by cesarean section at 39 + 3 weeks of gestation, with a birth weight of 2.65 kg, a birth height of 48 cm and a head circumference of 32 cm. Two months after birth, she was admitted to the Shenzhen Hospital affiliated to the University of Hong Kong for treatment after five episodes of "suspicious convulsions" and was diagnosed with "epilepsy". At 5 months old, her height increased to 90 cm, and weight to 11.6 kg. Her psychomotor development was significantly behind that of children of the same age and was accompanied by hypotonia. The pedigree and some features are depicted in Fig. 1c and d. Family history: Her healthy parents were not consanguineous. Her mother was 38 years old and had no history of smoking, drinking, or long-term exposure to chemicals or harmful radiation during pregnancy. The patient has a 12-year-old brother who suffers from "attention deficit hyperactivity disorder".

Case 3 (Family 3)
The girl was G2P2, born by natural delivery at 39 + 1 weeks of gestation with a birth weight of 3.2 kg, a height of 50 cm and a head circumference of 34 cm. Her cry was weak at birth. She was unable to suck actively in the first two months after birth and was fed with a dropper. On the 4th day after birth, she was hospitalized for 1 week due to "neonatal hyperbilirubinemia". About 1 year ago (at 8 months and 21 days old), the child could not sit alone and was treated for "developmental delay". Her cognitive and motor development lagged significantly behind that of children of the same age. She could understand and execute simple instructions, and uttered no more than 5 words. At this time, her height was 94 cm and her weight 12.4 kg. The pedigree and some features are depicted in Fig. 1e and f. Family history: She was born to healthy, non-consanguineous parents. Her mother was 35 years old and had no history of smoking, drinking, or long-term exposure to chemicals or harmful radiation during pregnancy. The patient has a healthy brother. The facial, skeletal-limb, nervous system and other clinical characteristics of the three cases are compiled in Table 1.

A 6q25 microdeletion was identified in case 1 by array-CGH
Oligonucleotide array-CGH was performed for the three patients using the Fetal DNA Chip (Version 1.2). A 2.84 Mb 6q25.3 microdeletion (arr[hg19]6q25.3(155,966,487–158,803,979) × 1) was identified in case 1 (Fig. 2a). This deletion was not identified in her parents. No deletions or duplications were detected in case 2 or case 3. The deleted region contains 7 protein-coding genes and is highly conserved in mammals (Fig. 2b). Five of them (ARID1B, ZDHHC14, SNX9, SYNJ2 and GTF2H5) are located on the sense strand and two (TMEM242 and SERAC1) on the antisense strand.
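As a quick consistency check, the reported size of the case 1 deletion can be recomputed from the array coordinates given in the abstract (arr[hg19]6q25.3(155,966,487–158,803,979) × 1). The sketch below assumes that both coordinates are inclusive; the function name is illustrative.

```python
def span_bp(start: int, end: int) -> int:
    """Length in base pairs of an interval with inclusive start and end (assumption)."""
    return end - start + 1


# hg19 coordinates of the case 1 deletion, as reported in the abstract.
deletion_start, deletion_end = 155_966_487, 158_803_979

size_bp = span_bp(deletion_start, deletion_end)
size_mb = round(size_bp / 1e6, 2)
print(size_bp, size_mb)
```

The result, about 2.84 Mb, agrees with the size stated for the 6q25 microdeletion.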

Two novel pathogenic point mutations were identified by WES
As for the other two cases, WES was performed at MyGenostics (MyGenostics, Beijing, China). The aligned bases for case 2 and case 3 were 13,859.9 and 15,678.15 Mb, respectively. The ratios of the coverage on target regions were 99.33% and 99.69%, respectively. The average sequencing depths on target regions were 109.65 and 118.64 for case 2 and case 3, respectively.
In case 3, a nonsense mutation (c.4741C > T, chr6:157,522,259) was identified in exon 18 of ARID1B (NM_017519), changing the codon for Gln (Q) (CAG) into a premature stop codon (TAG, X) (p.Q1581X) (Fig. 3a, d, Table 2). This mutation was verified by Sanger sequencing (Fig. 3c). This nucleotide (4492C) was strongly conserved during evolution. The mutation was not included in the large-scale genomic databases mentioned above. Since this stop gain mutation was identified only in case 3 and not in her parents, it was regarded as another de novo variant. In addition, the variant has been annotated as rs1554235831 in the NCBI dbSNP database. According to the ACMG guidelines, p.Q1581X met criteria PVS1 + PS2 + PM2 and was classified as "pathogenic".
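The step from the evidence codes PVS1 + PS2 + PM2 to a "pathogenic" call can be made explicit with a small sketch. It encodes only the combining rules for the "pathogenic" tier as given in the 2015 ACMG/AMP guidelines, as understood here; the likely-pathogenic tier and all benign criteria are deliberately left out, and the function names are illustrative rather than part of any classification tool.

```python
def count_by_strength(criteria):
    """Count ACMG evidence codes by strength, based on their prefix."""
    codes = list(criteria)
    return {
        "very_strong": sum(c.startswith("PVS") for c in codes),
        "strong": sum(c.startswith("PS") for c in codes),
        "moderate": sum(c.startswith("PM") for c in codes),
        "supporting": sum(c.startswith("PP") for c in codes),
    }


def is_pathogenic(criteria):
    """Simplified check of the 'pathogenic' combining rules (sketch only)."""
    n = count_by_strength(criteria)
    vs, s, m, p = n["very_strong"], n["strong"], n["moderate"], n["supporting"]
    # Rule (i): one very strong criterion plus sufficient additional evidence.
    if vs >= 1 and (s >= 1 or m >= 2 or (m >= 1 and p >= 1) or p >= 2):
        return True
    # Rule (ii): at least two strong criteria.
    if s >= 2:
        return True
    # Rule (iii): one strong criterion plus moderate/supporting evidence.
    if s == 1 and (m >= 3 or (m == 2 and p >= 2) or (m == 1 and p >= 4)):
        return True
    return False


print(is_pathogenic(["PVS1", "PS2", "PM2"]))
```

For the combination reported for p.Q1581X, rule (i) is satisfied by PVS1 together with PS2, so PM2 is not needed to reach the "pathogenic" tier in this sketch.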

6q25 Microdeletions involving ARID1B
Nine reported 6q25 microdeletions associated with ARID1B-related disorders were collected from published articles. We also collected individuals with developmental disorders whose genomes contain microdeletions involving the ARID1B gene from DECIPHER [25,26] and Developmental Delay [27,28]. Thirty-two were from DECIPHER and 5 from Developmental Delay. The microdeletions were compared against the human genome (hg19) using the UCSC Genome Browser. In total, 46 microdeletions were included in the subsequent analysis (Fig. 4a). Thirty-nine of them completely (or almost completely) covered the whole [29], several candidate enhancers were located in the region covered by this deletion (266,355) (Fig. 4b).

Discussion
Currently, there are 12 genes responsible for CSS and 1 for NCBRS, a disorder with phenotypes similar to those of CSS. Among the 13 genes, except for the two transcription factors SOX4 and SOX11, the proteins encoded by the remaining 11 genes interact with each other (Fig. 5a) to form two SWI/SNF-related complexes, the BAF (Brg/Brahma-associated factors) complex and/or the PBAF (Polybromo BRG1 Associated Factor) complex (Fig. 5b, c). The SWI/SNF complex was originally described as the protein complex critical for cellular responses to mating-type switching (SWI) or sucrose fermentation (SNF) in yeast [30,31]. This multiprotein complex contains more than 15 subunits and activates gene expression through its capacity to remodel and remove nucleosomes at gene promoters [32]. Recently, mutations, translocations and deletions of subunits of the SWI/SNF complex have been linked to a number of human diseases, such as cancer [33], different types of CSS [4,34–38] and NCBRS [15].
The gene for CSS1 is ARID1B [2,4,39], which encodes a core subunit of the BAF complex. It is the most frequently mutated gene in cases with CSS [5]. The phenotypes caused by ARID1B mutations encompass a spectrum of features, including feeding difficulties, laryngomalacia, speech delay, motor delay, hypertrichosis, and cryptorchidism [40]. According to the Human Protein Atlas, ARID1B is expressed ubiquitously and is abundant in the nervous, endocrine, muscle and lymphoid systems (Fig. 5d). Meticulous coordination between the actin cytoskeleton and the microtubule network has been reported to regulate the formation and transportation of secretory vesicles for proper neurite outgrowth and maintenance, which is critical for normal neural development [41]. This finely tuned coordination is regulated by the BAF complex [42]. ARID1B (previously named BAF250B) is a core member of the BAF complex and plays an essential role in dendrite outgrowth and arborization in cortical and hippocampal pyramidal neurons during brain development in mice [43]. ARID1B deficiency led to decreased dendritic branching, thus hindering dendritic innervation into cortical layer I to form proper synapses. This might disrupt the balance of excitatory and inhibitory inputs and result in the pathologic phenotypes of ID.

Table 2 De novo mutations of ARID1B identified in two cases
Het, heterozygous; X, stop codon; PVS1, pathogenic very strong, represents "loss-of-function"; PS2, pathogenic strong, represents "de novo"; PM2, pathogenic moderate, represents "absent from controls"
Based on the above analysis, ARID1B, GTF2H5, SERAC1 and ZDHHC14 might be promising candidates responsible for the 6q25 microdeletion syndrome. Currently, more than 10 individuals harboring a 6q25 microdeletion have been reported [2,55–59]. The shortest reported 6q25 deletion (1.1 Mb, chr6:156,004,307-157,120,089) contained only one protein-coding gene, ARID1B [60]. Among the ARID1B-related deletions collected in DECIPHER and Developmental Delay, the shortest deletion (270,613) was only 5.11 kb, spanning just the promoter region and first exon of ARID1B. This deletion would prevent transcription of ARID1B from being initiated, as would the deletions reported by Ronzoni L in 2016 and those detected in two DECIPHER samples (282,767 and 360,703). There was also a 28.79 kb microdeletion (274,690) that covered the whole last exon of ARID1B, thus removing part of the protein-coding sequence and the complete 3′-UTR of ARID1B. Transcription of ARID1B could then not stop at the normal termination site and might produce truncated proteins lacking the C-terminal BAF250_C domain (pfam12031). Interestingly, the microdeletion (266,355) was located in the middle of ARID1B, as was that of sample 389,566. According to the chromatin modification patterns of 7 cell lines (GM12878, H1-hESC, HSMM, HUVEC, K562, NHEK and NHLF) from ENCODE, several candidate enhancers are located in this region (Fig. 4b). Loss of these enhancers might affect the transcription efficiency of ARID1B. Therefore, insufficient production of ARID1B could be caused by four types of 6q25 microdeletions, namely those covering the whole genomic region, the promoter region, the termination region or enhancer regions.
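The four deletion types can be expressed as a simple interval comparison between a deletion and the gene. The sketch below is only an illustration under stated assumptions: the gene is taken to lie on the sense strand (so its lower coordinate is the 5′ end), the coordinates used are toy values rather than the real position of ARID1B, and an intragenic deletion is labeled as a candidate enhancer deletion without checking any enhancer annotation.

```python
def classify_deletion(del_start, del_end, gene_start, gene_end):
    """Classify a deletion relative to a sense-strand gene (sketch only).

    All coordinates are inclusive and on the same chromosome (assumption).
    """
    if del_end < gene_start or del_start > gene_end:
        return "no overlap"
    covers_5prime = del_start <= gene_start  # promoter / first exon side
    covers_3prime = del_end >= gene_end      # last exon / 3'-UTR side
    if covers_5prime and covers_3prime:
        return "whole gene"
    if covers_5prime:
        return "promoter region"
    if covers_3prime:
        return "termination region"
    return "intragenic (candidate enhancer region)"


# Hypothetical gene interval, chosen only to exercise the four categories.
GENE_START, GENE_END = 1_000, 2_000

examples = {
    "large deletion": (500, 2_500),
    "5' deletion": (900, 1_050),
    "3' deletion": (1_900, 2_100),
    "internal deletion": (1_400, 1_500),
}
for label, (start, end) in examples.items():
    print(label, "->", classify_deletion(start, end, GENE_START, GENE_END))
```

With real data, the gene interval would come from a gene annotation and the enhancer category would require overlap with annotated regulatory elements, which this sketch does not attempt.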
Haploinsufficiency of ARID1B has been recurrently detected in intellectual disability (ID) or mental retardation (MRT) [60,61]; therefore, ARID1B might be the crucial pathogenic factor behind the 6q25 microdeletion syndrome.

Haploinsufficiency of ARID1B caused by de novo SNVs
In case 2, a variant at the splice site of exon 6 (c.2332 + 1G > A) of ARID1B (NM_017519) was identified. This variant disrupts the splice donor site (GT) of intron 6, which might impair proper splicing of the ARID1B mRNA and produce abnormal transcripts, thereby inactivating ARID1B. Heterozygous splice-site mutations have been reported to affect splicing and lead to haploinsufficiency of the affected genes [62,63]. Therefore, it is reasonable to assume that the c.2332 + 1G > A mutation is detrimental to proper splicing and results in insufficient expression of ARID1B.
In case 3, the mutation (c.4741C > T) was located in exon 18 of ARID1B, changing the codon for Q (CAG) into a premature termination codon (TAG) (p.Q1581X). The mutant transcript therefore contains two stop codons (one at position 1581 and another at position 2237). It has been reported that mRNAs containing premature termination codons (PTCs) can be detected and rapidly degraded by a dedicated mRNA surveillance mechanism, nonsense-mediated mRNA decay (NMD). It is widely accepted that the biological purpose of NMD is to protect cells from the potentially harmful effects of truncated translation products resulting from frameshift or nonsense mutations or from inaccurate pre-mRNA splicing [64–66]. It has been found that in 143 patients with CSS, all pathogenic variants were truncating (nonsense, frameshift, splice-site, and deletions of various numbers of exons including whole-gene deletions). Therefore, the two de novo single nucleotide variants in our study might lead to haploinsufficiency of ARID1B and be pathogenic for CSS.
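The relation between the coding position c.4741 and the protein change p.Q1581X can be checked with a few lines of arithmetic. The sketch below uses only values given in the text (coding position 4741, wild-type codon CAG, C > T substitution, stop codons at positions 1581 and 2237); the function names are illustrative, and position 1 of the coding sequence is assumed to be the A of the start codon.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codon_number(cds_position: int) -> int:
    """1-based codon index for a 1-based coding-sequence position."""
    return (cds_position - 1) // 3 + 1


def offset_in_codon(cds_position: int) -> int:
    """0-based position of the nucleotide within its codon."""
    return (cds_position - 1) % 3


def mutate_codon(codon: str, offset: int, new_base: str) -> str:
    """Return the codon with one base replaced."""
    return codon[:offset] + new_base + codon[offset + 1:]


cds_position = 4741                    # c.4741C > T
codon = codon_number(cds_position)     # expected to match p.Q1581
offset = offset_in_codon(cds_position)
mutant = mutate_codon("CAG", offset, "T")

normal_stop = 2237                     # position of the normal stop codon (from the text)
residues_lost = (normal_stop - 1) - (codon - 1)

print(codon, offset, mutant, mutant in STOP_CODONS, residues_lost)
```

The substitution falls on the first base of codon 1581 and turns CAG into the stop codon TAG, so a translated product, if it escaped NMD, would lack the residues from position 1581 onward.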

Treatments for ARID1B-related disorders
To date, there are no effective treatments for ARID1B-related disorders. The available treatments are symptomatic and cannot cure the disease. Currently, a clinical trial is ongoing in the Netherlands to investigate the effects of clonazepam in children with ARID1B-related intellectual disability (EudraCT: 2019-003558-98). The purpose of this trial is to test the beneficial effects of clonazepam on behavior and cognitive function in ARID1B patients.
Animals with ARID1B haploinsufficiency (Arid1b +/− ) would be promising models for studying the molecular mechanism of ARID1B-related disorders and for drug discovery. Arid1b +/− C57BL/6J mice showed reduced corpus callosum size, dentate gyrus size, cortex thickness, and proliferation [67–69]. A deficiency of the GHRH-GH-IGF1 axis was detected in Arid1b +/− mice [67]. Exogenous GH supplementation significantly reversed the growth retardation in Arid1b +/− mice but did not improve abnormal behavioral phenotypes such as anxiety. This indicates that there might be other critical, as yet unknown druggable targets for the treatment of ARID1B-related disorders. Alternatively, Arid1b +/− mice might not be the most suitable animal model for the study of ARID1B-related disorders.