Genome-wide copy number variant analysis for congenital ventricular septal defects in Chinese Han population

Background Ventricular septal defects (VSDs) constitute the most prevalent congenital heart disease (CHD), occurring either in isolation (isolated VSD) or in combination with other cardiac defects (complex VSD). Copy number variation (CNV) has been highlighted as a possible contributing factor to the etiology of many congenital diseases. However, little is known concerning the involvement of CNVs in either isolated or complex VSDs. Methods We analyzed 154 unrelated Chinese individuals with VSD by chromosomal microarray analysis. The subjects were recruited from four hospitals across China. Each case underwent clinical assessment to define the type of VSD, either isolated or complex. Detected CNVs were categorized into syndrome-related CNVs, recurrent CNVs and rare CNVs. Genes encompassed by the CNVs were analyzed using enrichment and pathway analysis. Results Among 154 probands, we identified 29 rare CNVs in 26 VSD patients (16.9 %, 26/154) and 8 syndrome-related CNVs in 8 VSD patients (5.2 %, 8/154). Twelve of the 29 rare CNVs (41.3 %) were recurrently reported in the DECIPHER or ISCA database as associated with either VSD or general heart disease. Fifteen genes (5 %, 15/285) within CNVs were associated with a broad spectrum of complicated CHD. Among these 15 genes, seven were annotated with “abnormal interventricular septum morphology” in the MGI (Mouse Genome Informatics) database, and nine were associated with cardiovascular system development (GO:0072358). We also found that these VSD-related candidate genes are enriched in chromatin binding and transcription regulation, which are biological processes underlying heart development. Conclusions Our study demonstrates the potential clinical diagnostic utility of genomic imbalance profiling in VSD patients. Additionally, gene enrichment and pathway analysis helped us to implicate VSD-related candidate genes.
Electronic supplementary material The online version of this article (doi:10.1186/s12920-015-0163-4) contains supplementary material, which is available to authorized users.


Background
Congenital heart defects (CHDs) are the most prominent birth defects, with a prevalence of 4 to 10 per 1000 live births [1]. A ventricular septal defect (VSD) occurs in more than 1 in 300 live births and is the most common CHD identified to date [2]. Although nearly 40 % of infants with VSDs can survive without treatment up to the age of 15 years, VSD patients diagnosed in adulthood may experience potentially serious clinical and hemodynamic problems [3]. Early detection and diagnosis lead to improved prognosis for patients with CHD.
Genomic imbalances detected by karyotype or FISH explain 9 % to 18 % of neonatal CHD cases [4]. CHD-related CNVs, identified by chromosomal microarray analysis (CMA), have been reported on almost every human chromosome [5][6][7][8][9]. Numerical chromosomal abnormalities such as trisomy 21, trisomy 18 and trisomy 13, as well as CNVs such as the 22q11.2 deletion, are causally related to CHD. Although the causal relationship between CNVs in the 100 kb to 1 Mb size range and CHD has not been fully investigated, rare de novo CNVs have been found in up to 5 % of CHD trios [10].
Some CNV studies focus on one type of CHD, such as syndromic CHD [5], tetralogy of Fallot [8], double outlet right ventricle [11], thoracic aortic aneurysms and dissections [12] and isolated congenital heart disease [9]. Approximately 10 % of patients with tetralogy of Fallot (TOF) display an increased genome-wide CNV burden [8,10]. Hence, while studies focusing on the involvement of CNVs in CHD development have been reported [5,7,8,12], the complex and heterogeneous phenotypic and genetic nature of CHD suggests the need for further investigation of its genetic basis, particularly for specific categories of CHD.
The aim of the present study was to detect CHD-associated CNVs in Chinese patients with VSD. Although several studies have examined the occurrence of CNVs in Chinese CHD patients [13,14], CNVs in Chinese patients with VSD have not been specifically investigated. Detecting CNVs in patients with VSD may reveal VSD-specific candidate genes and associated pathways.

Subjects
The subjects were recruited from a multi-center hospital-based CHD cohort between 2000 and 2009. We randomly enrolled 166 unrelated patients (subject details in Additional file 1: Table S1). All patients except seven had a VSD phenotype. Every subject underwent complete cardiac evaluation. Congenital cardiac malformations were diagnosed by echocardiography and subsequently confirmed during surgery when performed. We categorized cases into two large groups: isolated VSD (patients with VSD as the only cardiac defect) and complex VSD (patients with more than two additional cardiac phenotypes besides VSD). Additional non-cardiac phenotypes such as mental defect or developmental disability are not discussed because clinical evaluation was lacking. The ethics committee of Fudan University approved the study. Documented consent was obtained from all participating patients or their legal guardians.

CNV calling and rare CNV identification
The Agilent Human Genome CGH microarray 244 k kit was used for CMA analysis (Agilent Technologies). Sample-specific CNV regions were identified using two software packages, Agilent DNA Analytics 4.0 CH3 Module (Agilent Technologies) and Nexus Copy Number v5.0 (BioDiscovery). Copy number gains or losses identified by both software packages were further manually inspected and confirmed.
We interpreted the CNVs hierarchically as shown in Figure 1. Common CNVs were removed based on their frequency in DGV (Database of Genomic Variants) [15,16] and in Chinese control data compiled from four published data sets: 10 individuals from Park et al. [17], 779 individuals from Lin et al. [18], 99 individuals from the SGVP (Singapore Genome Variation Project) [19] and 80 Han Chinese from Lou et al. [20]. CNVs with >70 % overlap with those reported in DGV were considered common CNVs; CNVs that overlapped only partially (< 30 %) or not at all with the DGV data set or the other data sets were considered rare CNVs. For the rare CNVs, we consulted the DECIPHER (https://decipher.sanger.ac.uk/) and ISCA (now ClinGen, https://www.clinicalgenome.org/) databases for evidence of clinical relevance [21]. The RefSeq genes included in the CNVs were identified using the UCSC browser (Human NCBI36/hg18 Assembly).
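The overlap rule above can be illustrated with a minimal Python sketch. The text gives the 70 % and 30 % cutoffs but does not state how overlap was computed, so this sketch assumes that overlap means the fraction of the patient CNV covered by the best-matching single control CNV. The function names, the half-open coordinate convention and the "unclassified" label for the 30 % to 70 % band (which the text does not define) are all illustrative assumptions, not the authors' implementation.

```python
from typing import Iterable, Tuple

# (chromosome, start, end); half-open coordinates are an assumption of this sketch
Interval = Tuple[str, int, int]


def overlap_fraction(cnv: Interval, reference: Interval) -> float:
    """Fraction of the query CNV's length covered by one reference CNV."""
    chrom, start, end = cnv
    ref_chrom, ref_start, ref_end = reference
    if chrom != ref_chrom:
        return 0.0
    shared = min(end, ref_end) - max(start, ref_start)
    return max(shared, 0) / (end - start)


def classify_cnv(
    cnv: Interval,
    references: Iterable[Interval],
    common_cutoff: float = 0.70,  # ">70 % overlap" from the text
    rare_cutoff: float = 0.30,    # "< 30 %" from the text
) -> str:
    """Label a CNV as common, rare, or unclassified (band not defined in the text)."""
    best = max((overlap_fraction(cnv, ref) for ref in references), default=0.0)
    if best > common_cutoff:
        return "common"
    if best < rare_cutoff:
        return "rare"
    return "unclassified"


# Toy control set, not taken from DGV or the Chinese control data
controls = [("chr1", 1000, 2000)]
print(classify_cnv(("chr1", 1100, 1900), controls))  # fully covered by the control
print(classify_cnv(("chr1", 1900, 2900), controls))  # 10 % covered
```

A reciprocal-overlap definition (requiring the cutoff in both directions) would be an equally plausible reading and would only change `overlap_fraction`.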

Validation of small rare CNVs
CNVs with marginal QC values or of small size (< 80 kb) were selected for confirmation by multiplex ligation-dependent probe amplification (MLPA) analysis (MLPA probes are listed in Additional file 1: Table S2). We also performed parental testing for 16 probands, as listed in Additional file 1: Tables S3-S4.

Statistical analyses
Statistical analysis was performed using SPSS 17. Two-sided Fisher's exact test and Student's t-test were used for qualitative and quantitative variables, respectively.
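For readers unfamiliar with the test, a two-sided Fisher's exact test on a 2x2 table can be sketched in plain Python as follows. The analysis itself was run in SPSS; this sketch only illustrates the calculation, using the common convention of summing the probabilities of all tables that are no more likely than the observed one. The function name and the example counts are illustrative and are not data from this study.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of seeing x in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Sum every table that is as likely as, or less likely than, the observed one;
    # the small tolerance guards against floating-point ties.
    total = sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-7)
    )
    return min(1.0, total)


# Illustrative counts only (not from the cohort)
p_value = fisher_exact_two_sided(3, 1, 1, 3)
print(f"{p_value:.4f}")
```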

Identifying CHD-associated genes
To identify VSD-related genes, we compared the genes located in our rare CNVs with known CHD candidate genes. The Mouse Genome Informatics resource (MGI, http://www.informatics.jax.org/) can be very informative for studying human disease-related genes. We used "abnormal interventricular septum morphology" as the MP term to search for VSD-related genes listed in MGI (MP: 0000281, as shown in Additional file 1: Figure S1; http://www.informatics.jax.org/) and identified 147 genes with 375 genotypes and 416 annotations. In addition, 202 CHD-related genes were compiled from other resources: 104 genes from UCSC with the Human Genome Build 19 (cardiac gene: 76, cardiac transcription factor gene: 28), 51 genes from published literature (non-syndromic and syndromic CHD) and 47 genes from the CHD wiki. We also collected gene sets from the term "cardiovascular system development" (GO: 0072358) and from candidate pathways involved in cardiac development, such as Wnt, Notch, Hedgehog and FGF, using KEGG and Netpath (http://www.netpath.org/). The CHD-related pathway selection process is shown in Additional file 1: Figure S2. In total, 1957 genes involved in cardiac-related pathways were collected and combined into a potentially CHD-related data set for further analysis. We compared this combined data set with the genes mapping to CNVs detected in VSD patients.
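The comparison between CNV genes and the compiled reference gene sets amounts to a set-membership lookup. The following Python sketch shows one way to do it, assuming each resource has been reduced to a set of gene symbols. The function name, the set labels and the placeholder gene symbols are hypothetical and do not reflect the actual contents of MGI, the CHD wiki or any other resource.

```python
from typing import Dict, Iterable, List, Set


def annotate_cnv_genes(
    cnv_genes: Iterable[str],
    reference_sets: Dict[str, Set[str]],
) -> Dict[str, List[str]]:
    """Map each CNV gene to the (sorted) names of the reference sets containing it."""
    hits: Dict[str, List[str]] = {}
    for gene in cnv_genes:
        sources = sorted(
            name for name, genes in reference_sets.items() if gene in genes
        )
        if sources:  # genes found in no reference set are left out
            hits[gene] = sources
    return hits


# Placeholder gene symbols and set names, for illustration only
reference_sets = {
    "mgi_septum": {"GENE_A", "GENE_C"},
    "chd_wiki": {"GENE_A"},
}
cnv_genes = ["GENE_A", "GENE_B", "GENE_C"]
print(annotate_cnv_genes(cnv_genes, reference_sets))
```

Keeping the source names alongside each hit makes it easy to see which genes are supported by more than one resource.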

VSD candidate gene identification and pathway analysis
To define the most promising candidate genes from the gene list defined above, ToppGENE was used as a gene prioritization and enrichment tool [22]. We used Ingenuity Pathway Analysis (IPA) to annotate genes encompassed within VSD-related CNVs for their molecular and cellular functions and associated pathways. Network scores were calculated based on the hypergeometric distribution and Fisher's exact test.
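Enrichment scores based on the hypergeometric distribution ask how likely it is to see at least the observed number of annotated genes in a gene list by chance. The sketch below computes that upper-tail probability in plain Python. It illustrates the general calculation only and is not the scoring procedure used by ToppGENE or IPA; the function name and the example numbers are assumptions made for illustration.

```python
from math import comb


def hypergeom_enrichment_p(
    population: int, annotated: int, sample: int, hits: int
) -> float:
    """P(X >= hits) when `sample` genes are drawn without replacement from a
    background of `population` genes, `annotated` of which carry the term."""
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Small illustrative case: 10 background genes, 4 annotated, 3 drawn, 2 or more hits
print(hypergeom_enrichment_p(10, 4, 3, 2))
```

Applying this to the study would require a background gene count, which the text does not state, so any such value would have to be chosen and reported explicitly.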

Chromosomal imbalances in VSD patients
We identified six aneuploidies: two cases of trisomy X (47,XXX) and four of trisomy 21. Up to 70 % of Down syndrome subjects [23] but only 1.3 % of trisomy X subjects have been reported to display CHD. CHD features in the trisomy X patients included VSD, ASD (atrial septal defect), pulmonic and aortic stenosis, and coarctation [24].

Rare CNVs in VSD patients
There were 1575 CNVs detected in our 154-patient cohort, with a median size of 310.5 kb (max 33.4 Mb, … (Fig. 1).
Fig. 1 Workflow of CNV analysis and candidate gene discovery. CNV calls by DNA Analytics were performed using the ADM2 algorithm, with a sensitivity threshold of 6.0 and a minimum of 5 probes. The QC metrics table was used to check signal intensity and background noise. A DLR (Derivative Log Ratio) score above 0.22 was set as the cutoff to avoid false CNVs. Six cases were removed because of poor data quality during the QC filter. Six cases with aneuploidies (trisomy X and Down syndrome) were excluded from further analysis.
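The sample filtering described in the Fig. 1 caption (166 enrolled, 6 removed at QC, 6 aneuploid cases set aside, 154 analyzed) can be sketched as a two-step filter. This sketch assumes that a DLR score above 0.22 means the sample fails QC, which is one reading of the caption. The `Sample` class, field names and example values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

DLRS_CUTOFF = 0.22  # cutoff named in the Fig. 1 caption


@dataclass
class Sample:
    sample_id: str
    dlrs: float              # Derivative Log Ratio score of the array
    aneuploid: bool = False  # e.g. trisomy X or trisomy 21


def select_for_cnv_analysis(
    samples: Iterable[Sample], dlrs_cutoff: float = DLRS_CUTOFF
) -> Tuple[List[Sample], Dict[str, str]]:
    """Return samples kept for rare-CNV analysis and the reason each other one was dropped."""
    kept: List[Sample] = []
    dropped: Dict[str, str] = {}
    for sample in samples:
        if sample.dlrs > dlrs_cutoff:
            # Assumption: noisy arrays (high DLR score) are excluded first
            dropped[sample.sample_id] = "failed QC"
        elif sample.aneuploid:
            dropped[sample.sample_id] = "aneuploidy"
        else:
            kept.append(sample)
    return kept, dropped


# Made-up samples for illustration
cohort = [
    Sample("S1", 0.15),
    Sample("S2", 0.30),
    Sample("S3", 0.18, aneuploid=True),
]
kept, dropped = select_for_cnv_analysis(cohort)
print([s.sample_id for s in kept], dropped)
```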
We also detected 32 intergenic CNVs, but these were excluded from further analysis (Additional file 1: Table S9). Of the genic CNVs, 24.1 % (7/29) were smaller than 100 kb, 34.4 % (10/29) were between 100 kb and 500 kb, and 41.3 % (12/29) were larger than 500 kb (Additional file 1: Figure S3). Parental testing revealed that six CNVs were inherited from unaffected parents, reducing the likelihood that they are clinically significant. Three CNVs were confirmed as de novo: one deletion of 57.9 kb at Xp22.2 involving the EGFL6 gene (Additional file 1: Figure S4) and two duplications, of 156.0 kb at 14q32.12 and of 117.8 kb at 7p14.2, which were experimentally confirmed; the two gains were found in the same subject.

CNVs larger than 1 Mb
Five VSD cases revealed CNVs larger than 1 Mb (as shown Additional file 1:

CNVs putatively associated with VSD
All 29 identified rare CNVs putatively associated with VSD were placed on a chromosomal map of the genome (Additional file 1: Figure S5). These CNVs comprised mostly subtelomeric or centromeric imbalances distributed on chromosome arms such as 2p, 2q, 3p, 4q, 6p, 15q, 16q, 21q and 22q, and most were located on chromosomes 2, 3, 4, 7, 16 and X. The CNVs identified in our VSD study are much smaller than those deposited in the CHD wiki, which reports three regions (4q-ter, 15q26.2, 16q22) and one gene (TBX1) related to CHD. Twelve of the 29 CNVs (41.3 %) affect regions known to be related to ASD, VSD or general heart disease in DECIPHER and ISCA (Table 1).

CNV comparison in isolated and complex VSD
We compared CNVs in the 100 isolated VSD patients with those in the 44 complex VSD patients (Additional file 1: Table S6). There was a trend towards increased CNV size in patients with complex VSD, but the difference did not reach statistical significance. There was no significant difference in rare CNV numbers (average CNV count per case) for either deletions or duplications.

Enrichment of CHD related genes
Several lines of evidence support the enrichment of CHD-related genes within the CNVs detected in VSD patients. First, we found that PAX3 and LBX1 (in duplications) and CRKL, GP1BB, PDLIM3, TBX1 and TXNRD2 (in deletions) were annotated in the MGI database and the CHD wiki as associated with CHD. Evidence from the literature and from GO signal pathway analysis further supported this notion (Tables 2 and 3). Second, the enrichment analysis showed that 25 of the 285 genes within the duplication and deletion CNVs detected in this study were enriched in transcription factor and chromatin binding functions, and three of the five enriched biological processes were associated with heart development or cardiovascular system development (Table 4). Third, the top two networks constructed by IPA analysis for the 285 genes include a cardiovascular disease network and a hereditary disorder network (Score 46: 25 genes) (Fisher's exact test, P = 3.42E-08 to 3.79E-02) (Additional file 1: Table S7). Top transcription regulators (NANOG, TP53, SOX2, … Table S8 and Additional file 1: Figure S6C. As a homeobox protein, NANOG regulates several transcription factors [27] such as EN1, SOX2, LBX1 and ZFP42 in our dataset (P = 4.91E-03), which control cellular growth, organ growth and development.

Discussion
Genomic imbalances, including known genomic disorders, contribute to the genetic etiology of congenital malformations such as CHD. In previous studies, syndromic chromosome abnormalities explained 6-9 % of CHD [28]. We found that Down syndrome (4 cases, 2.5 %), DiGeorge syndrome (2 cases, 1.2 %) and trisomy X syndrome (2 cases, 1.2 %) contributed to up to 5 % of VSD cases, consistent with previous reports [23,24]. In addition, we identified large CNVs (> 1 Mb) (3/161, 1.9 %) including 4q34.3-q35.1, 3q26.32-q29 and 16p13.11-p11.2, which are associated with CHD as reported by DECIPHER and ISCA. Other CNV regions identified in our study, such as 4q-ter, 15q26.2 and 16q22, have also been reported in the CHD wiki. We did not identify any significant difference in size, number or genic content of rare CNVs between complex and isolated VSDs. Some previous studies reported a higher rate of CNVs in patients with CHD plus extracardiac or developmental abnormalities [5], whereas others found no significant increase [29]. We believe it likely that the genes affected by the CNVs matter more in causing VSD than CNV size or number, but the sample size might be too small to identify differences between isolated and complex VSDs. Our interpretation is that critical genes contribute to the development of CHD through altered expression caused by duplication or deletion CNVs. The genes identified in both de novo and recurrent CNVs are likely to be CHD-related genes. For example, we found a de novo deletion at Xp22.2 that includes EGFL6. EGFL6, which is involved in the regulation of cell cycle, proliferation and developmental processes, has previously been reported as a candidate gene for human developmental disorders and is expressed during embryonic development [30]. The 16p13.11 duplication is recurrent in our cohort and has also recently been reported to be significantly associated with CHD [31].
MYH11 is the proposed candidate gene in this interval, as defects in this gene underlie aortic aneurysm familial thoracic type 4 (AAT4) [MIM: 132900] and also contribute to familial thoracic aortic aneurysm and dissection (TAAD) and patent arterial duct (PDA). Our study suggests that EGFL6 and MYH11 may be dosage-sensitive genes involved in embryonic heart development. Furthermore, we specifically evaluated genes within CNVs detected in patients with VSD. We identified 15 genes previously known to be associated with CHD or to act in CHD-related signaling pathways (Tables 2 and 3). Among them, CRKL, TBX1, TXNRD2 and GP1BB are known to be involved in DiGeorge syndrome. MYH11, TXNRD2, PAX3, LBX1 and BCL6 are associated with abnormal heart ventricle thickness (MP: 0020135). BMP5, EN1, PRKCB, CACNG3 and CHP2 cluster in CHD-related signaling pathways. Importantly, CASP3, CRKL, FGF12, LBX1, MYH11, PDLIM3, TXNRD2 and TBX1 are related to heart development (GO: 0007507) and also to cardiovascular system development. Two types of molecular function, chromatin binding and transcription factor complex, were revealed through unbiased gene prioritization and enrichment analysis of all genes within the CNVs of VSD patients, along with 5 biological processes identified via GO annotations that were indicated to be related to VSD. Transcription factors with confirmed effects on cardiogenesis, including LBX1, PAX3, EN1, SOX2 and TBX1, were detected in our data set. LBX1 is a homeodomain-containing transcription factor required for the diversification of heart precursor cells in Drosophila, and its expression has been described in cardiac neural cells and in migrating muscle precursor cells [32]. Overexpression of Lbx1 mRNA resulted in enlarged somites, an increase in cell proliferation through upregulation of MyoD, and a lack of differentiated muscle [33]. PAX3, a key regulatory factor controlling the migration of myogenic precursor cells, acts genetically upstream of Lbx1 and Msx1.
Pax3 also directly activates MyoD expression. Rising levels of Pax3 and Lbx1 result in an enlarged muscle precursor cell population and thereby increase the bias toward myogenic differentiation [34]. Additionally, a transcription regulation loop (NANOG-SOX2-OCT4) is associated with downstream cascade regulation of GATA4, NKX2.5 and MESP to modulate heart development (Additional file 1: Figure S6C). As the heart is the first organ to form, its genesis involves a very complex series of morphogenetic interactions [35], and transcription factors are essential for cardiogenesis at different embryonic stages.
As reported in a recent exome sequencing study of CHD, de novo mutations in chromatin markers played a vital role in regulating cardiac development genes [36]. Seven genes (HIRA, SOX2, PRKCB, ING2, TP63, BCL6 and PAX3) in this study were enriched in the chromatin binding term (GO: 0003682) (P = 1.12E-04) and are worth investigating in more detail in future studies.
Based on our cohort, chromosomal imbalances account for 5.2 % (8/154) and rare CNVs for 16.9 % (26/154) of cases. No significant difference was detected in CNV diagnostic yield between complex and isolated VSD patients, indicating that both populations should be tested for genomic imbalances. Although the VSD-related candidate genes (as shown in Table 5) need further study to confirm their involvement in VSD pathogenesis, our findings demonstrate that high-density microarray analysis is a useful tool for uncovering potential underlying genomic causes of VSDs, and extended enrichment and pathway analysis indicates possible convergence on pathways during cardiogenesis.

Conclusions
In this pilot study, we found that genomic imbalances make an important contribution to the genetic burden of patients with VSD, consistent with previous reports in CHD. The rare CNVs carried by VSD patients were interpreted and classified for clinical utility by comparison with population CNV databases and patient-derived CNV databases. This CNV analysis provides a first picture of the copy number variant status of VSD patients, and the absence of a significant difference between isolated and complex VSD indicates that both populations need CNV testing equally. Furthermore, we applied gene enrichment and pathway analysis to understand the relevant genes involved and the potential relevance of CNVs to heart development, which may help delineate the genetic etiology and pathways of VSDs.