BDdb statistics
At present, the BDdb contains 101 and 37 GSE Series records from humans and mice, respectively. The datasets cover 15 diseases, 12 tissues, and 35 cell lines in humans, and four diseases, 12 tissues, and 17 cell lines in mice (Fig. 2a–e), and derive from multi-omics studies, e.g., genomics, transcriptomics, epigenomics, and single-cell omics (Fig. 2c, d). Moreover, the BDdb contains 869 potential biomarkers pertinent to 22 types of birth defects, such as microcephaly and neural tube defects (Fig. 2f). These markers were obtained from more than 500 studies involving six species, i.e., Homo sapiens, Danio rerio, Mus musculus, Sus scrofa, Canis familiaris, and Gallus gallus.
In the BDdb, embryonic tissue is the most abundant sample type for humans, followed by brain and blood; brain tissue is the most abundant sample type for mice, followed by lymphoid and embryonic samples (Fig. 2a, b). Correspondingly, the most abundant cell lines are blastocysts (human embryonic tissue) and cortex (mouse brain tissue). In terms of sequencing types, DNA microarray data are dominant for both humans (44%) and mice (91%), followed by RNA-Seq, methylation profiling by array, ChIP-Seq, and DNA methylation (Fig. 2c, d). Most datasets are linked to Down syndrome, followed by Klinefelter syndrome, Turner syndrome, Warkany syndrome 2, and Edwards syndrome (Fig. 2e). In addition to datasets of diseases associated with chromosomal abnormalities, datasets related to diseases such as orofacial clefts and open myelomeningocele are also included. In the biomarker collection, the top five diseases by number of related entries are neural tube defects, anophthalmia/microphthalmia, cleft lip, atrioventricular septal defects, and diaphragmatic hernia (Fig. 2f). A summary of these datasets and biomarkers can be found in Additional files 1 and 2: Tables S1 and S2, respectively.
Database features and utility
The BDdb contains multi-omics datasets and allows users to query the subsequent analysis results with five functional states. The easy-to-use interface provides access for searching, browsing, visualizing, and downloading (Fig. 3). The online user guide illustrates several cases of BDdb usage.
Information search
In the search module, users can enter keywords or choose from the provided options. As shown in Fig. 3a, users can select one or more options, including the organism, karyotype, tissue, and cell line. This allows users to restrict the search to particular karyotypes, such as trisomy 21, trisomy 18, and monosomy X, which are typical chromosome aneuploidies. At present, the omics datasets in the BDdb are limited to humans and mice. After the search request is submitted, the relevant results are displayed.
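The multi-option selection described above can be pictured as a conjunctive filter over record metadata. The following is only a minimal sketch under stated assumptions: the function `search_records`, the field names, and the toy records are hypothetical illustrations and do not reflect the actual BDdb schema or implementation.

```python
def search_records(records, **criteria):
    """Return the records that satisfy every supplied facet.

    Each facet value may be a single string or a collection of accepted
    values, mirroring the "select one or more options" behaviour.
    Field names are hypothetical, not the BDdb schema.
    """
    def matches(record):
        for field, wanted in criteria.items():
            accepted = {wanted} if isinstance(wanted, str) else set(wanted)
            if record.get(field) not in accepted:
                return False
        return True

    return [record for record in records if matches(record)]


# Toy metadata for illustration only; these are not BDdb entries.
toy_records = [
    {"id": "rec1", "organism": "Homo sapiens", "karyotype": "trisomy 21",
     "tissue": "embryo", "cell_line": "blastocyst"},
    {"id": "rec2", "organism": "Homo sapiens", "karyotype": "trisomy 18",
     "tissue": "blood", "cell_line": None},
    {"id": "rec3", "organism": "Mus musculus", "karyotype": "trisomy 21",
     "tissue": "brain", "cell_line": "cortex"},
    {"id": "rec4", "organism": "Homo sapiens", "karyotype": "monosomy X",
     "tissue": "blood", "cell_line": None},
]

# Human records with either of two aneuploid karyotypes.
hits = search_records(
    toy_records,
    organism="Homo sapiens",
    karyotype=["trisomy 21", "trisomy 18"],
)
hit_ids = [record["id"] for record in hits]
print(hit_ids)
```

Facets are combined with a logical AND across fields and a logical OR within a field, which is one common convention for this kind of search form; the real interface may combine options differently.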
View module
Data from different sequencing types are displayed with different result modules (Fig. 3a). For example, for RNA-Seq datasets, the results interface contains five sections: (1) “Basic Information”, which shows the sample’s features (karyotype, disease, organism, tissue, and cell line) as well as the GEO title, associated literature, and search link, which help users trace the origin of the data; (2) “DE Genes”, which enables users to search or download the gene expression matrix as well as the tables of up- and down-regulated genes; (3) GO and KEGG enrichment, which enables users to explore functional and pathway enrichment through the bubble charts, bar charts, and cnetplots provided; (4) network analysis of DEGs; and (5) “Genome Browser”, which displays expression patterns intuitively in a graphical interface. Apart from the RNA-Seq datasets, the database also provides fundamental analysis for other datasets. In particular, omics datasets from different studies can be displayed together in the “Genome Browser” according to the user’s requirements, to further mine for useful information (Fig. 3b).
Birth-defect disease biomarker mapper
Biomarkers are stored in the “Biomarker” module. Users can search and view markers of interest by selecting species, diseases, and tissues from the pull-down menu. Users can also download all analysis and biomarker results via the “Download” function. The BDdb also provides detailed tutorials and answers to common questions on the “Help” page.
Case study: exploring biomarkers for disease diagnosis using BDdb
To illustrate how the BDdb can yield useful clues about a disease, we targeted Down syndrome in humans, which has drawn considerable attention worldwide over many years. Taking fibroblasts as an example, we found a total of 13 GSE Series records linked to trisomy 21, spanning various sequencing types such as RNA-Seq and DNase-Seq. We consolidated the up-regulated DEGs from the eight GSE Series records obtained by RNA-Seq and DNA microarray, and then sorted the genes by count. In total, 21 genes had counts ≥ 4 (Fig. 4a). Among them, the TTC3 (tetratricopeptide repeat domain 3) and IFI27 (interferon α-inducible protein 27) genes ranked first, with counts of six. TTC3 is located on 21q22.2 within the Down syndrome critical region (DSCR) and plays an essential role in neural development. TTC3 is commonly regarded as a candidate gene for Down syndrome and Alzheimer’s disease [37, 38]. In addition, IFI27 is involved in the interferon response in trisomy 21 [39]. We found that both TTC3 and IFI27 had higher expression levels in trisomy 21 than in euploid controls in GSE55504. This was in accordance with the chromatin accessibility pattern in GSE55425 as assessed by DNase-Seq (Fig. 4b), implying that the extra copy of chromosome 21 or other transcriptional regulators in Down syndrome may underlie this difference. In addition to TTC3 and IFI27, eight other genes (marked with asterisks in Fig. 4a), such as SH3BGR [40, 41] and APP [42], are also typical biomarkers for Down syndrome. As these well-studied trisomy 21 marker genes can be captured by the BDdb, the remaining genes that have not yet been reported, such as OLFM2 and HAS1, may be prospective biomarkers for trisomy 21. Taken together, these results suggest that the BDdb can, in principle, be used to seek additional biomarkers associated with a particular disease.
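The consolidation step in this case study can be sketched as a simple recurrence count. The sketch below assumes that a gene's "count" is the number of GSE Series records in which it appears as up-regulated; the function `recurrent_up_genes`, the series labels, and the gene lists are hypothetical toy values, not the data behind Fig. 4a.

```python
from collections import Counter


def recurrent_up_genes(up_by_series, min_count=4):
    """Rank genes by the number of series in which they are up-regulated.

    up_by_series maps a series label to its list of up-regulated genes.
    Returns (gene, count) pairs with count >= min_count, sorted by
    descending count and then alphabetically by gene symbol.
    """
    counts = Counter()
    for genes in up_by_series.values():
        # set() ensures a gene is counted at most once per series.
        counts.update(set(genes))
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(gene, count) for gene, count in ranked if count >= min_count]


# Toy input for illustration only; labels and gene lists are made up.
toy_up_genes = {
    "series_A": ["TTC3", "IFI27", "APP"],
    "series_B": ["TTC3", "IFI27"],
    "series_C": ["TTC3", "APP", "TTC3"],
    "series_D": ["TTC3", "IFI27", "HAS1"],
}

result = recurrent_up_genes(toy_up_genes, min_count=3)
print(result)
```

With the real data, the threshold would be `min_count=4` as in the text; the toy example uses a lower threshold only because it contains four series rather than eight.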
Future perspectives
To assist clinicians and researchers, we developed the BDdb, which consists of multi-omics data and potential biomarkers of birth defects. The database will be updated regularly, in step with the frequency of publications associated with chromosomal aberrations. Aside from existing data, we will also add proteomics data to expand the repository. Moreover, other species and diseases will be added to provide more information to users. Ultimately, we hope that the BDdb, serving as an auxiliary tool, can provide clues for studies on birth defects and accelerate research progress.