Identification of genes and miRNA associated with idiopathic recurrent pregnancy loss: an exploratory data mining study

Background Recurrent pregnancy loss (RPL) is a significant adverse pregnancy complication, with an incompletely understood pathology. While many entities were proposed to elucidate the pathogenic basis of RPL, only few were significant enough to warrant investigation in all affected couples.. The aim of this study was to provide novel insights into the biological characteristics and related pathways of differentially expressed miRNA (DEMs) and genes (DEGs), in RPL, and construct a molecular miRNAs–mRNAs network. Methods miRNAs and gene expression data were collected, and a number of DEMs and (DEGs) were obtained, and regulatory co-expression network were constructed. Function and enrichment analyses of DEMs were conducted using DIANA-miRPath. DEGs were screened, and were used in generation of protein-protein interaction (PPI) network, using STRING online database. Modularity analysis, and pathway identification operations were used in identifying graph clusters and associated pathways. DEGs were also used for further gene ontology (GO) analysis, followed by analysis of KEGG pathway. Results A total of 34 DEMs were identified, and were found to be highly enriched in TGF-β signaling pathway, Fatty acid metabolism and TNF signaling pathway. Hub miRNAs were selected and were found to be involved in several functional pathways including progesterone-mediated oocyte maturation and Thyroid hormone signaling pathway. Five dysregulated feedback loops involving miRNA and TFs were identified and characterized. Most notably, PPI network analysis identified hub-bottleneck protein panel. These appear to offer potential candidate biomarker pattern for RPL diagnosis and treatment. Conclusions The present study provides novel insights into the molecular mechanisms underlying RPL.

endocrine, immunologic, infectious, and genetic factors [3]. Whereas increased RPL susceptibility was linked with carriage of select gene variants, few of these at-risk variants was confirmed to contribute to RPL pathogenesis.
MicroRNAs (miRNA) are short (22-23 nucleotides) non-coding RNA, which regulate post-transcriptional activities [4], reportedly regulating~60% of all proteincoding human genes, and are involved in diverse physiological processes and pathological states [5]. Their activity is attributed to their gene silencing capacity, by which miRNA binds > 100 target mRNA sites with partial base complementarity, thus preventing de novo translation, and/or accelerating mRNA degradation. Collectively, this constitutes an important regulatory feature of the transcriptome [6]. miRNAs are co-expressed with their host genes [7], and thus influence downstream signaling events [8]. Dysregulated miRNA expression was associated with physiological and pathological processes, including cellular differentiation, angiogenesis, apoptosis, and embryogenesis [9,10].
There is growing interest in the contribution of geneenvironmental interaction in RPL, and miRNA were proposed to constitute RPL at-risk factors [11,12]. Given their diversity in regulating gene expression, the exact scope of action and influence of co-regulators and associated factors on miRNA activity remained poorly understood. Network modelling was proposed as key for deciphering miRNA activity. While this was described for number of pathological processes, miRNA network analysis in RPL pathogenesis was not previously reported.
The aims of the current study were to integrate the independent and different datasets in the analysis, thus overcoming the limitations caused by sample size and bias of statistical method. By summarizing large-scale gene expression data into limited modules, the study also aims to simplify data complexity, thus clarifying systematically the pathogenic mechanisms underlying RPL. Using direct association rationale, we provide for organization or connection information within module miRNA-genes. This may underscore the contribution of unannotated miRNA and genes to RPL pathogenesis.

Data collection and preprocessing
In the present study, the differentially expressed genes (DEGs) included genetically mutated genes, abnormally expressed protein genes, copy number variants, genes that alter DNA methylation, single nucleotide polymorphisms (SNPs), downregulated and upregulated genes. DEGs in the RPL were extracted from multiple databases (Additional file 1: Table S1). These databases include Online Mendelian Inheritance in Man (OMIM: catalog of human genes and genetic disorders that is freely available and updated daily) [13], GeneCards [14], Orphanet [15], Genetic Association Database (GAD, database of genetic association data from complex diseases and disorders) [16], and HuGE Navigator (provides a diseasecentered view of genetic association studies and information about genes explored in relation to a unique disease) [17]. We conducted a systematic review aimed at identifying genes with differential expression in RPL patients and tissues, using a combination of the key terms recurrent pregnancy loss, habitual abortion, idiopathic RPL, spontaneous recurrent miscarriage, associated polymorphisms, gene mutations, and gene expression profiling.
A list of genes associated with RPL included genes associated with cell adhesion (trophoblast/endometrium interaction), dysregulated immunity, coagulation and angiogenesis, and cell survival were also collected from literature search (Additional file 1: Table S1). P-match tool which is closely interconnected with the TRAN SFAC® database, was utilized in identifying DNA transcription factor binding sites (TFBS) [18]. We focused only on TFs obtained using TransmiR. Prior to P-match, 1000 nt promoter sequences of differentially expressed miRNAs were downloaded from Ensembl [19], using Regulatory Sequence Analysis Tools (RSAT) [20]. Pmatch matrix library comprises known TFBSs extracted from TRANSFAC [21], which allows searching of distinct TFBS. The "high quality vertebrate matrices only" option was selected as default, to reduce false-positive validation using P-match.

Differentially expressed miRNAs in RPL
Differentially expressed miRNA (DEM) linked with RPL were extracted from miR2Disease [22], Pheno-miR [23], and the Human microRNA Disease Database (HMDD) [24], using the following keywords: recurrent pregnancy loss, idiopathic RPL, spontaneous recurrent miscarriage AND microRNA (Additional file 1: Table S1). Additionally, we performed a secondary research based on systematic literature review of articles published between 2010 and 2017, and re-analyzed experimentally validated human miRNAs expression signatures in RPL from various biological sources (peripheral blood, and cell lines e.g., extravillous cytotrophoblast-derived cell line HTR8/SVneo (HTR8)) and information on aberrantly regulated miRNAs in patients with RPL as compared to healthy individuals. Our total search revealed 72 DEMs in RPL, as compared to healthy controls. Conflicting results have been found regarding expression levels of miRNAs. In order to increase the accuracy of our findings, only 37 of the most frequent DEM were collected (Table 1).

Experimentally validated miRNA associations
Experimentally validated human miRNAs and miRNAtarget interaction (MTI) datasets were extracted from miRtarbase and miRecords [25], and were validated by reporter assays, Western blotting, or microarrays with miRNA over-expression or knockdown [26]. Individual TFs were mapped to human TF list in ChIPBase, to distinguish among target genes [27]. Experimentally-confirmed TF-miRNAs regulations were extracted from TransmiR for assessing TF-miRNAs interplay [28]. This included information about miRNAs function, disease associations, and type of regulation (activation/repression). This association mapping between miRNAs and their host genes was done using NCBI, and miRBase, the latter containing information on 38,589 mRNA by March 2018 [29].

Construction of differentially expressed and transcriptional networks
A global network was constructed using CyTargetlinker [30], following extraction of regulatory associations between TFs, miRNAs, target genes and host genes. BridgeDb identifier mapping framework [31] was employed in standardizing miRNAs and genes designation, while Cytoscape was used in constructing differentially expressed networks (Additional file 1: Table S1). Lines between differentially-expressed nodal factors signify the interaction between different TFs, genes, and miRNAs in RPL; single nodes lacking regulatory miRNA association wereexcluded.

Protein-protein interaction (PPI) network
Protein-protein interaction (PPI) network, which identifies key elements in RPL based on interaction level, was acquired from STRING (Search Tool for Retrieval of Interacting Genes) database. DEG were mapped to STRING to evaluate PPI information, and visualized with Cytoscape. In addition, clusters of network highly intra-connected nodes were searched by MCODE (Molecular Complex Detection) Cytoscape network plug-in (Additional file 1: Table S1) [32].

Functional enrichment analysis
Gene ontology (GO) functional enrichment analysis was done using BiNGO (Biological Networks Gene Ontology) [33]. P values were correct for false discovery rate (FDR) using Benjamini-Hochberg multiple testing correction. Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis of differentially expressed miRNA was performed by DNA Intelligent Analysis (DIANA)-miRPath v2.0 (Additional file 1: Table  S1). Following DEMs inclusion miRNAs targets predicted with DIANA-microT-CDS and/or experimentally validated transcripts from TarBase v6.0 were selected; Pvalue for each pathway was obtained by Fisher's method [34]. The combinatorial effect of co-expressed miRNAs was evaluated by simultaneous analysis of multiple miR-NAs; default settings were score cutoff of 0.8 for target prediction of 350 mRNA targets per miRNA, FDR to correct multiple hypothesis testing, and P-value of 0.05.

Results
The occurrence and development of RPL are a complex process that involves genetic and epigenetic disorders. In the current study, a system-based networks' approach was used to examine the key regulatory associations between miRNAs, their target genes, their host genes, and TFs in RPL process. Experimentally validated data from databases and literature was collected and networks were constructed. Due to their complexity, data associated with this network does not provide for a clear description of the findings. Accordingly, we focused on analysis of differentially expressed network, RPL transcriptional network, and PPI network.

Differentially expressed miRNA and gene network of RPL
A regulatory network for up-and down-regulated miR-NAs and genes, was constructed for interpreting their differential expression in RPL at the transcriptional level ( Fig. 1). This network, comprising 133 nodes/206 edges, consisted of 44 TFs, 36 target genes, 37 miRNAs, and associated host genes (Additional file 2: Table S2). Except for host genes, all other nodes were differentially expressed. The network nodes, directly or indirectly, affect other nodes. When combined, the interactions reveal the primary cause of RPL. The unaltered genes and miRNAs suggest that they are not related to RPL, they might not be actually protective. This suggests that RPL may be prevented by targeting these differentially expressed elements. miRNAs with high connectivity and module membership within a module were defined as "hub miRNAs", and were assumed to be functionally relevant. The top-10 hub miRNAs with the highest connectivity and module membership, were (in order): hsa-miR-21-5p, hsa-miR-155-5p, hsa-miR-16-5p, hsa-miR-17-5p, hsa-miR-146a-5p, hsa-miR-92a-3p, hsa-miR-145-5p, hsa-miR-19b-3p, hsa-miR-221-3p and hsa-miR-101-3p (Additional file 2: Table S2). These were selected based on the module sizes as hub miRNAs for further study.
As individual miRNA regulates gene expression, and multiple miRNAs modulate specific pathways, we explored the pathways regulated by hub miRNAs in RPL. Following inclusion of upregulated miRNAs, DIANA-miRPath identified 60 pathways as significantly enriched (P < 0.05). For simplicity and specificity, we focused only on pathways linked to RPL, the top 30 of which are shown in Table 2 (Additional file 2: Table S2). The most significantly enriched pathways regulated by hub miR-NAs were TGF-β signaling, cell cycle, adherens junction, fatty acid elongation, progesterone-mediated oocyte maturation, thyroid hormone signaling, and MAPK signaling pathways, which in turn target NF-κB1, MYC and E2F1 hub genes. Interestingly, the estrogen signaling pathway, which modulates reproductive functions, including progesterone production, utero placental blood flow, female secondary sexual characteristics, and maintenance of pregnancy, is also regulated by the selected hub miRNAs. Alteration in this pathway represents a cause of implantation failure and recurrent miscarriage. Significant enrichment was seen in 15 pathways influenced by hsa-miR-21-5p, hsa-miR-17-5p, and hsa-miR-19b-3p (Table 2).

Proposed transcriptional network of predicted TFs
We predicted that DEM may contribute to targeting TF, since differentially expressed genes affect nucleic acid binding, and thus TF protein-binding activities. This allows assessment of the regulatory relationship between DEM and key TF genes. Analysis with predicted (Pmatch) and validated (transmiR) databases provided for construction of complex network, in which motifs between DEM and popular TFs were the building blocks (Fig. 2). A total of 44 TFs co-regulate 37 DEMs. Most proteins in this network are TFs that affect diverse cellular processes, such as angiogenesis, DNA damage response, development, morphogenesis, differentiation, and survival. Deregulated feedback in STAT3, NF-κB1, ESR1, MYC and E2F1 5 loops were also identified in this network.
Specific TF can cooperate with other TF in regulating the same miRNA, hence influencing the activity of the same target genes, exemplified by the regulation of hsa-miR-17-5p by E2F1 and NFKB1, subsequently targets MMP2, VEGFA, IL-8, and E2F1. These findings may be explained by temporary, but spatial interactions among mRNAs and miRNAs. TF, via miRNA, may indirectly influence the activity of other genes. This includes REL which regulates hsa-miR-21-5p, which targets STAT3, E2F1, NFKB1, MMP-2 and VEGFA. The TF-DEM network demonstrates that miRNA target TF sets, which physically interact with each other to silence the functional unit. The existence of a direct negative feedback loop between miRNA and TF highlights the limitation of functional units, as illustrated for NF-KB1, the limiting factor in the transcriptional regulation protein complex. It is noteworthy that miRNA and TF have similar regulatory nature, such as pleiotropy, regulation, and network motifs, which yield specific interactions that are propagated throughout the network. While not abnormally expressed, a number of the network host genes and associated miRNAs appear to be involved in RPL. Accordingly, a single miRNA is depicted to be localized within multiple host genes, exemplified by the presence of hsa-miR-101-3p in RCL1 and within LOC100129106, and the localization of hsa-miR-149-5p in GPC1 and PP14571. This provides for exploring more in depth the possible relationships between host genes and associated miRNA, especially when their miRNAs are differentially expressed.  Analysis of PPI network of the significantly RPLassociated proteins PPI networks of the significantly RPL-associated proteins, identified 77 nodes and 763 edges (Fig. 3). In addition, the power law of node degree distribution and centrality distribution were also performed (Fig. 4). Insofar as PPI networks are scale-free networks with power-law degree distribution, most nodes (proteins) have few connections to other node, whereas other nodes (specifically hubs) are connected to other nodes in the network. Common proteins (13 proteins), selected in hub and bottleneck states, and represented as hub-bottleneck proteins (Fig. 5), included EGFR, IL8, TP53, VEGFA, STAT3, TGFB1, CCND1, CTNNB1, CASP3, IL1B, MYC, ESR1 and NF-κB1. Protein complexes (clusters) were identified by MCODE algorithm, and the global network was partitioned into three clusters, evaluated according to functionality (Table 3); function annotation performed using   (Table 4). Cluster 1 focused on regulation of cell proliferation, positive regulation of biological and developmental processes. Cluster 2 proteins were related to negative regulation of striated muscle tissue and muscle organ development. On the other hand, Cluster 3 members dealt with positive regulation of transcription from RNA polymerase II promoter, and DNA-dependent and positive regulation of RNA metabolic process.

Discussion
Bioinformatics analysis was utilized in identifying key miRNAs and genes to RPL. While earlier studies focused mostly on specific genes or metabolic pathways, our study included plethora of data generated to elucidate the role of interacting elements, and to understand their functional contribution in RPL. Our results identified select miRNA and TFs, and mapped their interaction and regulation for experimental validation in RPL. This study is the first to use bioinformatics in identifying and characterizing miRNA and gene expression profile in RPL. It is also the first to analyze the interaction among gene products, and to address the potential functionality of these genes in RPL.
The use of "omics" for identifying genes and effector mechanisms in RPL has challenges, mainly selection of study subjects (female patients, couple with miscarriages, fetus/placenta, and controls). The rely on genetic association studies focusing on candidate genes with pathological effect in RPL has limitations. While polymorphic variants of∼100 genes were investigated for possible association with RPL, results obtained were often inconclusive, and their diagnostic and prognostic utility remains questionable. Future directions in investigating biomolecular risk factors for RPL needs to integrate high-throughput profiling of genomes, transcriptomes, proteomes, metabolomes, and interactomes.
Upregulated hsa-miR-16-5p targeted 31 genes in TGF-β signaling, and was involved in > 5 pathways, including adherens junction pathway. Differently expressed miRNAs affecting Wnt signaling, cell cycle, and adhesion molecules, were linked with defective embryo implantation in RPL [37]. Other pathways included MAPK pathway, which contributes to maintenance of normal pregnancy, and alteration in this pathway precipitates RPL [28]. MAPK pathway was influenced by hsa-miR-16-5p and p53 pathways, and modulated by hsa-miR-16-5p and hsa-miR-155-5p. In addition, altered p53 signaling pathway, involved in apoptosis and cell cycle regulation [38], results in enhanced placental apoptosis, and RPL through upregulation of hsa-miR-155-5p [39]. As overexpression of hsa-miR-155-5p enhances, while its antagonism impairs NK cell-mediated cytotoxic activity, this suggesting targeting hsa-miR-155 in managing RPL [40]. Similarly, 15 pathways affected by downregulated miRNAs were noted, including thyroid hormone signaling and prolactin signaling pathways. Altered thyroid hormone level can lead to abnormal sexual development, menstruation irregularity, and likely RPL, while hyperprolactinemia affects hypothalamicpituitary-ovarian axis, resulting in a shorter luteal phase [41]. Regulation of transcriptional and post-transcriptional events were obtained by constructing curated miRNA-TF regulatory network, the smaller significant subnetwork modules centering on NF-κB1 gene was key. Direct TF-miRNA feedback loops identified two NF-κB1 loops involving hsa-miR-155-5p and hsa-miR-146a-5p, and a third involving NF-κB1 and hsa-miR-21-5p. NF-κB1 regulates innate and adaptive immunity, and induces inflammation by stimulating the expression of pro-inflammatory genes, especially IL-6 and IL-8 [42]. Similarly, hsa-miR-146a-5p negatively regulates the expression of IL-6 and IL-8 [43]. Moreover, NF-κB1 activity is regulated by the ubiquitin-protein ligase pellino homolog 1, the expression of which is negatively regulated by hsa-miR-21-5p and thus is inhibitory of NF-κB1 [44]. On the other hand, hsa-miR-21-5p was shown to be activated by IL-6 in a STAT3-dependent manner [45]. A strong link between miR-146a-5p and hsa-miR-21-5p, and with NF-κB1 was seen, predicting that NF-κB1 interaction with hsa-miR-146a-5p affects IL-8 expression, suggesting feedback loops involving NF-κB1 and hsa-miR-146a-5p or hsa-miR-21-5p for controlling  RPL-associated inflammation. Earlier studies documented negative feedback comprising hsa-miR-21-5p with NF-κB1, which was linked with altered NF-κB1 and IL-6 activity, and additional negative feedback loop provided by hsa-miR-146 [46]. Accordingly, differences in the strength of the two loops provides for the needed signals for RPL development. By providing target-specific post-transcriptional repression mechanisms, miRNA-TF network underscores the complexity of TF-miRNA motifs. Compared to TF, miRNA can quickly terminate, or alternatively resume protein translation by binding to, or disassociating from an already transcribed mRNA. This provides for rapid and adaptive changes in gene expression. Although they are subject to differential expression, the exact role of miRNA in RPL remains to be seen. The proposed transcriptional network could explain this involvement. For example, while NF-κB1-hsa-miR-146a-5p loops underscore miRNA effect in sustaining normal immune homeostasis, abnormal hsa-miR-146a-5p expression can drive inflammation; a hypothesis that could not have been made otherwise.
PPI network analysis allows for discrimination of key nodes, and is useful as network biomarker discovery in RPL. Based on their significance in the network, PPI network of 77 RPL-associated proteins were constructed as predictors of drug targets. Crucial nodes which interacted with and controlled the expression of other network nodes, were introduced as essential in RPL PPI network. Localized in Cluster 1, 13 key nodes were found to be related to RPL, individually or in combination, of which CTNNB1 appears the most significant. Mothers against decapentaplegic homolog 4 (SMAD4) protein was also found to be a crucial node in the PPI network, and was also found as a seed protein in Cluster 3. SMAD4, previously reported as key component in TGF-β signaling [47], are intracellular transducers of TGF-β superfamily, and PSG9/SMAD4 complex recruite cytoplasmic SMAD2/3 forming a complex, which enhanced SMAD4 nuclear retention [48]. PSG9-SMAD4 complex was shown to activate the expression of angiogenesis-related genes, including VEGFA, thus contributing to the endometrial angiogenesis affecting both conception and fetal development [49].
STAT3 is a crucial hub-bottleneck protein involved in cytokine and growth factor signaling, and is a key regulator of anti-inflammatory signaling pathway. Altered STAT activity was implicated in adverse pregnancy outcomes, and a 3′-UTR STAT3 variant contributes to RPL by precipitating a local inflammatory state [50]. Since endometrium signaling involves STAT pathway [50], defective STAT signaling due to attenuated endometrial STAT3-associated tyrosine kinase activity [51,52], or altered availability of cytokine/growth factor-driven receptor engagement was seen in RPL, suggesting a role in unexplained infertility. CTNNB1, identified as the seed node in Cluster 1, is present in 21 KEGG pathways, including Wnt signaling pathway, suggest that altered CTNNB1 expression contributed to RPL pathogenesis. With regards to apoptosis-related genes and pathways, several evidences confirmed apoptosis of trophoblast cells in early pregnancy [53]. Implantation and growth of blastocyst, regression and reconstruction of decidual tissues, remodeling of placental structure and other processes are closely related to apoptosis, were also present [53]. A balance between apoptosis and proliferation of villous and decidual cells during pregnancy and adverse pregnancy outcome, including RPL, are commonly associated with the excessive apoptotic cells.

Conclusions
Using bioinformatics, results of the present study provides for theoretical framework toward personalized RPL therapy. Hub-bottleneck proteins, had critical role and high importance in physiopathology of RPL, can serve a diagnostic, even prognostic, role in RPL pathobiology. It is tempting to speculate that several of the identified miRNAs and proteins can potentially serve as diagnostic or therapeutic agents. To this end, prioritization of these targets, and validation of the results using direct cellular analysis, and where possible examination of the impact of the blockade of their expression and activity will be needed for further examination. We favor the notion that drugs designed against these proteins can be important in controlling and managing of RPL.