Identification of key genes and miRNAs related to polycystic ovary syndrome by comprehensive analysis of microarray

Background We aimed to explore mechanisms of development and progression of polycystic ovary syndrome (PCOS). Methods The microRNA expression microarray GSE37914 and gene expression profiles GSE43264 and GSE98421 were downloaded from the Gene Expression Omnibus database. The differentially expressed miRNAs (DEmiRNAs) and genes (DEGs) were screened using Limma package. Then, the DEGs and DEmiRNAs were combined to use for the subsequent analysis, including the functional enrichment analysis, protein–protein interaction (PPI) network and module analysis, drug–gene interaction network analysis, and DEmiRNAs–DEGs interactive network construction. Results A total of 26 DEmiRNAs and 80 DEGs were screened. The PPI network contained 68 nodes and 259 interactions. A significant clustering module with 8 nodes and 25 interactions was obtained. Three PCOS-related overlapping pathways were obtained based on PPI-degree top10 and module genes, including prion diseases, Staphylococcus aureus infection, and Chagas disease (American trypanosomiasis). A total of 44 drug–gene interaction pairs were obtained, which included 2 up-regulated genes (LDLR and VCAM1), 4 down-regulated genes (C1QA, C1QB, IL6 and ACAN) and 26 small molecules drugs. A total of 52 nodes and 57 interactions were obtained in the DEmiRNA–DEGs regulatory network, LDLR was regulated by miR-152-3p, miR-1207-5p, miR-378a-5p and miR-150-5p. Conclusions Our research has identified several key genes and pathways related to PCOS. These results can improve our understanding of PCOS and provide new basis for drug target research.


Background
Polycystic ovary syndrome (PCOS) is a common endocrinological disorder disease in women, which affected 4-10% women worldwide, and increased the risk of reproductive abnormalities [1,2]. The syndrome is related to numerous morbidities, such as obstetrical complications, infertility, cardiovascular disease, and type 2 diabetes mellitus [3,4]. Many compensatory hyperinsulinemia can lead to hyperandrogenemia by stimulating ovarian androgen secretion and inhibiting the production of hepatic sex hormone binding globulin [5]. Genetic and environmental factors also play an important role in the progression of PCOS [6,7]. Adipose tissue dysfunction is one of the causes of insulin resistance in PCOS. Obesity dependent insulin resistance is associated with PCOS. The prevalence of obesity in women with PCOS is higher than that in women of the Open Access *Correspondence: liuli304@jlu.edu.cn Sun et al. BMC Medical Genomics (2022) 15:267 same age without this syndrome, but the risk of PCOS only slightly increases with obesity [8]. In addition, visceral adiposity alone could not account for differences in insulin sensitivity between women with PCOS and those without the syndrome, though the problem is controversial [9]. The potential mechanism of insulin resistance in polycystic ovary syndrome is still unclear. Insulin resistance in patients with PCOS may be due to impaired local adipose tissue storage ability resulting in dyslipidemia, which leads to improved dietary caloric intake [10,11]. The parasecretory imbalance of cytokines secreted by macrophages in patients with polycystic ovary syndrome is conducive to the occurrence of insulin resistance [12].
Understanding the underlying mechanisms of adipose tissue dysfunction and insulin resistance in PCOS may eventually find new therapeutic targets for PCOS. MicroRNAs (miRNAs) are a class of single-stranded, small noncoding RNAs of approximately 22 nucleotides in length that are involved in the processes of biological growth, development, differentiation, and disease. Most human protein-coding genes are regulated by miR-NAs, which are involved in the occurrence, development and almost all life activities of diseases. In recent years, studies have shown that miRNAs are closely related to ovarian physiology and pathology, and the relationship between miRNAs and PCOS was studied by screening the differential expression of miRNAs in ovarian tissues [13]. However, miRNAs as biomarkers for PCOS remain to be further investigated.
In this study, the differentially expressed miRNAs (DEmiRNAs) from GSE37914 and the differentially expressed genes (DEGs) from GSE43264 and GSE98421 were screened first. Subsequently, integrated bioinformatics analyses, including the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis, the network analysis of protein-protein interaction (PPI), drug-gene interaction network and miRNAs-target genes network were performed. This research may help to explore the mechanism of PCOS and screen candidate biomarkers and treatment targets.

Microarray data and data preprocessing
PCOS, which is diagnosed as a diagnosis of exclusion. The most commonly used diagnostic criteria are the Rotterdam criteria proposed in 2003, which are as follows: 1. Rare ovulation or anovulation; 2. Clinical manifestations of hyperandrogenism and hyperandrogenism; 3. Changes in polycystic ovaries, as indicated by ultrasound One or both ovaries, more than 12 follicles with a diameter of 2-9 mm, and ovarian volume greater than 10 mL; 4. Two of the three items are met, and other causes of hyperandrogenism, congenital adrenal hyperplasia and Cushing's syndrome are excluded, androgen-secreting tumors, you can diagnose PCOS. The miRNA expression microarray data set GSE37914 and gene expression profiles GSE43264 and GPL15362 were obtained from Gene Expression Omnibus (GEO). The GSE37914 contained 6 subcutaneous adipose tissue samples, 3 patients with the lean PCOS and 3 matched control women, and the platform used was GPL8786[miRNA-1] Affymetrix Multispecies miRNA-1 Array. GSE43264 contained 15 subcutaneous adipose tissue samples, 8 women with PCOS and 7 matched healthy controls, and the platform used was GPL15362 NuGO array (human) NuGO_Hs1a520180 [CDF: Hs_ENTREZG_14]. The GSE98421 GSE98421 contained 8 subcutaneous adipose tissue samples from PCOS (n = 4) and NL (n = 4), and the platform used was GPL570 [HG-U133_Plus_2] Affymetrix Human Genome U133 Plus 2.0 Array. The three dataset were preprocessed, which included background correction, quantile normalization and probe summarization, using Oligo package (version 1.34.0) in R language [14] and limma (Version 3.10.3) package [15].

Identification of DEGs and pathway analysis
The three normalized data were calculated by limma (Version 3.10.3) package. Genes with P < 0.05 and |logFC (fold change)|> 1 were regarded as DEGs between the PCOS group and the normal group, and miRNAs with P < 0.05 and |logFC (fold change)|> 1 were considered as DEmiRNAs. The heatmap of DEGs and DEmiRNAs were carried out using pheatmap package (version 1.0.10) in R language. The DEGs of GSE43264 and GSE98421 were combined to use for the subsequent analysis.
Pathway enrichment analysis [16] of DEGs and the subsequent PPI degree top10 was performed using the ClusterProfiler package (Version 2.4.3) in R [17], and the significant enrichment pathways with the cutoff of P < 0.05 and the count > 2. In addition, the pathways correlated with PCOS and Obesity were searched using the CTD (Comparative Toxicology Database) [18], and the search pages of PCOS and Obesity were http:// ctdba se. org/ detail. go? type= disea se& acc= MESH% 3aD01 1085& view= pathw ay and http:// ctdba se. org/ detail. go? type= disea se& acc= MESH% 3aD00 9765& view= pathw ay, respectively.

Analysis of PPI network
String database (Version: 11.0) [19]was used to predict and analyze the interaction between proteins encoded by genes, and the DEGs in the PPI network was established through the String and cyberscape software. In order to obtain as many relationship pairs as possible, the parameter PPI score is set as 0.15 (representing low confidence index). By analyzing the topological properties of network nodes, the gene with higher degree value is obtained, namely hub gene. The plug-in Mcode [20] (Version 1.4.2) of cytoscape was used to analyzed the most remarkable clustering module in PPI network and the threshold score ≥ 5 was selected in this study.

Drug-gene interaction network
The Drug Gene Interaction Database (DGIdb, http:// www. dgidb. org/), is a web resources that could be applied for searching candidate drugs or genes against the known and potentially druggable genome. We used DGIDB 2.0 [21] to predict the drug-gene relationship pairs of the top ten genes from PPI network, and the network of druggene relationship pairs was visualized using Cytoscape (version 3.2.0, http:// www. cytos cape. org/).

Statistical analysis
Data were analyzed by SPSS 21.0 (SPSS Inc., Chicago, IL, USA) and expressed as mean ± standard division. The continuous variables were compared using unpaired Student's t tests, and chi-square tests were used for nominal parameters. P value < 0.05 was considered statistically significant.

Identification of DEGs and DEmiRNAs
A total of 26 DEmiRNAs in the GSE37914 was identified, including 9 up-regulated and 17 down-regulated DEmiRNAs, and the distribution of DEmiRNAs was shown in Fig. 1A. A total of 45 and 26 DEGs were obtained in the GSE43264 and GSE98421 datasets, respectively. The heatmap of DEGs was shown in Fig. 1B and C. We merged the DEGs of the two datasets GSE43264 and GSE98421, and obtained a total of 80 DEGs, including 34 up-regulated and 46 down-regulated DEGs (KRT14 is the only overlapped down-regulated gene). The 80 DEGs were used for the subsequent analysis.
KEGG pathway analysis was performed to further explore the functions of the PPI-degree top 10 hub genes and module genes respectively. Totally, 14 pathways were obtained by PPI-degree TOP10 genes and 6 by module genes ( Table 3). The search results in CTD database implied that the all pathways were correlated with Obesity, and the 8 of 20 pathways were related to PCOS, including prion diseases, chagas disease (American trypanosomiasis), staphylococcus aureus infection, AGE-RAGE signaling pathway in diabetic complications, TNF signaling pathway, measles, hepatitis C, and

Drug-gene interaction analysis
Based on the DGIdb prediction results of PPI-degree top 10 and module genes, a total of 44 drug-gene interaction pairs were obtained. In addition, 26 small molecules drugs were predicted to be implicated with 2 up-regulated genes (LDLR and VCAM1) and 4 down-regulated genes (C1QA, C1QB, IL6 and ACAN) using DGIdb (Fig. 3).

Discussion
PCOS is a common endocrine disorder in women with an increasing risk of reproductive abnormalities. In the current study, a total of 26 DEmiRNAs were identified in the GSE37914, and 80 DEGs were selected from GSE43264 and GSE98421. Prion diseases, Staphylococcus aureus infection, and Chagas disease (American trypanosomiasis) were overlapping pathways in the PPI-degree TOP10 genes and module genes, the enrichment of these pathways may be due to the immune process involved. 6 hub genes were identified by drug-gene interaction analysis, which were LDLR, VCAM1, C1QA, C1QB, IL6 and  ACAN. In the regulatory network of DEmiRNAs and DEGs, LDLR was regulated by miR-1207-5p, miR-152-3p, miR-378a-5p and miR-150-5p. PCOS is characterized by obesity, hypertension, insulin resistance, diabetes mellitus and other metabolic diseases. The serum IL-6 level and the secretion of LPS activated monocytes are increased in insulin resistant PCOS patients [23]. In addition, the increase of IL-6 level in obese women with PCOS, which is not associated with obesity, may be related to insulin resistance [24]. VCAM-1 aggravates PCOS symptoms by improving leukocyte recruitment to the ovary and sustaining local inflammation [25]. Low density lipoprotein receptor (LDLR) has been proved to play an important role in lipoprotein metabolism. The number of follicles in LDLR − / − mice was less, the follicular atresia was more, the estrogen level was lower, and the estrus time was significantly shorter than that of the control group [26]. The previously reports suggested that LDLR played a central role in uptake and clearance of LDL cholesterol. Expression levels of LDLR mRNAs were markedly expressed in the PCOS NAFLD group when contrasted with the non-PCOS NAFLD group [27]. Complement component 1q (C1q) is the starting protein of the classical complement pathway. Complement C1q A chain (C1QA) and complement C1q B chain (C1QB) encode the a chain and b chain polypeptides of serum complement subcomponent C1q, respectively. The results of Katherine's research showed that the classical complement pathway was activated in the kidneys of diabetic rats, and the C1QA and C1QB were significantly high expression [28]. C1QA and C1QB interact with a variety of small drug molecules and may be potential targets for drug development.
In addition, the network analysis of miRNAs-target genes network to explore mechanisms of development and progression of PCOS. Plasma levels of miR-152-3p were related to diabetic nephropathy [29]. The reports implied that hsa-miR-1207-5p had a important role in many diseases, such as monogenic renal disorder [30] and serous ovarian carcinoma [31]. Some studies have shown that miR-378a is an important regulator of energy and glucose homeostasis and a potential target for improving metabolic disorders [32]. The results of Kang et al. indicate that MIR-150 may be a biomarker and new therapeutic target for obesity and insulin resistance [33].