DIA proteomics analysis through serum profiles reveals the significant proteins as candidate biomarkers in women with PCOS

Background The aim of this study was to apply proteomic methodology to analyze proteome changes in women with polycystic ovary syndrome (PCOS). Material and methods A total of 31 PCOS patients and 31 healthy women as controls were recruited; clinical characteristics were recorded at the time of recruitment, and laboratory biochemical data were measured. A data-independent acquisition (DIA)-based proteomics method was then used to compare serum protein changes between PCOS patients and controls. In addition, Western blotting was used to validate the expression of the identified proteomic biomarkers. Results A total of 80 proteins were significantly differentially expressed between PCOS patients and controls, including 54 downregulated and 26 upregulated proteins. Gene ontology and Kyoto Encyclopedia of Genes and Genomes analysis showed that downregulated proteins were enriched in platelet degranulation, cell adhesion, cell activation, blood coagulation, hemostasis, defense response and inflammatory response terms; upregulated proteins were enriched in cofactor catabolic process, hydrogen peroxide catabolic process, antioxidant activity, cellular oxidant detoxification, cellular detoxification, antibiotic catabolic process and hydrogen peroxide metabolic process. Receiver operating characteristic curve analysis showed that the areas under the curve of Histone H4 (H4), Histone H2A (H2A) and Trem-like transcript 1 protein (TLT-1) were all greater than 0.9, indicating promising diagnostic value for these proteins. Western blotting confirmed that the selected significant proteins, including H4, H2A, TLT-1, Peroxiredoxin-1 and Band 3 anion transport protein, were all significantly differentially expressed between the PCOS and control groups.
Conclusion These proteomic biomarkers may help us understand PCOS better, but future studies examining the systemic expression and exact role of these candidate biomarkers in PCOS are essential to confirm this hypothesis.


Introduction
Polycystic ovary syndrome (PCOS) is one of the most common endocrine and metabolic disorders; worldwide, 5-15% of women of reproductive age live with a PCOS diagnosis. The etiopathogenesis of PCOS is complex: interactions among genetic, environmental and lifestyle factors contribute to its etiology. Hyperandrogenism and insulin resistance (IR) are the major characteristics of PCOS [1]. Beyond these, the clinical features of PCOS usually include irregular menstrual cycles, ovarian abnormalities, follicular dysplasia (with multiple cystic ovarian follicles), etc. [2]. However, the diagnosis of PCOS is difficult because symptoms vary among patients. In addition to the physical harm it causes in women of reproductive age, PCOS also has negative effects on quality of life and mental health. In women with PCOS, the long-term risks of endometrial carcinoma, diabetes mellitus, and cardiovascular diseases also seem to be higher than in the general population [3,4]. Population-based studies show that PCOS and the infertility it causes are associated with lower life satisfaction, poor health status and increased psychological distress, including depression, anxiety and perceived stress [5,6].
Proteomics is an emerging tool that involves the comprehensive qualitative and quantitative profiling of the proteins present in tested samples [7]. It can systematically characterize large-scale dynamic changes in protein expression, which provides basic information for the study of complex diseases. It thus has the capacity to unravel new mechanistic explanations and offers a richer source of potential diagnostic biomarkers associated with complex metabolic disorders. In the past few years, substantial efforts have been made to study the pathogenesis of PCOS via proteomic approaches, but challenges still exist [8]. The identification of novel proteins in PCOS is of great interest for developing more precise diagnostic strategies and new therapeutic targets. Data-independent acquisition (DIA), which offers high reproducibility and high throughput, is a powerful technique in proteomics studies. In DIA, the mass spectrometer systematically acquires MS/MS spectra regardless of whether a precursor signal is detected, in contrast to data-dependent acquisition (DDA) [9].
In this study, DIA proteomic techniques were employed to explore a broad spectrum of functional proteins in PCOS patients and controls. The aim was to use proteomic methodologies to identify biomarkers over- or under-expressed in women with PCOS compared with controls, and thereby help us understand PCOS better.

Reagents
UltraPure™ Tris Hydrochloride (Tris-HCl) (Invitrogen, CA, USA), Ammonium bicarbonate (NH4
The inclusion criteria for PCOS cases were: adolescent females diagnosed with PCOS who had at least 2 years of menstrual history. To exclude the luteal phase, the progesterone (P) level had to be less than 3.82. PCOS was diagnosed based on the recent androgen excess and Rotterdam criteria, 2003: clinical or biochemical hyperandrogenism and ovarian dysfunction, i.e. oligomenorrhea (menstrual cycle of more than 45 days) and/or polycystic ovaries on ultrasound (ovarian volume > 10 mL in at least one ovary) [10]. Patients with other disorders with a similar presentation (hyperprolactinemia, thyroid disorders, late-onset congenital adrenal hyperplasia, androgen-secreting ovarian or adrenal tumors, Cushing syndrome, or other related disorders) were excluded [11]. Healthy controls were volunteers matched to the PCOS patients by age and gender, and no evident disease was detected in them during the course of the study.
The clinical characteristics of the enrolled participants were recorded at the time of recruitment. After fasting for 8 h, a venous blood sample was collected from each participant. The serum samples were stored at -80 ℃ for subsequent assays.

Clinical data analysis
All clinical data were analyzed using SPSS version 18.0. An unpaired, two-tailed Student's t test was performed on clinical biochemical data, and the chi-square test was used to compare categorical variables. A p value < 0.05 was considered statistically significant.
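The two tests named above can be illustrated with a minimal sketch. It is not the SPSS procedure itself: the function names `student_t` and `chi_square_2x2` are hypothetical, the t test assumes equal variances (pooled), and the chi-square is the uncorrected Pearson statistic for a 2x2 table, whose p value with 1 degree of freedom has a closed form. The p value of the t statistic would still have to be read from a t distribution with the returned degrees of freedom.

```python
import math
from statistics import mean, variance


def student_t(group_a, group_b):
    """Unpaired Student's t statistic with pooled variance.

    Returns (t, degrees_of_freedom). Assumes equal variances in both groups.
    """
    na, nb = len(group_a), len(group_b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / df
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(group_a) - mean(group_b)) / se, df


def chi_square_2x2(table):
    """Pearson chi-square (no continuity correction) for a 2x2 count table.

    Returns (chi2, p). For 1 degree of freedom the survival function
    of the chi-square distribution equals erfc(sqrt(chi2 / 2)).
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Illustrative values only, not data from this study.
t_stat, dof = student_t([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
chi2, p = chi_square_2x2([[20, 11], [11, 20]])
```

A two-tailed decision at the 0.05 level then compares the p value with 0.05, as stated in the text.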

Sample processing
Samples were prepared following the manufacturer's instructions. The details were as follows.

High-abundance protein removal
All 62 samples were depleted using Top 14 high-abundance protein depletion spin columns (Thermo). Each column was placed at room temperature, 10 μl (about 600 μg) of serum sample was added, and the column was incubated for 30 min at room temperature. The supernatant was collected by centrifugation at 1,000g for 2 min; its volume was about 320 μl, in a solution of 10 mM PBS, 0.15 M NaCl, 0.02% azide, pH 7.4.

LysC/trypsin digestion
320 μl of high-abundance protein-depleted sample was added to a 10 kD ultrafiltration tube (Millipore) and centrifuged at 12,000g for 10 min; 200 μl of 8 M urea was added and the tube was centrifuged at 12,000g for 10 min; this step was repeated once, and finally 50 μl of 8 M urea was added. The sample was then transferred to a 96-well plate. 10 mM dithiothreitol (DTT) solution was added and reacted at 37 °C for 30 min; iodoacetamide (IAM) solution was added to a final concentration of 20 mM and reacted at room temperature in the dark for 30 min; DTT solution was then added to a final concentration of 10 mM to quench the reaction. 1 μg of LysC was added to each sample and incubated at 37 °C for 2 h; 1 μg of trypsin was then added and incubated at 37 °C overnight; finally, 10 μl of 10% TFA was added to terminate the reaction.

LC-MS DIA proteomics analysis
DIA proteomic analysis was performed using a Thermo U3000nano RSLC nanoLC (Thermo Fisher Scientific) coupled to an Orbitrap Fusion Lumos™ quadrupole-linear ion trap-electrostatic field Orbitrap high-resolution mass spectrometer (Thermo Fisher Scientific, USA), with XCalibur 4.3 data acquisition software (Thermo Fisher Scientific). The analysis was performed following the manufacturer's instructions, and the main parameters are as follows.

Data processing and analysis
Spectronaut X was used for DIA proteomics data analysis. Briefly, data from the 62 samples and the QC runs were imported into Spectronaut X and subjected to targeted extraction against a laboratory-built serum proteomics database; the remaining parameters were left at their defaults, controlling the FDR at < 1% at the peptide and protein levels. Protein intensity, calculated by Spectronaut as the average of the top 3 peptides, was exported and imported into Perseus and MetaboAnalyst for statistical analysis. Log2 transformation, feature filtering, missing value imputation, and data normalization of the raw data were done before statistical analysis. S0 = 0.1 and FDR-adjusted p < 0.05 were used as the thresholds for selecting differential proteins. PCA, cluster analysis, and correlation analysis were performed in MetaboAnalyst. The up-regulated and down-regulated proteins were imported into Protein Center (Thermo) for functional enrichment annotation of gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) terms with FDR-adjusted p < 0.05. Receiver operating characteristic (ROC) curve analysis was performed using SPSS software.
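As an illustration of the quantification step described above, the following minimal sketch averages the three most intense peptides of a protein and applies a log2 transformation. It is an assumption about the general idea only: the exact peptide selection rule used by Spectronaut is not given in the text, and the names `top3_protein_intensity` and `log2_intensity` are hypothetical.

```python
import math


def top3_protein_intensity(peptide_intensities):
    """Average of the (up to) three most intense peptides of one protein.

    Returns NaN when no peptide was quantified.
    """
    top = sorted(peptide_intensities, reverse=True)[:3]
    if not top:
        return float("nan")
    return sum(top) / len(top)


def log2_intensity(value):
    """Log2-transform an intensity; non-positive or missing values become NaN."""
    if value is None or not value > 0:
        return float("nan")
    return math.log2(value)


# Illustrative peptide intensities for one protein, not data from this study.
protein = top3_protein_intensity([100.0, 400.0, 200.0, 800.0])
log_protein = log2_intensity(protein)
```

Values left as NaN here correspond to the entries that the missing value step would subsequently fill.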

Western blotting validation
Based on the DIA proteomics results and the identified significant protein biomarkers, Western blotting was performed to validate the significance of the biomarkers in PCOS. Serum samples were homogenized in RIPA buffer containing protease inhibitor, and the protein concentration was measured using the BCA method. Proteins were separated by 10% sodium dodecyl sulphate-polyacrylamide gel electrophoresis (SDS-PAGE) and then electrotransferred onto nitrocellulose (NC) membranes. After blocking with non-fat milk for 1 h and washing with Tris-buffered saline containing Tween 20 (TBST), the membranes were probed with primary antibodies against Histone H4 (H4, Affinity, dilution 1:2000), Histone H2A (H2A, Affinity, dilution 1:1000), Trem-like transcript 1 protein (TLT-1, R&D systems, 0.1 µg/mL), Peroxiredoxin-1 (PRDX1, Affinity, dilution 1:2000), Band 3 anion transport protein (SLC4A1, Affinity, dilution 1:2000) and Transferrin (Affinity, dilution 1:2000) at 4 °C overnight, followed by incubation with horseradish peroxidase-conjugated secondary antibodies for 2 h at room temperature. Transferrin was used as the internal reference protein [12]. Protein bands were visualized with enhanced chemiluminescence reagents, and band intensity was quantified using ImageJ software. Data were expressed as mean ± standard deviation; group comparisons were made using the two-tailed Student's t-test, and p < 0.05 was considered statistically significant.
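The use of an internal reference can be sketched as follows, assuming that each target band intensity is divided by the transferrin band intensity from the same lane before the groups are summarized as mean ± standard deviation. The text does not spell out this arithmetic, so the function names and the numbers are hypothetical.

```python
from statistics import mean, stdev


def normalize_to_reference(target_bands, reference_bands):
    """Divide each target band intensity by the reference band of the same lane."""
    if len(target_bands) != len(reference_bands):
        raise ValueError("each target band needs a reference band from the same lane")
    return [t / r for t, r in zip(target_bands, reference_bands)]


def mean_sd(values):
    """Return (mean, sample standard deviation) of the normalized intensities."""
    return mean(values), stdev(values)


# Illustrative densitometry readings, not data from this study.
relative = normalize_to_reference([200.0, 300.0, 400.0], [100.0, 100.0, 200.0])
group_mean, group_sd = mean_sd(relative)
```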

Clinical characteristics and biochemical data of the study subjects
The clinical characteristics and biochemical data of the study subjects were collected and analyzed, and the results are presented in Table 1. The study subjects included 31 healthy women as controls and 31 women with PCOS. There were no statistically significant differences in age or BMI between the two groups (p value > 0.05). Among the PCOS-related biochemical data, the levels of fasting glucose, LH, T, TG and LDL-c and the LH/FSH ratio were significantly higher in PCOS patients than in controls, whereas the levels of PRL and HDL-c were significantly lower (p value < 0.05).

Quality control of the proteomics analysis
The intensities of all identified proteins spanned a wide range of abundance, covering 5 orders of magnitude, suggesting that the instrument was highly sensitive (Fig. 1a). The coefficients of variation (CV) of the control group and PCOS group samples were 37.7% and 32.9%, respectively. Pearson correlation analysis of the protein intensities showed that the correlation coefficients were between 0.8 and 1, suggesting that the experimental procedures were reproducible (Fig. 1b, c). Blood coagulation is an interfering factor in serum proteomics analysis, and a decrease in fibrinogen indicates a coagulation event. We measured the serum levels of fibrinogen alpha chain (FGA), fibrinogen beta chain (FGB) and fibrinogen gamma chain (FGG); there were no significant decreases of FGA, FGB or FGG in any sample, suggesting that no coagulation event occurred in the tested samples (Fig. 1d).
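The two reproducibility metrics quoted above can be computed as in the following minimal sketch. How the per-group CV was aggregated across proteins is not stated in the text, so the sketch only shows the CV of one protein across samples and the Pearson coefficient between two samples; `cv_percent` and `pearson_r` are hypothetical names.

```python
import math
from statistics import mean, stdev


def cv_percent(values):
    """Coefficient of variation in percent: 100 * sample SD / mean."""
    return 100.0 * stdev(values) / mean(values)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long intensity vectors."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Illustrative intensities, not data from this study.
cv = cv_percent([90.0, 100.0, 110.0])
r = pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```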

Principal component analysis (PCA)
A total of 550 proteins were identified and retained after filtering the dataset. PCA and OPLS-DA were performed on the quantitative data of these proteins to determine the principal axes of protein abundance variation in PCOS cases and controls. The PCA (Fig. 2a) showed moderate separation of the serum proteins between the PCOS and control groups along the t(2) axis, but with overlap along the t(1) axis. The OPLS-DA (Fig. 2b, c) showed complete separation between the PCOS and control groups. The quality parameters of the model were R2Y = 0.94 and Q2 = 0.71. The OPLS-DA model was also verified by a permutation test with 999 permutations (Fig. 2d).
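For reference, the two OPLS-DA quality parameters are conventionally defined as below. This is the standard chemometrics definition and is given here as an assumption, since the text does not state how the software computed them. Here y_i is the class label of sample i, \hat{y}_i is the fitted value, \hat{y}_{(i)} is the cross-validated prediction obtained with sample i held out, and \bar{y} is the mean label.

```latex
R^2Y = 1 - \frac{\sum_i \left(y_i - \hat{y}_i\right)^2}{\sum_i \left(y_i - \bar{y}\right)^2},
\qquad
Q^2 = 1 - \frac{\sum_i \left(y_i - \hat{y}_{(i)}\right)^2}{\sum_i \left(y_i - \bar{y}\right)^2}
```

Under this definition, R2Y measures how well the model fits the group labels, while Q2 measures how well it predicts them in cross-validation, which is why Q2 is normally the lower of the two.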

Proteomics in PCOS patients and controls
A total of 80 proteins were significantly differentially abundant in PCOS patients compared with healthy controls, using the criterion of a p value less than 0.05 after Benjamini-Hochberg FDR adjustment. Detailed information on these proteins is presented in Table 2. Among these proteins, 56 were downregulated and 24 were upregulated (Fig. 3a). The heatmap (Fig. 3b) shows the expression intensity of these proteins, which clustered distinctly between the PCOS and control groups.
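The Benjamini-Hochberg adjustment used as the selection criterion can be sketched in a few lines. This is a generic implementation of the procedure, not the code path of Perseus or MetaboAnalyst, and the function name `benjamini_hochberg` is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the input order.

    Each p value is scaled by m / rank (rank 1 = smallest p), and
    monotonicity is enforced by walking from the largest rank downwards.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Illustrative raw p values for four proteins, not data from this study.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
significant = [q < 0.05 for q in adjusted]
```

A protein is then called differentially abundant when its adjusted value is below 0.05.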

Bioinformatics analysis of the proteomic results
The GO terms enriched among the 56 downregulated and 24 upregulated proteins are shown in Fig. 4. GO terms were divided into three categories: biological process (BP), cellular component (CC), and molecular function (MF). GO analysis of the downregulated proteins showed that the enriched BP terms included response to stimulus, biological regulation, metabolic process, cell proliferation, and notably reproduction. The GO terms enriched among the upregulated proteins were similar to those of the downregulated proteins. Volcano plot (Fig. 5)   detoxification, cellular detoxification, antibiotic catabolic process and hydrogen peroxide metabolic process.

Validation of the expression of significant proteins
Based on the DIA proteomics results and the identified significant biomarkers, we selected the top five significant proteins (H4, H2A, TLT-1, PRDX1, SLC4A1) from the ROC curve results for further validation by Western blot analysis. As shown in Fig. 7, the expression of H4, H2A, PRDX1 and SLC4A1 was significantly increased in the PCOS group compared with the control group, whereas the expression of TLT-1 was significantly decreased. More importantly, the trends of these proteins between the PCOS and control groups were consistent with the DIA proteomics results.
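The ROC-based ranking referred to above relies on the area under the curve (AUC), which can be computed without drawing the curve: it equals the probability that a randomly chosen case has a higher value than a randomly chosen control, with ties counted as one half. The sketch below uses this rank-based identity; it is not the SPSS routine, and `roc_auc` is a hypothetical name.

```python
def roc_auc(case_values, control_values):
    """AUC as the fraction of case/control pairs in which the case is higher.

    Ties contribute 0.5. Values below 0.5 mean the marker is lower in cases.
    """
    wins = 0.0
    for case in case_values:
        for control in control_values:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_values) * len(control_values))


# Illustrative intensities, not data from this study.
auc_up = roc_auc([3.0, 4.0, 5.0], [1.0, 2.0, 3.0])
auc_down = roc_auc([1.0, 2.0], [3.0, 4.0])
```

For a marker that is lower in cases, such as a downregulated protein, this orientation gives an AUC below 0.5; the diagnostic AUC is then obtained as one minus that value, which is equivalent to swapping the two arguments.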

Discussion
Despite the great potential of -omic sciences in elucidating biological processes and monitoring disease progression, comprehensive proteomic analysis of PCOS patients has not been fully investigated to date [8,13]. In this study, using quantitative DIA proteomics, we investigated whether the serum proteome of women with PCOS differs from that of healthy controls, and the results demonstrated alterations in the serum proteomic profile of women with PCOS. There were 80 proteins significantly differentially expressed between PCOS patients and controls, including 54 downregulated and 26 upregulated proteins. GO and KEGG analysis showed that downregulated proteins were enriched in platelet degranulation, cell adhesion, cell activation, blood coagulation, hemostasis, defense response and inflammatory response terms; upregulated proteins were enriched in cofactor catabolic process, hydrogen peroxide catabolic process, antioxidant activity, cellular oxidant detoxification, cellular detoxification, antibiotic catabolic process and hydrogen peroxide metabolic process. ROC curve analysis showed that the AUCs of H4, H2A and TLT-1 were all greater than 0.9, indicating promising diagnostic value for these proteins.
The serum proteomics results showed that 80 proteins were significantly differentially expressed between PCOS patients and controls. GO and KEGG enrichment analysis indicated that the downregulated proteins were enriched in platelet degranulation, blood coagulation and inflammatory response. Reproductive disorders such as PCOS are often accompanied by platelet dysfunction and the inflammation it induces [14]. In patients with PCOS, coagulation and fibrinolysis parameters are usually evaluated. Platelets are involved in granulosa cell corpus luteum formation and angiogenesis in the ovary, and abnormal follicular development in PCOS is partly attributable to dysregulated ovarian angiogenesis. Aye et al. examined the effect of hypertriglyceridemia on IR and platelet function in young women with PCOS, and reported that acute hypertriglyceridemia induced IR and increased platelet activation in both the control and PCOS groups [15]. PCOS patients are also at risk of a prothrombotic state, for which platelet dysfunction might be responsible. This study showed that the altered proteins in the PCOS group were enriched in platelet degranulation and blood coagulation, which indicates that these protein changes might contribute to platelet dysfunction and might also be associated with IR and inflammation. The GO results also showed that the altered proteins were enriched in the reproduction biological process, which suggests that the proteomic changes in PCOS cases may be directly associated with PCOS pathogenesis and the resulting infertility. In addition, PCOS is a chronic low-grade inflammatory disease, and the continuous release of inflammatory mediators could perpetuate the inflammatory condition in women with PCOS [16,17]. Proteomic changes can also affect the inflammatory response and reveal the presence of inflammation in disease [18].
Among the identified proteins enriched in the inflammatory response, some were related to blood coagulation, including thrombin (also called coagulation factor II), platelet factor 4 and thrombospondin 1. Previous publications have also reported their associations with inflammation in PCOS [19][20][21]. Platelets are anuclear inflammatory cells, blood coagulation is an intrinsic proinflammatory pathway, and blood coagulation factors are important inflammatory mediators rather than merely promoters or inhibitors of coagulation [22][23][24]. The activation of coagulation factors can cause a proinflammatory response and initiate coagulation and downstream cellular signaling pathways. Our ROC curve results showed that the AUCs of H4, H2A and TLT-1 were all greater than 0.9, indicating a significant role for these three proteins in differentiating PCOS from controls. Among these three proteins, the proteomics analysis showed that H4 and H2A were upregulated in PCOS patients, with PCOS/control ratios of 2.79 and 3.66, respectively. Histones are critical components of chromatin; the amino acid residues at their N-termini can be covalently modified, which changes the chromatin conformation and induces transcription or gene silencing [25]. These modifications mainly include acetylation/deacetylation, methylation/demethylation, ubiquitination/deubiquitination, phosphorylation, SUMOylation, biotinylation, etc. Besides controlling gene expression, histone modification also participates in cell division, apoptosis and memory formation by recruiting protein complexes and affecting downstream proteins, and it also affects the immune system and inflammatory reactions [26]. Histone subunits include H2A, H2B, H3 and H4; the modifications mentioned above can occur on all of them and thus confer multiple functions. However, previous studies of H4 or H2A in PCOS are rare.
Monteiro et al. reported that, in endometriosis patients, lesions had significantly lower levels of histone H3K9 and H4K16 acetylation than eutopic endometrium from controls. Compared with control endometrium, histones H3/H4 were hypoacetylated within the promoter regions of candidate genes known to be downregulated in endometriosis lesions, whereas the steroidogenic factor 1 promoter region was enriched for acetylated H3 and H4 in lesions versus control tissues, correlating with its reported high expression in lesions [27]. Neonatal exposure to diethylstilbestrol (DES) can cause permanent alterations in female reproductive tract gene expression, infertility, and uterine cancer in mice. After DES treatment, three histone modifications associated with active transcription, including histone H4 lysine 5 acetylation (H4K5ac), were found to be enriched at specific lactoferrin (Ltf) promoter regions in the uterus [28]. This suggests that altered expression of multiple chromatin-modifying proteins and epigenetic marks might lead to altered reproductive function and increased cancer risk.
TLT-1 is another protein with an AUC greater than 0.9 in the ROC analysis. Unlike the two histones, our proteomics analysis showed that TLT-1 was downregulated in PCOS patients, with a PCOS/control ratio of 0.452. TLT-1 belongs to the family of triggering receptors expressed on myeloid cells, which play important roles in innate and adaptive immune responses, platelet aggregation, inflammation and insult-induced bleeding [29]. It mediates blood coagulation by binding fibrinogen. Because of its critical role in immunity, inflammation and platelet function, previous studies of TLT-1 have mainly focused on diseases associated with these processes. In patients with systemic lupus erythematosus, soluble TLT-1 levels were significantly lower than in healthy individuals [30]. In a model of acute lung injury, infusion of sTLT-1 restored normal fibrinogen deposition and alleviated pulmonary hemorrhage by 40% and tissue damage by 25% [31]. In thrombocytopenia and platelet function defect, the content of TLT-1 was reduced, recombinant soluble TLT-1 could potentiate fibrinogen binding to patient platelets, and TLT-1 was found to be positively regulated by RUNX1 [32]. Derive et al. reported that TLT-1 is a potent endogenous regulator of sepsis-associated inflammation that acts by suppressing leukocyte activation and modulating platelet-neutrophil crosstalk [33]. In this study, the expression of TLT-1 was also downregulated, and its AUC in the ROC curve analysis was 0.912. We therefore suspect that the altered expression of TLT-1 in PCOS may be involved in mediating chronic inflammation and blood coagulation, thereby affecting the pathogenesis of PCOS.
In conclusion, this study enrolled 31 PCOS patients and 31 matched healthy controls and, by quantitative DIA proteomics analysis, provided evidence of serum proteomic profile alterations in women with PCOS. A total of 80 proteins were significantly differentially expressed between PCOS patients and controls, including 54 downregulated and 26 upregulated proteins. GO and KEGG analysis showed that downregulated proteins were enriched in platelet degranulation, cell adhesion, cell activation, blood coagulation, hemostasis, defense response and inflammatory response terms; upregulated proteins were enriched in cofactor catabolic process, hydrogen peroxide catabolic process, antioxidant activity, cellular oxidant detoxification, cellular detoxification, antibiotic catabolic process and hydrogen peroxide metabolic process. ROC curve analysis showed that the AUCs of H4, H2A and TLT-1 were all greater than 0.9, indicating promising diagnostic value for these proteins in PCOS. Western blotting also confirmed that H4, H2A, TLT-1, PRDX1 and SLC4A1 were significantly differentially expressed between the PCOS and control groups. Future studies examining the systemic expression and exact role of these candidate biomarkers in PCOS are essential to confirm this hypothesis.