Comprehensive analysis of SPAG1 expression as a prognostic and predictive biomarker in acute myeloid leukemia by integrative bioinformatics and clinical validation

Background Recently, an increasing number of studies have reported that sperm-associated antigen (SPAG) proteins play crucial roles in solid tumorigenesis, and may serve as potentially helpful biomarkers for cancer diagnosis and prognosis. However, very few studies systematically investigated the expression of SPAG family members and their clinical significance in acute myeloid leukemia (AML). Methods The expression of SPAGs and their prognostic significance in AML were determined by a systematic analysis on data gathered from public databases, and the results were validated in clinical samples. Results Using public data, we identified only increased SPAG1 expression negatively associated with survival in AML by Cox regression (P < 0.001) and Kaplan–Meier analysis (P < 0.001). The prognostic value of SPAG1 expression was further confirmed in other independent cohorts. Clinically, higher SPAG1 expression was significantly correlated with white blood cell counts (P = 0.014) and French–American–British (FAB) subtypes (P = 0.024). Moreover, higher SPAG1 expression was more common in + 8 patients (P = 0.034), rarely found with t(8;21) (P = 0.014), and correlated with FLT3 (P < 0.001) and DNMT3A mutations (P = 0.001). Despite these associations, multivariate analysis confirmed the independent prognostic value of SPAG1 expression in AML (P < 0.001). Notably, AML patients with higher SPAG1 expression may benefit from hematopoietic stem cell transplantation (HSCT), whereas patients with lower SPAG1 expression appeared less likely to benefit. Finally, we further validated that SPAG1 expression was significantly increased in newly diagnosed AML patients compared with normal controls (P < 0.001) and with AML patients who achieved complete remission (P < 0.001). Additionally, SPAG1 expression could act as a potentially helpful biomarker for the diagnosis and prognosis of AML (P < 0.001 and = 0.034, respectively). Conclusions Our findings demonstrated that SPAG1 overexpression may serve as an independent prognostic biomarker and may guide the choice between HSCT and chemotherapy in patients with AML. Supplementary Information The online version contains supplementary material available at 10.1186/s12920-022-01193-0.


Introduction
Acute myeloid leukemia (AML) represents a cohort of clonal hematopoietic malignances that originate from myeloid precursors and is a highly heterogeneous disease in terms of molecular, cytogenetic and clinical features [1]. Genetic and molecular abnormalities are closely associated with the leukemogenesis and prognosis of AML [2]. Although mutations in several genes, such as FLT3, CEBPA, NPM1, TP53, RUNX1 and ASXL1, have been well established to occur in AML, the current understanding of the molecular mechanisms involved in the development and progression of AML is still limited [3]. Precise risk stratification and prognosis assessment are of great significance in the selection of treatment for AML patients [4]. Therefore, the identification of a series of molecular alterations that can predict the clinical outcomes of AML patients may contribute to the development of AML-specific targeted therapies.
Evidence has proven that cancer-testis (CT) antigens may function in stemness due to their expression during germ cell and embryonic development, which promotes an important oncogenicity effect in cancer cells [5]. To date, a cluster of proteins named spermassociated antigens (SPAG), of which 15 members (SPAG1, SPAG2/UAP1, SPAG3/SPAG8, SPAG4, SPAG5, SPAG6, SPAG7, SPAG9, SPAG10/MFGE8, SPAG11B, SPAG12/NHP2L1, SPAG13/SSFA2, SPAG15/SPAM1, SPAG16 and SPAG17) are CT antigens, has been identified [6]. Over the years, ample research has reported the vital role of SPAG proteins, which may function as promising new biomarkers for diagnosis and prognosis in solid tumorigenesis, yet there is a great lack of systematic investigation of SPAG family member expression and clinical evaluation of these proteins in acute myeloid leukemia (AML) [6].
To date, our research is the first to report that SPAG1 mRNA expression, among SPAG family members, is negatively associated with survival in AML. Moreover, the prognostic value of SPAG1 overexpression in AML was further confirmed by our data. High expression of SPAG1 mRNA was intrinsically connected to specific genetic (both cellular and molecular levels) abnormalities in AML. Despite these associations, SPAG1 overexpression could also function independently as a prognostic biomarker in AML, and it may serve as a reference for consolidation therapy selection between chemotherapy and hematopoietic stem cell transplantation (HSCT).

Public datasets
The identification cohort comprised 173 AML patients with RNA-Seq V2 data for SPAG family members (SPAG1, SPAG2/UAP1, SPAG3/SPAG8, SPAG4, SPAG5, SPAG6, SPAG7, SPAG9, SPAG10/MFGE8, SPAG11B, SPAG12/NHP2L1, SPAG13/SSFA2, SPAG15/SPAM1, SPAG16 and SPAG17) from The Cancer Genome Atlas (TCGA) [7]. The treatment regimens for these patients included induction therapy and consolidation therapy. All patients received standard chemotherapy as induction therapy. Following induction chemotherapy, a total of 100 patients underwent chemotherapy only, whereas 73 patients received HSCT as consolidation treatment. In addition, the expression of SPAG1 in AML compared with controls was analyzed in GEPIA (http:// gepia. cancer-pku. cn/). Three independent cohorts from the Gene Expression Omnibus (GEO) database (GSE12417, GSE6891 and GSE37642) were used to validate the prognostic value of SPAG1 expression in AML. Moreover, the association of the SPAG1 expression level with the prognosis of SPAG1 expression level on prognosis of 78 and 162 cytogenetically normal AML (CN-AML) patients was analyzed in the GSE12417 dataset with the public platform Genom-icScape (http:// genom icsca pe. com/ micro array/ survi val. php) [8,9]. The GSE6891 dataset consisted of 461 AML patients, whereas the GSE37642 dataset comprised 562 AML patients. Kaplan-Meier analysis was performed to explore the prognostic value of SPAG1 expression in two groups with median level of SPAG1 expression as the cutoff.

Patients
The validation cohort included 131 AML patients, with 86 enrolled at diagnosis and 45 at complete remission (CR), treated at our hospital. Patients with antecedent hematological diseases or therapy-related AML were eliminated. The clinical characteristics of the cases are presented in Additional file 1: Table S1. Fifteen healthy bone marrow donors served as the controls. The age of the newly diagnosed AML patients (median 52, range 18-81) was similar to that of the AML patients at CR (controls) (median 45, range 28-66). The diagnosis and classification of AML patients followed the 2016 revised World Health Organization (WHO) and French-American-British (FAB) criteria [3,10]. The treatment Conclusions: Our findings demonstrated that SPAG1 overexpression may serve as an independent prognostic biomarker and may guide the choice between HSCT and chemotherapy in patients with AML.
Keywords: SPAG1, Expression, Prognosis, AML, Bioinformatics regimens of these AML cases were as reported [11][12][13]. The study protocol was approved by the Institutional Ethics Committee of the Affiliated People's Hospital of Jiangsu University, and all the volunteers provided written informed consent.

Sample preparation, RNA isolation and reverse transcription
Clinical bone marrow (BM) specimens were sampled from the validation cohort of AML cases and controls who were treated in our hospital. We separated BM mononuclear cells (BMMNCs) and then extracted total RNA by using Lymphocyte Separation Medium (Solarbio, Beijing, China) and TRIzol reagent (Invitrogen, Carlsbad, CA), respectively. cDNA was synthesized via RNA reverse transcription as described previously [11][12][13].

Bioinformatics analysis
All procedures referring to bioinformatics analysis were conducted as our previous reports [14,15]. To obtain the differentially expressed genes/miRNAs (DEGs), RNAsequencing (mRNA and microRNA) data analysis was performed according to the raw read counts with the R/ Bioconductor package "edgeR" based on the filter condition: |log2 fold change (FC)|> 1.5, false discovery rate (FDR) < 0.05 and P < 0.05. All analyses were controlled for FDR by the Benjamini-Hochberg procedure. Gene Set Enrichment Analysis (GSEA) software was used for analysis, and the enrichment pathway was set to be significant based on the nominal (NOM) P < 0.05 and FDR Q < 0.05.

Statistical analysis
Comparisons of continuous and categorical variables were performed using the Mann-Whitney's U/ Kruskal-Wallis test followed by Dunn's post-hoc test and Pearson's χ 2 /Fisher's exact test, respectively. Both the Kaplan-Meier method (log-rank test) and Cox regression were used to analyze the intrinsic connection between SPAG1 expression and survival time, including leukemia-free survival (LFS), event-free survival (EFS) and overall survival (OS). The receiver operating characteristic (ROC) curve and area under the ROC curve (AUC) were used to determine the discriminating ability of SPAG1 expression for AML and controls. Two-sided P values < 0.05 in all statistical analyses were considered statistically significant.

Identification of SPAG1 among SPAG family members linked to AML prognosis in public datasets
To explore the prognostic significance of the SPAG family members (SPAG1/2/3/4/5/6/7/9/10/13/16/17) in AML, we first determined the impact of each SPAG member on survival time (both OS and LFS) by Cox regression univariate analysis among AML in TCGA datasets. When analyzing the prognostic value of the SPAG family members in AML patients, each one was evaluated according to the difference between two groups of patients, divided by the median level of SPAG expression as the cutoff. As presented in Table 1 and Additional file 1: Table S2, only SPAG1 expression had a significant connection with OS and LFS in both AML (both P < 0.001) and non-M3 AML (both P < 0.001) as well as CN-AML (P = 0.005 and 0.006, respectively). Furthermore, Kaplan-Meier analysis also revealed that AML patients with higher SPAG1 expression showed significantly shorter OS and LFS than those with lower SPAG1 expression among AML (both P < 0.001), non-M3 AML (both P < 0.001), and CN-AML (both P = 0.004) patients ( Fig. 1a and b). In addition, the expression of SPAG1 was upregulated in AML patients, as analyzed by GEPIA (Fig. 1c).
Next, the prognostic value of SPAG1 expression in AML was further validated in GEO datasets including GSE12417, GSE6891 and GSE37642. For GSE12417, the online platform GenomicScape (http:// genom icsca pe. com/ micro array/ survi val. php) also confirmed the prognostic correlation of SPAG1 expression with OS in patients with CN-AML among two independent cohorts (P = 0.0035 and 0.05, respectively, Fig. 1d). For GSE6891 and GSE37642, Kaplan-Meier analysis showed that AML patients with higher SPAG1 expression had strikingly shorter EFS and/or OS times than those with lower SPAG1 expression (P = 0.025, 0.0025 and 0.045, respectively, Fig. 1e).

Clinical implications of SPAG1 expression in AML in the TCGA dataset
SPAG1 was the only remaining SPAG member linked to AML prognosis, which prompted us to analyze the associations of SPAG1 expression with the clinical/biological characteristics of AML patients. The differences between the high and low SPAG1 groups in terms of sex, age, white blood cell (WBC) counts, peripheral blood (PB)/BM blasts, FAB classifications, cytogenetics, and gene mutations are shown in Table 2. Notably, cases with higher SPAG1 expression had markedly higher WBC counts than did those with lower SPAG1 expression (P = 0.014, Table 2). Furthermore, there were marked differences between the two groups regarding the occurrence rate of each FAB classification and cytogenetics (P = 0.024, Table 2). Cases with higher SPAG1 expression were commonly classified as FAB-M4/M5 (P = 0.058 and 0.050, respectively, Table 2). Regarding cytogenetics, patients with higher SPAG1 expression more commonly exhibited + 8 (P = 0.034) and rarely t(8;21) (P = 0.014, Table 2). We further showed SPAG1 expression among groups with + 8, t(8;21) or neither (Fig. 2a). In addition, we revealed the associations of SPAG1 expression with several of the most frequent gene mutations in AML (Table 2). Higher SPAG1 expression was markedly or nearly correlated with FLT3, DNMT3A, and WT1 mutations (P < 0.001, = 0.001 and = 0.057, respectively, Table 2). Moreover, we further compared SPAG1 expression between patients carrying or not carrying these gene mutations and observed statistical significance in subgroups divided by FLT3 and DNMT3A status (P < 0.001 and = 0.015, respectively, Fig. 2b and 2c), whereas a trend was observed in subgroups divided by WT1 status (P = 0.051, Fig. 2d).

Further confirmation of the prognostic value of SPAG1 expression in AML in the TCGA dataset
Since a significant relationship was observed between SPAG1 expression and some common prognostic factors such as WBC, cytogenetics and gene mutations, we performed multivariate analysis by Cox regression to confirm the effect of SPAG1 expression on survival rate and demonstrated that SPAG1 expression acted as a positive independent risk factor affecting OS and LFS in wholecohort AML (both P < 0.001), non-M3 AML (P = 0.003 and 0.005, respectively), or CN-AML patients (P = 0.001 and 0.007, respectively) ( Table 3 and Additional file 1:  Table S3).
Mutations in FLT3, DNMT3A and WT1 are widely accepted factors that influence AML prognosis [2,3]. According to this study, since SPAG1 expression was significantly or nearly significantly correlated with FLT3, DNMT3A and WT1 mutations, we further investigated the prognostic value of SPAG1 expression in AML independent of these gene mutations. As Fig. 3 shows, both AML and CN-AML patients with higher SPAG1 expression also exhibited markedly shorter OS and LFS times than those with poor SPAG1 expression, regardless of the mutation status of FLT3 (Fig. 3a), DNMT3A (Fig. 3b), WT1 (Fig. 3c) or all the three genes (Fig. 3d).

SPAG1 expression is a prognostic indicator for AML after HSCT in the TCGA dataset and may have a guiding effect on treatment choice between chemotherapy and HSCT
HSCT is an important consolidation treatment regimen in against disease recurrence in AML. To explore whether HSCT could nullify the negative prognostic effect of higher SPAG1 expression in AML, we analyzed the effect of HSCT intervention on prognosis in both the lower and higher SPAG1 expression groups. In the SPAG1 overexpression group, HSCT for AML patients undergoing induction therapy markedly improved OS and LFS, which was not observed for those just receiving chemotherapy (both P < 0.001, Fig. 4). However, there were no obvious differences regarding OS and LFS between the HSCT and chemotherapy sets in the group of AML patients with lower SPAG1 expression (P = 0.131 and 157, respectively, Fig. 4). To sum up the results, AML patients with SPAG1 hyperexpression may profit from HSCT, which suggests that SPAG1 expression may be used to guide therapeutic selection between HSCT and chemotherapy in AML patients undergoing induction therapy.

Molecular signatures associated with SPAG1 expression in AML in the TCGA dataset
To explore the biological network in AML caused by abnormal SPAG1 expression, we first compared the transcriptomes of AML samples with lower and higher SPAG1 expression in the TCGA set. Up to 429 mRNAs and 13 miRNAs were found to be differentially expressed between two sets based on the following conditions: |log2 FC|> 1.5, FDR < 0.05 and P < 0.05 (Fig. 5a-c and Additional file 2: Table S4). Among these DEGs, 206 mRNAs and 7 miRNAs were found to be positively correlated with SPAG1 expression, whereas 223 genes and 6 miRNAs were negatively correlated with SPAG1 expression. Positively correlated genes such as MECOM were reported to have pro-leukemia effects [16] and were associated with prognosis in AML. Negatively correlated genes such as RUNX1T1 and LEP were reported to have anti-leukemia effects and were also informative for AML prognosis [13]. Moreover, GSEA revealed that SPAG1 might participate in HOXA9 dysregulation associated with AML (Fig. 5d).

Validation of SPAG1 expression and its clinical significance in AML in our research cohort
To verify the expression pattern and clinical significance of SPAG1 expression in AML, we further  Fig. 6a). Moreover, ROC analysis revealed that SPAG1 expression may serve as a quantifiable biomarker for distinguishing AML from controls, presenting an AUC of 0.857   Fig. 6b). Significantly, AML patients who did not achieve CR after 1-2 courses of induction therapy exhibited markedly higher SPAG1 expression levels at diagnosis than those who achieved CR after 1-2 courses of induction therapy (P = 0.020, Fig. 6c). According to the set point of 1.0198 determined by ROC analysis (sensitivity of 66.3% and specificity of 100%), we grouped AML patients into two sets to analyze the prognostic significance of SPAG1 expression. Kaplan-Meier analysis demonstrated a marked tendency of shorter OS time in AML patients with high SPAG1 expression than in those with low SPAG1 expression (P = 0.034, Fig. 6d).

Discussion
Recent evidence has characterized SPAG family member expression together with its functional roles in cancer development. For example, SPAG1 expression could be enrolled in the early spread and adverse outcome of pancreatic adenocarcinoma and prostate cancer [17,18]. SPAG2/UAP1 has been shown to be a promising therapeutic target for bladder cancer as well as lung adenocarcinoma [19,20]. SPAG4 could act as a potential biomarker of glioblastoma progression and prognosis, as well as in renal cell carcinoma and lung carcinoma [21][22][23]. Moreover, SPAG5 hyperexpression was connected to poor disease-free survival in breast cancer patients, and fueled breast cancer cell proliferation [24]. Interestingly, reduced expression of SPAG6, which is transcriptionally Fig. 4 The effect of HSCT on survival of AML patients among different SPAG1 expression groups. Kaplan-Meier survival curves of overall survival and disease/leukemia free survival among whole-cohort AML in both lower and higher SPAG1 expression group from TCGA datasets regulated by tumor specific DNA methylation, has been revealed in non-small-cell lung cancer [25]. A direct role for aberrant SPAG9 was identified in diverse human cancers such as Kaposi's sarcoma, gastric cancer, prostate cancer, thyroid carcinoma, liver cancer, and bladder transitional cell carcinoma [26][27][28][29][30][31][32]. Notably, accumulating studies have shown that SPAG6 expression is correlated with the pathogenesis of myelodysplastic syndrome (MDS) and Burkett lymphoma (BL) [33][34][35][36][37]. Consequently, SPAG proteins serve as a novel type of CT antigen with contributions to cancer formation and are likely to be novel targets for tumor targeted therapies. This study was the first to reveal SPAG1 expression as uniquely associated with poor prognosis in AML among all SPAG family members through both analysis of public data and validation in a research cohort. It was demonstrated that SPAG1 expression could be a promising prognostic biomarker and could be used to optimize the choice of therapy between chemotherapy and HSCT in AML. Unlike SPAG6, SPAG1 expression has rarely been studied in hematological malignances. Conversely, several reports have examined at the relationships between SPAG1 and solid tumors. Shamsara et al. demonstrated that the amplification of SPAG1 was associated with decreased survival in patients with prostate cancer [18]. Moreover, SPAG1 is an early expressed gene in pancreatic tumorigenesis and can promote the activity of cancer cells [17]. Lin et al. showed that SPAG1 expression was a crucial variable related to many clinicopathological features and to RFS in breast cancer [38]. Functionally, SPAG1 acts as an inhibitor of breast cancer cell proliferation and colony formation during breast cancer pathogenesis and development [38]. Since there was no deep insight into SPAG1 in AML, further mechanistic studies are essential for investigating the possible role of SPAG1 in leukemogenesis and AML development.
The current study also identified a significant association between SPAG1 expression and genetic (both cytogenetic and molecular) abnormalities in AML. We first found the associations of SPAG1 expression with FAB-M4/M5 disease, suggesting that SPAG1 expression may play a role in monocyte differentiation and Molecular signatures associated with SPAG1 in AML. a Expression heatmap of differentially expressed genes between AML patients with lower and higher SPAG1 expression groups among TCGA datasets (FDR < 0.05, P < 0.05 and |log2 FC|> 1.5). b Volcano plot of differentially expressed genes between AML patients with lower and higher SPAG1 expression (FDR < 0.05, P < 0.05, and |log2 FC|> 1.5). c Expression heatmap of differentially expressed microRNAs between AML patients with lower and higher SPAG1 expression (FDR < 0.05, P < 0.05, and |log2 FC|> 1.5). d GSEA analysis of SPAG1 expression associated with HOXA9 dysregulation oin AML (NOM P < 0.05 and FDR Q < 0.05) monocyte leukemogenesis. In terms of cytogenetics, SPAG1 expression was positively correlated with +8 but negatively associated with t(8;21) (q22;q22). Since the SPAG1 gene is located in 8q22.2, it is not surprising that aberrant SPAG1 expression was associated with these chromosome abnormalities. Notably, further studies are needed to determine whether the functions of these chromosomal abnormalities during leukemogenesis occur through aberrant SPAG1 expression. Regarding gene mutations, SPAG1 expression was associated with FLT3 and DNMT3A mutations in AML, but the exact relationship between SPAG1 expression and these gene mutations still unclear. Importantly, there is no evidence showing the association of FLT3 and DNMT3A mutations with the above chromosome abnormalities. Consequently, we need to obtain deeper insight into the underlying mechanism of SPAG1 expression in leukemogenesis caused by FLT3 and/or DNMT3A mutations.
In general, our discoveries suggested that SPAG1 hyperexpression may function independently as a prognostic biomarker and assist treatment selection between HSCT and chemotherapy in AML.

Fig. 6
Validation of SPAG1 expression and its clinical significance in AML. a SPAG1 expression in 15 controls, 86 AML patients at diagnosis time, and 45 AML patients who achieved complete remission. b ROC curve analysis of SPAG1 expression in distinguishing AML from controls. c SPAG1 expression at diagnosis time in AML patients who did and did not achieve CR after 1-2 course induction therapy. d Kaplan-Meier survival curves of overall survival regarding SPAG1 expression in whole-cohort AML from our hospital