Prognostic signature for hepatocellular carcinoma based on 4 pyroptosis-related genes

Hepatocellular carcinoma (HCC) is a cancer with a poor prognosis. Many recent studies have suggested that pyroptosis is important in tumour progression. However, the role of pyroptosis-related genes (PRGs) in HCC remains unclear. We identified differentially expressed PRGs in tumours versus normal tissues. Through univariate, LASSO, and multivariate Cox regression analyses, a prognostic PRG signature was established. The signature effectiveness was evaluated by time-dependent receiver operating characteristic (t-ROC) curve and Kaplan–Meier (KM) survival analysis. The signature was validated in the ICGC (LIRI-JP) cohort. In addition, single-sample gene enrichment analysis (ssGSEA) showed the infiltration of major immune cell types and the activity of common immune pathways in different subgroups. Twenty-nine pyroptosis-related DEGs from The Cancer Genome Atlas-Liver Hepatocellular Carcinoma (TCGA-LIHC) dataset were detected, and four genes (CTSV, CXCL8, MKI67 and PRF1) among them were selected to construct a prognostic signature. Then, the patients were divided into high- and low-risk groups. The pyroptosis-related signature was significantly associated with overall survival (OS). In addition, the patients in the high-risk group had lower levels of immune infiltration. The prognostic signature for HCC based on 4 pyroptosis-related genes has reliable prognostic and predictive value for HCC patients.

classifier to predict HCC outcomes is important. Such a classifier could have important reference value for predicting the survival and treatment outcomes of patients with liver cancer.
Pyroptosis is a form of regulated cell death (RCD), and the Nomenclature Committee on Cell Death recommended that it be recognized as a type of RCD that critically depends on the formation of plasma membrane pores by members of the gasdermin (GSDM) protein family, often (but not always) as a consequence of inflammatory caspase activation [4]. GSDM pores formed in the plasma membrane eventually lead to cell lysis, and a large number of damage-related molecular patterns (DAMPs), such as ATP, IL-1β, and HMGB1, are released at different times and stages [5], resulting in inflammation. Studies have shown that certain inflammatory factors, such as IL-1β, play a role in promoting tumour invasion. Solid tumours that have been shown to upregulate IL-1β include breast, colon, lung, and head and neck cancers and melanoma, and patients with these IL-1βupregulated tumours generally show a poor prognosis [6]. In many tumours, IL-1β may increase tumour cell metastasis by promoting angiogenesis [7,8]. However, IL-1β has also been shown to inhibit cancer. Haabeth et al. demonstrated that IL-1β can drive tumour-specific Th1 response activation to inhibit the progression of B-cell myeloma and lymphoma [9,10]. However, there are few studies on pyroptosis in HCC.
Here, we conducted a systematic study of pyroptosisrelated genes (PRGs) in HCC to determine differences in the expression of these genes between tumour patients and nontumour patients, to assess the correlation between their expression and patient prognosis and to study the effects of these PRGs on the tumour-immune microenvironment.

Data collection
The mRNA expression profiles (mRNA-seq) and corresponding clinical data of patients were obtained from The Cancer Genome Atlas (TCGA) database. We downloaded TCGA-liver hepatocellular cancer (LIHC) sample information after log2 (FPKM + 1) processing from the Xena platform (https:// xena. ucsc. edu/) [11]. Our inclusion criteria for patients were as follows: histologically diagnosed with hepatocellular carcinoma [12]; available expression profiles; patients with survival information. Consequently, we extracted 267 tumour samples from the TCGA dataset for subsequent analysis (Table 1). Under the same criteria, 240 primary tumour solid tissue samples (Table 2) were included as the external validation dataset from the International Cancer Genome Consortium (ICGC), and 184 PRGs were retrieved from the GeneCards database (25th August 2021. https:// www. genec ards. org/) with the keyword "pyroptosis". The above three databases are publicly available and accessible.

Consensus clustering (CC) analysis
Unsupervised class discovery is a highly useful technique in cancer research, where intrinsic groups sharing biological characteristics may exist but are unknown. The CC method provides quantitative and visual stability evidence for estimating the number of unsupervised classes in a dataset [13]. We used the "ConsensusClusterPlus" R package (version 1.52.0) to duplicate tumour samples in TCGA 1000 times to find the optimal classification  cluster of tumour samples, and approximately 80% of the samples were selected in each iteration.

Search for differentially expressed genes (DEGs) in the TCGA-LIHC dataset
We used the empirical Bayesian approach of the "limma" R package (version 3.44.3) to obtain DEGs [14]. DEGs with |log2-fold change (FC)|≥ 1 and adj P-value (FDR) < 0.05 were used for subsequent analysis.

Construction of the prognostic PRG signature
Univariate Cox regression, least absolute shrinkage and selection operator (LASSO)-Cox regression and multivariate Cox regression analyses were used to identify the genes associated with pyroptosis and establish a prognostic PRG signature for HCC. In the univariate Cox regression analysis, P < 0.05 was considered to indicate statistical significance, indicating that the gene is associated with overall survival (OS). The LASSO Cox regression model was then utilized to narrow down the candidate genes by using tenfold cross validation via the "glmnet" R package [15]. Then, we carried out multivariate Cox regression analysis and established the risk scoring equation according to the standardized regression coefficients of the factors.
To evaluate the predictive ability of the prognostic PRG signature, we used the median risk score to divide the sample into high-and low-risk groups. Kaplan-Meier (KM) survival curves were drawn with the "survival" R package (version 3.2-13), and the P value was calculated. In addition, time-dependent receiver operating characteristic (t-ROC) curve analyses were performed to determine how accurately the signature predicted patient survival at six months, one year, two years, and three years [16]. The risk curve was also used to verify the relationship between the risk score and patient survival status. Principal component analysis (PCA) integrated the risk score and patient survival information through dimensionality reduction and presented relevant information in a low-dimensional manner, which could verify the signature's ability to distinguish between high-and low-risk groups [17]. In addition, we performed a similar process with data from the ICGC database, and the prognostic PRG signature was further validated in this independent cohort.

Validation of prognostic genes in an external database
The mRNA expression levels of four marker genes were verified with the Tumour Immune Estimation Resource (TIMER) database (https:// cistr ome. shiny apps. io/ timer/). In addition, we used the cBioPortal database Risk Score = �i(Coefi · Expi) (https:// www. cbiop ortal. org/) to investigate the genetic alterations of the related genes. The above two databases are publicly available and accessible.

Construction of the nomogram and evaluation of its predictive value
The expression levels of the four marker genes in the risk scoring equation were included in the nomogram to assess the 1-year, 2-year and 3-year survival rates of the patients, and calibration curves were drawn to assess the prognostic performance of the nomogram.

Validation of the independence of the risk scores from other clinical variables
The clinical information, including sex, age, tumour grade, tumour stage and risk scores, of patients providing the samples was included in the univariate Cox regression analysis, and bilateral P values < 0.05 were considered to be statistically significant. The clinical features were also incorporated into the multivariate Cox regression analysis to determine whether the prognostic model prediction ability was independent of the conventional clinical features. Hazard ratios (HRs) and 95% confidence intervals (CIs) were calculated.

Gene enrichment analysis
We next investigated the biological processes involving the DEGs between the high-and low-risk patient groups. DEGs with P < 0.05 and absolute value of log2 FC > 1 were selected for enrichment analysis. For the Kyoto Encyclopedia of Genes and Genomes (KEGG) [18][19][20] and gene ontology (GO) analyses, P < 0.05 and false discovery rate (q) < 1 were considered indicators of statistical significance.

Immune infiltration analysis
The "GSVA" package (version 1.32.3) was used to calculate the immune score of the samples, explore the infiltration of 16 major immune cell types and investigate the activity of 13 common immune pathways in different subgroups in the high-and low-risk groups.

Statistical analysis
Single-factor analysis of variance was applied to compare the gene expression of 184 PRGs between the normal and HCC tissues. Independent prognostic factors were identified by univariate and multivariate Cox regression. Differences in OS between high-and low-risk groups were assessed using Kaplan-Meier analysis and two-sided log-rank test. When comparing the immune cell infiltration and immune pathway activation between the two groups, the Mann-Whitney test was used. All statistical P values are two-side and P < 0.05 represents statistical significance. Besides, R software (Version 4.0.2) was employed for all statistical data analyses. The flowchart of the whole study is shown in Fig. 1.

CC analysis to explore patient differences
To effectively distinguish liver cancer patients by PRGs, we conducted CC analysis on all HCC patients in the TCGA cohort. The cluster variable K was increased from 2 to 10; when K = 2, the correlation within subgroups was the highest, while the correlation between groups was the lowest. However, the samples could not be well distinguished (Additional file 1: Fig. S1a). In addition, when KM survival curves were generated for the two subgroups, we found no significant difference between the two groups (Additional file 1: Fig. S1b).

Identification of differentially expressed PRGs between tumour and normal tissue
Overall, 184 genes related to pyroptosis were retrieved from the GeneCards database, and 29 differentially expressed genes with |Log2 FC|> 1 were screened by the "limma" package ( Fig. 2a,b). P < 0.05 was considered to indicate statistically significant DEGs. Of the DEGs, 8 PRGs were upregulated in liver cancer tissues, and 21 PRGs were downregulated in tumour tissues.
To further explore the association between these PRGs, protein-protein interaction (PPI) analysis was performed, setting the minimum required interaction score at 0.4 (medium confidence). Cytoscape software (Version 3.7.2) was employed for visualization (Fig. 2c), and we found that IL1B interacts with many genes at the protein level. In addition, we mapped the correlation network containing these 29 genes and explored their relationships at the gene level (Fig. 2d).

Construction and development of a prognostic signature for HCC
Through univariate Cox regression analysis, we found 10 DEGs related to the prognosis of HCC among 29 PRGs (Fig. 3a). High expression of ANXA2, CTSV, CXCL8, IL13RA2, MKI67, TLR2 and TREM2 may promote survival, while high expression of CD14, MST1 and NLRP6 may adversely affect survival. LASSO regression analysis further narrowed ten genes down to seven (ANXA2, CTSV, CXCL8, MKI67, NLRP6, NLRP6, and PRF1) (Fig. 3b, c). Finally, we obtained 4 key prognosis-related PRGs-CTSV, CXCL8, MKI67 and PRF1-by stepwise multivariate regression analysis (Fig. 3d, e). The prognostic PRG signature was constructed as follows: risk score = (0.249* CTSV expression) + (0.137* CXCL8 expression) + (0.264* MKI67 expression) + (− 0.224* PRF1 expression). CTSV, CXCL8 and MKI67 were highrisk factors (HR > 1), and PRF1 was a protective factor (HR > 1). After calculating the risk scores of all samples, the median was used to divide the samples into two subgroups (high and low risk), with a cut-off value of 0.384. Patients in the high-and low-risk groups were well Fig. 1 The flowchart of the whole study. Flowchart for establishing and verifying prognostic signature distinguished by PCA (Fig. 4a), and the high-risk score group had a higher mortality rate and shorter survival time (Fig. 4b, c). By plotting t-ROC curves, we found that risk scores also had certain significance in differentiating patients' half-year, 1-year, 2-year, and 3-year survival (Fig. 4d). The area under the t-ROC curve (AUC) was 0.691 at 6 months, 0.725 at 1 year, 0.694 at 2 years and 0.681 at 3 years. In addition, multivariate Cox regression proved that our prognostic signature was an independent prognostic factor for HCC (Fig. 4e).

External database validation of the PRG-related prognostic signature in HCC
TIMER database analysis confirmed that CTSV and MKI67 were highly expressed in tumour tissues, while CXCL8 and PRF1 were expressed at low levels in tumour tissues (Fig. 5a), which was consistent with our findings. We investigated the genetic changes in the CTSV, CXCL8, MKI67 and PRF1 genes. The mutations of these four genes in TCGA HCC samples were assessed via the cBioPortal database. MKI67 had the highest frequency of genetic alterations (3%), and missense mutation was the most common alteration (Fig. 5b).

Construction of a nomogram
Univariate and multivariate Cox regression analyses showed that the expression of CTSV, CXCL8, MKI67 and PRF1 were independent prognostic factors in liver cancer patients. On this basis, we constructed a nomogram to predict the patient's probability of survival at one, two, and three years. The expression levels of four genes (CTSV, CXCL8, MKI67 and PRF1) were included as indicators to establish the risk score (Fig. 6a). In addition, a calibration plot was drawn and indicated that our nomogram had good predictive performance (Fig. 6b-d).

Validation of the PRG-related prognostic signature for HCC
To verify the availability of our PRG-related model, we downloaded liver cancer datasets from the ICGC Multivariate Cox regression showed that the risk score was an independent prognostic factor in the TCGA database (e) database to verify the risk score equation. We used a cutoff value of 0.384 for the training set to divide the ICGC database samples into high-and low-risk subgroups. Consistent with the results from the TCGA database, patients in the high-and low-risk groups could still be distinguished by PCA (Fig. 7a). Patients with high risk scores had worse prognoses (Fig. 7b), and the OS of highrisk patients was significantly reduced compared with that of low-risk patients (P = 6.501e−03) (Fig. 7c). The AUCs of the prognostic signature at six months, one year, two years and three years were 0.673, 0.731, 0.681 and 0.705, respectively (Fig. 7d). Multivariate Cox regression proved that the prognostic signature was an independent prognostic factor for HCC (Fig. 7e). All of the above evidence indicates that PRGs are of certain clinical value in our prognostic signature of HCC.

Gene enrichment analysis
To further explore the differences in gene function and pathways between the high-and low-risk subgroups, the "limma" package was used to identify DEGs between the two subgroups of TCGA patients. The differential criteria were an absolute value of log FC > 1 and P < 0.05. In total, 45 genes related to the risk score were found. Of these, 43 genes were upregulated in the high-risk group, and 2 were downregulated. Then, KEGG and GO enrichment analysis was conducted on these genes (Fig. 8a, b), and the results showed that most of the high risk-related pathways were cell cycle, cell metabolism and immunerelated pathways.

Analysis of the immune microenvironment in patient subgroups
To explore the differences in immune infiltration and the activity of immune-related pathways in high-to lowrisk subgroups, single-sample gene enrichment analysis (ssGSEA) was used to calculate the immune score of 16 kinds of immune cells in each sample. The results showed that the high-risk group usually had a lower level of immune infiltration. In particular, B cells, CD8 + T cells, macrophages, mast cells, neutrophils, NK cells, pDCs (plasmacytoid dendritic cells), T helper cells, and TILs (tumour-infiltrating lymphocytes) (Fig. 9a) were found at lower levels in this group. The activity of immune-related pathways was also generally downregulated in the highrisk group (Fig. 9b).

Discussion
HCC is a serious health hazard worldwide. Some studies have shown that pyroptosis plays an important role in the occurrence and development of tumours [21,22], but the role of PRGs in liver cancer remains unclear. In this study, we first assessed 29 PRGs that were differentially   prognosis (b, c). The t-ROC curve verifies the accuracy of the prognostic signature (d). Multivariate Cox regression showed that the risk score was an independent prognostic factor in the ICGC database (e) analyses, and the risk score equation was verified to be clinically significant in the ICGC database. A nomogram was constructed to estimate the patient's one-, two-, and three-year survival probability. Enrichment analysis showed that the DEGs between the high-and low-risk groups were involved in immune-related pathways. We conducted ssGSEA and found that the level of immune infiltration and the activity of related pathways were decreased significantly in the high-risk group.
Among the four key PRGs obtained, cathepsin V (CTSV) promotes the proliferation and invasion of tumour cells in breast cancer [23] and colorectal cancer [24], but its role in liver cancer is unclear. CXCL8/ IL-8 is a well-known chemokine that mediates cancer cell motility, invasion and metastasis by promoting epithelial-mesenchymal transition (EMT) [25] while its role in HCC has been little explored. IL-8 has been found to increase cancer proliferation in vitro [26]. In HCC, the IL-8 gene expression can be regulated by transcriptional activation of NF-κB, activation of the ERK, p38 mitogen-activated protein kinase (MAPK) and PI3K pathway [27]. Furthermore, NF-κB activation has been proven to be related to gasdermin D which is the executor of pyroptosis [28]. In Zhao et al. 's study [29], MKI67 was confirmed to be related to the prognosis of liver cancer. Upregulation of MKI67 elevates the degree of immune infiltration of many immune cell subtypes within LIHC, including functional T cells, CD4 + T cells, and CD8 + T cells [30]. However, the molecular mechanism of its involvement in pyrotopia in hepatocellular carcinoma is still unclear. Perforin, a pore-forming protein encoded by the PRF1 gene [31] and plays a crucial role in the killer cell-mediated elimination of virally infected host cells, tumour cells [32]. Perforin plays a dominant role in the CD8 + T cellmediated lysis of HCV-replicating human hepatoma cells [33]. At the same time, in HBV-specific hepatocellular carcinoma, perforin also plays a certain cytotoxic role [34]. Our results indicate that these four key genes are independent factors affecting the prognosis of HCC, and the signature including all these genes has good predictive performance in terms of prognosis and recurrence.
There are some shortcomings in our present study. Our signature should be validated further by performing clinical trials to better evaluate the relationship between the risk-score and pyroptosis in HCC. Besides, specific molecular mechanisms of 4 PRGs in our signature needs further examination.
In conclusion, our study suggests that the prognosis of HCC is closely related to pyroptosis. We constructed a promising PRG prognostic model to predict the prognosis of HCC patients. It will be a helpful reference for clinical and treatment decision-making. At present, there are few studies on pyroptosis, especially its mechanism in HCC. The four genes in the signature may play an important role in pyroptosis in HCC. Our study preliminarily confirmed the prognostic value of these PRGs in HCC.

Conclusion
In summary, we identified pyroptosis-related DEGs between HCC and normal tissues and provided a signature to evaluate the prognosis of HCC based on pyroptosis-related genes.