MicroRNA related prognosis biomarkers from high throughput sequencing data of kidney renal clear cell carcinoma

Kidney renal clear cell carcinoma (KIRC) is the most common type of kidney cell carcinoma which has the worst overall survival rate. Almost 30% of patients with localized cancers eventually develop to metastases despite of early surgical treatment carried out. MicroRNAs (miRNAs) play a critical role in human cancer initiation, progression, and prognosis. The aim of our study was to identify potential prognosis biomarkers to predict overall survival of KIRC. All data were downloaded from an open access database The Cancer Genome Atlas. DESeq2 package in R was used to screening the differential expression miRNAs (DEMs) and genes (DEGs). RegParallel and Survival packages in R was used to analysis their relationships with the KIRC patients. David version 6.8 and STRING version 11 were used to take the Gene Ontology and Kyoto Encyclopedia of Genes and Genomes pathway enrichment analysis. We found 2 DEGs (TIMP3 and HMGCS1) and 3 DEMs (hsa-miR-21-5p, hsa-miR-223-3p, and hsa-miR-365a-3p) could be prognosis biomarkers for the prediction of KIRC patients. The constructed prognostic model based on those 2 DEGs could effectively predict the survival status of KIRC. And the constructed prognostic model based on those 3 DEMs could effectively predict the survival status of KIRC in 3-year and 5-year. The current study provided novel insights into the miRNA related mRNA network in KIRC and those 2 DEGs biomarkers and 3 DEMs biomarkers may be independent prognostic signatures in predicting the survival of KIRC patients.


Background
Kidney Cancer is one of the most common malignancies which account for 2.2% of all new cancer cases and 1.8% of all cancer related death in 2018 globally [1]. Renal cell carcinoma (RCC) accounts for 90% of all kidney cancers [2]. RCC could be divided into three categories: kidney renal clear cell carcinoma (KIRC), kidney renal papillary cell carcinoma (KIRP) and kidney chromophobe (KICG). Among them, KIRC is the most common type which accounts for about 70-75% of RCC [2]. Due to the radiotherapy and chemotherapy resistance, surgical treatment is currently the most effective way for KIRC patients [3]. However previous studies indicated that KIRC usually has the worst overall survival rate [2]. Almost 30% of patients with localized cancers eventually develop to metastases despite of early surgical treatment carried out [4]. Therefore, it is necessary to identify suitable prognosis biomarkers for the diagnosis and treatment of KIRC Open Access *Correspondence: xiaoliangxinghnm@126.com † Minjiang Huang and Ti Zhang have contributed equally to this work 1 Hunan University of Medicine, Huaihua 418000, Hunan, People's Republic of China Full list of author information is available at the end of the article even though many prognosis factors have been reported for KIRC.
Cancer stem cells are a subpopulation of cells that has the driving force of carcinogenesis which is a complex and multistep phenomenon and involves accumulation of genetic and epigenetic changes ultimately leading to the development of pathological manifestations [5]. MicroR-NAs (miRNAs) are a family of endogenous, small, non-codingRNAs. MiRNAs could regulate the expression of genes associated with various biological phenomenons such as homeostasis, development, proliferation, differentiation, and apoptosis [6,7]. Deregulated expression and signaling of miRNA have been well-studied in the pathogenesis of various cancers. Aberrant expressions of miRNAs are vital for the initiation and progression of human malignancies as they act as both tumor suppressors and oncogenes [8]. In the present study, we aimed to identify potential prognosis biomarkers to predict overall survival of KIRC.

Data source and data processing
KIRC (71 controls vs 516 cancers) miRNA sequencing data, KIRC (72 controls vs 530 cancers) mRNA sequencing data and the corresponding clinical information were obtained from an open database TCGA. DESeq2 was used to identify the DEMs and DEGs according to |log2FC| > 0.5, basemean > 50, padj < 0.05. And we utilized two different web based tools, miRDB and TargetS-canHuman 7.2, to screen the target genes of miRNAs. The target genes only enriched in two databases could be selected as putative target genes for the next analysis.
Gene ontology (GO) and Kyoto encyclopedia of genes and genomes (KEGG) pathway enrichment analysis DAVID version 6.8 was used to determine the association among target genes. To gain insight into the biological functions of those DEGs, GO and KEGG pathway enrichment analyses were performed.

Protein-protein interaction (PPI) network and module analysis
STRING version 11 was used to assess PPI information. A normal medium confidence interval of 0.4 was used as threshold. Cytoscape_v3.7.2 software was used to visualize the resulting PPI network. The Molecular Complex Detection (MCODE) application was used to select significant modules from the PPI network in Cytoscape_v3.7.2.

Pathologic TNM correlation analysis
After the classification base on the size and/or extent of the main cancer (T1 + 2 and T3 + 4), lymph node metastasis (N0 and N1), and distant metastasis (M0 and M1) respectively, we used Cox regression analysis, Kaplan-Meier curve, and log-rank analysis to further verify the characteristics of the pathologic TNM and the intensity of their correlation with survival. A repeatedmeasure ANOVA followed by Bonferroni post hoc tests or unpaired two-tail Student's t test was used exam the correlation of DEGs with pathologic TNM.

Survival analysis
Median is the number in the middle of a set of data in order. We divided the samples into high expression group and low expression group based on the median. We used RegParallel and survival packages in R to carry out univariate and multivariate Cox regression analysis.

Specific prognostic model construction
After multivariate Cox regression analysis, we constructed specific prognostic models according to previous reports [9]. KIRC patients were divided into low risk group and high risk group depend on the median value of the risk score. Patients whose risk values were higher than the median were classified as high risk, and Patients whose risk values were lower than the median were classified as low-risk. And then we used survival analysis to know the relationship of the models with the survival of KIRC patient. And then we constructed time-dependent receiver operating characteristic (ROC) curves within 1-, 3-, and 5-year and estimated its utility as a prognostic model for predicting the survival status.

Identification of DEMs and DEGs
TCGA is an open access database containing miRNA/ mRNA profiles and the corresponding clinical information. Using Padj < 0.05 and |log2FC|> 0.5 as cut-off criteria for DEMs, 111 DEMs were identified by DESeq2 analysis, including 62 up-regulated DEMs and 59 down-regulated DEMs (Fig. 1a). Using basemean > 50, Padj < 0.05 and |log2FC| > 0.5 as cut-off criteria for DEGs, 8694 DEGs were identified by DESeq2 analysis, including 5288 up-regulated DEMs and 3406 down-regulated DEMs (Fig. 1b). Then we performed survival analysis for those 111 DEMs and found that 40 DEMs were correlated with the overall survival of KIRC patients. Of which 25 DEMs was up-regulated and 15 DEMs was down-regulated (Table 1).

GO function and KEGG pathway enrichment analysis of the DEGs.
To gain a deeper understanding of the selected DEGs, we performed GO and KEGG analysis for those 252 DEGs. There were 65 biological process (BP), 23 cellular component (CC), and 29 molecular functions (MF) that were enriched by GO analysis (such as, Top 3 GO-BP, signal transduction, negative regulation of transcription from RNA polymerase II promoter, and positive regulation of transcription from RNA polymerase II promoter. Top 3 GO-CC, plasma membrane, endoplasmic reticulum, and cell surface. Top 3 GO-MF, protein binding, ATP binding, and transcription factor activity, sequence-specific DNA binding) (Fig. 2a-c, Additional file 1: Table S1). And there were 20 KEGG pathways that were enriched by KEGG analysis, of which 9 signaling pathway was enriched significantly (such as Top 3 signaling pathway, hsa04144: Endocytosis, hsa04510: Focal adhesion, and hsa04914: Progesterone-mediated oocyte maturation) (Fig. 2d, Additional file 1: Table S2).

PPI network construction and module selection.
Using the STRING database and Cytoscape software, a total of 252 DEGs were filtered into the PPI network, containing 252 nodes and 274 edges. According to the view that highly connected genes can have a major impact on disease, we identified 86 DEGs (with   (Fig. 2e). According to their degree of importance, 10 important modules involved 43 DEGs from the PPI network complex were selected for further analysis based on Cytoscape MCODE (Fig. 2f ). By cross analysis of those 86 DEGs and those 43 DEGs, we identified 10 overlap DEGs (ANXA1, BCL11B, CYCS, HLA-G, HMGCS1, RAB5C, RNF123, TIMP3, TRPM4, ZEB2).

Pathologic TNM correlation analysis
We classified the pathologic TNM staging of KIRC patients and conducted an overall survival correlation analysis. The results indicated that pathologic TNM were actually correlated with the overall survival. The KIRC patients with bigger of the size and/or extent of the main cancer (T3 + T4), or lymph node metastasis (N1), or distant metastasis (M1) displayed the worse overall survival rate ( Fig. 3a-c). Subsequently, we evaluated their relationship of those 10 overlap DEGs with pathologic TNM, and found that 5 DEGs was confirmed to be associated with pathologic T and pathologic M (Fig. 3d-e).

Specific prognostic model construction
After pathologic TNM correlation analysis, we performed univariate Cox regression analysis for those 10 DEGs. We found that HMGCS1, TIMP3, and RNF123 were correlated with overall survival. The KIRC patients with high expression of HMGCS1, TIMP3, and RNF123 exhibited a better overall survival (Fig. 4a-c). Then, we performed multivariate Cox regression analysis for HMGCS1, TIMP3, and RNF123. And the result showed that TIMP3 (p = 0.0001) and HMGCS1 (p = 0.0110) were still correlated with the overall survival of KIRC patients, of which the expression of TIMP3 (logFC = − 0.766, p < 0.001) and HMGCS1 (logFC = − 0.643, p < 0.001) were decreased significantly. Followed the multivariate Cox regression analysis, we constructed a prognostic model by using TIMP3 and HMGCS1. The patients with low risk actually exhibited a better overall survival (Fig. 4d). The time-dependent receiver operating characteristic (ROC) curves had area under curve (AUC) values higher than 0.5, which were 0.7487, 0.6893, and 0.6893 respectively (Fig. 4e-g).

Discussion
Previous study indicated that 2.2% of all new cancer cases and 1.8% of all cancer related death was RCC in 2018 globally [1]. And almost 30% of KIRC, the main type of RCC, with localized cancers eventually develop to metastases despite after surgical treatment [4]. Therefore, it is necessary to identify DEGs as suitable prognosis biomarkers for the diagnosis of KIRC. The expression of mRNA was affected by many factors, such as miRNA, lncRNA, methylation, and so on. In the present study, we just focus on the miRNAs. MiRNAs, a class of small noncoding RNAs of ∼22nt in length, are firstly identified in 1993 in Caenorhabditis eleganss. Increasing scientific reports demonstrate that miRNAs are involved in the regulation of almost all the biological phenomena in various species which repress the target transcripts through partial complementarity [5]. Abnormal expression of miRNAs is closely related to the pathogenesis of most human diseases, including cancer [8,10]. Previous studies have shown that miRNAs are involved in the initiation, progression, and prognosis of various cancers. For instance, Zhang et al. found that miR-1246 and miR-1290 are critical for the tumor initiation and progression of lung cancer [11]. Wang et al. found that miR-200c targets CDK2 and suppresses tumorigenesis in renal cell carcinoma [12]. Therefore, the aim of this study was to find the biomarkers related correlated to abnormal miRNA  expression, which has an important role in the diagnosis and prognosis for various cancers. In the present study, we found 20 DEMs and 252 DEGs through correlation analysis. And then, we found 20 KEGG pathways were enriched by functional enrichment analysis, of which 9 signaling pathway was enriched significantly. The other 11 signaling pathways are not significantly enriched, but previous studies also indicated that they also play an important role in the pathogenesis of cancers, such as metabolic related pathways [13][14][15].
Subsequently, we identified that TIMP3 and HMGCS1 were correlated with the overall survival of KIRC patients through bioinformatics analysis. And the constructed prognosis model based on TIMP3 and HMGCS1 could accurately predict the overall survival rate of KIRC patients. TIMP3 (Tissue inhibitor of metalloproteinases-3) belongs to a family of negative regulators of matrix metalloproteinase activity. Previous studies indicated that TIMP3 as a tumor suppressor could modulate tumor migration, invasion, and tumorigenicity [16]. High expression of TIMP3 could promote apoptosis in various tumors [16]. Das et al. found that reduced expression of TIMP3 was observed in 74% of the human malignant melanoma cases [17]. Loss or down regulation of TIMP3 could promote the metastasis, cell growth and invasion of several cancers [18][19][20]. Moreover, Gu et al. and Mylona et al. found that TIMP3 could predict the overall survival rate for hepatocellular carcinoma and breast cancer [21,22]. In the present study, we also found that the expression of TIMP3 was decreased significantly in KIRC. And the survival analysis also indicated that TIMP3 was correlated with the overall survival rate of KIRC patients which further reinforce the relationship of TIMP with cancers. The KIRC patients with low expression of TIMP3 displayed worse overall survival rate. All of these results reinforced the relationship of TIMP3 with cancers. By retrospective analysis, we found hsa-miR-21-5p was also correlated with overall survival which could be a potential prognosis biomarker for KIRC. Park et al. found hsa-miR-21-5p was more highly expressed in the recurrence group than in the nonrecurrence group of gastric cancer [23]. Chang et al. found that hsa-miR-21-5p may exert protective phenotypes by targeting breast oncogenes that contribute to patient survival [24]. HMGCS1 (3-hydroxy-3-methylglutaryl-CoA synthase 1) is a potential regulatory node in the mevalonate pathway, whose up-regulation is a common transcriptional event in cancer stem cell enriched subpopulations of breast cancer cell lines [25]. Wang et al. also found the expression of HMGCS1 is increased significantly in stomach adenocarcinoma samples of patients and tumorspheres of gastric cancer cells [26]. Additionally, previous reports demonstrated that krüppel-like factors (KLFs) could regulate the expression of various substrates. Yao et al. found that KLF13 was downregulated in colorectal cancer tissues and colorectal cancer cell line [27]. KLF13 knockdown could effectively promote cell proliferation and colony formation. Opposite results were observed in KLF13 overexpressed cells [27]. In the further research, Yao et al. found that KLF13 transcriptionally inhibited HMGCS1 and knockdown of HMGCS1 could suppress the proliferation of colorectal cancer [27]. But in the present study, we found that the expression of HMGCS1 was decreased significantly in KIRC. And by retrospective analysis, we found HMGCS1 correlated miRNAs hsa-miR-223-3p and hsa-miR-365a-3p were correlated with the overall survival of KIRC. The expression of hsa-miR-223-3p and hsa-miR-365a-3p were increased significantly in KIRC. Previous studies indicated that hsa-miR-223-3p and hsa-miR-365a-3p could promote proliferation, migration and invasion of cancer cells [28][29][30][31]. But what very interesting is that previous studies also indicated that hsa-miR-223-3p and hsa-miR-365a-3p could inhibit the invasion and migration of cancer cells [32][33][34][35]. All of those results suggested that the same gene or miRNA may play different roles in different cancers.

Conclusion
In the present study, we found 2 DEGs and 3 DEMs could be the candidate prognosis biomarkers for KIRC patients. The constructed risk models based on 2 DEGs and 3 DEMs could accurately predict the outcome. We just provided an analysis direction depended on theoretical knowledge and clinical outcomes, more scientific research, especially clinical studies, were needed to confirm our findings.