Gene Biomarker Prediction in Glioma by Integrating scRNA-seq Data and the Gene Regulatory Network

Background: Although great efforts have been made to study the occurrence and development of glioma, the molecular mechanisms of glioma are still unclear. Single-cell sequencing technology provides a new perspective for researchers to explore the pathogens of tumors to further help make treatment and prognosis decisions for patients with tumors. Methods: In this study, we proposed an algorithm framework to explore the molecular mechanisms of glioma by integrating single-cell gene expression proles and gene regulatory relations. First, since there were great differences among malignant cells from different glioma samples, we analyzed the expression status of malignant cells for each sample, and then tumor consensus genes were identied by constructing and analyzing cell-specic networks. Second, to comprehensively analyze the characteristics of glioma, we integrated transcriptional regulatory relationships and consensus genes to construct a tumor-specic regulatory network. Third, we performed a hybrid clustering analysis to identify glioma cell types. Finally, candidate tumor marker genes were identied based on cell types and known glioma-related genes. Results: We got six identied cell types using the method we proposed and for these cell types, we performed functional and biological pathway enrichment analyses. The candidate tumor marker genes were analyzed through survival analysis and veried using literature from PubMed. Conclusions: The results showed that these candidate tumor marker genes were closely related to glioma and could provide clues for the diagnosis and prognosis of patients with glioma. In addition, we found that four of the candidate tumor marker genes (NDUFS5, NDUFA1, NDUFA13, and NDUFB8) belong to the NADH ubiquinone oxidoreductase subunit gene family, so we inferred that this gene family may be strongly related to glioma.


Background
Malignant tumors have a very large impact on human health due to their high mortality rate and high recurrence rate. There are many factors that affect tumorigenesis, including genetic variation, epigenetics, and external environmental in uences. Glioma is the most common type of brain tissue tumor in complex diseases and accounts for approximately 40% of brain tissue tumors [1]. Therefore, it is of great importance to explore the molecular mechanisms of glioma, as these may help researchers develop glioma treatment strategies and drugs.
In recent years, many researchers have focused on the molecular mechanisms of glioma. Hu et al.
constructed a coexpression network by calculating the differentially expressed genes (DEGs) between 971 glioma samples and 102 normal samples, and functional and pathway enrichment analyses indicated that the p53 signaling pathway and the pathway of neuroactive ligand-receptor interaction may play important roles in the progression of glioma, and three genes (PUS7, EFR3B and NRCAM) were potential biological agent landmarks [2]. Niu et al. used protein-protein interaction networks to screen key DEGs and then applied machine learning methods to reveal the molecular mechanisms of glioma [3].
Wang et al. integrated gene interaction information into a weighted random survival forest method to perform an accurate survival prediction and to discover a survival biomarker for glioma [4]. Zhou et al. identi ed the glioma-speci c protein interaction network based on bulk RNA-seq data and performed enrichment analysis to verify disease-speci c molecular complexes [5]. Due to the complexity of glioma, more genetic markers need to be discovered.
Recent advances in micro uidic technology have made it possible to isolate a large number of cells, and single-cell sequencing (scRNA-seq) data analysis has become one of the most noteworthy technical elds in bioinformatics [6][7][8]. The resolution of scRNA-seq technology is accurate to a single cell, can resolve more subtle differences among cells and is widely used in biology, including development [9,10], infectious diseases [11,12], immunity [13,14], neurology [15] and oncology [16][17][18][19][20]. Cell type identi cation and/or rare cell type prediction based on scRNA-seq data can deepen the understanding of tumors and analyze the process of tumor occurrence [21]. At present, many methods have been proposed to identify cell types. For example, Kiselev et al. proposed a method for consistent clustering of single cells [22]. Wang et al. proposed Single-cell Interpretation via Multi-kernel LeaRning (SIMLR), which is based on single-cell data and multicore learning similarity measures. They used downscaling and clustering to analyze cell types [23]. Kim et al. implemented a semi-supervised learning classi cation tool, scReClassfy, to ne tune cell type annotations generated using any method in single-cell sequence datasets [24]. Lin et al. adopted an implicit missing value processing method to reduce the impact of dropout values in scRNA-seq data and achieved rapid and accurate cell type identi cation [25]. Grun et al. designed the RaceID method to identify rare cell types in complex single-cell populations through k-means and outlier detection methods [26]. Most of these methods directly identi ed cell types based on single-cell gene expression data without integrating multi-omics data.
In addition, gene expression levels are affected by a variety of regulatory factors, and it is also crucial for the treatment and prevention of complex diseases to understand the disturbance of transcriptional regulatory relationships. In terms of regulatory mechanisms, a transcription factor (TF) is a key gene regulatory factor that mainly activates or inhibits gene expression during the transcriptional stage. TFs participate in many important cellular processes, such as cell proliferation and cell differentiation. These cellular processes may affect the development of many complex diseases, including tumors [27]. For example, Zhang et al. reconstructed a multilayer signaling network that contains pathways from intercellular ligand-receptor interactions, intracellular TFs and their target genes. In this way, they discovered a new multilayer network biomarker (MNB) that was indicated to be valuable for the prognosis and prediction of glioma patients [28].
To further analyze the molecular mechanisms of glioma, in this study, we identi ed multiple cell types and candidate tumor marker genes in glioma by integrating scRNA-seq data and transcriptional regulation pairs. Through gene enrichment analysis, survival analysis and PubMed analysis, our results showed that our method has an effective performance and provides clues for the diagnosis and prognosis of patients with glioma. To explore the molecular mechanisms of glioma, we downloaded the single-cell gene expression data with EXP0062 from the CancerSEA database [29]. The data contain a single-cell gene sequencing pro le of 4044 tumor malignant cells, in which all malignant cells were derived from six glioma samples, and the tissue source of the samples was oligodendrocytes. The sample IDs are MGH36, MGH53, MGH54, MGH60, MGH93 and MGH97, respectively. The CancerSEA database uses methods such as copy number variation inference on the original single-cell data to ensure that all cells in the data set are tumor malignant cells.

Transcriptional regulation pairs
Gene transcriptional regulation pairs were collected from the HTRIdb [30] and TRRUST [31] databases. For HTRIdb, we collected 51871 regulation pairs, and for TRRUST, we collected 8427 regulation pairs. The regulation pairs were the pairs between TFs and the regulatory targets (TARGETs). TARGETs contain target genes and target TFs. Therefore, we divided the regulation pairs into TF-TF pairs and TF-gene pairs, according to whether a TARGET is a TF or gene. Finally, we obtained 952 TFs, 17600 target genes, 5694 TF-TF pairs and 53408 TF-gene pairs.

Known glioma-related genes
We collected known cancer-related genes from the Online Mendelian Inheritance in Man (OMIM) [32] and the Catalogue Of Somatic Mutations In Cancer (COSMIC) [33] databases. OMIM is an authoritative database focusing on the relationship between disease phenotypes and genotypes and contains cancerrelated genes with high con dence. COSMIC is a comprehensive somatic mutation database that contains thousands of somatic mutation information related to cancer development. In addition, we obtained known cancer-related genes from Bailey's research results [34]. This research uses 26 different bioinformatics tools to analyze somatic mutations in a variety of cancers and provides services for cancer research. In total, we obtained 77 KGGs.

Bulk RNA-seq of gene expression data and clinical data from glioma
Bulk RNA-seq of gene expression data and clinical data from glioma were obtained from The Cancer Genome Atlas (TCGA) [35]. The clinical data contained overall survival (OS) data. To analyze the data more effectively, we retained samples that had a tissue type of oligodendrocytes only. In this way, the tissue type of the samples in bulk RNA-seq data was consistent with the tissue type of the samples in the single-cell gene expression data. In the end, we obtained 198 glioma samples. Then, to improve the data quality, we deleted genes whose expression values were less than 1 in more than half of the samples.
Finally, to mitigate the in uence of different samples on the expression level and avoid the in uence of overcapturing features with extreme values and outliers, the z-score was used to normalize the gene expression values in the bulk RNA-seq expression data.

Preprocessing of single-cell gene expression data
In the single-cell gene expression data, there are signi cant differences in tumor malignant cells among different patient samples. To comprehensively analyze tumor characteristics, we rst explored the expression status of malignant cells in a single sample. Therefore, the original single-cell gene expression data were split according to the sample source to obtain multiple single-sample single-cell gene expression data.
Next, we cleaned the single-sample single-cell gene expression data from the perspective of cells and genes. First, the number of cells and genes were tted to a normal distribution, and cells with signi cantly fewer expressed genes were deleted (FDR < 0.05). Then, the genes that had an expression value detected in at least 3 cells and had an average normalized expression value greater than 10 − 5 were retained. To effectively improve the signal-to-noise ratio, the genes affected by technical noise were ignored. We performed the M3Drop feature selection method [36] and obtained the feature genes of single-sample single-cell gene expression data with FDR < 0.01. Finally, we normalized each expression data with a logarithmic function of offset 1.
After preprocessing, we obtained a total of 6 single-sample single-cell gene expression data. In MGH36, there were 694 cells and 4608 feature genes. In MGH53, there were 726 cells and 4126 feature genes. In MGH54, there were 1174 cells and 4732 feature genes. In MGH60, there were 428 cells and 3609 feature genes. In MGH93, there were 440 cells and 3879 feature genes. In MGH97, there were 582 cells and 4113 feature genes.

Identi cation of tumor consensus genes
Differences among samples of tumor malignant cells may affect the identi cation of genetic markers, so we rst analyzed each single-sample single-cell gene expression data. First, we performed PCA on each single-sample single-cell gene expression data to determine the appropriate principal components. To prevent excessive capture of certain genes with large values, the z-score was used to normalize gene expression data. In addition, the criteria for determining the number of principal components were as follows: (1) the cumulative contribution rate was greater than 90%, and (2) the difference between two consecutive principal components was less than 0.1%. We used the minimal number in the numbers obtained from condition (1) and condition (2) as the nal number of principal components.
Then, we adopted the idea of the k-nearest neighbors to construct a cell-speci c network within a single sample. The Euclidean distance was used to calculate the distance between all cell pairs. The k-nearest neighbor relationships were determined for each cell, and the similarity between the two cells was calculated by Jaccard coe cients. Next, Louvain clustering [37] was used to achieve the initial division of cells in a single sample. The results of the initial division helped to analyze the expression status of malignant cells in a single sample. We constructed a cell-speci c network with k as 20 and used the Seurat package [38] to complete the clustering process. According to each cell cluster in the initial division of each sample, the cells were divided into the cells belonging to the cell cluster and the remaining cells. We then performed the Limma package [39] to calculate the DEGs for each sample (lg2 | FC |> 1, p-value < 0.05).
If a gene was a differential gene in multiple samples, the gene re ected the coexpression pattern among samples to some extent. We selected tumor consensus genes by screening the genes that were differentially expressed in at least 30% of the samples.

Identi cation of tumor cell types
Each TF in the speci c regulatory network was used to form its corresponding regulation meta module (RMM). Each RMM included all target genes and other TFs directly regulated by the core TF. Then, based on the entire single-cell gene expression data that contained all cells from different samples, the RMM was regarded as a new feature of malignant cells to construct a speci c regulation expression matrix, in which the feature value was calculated by the Cell Score Method [40]. Then, hybrid clustering was used to identify the glioma cell types based on the matrix. The canopy clustering algorithm [41] was rst performed on all malignant cells to provide the k value and initial clustering center for k-means clustering. Then, k-means clustering was used to identify cell types. The Calinski-Harabaz (CH) coe cient was used to tune the parameters of the hybrid clustering, and the larger the CH value was, the better the clustering results.
In the identi ed cell types, we combined the entire single-cell gene expression data after M3Drop feature selection and divided all cells into two groups for each cell type: the cells that belonged to the cell type and the remaining cells. The ROTS method [42] was performed to obtain the marker genes for each cell type.

Gene set enrichment analysis
To further analyze the functional characteristics of cell types, GO functional enrichment and pathway enrichment analyses were conducted for marker genes from these cell types. We applied the Metascape tool [43] for enrichment analysis, which mainly provided ve forms of gene annotations, including GO biological processes, Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway, Reactome pathway database, Reactome canonical pathways and CORUM.

Overview of the computational framework
We proposed a computational framework, which consisted of four steps (Fig. 1), to gain insight into the molecular mechanisms of glioma.
Step 1. Preprocessing of single-cell gene expression data. The original single-cell gene expression data were split according to the sample ID. Then, we preprocessed the gene expression data through data cleaning, feature selection and standardization.
Step 2. Identi cation of tumor consensus genes. For each single-sample single-cell gene expression data, we explored the gene expression patterns of all malignant cells through principal component analysis (PCA), cell-speci c network construction, and differential gene identi cation. Then, based on the overlapping degree of the differential genes among samples, tumor consensus genes were identi ed.
Step 3. Identi cation of tumor cell types. We combined the gene expression pro les of each sample and integrated transcriptional regulatory pairs. As a result, a speci c regulatory network was built based on tumor consensus genes and feed forward loops (FFLs). Finally, the single-cell speci c regulatory expression matrix was constructed, and a hybrid clustering method was used to obtain the cell types of glioma.
Step 4. Identi cation of candidate tumor marker genes. The marker genes of the cell types were regarded as candidate genes, and then the tumor eigenvector was calculated by known glioma-related genes (KGGs). Finally, the tumor marker genes were identi ed according to the degree of correlation between the candidate genes and the tumor eigenvector.

Differential analysis of single-cell gene expression data among samples
We performed data cleaning and feature selection on the single-cell gene expression data, and then tdistributed stochastic neighbor embedding (TSNE) [44] was used to visualize the malignant cell clusters.
Each point in Fig. 2 represents a cell, and each color represents a tumor sample. All of the tumor malignant cells were clustered according to their tumor sample source, and there were almost no mixing results of multiple tumor sample cells, which was consistent with previous studies [40,45,46]. Thus, there were considerably signi cant differences in malignant cells among samples, which inspired us to conduct our analysis at the sample level.

Analysis of consensus genes
For each sample, the k-nearest neighbors and Jaccard coe cient were used to construct a cell-speci c network, then Louvain clustering [37] was used to obtain the initial division of all cells, and the differentially expressed genes (DEGs) were calculated (Table 1). Finally, consensus genes were identi ed based on the overlapping degree of differential genes among different samples. In this paper, a total of 1123 tumor consensus genes were conservatively conserved by screening differential genes that were present in at least two samples. To show that the overlapping degree of DEGs could describe the co-expression patterns of genes among different samples, we counted the overlapping degree of the DEGs in a certain sample and the other samples (Fig. 3). The x-axis represents the sample source of all malignant cells. For each sample, the DEGs that overlapped with the other 5 samples were calculated, respectively, and the y-axis represents the proportion of the overlapping DEGs in all the DEGs of the sample. Figure 3 shows that the highest percentage of overlap was 75%, the lowest percentage of overlap was 28%, and more than half of the overlap percentages were from 30-50%. The analysis results showed that the DEGs in different samples had a high degree of consistency, which further showed that the consensus genes re ected the common gene expression patterns in different samples.

Speci c regulatory network analysis
TF-TF pairs and TF-gene pairs were obtained from the Human Transcriptional Regulation Interactions database (HTRIdb) [30] and Transcriptional Regulatory Relationships Unraveled by Sentence-based Text mining (TRRUST) [31], and we constricted the target genes as tumor consensus genes. To improve the speci city of the regulatory network, the entire single-cell gene expression data containing all cells from all samples were used to adjust the network links. We rst reassigned the missing values in the entire single-cell gene expression data using the method proposed by Venteicher et al. [47]. The new value of E i,j was proportional to the expected expression of gene i in cell j, which was calculated by the average expression of gene i and the complexity of cell j (the number of detected genes). Then, we calculated the Euclidean distance for each regulation pair based on the entire expression data, and the maximum and minimum normalization was used to shrink the range of distance. We then calculated the similarity of the regulation pairs based on the 1-distance and ltered the pairs whose similarity was less than 0.6. In addition, a feed forward loop (FFL) is an important building block of regulatory mechanisms and is related to the development of tumors, in which one TF M regulates another TF N, and M and N jointly regulate their target gene G. Therefore, we identi ed FFLs in the regulatory network to construct the nal speci c regulatory network.
Ultimately, the speci c regulatory network consisted of 121 TFs, 439 target genes and 2081 regulatory pairs. There were two categories of edges in the speci c regulatory network: 394 TF-TF pairs and 1687 TF-target gene pairs. Each edge that corresponded to two nodes in the network had a tumor-speci c regulation relationship, and the similarity of the edge represented the degree of regulation between the two nodes.
ETS1 was the node with the highest degree in the speci c regulatory network. ETS1 is a protein-coding TF that can act as an activator or inhibitor of multiple genes in a variety of different cellular environments. Moreover, annotations of Gene Ontology (GO) related to ETS1 indicate that the gene participates in various biological functions, such as cell senescence, apoptosis, and cell development, and plays an important role in the occurrence of diseases. ETS1 upregulates the expression of the integrin α5 subunit and mediates intracellular signal transduction and invasion processes, leading to the occurrence of malignant glioma [48].

Cell type identi cation
We identi ed 121 regulation meta modules (RMMs) in the speci c regulatory network, and then the RMMs were considered as single-cell features to obtain the speci c regulation expression matrix. Next, we used hybrid clustering to identify the cell types, and reproducibility-optimized test statistic (ROTS) method [42] was used to identify the marker genes of different cell types. The process of cell type identi cation fully considered the differences among tumor samples from malignant cells and the effect of transcriptional regulatory mechanisms on gene expression pro les. The resulting 6 cell types identi ed are shown in Table 2 and were named cell types A to F. To further analyze the functional and biological signi cance of different cell types, we performed enrichment analysis for marker genes in each cell type. Enrichment analysis was used as the priori knowledge such as gene annotation to classify a group of genes, and the classi cation results could help explore whether these genes had certain functions in common and understand the role of genes in life activities. In this study, the Metascape tool [43] was used for analysis. Figures 4 and 5 show the enrichment analysis results for cell types A and D, respectively. The more depth the color of the bar is, the greater the enrichment of the gene. For cell type A, the genes were mainly enriched in biological functions such as glial cell differentiation, the ERBB4 signaling pathway, and nervous system development. Among them, ERBB4 belongs to the ERBB receptor family and plays an important role in the development of the nervous system, and the ERBB growth factor receptor is considered to be a key signaling pathway for many human tumors, including glioma [49]. For cell type D, the genes were mainly enriched in a number of biological functions related to cellular respiration, including aerobic respiration and the negative regulation of respiration involving in ammation. Hypoxia could lead to increased aggressiveness of tumors, and tumor growth, metastasis and resistance to drug treatment greatly improved in the hypoxic microenvironment. There was also some evidence that the hypoxic response plays a key role in the behavior of glioma cells, which is very important for personalized treatment of patients with glioma [50].
The four most enriched entries of cell types B, C and F are shown in Table 3. Table 3 shows that cell type B was mainly involved in a variety of cellular metabolic activities; cell type C was mainly related to apoptosis, inhibition of cell growth and other biological functions; and cell type F was associated with biological functions related to multiple ribosomal proteins. Cell metabolism, apoptosis, and disturbance of ribosomal proteins could cause many complex diseases, including cancer. In addition, since only 4 marker genes were found in this cell type, there were no related enrichment items. Howerer, two of these genes are known cancer-related genes, indicating that cell type E may also be related to the development of glioma. Metascape enrichment analysis indicated that each cell type had unique functionality, and the marker genes may be closely related to the occurrence and treatment of glioma.

Candidate tumor marker gene analysis of glioma
Assuming that KGGs are speci cally expressed in glioma, we used the rst principal component method to calculate the tumor feature vector (TEV) for all KGGs based on bulk RNA-seq gene expression data.
TEV was a linear combination of all KGG expression vectors, which could represent the expression level of all KGGs. In addition, marker genes of cell types re ected the biological function of glioma and were likely to be the causative molecules of glioma. Therefore, we took all of these genes as candidate genes and calculated the Pearson correlation coe cient (PCC) between the candidate gene and TEV. The greater the absolute PCC value is, the stronger the relationship between the candidate gene and glioma. The absolute PCC value was used as the correlation between the candidate gene and TEV. We analyzed the top 20 genes in detail and de ned them as candidate tumor marker genes of glioma (Table 4). Statistical results showed that the correlations between two candidate tumor marker genes (ATP6V0B and GUK1) were extremely strong, with correlations greater than 0.8. The correlations of the remaining 18 candidate tumor marker genes were strong (between 0.6 and 0.8). Additionally, 11 out of the 20 candidate tumor marker genes were con rmed by relevant medical literature that they had a direct or indirect relationship with glioma (Table 4), which showed that the identi ed candidate tumor marker genes were reliable to a certain extent. In addition, we found that four (NDUFS5, NDUFA1, NDUFA13, and NDUFB8) of the candidate tumor marker genes belong to the NADH ubiquinone oxidoreductase subunit gene family. This gene family plays a key role in the transfer of NADH to the respiratory chain. NADH is the reduced state of nicotinamide adenine dinucleotide and is mainly involved in the metabolism of matter and energy in cells, which plays a key role in maintaining cell growth and differentiation. The studies of Yuan et al. [51] and Trinh et al. [52] showed that NADH is regarded as a new marker to classify glioma cancer cells. Therefore, we inferred that the NADH ubiquinone oxidoreductase subunit gene family may be closely related to glioma.

Survival analysis of candidate tumor marker genes of glioma
To further explore the effect of the expression level of candidate tumor marker genes on the prognosis of gliomas, overall survival (OS) data were used for survival analysis. Speci cally, we used speci c survival time for the Kaplan-Meier (KM) survival curve analysis. Figure 6 shows the results of the KM survival analysis of the most highly correlated tumor marker gene (ATP6V0B). Glioma samples were divided into a high expression group (expression_level = 1) and a low expression group (expression_level = 0), according to the median expression value of the candidate tumor marker gene in bulk RNA-seq pro les. The red curve and the blue curve represent the survival curves of the low and high expression sample groups, respectively. Analysis of the downward trend of the KM curve in Fig. 6 revealed that the interval between the survival curves of the high and low expression samples of ATP6V0B was quite obvious, and the samples with high expression exhibited the worse prognosis.
For each candidate tumor marker gene, we divided samples into a high expression group and a low expression group and analyzed the downward trend of the KM survival analysis between the two curves.
The interval between curves of the two groups clearly indicated that the gene expression level affected the survival of patients with glioma. The survival curve of the high expression sample group was below that of the low expression sample group, indicating that the patients with high gene expression had a worse prognosis, whereas the patients with low gene expression of the gene had a worse prognosis. Table 5 summarizes the KM survival analysis results of candidate tumor marker genes. There were 5 genes (ATP6V0B, MRPL20, NDUFS5, DDRGK1, and SDHB) with high expression levels, and the prognosis of the patient was worse. In addition, there were 5 genes (GUK1, ZNF195, MRPL41, NDUFA13, and BMPR1A) in which the expression level was low, and the prognosis of the patient was worse. The experimental results of survival analysis showed that the identi ed candidate tumor marker genes were reliable and had a strong correlation with glioma, and these genes could provide clues for the diagnosis and treatment of patients with gliomas and further help to understand the molecular mechanisms of glioma.

Discussion
Due to the batch effect or other factors, the individual differences in malignant cells among tumor samples are strong. The identi cation of cell types can essentially be regarded as an unsupervised clustering process. If cluster analysis of malignant cells based on single-cell expression pro les is performed directly, malignant cells from the same individual will often cluster together, and the clustering results may not re ect the tumor cell types. However, there will be consistency among different samples of the tumor. To identify the genes that were coexpressed in the tumor and the genes that expressed heterogeneity in different tumor samples, we analyzed the cell-speci c network from the single-sample level to identify tumor consensus genes and then combined the transcriptional regulatory pairs with multisample cell expression data to identify glioma cell types. Since the marker genes of the cell types have a strong correlation with glioma, we used the correlation assessment method to predict the candidate tumor marker genes that are highly related to glioma. From the analysis results, we concluded that the identi ed candidate tumor marker genes had a strong correlation with glioma. The research results may be helpful for the diagnosis and treatment of patients with glioma, but these predicted candidate tumor marker genes should be veri ed by further biological experiments. Of course, there are some problems in our study. For example, the results were heavily affected by the input gene expression data and noise in the data. In the future, we will integrate more omics data to perform further analyses, such as DNA methylation, noncoding RNA regulation, and protein interactions.

Conclusion
In this study, we have proposed a new framework for identifying candidate tumor marker genes based on single-cell gene expression pro les and transcriptional regulation pairs. The framework mainly contains four steps: preprocessing of single-cell gene expression data, identi cation of tumor consensus genes, identi cation of tumor cell types, and identi cation of candidate tumor marker genes. We have shown the framework's performance by exploring the molecular mechanisms of glioma. For glioma, 6 cell types and 20 candidate tumor marker genes were identi ed. The Metascape enrichment analysis showed that the cell types had signi cant functionality, and the analysis of candidate tumor marker genes showed that it had a strong correlation with glioma. In addition, recent relevant studies have also shown that some candidate tumor marker genes were recognized as targets of glioma, and 4 genes (NDUFS5, NDUFA1, NDUFA13, and NDUFB8) of the candidate tumor marker genes belonged to the NADH ubiquinone oxidoreductase subunit gene family, indicating that this gene family may have a strong correlation with glioma. These ndings contributed to the clinical diagnosis, therapeutic drug development, and pathological mechanisms of glioma.  Overview of the computational framework.     KM survival analysis results of ATP6V0B in OS survival data.