Identification of intestinal flora-related key genes and therapeutic drugs in colorectal cancer

Background Colorectal cancer (CRC) is a multifactorial tumor and a leading cause of cancer-specific deaths worldwide. Recent research has shown that the alteration of intestinal flora contributes to the development of CRC. However, the molecular mechanism by which intestinal flora influences the pathogenesis of CRC remains unclear. This study aims to explore the key genes underlying the effect of intestinal flora on CRC and therapeutic drugs for CRC. Methods Intestinal flora-related genes were determined using text mining. Based on The Cancer Genome Atlas database, differentially expressed genes (DEGs) between CRC and normal samples were identified with the limma package of the R software. Then, the intersection of the two gene sets was selected for enrichment analyses using the tool Database for Annotation, Visualization and Integrated Discovery. Protein interaction network analysis was performed for identifying the key genes using STRING and Cytoscape. The correlation of the key genes with overall survival of CRC patients was analyzed. Finally, the key genes were queried against the Drug-Gene Interaction database to find drug candidates for treating CRC. Results 518 genes associated with intestinal flora were determined by text mining. Based on The Cancer Genome Atlas database, we identified 48 DEGs associated with intestinal flora, including 25 up-regulated and 23 down-regulated DEGs in CRC. The enrichment analyses indicated that the selected genes were mainly involved in cell–cell signaling, immune response, cytokine-cytokine receptor interaction, and JAK-STAT signaling pathway. The protein–protein interaction network was constructed with 13 nodes and 35 edges. Moreover, 8 genes in the significant cluster were considered as the key genes and chemokine (C-X-C motif) ligand 8 (CXCL8) correlated positively with the overall survival of CRC patients. Finally, a total of 24 drugs were predicted as possible drugs for CRC treatment using the Drug-Gene Interaction database. Conclusions These findings of this study may provide new insights into CRC pathogenesis and treatments. The prediction of drug-gene interaction is of great practical significance for exploring new drugs or novel targets for existing drugs.

to improve clinical treatment over the past few decades, CRC still poses an enormous threat to human health. For example, the median overall survival rarely exceeds 6.8 months for patients with metastatic CRC [3,4], and the 5-year survival rate remains only 10% [5]. Therefore, targeting novel pathways is indispensable for further improvement in the prognosis of patients with CRC.
CRC is generally considered to be a multifactorial disease involving diet, inflammatory process, genetic alteration, and environmental factor [6,7]. The human intestinal tract contains 10 to 100 trillion florae, which is 10 times more than the number of total human body cells. There is increasing evidence that patients with CRC harbor a distinct microbiota. Compared with healthy individuals, the probiotics, including Bifidobacterium and Lactobacillus acidophilus, are decreased while the pathogenic bacteria, including Bacteroides/Prevotella and Enterococcus faecalis, are increased in patients with CRC [8][9][10][11]. Previous studies have confirmed the role of intestinal flora as an environmental factor for the carcinogenesis of CRC [12,13]. Intestinal flora dysfunction can induce abnormal immune reactions, resulting in a special immune microenvironment in colorectal tissue and producing carcinogenic metabolites that induce DNA damage and gene mutations in host cells [14]. For example, alterations in the microbiota can drive the upregulation of interleukin-17c in intestinal epithelial cells during intestinal tumorigenesis. Microbiota-driven IL-17c promotes cell survival and tumorigenesis by inducing the expression of Bcl-2 and Bcl-xl in intestinal epithelial cells [15]. The enzymes produced by Clostridium bacteria induce increased secondary bile acids that lead to ROS production and DNA damage. These changes can lead to non-integer multiples, KRAS mutations, and micronuclei [16]. However, the current knowledge on the mechanism by which intestinal flora influences the pathogenesis of CRC remains scarce.
Due to the existence of tens of thousands of biomedical journals, the field of biomedicine is flooded with new articles, leading to information overload. Therefore, there is increasing interest in text mining, an efficient approach that enables the identification of biologically relevant entities, such as genes and diseases, complex biological relations, and comprehensive networks. Researches in multifarious fields have been performed using text mining, ranging from pattern-matching methods to molecular events extraction [17][18][19]. Yet to date, the approaches of text mining have not addressed how intestinal flora has an impact on the development of CRC.
In this study, we extracted the genes associated with intestinal flora using text mining. Then, we generated common genes by combining these extracted genes and differentially expressed genes (DEGs) between CRC and normal samples. With further analyses of the functional enrichment and protein-protein interaction (PPI), we identified 8 potential target genes. Finally, candidate drugs for CRC treatment were derived using the Drug-Gene Interaction database (DGIdb).

Gene collection
The web server GenCLiP 3 (https ://ci.smu.edu.cn/gencl ip3/analy sis.php) was used to perform text mining. The inclusion criteria for screening were: (i) literature published from 1980 to 2020; (ii) human genes that co-occur with "gut microbiota" or "intestinal flora" in sentence (iii) genes in MEDLINE. Subsequently, a list of genes associated with intestinal flora was extracted. To identify the DEGs between CRC and normal samples from The Cancer Genome Atlas (TCGA) database, the matrix data of gene expression levels were analyzed with the limma package of the R software. Log 2 (fold change) > 2 or < − 2 and false discovery rate < 0.05 were set as the threshold for filtering DEGs. VennDiagram package was used to identify the intersection of the two aforementioned gene sets.

Functional enrichment analyses
Using the online tool Database for Annotation, Visualization and Integrated Discovery (DAVID), enrichment analyses of Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway were performed for the intersection genes. P value < 0.05 was considered as statistically significant.

PPI network analysis
The PPI network of the selected common genes was constructed using the online STRING (https ://strin g-db. org/) and then was visualized using Cytoscape v3.7.2. A confidence score of 0.4 or higher was set as the cut-off criterion. The Molecular Complex Detection (MCODE) plug-in of Cytoscape was used to classify the significant node clusters with the default. The genes in the most significant cluster were selected as key genes for overall survival and drug-gene interaction analyses.

Correlation of the key genes with overall survival
Gene Expression Profiling Interactive Analysis (GEPIA) is a web-based tool that is freely available to users [20]. Based on TCGA data, GEPIA contains multiple modules, including analyses of the relationship between gene expression and overall survival. We used the GEPIA to analyze the correlation of the key genes with overall survival of CRC patients.

Drug-gene interaction
The screened key genes were served as the potential targets for searching drugs. DGIdb database (v3.0, https :// www.dgidb .org/), a freely available database, was used to identify known and potential drug-gene interactions. Finally, the FDA approved antineoplastic drugs were considered as the potential drugs for CRC treatment.

Identification of intestinal flora-related DEGs in CRC
According to the inclusion criteria, 518 genes were identified within 1110 published papers. To explore whether intestinal flora influences the pathogenesis of CRC by regulating the gene expression, we then identified the DEGs between CRC and normal samples from the TCGA database. Compared with the normal samples, there were 1665 and 2235 DEGs in colon adenocarcinoma (COAD) and rectal adenocarcinoma (READ), respectively. Moreover, there were 25 up-regulated genes ( Fig. 1a) and 23 down-regulated genes ( Fig. 1b) in common between the DEGs and intestinal flora-related genes. The 48 common genes listed in Table 1 were selected for further analysis.

Functional enrichment analyses
Using the DAVID tool, we identified 55 GO terms and 5 KEGG terms in which the 48 intestinal flora-related DEGs enriched significantly (P < 0.05). The results showed the top 5 significant enrichment terms for biological processes, cellular component and molecular function, and 5 KEGG pathway terms (Fig. 2). As listed in Table 2, in the biological processes annotation, selected DEGs are mainly involved in cell-cell signaling, immune response, inflammatory response, digestion, and G-protein coupled receptor signaling pathway. In the cellular component annotation, DEGs are mainly involved in extracellular space, extracellular region, plasma membrane, extracellular exosome, and apical plasma membrane. As for molecular function, DEGs are mainly involved in hormone activity, growth factor activity, cytokine activity, receptor binding, and chemokine activity.
Of the 5 enriched pathways, the two most significant pathways were cytokine-cytokine receptor interaction (P = 2.22E−05) and JAK-STAT signaling pathway (P = 0.006). Additional highly enriched relevant pathways were rheumatoid arthritis, fat digestion and absorption, and TNF signaling pathway (Table 3).

PPI network construction and survival analysis
From the KEGG pathway analysis, 13 genes were selected. With these 13 genes, the PPI network analysis was performed using STRING and then was visualized using Cytoscape v3.7.2 (Fig. 3a). There were 13 nodes and 35 edges in the PPI network (PPI enrichment P < 1.0E−16). Moreover, the most significant cluster network (score = 7.143) was created using MCODE plug-in, consisting of 25 edges and 8 nodes/genes (CSF2, CXCL1, MMP3, CXCL10, IL23A, CXCL8, IL22, and IL6R) (Fig. 3b). Further, we used the GEPIA to analyze the correlation of the 8 key genes with overall survival of CRC patients. The result showed that only chemokine (C-X-C motif )   ligand 8 (CXCL8) was closely related to the overall survival of patients with CRC (Fig. 3c).

Drug-gene interaction
The 8 potential key genes in the PPI network were selected for drug-gene interaction analysis. A list of 24 drugs meeting the requirements for CRC treatment was compiled, comprising only the antineoplastic drugs that had been approved by the FDA (Table 4). As listed in Table 4, the potential targets of these drugs include CSF2, CXCL8, IL6R, and CXCL10, and 79.2% (19/24) of the drugs target CSF2 and CXCL8. Several drugs have been applied either alone or in combination in clinical trials or treatments for CRC, such as cetuximab, oxaliplatin, bevacizumab, temozolomide, and interferon alfa-2b.

Discussion
The intestinal flora plays significant roles in the formation and development of CRC via producing carcinogenic toxins and metabolites, causing intestinal dysbiosis, and altering immune response [14]. However, current knowledge of the specific mechanism by which intestinal flora influences the pathogenesis of CRC remains limited. In this study, we utilized text mining to extract the intestinal flora-related genes. With further analyses of the functional enrichments and PPI network, we identified 8 key genes associated with intestinal flora and the The most significant module from the PPI network. c Kaplan-Meier curves of overall survival for patients grouped by expression level of CXCL8. PPI, protein-protein interaction; DEGs, differentially expressed genes; CXCL8, chemokine (C-X-C motif ) ligand 8 development of CRC. Finally, candidate drugs targeting 4 key genes (CSF2, CXCL8, IL6R, and CXCL10) were derived using the DGIdb database.
To explore whether intestinal flora contributes to the development of CRC via regulating gene expression, we identified the DEGs between CRC and normal samples. The interaction of the DEGs and intestinal flora-related genes extracted by text mining was selected for further functional enrichments and PPI analyses. Functional enrichment of the 48 selected DEGs in the GO biological process and KEGG pathways highlight their role in immune response, inflammatory response, and intestinal function. Previous studies have confirmed that alteration of intestinal flora can promote CRC development by inducing inflammation, immune suppression, and attacking the gut barrier system [21][22][23]. Moreover, intestinal flora has been proven to promote tumorigenesis and chemoresistance of CRC via regulating gene expression [24][25][26]. Therefore, the mechanism of intestinal flora contributing to the development of CRC is complex and multifaceted.
Using the MCODE plug-in, we created the most significant cluster network, including 8 genes. 87.5% (7/8) of these genes enriched in the cytokine-cytokine receptor interaction pathway, indicating that interactions between the intestinal flora and immune system contribute to the development of CRC.
Drug resistance has been a Gordian knot in the treatment of cancer. Therefore, we identified a list of 24 drugs with the potential therapeutic efficacy against CRC. Among the 8 key genes, the potential gene targets of these drugs are CSF2, CXCL8, IL6R, and CXCL10, and most of the drugs were CSF2 and CXCL8 inhibitors. As a cytokine, CSF2 can stimulate the recruitment and maturation of dendritic cells to induce protective immunity and then exert anti-tumor effects [27]. Yet in the tumor microenvironment, CSF2 is often up-regulated and suppress the immune response, resulting in a poor prognosis for patients [27,28]. Another cytokine, CXCL8, has been proven to be associated with chemoresistance of CRC [29]. The increased expression of CXCL8 induced by anti-cancer drugs, such as doxorubicin and cisplatin, can upregulate the expression of ATP-binding cassette transporters, resulting in poor chemotherapeutic response [29]. Moreover, a high level of CXCL8 predicted poor overall survival in patients with CRC [30], which is contrary to our findings in this study. This discrepancy probably attributes to the expression characteristics of CXCL8 in different cells within CRC tissues. The tumor microenvironment is a complex system that contains infiltrating immune cells, epithelial cells, and fibroblasts, as well as tumor cells. In addition, CXCL8 is a proinflammatory cytokine produced by tumor cells, neutrophils, and endothelial cells [31]. Oladipo et al. found that CRC patients with CXCL8 positivity in the tumorinfiltrating cells had a significantly improved prognosis compared with patients with negativity [32]. Therefore, we speculated that CXCL8 expression in infiltrating cells was dominant in CRC tissues that were obtained from the TCGA database. Among the listed drugs, cetuximab was considered as a prospective drug for CRC therapy thanks to its ability to decrease CXCL8 expression other than inhibiting the epidermal growth factor receptor [33,34]. IL6R is the receptor of IL-6, which forms a dimer with glycoprotein-130. IL-6 binds the IL6R to initiate the IL-6 signaling that transduces intracellular signals via activation of the JAK-STAT3 pathway [35]. Research has suggested that the IL-6 signaling pathway plays an important role in the development and chemoresistance of various cancers, including CRC [36][37][38]. Accordingly, IL6R has been proposed as a promising target for CRC treatment. IL6R antagonist antibody, tocilizumab, could significantly reduce viability and enhance the apoptosis of CRC cells by blocking the IL-6/STAT3 pathway [39].
CXCL10 is a member of interferon-inducible proteins, which is increasingly being considered as a pro-tumorigenic factor in various cancers, including CRC [40]. Besides, elevated serum CXCL10 was associated with liver metastasis and poor prognosis in CRC [41]. Thus, CXCL10 may be a potential therapeutic target for CRC.

Conclusions
In conclusion, we presented a novel method to explore the molecular mechanism underlying the effect of intestinal flora on CRC. Importantly, we identified 8 potential key genes and 24 candidate drugs. Ten of the 24 drugs have not been tested in CRC, which not only provides a theoretical basis for new trials but also provides new insights into targeting drug discovery. However, our study has some limitations. First, a limitation of the present study is its retrospective nature. All the findings were generated based on the published literature and public database. More prospective studies should be required to verify our findings. Second, it was not clear whether the decrease in probiotics or the increase in pathogenic bacteria caused the DEGs in CRC. Therefore, further mechanistic study of intestinal flora-related genes is encouraged.