Screening of modules highly associated with immune cells in HIV-1+
The overall workflow of the study is shown in Additional file 1: Figure S1. WGCNA was performed on 10 HIV-1− and 30 HIV-1+ samples from the GSE6740 dataset to identify the modules most closely associated with the clinical characteristics of HIV-1+ samples. Before WGCNA, sample clustering was performed to confirm that the dataset contained no outlier samples (Fig. 1A). Clinical traits (HIV-1+, HIV-1−, CD4 cells, and CD8 cells) were retrieved from the GSE6740 dataset and added to the clustering dendrogram (Fig. 1B). The soft-thresholding power was set to 14 (scale-free R² = 0.85), which best satisfied the scale-free topology criterion (Fig. 1C), and the dynamic tree cut algorithm identified 15 modules (Fig. 1D). The module–trait correlation heat map showed that the MEdarkgreen module had the strongest association with the four clinical traits (|correlation coefficient| > 0.4, P < 0.01; Fig. 1E), so it was designated the key module. The MEdarkgreen module contained 225 genes, 19 of which met the criteria |MM| > 0.8 and |GS| > 0.2 and were identified as hub genes (Fig. 1F).
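The hub-gene filter combines module membership (MM, the correlation between a gene's expression and the module eigengene) and gene significance (GS, the correlation between a gene's expression and a clinical trait). WGCNA itself is an R package; the numpy sketch below only illustrates that filter under stated assumptions: Pearson correlation for both MM and GS, the eigengene taken as the first principal component of the standardized module expression, and toy data with hypothetical gene names A, B, and C. It is not the study's code.

```python
import numpy as np


def module_eigengene(expr):
    """First principal component of standardized module expression.

    expr: samples x genes array for the genes of one module.
    """
    z = (expr - expr.mean(axis=0)) / expr.std(axis=0, ddof=1)
    # The first left singular vector (scaled by its singular value)
    # gives the eigengene value for each sample.
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    me = u[:, 0] * s[0]
    # SVD sign is arbitrary; orient the eigengene with mean module expression.
    if np.corrcoef(me, z.mean(axis=1))[0, 1] < 0:
        me = -me
    return me


def hub_genes(expr, gene_names, trait, mm_cut=0.8, gs_cut=0.2):
    """Genes with |MM| > mm_cut and |GS| > gs_cut (Pearson correlations assumed)."""
    me = module_eigengene(expr)
    hubs = []
    for j, name in enumerate(gene_names):
        mm = np.corrcoef(expr[:, j], me)[0, 1]     # module membership
        gs = np.corrcoef(expr[:, j], trait)[0, 1]  # gene significance
        if abs(mm) > mm_cut and abs(gs) > gs_cut:
            hubs.append(name)
    return hubs


# Hypothetical toy data: 10 samples, trait 0 = control, 1 = case.
trait = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1], dtype=float)
gene_a = 4 * trait + np.array([0, 0.1, 0, 0.1, 0, 0, 0.1, 0, 0.1, 0])
gene_b = 4 * trait + np.array([0.1, 0, 0.1, 0, 0.1, 0.1, 0, 0.1, 0, 0.1])
gene_c = np.array([1, -1, 1, -1, 0, 1, -1, 1, -1, 0], dtype=float)  # unrelated to trait
expr = np.column_stack([gene_a, gene_b, gene_c])
example_hubs = hub_genes(expr, ["A", "B", "C"], trait)
print(example_hubs)
```

Genes A and B track the trait and dominate the eigengene, so they pass both cutoffs; gene C has near-zero GS and is dropped.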
Screening candidate genes for HIV-1+
The limma package was used to identify DEGs between HIV-1+ and HIV-1− samples in the GSE6740 dataset, with P < 0.05 as the screening threshold. A total of 524 DEGs were identified in HIV-1+ samples relative to HIV-1− samples, comprising 278 up-regulated and 246 down-regulated genes (Additional file 2: Table S1). These DEGs are shown as a volcano plot and a heat map in Fig. 2A and B, respectively. The DEGs were then intersected with the hub genes of the MEdarkgreen module, yielding five candidate genes (SORL1, DPEP2, RGCC, ARRB1, and LTBP3) (Fig. 2C). The ClueGO/CluePedia plug-in in Cytoscape was then used to perform GO analysis of the candidate genes (Fig. 2D, Additional file 2: Table S2). SORL1 was associated with 22 biological processes, including early endosome to recycling endosome transport (GO:0061502), regulation of choline O-acetyltransferase activity (GO:1902769), and positive regulation of adipose tissue development (GO:1904179). Terms associated with the amyloid precursor protein catabolic process included protein exit from the endoplasmic reticulum (GO:0032527), positive regulation of glial cell-derived neurotrophic factor production (GO:1900168), regulation of amyloid precursor protein catabolic process (GO:1902991), and negative regulation of metalloendopeptidase activity (GO:1902963). ARRB1 was linked to the follicle-stimulating hormone signaling pathway (GO:0042699), histone acetyltransferase activity (GO:0004402), ovulation cycle process (GO:0022602), and negative regulation of interleukin-6 production (GO:0032715), whereas the molecular function of DPEP2 was primarily related to regulatory RNA binding (GO:0061980).
In addition, RGCC was involved in the negative regulation of fibroblast growth factor synthesis, and both RGCC and LTBP3 were associated with extracellular matrix assembly (GO:0085029).
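The candidate-gene step is a set intersection of the DEGs with the module's hub genes. As a minimal sketch, assuming limma's results have already been exported as a log fold change and P value per gene (for example, the logFC and P.Value columns of topTable), the selection and overlap could look like the following. The P-value cutoff is left as a parameter, and the gene names and values are hypothetical toy data.

```python
def select_degs(results, p_cut):
    """Split genes passing P < p_cut into up- and down-regulated sets.

    results: dict mapping gene -> (logFC, p_value), e.g. exported from
    limma's topTable output (logFC and P.Value columns).
    """
    up = {g for g, (lfc, p) in results.items() if p < p_cut and lfc > 0}
    down = {g for g, (lfc, p) in results.items() if p < p_cut and lfc < 0}
    return up, down


def candidate_genes(results, hubs, p_cut):
    """Intersect all DEGs (up or down) with the module's hub genes."""
    up, down = select_degs(results, p_cut)
    return sorted((up | down) & set(hubs))


# Hypothetical toy values, not data from GSE6740.
results = {
    "G1": (-1.2, 0.01),
    "G2": (0.8, 0.02),
    "G3": (0.1, 0.60),
    "G4": (-0.9, 0.03),
}
hubs = ["G1", "G3", "G4", "G5"]
print(candidate_genes(results, hubs, p_cut=0.05))  # ['G1', 'G4']
```

G2 is a DEG but not a hub gene, and G3 is a hub gene but not a DEG, so only G1 and G4 survive the overlap.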
Screening of HIV-1+ diagnostic markers
To further screen for genes with diagnostic value for AIDS, random forest (RF), support vector machine (SVM), and generalized linear model (GLM) classifiers were built on the GSE6740 dataset. The three models were then analyzed with the DALEX package in R, and residual distribution plots were drawn to identify the optimal model. As shown in Fig. 3A and B, the RF model had the smallest sample residuals and was therefore considered the best-fitting model. Four variables in the RF model, DPEP2, RGCC, ARRB1, and LTBP3, had a considerable influence on the predicted value of the response variable (Fig. 3C). Their areas under the ROC curve (AUCs) were 0.777, 0.720, 0.890, and 0.835, respectively (Fig. 3D–G), indicating that these genes discriminated well between HIV-1− and HIV-1+ samples. These four genes were therefore selected as diagnostic markers for HIV-1+ for further analysis.
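Two quantities drive this model comparison: the residuals used to rank the models and the per-gene AUC. The Python sketch below is an illustration under assumptions, not DALEX's implementation: residuals are taken as |y − p| (observed label minus predicted probability), models are ranked by the median absolute residual, and the AUC uses the Mann–Whitney formulation (the probability that a randomly chosen HIV-1+ sample scores higher than a randomly chosen HIV-1− sample, with ties counted as one half). Model names and values are hypothetical.

```python
from statistics import median


def auc(scores_pos, scores_neg):
    """ROC AUC via the Mann-Whitney formulation; ties count as 1/2."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def abs_residuals(y_true, y_prob):
    """Absolute residuals |observed label - predicted probability|."""
    return [abs(y - p) for y, p in zip(y_true, y_prob)]


def best_model(y_true, predictions):
    """Name of the model with the smallest median absolute residual."""
    return min(predictions,
               key=lambda name: median(abs_residuals(y_true, predictions[name])))


# Hypothetical toy predictions (1 = HIV-1+, 0 = HIV-1-).
y = [1, 0, 1, 0]
preds = {"RF": [0.9, 0.1, 0.8, 0.2], "GLM": [0.6, 0.4, 0.5, 0.5]}
print(best_model(y, preds))  # RF
print(auc([0.9, 0.8, 0.4], [0.3, 0.5, 0.1]))
```

For a down-regulated marker, the negated expression values would be passed as scores so that lower expression counts as evidence for HIV-1+.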
Construction and evaluation of a nomogram model based on four diagnostic markers
To improve clinical prediction of HIV-1+ status, a nomogram based on DPEP2, RGCC, ARRB1, and LTBP3 was constructed with the rms package (Fig. 4A), and a calibration curve (Fig. 4B) was generated with the calibrate function of the rms package to assess the nomogram's predictive performance. The calibration curve shows only a small difference between the actual and predicted risk of HIV-1 infection, indicating that the nomogram predicts HIV-1 infection accurately. In the decision curve analysis (DCA), the x-axis represents the threshold probability and the y-axis the net benefit; the “nomogram” curve lies above the gray line and above the “ARRB1”, “DPEP2”, “LTBP3”, and “RGCC” curves. The oblique red line indicates that patients could benefit from the nomogram model across high-risk thresholds from 0 to 1, and that the clinical benefit of the nomogram exceeds that of the “ARRB1”, “DPEP2”, “LTBP3”, and “RGCC” curves, supporting the diagnostic value of the nomogram for predicting HIV-1 infection (Fig. 4C). A clinical impact curve was further plotted from the DCA to show the clinical effect of the “nomogram” model more intuitively. Across high-risk thresholds from 0 to 1, the “Number high risk” curve lay close to the “Number high risk with the event” curve, indicating that the “nomogram” model predicts HIV-1 infection well (Fig. 4D). These findings also suggest that DPEP2, RGCC, ARRB1, and LTBP3 may play important roles in HIV-1 infection.
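The DCA and clinical impact curves rest on a simple formula. At threshold probability p_t, the net benefit of a model over n samples is TP/n − (FP/n)·p_t/(1 − p_t), and the "treat all" reference is prevalence − (1 − prevalence)·p_t/(1 − p_t). A sketch in Python follows, assuming a sample is called high risk when its predicted probability is at least p_t and scaling the clinical impact counts to a hypothetical population of 1,000; the data are toy values, not the study's predictions.

```python
def net_benefit(y_true, y_prob, pt):
    """Net benefit at threshold probability pt (0 < pt < 1).

    Assumption: a sample is called high risk when its predicted probability >= pt.
    """
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= pt and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= pt and y == 0)
    return tp / n - (fp / n) * pt / (1 - pt)


def net_benefit_treat_all(y_true, pt):
    """Reference curve: every sample treated as high risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * pt / (1 - pt)


def clinical_impact(y_true, y_prob, pt, population=1000):
    """(number high risk, number high risk with event), scaled to `population`."""
    n = len(y_true)
    flagged = [y for y, p in zip(y_true, y_prob) if p >= pt]
    return population * len(flagged) / n, population * sum(flagged) / n


# Hypothetical toy data.
y = [1, 1, 0, 0]
prob = [0.9, 0.6, 0.7, 0.2]
print(net_benefit(y, prob, 0.5))      # 0.25
print(net_benefit_treat_all(y, 0.5))  # 0.0
print(clinical_impact(y, prob, 0.5))  # (750.0, 500.0)
```

Sweeping pt over (0, 1) and plotting these values gives the DCA and clinical impact curves; the two clinical impact counts converge when few flagged samples are false positives.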
Construction of diagnostic gene–drug interaction network
Potential drugs or molecular compounds acting on the diagnostic genes DPEP2, RGCC, ARRB1, and LTBP3 for HIV-1+ treatment were retrieved from the CTD database (Additional file 2: Table S3). A diagnostic gene–drug interaction network was constructed and visualized in Cytoscape to examine the interactions between the diagnostic genes and existing HIV-1 medications (Fig. 5). The network indicates that multiple drugs used in HIV-1+ treatment can affect the expression of these four diagnostic genes.
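Building the gene–drug network amounts to grouping gene–chemical interaction records by gene. The sketch below assumes the CTD interactions have already been exported as (gene, chemical) pairs; it collects them into an adjacency map, lists chemicals shared by several genes, and writes Cytoscape's simple interaction format (SIF), one "source relation target" line per edge. The relation label "interacts" and all gene and chemical names are hypothetical.

```python
from collections import defaultdict


def build_network(pairs):
    """Group (gene, chemical) interaction pairs into gene -> set of chemicals."""
    net = defaultdict(set)
    for gene, chem in pairs:
        net[gene].add(chem)  # sets drop duplicate records
    return dict(net)


def shared_chemicals(net, min_genes=2):
    """Chemicals linked to at least `min_genes` genes, sorted by name."""
    counts = defaultdict(int)
    for chems in net.values():
        for chem in chems:
            counts[chem] += 1
    return sorted(c for c, k in counts.items() if k >= min_genes)


def to_sif(net, relation="interacts"):
    """Cytoscape SIF lines: source<TAB>relation<TAB>target."""
    return [f"{g}\t{relation}\t{c}" for g in sorted(net) for c in sorted(net[g])]


# Hypothetical pairs; real input would be the exported CTD records.
pairs = [("GENE_A", "chem1"), ("GENE_A", "chem2"),
         ("GENE_B", "chem2"), ("GENE_A", "chem1")]
net = build_network(pairs)
print(shared_chemicals(net))  # ['chem2']
print("\n".join(to_sif(net)))
```

The SIF lines can be saved to a `.sif` file and imported into Cytoscape, where node shapes and colors are then assigned by node type.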
Construction of ceRNA network
The ENCORI database was searched for miRNAs related to the four diagnostic genes. According to the ceRNA hypothesis, miRNA expression is negatively correlated with that of their target lncRNAs and mRNAs. Therefore, correlations between the diagnostic markers (mRNAs) and their predicted miRNAs were calculated, and miRNAs negatively correlated with the diagnostic genes were selected (Fig. 6A). Figure 6B shows the potential binding sites between the miRNA sequences and the diagnostic genes, and Fig. 6C shows the interactions between each diagnostic marker (mRNA) and its targeting miRNAs. To construct the lncRNA-miRNA-mRNA regulatory network in HIV-1+ samples, lncRNAs related to these miRNAs were obtained from ENCORI, and the correlations between the miRNAs and lncRNAs were calculated. Where multiple lncRNAs corresponded to the same miRNA, only the lncRNA most closely associated with that miRNA is shown (Fig. 6D). Potential binding sites between lncRNA sequences and miRNAs documented in ENCORI are shown in Fig. 6E. Based on 18 mRNA-miRNA pairs and 56 miRNA-lncRNA pairs, a ceRNA network in AIDS was constructed (Fig. 6F). Finally, an lncRNA-miRNA-mRNA-drug regulatory network was built by adding the potential therapeutic compounds identified in CTD (Fig. 7), in which red hexagons represent lncRNAs, purple triangles miRNAs, light blue circles mRNAs, and green diamonds molecular compounds.
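The ceRNA filtering can be sketched as follows, assuming expression profiles are available for each mRNA, miRNA, and lncRNA across the same samples and that candidate miRNA-mRNA and lncRNA-miRNA pairs have already been retrieved (for example, from ENCORI). Only negatively correlated pairs are kept, and for each miRNA the lncRNA with the most negative Pearson correlation is taken as the "most closely associated" one. That selection rule, the zero correlation cutoff, and all names and values are assumptions for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def negative_partners(target, candidates):
    """Map each candidate whose profile correlates negatively with `target` to r."""
    out = {}
    for name, profile in candidates.items():
        r = pearson(target, profile)
        if r < 0:
            out[name] = r
    return out


def cerna_triplets(mrna_expr, mirna_expr, lncrna_expr, mrna_mirnas, mirna_lncrnas):
    """Build (lncRNA, miRNA, mRNA) triplets from predicted pairs.

    mrna_mirnas: mRNA -> candidate miRNAs; mirna_lncrnas: miRNA -> candidate lncRNAs.
    """
    triplets = []
    for mrna, mirnas in mrna_mirnas.items():
        kept = negative_partners(mrna_expr[mrna], {m: mirna_expr[m] for m in mirnas})
        for mir in kept:
            lncs = {lnc: lncrna_expr[lnc] for lnc in mirna_lncrnas.get(mir, [])}
            neg = negative_partners(mirna_expr[mir], lncs)
            if neg:
                # Assumption: "most closely associated" = most negative r.
                best = min(neg, key=neg.get)
                triplets.append((best, mir, mrna))
    return triplets


# Hypothetical toy profiles over five samples.
mrna_expr = {"M1": [1, 2, 3, 4, 5]}
mirna_expr = {"miR-a": [5, 4, 3, 2, 1], "miR-b": [1, 2, 3, 4, 6]}
lncrna_expr = {"L1": [1, 2, 3, 4, 5], "L2": [1, 2, 3, 5, 4], "L3": [5, 4, 3, 2, 1]}
triplets = cerna_triplets(mrna_expr, mirna_expr, lncrna_expr,
                          {"M1": ["miR-a", "miR-b"]}, {"miR-a": ["L1", "L2", "L3"]})
print(triplets)  # [('L1', 'miR-a', 'M1')]
```

miR-b correlates positively with M1 and is dropped; of the lncRNAs paired with miR-a, L1 (r = −1) is preferred over L2 (r = −0.9), and L3 is positively correlated and excluded.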
Clinical sample validation of HIV-1+ diagnostic genes
The expression levels of the four diagnostic genes were analyzed in the GSE6740 dataset, and DPEP2, RGCC, ARRB1, and LTBP3 were all significantly down-regulated in HIV-1+ samples (Fig. 8A). This is consistent with the results from clinical blood samples, in which the expression of all four diagnostic genes was considerably lower in HIV-1+ patients than in the control group (Fig. 8B).