Analysis of gene expression profiles and protein-protein interaction networks in multiple tissues of systemic sclerosis

Background Systemic sclerosis (SSc), a multi-organ disorder, is characterized by vascular abnormalities, dysregulation of the immune system, and fibrosis. The mechanisms underlying tissue pathology in SSc have not been entirely understood. This study intended to investigate the common and tissue-specific pathways involved in different tissues of SSc patients. Methods An integrative gene expression analysis of ten independent microarray datasets of three tissues was conducted to identify differentially expressed genes (DEGs). DEGs were mapped to the search tool for retrieval of interacting genes (STRING) to acquire protein–protein interaction (PPI) networks. Then, functional clusters in PPI networks were determined. Enrichr, a gene list enrichment analysis tool, was utilized for the functional enrichment of clusters. Results A total of 12, 2, and 4 functional clusters from 619, 52, and 119 DEGs were determined in the lung, peripheral blood mononuclear cell (PBMC), and skin tissues, respectively. Analysis revealed that the tumor necrosis factor (TNF) signaling pathway was enriched significantly in the three investigated tissues as a common pathway. In addition, clusters associated with inflammation and immunity were common in the three investigated tissues. However, clusters related to the fibrosis process were common in lung and skin tissues. Conclusions Analysis indicated that there were common pathological clusters that contributed to the pathogenesis of SSc in different tissues. Moreover, it seems that the common pathways in distinct tissues stem from a diverse set of genes.


Background
Systemic sclerosis (SSc) is a rare, multisystemic, autoimmune disease that involves the skin and various internal organs, including the lungs, gastrointestinal tract, heart, and kidneys. The exact pathogenesis of SSc remains unknown, but it seems that vascular abnormalities, inflammation, dysregulation of immune system, and extracellular matrix (ECM) deposition can lead to progressive connective tissue fibrosis. Organ failures that arise from fibrosis are the most significant causes of mortality in SSc patients [1,2].
Although the etiopathogenesis of SSc has not been well identified, accumulated evidence suggests that multiple genes and their interactions with environmental factors play important roles in this context [3,4]. Traditional researches have been performed in order to demonstrate the involvement of a particular gene or protein in SSc physiopathology [5,6]. Although these studies generate invaluable data, they provide a small amount of evidence that is insufficient to clarify the complex interactions between multiple genes or proteins simultaneously. Consequently, it is essential to utilize new approaches for realizing the alterations of different genes and pathways in complicated pathological conditions, like SSc [7,8]. These approaches could have a major role in the holistic understanding of complex disease patterns and developing effective therapies.
Microarrays have been extensively applied for understanding biological mechanisms, discovering new drug targets, and evaluating drug responses [9,10]. In addition, results obtained from microarray technology might be helpful in generating abundant complex datasets that mostly address the same biological inquiries [11][12][13][14][15][16][17]. Integration of relevant gene expression datasets can improve the reliability of the outputs and facilitate the identification of altered molecular pathways and complex disease pathogeneses [8,18,19].
Skin involvement is one of the most common clinical manifestations of SSc and is known to be a key marker of disease activity [20]. The lung is frequently involved in SSc, and such condition is known as the major cause of death among SSc patients [21]. PBMC is a valuable resource for investigating the immune responses involved in autoimmune diseases like SSc [22]. The involvement of multiple organs makes it difficult to recognize the SSc pathogenesis. Moreover, it is not yet clearly understood what pathways may affect SSc development in different organs [23]. Consequently, the present study accomplished an integrative analysis of microarray gene expression data of PBMC as well as the lungs and skin of SSc patients to identify the shared and tissue-specific pathways involved in different tissues.

Methods flowchart
The method procedures and steps are illustrated in Fig. 1.

Gene expression dataset selection
Gene Expression Omnibus (GEO) (https://www.ncbi. nlm.nih.gov/geo/) was searched for gene expression datasets regarding SSc [24]. Datasets containing case and control samples were selected. In addition, only SSc patients who had received no treatment were included. A total of 10 datasets possessed the selection criteria and were selected for this study. Three datasets for lung tissue (accession number: GSE81292, GSE48149, and GSE76808), three datasets for PBMC (accession number: GSE19617, GSE22356, and GSE33463), and four datasets for skin tissue (accession number: GSE32413, GSE45485, GSE9285, and GSE76807) were selected. The selected datasets comprised 69 (52 cases and 17 controls), 186 (125 cases and 61 controls), and 88 (30 cases and 58 controls) samples for lung, PBMC, and skin, respectively. Table 1 provides detailed information of each dataset and highlights the first author, tissue type, accession number, and references.

Datasets preprocessing
The data was preprocessed using R statistical programming language. Series matrix files and related annotations for each dataset were obtained from the GEO database. Selected datasets were divided into different groups based on their tissue types. Then, preprocessing steps were carried out in each tissue group independently. The data in each dataset was normalized using a quantile normalization technique function. Raw expression levels were log2 transformed. The mean was applied to replicate expressions of the same participants. To merge datasets in each tissue, probes were converted to the Entrez gene ID. The probes which were assigned to no Entrez ID were removed. The multiple expressions which were assigned to identical Entrez IDs were collapsed to the mean expression using the aggregate function in R. To make gene expression comparable across samples, batches were removed using well-established ComBat function from the SVA R/Bioconductor package [25].

Identifying differentially expressed genes (DEGs)
DEGs between healthy controls and patients in PBMCs as well as lung and skin tissues were identified using the Limma package [26]. DEGs were considered significant with an adjusted p-value< 0.05 based on the false discovery rate (FDR) using the Benjamini-Hochberg (BH) procedure and the logarithm of fold change (logFC) > ± 0.5. A total of 52, 619, and 119 DEGs between the SSc group and healthy controls were identified in PBMC, lung, and skin tissues, respectively. A complete list of DEGs in lung, PBMC, and skin datasets are provided as Additional file 1:Tables S1, Additional file 2:Tables S2, and Additional file 3: Tables S3, respectively. Analysis showed that there was no shared DEGs between all three investigated tissues; however, there were some common DEGs between each pair of tissues (Table 2).

Protein-protein interaction (PPI) network analysis
The search tool for retrieval of interacting genes (STRING) (https://string-db.org) database, which integrates both known and predicted PPIs, can be applied to predict functional interactions of proteins [27]. To seek potential interactions between DEGs according to different tissues, the STRING tool was employed. Active interaction sources, including text mining, experiments, databases, and co-expression as well as species limited to "Homo sapiens" and an interaction score > 0.4 were applied to construct the PPI networks. Cytoscape software version 3.6.1 was used to visualize the PPI network. To detect highly connected regions of the network, Cluster-ONE 1.0 software was used based on the following criteria: minimum size = 5, minimum density = 0.05, and edge weights = combined_score. Minimum size is the minimum size of each cluster; minimum density represents the average edge weight within the cluster if missing edges are supposed to have a weight of zero, and edge weights determine the weight of each edge [28]. In the networks, the nodes correspond to the proteins and the edges represent the interactions. STRING was employed to seek potential interactions among DEGs corresponding to different tissues. Active interaction sources, including experimental repositories, computational prediction methods, and public text collections as well as species limited to "Homo sapiens" and a combined score > 0.4, were applied.

Functional and pathway enrichment analysis
Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses were conducted using Enrichr (http://amp.pharm.mssm. edu/Enrichr/) for clusters obtained from different SSc tissues. Enrichr is a web-based tool that allows the evaluation of annotations with its extensive gene-set libraries [29]. The GO Biological Process 2018 and KEGG 2016 of each tissue were determined. The significant terms and pathways were selected with the threshold of adjusted p-value < 0.05. The five most significant (adjusted  Tables 3, 4, and 5, respectively.

Data quality control
To ensure normal distribution of the data, the boxplots for each dataset were displayed. To confirm batch effects removal, the boxplots for each dataset were illustrated after applying the ComBat function. Boxplots before and after batch effect removal are shown in Fig. 2.

Network analysis of the DEGs
The PPI networks for lung, PBMC, and skin DEGs were constructed and used to identify 2, 12, and 4 clusters of highly interconnected nodes in PBMC, lung and skin tissues, respectively. Fig. 3 shows the 12 significant clusters that were found in the lung PPI network analysis using ClusterONE. In the cluster L.1, the most significant biological processes and pathways were associated with immunity and inflammatory responses, including cytokine-and chemokine-mediated signaling pathways, cytokinecytokine receptor interaction, as well as JAK-STAT and TNF signaling pathways. Cluster L.2 was enriched in immunity pathways containing MAPK and TNF signaling pathways and osteoclast differentiation. Enrichment was also observed in biological processes like transcriptional regulation of the RNA polymerase II gene promoter as well as genes involved in the immune system [30]. Moreover, cluster L.12 was related to the terms and pathways relevant to immunity, including neutrophil degradation, neutrophil-mediated immunity, and osteoclast differentiation. As mentioned, 3 out of 12 clusters of the lung PPI network were correlated with inflammatory and immunity responses.

Functional enrichment analysis of clusters
Several enriched biological processes and pathways in cluster L.3 were involved in cell proliferation and death. For example, mitotic cell cycle phase transition, cell cycle, p53, and FoxO signaling pathways were represented in this cluster. Cluster L.8 was enriched in the regulation of bone morphogenetic proteins (BMPs) and proliferation.
Clusters L.4, L.5, and L.10 all contained biological processes and pathways dependent upon G-protein coupled receptor (GPCR) signaling. GPCRs are a major family of cell surface receptors that are involved in the physiological processes, including regulation of immune systems, cellular motility, and differentiation [31].
The enrichment of ECM organization terms and pathways was observed to be associated with cluster L.6. For example, ECM organization, collagen fibril organization, and protein complex subunit organization terms as well as protein digestion and absorption and ECM-receptor interaction pathways were enriched in cluster L.6. Likewise, cluster L.11 was observed to be more represented in ECM degradation terms, such as proteolysis and ECM disassembly. However, no meaningful KEGG pathways were assigned to this cluster.
As shown in Table 3, clusters L.7 and L.9 were more representative of metabolic terms and pathways, such as the cholesterol metabolic process, synthesis and degradation of ketone bodies, and keratan sulfate metabolic process.
The PPI network of PBMC was divided into two significant clusters. These two clusters and the biological processes and pathways relevant to the PPI network of PBMC are listed in Table 4. Type 1 interferon and cytokine-mediated signaling pathways were prevalent in cluster P.1. However, there was no significant KEGG pathway related to this cluster. Moreover, the biological processes in cluster P.2 were involved in responses to various substances. Apoptosis, TNF signaling pathway, and osteoclast differentiation were more common KEGG pathways in this cluster. As shown in Fig. 4a, all clusters in the PPI network of PBMC were enriched in terms and pathways related to immunity.
Four significant clusters were found in the skin PPI network. Table 5 shows that cluster S.1 contains terms     and pathways related to the extracellular matrix organization. However, clusters S.2 and S.4 were mostly enriched in the immunity biological processes and KEGG pathways. Cluster S.3 represents terms and pathways related to immunity, including complement, coagulation cascade, and platelet degradation, as well as terms related to the ECM organization, comprising ECM organization and ECM disassembly. The PPI network of skin clusters is depicted in Fig. 4b.

TNF signaling significantly enriched in three investigated tissues
Analysis showed that the TNF signaling pathway was enriched significantly in cluster L.1 and cluster L.2 of lung tissue, cluster P.2 of PBMC, and cluster S.2 of skin tissue. Moreover, it was detected that the TNF signaling arose from the function of different sets of genes in individual tissues. The TNF signaling-related genes in each cluster of the three evaluated tissues are represented in Table 6.

Discussion
Despite the vast amount of research on SSc, its etiopathogenesis has not yet been fully clarified. Consequently, an effective systemic or targeted therapy does not exist [32]. Genome-wide transcriptional profiling and genome-wide association studies in different tissues from SSc patients have produced valuable information, which can be integrated thoroughly to approach SSc pathophysiology with a comprehensive understanding. Integrative gene expression analysis and the construction of PPI networks can be performed using gene expression data extracted from RNA-seq and microarray. RNA-Seq increases accuracy for low-abundance transcripts [33] and has higher resolution for identifying tissue-specific expressions [34]. However, it is a relatively new method, and there is a small amount of RNA-seq data concerning different tissues of SSc in databases compared to microarray. Therefore, in the present study, microarray data from several SSc tissues was used to investigate whether common pathways influence SSc pathogenesis across affected tissues.  The GO and KEGG pathway analyses of clusters in PBMC as well as lung and skin tissues indicated the clusters contained biological processes and pathways, including extracellular matrix organization, immune response, inflammatory response, cell proliferation, and apoptosis, which may play roles in SSc pathogenesis.
Although no shared DEGs were detected among the three evaluated tissues, TNF signaling pathway was enriched significantly in all three as a common pathway. Therefore, it seems that this common pathway arises from diverse sets of genes in distinct tissues. TNF has a pivotal role in response to infections and in the   pathogenesis of different immune-mediated disorders, like rheumatoid arthritis (RA) and spondyloarthritis (SpA) [35,36]. However, its role in fibrotic disorders like SSc is controversial [37]. Investigations have demonstrated that antagonists of TNF prevent fibrosis in mouse models of silica-induced and bleomycin-induced pulmonary fibrosis [38,39]. Moreover, progressive lung fibrosis has been indicated in patients with RA after treatment with infliximab (a TNF-α blocker) [40]. Conversely, the anti-fibrotic effects of TNF have been indicated in several in vitro studies [41,42].
Among the cytokines with increased levels in SSc, transforming growth factor (TGF)-β, interleukin (IL)-6, and IL-4 are considered as main fibrogenic cytokines in this disease. Different immune cell types, such as macrophages, T cells, B cells, and dendritic cells (DCs) have also been implicated in the immunopathogenesis of SSc. The involvement of multiple cytokines, cell types, and organs makes it difficult to clarify the precise pathogenesis of SSc. However, it seems that activation of the immune system and initiation of the autoimmunity trigger the tissue fibrosis [43,44]. Consistently, analysis in the Fig. 2 Boxplots of data before and after batch effect correction. a) Boxplot of PBMC data before batch effect removal. b) Boxplot of PBMC data after batch effect removal. c) Boxplot of lung data before batch effect removal. d) Boxplot of lung data after batch effect removal. e) Boxplot of skin data before batch effect removal. f) Boxplot of skin data after batch effect removal. Different datasets were displayed in different colors Fig. 3 Lung PPI network. Twelve distinct functional clusters were detected in lung tissue. Each cluster is a set of highly-connected nodes and is illustrated in a discrete color Fig. 4 a) PBMC PPI network. Two distinct functional clusters were identified in PBMC. b) Skin PPI network. Four distinct functional clusters were detected in skin tissue. Each cluster is a set of highly-connected nodes and is illustrated in a discrete color current study indicated that several biological processes and pathways, such as cytokine-mediated signaling, inflammatory response, and TNF signaling which are involved in immunity and inflammatory processes, were significantly enriched in the three investigated tissues.
Cheadle and colleagues have performed an integrative analysis of microarray data from PAH, pulmonary hypertension (PH), and idiopathic arterial hypertension (IPAH), SSc diseases. Their analysis showed that erythrocyte signature was enriched significantly in PBMCs from PH patients in comparison with SSc and healthy individuals [13]. However, we performed an integrative gene expression analysis only on SSc patients and our analysis revealed that the clusters regarding to the immunity and inflammatory processes and pathways were significantly enriched in PBMCs of the SSc patients.
Although all the biological processes and pathways obtained from the PPI network analysis of PBMC were implicated mostly in clusters that can be labeled as immunity or inflammatory, the lung and skin tissues were enriched in other clusters in addition to immunity and inflammatory clusters. For example, some other lung and skin clusters could be labeled as extracellular matrix (ECM) organization. As expected, the terms and pathways associated with fibrosis containing ECM organization, ECM-receptor interaction, and collagen fibril organization were common in lung and skin tissues. The lung PPI network analysis showed that its terms and pathways not only could be labeled as the inflammation, immunity, and ECM organization, but could also be implicated in cell proliferation and cell death, regulation of bone morphogenetic proteins (BMPs), GPCR signaling, and metabolic processes.
The detection of clusters related to cell proliferation and cell death as well as regulation of BMPs is compatible with previous studies that have indicated enrichment of the cell cycle, proliferation, and p53 signaling in SSc [8]. BMPs are growth factors belonging to the transforming growth factor-β (TGF-β) superfamily which play important roles in cell proliferation, apoptosis, and regeneration after injury. The serum and tissue levels of TGF-β, a major pro-fibrotic cytokine in the pathogenesis of SSc, are elevated in SSc patients [45,46]. Previous studies have indicated that the balance between TGF-β and BMP signaling is essential and is considerably perturbed in pulmonary fibrosis [47]. A report indicated that increased BMPRII degradation, arising from elevated TGF-β activity, led to impaired BMP signaling in patients with PAH and SSc [48].
The altered expression of G protein-coupled receptors (GPCRs) and their ligands has been associated with multiple immune-mediated disorders, like pulmonary arterial hypertension (PAH) and RA [49,50]. The pathophysiological mechanisms of SSc, like detrimental vasoconstriction, pro-inflammatory, proliferative, and pro-fibrotic effects, are mediated by angiotensin II (Ang II) and endothelin 1 (ET1) through type I angiotensin II receptor (AT1R) and endothelin I receptor (ETAR), respectively [51]. High levels of autoantibodies against GPCRs like AT1R and ETAR contribute to the pathogenesis of SSc [52].
Recent studies have indicated that there is an association between metabolic pathways and immunemediated disorders [7,53,54]. Metabolic processes regarding SSc have not been investigated profoundly. Blood metabolomics analysis revealed that glycolysis, gluconeogenesis, energetic pathways, degradation of ketone bodies, and pyruvate metabolism are the most important networks in SSc [7]. Analysis in the current study revealed that clusters with several metabolic processes and pathways containing synthesis and degradation of ketone bodies and steroid biosynthesis are associated with SSc lung tissue. However, no metabolic pathway or term was enriched in PBMC and skin tissue.

Conclusions
Based on the current results, it seems that common pathological pathways contribute to the pathogenesis of SSc in different tissues. However, tissue type may make it susceptible to the initiation of more complicated pathways. Areas for future exploration may include determining the role of TNF signaling pathway in the initiation and progression of SSc, the role of GPCRs in the pathophysiology of SSc, the metabolomics profiling of SSc in different tissues, and the role of the metabolic process in SSc pathogenesis. Ultimately, sampling from diverse patients should be conducted tissue by tissue in different stages of the disease to perform more accurate tissue comparisons and design effective systemic or targeted therapies for SSc in the future.