Co-expression analysis to identify key modules and hub genes associated with COVID-19 in platelets

Coronavirus disease 2019 (COVID-19) increases the risk of cardiovascular occlusive/thrombotic events and is linked to poor outcomes. The underlying pathophysiological processes are complex and remain poorly understood. Platelets play important roles in regulating the cardiovascular system, including via contributions to coagulation and inflammation. There is ample evidence that circulating platelets are activated in COVID-19 patients, which is a primary driver of the observed thrombotic outcome. However, the comprehensive molecular basis of platelet activation in COVID-19 remains elusive and warrants more investigation. Hence, we employed gene co-expression network analysis combined with pathway enrichment analysis to investigate this molecular basis. Our study revealed three important gene clusters/modules that were closely related to COVID-19. Using machine learning, these clusters of genes successfully distinguished COVID-19 cases from healthy controls in a separate validation data set, thereby validating our findings. Furthermore, enrichment analysis showed that these three modules were mostly related to platelet metabolism, protein translation, mitochondrial activity, and oxidative phosphorylation, as well as regulation of megakaryocyte differentiation and apoptosis, suggesting a hyperactivation status of platelets in COVID-19. We identified three hub genes from each of the three key modules according to their intramodular connectivity ranking, namely: COPE, CDC37, CAPNS1, AURKAIP1, LAMTOR2, GABARAP, MT-ND1, MT-ND5, and MTRNR2L12. Collectively, our results offer new insight into platelet involvement in COVID-19 at the molecular level, which might aid in defining new targets for treatment of COVID-19-induced thrombosis.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12920-022-01222-y.


Introduction
The coronavirus SARS-CoV-2 is a highly contagious pathogen that causes a severe respiratory disease known as COVID-19. This disease, which has reached a pandemic level, is impacting tens of millions of people worldwide. In the United States, there were around 78 million reported cases, over 4 million hospital admissions, and 900 thousand deaths as of February 2022 [1]. It is now known that COVID-19-induced thrombosis increases the incidence of cardiovascular occlusive events in infected patients, a fact that has been reported in several studies [2][3][4]. Indeed, abnormal hemostasis responses were observed in hospitalized COVID-19 patients and were linked to poor prognosis [2,5,6]. In addition, studies have shown that COVID-19 leads to an increase in platelet activation through alterations of the platelet transcriptome and proteome [7,8]. In this connection, it is now well established that platelets play roles beyond vascular hemostasis, including in innate immunity and tumor metastasis [9]. Moreover, platelets were shown to be activated in the septic state, and antiplatelet therapy has been used as a strategy to prevent organ damage in sepsis [10]. Evidence has also indicated that viral infections are associated with coagulation disorders and thrombotic cardiovascular events [11,12], which is consistent with the thrombotic phenotype seen in COVID-19 patients. While there has been some progress, our understanding of the pathways that govern platelet participation in COVID-19-induced thrombosis remains limited and clearly warrants investigation.
To obtain a comprehensive insight into the pathogenesis of specific disease states, several computational and research methods have been developed [13]. Some of these approaches were employed to examine potential gene networks, which are instrumental in guiding the understanding of diseases and their mechanistic pathways. Notably, co-expression analysis is one such approach, which clusters genes into co-expressed groups known as modules. Genes that belong to the same module are thought to share functional properties [14]. This approach relies on graph theory concepts that allow researchers to understand, in a systematic way, the relations between the genes of a module and the phenotype based on the module eigengene [14]. In fact, co-expression analysis using weighted gene co-expression network analysis (WGCNA) has been used to study a number of biological processes and conditions, including cancer [15,16] and cognitive and mental disorders [17,18]. In short, gene networks make it possible to move beyond individual-gene comparisons and comprehensively identify biologically meaningful relationships between gene products and phenotypes.
At the same time, machine learning and artificial intelligence are being used extensively in biology [19], especially for feature selection. "Feature selection" is used to select the minimum number of features needed to predict the biological phenomenon or correctly classify the biological samples. This approach facilitates understanding of the underlying disease mechanisms and other factors that reasonably could have affected the disease status. One particular approach for results validation is to build a classifier using the information derived from the identified set of biomarkers (e.g., gene expression) and test the performance of that classifier on a totally different data set to examine its ability to distinguish two states (e.g., disease vs. healthy). A successful classifier gives strong evidence supporting the validity of the biomarkers [20,21].
Previous studies on the mechanisms of thrombosis in COVID-19 disease have primarily concentrated on specific pathophysiological functions, with relatively fewer studies identifying comprehensive regulatory networks. Therefore, in the present study, WGCNA was used to determine gene networks associated with COVID-19 disease in platelets. The PRJNA634489 data set, which contained a total of 15 samples from COVID-19 patients and healthy controls [7], was used in the present study. Three modules with the highest level of significance in correlation with COVID-19 disease were identified. Of note, these three modules were validated as predictors of the COVID-19 phenotype using another data set, and the three genes with the highest intramodular connectivity in each module were selected as the hub genes of the respective modules for COVID-19. Gene enrichment analysis was also conducted to determine enrichments in the key modules. The results of this study may provide novel insights into the underlying mechanisms of COVID-19 disease and may assist in the identification of potential biomarkers for diagnosis and/or targets for treatment.

Data preprocessing and differentially expressed genes screening
RNAseq data are publicly available and were downloaded from BioProject accession #PRJNA634489 [7]. The data comprised ten COVID-19 patients and five age- and sex-matched healthy controls. Of note, while the original paper included a total of 58 subjects (41 COVID-19 patients and 17 healthy controls), samples from only 15 subjects were sequenced and hence used in our analysis. The Kallisto program was employed for pseudoalignment of reads and quantification to obtain the counts and the transcripts per million (TPM) [22]. Log2CPM (log-transformed counts per million) was used for the differential expression analysis by employing Voom normalization [23] and the Limma R package [24]. TPM values, normalized and filtered to exclude low-variance transcripts (variance ≤ 0.001) [25], were used for the weighted gene co-expression network analysis. All methods were performed in accordance with the relevant guidelines and regulations.
RNAseq data for validation were downloaded from the public NCBI SRA repository under accession #PRJNA736410, and were analyzed and normalized by following the same steps as for the first data set.
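The two numeric transformations mentioned above can be sketched in a few lines of Python. This is only an illustration, not the authors' pipeline (which used Kallisto, Voom, and Limma in R): the function names and toy inputs are hypothetical, the log2CPM formula uses the 0.5 count offset applied by voom but omits its precision weights, and whether the variance threshold was applied to raw or log-scaled TPM is an assumption left to the caller.

```python
import math
from statistics import variance


def log2_cpm(counts, prior=0.5):
    """Log2 counts per million for a dict of gene -> per-sample raw counts.

    Uses log2((count + 0.5) / (library_size + 1) * 1e6), the offset form
    applied by voom; the precision weights voom also estimates are omitted.
    """
    n_samples = len(next(iter(counts.values())))
    lib_sizes = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
    return {
        gene: [
            math.log2((c + prior) / (lib_sizes[j] + 1.0) * 1e6)
            for j, c in enumerate(row)
        ]
        for gene, row in counts.items()
    }


def filter_low_variance(expr, threshold=0.001):
    """Keep genes whose sample variance across samples exceeds the threshold."""
    return {gene: row for gene, row in expr.items() if variance(row) > threshold}
```

Genes with variance at or below the threshold are dropped, which also removes constant profiles that would otherwise produce undefined correlations downstream.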

Weighted gene coexpression network analysis
The weighted co-expression network was produced using the R package "WGCNA" [14] as per the flowchart in Fig. 1. To weight highly correlated genes, the soft thresholding power (β) was set at 12, and the minimal module size was set at 30. To define clusters of genes in the data set, the adjacency matrix was used to calculate the topological overlap matrix (TOM), which shows the degree of overlap in shared neighbors between pairs of genes in the network. The resulting gene network was visualized as a heatmap.
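To make the adjacency and TOM step concrete, here is a minimal pure-Python sketch. It assumes an unsigned network, where adjacency is the absolute Pearson correlation raised to the power β, and the standard unsigned topological overlap formula; the text does not state which network type was used, and the function names are hypothetical. The study itself relied on the R WGCNA package.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (non-constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def adjacency(expr, beta=12):
    """Unsigned soft-threshold adjacency |cor|^beta; diagonal left at 0."""
    n = len(expr)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            adj[i][j] = adj[j][i] = abs(pearson(expr[i], expr[j])) ** beta
    return adj


def topological_overlap(adj):
    """TOM_ij = (sum_u a_iu * a_uj + a_ij) / (min(k_i, k_j) + 1 - a_ij)."""
    n = len(adj)
    k = [sum(row) for row in adj]  # connectivity (diagonal is 0)
    tom = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            shared = sum(
                adj[i][u] * adj[u][j] for u in range(n) if u != i and u != j
            )
            tom[i][j] = tom[j][i] = (shared + adj[i][j]) / (
                min(k[i], k[j]) + 1.0 - adj[i][j]
            )
    return tom
```

Raising correlations to a high power such as 12 suppresses weak correlations much more strongly than strong ones, which is what pushes the network toward an approximately scale-free topology. Hierarchical clustering is then typically run on the dissimilarity 1 − TOM.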

Screening for key modules and hub genes
Correlation between module eigengenes and the COVID-19 status was calculated to identify key modules that have significant correlation. The correlation values were displayed within a heatmap. The modules that correlated with COVID-19 most significantly were considered as the key modules. Gene significance (GS) was defined as the correlation between gene expression and the COVID-19 status. Module membership (MM) was defined as the correlation between gene expression and each module's eigengene. Intramodular connectivity (K.in), which measures how connected a given gene is with respect to the genes of a particular module, was also calculated using WGCNA. Subsequently, the correlations between GS and MM as well as between GS and K.in were examined to verify module-COVID-19 status associations. The correlation analyses in this study were performed using the Pearson correlation as described in the "WGCNA" package [14]. All module genes were ranked according to their intramodular connectivity, and only the top three genes were selected as hub genes.
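The three per-gene measures and the hub-gene ranking can be sketched as follows. This is an illustrative reimplementation rather than the WGCNA code used in the study: the function names are hypothetical, the module eigengene is taken as a given input, K.in is computed as the sum of unsigned |cor|^β adjacencies to the other genes of the same module (an assumption about network type), and the trait is coded 0/1.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (non-constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def gene_significance(module_expr, trait):
    """GS: correlation of each gene's expression with the trait."""
    return {g: pearson(v, trait) for g, v in module_expr.items()}


def module_membership(module_expr, eigengene):
    """MM: correlation of each gene's expression with the module eigengene."""
    return {g: pearson(v, eigengene) for g, v in module_expr.items()}


def intramodular_connectivity(module_expr, beta=12):
    """K.in: summed adjacency of a gene to all other genes in its module."""
    genes = list(module_expr)
    return {
        g: sum(
            abs(pearson(module_expr[g], module_expr[h])) ** beta
            for h in genes
            if h != g
        )
        for g in genes
    }


def top_hub_genes(kin, n=3):
    """Rank genes by K.in (descending) and return the top n."""
    return sorted(kin, key=kin.get, reverse=True)[:n]
```

A possible usage pattern is to compute `intramodular_connectivity` once per key module and pass the result to `top_hub_genes` with `n=3`, mirroring the selection rule stated above.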

Validation of key modules using machine learning
To validate the results of the above-mentioned analysis, multiple classification models (Lasso, naive Bayes, random forest, SVM, and XGBoost) were trained using the key modules of the original data set. Those models were employed to classify the samples of a second data set [26] of platelet gene expression in COVID-19 patients and healthy subjects. The second data set was kept completely isolated from the training process.

Functional enrichment analysis of key modules
The genes in each key module were extracted from the network, and enrichment analysis was performed to further explore the functions of the respective modules. TargetMine [27], which is a web-based integrative data analysis platform for target prioritisation and broad-based biological knowledge discovery, was used to perform Gene Ontology (GO) and Reactome pathway enrichment analysis. In this analysis, a Benjamini-Hochberg adjusted P-value of 0.05 was set as the significance threshold to identify the most significant functional pathways/GO terms. Only the top enriched terms are reported.
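For readers unfamiliar with the Benjamini-Hochberg adjustment used as the significance threshold above, a minimal Python sketch of the step-up procedure follows. The function name and the example p-values are illustrative only; the enrichment p-values in the study were computed and adjusted by the TargetMine platform.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Each p-value is scaled by m / rank (rank 1 = smallest) and the result
    is made monotone by taking a running minimum from the largest rank down.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw enrichment p-values for four terms
raw = [0.01, 0.04, 0.03, 0.005]
significant = [adj <= 0.05 for adj in benjamini_hochberg(raw)]
```

Terms whose adjusted value is at or below 0.05 would be retained under the threshold described in the text.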

Construction of co-expression network
The transcripts per million (TPM) gene expression data set was filtered based on variance, and 7119 genes in the 15 samples from ten COVID-19 patients and five healthy controls were used to construct the co-expression network. The results of the cluster analysis of the samples are shown in Fig. 2A. To construct the network, a soft threshold of 12 was used to obtain the approximate scale-free topology (Additional file 1: Fig. S1). Genes across the 15 samples were hierarchically clustered based on topological overlap (Fig. 2C, D). We identified 16 modules in which genes are co-expressed; random colors were assigned to the modules to distinguish between them. The size (number of genes per module) of each module is presented in Fig. 2B. To demonstrate how these modules were relatively distinctive, we plotted the network heatmap of 400 randomly selected genes based on

Correlation between modules and COVID-19 disease status
To examine the relation of COVID-19 status to the identified modules, we built the eigengene adjacency matrix by calculating the correlations of the eigengene matrix after adding COVID-19 status to the matrix. The heatmap (Fig. 3B) showed the relationships among the modules and the correlation between COVID-19 status and the modules, namely black, cyan, yellow, blue, and magenta.
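As a rough illustration of the eigengene-to-status correlation described above, the following Python sketch computes a module eigengene as the first principal component of the standardized module expression (via power iteration) and correlates it with a 0/1 status vector. The function names, the toy data, and the 0/1 coding of COVID-19 status are assumptions for illustration; the study used the R WGCNA package.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (non-constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def standardize(row):
    """Center a gene profile and scale it to unit sample variance."""
    n = len(row)
    mean = sum(row) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in row) / (n - 1))
    return [(v - mean) / sd for v in row]


def module_eigengene(module_expr, n_iter=200):
    """First principal component across samples of a genes x samples block."""
    z = [standardize(r) for r in module_expr]
    n_genes, n_samples = len(z), len(z[0])
    v = z[0][:]  # starting vector: first gene's standardized profile
    for _ in range(n_iter):
        # One power-iteration step: w = Z^T (Z v), then normalize
        scores = [sum(r[j] * v[j] for j in range(n_samples)) for r in z]
        w = [sum(scores[g] * z[g][j] for g in range(n_genes))
             for j in range(n_samples)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Orient the eigengene to agree with the average standardized expression
    avg = [sum(r[j] for r in z) / n_genes for j in range(n_samples)]
    if sum(a * b for a, b in zip(v, avg)) < 0:
        v = [-x for x in v]
    return v


# Toy module of three genes over five samples; status 1 = COVID-19 (assumed coding)
toy_module = [[1, 2, 3, 4, 5], [2, 4, 6, 8, 10], [1, 2, 3, 4, 6]]
toy_status = [0, 0, 0, 1, 1]
module_trait_r = pearson(module_eigengene(toy_module), toy_status)
```

Appending the status vector as an extra column next to the eigengenes and correlating all pairs yields the kind of eigengene adjacency matrix the paragraph refers to.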

Identification of key modules in relationship to COVID-19 disease status
Figure caption (fragment): Each column corresponds to a clinical trait, and each row corresponds to a module. Each row contains the correlation coefficients, which correspond to the cell color; green represents negative correlation and red represents positive correlation. The P-values are stated in brackets.
To further determine the closest modules to COVID-19 status, we re-clustered the eigengenes using the single linkage method with absolute correlation as a distance function; the single linkage clustering algorithm looks for the closest pair of modules to form a cluster, then progressively clusters them with the next nearest module until one cluster is formed [35]. As demonstrated in Fig. 3C, the closest three modules to COVID-19 status are magenta, yellow, and black. Three essential measurements can help confirm the importance of a module to a specific trait: 1) module membership (MM), which increases for a particular gene when the module eigengene accurately represents this gene; 2) gene significance (GS), which is measured by calculating the correlation of gene expression with the specific trait; and 3) intramodular connectivity (K.in) for a gene within the module, which reflects the centrality of the gene to the module expression network. Based on WGCNA, the higher a gene's GS, MM, and K.in, the more meaningful it is to the clinical trait of interest [36,37].
Explicitly, the higher the correlation between gene significance of genes in a module and their module membership, the higher its importance. Similarly, when the gene centrality in the network increases in parallel with gene significance, that also is strong evidence that key modules are essential in that trait. The correlations between gene significance and module membership as well as between gene significance and intramodular connectivity show that yellow, black, and magenta modules have the highest correlation values with a substantial difference to the next nearest module (Blue R = 0.61) (Additional file 1: Fig. S2). For those reasons, we selected yellow, black, and magenta modules for further investigation and will refer to them using the term key modules.
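The single-linkage re-clustering of eigengenes described in this section can be sketched in plain Python as below. The sketch assumes the distance between two profiles is 1 − |Pearson correlation|, which is one common reading of "absolute correlation as a distance function"; the function names and toy profiles are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (non-constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def abs_cor_distances(profiles):
    """Pairwise distance 1 - |cor| for a dict of name -> profile."""
    names = list(profiles)
    return {
        frozenset((a, b)): 1.0 - abs(pearson(profiles[a], profiles[b]))
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


def single_linkage(names, dist):
    """Agglomerate clusters by smallest member-to-member distance.

    Returns the merge history as (cluster_a, cluster_b, distance) tuples.
    """
    clusters = [frozenset([n]) for n in names]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(
                    dist[frozenset((a, b))]
                    for a in clusters[i]
                    for b in clusters[j]
                )
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((set(clusters[i]), set(clusters[j]), d))
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


# Toy eigengenes: A tracks status, B mirrors it, C is unrelated
profiles = {
    "status": [0, 0, 0, 1, 1],
    "A": [0.1, 0.0, 0.2, 0.9, 1.0],
    "B": [1.0, 0.9, 0.8, 0.1, 0.0],
    "C": [0.5, 0.1, 0.9, 0.2, 0.6],
}
history = single_linkage(list(profiles), abs_cor_distances(profiles))
```

Because the distance uses the absolute correlation, a strongly negatively correlated module (like the magenta module in the text) clusters as closely with the trait as a strongly positively correlated one.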

Key modules show high correlation to COVID-19 disease status
The module-trait relationship was determined by correlating module eigengenes with COVID-19 disease status to identify significant correlations. The yellow and the black modules exhibited the highest positive correlations (R = 0.91, p-value = 3 × 10^-6, and R = 0.86, p-value = 3 × 10^-5, respectively; Fig. 3D). On the other hand, the magenta module (R = -0.96, p-value = 1 × 10^-8) exhibited the highest negative correlation (Fig. 3D). Therefore, these three modules were identified as key modules for COVID-19 disease and its impact on platelets. The significant correlations between GS, MM, and K.in for COVID-19 are illustrated in Fig. 4A, B. We also show the GS, MM, and K.in of the green module, which showed low correlation to COVID-19 disease status (Fig. 4A, B).
In summary, although all samples were used to identify the co-expression modules, the top modules were selected based on meeting the following criteria: 1) high correlation between module eigengene and COVID-19 status, 2) close clustering with COVID-19 status using single linkage with absolute correlation distance, 3) high correlation between gene significance and module membership, and 4) high correlation between gene significance and intramodular connectivity. Together, these measures confirm the importance of the key modules in COVID-19 status.
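For orientation, p-values for a Pearson correlation are commonly obtained from a Student t statistic, which is also what WGCNA's `corPvalueStudent` function does. Assuming the reported p-values were derived this way (the text does not say so explicitly) with n = 15 samples, the statistic is:

```latex
t = r\,\sqrt{\frac{n-2}{1-r^{2}}}, \qquad \nu = n - 2 = 13
```

Under that assumption, R = 0.91 gives t ≈ 7.9, R = 0.86 gives t ≈ 6.1, and R = -0.96 gives t ≈ -12.4, each evaluated against a t distribution with 13 degrees of freedom. Such large absolute values are in keeping with the very small p-values reported for the three key modules.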

Key modules' genes can differentiate COVID-19 from normal subjects
The classification models trained using data from the key module genes showed high performance in terms of balanced accuracy, sensitivity, specificity, Matthews correlation coefficient, and area under the receiver operating characteristic curve (AUC) (Fig. 5), suggesting that the genes of these three modules are important in the pathology of COVID-19 disease. Furthermore, the accurate classification of the external validation set samples suggests that these results can be generalized and are not limited to the analyzed data set.
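The threshold-based metrics named above follow directly from the confusion matrix, as the short Python sketch below shows. The function name and the example labels are made up for illustration and do not reproduce the values in Fig. 5; AUC is omitted because it requires continuous classifier scores rather than hard labels.

```python
import math


def binary_metrics(y_true, y_pred, positive=1):
    """Sensitivity, specificity, balanced accuracy and MCC for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "mcc": mcc,
    }


# Hypothetical labels: 1 = COVID-19, 0 = healthy
example = binary_metrics([1, 1, 1, 1, 0, 0], [1, 1, 1, 0, 0, 1])
```

Balanced accuracy and the Matthews correlation coefficient are useful here because the patient and control groups are of unequal size, which can make plain accuracy misleading.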

Gene hub detection and visualization of module networks
Genes in the selected key modules were ranked according to their intramodular connectivity, and the top 20 genes of each key module were used to visualize the network of that module (Fig. 6). Subsequently, the top three genes of the yellow, black, and magenta modules were labeled as the hub genes of their modules that are important for COVID-19 disease. Thus, the protein-coding genes COPE, CDC37, and CAPNS1 were selected as the hub genes in the yellow module, whereas the protein-coding genes AURKAIP1, LAMTOR2, and GABARAP were selected as the hub genes in the black module. In the magenta module, MT-ND1, MT-ND5, and MTRNR2L12 were selected as hub genes. All of these hub genes exhibited high intramodular connectivity, which established their network centrality and potentially vital roles in COVID-19 disease. We also observed that not all of the hub genes show differential gene expression (Table 1). A full list of genes and their modules can be found in the supplementary tables (Additional files 2, 3, 4, 5, 6).

Enrichment analysis of key modules
Gene Ontology (GO) enrichment analyses were performed on the yellow, black, and magenta modules using the TargetMine platform, and the top relevant terms of each category are presented in Fig. 7A. The enrichment results demonstrated that the genes in both the yellow and black modules were primarily enriched in pathways associated with metabolic processes, protein translation, energy substance metabolism, mitochondrial activity, and oxidative phosphorylation. Genes in the magenta module were enriched in several pathways that are primarily associated with regulation of megakaryocyte differentiation and apoptosis, including the regulation of the execution phase of apoptosis. Reactome analysis showed enriched pathways of metabolism, platelet degranulation, and response to elevated platelet cytosolic Ca2+ in the yellow module. The black module showed enrichment of respiratory electron transport, ATP synthesis by chemiosmotic coupling, and heat production by uncoupling proteins, as well as the citric acid (TCA) cycle and respiratory electron transport, to name a few (Fig. 7B). More detailed results are shown in Additional file 1: Fig. S3, and a cross-check of the hub genes against the DisGeNET database is shown in Additional file 6: Table S5 [38].

Discussion
The underlying pathophysiological mechanisms of thrombosis in COVID-19 are extremely complicated [39], and hence clearly require more examination. Inspecting gene co-expression patterns has proven to be an effective method to analyze and uncover complicated genetic networks. To address these issues, in the present study, gene co-expression analysis was performed on a platelet RNAseq data set containing gene expression data from ten COVID-19 patients and five healthy controls. Three modules were identified as the key modules in COVID-19, with the highest level of significant association. The top three genes of each key module with the highest intramodular connectivity were identified as hub genes for COVID-19 in platelets. The results of the enrichment analysis suggest that the key modules and the pathological processes underlying the disease are associated with energy metabolism, mitochondrial processes, and apoptosis. Furthermore, we also saw enrichment of platelet secretion and activation pathways. These results provide, at least in part, an insight into the comprehensive platelet regulatory network in COVID-19, which should improve the current understanding of the mechanisms underlying immunothrombosis in COVID-19 patients. Ultimately, these findings might help in finding appropriate therapeutic targets.
The present study used the data in BioProject accession #PRJNA634489 [7] to perform the co-expression analysis using WGCNA. The data used in this study, which were generated by Manne et al. [7], revealed that COVID-19 disease leads to changes in platelet transcriptional profiles in comparison to control. Manne et al. showed that platelet differential gene expression in COVID-19 is associated with enrichment of protein ubiquitination, antigen presentation, and mitochondrial dysfunction. The major difference between the genes or modules obtained in the present study and the results from other studies, including the one by Manne et al. [7], is that the present study used a more comprehensive method by employing WGCNA. Using this method, we were able to identify co-expression modules of genes, namely the yellow and black modules, which represent important regulatory modules of platelet function in COVID-19. In addition, we were able to identify the magenta module, which represents genes that are negatively correlated with COVID-19. Notably, the co-expression analysis revealed a total of 16 modules, with the yellow and black modules exhibiting the strongest positive correlation and the magenta module exhibiting the strongest negative correlation to COVID-19 disease. These three modules were selected as key modules and their genes deemed important for the COVID-19 disease state. This result was validated when the genes of these three key modules/clusters were used to accurately classify the subjects from another recently published platelet data set (Barrett et al. 2021) as either COVID-19 or healthy using machine learning classifiers. The high accuracy of this classification underscores the importance of these platelet gene clusters in the pathogenesis of COVID-19 disease. Enrichment analysis indicated that the genes in the yellow and black modules were primarily associated with platelet metabolism, energy, and oxidative phosphorylation. Furthermore, the analysis of the yellow module showed enrichment of a host of platelet functional responses/activities, such as platelet degranulation/secretion and increased platelet response to Ca2+. Indeed, other studies showed COVID-19 disease to be associated with platelet activation and increased platelet alpha granule secretion, which are critical in the development of the thrombosis seen in those patients [7,8].
It is noteworthy that the platelet alpha granule secretion response is not only important for thrombus formation, but also in inflammation by releasing receptors that facilitate adhesion of platelets with other vascular cells as well as releasing a wide range of inflammatory chemokines [40].
Fig. 6 Interaction of gene co-expression patterns in the key identified module and hub gene abundance. The module was visualized using the R package "ggraph". The node size corresponds to the K.in level, and the thickness of the link represents the strength of correlation between genes. For the sake of visualization clarity, edges of weight less than 0.6 were not drawn.
The yellow and black modules show strong enrichment in platelet metabolic processes, which is in agreement with the increase in platelet activation. To this end, previous data have shown that platelet transition from the inactive to the active state requires alteration in ATP availability [41], and furthermore, substrate metabolism (e.g., glucose) was shown to be essential for platelet activation [42] and thrombosis [43]. This seems to suggest that altered platelet metabolism may play a critical role in the pathophysiology of thrombosis in COVID-19 patients. It is important to note that reports have suggested that a state of hypermetabolic demand is one of the features of COVID-19 disease, especially when sepsis develops [44]. Like other viruses that can impact cellular metabolism in human cells and utilize it to their advantage, the SARS-CoV-2 virus appears to have the ability to localize proteins to mitochondria and hijack the host's mitochondrial function [45]. This mechanism might explain the enrichment of platelet mitochondrial processes we observed in the yellow and black modules. This finding is in fact supported by a recent study that reported that SARS-CoV-2 impacts mitochondria in platelets, which affects their involvement in the pathophysiology of thrombosis in COVID-19 patients [46]. The enrichment of protein translation in the yellow and black modules suggests an alteration in protein synthesis and possible hijacking of the platelet translation machinery by the virus.
In line with this observation, one study suggested that the cells infected with SARS-CoV-2 might exhibit a faster protein synthesis rate, which implies a higher translation rate [47]. This notion requires further investigation to determine the exact mechanism underlying enhancement of translation in platelets of COVID-19 patient.
One particular characteristic of platelet apoptotic processes is phosphatidylserine (PS) exposure, which is essential for the generation of thrombin [48]. PS exposure was found to be downregulated in activated platelets from COVID-19 patients due to mitochondrial dysfunction [46]. This observation is supported by the enrichment of negative regulation of apoptotic processes in the negatively correlated magenta module. In contrast, another report showed that COVID-19 increases PS externalization, which is linked to thrombosis [49]. The impact of platelet mitochondrial damage on hemostasis seems to depend on its severity. Thus, it leads to bleeding by progressing toward apoptosis if it is severe, or toward platelet activation pathways and development of thrombosis risk if it is mild [50]. Based on this reasoning, the mitochondrial damage caused by COVID-19 disease in platelets is probably mild, and hence the thrombotic phenotype still prevails in these patients. More investigation is needed to confirm these observations and to understand the underlying mechanisms.
Additionally, we identified hub genes in each of the key modules. For example, the hub genes of the yellow module, COPE, CDC37, and CAPNS1, are protein-coding genes involved in vesicle-mediated transport, positive regulation of cellular processes, and regulation of interferons. Furthermore, some of these protein-coding genes have also been investigated in platelets and shown to regulate important aspects of their function [51][52][53][54]. Interestingly, although our co-expression analysis showed that CAPNS1 is an important hub gene in the yellow module, this gene was not differentially expressed in our differential gene expression analysis. Furthermore, CAPNS1 was found to play a significant role in regulating platelet activity and thrombosis under hypoxia [53], a condition commonly seen in patients with severe COVID-19 [55]. This observation might indicate that some of the genes that are important in establishing the thrombotic phenotype in COVID-19 may not necessarily be differentially expressed.
The hub genes of the black module, AURKAIP1, LAMTOR2, and GABARAP, are linked to regulation of mitochondrial activity, regulation of signaling processes, and protein targeting. Data on the role of these genes in platelets are limited; thus, further investigation is warranted. It is noteworthy that LAMTOR2 is a known regulator of the MAPK/ERK and mTOR signaling pathways [56,57], both of which were shown to be important in regulating platelet function [58,59]. Moreover, p14/LAMTOR2 deficiency, which is associated with one of the primary immunodeficiency diseases that also include "Hermansky-Pudlak syndrome type 2", has been linked to platelet defects [60]. However, more needs to be done to examine the exact role of LAMTOR2 in platelets of COVID-19 patients.
In the magenta module, the protein-coding genes MT-ND1 [61], MT-ND5 [62], and MTRNR2L12 are related to NADH dehydrogenase activity and apoptotic processes. According to our analysis, all hub genes in the magenta module are differentially expressed and downregulated in COVID-19 patients in comparison to healthy controls. Downregulation of the MT-ND1 and MT-ND5 protein-coding genes might, at least in part, explain the mitochondrial dysfunction seen in platelets of COVID-19 patients. With respect to MTRNR2L12, it was observed that it is one of the differentially expressed genes in [63]. MTRNR2L12 is a paralog of the protein-coding gene MTRNR2L8, and both are expressed in platelets [64]. It is of note that MTRNR2L12 was shown to be among the top 10 RNAs with differential splice junctions in platelets of patients with multiple sclerosis [65].
In addition to the identified hub genes, a number of other canonical platelet genes in the yellow and black modules were also associated with platelet function. For example, the SLEB and ITGA2B protein-coding genes were present in the yellow module with high intramodular connectivity (ranked in the top 50), and both proteins are critical for platelet function [66]. Moreover, another canonical platelet gene identified in the yellow module, namely ITGB3, was ranked 132 with regard to its intramodular connectivity, which is considered high in the yellow module of 681 genes. Furthermore, we also noticed that the protein-coding gene IFITM3 shows high module membership (black module). The protein encoded by this gene is an interferon-induced membrane protein that was shown to be important in immunity against influenza A H1N1 virus, West Nile virus, and dengue virus [67][68][69]. Most recently, IFITM3 was also found to be upregulated in COVID-19 disease [7,70], which, importantly, was also confirmed by Western blot [7].
The present study has certain limitations that should be noted. First, the analysis focused on only one data set, due to limited access to platelet gene expression data collected from COVID-19 patients. Therefore, additional data sets should be analyzed, if available, to validate our findings and/or obtain more representative results. Second, the number of samples was 15, which may be associated with some noise, although it is the minimum number of samples recommended for co-expression analysis by WGCNA. Finally, any limitations in the original study from which the data were obtained will also be reflected in the results of this study.
In conclusion, our co-expression analysis of a platelet RNAseq data set from COVID-19 patients and healthy controls revealed 16 modules, amongst which the yellow, black, and magenta were identified as the most critical in COVID-19 disease and validated using machine learning. Additionally, nine hub genes were determined to potentially serve key roles in the pathophysiological mechanisms of COVID-19 in the context of platelet biology. The positively associated yellow and black modules were identified to be involved in platelet degranulation, energy metabolism, and mitochondria. The negatively associated magenta module was associated with interactive pathways of apoptosis. These data should help expand our understanding of the underlying mechanisms of thrombosis in COVID-19 disease and help promote and guide future experimental studies to investigate the roles of the protein coding genes in the pathophysiology of this disease. Additionally, these genes may serve as novel therapeutic targets for treating patients.

Funding
No external funding was utilized for this work.

Availability of data and materials
The data sets analysed during the current study are publicly available in the NCBI BioProject repository (PRJNA634489, PRJNA736410).

Declarations
Ethics approval and consent to participate
Not applicable. The data used are publicly available in the NCBI BioProject repository; therefore, no administrative permission is required to access the raw data, and ethical approval and consent to participate are not applicable. The data are stored in the NCBI SRA (Sequence Read Archive) and are therefore governed by the NIH GDS (National Institutes of Health Genomic Data Sharing) policy, which mandates that data be anonymized prior to upload to the SRA repository. The original data were produced by experiments carried out in accordance with the Declaration of Helsinki according to the original published papers [7,26].

Consent for publication
Not applicable.