Identification of biological correlates associated with respiratory failure in COVID-19
BMC Medical Genomics volume 13, Article number: 186 (2020)
Coronavirus disease 2019 (COVID-19) is a global public health concern. Recently, a genome-wide association study (GWAS) was performed with participants recruited from Italy and Spain by an international consortium group.
Summary GWAS statistics for 1610 patients with COVID-19 respiratory failure and 2205 controls were downloaded. In the current study, we analyzed these summary statistics, which include loci and p-values for 8,582,968 single-nucleotide polymorphisms (SNPs), using gene ontology analysis to determine the top biological processes implicated in respiratory failure in COVID-19 patients.
We considered the top 708 SNPs, using a p-value cutoff of 5 × 10⁻⁵, which were mapped to the nearest genes, leading to 144 unique genes. The list of genes was input into a curated database to conduct gene ontology and protein-protein interaction (PPI) analyses. The top ranked biological processes were wound healing, epithelial structure maintenance, muscle system processes, and cardiac-relevant biological processes with a false discovery rate < 0.05. In the PPI analysis, the largest connected network consisted of 8 genes. Through a literature search, 7 out of the 8 gene products were found to be implicated in both pulmonary and cardiac diseases.
Gene ontology and PPI analyses identified cardio-pulmonary processes that may partially explain the risk of respiratory failure in COVID-19 patients.
Coronavirus disease 2019 (COVID-19), caused by a novel coronavirus (severe acute respiratory syndrome coronavirus 2, SARS-CoV-2), has resulted in a global pandemic with a rapidly developing global health and economic crisis. Most people with COVID-19 are asymptomatic or experience only mild symptoms. However, about 5% of patients infected with the coronavirus develop acute lung injury and acute respiratory distress syndrome, possibly leading to lethal lung damage and even death.
The most commonly reported comorbidities associated with poor outcomes in COVID-19 include hypertension, diabetes, cardiovascular disease, and chronic respiratory infections [4, 5]. However, the underlying molecular mechanisms in severe COVID-19 and their interplay with such comorbidities or clinical factors are poorly understood.
To identify putative biomarkers that can help better understand the molecular basis of COVID-19, Blanco-Melo et al. investigated the host transcriptional response to SARS-CoV-2 and other respiratory infections through in vitro, ex vivo, and in vivo experiments. Bioinformatics approaches including gene ontology and protein-protein interaction (PPI) analyses were performed to identify key biological correlates. To investigate key genetic variants associated with respiratory failure in COVID-19 patients, a genome-wide association study (GWAS) was carried out on participants recruited from Italy and Spain. In the current study, we performed an in-depth biological characterization, including gene ontology and PPI analyses, of the summary statistics from that GWAS in order to identify key biological correlates relevant to respiratory failure in COVID-19 patients.
The GWAS conducted by an international consortium group involved 1980 patients with severe acute respiratory failure induced by COVID-19 at seven hospitals in Italy and Spain. After quality control, the final case-control cohort included 835 patients and 1255 control participants from Italy and 775 patients and 950 control participants from Spain. After genotyping and imputation on genome build GRCh38, univariate analysis was performed for 8,582,968 single-nucleotide polymorphisms (SNPs). The resulting summary statistics, including individual SNP positions and p-values, were submitted to the European Bioinformatics Institute (www.ebi.ac.uk/gwas; accession numbers GCST90000255 and GCST90000256) and are available from www.c19-genetics.eu. GCST90000255 was the main analysis, in which all association statistics were corrected for the top 10 principal components (PCs), whereas in the additional analysis, GCST90000256, association statistics were corrected for the top 10 PCs, age, and sex. In the original GWAS report, the main results came from the analysis of GCST90000255, and GCST90000256 was used for ancillary analysis. In the current study, we therefore focused on the summary statistics of GCST90000255 for further biological analysis because the analysis of GCST90000255 resulted in more plausible biological correlates of respiratory failure than that of GCST90000256.
To further enrich gene ontology terms with more plausible SNPs likely relevant to acute respiratory failure in COVID-19, we employed a relaxed p-value of 5 × 10⁻⁵ as a filtering threshold on the summary statistics. SNPs with p-values < 5 × 10⁻⁵ were mapped to the nearest genes using a 50 kb window on both the upstream and downstream sides of each gene. The resulting list of genes was entered into MetaCore software (Thomson Reuters, New York, NY) for gene ontology analysis. PPI analysis was then performed to find the largest connected network among these genes, using ‘Direct interactions’ as the network building algorithm in MetaCore, on the assumption that interacting proteins in a biological network may have the same or similar molecular functions [8,9,10].
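The filtering and mapping step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the `Snp` and `Gene` records, the function names, and the toy coordinates are all hypothetical, and ties between equally distant genes are broken alphabetically purely for determinism.

```python
from dataclasses import dataclass

P_THRESHOLD = 5e-5   # relaxed cutoff described in the text
WINDOW_BP = 50_000   # 50 kb flank on either side of each gene


@dataclass(frozen=True)
class Snp:
    rsid: str
    chrom: str
    pos: int
    pvalue: float


@dataclass(frozen=True)
class Gene:
    symbol: str
    chrom: str
    start: int
    end: int


def distance_to_gene(snp: Snp, gene: Gene) -> int:
    """Distance in bp from the SNP to the gene body (0 if the SNP lies inside it)."""
    if snp.pos < gene.start:
        return gene.start - snp.pos
    if snp.pos > gene.end:
        return snp.pos - gene.end
    return 0


def map_snps_to_genes(snps, genes, p_threshold=P_THRESHOLD, window=WINDOW_BP):
    """Return {rsid: nearest gene symbol} for SNPs passing the p-value filter."""
    mapping = {}
    for snp in snps:
        if snp.pvalue >= p_threshold:
            continue  # keep only SNPs with p < threshold
        in_window = [
            (distance_to_gene(snp, g), g.symbol)
            for g in genes
            if g.chrom == snp.chrom and distance_to_gene(snp, g) <= window
        ]
        if in_window:
            mapping[snp.rsid] = min(in_window)[1]  # smallest distance wins
    return mapping


# Toy, made-up inputs (not taken from the GWAS).
genes = [Gene("GENE_A", "1", 100_000, 200_000), Gene("GENE_B", "1", 400_000, 450_000)]
snps = [
    Snp("rs_a", "1", 150_000, 1e-6),  # inside GENE_A
    Snp("rs_b", "1", 240_000, 2e-5),  # 40 kb past the end of GENE_A
    Snp("rs_c", "1", 300_000, 1e-6),  # 100 kb from both genes: unmapped
    Snp("rs_d", "1", 460_000, 1e-3),  # fails the p-value filter
    Snp("rs_e", "1", 460_000, 4e-5),  # 10 kb past the end of GENE_B
]
snp_to_gene = map_snps_to_genes(snps, genes)
unique_genes = sorted(set(snp_to_gene.values()))
print(snp_to_gene, unique_genes)
```

Several SNPs can map to the same gene, which is why the list of unique genes is shorter than the list of retained SNPs.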
To complement the biological interpretation using genes that were identified based on the proximity of candidate SNPs, the biological analysis described above was repeated using genes identified as expression quantitative trait loci (eQTL) targets from the Genotype-Tissue Expression (GTEx) database for tissues that appear to be relevant to respiratory failure, including the aorta, coronary artery, skeletal muscle, lung, and atrial appendage and left ventricle in the heart.
Gene ontology analysis
In our analysis, with a p-value threshold of 5 × 10⁻⁵ applied to the summary statistics, 708 SNPs remained and a corresponding set of 144 unique genes in autosomes was found (Additional file 1). The list of genes was entered into the MetaCore database. Table 1 shows the top 10 biological processes and corresponding genes that appear to be relevant to respiratory failure in COVID-19 patients, all with false discovery rate (FDR) values < 0.05. Wound healing, epithelial structure maintenance, muscle system process, and cardiac-relevant biological processes were top-ranked.
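For readers unfamiliar with how an enrichment analysis arrives at FDR values, the sketch below shows one common approach: a hypergeometric over-representation test per term followed by Benjamini-Hochberg adjustment. MetaCore is proprietary and its exact procedure is not described here, so this is an assumed stand-in rather than the method used in the study. The term names, gene identifiers, and background size are hypothetical.

```python
from math import comb


def hypergeom_sf(k, M, n, N):
    """P(X >= k) when drawing N genes from M, of which n belong to the term."""
    total = comb(M, N)
    upper = min(n, N)
    return sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1)) / total


def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values (FDR), in the same order as the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def enrich(gene_list, term_to_genes, background_size):
    """Return {term: (p_value, fdr)} for each annotation term."""
    genes = set(gene_list)
    terms = sorted(term_to_genes)
    pvals = [
        hypergeom_sf(len(genes & term_to_genes[t]), background_size,
                     len(term_to_genes[t]), len(genes))
        for t in terms
    ]
    fdr = benjamini_hochberg(pvals)
    return {t: (p, q) for t, p, q in zip(terms, pvals, fdr)}


# Toy, made-up annotation: 100 background genes, two hypothetical terms.
background_size = 100
term_to_genes = {
    "term_x": {"G1", "G2", "G3", "G4", "G5"},
    "term_y": {f"G{i}" for i in range(6, 26)},
}
gene_list = ["G1", "G2", "G3", "G50"]
results = enrich(gene_list, term_to_genes, background_size)
for term, (p, q) in results.items():
    print(term, p, q)
```

In this toy case three of the four input genes fall in `term_x`, so it receives a small p-value, while `term_y` has no overlap and a p-value of 1. Terms would then be ranked by FDR and reported below a cutoff such as 0.05.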
PPI network analysis
The largest connected PPI network in the list of 144 genes is shown in Fig. 1. The PPI network consisted of 8 gene products for the following genes: GATA4, ID2, MAFA, NOX4, PTBP1, SMAD3, TUBB1, and WWOX. We conducted a literature search in PubMed to investigate the potential associations between those 8 genes/proteins and pulmonary or cardiac diseases. Additional file 2 contains a table that summarizes the reported studies on these associations. Interestingly, with the exception of MAFA, which is involved in insulin secretion, the other 7 gene products were found to be implicated in both pulmonary and cardiac diseases.
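Finding the largest connected network amounts to finding the largest connected component of an interaction graph restricted to the genes of interest. The sketch below shows this with a breadth-first search. It is an illustration only: the protein names and edges are placeholders, not the interactions shown in Fig. 1, and the MetaCore ‘Direct interactions’ algorithm may differ in detail.

```python
from collections import defaultdict, deque


def largest_connected_component(nodes, edges):
    """Return the largest set of nodes linked by edges (ties: first found in sorted order)."""
    allowed = set(nodes)
    adjacency = defaultdict(set)
    for a, b in edges:
        # Keep only direct interactions between two proteins in the input list.
        if a in allowed and b in allowed and a != b:
            adjacency[a].add(b)
            adjacency[b].add(a)

    seen = set()
    best = set()
    for start in sorted(allowed):
        if start in seen:
            continue
        component = {start}
        queue = deque([start])
        while queue:  # breadth-first search from this start node
            current = queue.popleft()
            for neighbor in adjacency[current]:
                if neighbor not in component:
                    component.add(neighbor)
                    queue.append(neighbor)
        seen |= component
        if len(component) > len(best):
            best = component
    return best


# Toy, made-up network: P1-P2-P3 form a chain, P4-P5 a pair, P6 is isolated.
proteins = ["P1", "P2", "P3", "P4", "P5", "P6"]
interactions = [("P1", "P2"), ("P2", "P3"), ("P4", "P5"), ("P1", "PX")]
largest = largest_connected_component(proteins, interactions)
print(sorted(largest))
```

The edge to `PX` is ignored because `PX` is not in the input list, which mirrors restricting the network to direct interactions among the submitted genes.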
Tissue-specific genes that had significant associations with the 708 SNPs were identified from GTEx V8. Six tissues were examined, including the aorta, coronary artery, skeletal muscle, lung, and atrial appendage and left ventricle in the heart, resulting in 17, 4, 21, 17, 8, and 10 genes, respectively (Additional file 3). The gene ontology and PPI analyses described above were repeated using the resultant 34 unique genes. No interactions were found among the 34 gene products. Table 2 shows the top 10 biological processes and corresponding genes, all with FDR values < 0.05. All of these biological processes involved 3 gene products: CCR3, CCR5, and CXCR6. Chemokine-related biological processes were top-ranked.
Among the 708 SNPs, 41, 48, 5, 59, and 180 SNPs had eQTL associations in 6, 5, 4, 3, and 2 tissues, respectively (Additional file 4). In addition, rs8093548 (chr18, pos: 79876451 in GRCh38), rs4799099 (chr18, pos: 79879585), and rs4799100 (chr18, pos: 79880207) had eQTL associations with the most genes; all three SNPs had eQTL associations with the same five genes (HSBP1L1, PQLC1, RBFA, RBFADN, and TXNL4A) in five tissues, including the aorta, skeletal muscle, lung, and atrial appendage and left ventricle in the heart.
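The tallies above (unique eQTL target genes, and the number of SNPs with associations in 2 or more tissues) can be derived from a flat table of SNP-tissue-gene associations. The sketch below assumes such a table as a list of tuples. The record layout, function names, and example rows are hypothetical and do not reflect the actual GTEx file format.

```python
from collections import Counter, defaultdict


def tissues_per_snp(eqtl_records):
    """Map each SNP to the set of tissues in which it has an eQTL association."""
    snp_tissues = defaultdict(set)
    for rsid, tissue, _gene in eqtl_records:
        snp_tissues[rsid].add(tissue)
    return snp_tissues


def multi_tissue_histogram(eqtl_records):
    """Return {number of tissues: number of SNPs}, restricted to 2 or more tissues."""
    counts = Counter(len(t) for t in tissues_per_snp(eqtl_records).values())
    return {n: c for n, c in sorted(counts.items(), reverse=True) if n >= 2}


def unique_target_genes(eqtl_records):
    """Union of eQTL target genes across all tissues."""
    return {gene for _rsid, _tissue, gene in eqtl_records}


# Toy, made-up records: (rsid, tissue, target gene).
eqtl_records = [
    ("rs_1", "lung", "GENE_A"),
    ("rs_1", "aorta", "GENE_A"),
    ("rs_1", "skeletal muscle", "GENE_B"),
    ("rs_2", "lung", "GENE_C"),
    ("rs_3", "lung", "GENE_A"),
    ("rs_3", "left ventricle", "GENE_A"),
]
histogram = multi_tissue_histogram(eqtl_records)
target_genes = unique_target_genes(eqtl_records)
print(histogram, sorted(target_genes))
```

Because the same gene can be an eQTL target in several tissues, the number of unique genes is smaller than the sum of the per-tissue gene counts.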
Summary statistics from a GWAS dataset for respiratory failure in COVID-19 patients were analyzed using bioinformatics techniques. To enrich the biological discovery, a relaxed p-value of 5 × 10⁻⁵ was adopted, which likely allowed more potentially informative genomic signals to be included in the analysis and enabled the identification of plausible biological correlates associated with pulmonary or cardiac symptoms. The list of SNPs filtered by the relaxed p-value threshold was mapped to nearby genes. The resulting 144 genes were entered into the MetaCore database for gene ontology and PPI analyses.
Gene ontology analysis identified wound healing, cardiac-related biological processes, and muscle system process as key correlates. For PPI analysis, we attempted to find the largest connected network in the list of 144 genes, assuming that interacting proteins in a biological network tend to have the same or similar molecular functions. The largest connected network consisted of 8 gene products from the following genes: GATA4, ID2, MAFA, NOX4, PTBP1, SMAD3, TUBB1, and WWOX. A literature search was conducted through PubMed to investigate whether biological associations between these genes/proteins and respiratory or cardiac symptoms have been reported previously. Interestingly, we found that most of these gene products are relevant to both respiratory and cardiac diseases. In what follows, we describe the biological roles of these biomarkers.
A study reported that GATA4 plays a critical role as a transcription factor in normal pulmonary development. GATA4 has also been found to be a human candidate gene relevant to congenital heart disease [13, 14]. Several studies showed that GATA4 is a key protein responsible for the development of the lung, heart, and diaphragm in mice [15,16,17].
Arwood et al. described a mechanism of pulmonary hypertension in heart failure with preserved ejection fraction (HFpEF), using transcriptome-wide RNA sequencing. When comparing the transcriptomic difference between patients without pulmonary hypertension and those with combined post- and pre-capillary pulmonary hypertension, six differentially expressed genes were identified. In a further replication test on an independent cohort, only ID2 was validated and in an additional animal study, ID2 expression was significantly upregulated in mice with HFpEF and pulmonary hypertension compared to control mice. Another study showed a functional role of ID2 as one of the culprit genes in both the arterial and the venous poles of the heart.
An increased expression of NOX4 and TGF-β was found to be correlated with the increased volume in both airway smooth muscle mass and epithelial cells of small airways in patients with chronic obstructive pulmonary disease (COPD). Another study reported that the upregulation of NOX4 in the heart induced cardiac remodeling, suggesting its potential role to reduce the severity of established heart failure.
Gauldie et al. demonstrated a cascade of biological interactions among inflammation, TGF-β activation, SMAD3 signaling, pulmonary fibrosis, and emphysema. Huang et al. found that SMAD3 is a key mediator in chronic cardiovascular disease, and plays a critical role in hypertensive cardiac remodeling.
At 4 and 24 h after respiratory syncytial virus infection, gene expression profiles in human bronchial epithelial cells were analyzed. Among the six genes that were associated with respiratory disease and were significantly altered at both 4 and 24 h post-infection, TUBB1 was the only gene observed to be downregulated at both time points. Freson et al. showed that the TUBB1 Q43P functional variant may be a protective genetic factor against cardiovascular disease.
Caruso et al. observed the downregulation of miR-124 in patients with pulmonary arterial hypertension and its central role in contributing to abnormal cell proliferation via PTBP1 and PKM2. Recently, Fochi et al. showed the emerging role of RBM20 and PTBP1 as key splicing factors in heart development and cardiovascular disease.
A study reported that the loss of WWOX promoted cell proliferation in pulmonary artery smooth muscle cells and contributed to pulmonary vascular remodeling in pulmonary arterial hypertension. Another study reported the vital implications of WWOX in atherosclerosis and cardiovascular diseases.
In our literature review, MAFA was not found to be directly related to pulmonary or cardiac symptoms. However, MAFA has been shown to be a key regulator that controls genes implicated in insulin secretion [30, 31]. A recent study indicated that a number of patients with COVID-19, who were comorbid with diabetes or diabetes-related traits, had increased ACE2 expression. This suggests that ACE2 may be a key molecular link between insulin resistance and COVID-19 severity.
The combined evidence indicates that lung disease is likely to be associated with cardiovascular risk. Further research is warranted to identify the biological processes common to lung and heart diseases and the interplay between them.
We further assessed various filtering thresholds. With a stricter p-value threshold of 1 × 10⁻⁵, 390 SNPs and 27 corresponding unique genes in autosomes remained. Gene ontology analysis with this relatively small number of genes yielded immunity-related biological processes as the top-ranked correlates. The top two biological processes were chemokine-mediated signaling pathway (FDR = 2.906E-3) and CD8-positive, gamma-delta intraepithelial T cell differentiation (FDR = 2.906E-3). With a more relaxed p-value threshold of 1 × 10⁻⁴, 1112 SNPs and 243 corresponding unique genes in autosomes remained. Gene ontology analysis with those genes yielded biological processes that appear irrelevant to respiratory failure, likely because of false positives added to the analysis. This implies that selecting an appropriate threshold is critical for identifying real biological correlates. Machine learning-based predictive modeling on GWAS data, which we employed in other studies [8, 9], can help inform this choice.
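A threshold comparison of this kind can be sketched as a simple sweep that reports, for each cutoff, how many SNPs pass and how many unique genes they map to. The code below is a hypothetical illustration: the row layout `(rsid, mapped_gene_or_None, pvalue)`, the function name, and the example values are assumptions, and the downstream gene ontology step is not reproduced.

```python
def sweep_thresholds(snp_rows, thresholds):
    """For each p-value cutoff, count retained SNPs and their unique mapped genes.

    snp_rows: iterable of (rsid, mapped_gene_or_None, pvalue).
    """
    rows = list(snp_rows)
    summary = {}
    for threshold in thresholds:
        kept = [(rsid, gene) for rsid, gene, p in rows if p < threshold]
        summary[threshold] = {
            "n_snps": len(kept),
            # SNPs with no gene in the mapping window do not contribute a gene.
            "n_genes": len({gene for _rsid, gene in kept if gene is not None}),
        }
    return summary


# Toy, made-up rows (not taken from the GWAS).
snp_rows = [
    ("rs_a", "GENE_A", 3e-6),
    ("rs_b", "GENE_A", 8e-6),
    ("rs_c", "GENE_B", 2e-5),
    ("rs_d", None, 4e-5),
    ("rs_e", "GENE_C", 9e-5),
    ("rs_f", "GENE_D", 5e-4),
]
summary = sweep_thresholds(snp_rows, [1e-5, 5e-5, 1e-4])
for threshold, counts in summary.items():
    print(threshold, counts)
```

Relaxing the cutoff can only add SNPs and genes, so the trade-off is between statistical power for enrichment and the number of false positives admitted.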
Biological analyses using genes that were identified based on the proximity of candidate SNPs identified cardio-pulmonary processes as associated with respiratory failure. In particular, 7 out of the 8 gene products in the largest connected network were found to be implicated in both pulmonary and cardiac diseases. In contrast, the selection of genes identified as eQTL targets uncovered chemokine-related biological processes, indicating an association with the immune system. This suggests that integrating the two methods of identifying relevant genes can help better understand the underlying biological mechanisms of respiratory failure in COVID-19 patients.
We analyzed summary statistics from a GWAS dataset where individual SNPs were tested for associations with respiratory failure in COVID-19 patients. Bioinformatics approaches with SNPs filtered using a relaxed p-value enabled the identification of plausible biological correlates that are likely to be relevant to pulmonary or cardiac symptoms. When genotyping data become available, a more in-depth analysis using machine learning and bioinformatics techniques will provide greater insights into the underlying mechanisms of respiratory failure in COVID-19 patients.
Availability of data and materials
All the data analyzed in this study are available at http://www.c19-genetics.eu.
COVID-19: Coronavirus disease 2019
GWAS: Genome-wide association study
FDR: False discovery rate
HFpEF: Heart failure with preserved ejection fraction
COPD: Chronic obstructive pulmonary disease
eQTL: Expression quantitative trait loci
Blanco-Melo D, Nilsson-Payant BE, Liu WC, Uhl S, Hoagland D, Møller R, et al. Imbalanced host response to SARS-CoV-2 drives development of COVID-19. Cell. 2020;181(5):1036–1045.e1039.
Yaqinuddin A, Kvietys P, Kashir J. COVID-19: role of neutrophil extracellular traps in acute lung injury. Respir Investig. 2020;58(5):419–20.
Baksh M, Ravat V, Zaidi A, Patel RS. A systematic review of cases of acute respiratory distress syndrome in the coronavirus disease 2019 pandemic. Cureus. 2020;12(5):e8188.
Di Carlo DT, Montemurro N, Petrella G, Siciliano G, Ceravolo R, Perrini P. Exploring the clinical association between neurological symptoms and COVID-19 pandemic outbreak: a systematic review of current literature. J Neurol. 2020;1:1.
Malik YS, Kumar N, Sircar S, Kaushik R, Bhat S, Dhama K, et al. Coronavirus disease pandemic (COVID-19): challenges and a global perspective. Pathogens. 2020;9(7):519.
Kalfaoglu B, Almeida-Santos J, Adele Tye C, Satou Y, Ono M. T-cell hyperactivation and paralysis in severe COVID-19 infection revealed by single-cell analysis. Front Immunol. 2020;11:589380.
Ellinghaus D, Degenhardt F, Bujanda L, Buti M, Albillos A, Invernizzi P, et al. Genomewide association study of severe Covid-19 with respiratory failure. N Engl J Med. 2020;383:1522–34.
Oh JH, Kerns S, Ostrer H, Powell SN, Rosenstein B, Deasy JO. Computational methods using genome-wide association studies to predict radiotherapy complications and to identify correlative molecular processes. Sci Rep. 2017;7:43381.
Lee S, Kerns S, Ostrer H, Rosenstein B, Deasy JO, Oh JH. Machine learning on a genome-wide association study to predict late genitourinary toxicity after prostate radiation therapy. Int J Radiat Oncol Biol Phys. 2018;101(1):128–35.
Lee S, Liang X, Woods M, Reiner AS, Concannon P, Bernstein L, et al. Machine learning on genome-wide association studies to predict the risk of radiation-associated contralateral breast cancer in the WECARE study. PLoS One. 2020;15(2):e0226157.
GTEx Consortium. The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science. 2020;369(6509):1318–30.
Ackerman KG, Wang J, Luo L, Fujiwara Y, Orkin SH, Beier DR. Gata4 is necessary for normal pulmonary lobar development. Am J Respir Cell Mol Biol. 2007;36(4):391–7.
Garg V, Kathiriya IS, Barnes R, Schluterman MK, King IN, Butler CA, et al. GATA4 mutations cause human congenital heart defects and reveal an interaction with TBX5. Nature. 2003;424(6947):443–7.
Sarkozy A, Conti E, Neri C, D'Agostino R, Digilio MC, Esposito G, et al. Spectrum of atrial septal defects associated with mutations of NKX2.5 and GATA4 transcription factors. J Med Genet. 2005;42(2):e16.
Jay PY, Bielinska M, Erlich JM, Mannisto S, Pu WT, Heikinheimo M, et al. Impaired mesenchymal cell function in Gata4 mutant mice leads to diaphragmatic hernias and primary lung defects. Dev Biol. 2007;301(2):602–14.
Olson EN. Gene regulatory networks in the evolution and development of the heart. Science. 2006;313(5795):1922–7.
Kimura Y, Suzuki T, Kaneko C, Darnel AD, Moriya T, Suzuki S, et al. Retinoid receptors in the developing human lung. Clin Sci (Lond). 2002;103(6):613–21.
Arwood MJ, Vahabi N, Lteif C, Sharma RK, Machado RF, Duarte JD. Transcriptome-wide analysis associates ID2 expression with combined pre- and post-capillary pulmonary hypertension. Sci Rep. 2019;9(1):19572.
Jongbloed MR, Vicente-Steijn R, Douglas YL, Wisse LJ, Mori K, Yokota Y, et al. Expression of Id2 in the second heart field and cardiac defects in Id2 knock-out mice. Dev Dyn. 2011;240(11):2561–77.
Guo X, Fan Y, Cui J, Hao B, Zhu L, Sun X, et al. NOX4 expression and distal arteriolar remodeling correlate with pulmonary hypertension in COPD. BMC Pulm Med. 2018;18(1):111.
Zhao QD, Viswanadhapalli S, Williams P, Shi Q, Tan C, Yi X, et al. NADPH oxidase 4 induces cardiac fibrosis and hypertrophy through activating Akt/mTOR and NFκB signaling pathways. Circulation. 2015;131(7):643–55.
Gauldie J, Kolb M, Ask K, Martin G, Bonniaud P, Warburton D. Smad3 signaling involved in pulmonary fibrosis and emphysema. Proc Am Thorac Soc. 2006;3(8):696–702.
Huang XR, Chung AC, Yang F, Yue W, Deng C, Lau CP, et al. Smad3 mediates cardiac inflammation and fibrosis in angiotensin II-induced hypertensive cardiac remodeling. Hypertension. 2010;55(5):1165–71.
Huang YC, Li Z, Hyseni X, Schmitt M, Devlin RB, Karoly ED, et al. Identification of gene biomarkers for respiratory syncytial virus infection in a bronchial epithelial cell line. Genomic Med. 2008;2(3–4):113–25.
Freson K, De Vos R, Wittevrongel C, Thys C, Defoor J, Vanhees L, et al. The TUBB1 Q43P functional polymorphism reduces the risk of cardiovascular disease in men by modulating platelet function and structure. Blood. 2005;106(7):2356–62.
Caruso P, Dunmore BJ, Schlosser K, Schoors S, Dos Santos C, Perez-Iratxeta C, et al. Identification of MicroRNA-124 as a major regulator of enhanced endothelial cell glycolysis in pulmonary arterial hypertension via PTBP1 (Polypyrimidine tract binding protein) and pyruvate kinase M2. Circulation. 2017;136(25):2451–67.
Fochi S, Lorenzi P, Galasso M, Stefani C, Trabetti E, Zipeto D, et al. The emerging role of the RBM20 and PTBP1 Ribonucleoproteins in heart development and cardiovascular diseases. Genes (Basel). 2020;11(4):402.
Teixeira Gomes M, Chen J, Haider S, Bai Y, Singla S, Machado RF. Smooth muscle cell loss of the tumor suppressor WWOX contributes to the development of pulmonary hypertension. Am J Respir Crit Care Med. 2020;201:A7211.
Tanna M, Aqeilan RI. Modeling WWOX loss of function in vivo: what have we learned? Front Oncol. 2018;8:420.
Wang H, Brun T, Kataoka K, Sharma AJ, Wollheim CB. MAFA controls genes implicated in insulin biosynthesis and secretion. Diabetologia. 2007;50(2):348–58.
Zhang C, Moriguchi T, Kajihara M, Esaki R, Harada A, Shimohata H, et al. MafA is a key regulator of glucose-stimulated insulin secretion. Mol Cell Biol. 2005;25(12):4969–76.
Rao S, Lau A, So HC. Exploring diseases/traits and blood proteins causally related to expression of ACE2, the putative receptor of SARS-CoV-2: a Mendelian randomization analysis highlights tentative relevance of diabetes-related traits. Diabetes Care. 2020;43(7):1416–26.
Finucane FM, Davenport C. Coronavirus and obesity: could insulin resistance mediate the severity of Covid-19 infection? Front Public Health. 2020;8:184.
The authors thank Dr. Andre Franke for his valuable comments and suggestions.
This research was funded in part through the NIH/NCI Cancer Center Support Grant P30 CA008748 and R21 CA234752. The funders had no role in the study design, data collection and analysis, interpretation, and writing of the manuscript.
The authors declare no conflicts of interest.
Additional file 1: 144 genes identified based on the proximity of 708 candidate SNPs.
Additional file 2: An overview of reported studies in terms of the associations between genes/proteins and pulmonary or cardiac symptoms.
Additional file 3: For 708 candidate SNPs, eQTL information obtained from GTEx V8 for 6 tissues, including the aorta, coronary artery, skeletal muscle, lung, and atrial appendage and left ventricle in the heart.
Additional file 4: SNPs with eQTL associations in multiple tissues.
Oh, J.H., Tannenbaum, A. & Deasy, J.O. Identification of biological correlates associated with respiratory failure in COVID-19. BMC Med Genomics 13, 186 (2020). https://doi.org/10.1186/s12920-020-00839-1
- Single-nucleotide polymorphisms
- Genome-wide association study
- Respiratory failure