Genomic analysis of circular RNAs in heart

Background Heart failure is a leading cause of human morbidity and mortality. Circular RNAs (circRNAs) are a newly discovered class of RNA that have been found to have important physiological and pathological roles. In the current study, we de novo analyzed existing whole transcriptome data from 5 normal and 5 dilated cardiomyopathy (DCM) human heart samples and compared the results with circRNAs that have been previously reported in human, mouse and rat hearts. Results Our analysis identifies a list of cardiac circRNAs that are reliably detected in multiple studies. We have also defined the top 30 most abundant circRNAs in healthy human hearts which include some with previously unrecognized cardiac roles such as circHIPK3_11 and circTULP4_1. We further found that many circRNAs are dysregulated in DCM, particularly transcripts originating from DCM-related gene loci, such as TTN and RYR2. In addition, we predict the potential of cardiac circRNAs to sponge miRNAs that have reported roles in heart disease. We found that circALMS1_6 has the highest potential to bind miR-133, a microRNA that can regulate cardiac remodeling. Interestingly, we detected a novel class of circRNAs, referred to as read-though (rt)-circRNAs which are produced from exons of two different neighboring genes. Specifically, rt-circRNAs from SCAF8 and TIAM2 were observed to be dysregulated in DCM and these rt-circRNAs have the potential to sponge multiple heart disease-related miRNAs. Conclusions In summary, this study provides a valuable resource for exploring the function of circRNAs in human heart disease and establishes a functional paradigm for identifying novel circRNAs in other tissues.


Abstract
Background: Heart failure (HF) is a leading cause of human morbidity and mortality. Circular RNAs (circRNAs) are a newly discovered class of RNA that have been found to have important physiological and pathological roles. In the current study, we de novo analyzed existing whole transcriptome data from 5 normal and 5 dilated cardiomyopathy (DCM) human heart samples and compared the results with circRNAs that have been previously reported in human, mouse and rat hearts.
Results: Our analysis identi es a list of cardiac circRNAs that are reliably detected in multiple studies. We have also de ned the top 30 most abundant circRNAs in healthy human hearts which include some with previously unrecognized cardiac roles such as circHIPK3_11 and circTULP4_1. We further found that many circRNAs are dysregulated in DCM, particularly transcripts originating from DCM-related gene loci, such as TTN and RYR2. In addition, we predict the potential of cardiac circRNAs to sponge miRNAs that have reported roles in heart disease. We found that circALMS1_6 has the highest potential to bind miR-133, a microRNA that can regulate cardiac remodeling. Interestingly, we detected a novel class of circRNAs, referred to as read-though (rt)-circRNAs which are produced from exons of two different neighboring genes. Speci cally, rt-circRNAs from SCAF8 and TIAM2 were observed to be dysregulated in DCM and these rt-circRNAs have the potential to sponge multiple heart disease-related miRNAs.
Conclusions: In summary, this study provides a valuable resource for exploring the function of circRNAs in human heart disease and establishes a functional paradigm for identifying novel circRNAs in other tissues.

Background
Heart failure (HF) is a leading cause of human morbidity and mortality worldwide. Despite advances in the management of HF, it remains a tremendous and growing public health burden in an aging population with an estimated prevalence of > 37.7 million individuals globally [1]. Dilated cardiomyopathy (DCM), characterized by the dilation and impaired contraction of the left or both ventricles, is a primary cause of HF and sudden cardiac death [2]. Understanding the etiology of DCM is vitally important in the search for better therapeutic approaches for HF.
In addition to non-genetic causes, such as hypertension, in ammation, and toxins, genetic factors are increasingly recognized for their roles in HF susceptibility [3,4]. Currently, mutations in more than 100 genes have been linked to DCM [5]. However, these DCM relevant gene variants are only detected in ~37% of DCM patients [6]. Therefore, it is likely that other unknown genetic factors await discovery. The aberrant expression of different types of RNA transcripts, including those transcribed from protein coding genes [7][8][9], long non-coding genes [10,11] and miRNAs [12][13][14], have been observed in DCM. Circular RNAs (circRNAs) are a novel class of RNA that are generated by the covalent ligation of the 5' and 3' ends of a single-stranded RNA transcript to form a circular transcript [15]. CircRNAs are stable and resistant to RNase R, widespread throughout the plant and animal kingdoms and conserved across multiple species. Furthermore, circRNAs can be expressed in a tissue-speci c manner [16,17]. These properties make circRNAs excellent candidates as potential biomarkers for the diagnosis of disease and therapeutic intervention. Emerging evidence has demonstrated that circRNAs play important roles in various physiological and pathological processes, such as myogenesis [18], phenotypic modulation of smooth muscle cells [19,20], atherosclerosis [21,22], and cancer [23][24][25].
Although several efforts have been made to characterize the circRNAs expressed in the heart [26][27][28][29], our understanding of the importance of cardiac circRNAs is still very limited. Furthermore, several fundamental questions remain to be addressed. First, there is a large discrepancy between the presence and abundance of cardiac circRNAs that have been identi ed in previous studies [27][28][29], which limits any interpretation of their importance. To pinpoint truly important cardiac circRNAs, a cross-reference analysis between multiple independent studies is required. Second, it has been documented that circRNAs function primarily as molecular sponges to competitively bind miRNAs or encode small peptides. While the coding ability of cardiac circRNAs have been investigated recently by analyzing translational Ribo-seq data from human hearts [28], the potential of cardiac circRNAs to function as miRNA sponges remains incompletely understood. Finally, read through-circRNAs (rt-circRNAs), a recently discovered type of circRNAs that are produced from the exons of two adjacent genes on the same strand, have been implicated in both normal and tumor tissues [30], yet whether this type of rt-circRNAs are expressed in the human heart remains to be identi ed.
To address these questions, we de novo analyzed publicly available RNA-seq datasets from human hearts. These data sets were of su cient sample size (5 normal controls and 5 DCM patients) [10], to enable us to investigate the general landscape of human cardiac circRNAs under homeostatic (normal) and disease settings. By integrating the analysis of circRNAs identi ed in human hearts with that of mouse and rat hearts in previous studies [26,27,29], we: (i) characterized the general features of circRNAs in human hearts and compared the identi ed circRNAs with circRNAs reported by previous studies and databases; (ii) determined the key circRNAs expressed in healthy hearts as de ned by the top 30 most abundant circRNAs; (iii) identi ed circRNAs that are dysregulated in DCM hearts; (iv) predicted the potential of high-con dence cardiac circRNAs to act as a sponge for previously identi ed heart disease-related miRNAs; Lastly, we identi ed a list of rt-circRNAs, especially those from the SCAF8 and TIAM2 gene loci that are dysregulated in DCM and show strong potential of being miRNA sponges. Our work identi es a set of highly reliable circRNAs with potentially important roles in maintaining cardiac homeostasis which is a valuable resource for exploring the function of circRNAs in human heart disease. Our work further establishes a paradigm for identifying novel circRNAs in other tissues.

Methods
Identi cation of circRNAs RNA-seq raw reads from the left ventricle of 5 DCM patients and 5 norm Identi cation of circRNAs RNA-seq raw reads from the left ventricle of 5 DCM patients and 5 normal controls were obtained from SRA database (SRP108571) [10] and analyzed as illustrated in Additional le 1: Figure S1. Low-quality reads were rst removed by Trimmomatic 3.0 [31]. The clean reads were then mapped to the GRCh38 human genome reference by Bowtie2 [32]. For the linear RNA analysis, the raw count of each gene was calculated by using HTSeq [33] and Fragments Per Kilobase of exon per Million fragments mapped (FPKM) values were calculated from raw counts to quantify expression of genes [34]. Differential analysis of linear RNA between DCM and normal samples was performed with DESeq2 [35] and genes with fold change (FC) > 2 and FDR < 0.05 were considered signi cance. Next, circRNAs were identi ed following the instruction of nd_circ [36]. Brie y, by using SAMtools the unmapped reads were retrieved [37]. Then these unmapped reads were cut off 20 mers from both ends and mapped to the human genome independently in order to nd unique anchor positions. Anchors that aligned in the reversed orientation suggest a back-spliced junction. CircRNAs with ≥ 2 unique back spliced reads in at least one sample were saved for analysis of circRNA features. The expression of circRNAs was normalized to Spliced Reads per Billion Mapped Reads (SRPBM). For identi cation of the differentially expressed circRNAs between normal and DCM samples, only those high con dence circRNAs, with ≥ 2 unique reads in at least four samples from at least one group were used. Differentially expressed circRNAs were identi ed with edgeR package based on a negative binomial distribution model [38]. CircRNAs with FC > 2 and p value < 0.05 were considered signi cant. Pathway analysis was performed using Metascape [39].

Analysis of circRNAs from two different genes
The 50-bp sequences from each side of the back-spliced junction of the cricRNAs arising from the exons of two different genes (gene 1 and gene 2), were retrieved and used as query sequences to perform BLAST analysis against human cDNA sequences downloaded from ENSEMBL database with default parameters. If any of the two sequences could be aligned to both gene 1 and gene 2, the circRNA was considered as possible false positive due to the sequence homologue of gene 1 and gene 2. If the sequence from each side of the back-spliced junction returned BLAST hits only matching its expected host gene, the circRNA was considered as true rt-circRNA.
Construction of circRNA-miRNA-mRNA regulatory network A list of 50 miRNAs which have been implicated in heart disease were obtained (References were provided in Additional le 2: Table S1). Seed sequence and potential targeted genes of these miRNAs was obtained from TargetScan database (http://www.targetscan.org/). CircRNA sequence was retrieved using custom R script and TargetScan was used for predicting the binding sites of the selected miRNAs. CircRNAs with at least 5 binding sites for at least one miRNA were considered to have potential to sponge miRNAs. The circRNA-miRNA-mRNA regulatory network was visualized using Cytoscape (https://cytoscape.org/).

Experimental validation of circRNAs
Due to the lack of availability of human samples, the mouse orthologs of 6 identi ed human circRNAs were used for experimental validation. Mouse heart RNA was extracted with TRIzol reagent (Invitrogen) and reverse transcribed using random hexamers and High Capacity RNA-to-cDNA Kit (Invitrogen). Convergent primers with opposite orientation were designed (Additional le 3: Table S2) and used for PCR ampli cation on mouse heart cDNA and genomic DNA. Mouse Yap1 gene primers were used as a positive control for PCR ampli cation of genomic DNA. PCR products were then subjected to Sanger sequencing.
al controls were obtained from SRA database (SRP108571) [10]. Low-quality reads were rst removed by Trimmomatic 3.0 [58]. The clean reads were then mapped to the GRCh38 human genome reference by Bowtie2 [32]. For the linear RNA analysis, the raw count of each gene was calculated by using HTSeq [59] and Fragments Per Kilobase of exon per Million fragments mapped (FPKM) values were calculated from raw counts to quantify expression of genes [60]. Next, circRNAs were identi ed following the instruction of nd_circ [33]. Brie y, by using SAMtools the unmapped reads were retrieved [61]. Then these unmapped reads were cut off 20 mers from both ends and mapped to the human genome independently in order to nd unique anchor positions. Anchors that aligned in the reversed orientation suggest a backspliced junction. CircRNAs with ≥ 2 unique back spliced reads in at least one sample were saved for analysis of circRNA features. The expression of circRNAs was normalized to Spliced Reads per Billion Mapped Reads (SRPBM). For identi cation of the differentially expressed circRNAs between normal and DCM samples, only those high con dence circRNAs, with ≥ 2 unique reads in at least four samples from at least one group were used. Differentially expressed circRNAs were identi ed with edgeR package based on a negative binomial distribution model [62]. CircRNAs with fold change > 2 and p value < 0.05 were considered signi cant. Pathway analysis was performed using Metascape [63].

Comparison with other studies
Results from three previous studies [26,27,29], which identi ed circRNAs in human, mouse or rat hearts, were used in comparison with our results. The R package liftOver (https://www.bioconductor.org/packages/release/work ows/html/liftOver.html) was used to convert start and end coordinates (hg19, mm9, mm10 and rn5) to hg38 rst and then the converted coordinates were used for cross-reference and cross-species comparison.

Analysis of circRNAs from two different genes
The 50 bp sequences from each side of the back-spliced junction of the cricRNAs arising from the exons of two different genes (gene 1 and gene 2), were retrieved and used as query sequences to perform BLAST analysis against human cDNA sequences downloaded from ENSEMBL database with default parameters. If any of the two sequences could be aligned to both gene 1 and gene 2, the circRNA was considered as possible false positive due to the sequence homologue of gene 1 and gene 2 (as illustrated in Additional le 5: Figure S3E). If the sequence from each side of the back-spliced junction returned BLAST hits only matching its expected host gene, the circRNA was considered as true rt-circRNA. miRNA sponge analysis A list of 50 miRNAs which have been implicated in heart disease were obtained (References were provided in Additional le 10: Table S5). Seed sequence of these miRNAs was obtained from TargetScan database (http://www.targetscan.org/). CircRNA sequence was retrieved using custom R script and TargetScan was used for prediction of binding sites for the selected miRNAs. CircRNAs with at least 5 binding sites for at least one miRNA were considered to have potential to sponge miRNAs and were deposited in Supplemental Table S5.

Experimental validation of circRNAs
Due to the lack of availability of human samples, the mouse orthologs of 6 identi ed human circRNAs were used for experimental validation. Mouse heart RNA was extracted with TRIzol reagent (Invitrogen) and reverse transcribed using random hexamers and High Capacity RNA-to-cDNA Kit (Invitrogen). Convergent primers with opposite orientation were designed (Additional le 11: Table S6) and used for PCR ampli cation on mouse heart cDNA and genomic DNA. Mouse Yap1 primers were used as a positive control for PCR ampli cation of genomic DNA. PCR products were then subjected to Sanger sequencing.

General features of circRNAs in both healthy and DCM human hearts
We performed a de novo analysis of public RNA-seq data sets that were generated from left ventricular tissues obtained from 5 healthy individuals and 5 DCM patients [10]. These data sets were selected because they were generated using a ribosome-depletion library, which is the gold standard method to identify back-spliced junctions for circRNAs detection [44]. We rst removed low-quality reads and then utilized Bowtie2 [32] to align clean reads against the human reference genome. Reads continually aligned to the genome were used for linear gene expression analysis and the remaining un-mapped reads were further used as input for nd_circ [36] to detect both linear and back-spliced junctions (Additional le 1: Figure S1). We rst performed a quality control validation of this data by performing principal component analysis (PCA) based on the expression of linear genes with raw count > 10 in at least 5 samples. PCA analysis con rmed that samples from each group were clustered respectively as expected (Additional le 4: Figure S2A). We further examined the expression of a marker of HF, atrial natriuretic peptide α (NPPA). Consistent with the reported HF phenotype [45], our analysis revealed that the expression of NPPA is signi cantly increased in DCM specimens (Additional le 4: Figure S2B). Taken together, these initial analyses con rm the quality of this RNA-seq dataset and its suitability for the further analysis of circRNAs.
With a threshold cutoff of ³ 2 unique back-spliced reads present in at least one sample, we identi ed a total of 20,214 circRNAs in the 10 samples ( Figure 1A and Additional le 5: Table S3). The identi ed circRNAs originated primarily from the coding-sequence (CDS) regions of exons of single linear genes (67.2%), followed by exons spanning coding and untranslated regions (UTR) (UTR-CDS) (11.6%) ( Figure  1B). Notably, a total of 252 circRNAs (1.2%) were assigned to the exons of two adjacent genes on the same strand (Additional le 6: Table S4), likely representing a novel class of circRNAs termed as rt-circRNA [23]. In particular, the genes MYH6 and MYH7 produced the largest number (n = 22) of such circRNAs, which were also observed in human, mouse and rat hearts as reported by two earlier studies (Additional le 7: Figure S3A-C) [27,29]. In addition, these MYH6/MYH7-derived circRNAs were highly abundant and conserved in hearts of human, mouse and rat (Additional le 7: Figure S3D). However, close inspection of the sequence between the exons where they are back-spliced suggests that the backspliced junctions of these circRNAs are probably artifactual due to read misalignment caused by the high sequence homology between MYH6 and MYH7 (Additional le 7: Figure S3E). This result prompted us to employ more stringent criteria to lter circRNAs originating from two neighboring genes with highly homologous sequences. Using this revised ltering strategy, we identi ed 110 putative rt-circRNAs.
Among them, SCAF8 and TIAM2 contain the largest number of rt-circRNAs (n = 4), all of which were also previously identi ed in the human heart ( Figure 1C). We next examined whether there was a correlative relationship between the expression of circRNAs and their parent linear genes. We found that there was a signi cantly higher correlation between circRNAs and their linear parent genes compared to that between circRNAs and randomly chosen linear genes ( Figure   1D), suggesting that the generally high-abundance circRNAs tend to have high-abundance linear counterparts. We further found that the length of the majority of identi ed circRNAs was around 300-400 bp ( Figure 1E). In accordance with previous studies [46][47][48], the most common splice accepting circularized exon was exon 2 ( Figure 1F). Furthermore, we found that the majority of circRNA-producing genes generated 1 circRNA (41.7%, 2,231/5,350) and only 6.0% (n = 319) produced more than 10 circRNAs ( Figure 1G). The top two genes producing the largest number of circRNAs were TTN (Titin) and RYR2 (Ryanodine receptor 2), which generated 197 and 173 circRNAs, respectively (Additional le 8: Figure S4A-B). Because the TTN and RYR2 genes have the largest number of exons in human genome, we hypothesize that the number of circRNAs produced is positively correlated to the number of exons within their parent genes. Indeed, a proportional increase in the number of circRNAs was observed to correlate with the number of exons per gene (Additional le 8: Figure S4C). Furthermore, we found 17,733 identi ed circRNAs, in total, contain exons. Among these, single exonic circRNAs were found in 8.4% (1,494/17,733) while the rest of circRNAs were encoded by more than two exons (Additional le 8: Figure  S4D).
Next we compared the circRNAs that we identi ed with others that were previously identi ed in human hearts [27][28][29] and enlisted in circRNA databases [40,41,43,49]. We found that 16,304 circRNAs (80.7%) were previously reported by either human heart circRNA studies or enlisted in circRNA databases ( Figure  1H). In addition, 3,831 and 2,636 circRNAs have orthologs in the mouse and rat, respectively. Importantly, 1,757 were conserved in all three species by integrating mouse and rat circRNAs reported by previous studies and databases [26,27,29,40,41,43,49] (Figure 1I). We further found that 7,585 and 6,211 circRNAs were speci cally identi ed in normal and DCM hearts, respectively, and that 6,418 were identi ed in both ( Figure 1J). Further analysis revealed that only 729 circRNAs were present in all the 10 samples ( Figure 1K), suggesting that many circRNAs exhibit a high degree of variation in expression among human individuals. Additionally, the majority of circRNAs (n=12,099) were observed in only one sample ( Figure 1K), which likely represent false positives or low-abundance circRNAs. To clarify this, we compared the circRNAs observed once in this study with circRNAs reported by 3 previous human heart circRNA studies [27][28][29] and 4 circRNA databases [40,41,43,49]. This analysis revealed that 8,666 (71.6%) among them were previously detected (Additional le 9: Figure S5), suggesting these circRNAs observed in a single sample are probably authentic and thus should be retained in studies aiming for circRNA discovery. Collectively, these results indicate that the expression of circRNAs, including rt-circRNAs, is widespread in the human heart with a certain degree of conservation between species.

Identi cation of the most abundant circRNAs in healthy human heart
Given the observation that circRNA abundance was not universally distributed across samples ( Figure  1K), we restricted our analysis to 2,461 circRNAs which were considered to be high-con dence circRNAs as they had ³ 2 reads spanning the back-spliced junction in ³ 4 samples in at least one group (Additional le 10: Table S5). All these high-con dence circRNAs were overlapped with previously identi ed circRNAs (Additional le 11: Figure S6A) and 1,121 (45.6%) showed conservation in either mouse or rat (Additional le 11: Figure S6B), indicating the reliability of these circRNAs we identi ed.
CircRNAs that are abundantly expressed in normal heart likely contribute to the maintenance of cardiac homeostasis. We therefore focused on the top 30 circRNAs with the highest average expression level across the 5 healthy heart samples ( Figure 2A). The top 2 most abundant circRNAs (circSLC8A1_11 and circSLC8A1_12) originate from the SLC8A1 (sodium/calcium exchanger 1) gene. While most of the top circRNAs were also identi ed in the studies of Tan et al. [27] and Werfel et al. [29], which found high expression levels of circRNAs such as circSLC8A1_12, circHIPK3_1 and circALPK2_2 in healthy hearts, some exceptions were observed. These include circSLC8A1_11 and circYY1AP1_3 that were not detected in study of Tan et al. [27], as well as circZNF91_2 and circPCMTD1_6 whose expression was identi ed to be abundant in the heart in only one of the previous studies ( Figure 2A). Comparison of the results obtained from different species revealed that half of these highly abundant circRNAs in normal human hearts were conserved in mouse or rat hearts, but only 2 of them including circSLC8A1_11 and circHIPK3_1 exhibited high levels of expression in all species, and 2 of them including circQKI_4 and circTULP4_1 were abundantly expressed in human and mouse hearts, but not in rat. The expression levels of the remaining conserved circRNAs were moderate or low in rodent hearts. We then calculated the correlation between the abundance of top circRNAs and their host genes and found a highly positive correlation for most of these circRNAs (Additional le 12: Figure S7A), including the top 3 most abundant ones (i.e., circSLC8A1_11, circSLC8A1_12 and circHIPK3_1) ( Figure 2B). In contrast, circSETD3_3, circMB_1 and circCORO1C_1, together with some others (Additional le 12: Figure S7B), appeared to be independently transcribed with their linear parent genes ( Figure 2C). Taken together, by integrating results from multiple studies, our analyses de ne the most abundant circRNAs in the healthy human heart.
Identi cation of dysregulated circRNAs in DCM We next performed differential analysis to identify circRNAs that are dysregulated in DCM. Using a threshold of p < 0.05 and FC > 2, a total of 392 circRNAs, including 101 up-regulated and 291 down-regulated circRNAs, were identi ed to be differentially expressed in DCM hearts as compared to normal hearts ( Figure 3A and Additional le 10: Table S5). These differentially expressed circRNAs were derived from 290 unique host genes. Pathway analysis revealed that these host genes are signi cantly overrepresented in pathways associated with diseases of the heart, including arrhythmogenic right ventricular cardiomyopathy (ARVC), hypertrophic cardiomyopathy, and DCM ( Figure 3B). Interestingly, 2 rt-circRNAs originating from SCAF8 and TIAM2, including SCAF8_e4:TIAM2_e1 and SCAF8_e4:TIAM2_e2, were remarkably downregulated in DCM ( Figure 3C). Many of the signi cantly down-and up-regulated circRNAs were positively correlated with the expression of their host genes such as circFBLN1_5 with FBLN1 and circNLGN1_1 with NLGN1, respectively ( Figure 3D and E). However, the expression of some circRNAs in DCM was independent of changes in their host gene expression ( Figure 3F). Examples of this discordant relationship included circABCC1_9 and circHERC4_11, which was signi cantly up-and downregulated in DCM, respectively, while the expression of their host gene mRNAs had no obvious change in DCM ( Figure 3F). Together, these results indicate that dysregulation of circRNAs in DCM tend to originate from heart disease-related gene loci, which include the rt-circRNAs from SCAF8 and TIAM2.
Screening circRNAs for potential as miRNA sponges To screen for cardiac circRNAs with the potential to act as miRNA sponges, we analyzed the sequences of each high-con dence circRNAs for the presence of putative binding sites for 50 miRNAs that have established roles in heart disease (Additional le 2: Table S1). In total, 142 circRNAs were predicted to harbor at least 5 predicted binding sites for at least one of 34 heart disease-related miRNAs (Additional le 13: Table S6). Additionally, a total of 16,472 putative targeted genes of these miRNAs were identi ed, including 598 and 276 up-and down-regulated genes in DCM hearts compared to normal hearts (Additional le 14: Table S7). A circRNA-miRNA-mRNA regulatory network was further constructed based the predicted interactions between circRNAs and miRNAs, as well miRNAs and mRNAs ( Figure 4A). The top 32 circRNAs that contained more than 10 putative binding sites for at least one miRNA are listed in Figure 4B. Notably, 23 of these are derived from one gene, TTN ( Figure 4B). In particular, circALMS1_6 had the strongest potential to function as miRNA sponge, containing 28 and 20 putative binding sites for miR-133a-5p and miR-133a-3p, respectively ( Figure 4C). Another notable nding is that the rt-circRNA, circSCAF8_e4:TIAM2_e2, was predicted to be capable of sponging multiple miRNAs including miR-150-5p and miR-17 that have been reported to be induced in the rat heart of myocardial ischemia reperfusion injury [50] and in the mouse heart of chronic hypoxia-induced pulmonary hypertension mouse model [51] ( Figure 4D). Interestingly, some of the miR-17 downstream targets, such as SCN2B [52,53], F2R [54], KCNB1 [55] and EGLN3 [56] that have been implicated in heart dysfunction, were down-regulated in the DCM hearts as revealed by expression analysis of linear RNAs, likely through the unleashed miR-17 that resulted from the down-regulation of circSCAF8_e4:TIAM2_e2 in DCM ( Figure 4E). Collectively, these analyses provide a systematic evaluation of the potential of cardiac circRNAs to sponge miRNAs and provide a resource for studying cardiac miRNA inhibitors.

Experimental validation of identi ed circRNAs
Due to lack of availability of human samples, we next sought to validate the conserved circRNAs with mouse orthologs that we identi ed above using mouse heart samples. We designed divergent primers for the murine orthologs of 6 identi ed human cardiac circRNAs (Additional le 3: Table S2). These circRNAs were chosen to represent 2 of the most abundant cardiac circRNAs (circEXOC6B_13, circALPK2_2 and circSLC8A_11), 2 of the dysregulated circRNAs (circENAH_4 and circLRP6_4), and 1 non-changed circRNAs between DMC and normal hearts (circN4BP2L2_13). Importantly, these circRNAs have not been experimentally veri ed by previous studies, with exception of circSLC8A_11 that served as a positive control for circRNA ampli cation in this study. We performed PCR ampli cation using mouse heart cDNA as the template, with mouse heart genomic DNA used as a negative control and mouse Yap1 gene primers used as an internal control for the PCR ampli cation of genomic DNA. All 6 of the selected circRNAs were successfully ampli ed in heart cDNA but not in genomic DNA ( Figure 5A). Sanger sequencing of the PCR products veri ed back-spliced junctions ( Figure 5B-G). Taken together, all the circRNAs examined were con rmed to be bona de, suggesting the reliability of our in silico circRNA identi cation.

Discussion
Herein, we have analyzed rRNA-depleted RNA sequencing datasets from 10 human hearts. By integrating circRNAs that were previously identi ed in human, mouse and rat hearts [26-29, 40, 41, 43, 49], we have systematically interrogated the global landscape of circRNAs in human heart and have revealed the most abundant circRNAs in healthy hearts, and those that are dysregulated in DCM hearts. Furthermore, we have investigated the potential of human cardiac circRNAs to act as sponges for heart disease-related miRNAs and have generated a list of rt-circRNAs expressed in human heart. Our study adds new information to the previous annotated circRNAs and provides a valuable resource for further functional study of circRNAs in the human heart.
Three earlier studies found that circRNAs are expressed in the human heart and identi ed a large number of circRNAs [27][28][29]. The number of circRNAs identi ed in our study (n = 20,214) is larger than that of previous studies. This could be partially explained by the differences in the genetic background of the human subjects, the algorithms employed and the thresholds used for detecting and ltering circRNAs, and sample size. When comparing the degree of overlap of circRNAs identi ed in previous studies, congruency was lower than expected despite the use of the same tissue (heart) and species (human). For example, only about half of the circRNAs reported by the three previous studies are in common [27][28][29] ( Figure 1H). This observation suggests that either the expression pattern of circRNAs varies signi cantly among different individuals, or that the identi ed circRNAs in the limited samples were artifactual, which underscores the need to identify a list of reliable cardiac circRNAs by integrating multiple independent studies. Our comparative analysis revealed a total 16,304 cardiac circRNAs that are reproducible in at least one of previous studies ( Figure 1H and Additional le 5: Table S3), which represent a bona de category of cardiac circRNAs that can be functionally explored in the future.
Our analysis revealed that a signi cant number of cardiac circRNAs are derived from two different genes, which were also reported by previous studies of circRNAs in heart [27,29] and brain [16,36,46]. Due to the fact that many such circRNAs are derived from neighboring genes with highly homologous sequences, a common practice in circRNA detection is to discard circRNAs from multiple genes [28,46]. However, a recent breakthrough study showed that a signi cant portion of circRNAs indeed originate from the exons of two adjacent genes on the same strand. These circRNA are referred to as rt-circRNAs [30] and suggests that circRNAs originated from different genes should not simply be dismissed as false positives, especially in studies with a major focus on circRNA discovery. One striking observation of this study is that a number of circRNAs are identi ed to be produced from the exons of MYH6 and MYH7, which were also frequently observed, and display high abundance, in hearts of human, mouse and rat by two previous studies (Additional le 7: Figure S3A-D) [27,29], albeit using different algorithms and cohorts for circRNA detection. This could be particularly interesting if the circRNAs derived from MYH6 and MYH7 are bona de, because these two genes play important roles in healthy and diseased hearts [57]. However, in-depth examination of sequences reveals that these circRNAs are derived from highly homologous region of MYH6 and MYH7, suggesting that these circRNAs are artifacts due to the read misalignment as described by previous studies [28,46]. Therefore, cautions should be taken when annotating circRNAs from multiple genes. Nevertheless, after ltering out additional circRNAs from the homologous regions of two nearby genes, we identi ed a total of 110 potentially bona de rt-circRNAs (Additional le 6: Table S4). Especially, SCAF8 and TIAM2 gene produce the largest number of rt-circRNAs ( Figure 1C). Notably, all of the identi ed putative rt-circRNAs in human heart are not conserved in mouse or rat, suggesting that they are human-speci c and recently evolved novel non-coding molecules. Future efforts are needed to validate these rt-circRNAs in greater detail and explore their functional roles in the heart.
Our analysis yielded a list of the top 30 most abundant circRNAs expressed in normal hearts (Figure 2A).
All of the identi ed top circRNAs, except for circSPHKAP_1 and circRMST_3, are within the top 100 most abundant circRNAs in human or mouse hearts as revealed by at least one previous study (Figure 2A) [27,29], suggesting a high degree of delity of this top circRNA list. In previous studies, discordant ndings have been observed for some of these top circRNAs with regard to their presence and abundance [27,29]. For instance, circTULP4_1 is absent in the study of Tan et al. [27] but is abundantly expressed in human and mouse heart. Our cross-reference analysis of multiple independent studies helps to reduce confusion over the importance of the identi ed circRNAs. The function of one of the most abundant circRNAs, circSLC8A1_11, in heart function has been assessed by a recent study [58]. CircHIPK3_1, formed from the circularization of the second exon of HIPK3, is another conserved circRNA that is highly expressed in both human and murine heart. An earlier study has shown that circHIPK3_1 plays a critical role in promoting retinal vascular dysfunction in diabetes mellitus [59]. In future studies, it will be interesting to explore the function of circHIPK3_1, as well as some of the other top candidates we have identi ed such as circTULP4_1 (Figure 2A), in maintaining cardiac hemostasis. Furthermore, half of the most abundant human circRNAs are conserved in mouse and/or rat. However, only several of these, including circSLC8A1_11, circHIPK3_1, circQKI_4 and circTULP4_1 are observed to be highly expressed in murine heart as well, indicating differences in the circRNA landscape between humans and rodents. In addition, the expression levels of the three human-speci c circRNAs including circMB_1, circCORO1C_1 and circSETD3_3 tends to be independent of their host genes, suggesting potentially important roles of these circRNAs in human heart. Differential analysis revealed that most of the aberrant circRNAs were down-regulated in DCM as compared to normal hearts ( Figure 3A), a feature that is similar to observations in cancer [30,60]. In cancer, decreased expression of circRNAs was speculated to result from the dilution of circRNA upon cancer cell division [30,60]. However, this mechanism cannot account for the reduced expression in DCM hearts due to the fact that adult cardiac myocytes have a very limited replicative capacity and extensive cell death leads to a reduction in myocyte number in DCM [61]. Another interesting nding is that many of the abundant circRNAs originate from already-established cardiovascular disease-related gene loci, as indicated by the results of pathway analysis of their host genes ( Figure 3B). For example, four circRNAs generated from the TTN locus, including circTTN_34, circTTN_52, circTTN_70 and circTTN_132 were signi cantly down-regulated in DCM as compared to normal hearts (Additional le 10: Table S5). These data are consistent with an earlier study that reported on the dysregulation of circRNAs synthesized from the TTN gene in DCM [62]. The TTN gene encodes the protein titin, which is the largest known protein expressed in the heart. The TTN gene was also shown to generate the largest number of circRNAs in the heart (Additional le 8: Figure S4A). Another interesting gene which generated two signi cantly downregulated circRNAs (circRYR2_71 and circRYR2_95) in DCM, was the RYR2 gene. RYR2 is the major Ca 2+ release channel on the sarcoplasmic reticulum and plays a critical role in regulating excitationcontraction coupling and Ca 2+ homeostasis in the heart [63]. Sequence variations of these two genes have been linked to DCM and resulting in altered amino acid sequence and disruption of protein structure and function [64,65]. There was no evidence to support the presence of mutations in TTN and RYR2 genes in the 5 DCM patients analyzed. Furthermore, RNA-seq analysis revealed that their expression did not change at the mRNA level as compared to controls. Thus the downregulation of circRNAs observed from the TTN and RYR2 genes in DCM likely represent a novel mechanism that awaits further identi cation. We observed that multiple circRNAs were differentially expressed in DCM, with no obvious change in the parent gene expression ( Figure 3D and F), con rming previous ndings that the expression of circRNAs can be independent of their parent genes [66].
One of the best documented functions of circRNAs is to sponge miRNAs, which is similar to linear long non-coding RNAs [67,68]. Furthermore, the regulatory role of circRNAs via the circRNA-miRNA-mRNA axis has been increasingly implicated in human diseases [69]. In addition, the relatively higher stability of circRNAs compared to linear mRNAs due to the lack of free 5' and 3' ends allow them to be ideal molecular sponges for miRNAs. Our results in predicting the ability of circRNAs to bind heart diseaserelated miRNAs provides a guideline for further exploration of the regulatory roles of cardiac circRNAs ( Figure 4A, Additional le 13: Table S6, and Additional le 14: Table S7). Notably, circRNAs produced from the TTN gene show a strong potential to bind miRNAs, suggesting that the function of the cardiac TTN gene may have added complexity. In addition, circALMS1_6 has strong potential to sponge miR-133, a well-known miRNA that is abundantly expressed in skeletal muscle and cardiomyocytes [70]. Given the important roles of miR-133 in regulating cardiac hypertrophy, it will be interesting to determine whether circALMS1_6 affects cardiac hemostasis via inhibition of miRNAs. Furthermore, we identi ed a rt-circRNA, circSCAF8_e4:TIAM2_e2, with the potential to bind multiple miRNAs including miR-17 that have been shown to be induced in diseased hearts [50,51] (Figure 4D). In DCM hearts, the expression of circSCAF8_e4:TIAM2_e2 is decreased ( Figure 3C). It is therefore possible that the down-regulation of this rt-circRNA contributes to the up-regulation of miR-17 via releasing the sequestration of miR-17 interacted with, thereby inhibiting the down-stream targets of miR-17. In support of this hypothesis, several miR-17 targets with known cardiac function are down-regulated in DCM hearts ( Figure 4E). One interesting target is SCN2B that encodes voltage-gated sodium channel b2-subunits. The sequence variations of SCN2B have been associated with human cardiac arrhythmias [53] and genetic deletion of Scn2b in mice leads to ventricular and atrial arrhythmias [52]. Our study suggests a possible regulatory role of circSCAF8_e4:TIAM2_e2-miR-17-SCN2B axis in heart and expands our understanding of the function of the newly discovered rt-circRNAs.

Conclusions
In conclusion, our study provides a list of putative key circRNAs that are expressed in normal and DCM human hearts. These circRNAs we identi ed are helpful to guide future functional studies. Given their abundance and multiple functional roles, circRNAs, including rt-circRNAs, are expected to be a group of emerging players in the regulation of human heart diseases. Availability of data and materials The RNA-seq data used in this study was downloaded from NCBI database under accession number SRP108571.

Competing interests
The authors declare that they have no competing interests.

Funding
The work at the J.Z. laboratory is supported by a grant (R01HL149995) from the National Heart, Lung, and Blood Institute, NIH. J.Z. is a recipient of Established Investigator Award (17EIA33460468) and Transformational Project Award (19TPA34910181) from the American Heart Association. K.D. is supported by a postdoctoral fellowship (19POST34450071) from the American Heart Association. The funders provided the nancial support to this research, but had no role in the design of the study, analysis, interpretations of the data and in writing the manuscript.