Pinpointing miRNA and genes enrichment over trait-relevant tissue network in Genome-Wide Association Studies

Background Understanding gene regulation is important but difficult. Elucidating tissue-specific gene regulation mechanism is even more challenging and requires gene co-expression network assembled from protein–protein interaction, transcription factor and gene binding, and post-transcriptional regulation (e.g., miRNA targeting) information. The miRNA binding affinity could therefore be changed by SNP(s) located at the 3′ untranslated regions (3′UTR) of the target messenger RNA (mRNA) which miRNA(s) interacts with. Genome-wide association study (GWAS) has reported significant numbers of loci hosting SNPs associated with many traits. The goal of this study is to pinpoint GWAS functional variants located in 3′UTRs and elucidate if the genes harboring these variants along with their targeting miRNAs are associated with genetic traits relevant to certain tissues. Methods By applying MIGWAS, CoCoNet, ANNOVAR, and DAVID bioinformatics software and utilizing the gene expression database (e.g. GTEx data) to study GWAS summary statistics for 43 traits from 28 GWAS studies, we have identified a list of miRNAs and targeted genes harboring 3′UTR variants, which could contribute to trait-relevant tissue over miRNA-target gene network. Results Our result demonstrated that strong association between traits and tissues exists, and in particular, the Primary Biliary Cirrhosis (PBC) trait has the most significant p-value for all 180 tissues among all 43 traits used for this study. We reported SNPs located in 3′UTR regions of genes (SFMBT2, ZC3HAV1, and UGT3A1) targeted by miRNAs for PBC trait and its tissue association network. After employing Gene Ontology (GO) analysis for PBC trait, we have also identified a very important miRNA targeted gene over miRNA-target gene network, PFKL, which encodes the liver subunit of an enzyme. Conclusions The non-coding variants identified from GWAS studies are casually assumed to be not critical to translated protein product. However, 3′ untranslated regions (3′UTRs) of genes harbor variants can often change the binding affinity of targeting miRNAs playing important roles in protein translation degree. Our study has shown that GWAS variants could play important roles on miRNA-target gene networks by contributing the association between traits and tissues. Our analysis expands our knowledge on trait-relevant tissue network and paves way for future human disease studies.


Background
A microRNA (miRNA), a noncoding RNA which contains about 22 nucleotides, plays a significant role on the regulation of gene expression. By binding to the 3′ untranslated regions (3′UTR) of the target messenger RNA (mRNA), which transfers the genetic information from DNA to the ribosome for protein synthesis in RNA polymerase, miRNA suppresses the translation of the targeting mRNA and/or can activate gene expression under certain conditions [1]. Due to those features and miRNA's ability to control cell growth and differentiation, people believe that there is an association between miRNA's deficiency or excess and human diseases [2].
Single nucleotide polymorphism or SNP is a change or mutation at a single position in a DNA sequence. The development of human diseases is also affected by SNP(s) on the basis of the observation that SNPs can cause synonymous and non-synonymous changes which affects amino acid sequence or protein function [3]. However, 3′ untranslated regions (3′UTR) of genes often contain variants which can change binding affinity of targeting miRNAs so that gene expression and protein translation will be affected. It's important to pinpoint functional important variants in 3′UTRs and elucidate if the genes harboring these variants along with their targeting miRNAs are associated with genetic traits.
In many cancers, the gene PTEN is known to be regulated by miRNAs through competing endogenous RNA networks [4,5]. Such miRNAs can also have other targeted genes containing SNPs that could contribute to cancers and/or human diseases by changing DNA sequences or influencing gene expression.
Genome-wide association studies (GWAS) has identified and analyzed millions of genetic risk variants which trigger the development of complex diseases [6], and it has reported significant numbers of loci hosting SNPs associated with many traits as well [7]. The disease causativeness annotation for non-coding variants (e.g. 3′UTR variants) identified from GWAS studies and clarification of their effects on miRNAs binding are challenging. MIGWAS (miRNA-target gene networks enrichment on GWAS) is an analytic pipeline to quantitatively evaluate tissue enrichment and elucidate complex biology of the genetic traits over miRNA-target gene networks which provide resources for the genetics of human complex traits, and thus contributes to a deeper understanding of miRNA's influence on human diseases as well as drug discovery [7,8].
Composite likelihood-based Covariance regression Network model (CoConet) is a network method for identification of trait-relevant tissues or cell types by incorporating tissue-specific gene co-expression networks [9]. CoCoNet further understands data from GWAS by demonstrating gene co-expression sub-networks which helps predict gene-level association effect sizes on and GWAS traits and diseases. Moreover, CoConet infers trait-relevant tissue based on tissue-specific gene co-expression patterns and proves that tissue-specific gene networks underlie disease etiology [9].
ANNOVAR [10] is an efficient software tool to utilize update-to-date information to annotate genetic variants based on their functional influence. SNPs reported from large scale association studies and sequencing projects can be annotated by ANNOVAR which can be utilized to report functional score and identify variants based on SNPs' functional influence on genes [10].
The goal of our study is to identify causative SNPs which could change the binding affinity of miRNAs and genes, which could contribute to trait-tissue relevance over miRNA-target gene network.

Methods
We obtained summary statistics in the form of marginal z-scores for 43 traits from 28 GWAS studies [11]. These studies collect a wide range of complex traits and diseases that can be classified into six phenotype categories [12,13]: anthropometric traits (e.g. height and BMI), hematological traits (e.g. MCHC and RBC), autoimmune diseases (e.g. CD and IBD), neurological diseases (e.g. Alzheimer's disease and Schizophrenia), metabolic traits (e.g. FG and HDL), and social traits (e.g. ever smoked and college completion). We removed SNPs within the major histocompatibility complex (MHC) region (Chr6: 25-34 Mb) following [14]. We then intersected the SNPs from all the studies and retained a common set of 622,026 SNPs for analysis. We paired the marginal z-scores from these studies with the SNP correlation matrix estimated using 503 individuals of European ancestry from the 1000 Genomes Project [15] for inference.
We employed MIGWAS [7], CoCoNet [9], and ANNO-VAR [10] software packages to infer trait-relevant tissues based on next-generation sequencing omics data (e.g. GTEx data [16]) and annotated the variants obtained from GWAS and harbored by miRNAs targeted genes associated with traits. We also conducted functional analysis on miRNA targeted genes harboring 3′UTR variants with significant tissue-trait association using annotation tool DAVID [17,18] for the Primary Biliary Cirrhosis (PBC) trait (Fig. 1).

Data acquisition
We obtained 43 traits from various published resources and classified them into six different categories with the total number of samples 3 × 10 6 ( Table 1). Details of the information for the summary statistics of 43 traits from 29 GWAS studies are provided in Additional file 1: Table S1. The table lists the phenotype name, category, abbreviation, number of individuals, reference, and downloaded websites for each of the 43 traits.

MIGWAS Results
We applied MIGWAS to 43 traits (total samples = 3,035,223) and identified biologically relevant tissues. In particular, we identified that 94 out of 180 tissues have significant (p-value < 0.01) association with at least two traits and that 11 traits have significant association with more than 10 tissues (Fig. 2).
We calculated p-value association between traits and all tissues using MIGWAS for 43 GWAS. Interestingly, it reported that PBC trait has the most significant p-value for all tissues (Fig. 3). A detailed association result from MIGWAS is shown in Additional file 2: Table S2.
We have identified total 332 miRNAs as candidate biomarkers of all traits. The miRNA distribution statistics for all reported traits by MIGWAS is shown in Fig. 4. We found that there are no miRNAs identified through MIGWAS overlapping across significant tissue-traits. We divided 31 traits into two groups that the significant trait group has 15 traits and the non-significant trait group has 16 traits. If the p-value of the trait is less than 0.01 in at least one tissue, it is considered as a significant trait. The non-significant trait Height has 60 miRNAs which is the  Table S3.

CoCoNet output
We took candidate target genes identified by MIG-WAS to retrieve Gene Level Effective Size of each gene (Ensemble gene ID) for the PBC trait. We used CoCoNet to calculate the association between traits and tissues in terms of log likelihood (Fig. 5).

ANNOVAR annotation
We examined the PBC trait showing the smallest association p-value reported by MIGWAS to see if there are any SNPs located in 3′UTR regions of genes targeted by miR-NAs. To do so, we took the SNP list for the PBC trait and converted them to ANNOVAR variant annotated input format with ANNOVAR package. We obtained 9,890 annotated variants (5279 genes) exclusively located in 3′UTR region of these genes (Additional file 4: Table S4). We crossed the 55 genes reported by MIGWAS with 5279 genes annotated by ANNOVAR and obtained 13 genes in common. The detailed information about these 13 genes is shown in Table 2.

PTEN associated variant analysis
Since PBC is a liver-related trait, we searched the TCGA LIHC dataset [19] and checked anti-correlated pairs for PTEN targeted miRNA: hsa-mir-590. We then used hsamir-590 to identify its targeted genes in the anti-correlated pair list and obtained 22 additional target genes. We then identified SNPs using UCSC Genome Browser for ten genes including PTEN with ClinVar variant information and reported the result in a new Additional file 5: Table S5.

Functional analysis
Gene Ontology (GO) analysis on 55 miRNA targeted genes harboring 3′UTR variants with significant tissue-trait association for PBC trait only showed that PFKL is located in a set of genes with term Acetylation (p-value < 0.05). This gene encodes the liver (L) subunit of an enzyme that catalyzes the conversion of d-fructose 6-phosphate to d -fructose 1,6-bisphosphate, which is a key step in glucose metabolism (glycolysis) [20]. a b Fig. 4 The miRNA count distribution for all reported traits by MIGWAS (a includes significant traits whose p-value are less than 0.01 in at least one tissue, while b includes non-significant traits   The list of genes and their GO terms identified as the top cluster by DAVID is shown in Table 3.
We have also conducted a GO analysis for 22 genes targeted by hsa-mir-590 with target relationship of PTEN. Some enrichment analysis results are reported in Table 4. The detailed DAVID results are reported in Additional file 6: Table S6 and Additional file 7: Table S7.

Discussions
In our study, we used several cutting-edge bioinformatics tools (MIGWAS, CoCoNet and ANNOVAR) to dissecting SNPs reported for 43 traits from GWAS. The MIGWAS reported miRNA enrichment over target gene network for most (31) of traits. The CoCoNet was adopted to analyze the PBC trait for its significant tissue association over gene co-expression network. The ANNOVAR tool was employed to annotate 3′UTR variants class harbored by a list of miRNA target genes associated with Primary Biliary Cirrhosis (PBC) traits.
Although it may seem like the tools are interrelated and are three separate experiments, they are indeed connected. We first applied the tool MIGWAS to 43 traits and identified significant biologically relevant tissues. When the p-value association between traits and all tissues were calculated by using MIGWAS for 43 GWAS, we found that the PBC trait had the most significant p-value for all tissues. The CoCoNet experiment then utilized some of the results produced by MIGWAS to provide an output. Specifically, for the retrieved PBC trait result from the MIGWAS experiment, we used CoCoNet to analyze the tissues association over gene co-expression network. Again, we used the PBC trait result reported by MIGWAS in another tool: ANNOVAR. We analyzed the PBC trait for SNPs in 3′UTR regions by converting the SNP list into ANNOVAR variant annotated input, and we retrieved 5279 genes located in 3′UTR regions. We then crossed the 55 genes from MIGWAS with the 5279 genes from ANNOVAR and found 13 genes in common. The experiments might seem unrelated, but the results from MIGWAS analyzed by both CoCoNet and ANNO-VAR shows that the 3 tools are indeed connected and are not interrelated.
Total 12 traits (Alzheimer's disease, Coronary artery disease, Crohn's disease, Ever Smoked, Fasting glucose, High density lipoproteins, Inflammatory bowel disease, Low density lipoproteins, Schizophrenia, Type 1 diabetes, Triglycerides, Ulcerative colitis) do not have MIG-WAS association results reported in the analysis. It seems that liver tissue does not appear to have the smallest likelihood with PBC based on the total genes in human genome. We also ran CoCoNet with the targeted gene set of miRNAs enriched over the target network for PBC trait only to calculate loglikelihood, and the result stays the same.
The transcript expression of PFKL has the highest RPKM (Reads Per Kilobase of transcript, per Million mapped reads) in kidney samples in the RNA-seq which was performed from 4 human individuals in order to determine tissue-specificity of all protein-coding genes [30].
PFKL contains a transcript variant ATP-dependent 6-phosphofructokinase, liver type isoform a which represents the longer transcript and encodes the shorter isoform (a). The CD-Search shows the protein classification of this transcript variant, which is Eukaryotic_PFK domain-containing protein, and finds a specific hit, Eukaryotic_PFK, which shows a high confidence association between the query sequence and a domain model. Phosphofructokinase (PFK) is a key regulatory enzyme that that phosphorylates fructose 6-phosphate in glycolysis. It belongs to the PFK family, evolving from the bacterial PFKs by gene duplication and fusion events and then exhibiting complex behavior. PFK family also includes ATP-dependent phosphofructokinases (allosteric homotetramers) and pyrophosphate (PPi)-dependent phosphofructokinases (mostly dimeric and nonallosteric ASAP3 encodes a member of a subfamily of ADP-ribosylation factor(Arf ) GTPase-activating proteins that its expression level affects cell proliferation and migration, and it called up-related in liver cancer 1 [31]. DDEFL1 (cloned ASAP3) is unregulated in hepatocellular carcinoma (HCC) through microarray analysis. The studies [32] showed that DDEFL1 had increased expression and had overexpression which increased colony formation in NIH3T3 cells and human hepatoma cell lines [32].
In the transcript expression of UGT3A1, liver samples have the second highest RPKM among 27 different tissues [33].
SFMBT2 are polycomb group proteins that bind to methylated lysins in histone tails. The formation of transcription-resistant higher-order chromatin structures at target genes are induced by it so that this gene can repress transcription [34]. SFMBT2 was cloned and designated as KIAA1617 by Nagase et al. [35], who obtained clones from fetal brain cDNA library and then sequenced them [35]. Kuzmin et al. [36] reported that SFMBT2 interacted with the YY1 (600013) transcription when SFMBT2 was expressed from the paternal allele in mouse blastocysts and in mouse embryonic tissues early in development, and later during mouse embryonic development [36].
ZC3HAV1 encodes a zinc finger protein that can prevent replication certain viruses and inhibit viral gene expression by targeting and eliminating viral mRNAs in the cytoplasm [37]. Yu et al. [38] identified ZC3HAV1 and called it ZAP in their analysis of gene expressed in human fetal liver [38]. Northern blot analysis of rat Zap that was cloned by Gao et al. [39] showed that mRNA was highly expressed in kidney and liver [39].

Conclusions
Our study tried to understand GWAS data through identifying candidate miRNA-gene pairs over miRNA-target gene network. Using several cutting-edge bioinformatical tools and databases and adopting visualization, we assessed trait-tissue relevance based on tissue-specific gene co-expression information including protein-protein interaction and transcription factor binding evidence. We annotated 3′UTR variants harbored by genes targeted by miRNAs expressed in tissues significantly associated with PBC trait. Our study provides evidence that the association between tissues and traits could be affected by the 3′UTR variants of genes which change binding affinity of targeting miRNAs. The analysis emphasizes the influence of variants on genetic traits and miRNA-targeting gene networks, and thus could contribute to additional studies and detections on human diseases. We think our study provided a valuable approach for elucidating 3′UTR variants which could contribute to genetic traits with tissue relevance in the context of miRNA influential human diseases.