Differential expression analysis of human endogenous retroviruses based on ENCODE RNA-seq data
© Haase et al. 2015
Received: 11 December 2014
Accepted: 28 October 2015
Published: 3 November 2015
Human endogenous retroviruses (HERVs) are flanked by long terminal repeats (LTRs), which possess promoter activity and can therefore influence the expression of neighboring genes. HERV involvement in different types of cancer has already been thoroughly documented. However, so far there has been no systematic study of HERV expression patterns in a multitude of cell types in health and disease. In particular, the publication of the comprehensive ENCODE dataset has already facilitated many gene expression studies, but none so far focusing exclusively on HERVs.
We present a comprehensive differential analysis of HERV expression based on ENCODE Tier 1 and Tier 2 RNA-seq data produced by Cold Spring Harbor Laboratory and the California Institute of Technology. This analysis was conducted for individual HERV loci and for entire HERV families in twelve different cell lines, of which six correspond to the normal condition and the other six represent cancer cell types. Although the principal component analysis revealed that the two groups of cells show distinguishable expression patterns, we were not able to link these differences to one or more particular HERV families. Two samples exhibit expression patterns that are not similar to those of the corresponding cell lines from the other producing lab. Instead, they show signs of cancer formation and expression of the pluripotency marker HERVH, despite being classified as a normal cell line and a differentiated cell, respectively.
Our study demonstrates that ENCODE data are generally comparable between the different contributing labs and that the analysis of HERV elements can provide novel insights into the differentiation and disease state of a cell that are easily overlooked when focusing on protein-coding genes. Our findings hint at a change in HERV expression during carcinogenesis.
Human endogenous retroviruses (HERVs) are remnants of germline infections by exogenous retroviruses that were integrated into the host genome and passed on to the offspring. Estimates place the amount of human DNA that has a retroviral origin at 8 %. Due to the presence of the proviral pol gene, which encodes the reverse transcriptase, HERVs can reintegrate copies of themselves in other genomic locations and hence belong to the group of transposable elements. In addition to the gag, pol and env genes, which are counterparts of the original functional virus genes, HERVs also contain long terminal repeat sequences (LTRs) at the 5′ and 3′ ends. These LTRs have a strong promoter function, which can increase the transcription level of neighboring genes. As internal viral genes tend to degrade over time due to the absence of evolutionary pressure, many HERV sequences lack some or even all of their ORFs, or contain only fragments of them. In particular, many solitary LTR sequences can be found in the human genome.
Some of the HERV loci in the human genome have been identified as being beneficial to the host. For example, syncytin, encoded by the env gene of the HERV-W family, is linked to differentiation and morphogenesis of the placental tissue and is hypothesized to have an immunosuppressive function that supports maternofetal tolerance [4, 5]. There are also multiple studies linking HERVs to diseases such as multiple sclerosis and schizophrenia [6, 7], although they are mainly based on detecting elevated expression levels of certain HERV families in affected individuals and do not necessarily shed light on causal relationships. HERVs have also been implicated in breast cancer and melanoma [8, 9], where patient samples contained expressed HERV genes, viral proteins and antibodies against HERV peptides. However, in order to understand the causative mechanisms of HERV involvement in disease, a rigorous tissue-specific differential expression analysis of HERV families between disease and healthy states is required. Such differential analyses are complicated by the fact that HERVs are repetitive elements spread over the entire genome, which makes mapping of their transcripts to genomic loci particularly challenging. Previous attempts to create an overview of HERV expression patterns in different tissues relied on a specifically designed chip with various captured retroviral pol sequences [10, 11] and were thus limited to the subset of HERV family members that still contain intact pol sequences. In order to increase the number of covered retroviral elements, a more comprehensive approach capable of identifying the full-length sequences would be required. RNA-Seq has become a method of choice for addressing such problems, as it provides precise measurements of transcript levels in the cell and thus makes it possible to map all retroviral elements, both structurally intact and partial, back to their genomic loci.
Over the past ten years the Encyclopedia Of DNA Elements project (ENCODE) has been working on the systematic identification of all functional elements in the human genome. With the most recent data release, made available in September 2014, ENCODE incorporates 27 different kinds of experiments, such as Exon Arrays, ChIP-Seq, and RNA-Seq analyses, conducted by seven research labs in order to gather as much information as possible on a standardized group of cell lines. These cell types are subdivided into three tiers depending on their assigned priority. Tier 1, which contains only three different cell types, has the highest priority and thus the largest number of conducted studies associated with it. All tiers include healthy cell lines as well as cancerous ones, and all data submissions are also subdivided based on the cellular compartments in which measurements were performed. The ENCODE guidelines require submitters to provide at least two biological replicates per experiment to allow more robust statistical analyses. There are a total of 151 RNA-Seq experiments in the ENCODE summary of the first stage (2007–2012), with 87 of them using small RNA-Seq (as of September 2014).
Since this valuable resource of highly standardized reference data became available, many research groups have published studies integrating or comparing the ENCODE data with their own samples or conducting meta-analyses across ENCODE cell lines [14–16]. ENCODE data have also been used in computational studies on gene expression, but so far transcriptome analyses spanning multiple cell types have targeted protein-coding genes or functional regulatory RNAs [17, 18]. Examination of ENCODE RNA-Seq data with regard to HERV expression has either been limited to single cell types or has covered HERVs only as a very small subset of the overall analysis [19, 20].
In this work we have comprehensively analyzed ENCODE RNA-Seq data covering all annotated HERV loci in a broad variety of cell lines, disease and developmental stages. We sought to gain an insight into the overall expression patterns of HERV elements and to examine on a large scale if there are measurable differences in HERV activity between cancer and normal cells, as already reported for individual tumor types. Furthermore, a major goal of our study was to assess the consistency of different ENCODE-contributing laboratories with regard to expression values from the same cell lines.
We present the analysis of 25 RNA-Seq samples from ENCODE’s top two priority tiers with regard to the expression of all annotated HERV loci obtained from the HERVd database [21, 22]. We found that there are considerable differences between the expression profiles obtained by single- vs paired-end sequencing, with the former showing lower overall expression. This finding holds true both for HERVs and for housekeeping genes. Apart from the discrepancies stemming from different library designs and two cases probably caused by transformation of the underlying cell lines, we did not observe any striking distinctions between the data provided by the two contributing laboratories. This suggests that ENCODE’s quality standards are sufficient to provide a robust basis for comparative analysis.
Upon removal of systematic errors resulting from different sequencing strategies, we were able to identify unusual patterns in HERV expression in two of the analyzed cell lines. Members of the HERV-H family, considered to be a marker for pluripotency and normally seen in embryonic stem cells (ESCs), are strongly overexpressed in the HeLa-S3 cells. Furthermore, one blood cell line, GM12878, shows a HERV expression profile that is more similar to that of cancerous blood cells than to that of another healthy sample, potentially pointing to a cancerous transformation of this cell line. These findings imply that patterns of HERV expression could serve as useful markers in early cancer diagnostics.
Differential expression analysis shows strong differences between sequencing technologies
We detected HERV expression in all analyzed samples. The CSHL dataset has on average 120,411,532 mapped fragments, of which 0.73 % are assigned to HERV loci. In Caltech’s paired-end data set, featureCounts processed 67,730,782 fragments on average, of which approximately 1.3 % were mapped to HERVd annotations. The single-end data sets include on average 20,640,141 reads, with 0.42 % of them assigned to HERV loci. A list showing the number of mapped fragments per sample as well as the fraction assigned to HERV loci can be found in Additional file 1.
Given that single-end datasets lead to very different results compared to their paired-end counterparts, we decided to exclude them from further analyses to prevent them from introducing a bias into the expression data. The paired-end data seem to give a better overview of the expression rates. Note that in the GEO summary of the ENCODE Caltech RNA-Seq data, the single-end protocol, which is also strand-specific, is described as less reliable for quantification [http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE33480].
Housekeeping genes confirm strong differences in ENCODE expression data, especially between different sequencing protocols
The observed differences thus do not depend on the chosen transcript family, but are instead an inherent pattern in the ENCODE datasets.
Two cell lines show behavior atypical for their disease and developmental stage
Caltech’s HeLa-S3 cell line shows a strong up-regulation of HERVH, typical for embryonic stem cells
Overall, the same cell types analyzed by any two laboratories show a similar composition of HERV families, with the exception of GM12878 from CSHL, which exhibits nearly twice as many significant HERVs as its Caltech counterpart. In particular, the large number of ERVL members in the CSHL sample is unmatched in the corresponding Caltech cell line. The only other cell type with a similarly large number of active HERVL loci is CSHL’s K562 sample, which is also a blood cell type but, unlike GM12878, cancerous.
Another cell line that deviates strongly in the lab comparison is HeLa-S3 from Caltech. It appears to over-express a very large number of HERVH family members (290 loci), which are found only in low numbers in all other specialized cells. The over-expression is not as strong as in the embryonic stem cells, but the number of loci is higher than that of any other single HERV family in all other cell lines (Fig. 6). The difference between cancerous and normal cell lines revealed by the principal component analysis could not be linked to a particular overexpressed HERV family. We were not able to identify any expression patterns separating the six normal from the six cancer cell lines on the basis of individual HERV families.
Based on the comprehensive analysis of 25 RNA-Seq samples from the ENCODE project with regard to their HERV expression, we find that datasets created with different sequencing library methods (paired- vs single-end) are not readily comparable, because single-end samples achieve lower coverage. This is expected, as the single-end sequencing protocol used by Caltech is strand-specific and as such trades quantitative accuracy for qualitative information.
Although a principal component analysis of the HERV expression patterns in different cell types showed that cancerous and healthy samples can be distinguished based on HERV activity, we could not link this difference to a specific HERV family. However, our study revealed unexpected results regarding GM12878 from CSHL, which showed hints of being a tumorous cell line in two different analyses. First, a hierarchical clustering of the expression of all HERV loci grouped this cell type with all four K562 (blood cancer) replicates instead of the GM12878 samples from Caltech. Second, the composition of up-regulated HERV families in this sample, when compared to all others, is much more similar to that of K562, especially regarding the strong activity of ERVL. A possible reason for this behavior could be the transformation of an initially normal cell line into a tumorous one prior to experimental measurements. However, this explanation does not seem to be particularly plausible given that ENCODE imposes strict data quality requirements, especially with regard to Tier 1 cell lines, to which GM12878 belongs. The respective CSHL GM12878 RNA-Seq track has been accessible through the UCSC genome browser [27, 28] since August 2012 and so far no unusual features of this dataset, including a possible progression towards a tumor line, have been reported. It is conceivable that the change in HERV expression detected in our study, which is the first comprehensive investigation of HERV expression in ENCODE samples, occurs very early in the transition from a normal to a cancer cell type and hence remained undetected in studies focusing on protein-coding gene expression, although we were able to detect aberrant behavior hinting at this change when performing PCA on housekeeping genes.
Further research is needed to verify this hypothesis, as it implies that unusual HERV expression could serve as an early indication of carcinogenic transformation and thus represent a valuable diagnostic lead.
Another striking finding is the low number of differentially expressed HERVs when comparing Caltech’s HeLa-S3 sample to the ESCs. The strongest difference in HERV expression between ESCs and specialized cell types is the very strong up-regulation of HERVH family members in ESCs. Because HERVH activity is also high in Caltech’s HeLa cells, at a level unmatched in any of the other differentiated cell types, the difference in expression pattern to ESCs is understandably small. The HERVH family is known to play a vital role in embryonic stem cells. In particular, its members can serve as a marker for pluripotency due to their strong association with binding sites for the pluripotency transcription factors NANOG, OCT4 and SOX2. Furthermore, it has been suggested that HERVH and its LTR7 can recruit the transcription factors p300 and OCT4 to regulate the transcription of pluripotency-associated transcripts. Intriguingly, Santoni et al. also used ENCODE RNA-Seq data from Caltech to analyze HERVs in hESCs, although they relied on the 2010 data release whereas in this study we utilized the most recent data published in 2012. For comparison with differentiated cells, Santoni et al. also obtained the 2010 data on corresponding HeLa-S3 cells and found that “HERV-H expression is barely detectable in HeLa”, although it was identified when using transient-transfection assays. It is thus apparent that there has been a significant change between the 2010 and 2012 HERV expression data submitted to the ENCODE project by Caltech.
Santoni et al. observed that the HERV expression strength in ESCs diminishes during differentiation. Expression is highest at the undifferentiated N0 stage, still observable during N1 (early initiation), and only barely measurable during N2 (neural progenitor). Thus, a conceivable explanation for the behavior of Caltech’s HeLa-S3 cells would be reprogramming towards pluripotency, although an underlying mechanism for this process remains enigmatic.
In this study we analyzed the expression of known human endogenous retroviral elements in RNA-Seq samples from the ENCODE project. It is the first examination of sequencing data from a variety of cell types and labs with regard to HERV expression patterns.
We found that all analyzed cell lines have active HERV loci and, by performing differential expression analysis, we identified cell type-specific expression patterns. Our analysis also revealed discrepancies between different RNA-Seq datasets: individual cell lines showed significantly different expression profiles depending on the laboratory where measurements were conducted. We verified that this finding was not due to the particular kind of transcripts considered, namely HERVs, by repeating the same analysis with housekeeping genes.
Furthermore, in two cases deviant expression patterns were closer to those of a completely different cell type than the same line from a different research institution.
Thus, we believe it is of particular importance to monitor cultured cell lines very closely as small changes to a cell type can lead to major alterations of expression patterns for some non-coding RNAs. While conducting differential expression analysis it might not be sufficient to regard even identical cell lines as comparable. Our analysis of cancer-specific HERV families is further complicated by consideration of distinctly different cancer cell lines. However, PCA shows that there is an informative signal in the expression data differentiating cancer from normal cell lines.
If further investigations confirm that a change in HERV expression patterns is an early sign of cell transformation, it can be utilized as a diagnostic tool to help recognize tumor formation.
We obtained RNA-Seq data for the ENCODE Tier 1 and Tier 2 cell types mapped against the latest human genome assembly (hg19) using the UCSC track download portal [31, 32]. Only samples from the ENCODE category long RNA extracts (>200 bp) were considered, as short RNA extractions aim at identifying small non-coding RNAs while our proviral remnants of interest are considerably longer (mean length of 928 nt). We further restricted the considered tracks to whole cell extracts, as we are interested in the overall HERV expression in entire human cells rather than in individual compartments.
Table 1 ENCODE RNA-Seq data used in this study (table body not recovered; surviving column header: Number of replicates)
We used the currently most comprehensive collection of annotated HERVs, the HERVd database [21, 22]. This database contains 98,008 entries describing 224 different HERV families, from full-length proviral elements to solitary long terminal repeats (LTRs). Since the HERVd annotation is based on the hg17 assembly of the human genome, we transferred all genomic coordinates to hg19 using the liftOver tool.
A number of HERVd entries did not survive the lifting process: 2,342 entries are completely or partially deleted and another 24 entries are split in the latest hg19 assembly. We nevertheless attempted to identify the location of these entries in hg19 by sequence similarity searches using BLAT. Similarity hits were accepted as the origin of a given HERV if the corresponding alignments were gap-free, covered the complete query sequence, and had a sequence identity of at least 98 %. In the same fashion we identified additional viral elements in hg19 by using all known HERV sequences as queries and accepting new origins when they met the identity cutoff. It should be noted that we only performed similarity searches in regions without existing HERV annotations to avoid duplicated entries. However, since the initial HERVd already contains HERV loci that overlap each other, we decided against filtering these out in order not to lose HERV annotations.
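The three acceptance criteria for similarity hits can be sketched as a simple filter. This is a minimal illustration and not the authors' pipeline: the `Hit` record, its field names, and the definition of identity as matches divided by aligned length are assumptions made for the example, and the hit values are invented.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One alignment of a HERV query against hg19 (hypothetical record)."""
    query_len: int    # length of the HERV query sequence
    aligned_len: int  # number of query bases covered by the alignment
    matches: int      # identical bases within the alignment
    gaps: int         # gap openings in query or target


def accept_hit(hit: Hit, min_identity: float = 0.98) -> bool:
    """Gap-free, full query coverage, identity >= min_identity."""
    if hit.query_len <= 0 or hit.gaps != 0:
        return False
    if hit.aligned_len != hit.query_len:
        return False
    # Assumption: identity = matches / aligned length.
    return hit.matches / hit.aligned_len >= min_identity


# Invented toy hits, one per rejection reason.
hits = [
    Hit(query_len=1000, aligned_len=1000, matches=985, gaps=0),  # passes
    Hit(query_len=1000, aligned_len=1000, matches=970, gaps=0),  # 97 % identity
    Hit(query_len=1000, aligned_len=900, matches=900, gaps=0),   # partial coverage
    Hit(query_len=1000, aligned_len=1000, matches=995, gaps=1),  # gapped
]
accepted = [accept_hit(h) for h in hits]
```

Only the first toy hit satisfies all three conditions at once.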
Our initial HERV data set contained 100,495 locations in hg19. HERVd entries located on chromosome Y were excluded from consideration as this chromosome is not covered by all ENCODE datasets used in our study. This filtering step left us with a total of 98,998 annotated HERV loci (Additional file 2) for which we obtained read counts.
HERV expression in ENCODE RNA-Seq
Furthermore, in the data sets comprising paired-end reads the entire fragment was counted only once to maintain comparability with the single-end samples. Reads in stranded datasets were counted in a strand-specific manner; these include the single-end Caltech samples and all CSHL samples.
Differential expression analysis
The coverage depth of HERV loci was compared across the 25 ENCODE samples (Table 1) using the R Bioconductor package DESeq [35–37], which is specifically designed for differential expression analysis. To achieve better comparability between samples, we normalized their count data by library size and carried out a variance stabilizing transformation based on the inherent biological variability between the replicates of the same condition. We then performed a principal component analysis (PCA) and calculated the Euclidean distances between the transformed expression values to detect overall differences between the samples.
The subsequent analyses were limited to the paired-end RNA-Seq data, as PCA revealed extensive differences between single- and paired-end library preparations; single-end data were excluded to avoid introducing a bias into the differential expression analysis. The read count value of every condition, normalized by library size, was compared in a pairwise fashion against every other condition, resulting in 171 differential expression analyses and the corresponding fold changes. The DESeq implementation of the negative binomial test was then used to find significant differences in the calculated expression values. The initial p-values were adjusted for multiple testing using the Benjamini-Hochberg procedure. In order to identify significantly differentially expressed HERVs, we filtered for loci whose absolute logarithmic fold change was at least one and whose adjusted p-value did not exceed 0.001 (which is equivalent to a false discovery rate of 0.1 %).
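The effect of library-size normalization on sample distances can be illustrated with a small standard-library sketch. It uses median-of-ratios size factors, the scheme described for DESeq, but it is not the DESeq code itself: the `log2(x + 1)` transform is only a stand-in assumed here in place of the variance stabilizing transformation, PCA is omitted, and the count matrix is invented.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors, one per sample (column)."""
    n_samples = len(counts[0])
    # Use only loci with a non-zero count in every sample.
    rows = [row for row in counts if all(c > 0 for c in row)]
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in rows]
    return [
        math.exp(median(math.log(row[j]) - g
                        for row, g in zip(rows, log_geo_means)))
        for j in range(n_samples)
    ]


def transform(counts, factors):
    """Divide by size factor, then log2(x + 1) (stand-in for the VST)."""
    return [[math.log2(c / f + 1.0) for c, f in zip(row, factors)]
            for row in counts]


def euclidean_distance(matrix, a, b):
    """Distance between samples a and b across all loci."""
    return math.sqrt(sum((row[a] - row[b]) ** 2 for row in matrix))


# Invented counts: 4 loci x 3 samples. Sample 2 is sample 0 sequenced
# twice as deeply; sample 1 has a different expression profile.
counts = [
    [10, 40, 20],
    [20, 10, 40],
    [30, 30, 60],
    [40, 20, 80],
]
factors = size_factors(counts)
transformed = transform(counts, factors)
d_same_profile = euclidean_distance(transformed, 0, 2)
d_diff_profile = euclidean_distance(transformed, 0, 1)
```

After normalization, the two samples that differ only in sequencing depth coincide, while the sample with a different profile remains at a clearly non-zero distance.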
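The multiple-testing adjustment and the two filtering thresholds can be written out as follows. This is a sketch under stated assumptions rather than the DESeq implementation: the fold change is assumed to be on a log2 scale, the function names are hypothetical, and the input values are invented.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant(log2_fold_changes, p_values, min_abs_lfc=1.0, max_padj=0.001):
    """Flag loci with |log2 fold change| >= 1 and adjusted p <= 0.001."""
    padj = benjamini_hochberg(p_values)
    return [abs(lfc) >= min_abs_lfc and q <= max_padj
            for lfc, q in zip(log2_fold_changes, padj)]


# Invented values for four loci.
lfc = [2.5, -1.2, 0.4, -3.0]
pvals = [1e-6, 2e-4, 1e-7, 0.5]
flags = significant(lfc, pvals)
```

In this toy input the third locus fails the fold-change threshold despite its small p-value, and the fourth fails the adjusted p-value threshold despite its large fold change.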
For every analyzed cell type, we compiled a list of HERV loci up-regulated in at least one of the pairwise comparisons. By considering the corresponding families of loci, we sought to identify HERVs that are particularly active in certain cell types and under certain conditions.
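Compiling the up-regulated loci per cell type and summarizing them by family amounts to a grouping step, sketched below. The tuple layout, the sign convention (a positive log2 fold change means higher expression in the first condition), and all identifiers and values are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical significant results:
# (condition_a, condition_b, locus, log2 fold change of a relative to b)
results = [
    ("cell_A", "cell_B", "locus_1", 2.0),
    ("cell_A", "cell_C", "locus_1", 1.5),
    ("cell_A", "cell_B", "locus_2", -3.0),
    ("cell_B", "cell_C", "locus_3", 1.2),
]

# Hypothetical locus-to-family annotation.
family_of = {"locus_1": "HERVH", "locus_2": "ERVL", "locus_3": "HERVH"}


def upregulated_loci(results):
    """Map each condition to the loci up-regulated in at least one comparison."""
    up = defaultdict(set)
    for cond_a, cond_b, locus, lfc in results:
        # Assumed convention: positive values favor cond_a, negative cond_b.
        up[cond_a if lfc > 0 else cond_b].add(locus)
    return dict(up)


def family_counts(loci, family_of):
    """Count up-regulated loci per HERV family."""
    return Counter(family_of[locus] for locus in loci)


up = upregulated_loci(results)
counts_b = family_counts(up["cell_B"], family_of)
```

Using a set per condition ensures that a locus up-regulated in several pairwise comparisons is listed only once.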
Validation on housekeeping genes
In order to ascertain that the differences in HERV expression between different conditions and library preparations reported in this study are not due to computational or experimental biases specific to endogenous viral elements, we repeated our analysis with a set of housekeeping genes. For this purpose we used the list of 3,804 genes compiled by Eisenberg and Levanon. This list was created based on RNA-seq data from 16 different human tissues by first identifying housekeeping exons, i.e. those exons expressed in all tissues, displaying low variance between tissues, and showing no exceptional expression in any single data set. Housekeeping genes were then defined as those genes for which at least one annotated RefSeq transcript has more than half of its exons classified as housekeeping. When acquiring the annotation file for the housekeeping genes from the UCSC genome browser, only 3,801 entries could be retrieved, as three identifiers [RefSeq: NM_032937, NM_003926, NM_032560] had been removed from the RefSeq database. The assessment of coverage for every housekeeping gene in all our selected ENCODE data sets was carried out exactly as described above for HERVd, including normalization and principal component analysis (the featureCounts output for all housekeeping genes can be found in Additional file 3).
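The gene-level rule summarized above (at least one transcript with more than half of its exons classified as housekeeping) can be expressed compactly. The data structures and identifiers below are hypothetical; the sketch only illustrates the rule as stated in the text and does not reproduce the exon-level classification.

```python
def is_housekeeping_gene(transcripts, housekeeping_exons):
    """transcripts: dict mapping transcript id -> list of exon ids."""
    for exons in transcripts.values():
        if not exons:
            continue
        n_hk = sum(1 for exon in exons if exon in housekeeping_exons)
        # Strictly more than half of the exons must be housekeeping.
        if n_hk > len(exons) / 2:
            return True
    return False


# Hypothetical exon classification and two toy genes.
hk_exons = {"e1", "e2", "e3"}
gene_x = {"tx1": ["e1", "e2", "e8", "e9"],   # 2 of 4: exactly half, not enough
          "tx2": ["e1", "e2", "e3", "e9"]}   # 3 of 4: qualifies
gene_y = {"tx1": ["e1", "e7", "e8", "e9"]}   # 1 of 4

x_is_hk = is_housekeeping_gene(gene_x, hk_exons)
y_is_hk = is_housekeeping_gene(gene_y, hk_exons)
```

A single qualifying transcript is sufficient, so `gene_x` is classified as housekeeping even though its first transcript falls short.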
The authors would like to thank Prof. Christine Leib-Mösch for valuable comments regarding this manuscript.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
1. Griffiths D. Endogenous retroviruses in the human genome sequence. Genome Biol. 2001;2(6):reviews1017–reviews1017.5.
2. Rebollo R, Romanish MT, Mager DL. Transposable elements: An abundant and natural source of regulatory sequences for host genes. Annu Rev Genet. 2012;46:21–42.
3. Mager DL, Medstrand P. Retroviral repeat sequences. In: Encyclopedia of the Human Genome. London: Nature Publishing Group; 2003. p. 57–63.
4. Mangeney M, Renard M, Schlecht-Louf G, Bouallaga I, Heidmann O, Letzelter C, et al. Placental syncytins: Genetic disjunction between the fusogenic and immunosuppressive activity of retroviral envelope proteins. Proc Natl Acad Sci U S A. 2007;104(51):20534–9.
5. Mi S, Lee X, Li X, Veldman GM, Finnerty H, Racie L, et al. Syncytin is a captive retroviral envelope protein involved in human placental morphogenesis. Nature. 2000;403(6771):785–9.
6. Karlsson H, Schröder J, Bachmann S, Bottmer C, Yolken RH. HERV-W-related RNA detected in plasma from individuals with recent-onset schizophrenia or schizoaffective disorder. Mol Psychiatry. 2004;9(1):12–3.
7. Kolson DL, Gonzalez-Scarano F. Endogenous retroviruses and multiple sclerosis. Ann Neurol. 2001;50(4):429–30.
8. Buscher K, Hahn S, Hofmann M, Trefzer U, Ozel M, Sterry W, et al. Expression of the human endogenous retrovirus-K transmembrane envelope, Rec and Np9 proteins in melanomas and melanoma cell lines. Melanoma Res. 2006;16(3):223–34.
9. Frank O, Verbeke C, Schwarz N, Mayer J, Fabarius A, Hehlmann R, et al. Variable transcriptional activity of endogenous retroviruses in human breast cancer. J Virol. 2008;82(4):1808–18.
10. Seifarth W, Spiess B, Zeilfelder U, Speth C, Hehlmann R, Leib-Mösch C. Assessment of retroviral activity using a universal retrovirus chip. J Virol Methods. 2003;112(1–2):79–91.
11. Seifarth W, Frank O, Zeilfelder U, Spiess B, Greenwood AD, Hehlmann R, et al. Comprehensive Analysis of Human Endogenous Retrovirus Transcriptional Activity in Human Tissues with a Retrovirus-Specific Microarray. J Virol. 2005;79(1):341–52.
12. Wang Z, Gerstein M, Snyder M. RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet. 2009;10(1):57–63.
13. The ENCODE Project Consortium. The ENCODE (ENCyclopedia Of DNA Elements) Project. Science. 2004;306(5696):636–40.
14. Djebali S, Davis CA, Merkel A, Dobin A, Lassmann T, Mortazavi A, et al. Landscape of transcription in human cells. Nature. 2012;489(7414):101–8.
15. Hart T, Komori H, LaMere S, Podshivalova K, Salomon D. Finding the active genes in deep RNA-seq gene expression studies. BMC Genomics. 2013;14(1):778.
16. Park E, Williams B, Wold BJ, Mortazavi A. RNA editing in the human ENCODE RNA-seq data. Genome Res. 2012;22(9):1626–33.
17. Bánfai B, Jia H, Khatun J, Wood E, Risk B, Gundling WE, et al. Long noncoding RNAs are rarely translated in two human cell lines. Genome Res. 2012;22(9):1646–57.
18. Harrow J, Frankish A, Gonzalez JM, Tapanari E, Diekhans M, Kokocinski F, et al. GENCODE: The reference human genome annotation for The ENCODE Project. Genome Res. 2012;22(9):1760–74.
19. Criscione S, Zhang Y, Thompson W, Sedivy J, Neretti N. Transcriptional landscape of repetitive elements in normal and cancer human cells. BMC Genomics. 2014;15(1):583.
20. Lu X, Sachs F, Ramsay L, Jacques PE, Göke J, Bourque G, et al. The retrovirus HERVH is a long noncoding RNA required for human embryonic stem cell identity. Nat Struct Mol Biol. 2014;21:423–5.
21. Pačes J, Pavlíček A, Pačes V. HERVd: database of human endogenous retroviruses. Nucleic Acids Res. 2002;30(1):205–6.
22. Pačes J, Pavlíček A, Zika R, Kapitonov VV, Jurka J, Pačes V. HERVd: the Human Endogenous RetroViruses Database: update. Nucleic Acids Res. 2004;32 suppl 1:D50.
23. Gao Y, Xu H, Shen Y, Wang J. Transcriptomic analysis of rice (Oryza sativa) endosperm using the RNA-Seq technique. Plant Mol Biol. 2013;81(4–5):363–78.
24. Ling YH, Xiang H, Li YS, Liu Y, Zhang YH, Zhang ZJ, et al. Exploring differentially expressed genes in the ovaries of uniparous and multiparous goats using the RNA-Seq (Quantification) method. Gene. 2014;550(1):148–53.
25. Cordonnier A, Casella JF, Heidmann T. Isolation of novel human endogenous retrovirus-like elements with foamy virus-related pol sequence. J Virol. 1995;69(9):5890–7.
26. Tönjes RR, Löwer R, Boller K, Denner J, Hasenmaier B, Kirsch H, et al. HERV-K: the biologically most active human endogenous retrovirus family. J Acquir Immune Defic Syndr Hum Retrovirol. 1996;13 Suppl 1:S261–7.
27. Hinrichs AS, Karolchik D, Baertsch R, Barber GP, Bejerano G, Clawson H, et al. The UCSC Genome Browser Database: update 2006. Nucleic Acids Res. 2006;34(Database issue):D590–8.
28. Kent W, Sugnet C, Furey T, Roskin K, Pringle T, Zahler A, et al. The human genome browser at UCSC. Genome Res. 2002;12(6):996–1006.
29. Santoni F, Guerra J, Luban J. HERV-H RNA is abundant in human embryonic stem cells and a precise marker for pluripotency. Retrovirology. 2012;9(1):111.
30. Schön U, Diem O, Leitner L, Gunzburg WH, Mager DL, Salmons B, et al. Human endogenous retroviral long terminal repeat sequences as cell type-specific promoters in retroviral vectors. J Virol. 2009;83(23):12643–50.
31. Raney BJ, Dreszer TR, Barber GP, Clawson H, Fujita PA, Wang T, et al. Track data hubs enable visualization of user-defined genome-wide annotations on the UCSC Genome Browser. Bioinformatics. 2014;30(7):1003–5.
32. Rosenbloom KR, Sloan CA, Malladi VS, Dreszer TR, Learned K, Kirkup VM, et al. ENCODE Data in the UCSC Genome Browser: year 5 update. Nucleic Acids Res. 2013;41(D1):D56–63.
33. Kent WJ. BLAT-the BLAST-like alignment tool. Genome Res. 2002;12(4):656–64.
34. Liao Y, Smyth GK, Shi W. featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics. 2014;30(7):923–30.
35. Anders S, Huber W. Differential expression analysis for sequence count data. Genome Biol. 2010;11:R106.
36. Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, et al. Bioconductor: Open software development for computational biology and bioinformatics. Genome Biol. 2004;5:R80.
37. R Core Team. R: A Language and Environment for Statistical Computing. Vienna: R Foundation for Statistical Computing; 2013. http://www.R-project.org.
38. Benjamini Y, Hochberg Y. Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. J R Stat Soc Ser B Methodol. 1995;57(1):289–300.
39. Eisenberg E, Levanon EY. Human housekeeping genes, revisited. Trends Genet. 2013;29(10):569–74.