While the earliest inciting events that result in ILDs remain unclear, IPF and some closely related interstitial lung diseases share a clinical course in which a zone of fibrosis expands at the expense of alveolar structures [1, 4]. The relentless expansion of fibrosis and the failure of re-epithelialization suggest that the initiation, maintenance, and resolution of the tissue repair process in ILD lungs are probably changed in some fundamental ways. To understand these processes in more detail, we investigated expression of mRNA and miRNA in human lung biopsy specimens from patients with ILDs and compared the findings with control lung samples to identify the networks and regulatory modules involved in disease.
Global transcriptome analysis, including miRNAs, is a useful tool to investigate how molecular networks change between biological states. However, caution is called for when interpreting these data. RNA profiling methods measure the concentrations of RNA (for both mRNA and miRNA) at a specific state; dynamically modulated processes and transcripts can be missed. In addition, this approach cannot detect regulatory changes associated with translation of messages. Nor do the levels of miRNAs always correspond to their regulatory activities, which can be modulated by other proteins or RNAs. Significant variations in measurements can also come from sample heterogeneity and sample preparation, such as the difference in biopsy locations, the heterogeneity of the disease within the biopsy sample, the diverse spectrum of cell types in the sample, and genetic and environmental effects on individuals. Finally, the profiling methods used in the study can only observe the average effect of multiple cell types and numbers of cells in different states in the sample which can add uncertainly to the data. The generally modest correlation among different miRNA measurement platforms  also needs to be taken into consideration when comparing data from other studies. Nonetheless, with all these qualifications, global mRNA/miRNA profiling is currently the most powerful available approach to identify disease associated molecular networks.
By analyzing mRNA and miRNA spectra, we identified 1423 differentially expressed genes and 125 differentially expressed miRNAs between ILD and control lung samples. Comparing the data from three male and three female control lung samples, females generally exhibit lower (albeit modestly) levels of expression of both mRNA and miRNA than males. There are exceptions, however, for example, the average levels of miR-101 and miR-27b are 8 times higher (per cell) in males than in females, and Atlastin3 (ATL3) and Sec23 homolog A (SEC23A) mRNA are three times higher in males when compared to females. Unfortunately, we do not have sufficient numbers of samples to determine whether these gender differences play any role in the progression of ILDs. Interestingly, gender-associated pathological development has been observed in other lung conditions [45, 46]. It is certainly worth further investigation in the future with larger, carefully age- and gender-matched sample sets.
Regardless of gender, our data also indicates a general inverse trend in global gene expression patterns as the severity of the disease increases and as lung function decreases (Figure 1). For example, mRNAs for the transcription factors, MYC and JUNB show significant differences in samples from patients in FVC group 3 (lung function greater than 80% of normal) and group 2 (lung function between 50 to 80% of normal), but almost no change in samples from FVC group 1 patients (lung function less than 50% of normal) when compared to controls. This contradicts the general expectation of larger changes in more advanced disease. However, in ILDs there is significant tissue remodeling and functional adaption. Individuals with the most severe disease, FVC group 1, may adapt to the deficiency of lung capacity, and the repopulation of the lung parenchyma with fibroblasts may also mask the signals from stressed pulmonary epithelial cells, which are proportionally a smaller and smaller fraction of the tissue as disease progresses.
Pairs of genes and miRNAs can be used to distinguish different disease types
Biomarkers that help diagnose the ILDs or that can distinguish idiopathic pulmonary fibrosis from other forms of ILDs, or reflect the severity of a disease would be useful tools to combine with other clinical information such as patient histories, pulmonary function tests, and radiographic findings to achieve better diagnosis of the diseases. Using "top scoring pair analysis", we identified several pairs of mRNAs and microRNAs that had the power to discriminate ILDs from controls (see Additional files 4 and 5). The most robust pairs, COL3A1 and ARHGEF7, and miR-99b and miR-21*, can perfectly discriminate the ILD samples from controls (Additional file 5) in our sample set. Using the same approach, we also identified mRNA pairs that distinguish IPF from NSIP (Additional file 5). Even though these mRNA or miRNA pairs can successfully distinguish ILDs from normal or separate UIP/IPF from NSIP, these candidate markers were derived from lung tissue biopsies, which are not routinely used in the clinic; as a result, their immediate value may be limited.
Several pathways are significantly perturbed in disease
As in previous studies, our transcriptomic study also revealed a number of strongly perturbed pathways in the ILDs. These include TGF-β, Wnt signaling, focal adhesion, extracellular matrix-receptor interactions, insulin signaling, MAPK signaling, B-cell signaling, adipocytokine signaling, and vascular smooth muscle contraction pathways [4, 7, 9]. The involvement of these pathways clearly suggests the involvement of the immune response and a disruption of the normal cellular microenvironment in the ILDs. We also observed that a number of metabolic processes such as glycine, serine and threonine metabolism and steroid biosynthesis were suppressed in the ILD samples, which implies a decrease in general lung metabolic activities in ILD patients. Consistent with this finding from DEGs, we also observed changes in several immune response related miRNAs including miR-100, mir-21, miR-140, miR-146, miR-155, and miR-223 [47, 48].
An interesting finding of the pathway enrichment analyses is the similarity of the differentially expressed mRNA we observed with the predicted target mRNAs of the differentially expressed miRNAs (Table 2). This suggests that the miRNAs we observed may have key roles in regulating the basic pathological processes of the ILDs as illustrated by pathways for Wnt signaling, ECM-receptor interaction, focal adhesion, vascular smooth muscle contraction, adipocytokine signaling and insulin signaling.
Focusing on transcription factors and miRNAs provides an incisive view of regulatory networks associated with ILDs
Even though pathway enrichment analysis with DEGs and DEmiRNA targets allowed us to identify molecular pathways associated with ILDs, this approach has two limitations: 1) it limits us to previously curated pathway information and 2) it lacks sufficient information to build an integrated view associated with ILDs from the dataset. We thus focused on the transcription factors and miRNAs, to provide a global view of the regulatory machinery involved in the ILDs. We computed a network with 640 DEGs, and 49 miRNAs with a total of 1391 interactions (excluding DEmiRNA-DEG interactions). Compared to a simple pathway-based analysis, in which only 395 of the 1423 identified DEGs could be assigned to any pathway, this transcription factor and miRNA-mediated network analysis allowed us to capture and integrate much more information. It is important to realize however, that until we can measure the cognate protein levels for all the relevant genes in these networks, the translation regulation component of the network will remain hidden. This component undoubtedly plays an important role in the disease-related network regulation, and so it must be a priority for future work to assess the protein levels for all the relevant genes if we are to understand the perturbation of these gene networks more completely.
Grouping the most highly interacting nodes together , the network can be grouped into 7 modules. It is encouraging that the biology is reflected in this grouping, since the modules align with functional associations based on pathway and GO term enrichment analyses (Table 3 and Additional file 1). For example, module 1 contains genes from a number of metabolic pathways, and module 2 contains genes involved in immune responses and signal transduction processes (Table 3). This approach also allows us to integrate various molecular processes into a single module and reveal key molecules linking these processes. For example, CDC42, phospholipase C, beta 4 (PLCB4), mitogen-activated protein kinase 14 (MAPK14), and Src homology 2 domain containing transforming protein 3 (SHC3) are all located in module 2 and these gene products are clearly involved in the receiving and transmission of various extracellular signals to the nucleus.
Because of their documented functions, genes like MYOCD or ZEB1 may be used as indicators of EMT or the presence of mesenchymal cells such as myofibroblasts in the ILD samples. Indeed, when we grouped patients based on the expression levels of these genes, several groups of genes related to MYOCD or ZEB1 were revealed. In addition to the obvious sub-networks representing changes in the cytoskeleton, the extracellular matrix and the process of EMT (Figure 7), several key transcription factors such as the zinc-finger KLF family members, JUN, MYC, EGR1, and MEIS1 also showed significant changes associated with the MYOCD and ZEB1 mRNA expression levels. The involvement of these transcription factors and their effects on the ILDs warrants further study with more patient samples and different clinical pathologies.
Another reason to build an integrated molecular network associated with ILD is to gain a better understanding of the overall pathology. In most cases we could not observe a clear association between the changes of network modules with different clinical conditions such as FVC groups or diagnosed subtypes of ILDs. This most likely reflects diversity in the ILD pathologies, the differing degrees of progression and especially the difficulty of making precise and consistent clinical observations embodied in the recorded diagnoses. This further illustrates the need for new approaches like those used here, such as using the levels of ZEB1 and MYOCD mRNAs to provide a more consistent and accurate clinical stratification in this complex of diseases.
Anti-apoptosis signal may promote survival of myofibroblats in ILD lungs
The lung tissues from ILDs in general, and from IPF in particular, are enriched with myofibroblasts. Myofibroblasts utilize their actin-linked cytoskeletal machinery to migrate, proliferate, form foci and perhaps contract under stimulation. A series of up-regulated genes, from calcium and potassium regulated channels located at the cell membrane (calcium channel, voltage-dependent, L type, alpha 1C subunit (CACNA1C) and potassium large conductance calcium-activated channel, subfamily M, beta member 1 a (KCNMB1)), to the cytosolic genes that regulate the levels of secondary messengers (cyclicGMP, cyclicAMP, diacylglycerol, Ca++, inositoltriphosphate) to myosin light chain kinases, (MYLK and MYLK3) actins, and myosins (MYL9, MYH11, ACTA2, ACTG2), outline key molecular pathways and constitute an unmistakable signature of the myofibroblast. These genes are clustered on the lower right corner of module 1 (Figure 4a). The apparent monolithic expression changes of genes associated with α-smooth muscle networks are in agreement with the abundance of myofibroblasts in the biopsy tissue we studied. The strong down-regulation of several apoptosis related genes such as receptor-interacting serine-threonine kinase 1(RIP1), bcl2-associated X protein (BAX), and phosphoinositide-3-kinase (PI3K) may also contribute to the resistance to apoptosis of the myofibroblasts.
miRNAs from the miR-23 cluster may mediate Zeb1 induced epithelial to mesenchymal transition through modulating TGF-β activity
The origin of the fibroblasts in the ILD lung could come from three major sources--the proliferation of resident pulmonary fibroblasts, the recruitment of circulating fibrocytes into the lung and EMT of pulmonary epithelial cells. Zeb-1 has been shown to be a positive regulator of mesenchymal cell related genes; it can participate both in epithelial to mesenchymal as well as a mesenchymal to epithelial transitions . Prolonged exposure to TGF-β has been seen to induce expression of the transcription factors Zeb-1 and Zeb-2, leading to EMT. In addition to other targets, Zeb-1 and Zeb-2 are known to negatively regulate expression of the miR-200 family. These miRNAs have been proposed to maintain an epithelial phenotype in some cells and their decline in response to Zeb-1 commits the cells to transition to a mesenchymal phenotype [51–54]. In our ILD samples, we observed a significant increase of Zeb-1 mRNA levels which is consistent with a role in EMT in the ILD. Surprisingly, we also observed an increase in the miR-200 family in our ILD samples. These observations raise the possibility that both processes could be active in our lung tissue biopsies, but in different cells: negative regulation of EMT mediated by miR-200 family in epithelial cells and positive regulation of EMT, mediated by TGF-β, Zeb-1, and the miR-23 cluster, in transitional cells and mature fibroblasts.
Our network analysis also suggested a preliminary model for how a signal from TGF-β could result in transcriptional and post-translational regulation that would lead to EMT and maintenance of the mesenchymal myfibroblast compartment in ILD. Zeb-1 mRNA expression was up-regulated in ILD as was the expression of miR-23a, miR-27a and miR-24. These miRNAs are encoded at one locus in the human genome at 19p13 and constitute the miR-23a cluster. The similar expression profile of these miRNAs in our ILD data suggested that Zeb-1 may positively regulate this cluster of miRNAs. Putative binding sites for Zeb-1 proximal to the cluster support this hypothesis (see Additional file 8). We investigated this relationship further by over-expressing Zeb-1 in MDCK cells in vitro, which confirmed this suggestion: Zeb-1 clearly caused the levels of the miR-23a cluster to increase (Figure 6D). In addition, we identified the ubiquitin ligase Nedd4L as likely being subject to post-transcriptional negative regulation by the miR-23a cluster, which is driven in turn by over-expression by Zeb-1. The changes in Nedd4L expression are shown in Figure 6E. We believe the presence of multiple Nedd4L isoforms in the Western blot is due to the combined effect of alternative start codons and alternative splicing events (the detailed analysis of this phenomenon will be reported elsewhere). It is also possible that these results, based on the use of a kidney cell line, may not be applicable in all details to the lung. Since Nedd4L accelerates the degradation of Smad2/3 and Tgfr1, thereby attenuating TGF-β signaling [42, 43], the Zeb-1 feed forward loop (FFL) we identified here is likely to be a component of the larger pathway by which TGF-β signaling contributes to EMT, and thus to expansion and persistence of the myofibroblast compartment in ILD. In susceptible pulmonary epithelial cells, TGF-β signals via Smad2/3 and 4 to upregulate Zeb-1, which in turn promotes the transcription of the miR-23a cluster of miRNAs. These miRNAs suppress Nedd4L activity at a post-transcriptional level, resulting in uninterrupted signaling activity from Smad2/3 (Figure 6F). We believe it should be possible to validate this aspect of TGF-β signaling in myofibroblasts derived from ILD tissue. It illustrates the utility of network predictions based on comprehensive transcriptome data, and suggests a promising line of further investigation.
Recently, Pandit et al.  reported that the level of let-7d, regulated by TGF-β signaling, may play a role in EMT and contribute to IPF pathogenesis. This conclusion was partly based on the observed increase of mesenchymal markers including CDH2, VIM and ACTA2 relative to let-7d levels. We also have observed perturbation of the TGF-β pathway (Table 3, Figure 4a. module 5 and 6), over-expression of ZEB1 (Figure 4A, module 6), and increased levels of mesenchyme-associated genes such as ACTA2, CDH2 and VIM in ILD patients (Figure 4A, module 1). Although Pandit et al. and our study agree on the observation of mesenchymal markers and the lack of a significant decrease in the miR-200 family, we did not observe a decrease in let-7d in the ILD patient samples in this study. This difference could be due to the fact that our patient population has a more diverse set of diagnoses, even though none of the IPF patients in our study were observed to exhibit this decrease, or to different miRNA quantitation methods. These overall findings support the idea of multiple parallel processes involved in perturbing key biological pathways or networks in the pathogenesis of ILD.