Pivotal role of the muscle-contraction pathway in cryptorchidism and evidence for genomic connections with cardiomyopathy pathways in RASopathies

Background Cryptorchidism is the most frequent congenital disorder in male children; however the genetic causes of cryptorchidism remain poorly investigated. Comparative integratomics combined with systems biology approach was employed to elucidate genetic factors and molecular pathways underlying testis descent. Methods Literature mining was performed to collect genomic loci associated with cryptorchidism in seven mammalian species. Information regarding the collected candidate genes was stored in MySQL relational database. Genomic view of the loci was presented using Flash GViewer web tool (http://gmod.org/wiki/Flashgviewer/). DAVID Bioinformatics Resources 6.7 was used for pathway enrichment analysis. Cytoscape plug-in PiNGO 1.11 was employed for protein-network-based prediction of novel candidate genes. Relevant protein-protein interactions were confirmed and visualized using the STRING database (version 9.0). Results The developed cryptorchidism gene atlas includes 217 candidate loci (genes, regions involved in chromosomal mutations, and copy number variations) identified at the genomic, transcriptomic, and proteomic level. Human orthologs of the collected candidate loci were presented using a genomic map viewer. The cryptorchidism gene atlas is freely available online: http://www.integratomics-time.com/cryptorchidism/. Pathway analysis suggested the presence of twelve enriched pathways associated with the list of 179 literature-derived candidate genes. Additionally, a list of 43 network-predicted novel candidate genes was significantly associated with four enriched pathways. Joint pathway analysis of the collected and predicted candidate genes revealed the pivotal importance of the muscle-contraction pathway in cryptorchidism and evidence for genomic associations with cardiomyopathy pathways in RASopathies. Conclusions The developed gene atlas represents an important resource for the scientific community researching genetics of cryptorchidism. 
The collected data will further facilitate the development of novel genetic markers and could be of interest for functional studies in animals and humans. The proposed network-based systems biology approach elucidates molecular mechanisms underlying the co-presence of cryptorchidism and cardiomyopathy in RASopathies. Such an approach could also aid in the molecular explanation of the co-presence of diverse and apparently unrelated clinical manifestations in other syndromes.


Background
Cryptorchidism (CO) is the most frequent congenital disorder in male children (2-4% of full-term male births) and is defined as incomplete descent of one (unilateral) or both (bilateral) testes and associated structures. Cryptorchidism has a potential effect on health; defects in testes descent usually cause impaired spermatogenesis, resulting in reduced fertility and increased rates of testicular neoplasia, and testicular torsion (reviewed in [1]). Cryptorchidism is common in human, pigs, and companion animals (2-12%) but relatively rare in cattle, and sheep (≤ 1%) [2]. Testicular descent is a complex series of events which requires concerted action of hormones, constitutive mechanisms, and the nervous system. In most species, including human, the complete descent of testes usually occurs prenatally, while in some (e.g. dogs), postnatally. Beside environmental factors like endocrine disruptors, CO is at least in part determined by genetic causes (chromosome or gene mutations), and is often a common feature of different syndromes. For example, Klinefelter syndrome and mutations in INSL3 gene have already been recognized as a cause of CO in some cases [3].
The comparative knowledge attained through study of animal models has been of great importance in understanding complex disease etiology, suggesting several candidate genes involved also in the pathogenesis of human diseases [4]. Therefore, the use of comparative genomics approach, integrating and cross-filtering the available knowledge from different species seems highly justified. Different animal models for CO exist; for example natural mutants or transgenic mice, rat, rabbit, dog, pig and rhesus monkeys are used to elucidate the role of different factors involved in CO [5]. Based on mouse knock-out models from Mouse Genome Informatics (MGI) database, several genes appear as possible candidates (AR, HOX genes, INSL3, RXFP2, and WT1). Additionally, the technological progress in the last years enabled the use of high-throughput omics-information, at coding (DNA), expression (RNA), and proteomic level. This technological revolution creates a vast amount of data, which increases the need for application of bioinformatics tools that are able to connect omics data with phenotype and enable search for overlapping pathogenetic mechanisms in different genetic diseases [6]. However, this existing technology hasn't been significantly employed in human CO research on a genome and transcriptome-wide scale; to date only one genomewide expression study has been performed in rat [7].
Integratomics represents a novel trend in omics research and is based on the integration of diverse omics data (genomic, transcriptomic, proteomic, etc.), regardless of the study approach or species [8][9][10]. High genetic homology between mammals and the availability of well-annotated genomes from different species allow the assembled data to be presented in the form of a comparative genomic view, displaying candidate genes as orthologs in a single species.
Information extracted from diverse and methodologically focused studies is often fragmented and controversial. To overcome this problem we integrated the collected data using a holistic (map-driven) approach and developed a freely available interactive genomic visualization tool. Such a map-based approach allows identification and prioritization of candidate genes [11] based on the number of literature sources (references), genomic position, and pathway analyses, employing all currently available knowledge in different species. However, extrapolating knowledge gained in one species to another is often difficult because of anatomical and physiological differences, which should be considered when comparing the pathology of the disease across species.
To identify genetic factors potentially involved in CO pathogenesis in human we 1) applied comparative integratomics approach and assembled the database of all CO-associated genomic loci reported in the literature, regardless of the study approach and species, 2) presented the loci on a genomic map as human orthologs, and 3) prioritized the collected data using systems biology approach. The collected candidate genes were classified in corresponding biological pathways and the most significant CO-enriched pathways were proposed. Such classification of candidate genes allowed us to prioritize biological pathways (characterized by genes involved in the pathogenesis of CO), which revealed importance of several pathways (for example muscle contraction mechanisms) that may also play a role in the pathogenesis of other clinical features distinctive for different syndromes often concurrent with CO. In order to additionally illuminate the CO-associated pathways we performed a network-based protein-protein interaction analysis, which resulted in prediction of 43 additional CO candidate genes.

Methods
In search for CO associated candidate loci seven different research approaches were considered: (i) chromosomal abnormalities associated with CO, (ii) copy number variations, (iii) clinical syndromes with known genetic mutations that feature CO, (iv) transgenes and knock-outs that result in CO associated phenotypes, (v) association studies/mutation screening that show association between sequence variation/mutation screening and CO, (vi) expression patterns associated with CO, and (vii) candidates associated with CO at proteomic level.

Data mining
We reviewed the literature published up to 9/2012 searching for the relevant publications through PubMed (http://www.ncbi.nlm.nih.gov/pubmed/) and Web of Science (http://isiknowledge.com) using key phrases: genetics, gene candidates, cryptorchidism, testicular descent, undescended testes, male infertility, QTL, microarray, association, microRNA, non-coding RNA, epigenetic, reproduction, and assisted reproduction. CO-associated candidate genes from different sources and species were retrieved from the literature search. Human clinical syndromes that may cause or feature CO were retrieved from Online Mendelian Inheritance in Man (OMIM) database (http://www.ncbi.nlm.nih.gov/sites/entrez?db=omim) and Disease database (http://www.diseasesdatabase.com/). The data for CO-related experiments on mouse models were retrieved from the Mouse Genome Informatics (MGI) database (http://www.informatics.jax.org/). Human orthologs for the CO associated genes were extracted from the MGI database, which contains information about mammalian ortholog genes for different species. Overlap analysis of the CO candidate genes with genomic regions involved in chromosome mutations was performed using data retrieved from Ensembl via BioMart data mining tool.
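The overlap analysis between candidate genes and regions involved in chromosome mutations can be illustrated with a minimal sketch, assuming each gene and each region is reduced to a chromosome name plus start and end coordinates. The locus names and coordinates below are hypothetical placeholders, not values from the atlas, and the sketch does not reproduce the Ensembl/BioMart queries used in the study.

```python
from typing import NamedTuple


class Locus(NamedTuple):
    name: str
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # inclusive


def overlaps(a: Locus, b: Locus) -> bool:
    """True if two loci are on the same chromosome and share at least one base."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def genes_in_regions(genes, regions):
    """Map each region name to the sorted names of the genes overlapping it."""
    return {
        region.name: sorted(g.name for g in genes if overlaps(g, region))
        for region in regions
    }


# Hypothetical coordinates, for illustration only.
genes = [
    Locus("GENE_A", "9", 1_000, 5_000),
    Locus("GENE_B", "9", 20_000, 25_000),
    Locus("GENE_C", "X", 1_000, 5_000),
]
regions = [Locus("deletion_1", "9", 4_000, 21_000)]

hits = genes_in_regions(genes, regions)
```

A gene is counted as overlapping even when only part of it falls inside the region; a stricter rule (full containment) would be an equally plausible design choice.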

Database implementation
The CO-associated candidate gene database is a web resource that provides integrated and curated information on molecular components involved in the pathogenesis of CO. Information regarding the collected CO-associated candidate genes is stored in a relational MySQL database, which is publicly available for search, data entry and update at http://www.integratomics-time.com/cryptorchidism/. The search interface enables users to find specific CO-associated candidate genes based on a number of criteria. The online data entry interface enables users to update or submit new CO-associated candidate genes.

Genomic view of the CO associated loci
An overview of the chromosomal locations of CO-associated loci is graphically represented in a genomic view, as previously described [12]. The literature-collected and network-predicted CO genes can be visualized on the same genomic view or separately. The genomic view is displayed through the web-based interactive visualization tool Flash GViewer (http://gmod.org/wiki/Flashgviewer/), which was developed by the GMOD project.

Pathway and network analysis
In the first pathway analysis we considered human orthologs of the literature-collected candidate genes (179 genes). DAVID Bioinformatics Resources 6.7 [13] was employed for the enrichment (overrepresentation) analysis. The background for the analysis was defined using the 179 candidate genes plus their first neighbours (5018 proteins) selected in the human protein-protein interaction network (PPIN). The result of the enrichment analysis was obtained using Bonferroni multiple test correction and a p-value significant threshold of 0.01. The human PPIN was obtained by fusion of the following human networks: IRefIndex [14], Chuang et al. article [15], Ravasi et al. article [16], Consensus-PathDB [17].
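The construction of the enrichment background can be sketched as follows, assuming each source network is available as a list of undirected protein pairs. The edge lists here are toy examples and the function names are hypothetical; the sketch only illustrates the idea of fusing networks and collecting first neighbours, not the actual data processing.

```python
from collections import defaultdict


def merge_networks(*edge_lists):
    """Union several undirected edge lists into one adjacency map, dropping self-loops."""
    adjacency = defaultdict(set)
    for edges in edge_lists:
        for a, b in edges:
            if a != b:
                adjacency[a].add(b)
                adjacency[b].add(a)
    return adjacency


def background_set(adjacency, candidates):
    """Candidates plus every protein directly linked to at least one candidate."""
    background = set(candidates)
    for gene in candidates:
        background |= adjacency.get(gene, set())
    return background


# Toy edge lists standing in for the merged source networks (illustrative only).
net_1 = [("HRAS", "RAF1"), ("RAF1", "MAP2K1")]
net_2 = [("HRAS", "SOS1"), ("SOS1", "GRB2"), ("P1", "P2")]

ppin = merge_networks(net_1, net_2)
background = background_set(ppin, {"HRAS"})
```

Restricting the background to candidates and their first neighbours, rather than the whole proteome, makes the enrichment test more conservative because the comparison set is already close to the candidates in the network.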
A new cohort of 43 candidate genes was predicted using PiNGO 1.11 [18]. PiNGO is a tool designed to find candidate genes in biological networks and is freely provided as a plug-in for Cytoscape 2.8 [19], an open source software platform for visualizing and integrating molecular interaction networks. PiNGO predicts the categorization of a candidate gene based on the annotations of its neighbors, using enrichment statistics. In our analysis we asked which first-neighbour genes interact significantly with the original cohort of 179 literature-collected genes in the human PPIN. We adopted the hypergeometric statistical test, Bonferroni multiple testing correction and a p-value significance threshold of 0.01. The cohort of 43 network-predicted genes was strongly significant (Bonferroni p-value < 0.0095) as new candidate genes.
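The statistical idea behind the prediction step can be sketched with a hypergeometric tail probability followed by Bonferroni correction. This is a simplified reading of the approach, not PiNGO's implementation: it assumes that, for each first-neighbour gene, we count how many of its interaction partners belong to the literature-collected cohort. The network size (5018) and cohort size (179) come from the text; the neighbour names and their counts are hypothetical.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when drawing n items without replacement from N items, K of which are 'hits'."""
    upper = min(K, n)
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


NETWORK_SIZE = 5018   # background proteins (from the text)
COHORT_SIZE = 179     # literature-collected genes (from the text)

# Hypothetical neighbours: (partners in the cohort, total partners).
neighbours = {"NEIGHBOUR_1": (12, 40), "NEIGHBOUR_2": (1, 40)}

raw = {
    gene: hypergeom_sf(k, NETWORK_SIZE, COHORT_SIZE, n)
    for gene, (k, n) in neighbours.items()
}
adjusted = dict(zip(raw, bonferroni(list(raw.values()))))
selected = [gene for gene, p in adjusted.items() if p < 0.01]
```

Bonferroni correction is the most conservative common choice; it controls the chance of any false positive across all tested neighbours, at the cost of missing weaker candidates.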
In order to evaluate the importance of this new cohort of 43 candidate genes we performed the pathway analysis according to the procedure already described for the 179 literature-collected candidate genes.
Finally, in order to investigate the biological relations between the 179 literature-collected and 43 network-predicted genes, we repeated the pathway analysis in DAVID (using the same procedure previously described) considering the 222 (179 + 43) candidate genes. The background for the analysis was defined using the 222 candidate genes plus their first neighbours in the human PPIN. In addition, we visualized the protein-protein interactions occurring between the genes present in at least two pathways using the STRING database (version 9.0) [20], selecting only interactions with a high confidence score.

Genetic variability of candidate genes
Genetic variability for the most promising CO candidate genes was extracted from the Ensembl database (http://www.ensembl.org/). Probably damaging genetic variations were predicted by PolyPhen-2, version 2.1.0, provided by Ensembl database. Putative polymorphic miRNA target sites in candidate genes were obtained from Patrocles database (http://www.patrocles.org/) [21].

Results and discussion
Extensive literature mining was performed resulting in 217 collected candidate loci (chromosome regions and genes) reported to be involved in CO in human or/and animals. The generated database served as the foundation for the development of freely available interactive genomics viewer designed to integrate multi-species data from various research approaches. Enriched biological pathways and 43 additional CO candidate genes were suggested, based on protein-protein interaction network (PPIN) analysis. The workflow of the study is presented in the Figure 1.

Collection of the cryptorchidism associated loci from the literature
The collected data incorporate genomic loci associated with cryptorchidism by seven different types of research approaches (chromosomal mutations, copy number variations, clinical syndromes, transgenes and knock-outs, association studies/mutation screening, transcriptomic/expression studies, and proteomic studies). The collected data originate from seven different species (human, cattle, horse, sheep, dog, rat, and mouse) (Table 1). The collected CO data are available in Additional file 1: Table S1, Additional file 2: Table S2, Additional file 3: Table S3, Additional file 4: Table S4 and Additional file 5: Table S5 and include physical locations of the candidate loci in human and in the species of origin.

Clinical syndromes
Studies of complex disease traits can be facilitated by analysis of the molecular pathways represented by genes responsible for monogenic syndromes that also exhibit these traits [7,35]. There are over 200 different human syndromes with known molecular basis in OMIM database that feature "cryptorchidism" or "undescended testis" as a possible feature in their clinical synopsis. Since cryptorchidism phenotype prevalence is low in some syndromes, and could only occur coincidentally, it is difficult to justify association of syndrome causative genes with a particular phenotype.
To collect CO candidate genes (Additional file 2: Table S2) we obtained list of syndromes from the literature [4,36,37], OMIM and Diseases database ("may be caused or feature") and then further examined phenotype-gene relationships and clinical features for each of the syndromes. Only syndromes where cryptorchidism is present as a regular feature, described in multiple clinical cases, and where gene(s) causing the syndrome is/are known were included.

Transgenes and knock-outs
From the Mouse Genome Informatics (MGI) database and the literature [38][39][40][41][42] we retrieved 39 mouse and one rat KO and transgenic experiments that result in phenotypes associated with CO (Additional file 3: Table S3).

Expression patterns
There are several studies comparing expression profiles in testes between cryptorchid and normal males; these investigate the resulting effects of CO rather than the causes of its development (e.g. [75,76]). To our knowledge, there is only one microarray study that analyzed transcript profiles in the gubernaculum during normal and abnormal testicular descent; it reported 3589 genes differentially expressed between rats with inherited cryptorchidism (orl) and a control group [7]. We added to our candidate gene list a subset of 112 promising candidate genes that were selected by the authors of that study based on expression levels, inclusion in specific pathways of interest and/or previous reports showing association with cryptorchidism (Additional file 5: Table S5).

Protein level
Hutson et al. (1998) [77] investigated the effect of exogenous calcitonin gene-related peptide (CGRP) in  [78] failed to confirm CGRP (in human also known as CALCA) pathway genes as major players in human sporadic CO.

Development of the CO database and genomic viewer
The CO-associated loci, obtained by the comparative integratomics approach, were assembled into a freely accessible database available at http://www.integratomics-time.com/cryptorchidism/. The curated database is open for public data entry. Researchers are invited to submit new cryptorchidism candidate genes from their research or other publications by filling in the data entry form on our web site. The collected loci from human and animal species were presented as human orthologs in a human genomic view (Figure 2). Some candidate genes have been associated with CO by multiple independent literature reports in multiple species. For example, twenty genes (AMH, AMHR2, AR, ARID5B, BMP7, EPHA4, ESR1, FGFR2, HOXA10, HRAS, INSL3, LHCGR, MAP2K1, MSX1, NR5A1, RXFP2, SOS1, TNNI2, TNNT3, and WT1) have been associated with CO in at least two independent studies using different study approaches (Table 2). These genes are denoted in bold in the online database (http://www.integratomics-time.com/cryptorchidism/candidate_genes/).

Figure 2 Genomic view of the cryptorchidism candidate genes. A. Genomic view of the literature-collected (red) and network-predicted (blue) CO associated candidate loci presented as human orthologs. The view includes syndromes with known genetic mutations that feature CO, mouse transgenic and knock-out experiments, chromosomal abnormalities, genes tested for association with CO, genes with expression patterns associated with CO, and genes associated with CO on proteomic level. Loci are placed at approximate positions on chromosome map. B. Enlargement of chromosome 9.

Pathway identification and network-based data mining discovery

Pathway analysis of the cryptorchidism associated candidate genes
We performed pathway analysis of the 179 literature-collected CO candidate genes (refer to Methods). This pathway enrichment analysis, conducted by applying very stringent criteria (Bonferroni multiple test correction and a p-value significance threshold of 0.01), yielded twelve significant pathways associated with our list of CO candidate genes in human (Table 3). The literature-collected candidate genes involved in multiple (at least four) pathways are presented in Additional file 6: Table S6 and marked with an asterisk in the online database.
The presence of pathways related to "cytoskeleton", "muscle development", "muscle contraction", "focal adhesion", and "insulin signaling" was previously reported in rat [7]. In addition to these pathways, our analysis showed new pathways: "cardiomyopathy" (hypertrophic and dilated),"RAS signaling", "signaling by PDGF", "signaling by EGFR", "role of MAL in Rho-mediated activation of SRF", "IGF-1 signaling pathway", and "integrin signaling". The results represent a valid example of pathway-based data mining discovery. As an additional validation analysis, we excluded the 112 candidate genes proposed by Barthold et al. (2008) [7] from the overall candidate genes list (consisting of 179 unique human genes) and repeated the pathway analysis. Nine genes from Barthold et al. (2008) [7] were reported as CO candidate genes also in other studies, therefore we retained them in the analysis, so that the new list of candidate genes consisted of 79 genes. The pathway analysis of these remaining 79 candidate genes returned similar results as were obtained when using the overall 179 candidate gene list. In fact, 10 of the 12 enriched pathways were the same after excluding the discussed data from the candidate gene list. In particular, the five pathways reported by Barthold et al. (2008) [7] in rat ("cytoskeleton", "muscle development", "muscle contraction", "focal adhesion", and "insulin signaling") were all confirmed in this independent validation analysis. The main effect of the gene removal were higher, but still significant, p-values in the pathway analysis. According to these results we can infer that inclusion of the candidate genes from Barthold et al. (2008) [7] is not the reason for the substantial overlap of the five pathways identified in both studies. On the contrary, the findings proposed here are a further confirmation of the validity of the conclusions made by Barthold et al. (2008) [7].
Surprisingly, when we searched the medical literature for articles describing pathologies in which CO, cardiomyopathy, and RAS signaling are common features, we found a perfect match with Noonan, Cardiofaciocutaneous, LEOPARD, and Costello syndromes, which all belong to the class of RASopathies [79,80]. All four syndromes feature various physical anomalies, including the concomitant presence of cardiomyopathy due to heart defects and, in males, cryptorchidism [79]. Noonan syndrome (NS) is the most common single-gene cause of congenital heart disease, and NS subjects also present other features such as leukemia predisposition [81]. In particular, five different mutations in RAF1 were identified in individuals with NS; four mutations causing changes in the CR2 domain of RAF1 were associated with hypertrophic cardiomyopathy (HCM), whereas mutations in the CR3 domain were not [82]. Additionally, PTPN11, RAF1, and SOS1 mutants were identified as a major cause of Noonan syndrome, BRAF of Cardiofaciocutaneous, PTPN11 of LEOPARD, and HRAS of Costello syndrome, providing new insights into RAS regulation [80,81]. These genes have also been found to be mutated in patients with RASopathies who have cryptorchidism in their clinical picture. Among NS patients with CO, 11/14 had mutated PTPN11, 4/5 had mutated SOS1, and 1/2 had mutated RAF1. BRAF has been found to be mutated in 2/3 patients with Cardiofaciocutaneous syndrome and CO, PTPN11 in 1/4 patients with LEOPARD syndrome and CO, and HRAS in 2/4 patients with Costello syndrome and CO [80,81]. However, the genes responsible for the remaining cases are unknown, and the gene-pathway relations responsible for potential connections between unrelated features such as cryptorchidism and HCM in RASopathies are not clear. Therefore, we performed a network-based prediction (see next paragraph) of CO candidate genes by identifying the most significant first neighbors (in the human protein-protein interaction network; PPIN) of the 179 literature-collected candidates.

Pathway analysis of the network-predicted candidate genes
A new cohort of 43 candidate genes (Additional file 7: Table S7) was predicted by PiNGO 1.11 [18], which is a Cytoscape plug-in (see Methods) [19]. The question we tried to address was which first-neighbor genes significantly interact with the original cohort of 179 literature-collected genes in the human PPIN. We adopted the hypergeometric statistical test and Bonferroni multiple testing correction. The cohort of 43 network-predicted genes was strongly significant (Bonferroni p-value < 0.0095); therefore, we consider them as additional CO candidate genes. In order to evaluate the importance of these new candidate genes we performed the pathway analysis (Table 3), according to the same procedure already used in the previous paragraph (and described in the Methods). The most intriguing evidence is the presence of significant pathways related to cardiomyopathy and muscle contraction in both sets of candidate genes (i.e. literature-collected and network-predicted). Pathways common to both sets of candidate genes support the validity and robustness of the results obtained in the first pathway analysis and the hypothesis of a connection between CO and cardiomyopathy in NS. They also support the quality of the procedure adopted for network prediction of new candidate genes.

Pathway analysis of the overall CO candidate gene list (179 literature-collected and 43 network-predicted genes)
The first cohort of 179 literature-collected genes and the second one containing 43 network-predicted genes were condensed into a list of 222 unique genes, the overall candidate gene list. We repeated the pathway analysis on this list applying the same very stringent criteria used above (Bonferroni multiple test correction and a p-value significance threshold of 0.01). The analysis suggested the presence of 12 significant pathways associated with the overall list of candidate genes in human (Table 4).
The "muscle contraction" pathway was the most significant overall, with a Bonferroni-corrected p-value of 4.55E-33 (Figure 3A), while "hypertrophic cardiomyopathy" was the second most significant pathway, with a Bonferroni-corrected p-value of 1.21E-09 (Figure 3B). These results are crucial for our study because they suggest the presence of a strong genomic connection among diverse pathways associated with clinical features that seemed unrelated. To address the relationship among these mechanisms we created a matrix merging the information on gene participation in the identified pathways. Of the 222 (179 + 43) candidate genes, 172 were filtered out because they were not present in at least two of the 12 significant pathways. The resulting matrix consists of 50 candidate genes in the rows and 12 enriched pathways in the columns (Additional file 8: Table S8). The matrix values are binary: 0 indicates that the gene is not present in a pathway, whereas 1 indicates that the gene is present.
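The construction of the binary gene-by-pathway matrix can be sketched as below. The pathway names are taken from the text, but the gene labels and their memberships are hypothetical placeholders chosen only to show the filtering rule (keep genes present in at least two pathways).

```python
def membership_matrix(pathways, min_pathways=2):
    """Build a binary gene x pathway matrix, keeping genes found in >= min_pathways pathways.

    `pathways` maps a pathway name to the set of its member genes.
    Returns (column_names, rows) where rows maps gene -> list of 0/1 values.
    """
    names = sorted(pathways)
    genes = sorted(set().union(*pathways.values()))
    rows = {}
    for gene in genes:
        row = [1 if gene in pathways[name] else 0 for name in names]
        if sum(row) >= min_pathways:
            rows[gene] = row
    return names, rows


# Hypothetical memberships, for illustration only.
pathways = {
    "muscle contraction": {"G1", "G2", "G3"},
    "hypertrophic cardiomyopathy": {"G1", "G2"},
    "RAS pathway": {"G2", "G4"},
}

columns, matrix = membership_matrix(pathways)
```

In this toy input, G3 and G4 each occur in a single pathway and are dropped, mirroring how 172 of the 222 genes were filtered out in the study.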
Hierarchical bi-clustering of the matrix [83], both in the rows and in the columns, was performed to detect clusters of genes which participated in common pathways, and clusters of pathways which share the same genes, respectively. The result of this analysis is provided in Figure 4. The presence of two main groups of clusters is evident. The first group is constituted of "cardiomyopathy" (hypertrophic and dilated), "muscle contraction" and "cardiac muscle contraction" pathways. The second group is constituted of "focal adhesion", "regulation of actin cytoskeleton", "integrin signaling", "vascular smooth muscle contraction", "signaling by insulin receptor", "signaling by PDGF", "RAS pathway", and "TGF-beta signaling".
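How pathways sharing the same genes end up in the same cluster can be illustrated with a small agglomerative clustering sketch. The text does not state which distance or linkage was used, so Jaccard distance with average linkage is an assumption made here for illustration, and the pathway labels and gene sets are hypothetical. Applying the same function to gene-to-pathway sets would cluster the rows, which is the second half of a bi-clustering.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b| for two sets (0.0 when both are empty)."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def average_linkage(items):
    """Agglomerative clustering of named sets.

    Returns the merge history as (members_a, members_b, distance) tuples.
    """
    def dist(c1, c2):
        pairs = [(x, y) for x in c1 for y in c2]
        return sum(jaccard_distance(items[x], items[y]) for x, y in pairs) / len(pairs)

    clusters = [frozenset([name]) for name in items]
    merges = []
    while len(clusters) > 1:
        # Merge the two closest clusters; ties resolve to the first pair found.
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: dist(*pair))
        merges.append((sorted(c1), sorted(c2), dist(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return merges


# Hypothetical pathway -> gene sets, for illustration only.
pathway_genes = {
    "A": {"G1", "G2", "G3"},
    "B": {"G1", "G2"},
    "C": {"G4", "G5"},
    "D": {"G4", "G5", "G6"},
}

merges = average_linkage(pathway_genes)
```

Here A joins B and C joins D before the two groups are merged at the maximum distance, which is the kind of two-group structure described for Figure 4.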
In order to further investigate the relation between the genes involved in the "cardiomyopathy" (hypertrophic and dilated), the "muscle contraction" and the "RAS pathway" and to interpret their role in creating connections between the diverse pathway modules, we searched the STRING database [20] for protein-protein interactions, selecting only the interactions with high confidence score. The outcome of this analysis is represented in Figure 5. All the 50 genes presented at least one interaction in the PPI network produced by the STRING database. This network is provided as a supplementary material (Additional file 9: Table S9).
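The filtering step can be sketched as follows, assuming interactions are available as (protein, protein, combined score) triples with scores on a 0 to 1 scale. The 0.7 cutoff is the value STRING labels as high confidence; that the study used exactly this cutoff is an assumption, and the edges and scores below are hypothetical.

```python
HIGH_CONFIDENCE = 0.7  # assumed cutoff; STRING's "high confidence" level


def high_confidence_edges(edges, threshold=HIGH_CONFIDENCE):
    """Keep only interactions whose combined score reaches the threshold."""
    return [(a, b, score) for a, b, score in edges if score >= threshold]


def genes_without_interactions(genes, edges):
    """Genes from `genes` that appear in none of the given interactions."""
    connected = {g for a, b, _ in edges for g in (a, b)}
    return sorted(set(genes) - connected)


# Hypothetical interactions and scores, for illustration only.
edges = [("G1", "G2", 0.95), ("G2", "G3", 0.40), ("G3", "G4", 0.72)]

kept = high_confidence_edges(edges)
isolated = genes_without_interactions(["G1", "G2", "G3", "G4", "G5"], kept)
```

An empty `isolated` list would correspond to the situation reported in the text, where all 50 genes retained at least one interaction.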
The principal pathways involved in both CO and RASopathies are displayed on the PPIN (Figure 5) and are also marked in Figure 4 to facilitate comparison. Figure 5 addresses the question of the relation between the common genetic mechanisms underlying CO and RASopathies: it provides a clear visualization of the overlapping pathways and of the integrated network of relations existing at the proteomic level. To the best of our knowledge, this is the first time that such a relation has been presented, and it might help in understanding the co-presence of CO and cardiomyopathy as apparently unrelated clinical features in RASopathies. The layout in Figure 5 reveals how the "cardiomyopathy" and the "RAS signaling" pathways are connected by a plethora of interactions with high confidence scores in the STRING database. To investigate the precise type of intra- and inter-pathway interactions, we suggest mining the network provided in the supplementary material (Additional file 9: Table S9). Figure 5 further emphasizes how the "focal adhesion" and the "TGF-beta signaling" pathways overlap the "cardiomyopathy", the "muscle contraction" and the "RAS signaling" pathways by connecting proteins at different metabolic levels. The relevance of the "focal adhesion" pathway, as well as the importance of the "cytoskeleton", "muscle development", "muscle contraction", and "insulin signaling" pathways in cryptorchidism, has been widely discussed [7]. However, that study was conducted on a rat model and all of the pathways were considered and treated separately. Here, for the first time, we present an integratomic investigation of the genetic factors linked to CO in human. We also offer a holistic perspective that points out how clinical features apparently unrelated to CO might be generated by genetic mutation(s) that propagate to different pathway levels of the network. This propagation across different pathway modules can explain the onset of multiple unrelated clinical features in complex diseases such as RASopathies. The selection of 43 network-predicted genes, considered together with this disease-related evidence, further supports the power of PPINs for associating genes with diseases [21,84].

Candidate gene prioritization
Prioritization of candidate genes underlying complex traits remains one of the main challenges in molecular biology [11]. In this study we used three criteria for selecting the most promising candidate genes: 1) the number of independent literature reports connecting the candidate gene with CO (Table 2), 2) involvement of candidate genes in enriched pathways (Table 3), and 3) position of candidate genes on the genomic map (genes positioned in regions where multiple CO-associated data overlap were considered positional candidates) (Figure 2). Twenty genes have been suggested as a genetic cause of CO in at least two independent studies (criterion 1) using different study approaches (AMH, AMHR2, AR, ARID5B, BMP7, EPHA4, ESR1, FGFR2, HOXA10, HRAS, INSL3, LHCGR, MAP2K1, MSX1, NR5A1, RXFP2, SOS1, TNNI2, TNNT3, and WT1). Among them, INSL3 has been associated with CO in eight, RXFP2 in five, and AR in four independent studies. However, this approach should be treated with some caution because of possible bias towards more "popular", intensively studied genes. The approach will become more reliable once a substantial number of unbiased genome-wide studies are available.
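Criterion 1 can be sketched as a simple count over evidence records, assuming each record links a gene to a study approach and a literature reference. The record layout, gene labels, and references below are hypothetical; criteria 2 and 3 (pathway involvement and genomic position) are not covered by this sketch.

```python
from collections import defaultdict


def rank_by_evidence(records, min_approaches=2):
    """Rank genes supported by at least `min_approaches` distinct study approaches.

    `records` is an iterable of (gene, approach, reference) tuples.
    Returns (gene, n_references, n_approaches) tuples, most-cited gene first.
    """
    approaches = defaultdict(set)
    references = defaultdict(set)
    for gene, approach, reference in records:
        approaches[gene].add(approach)
        references[gene].add(reference)
    kept = [g for g in approaches if len(approaches[g]) >= min_approaches]
    kept.sort(key=lambda g: (-len(references[g]), g))
    return [(g, len(references[g]), len(approaches[g])) for g in kept]


# Hypothetical evidence records, for illustration only.
records = [
    ("GENE_A", "knock-out", "ref1"),
    ("GENE_A", "association", "ref2"),
    ("GENE_A", "expression", "ref3"),
    ("GENE_B", "syndrome", "ref4"),
    ("GENE_B", "expression", "ref3"),
    ("GENE_C", "expression", "ref3"),
]

ranking = rank_by_evidence(records)
```

Counting distinct references rather than raw records avoids inflating the rank of a gene that a single study reports under several approaches, although it does not remove the publication bias noted above.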
Considering involvement in enriched pathways (criterion 2), the most promising candidates would be HRAS, MAP2K1, MAP2K2, GRB2, RAF1 and SOS1, which are all involved in seven or more enriched pathways. For the literature-collected candidate genes involved in multiple (four or more) CO-enriched pathways we assembled genetic information relevant for further functional analyses: assignment to corresponding biological pathways, genetic variability, and putative presence of polymorphic microRNA (miRNA) target sites (Additional file 6: Table S6). The importance of small non-coding RNAs (ncRNAs) in gene regulation and pathogenesis of diseases, including reduced fertility, is today evident [90]. However, to our knowledge, there are no literature reports associating ncRNAs or epigenetic factors with CO.

Figure 4 Hierarchical bi-clustering of the CO candidate genes. Hierarchical bi-clustering of the matrix of the CO candidate genes present in at least two pathways. The matrix consists of 50 candidate genes (rows) and 12 enriched pathways (columns). The black lines (full and dashed) are used to indicate the modules corresponding to clusters of interacting proteins in the respective pathways.

Reliability of such methodologically different approaches is not always comparable (for example, data from genome-wide expression experiments are much less validated than syndromic or transgenic data); therefore, ranking candidate genes based only on the number of different reports/approaches is not always feasible. However, less validated data may also be of high biological relevance and should not be discarded for hypothesis-driven approaches. To increase the reliability of the collected heterogeneous data we tested in silico how candidate genes interact at the proteomic level. Although integratomic approaches are only partially established and have several drawbacks, including the already mentioned heterogeneity of input data, we believe that such approaches are reasonable and currently among the most promising ways for hypothesis generation, which should be further experimentally validated in animal and/or human populations. A similar integratomics approach has already been used for identification of candidate loci for mammary gland associated phenotypes [8], male infertility [9], and obesity [10,91], and could be adapted to any other complex trait.

Figure 5 STRING protein-protein interaction network (PPIN) of the CO candidate genes. The PPIN is obtained from the 50 CO candidate genes interacting in the STRING database. The lines (full and dashed) delimit the overlapping protein-pathway modules. The same lines are used in Figure 4 to indicate the modules corresponding to clusters of interacting proteins in the respective pathways.

Conclusions
In this study we present an overview of CO-associated candidate regions/genes and suggest pathways potentially involved in the pathogenesis of the disease. The integrative, comparative-genomics approach and in silico analyses of the collected data aim to help solve the problem of fragmented and often contradictory data extracted from different methodologically focused studies. The protein-protein interaction analysis revealed the most relevant pathways associated with the CO candidate gene list and enabled us to suggest additional candidate genes based on network prediction. The described systems biology approach will contribute to a better understanding of the genetic causes of cryptorchidism and provides an example of how integration and linking of data related to complex traits can be used for hypothesis generation. The publicly available online CO gene atlas and data entry option will allow researchers to enter, browse, and visualize CO-associated data. The proposed network-based approach elucidates the co-presence of similar pathogenetic mechanisms underlying diverse clinical syndromes/defects and could be of great importance for research in the field of molecular syndromology. This approach also has the potential to be used for future development of diagnostic, prognostic, and therapeutic markers. The developed integratomics approach can be extended to study the genetic background of any other complex trait or disease and to generate hypotheses for downstream experimental validation.