Identifying significant genetic regulatory networks in the prostate cancer from microarray data based on transcription factor analysis and conditional independency
- Hsiang-Yuan Yeh1,
- Shih-Wu Cheng†2,
- Yu-Chun Lin†2,
- Cheng-Yu Yeh2,
- Shih-Fang Lin2 and
- Von-Wun Soo1, 3
© Yeh et al; licensee BioMed Central Ltd. 2009
Received: 17 February 2009
Accepted: 21 December 2009
Published: 21 December 2009
Prostate cancer is a leading cancer worldwide and is characterized by aggressive metastasis. Owing to its clinical heterogeneity, prostate cancer displays different stages and grades related to aggressive metastatic disease. Although numerous studies have used microarray analysis and traditional clustering methods to identify individual genes involved in the disease process, the important gene regulations remain unclear. We present a computational method for automatically inferring genetic regulatory networks from microarray data, using transcription factor analysis and conditional independence testing to explore potentially significant gene regulatory networks that are correlated with cancer, tumor grade and stage in prostate cancer.
To deal with missing values in microarray data, we used a K-nearest-neighbors (KNN) algorithm to estimate the missing expression values. We applied web services technology to wrap bioinformatics toolkits and databases in order to automatically extract the promoter regions of DNA sequences and predict the transcription factors that regulate gene expression. We adopted a microarray dataset consisting of 62 primary tumors and 41 normal prostate tissues from the Stanford Microarray Database (SMD) as the target dataset to evaluate our method. The predicted results identified possible biomarker genes related to cancer and indicated that androgen functions and processes may be involved in the development of prostate cancer and promote cell death in the cell cycle. Our predicted results showed that sub-networks of genes SREBF1, STAT6 and PBX1 are strongly related to high grade cancer, while ETS transcription factors ELK1, JUN and EGR2 are related to low grade cancer. Gene SLC22A3 may explain clinically the differentiation associated with high grade cancer compared with low grade cancer. Enhancer of Zeste Homolog 2 (EZH2), regulated by RUNX1 and STAT3, is correlated with the pathological stage.
We provide a computational framework to reconstruct the genetic regulatory network from microarray data using biological knowledge and constraint-based inference. Our method is helpful in verifying possible interaction relations in gene regulatory networks and in filtering out incorrect relations inferred by imperfect methods. We not only predicted individual genes related to cancer but also discovered significant gene regulatory networks. Our method is also validated against several published papers and databases, and the significant gene regulatory networks perform critical biological functions and processes including cell adhesion molecules, androgen and estrogen metabolism, smooth muscle contraction, and GO-annotated processes. These significant gene regulations and the critical concept of tumor progression are useful for understanding cancer biology and disease treatment.
Prostate cancer is a leading cancer and an aggressive metastatic disease worldwide, and it is the second most common cause of cancer death among men. Owing to its clinical heterogeneity, prostate cancer displays different behaviors related to aggressive metastatic disease. Some experiments have found that tumours with high Gleason grade and advanced pathological stage, which are associated with cancer recurrence, tend to be more aggressive. Currently, prognostication and treatment are based on the clinical stage and Gleason grade, but the gene regulation and biological processes correlated with the progression of prostate cancer are still unclear.
Microarray technology provides large-scale measurement of the expression of thousands of genes and is used to reveal gene expression in a particular cell type of an organism at a particular time under particular conditions. This high-throughput experimental technology is a powerful tool for comparing mutant or diseased cells with normal cells and searching for differences in gene expression that may be key factors leading to disease. Several studies have used wet-lab experiments and microarray data analysis to detect strongly significant genes as markers at the gene expression level. Although microarray studies of prostate cancer have already identified differences in gene expression between normal and cancer tissue, they still use traditional unsupervised clustering methods to characterize the potential molecular variation of individual genes. However, microarray data reveal information not only about gene expression but also about the genetic networks underlying biological experiments or in vivo screening examinations. The general purpose of inferring a genetic regulatory network is to extract expression features, activations and inhibitions from the changes in gene expression among the genes in microarray data. Recently, researchers have studied reverse engineering methods to understand the complex interactions that are directly affected by the genetic networks. Several mathematical methods for modelling genetic networks have been proposed, such as Boolean networks, differential equations, Bayesian networks, and Petri nets. Although these methods can model the networks successfully to some extent, it is in general difficult to determine the correct interactions among genes without detailed biological knowledge about their DNA sequences and transcription factors.
Two approaches can be used for learning the widely used Bayesian networks from data, and each has its advantages and disadvantages. The first is the search-and-score method, which computes the conditional probability of each network given the data, ranks the networks and searches for the network that best fits the data. The advantage of this approach is that the resulting network graph carries fine-grained probabilistic information; the drawback is that the number of possible networks becomes super-exponential when the number of nodes is very large. Because this problem is NP-hard, heuristic search methods must be adopted. The second approach is the constraint-based learning method, which takes a different viewpoint to learn the network from data. The basic idea is to construct the network from the conditional dependencies among nodes given the data. The approach tries to discover all the conditional independencies (CI) in the data and uses them to infer the network. Since the constraint-based learning method needs all the conditional independencies, which measure the dependency relationships, it is also hard work to generate the whole set of possible combination patterns among the genes in the microarray data.
However, gene networks inferred solely from microarray data are often not sufficient for rigorous analysis. A common problem of such data-driven learning approaches is that only a small number of genes can be modelled. Without sufficient background knowledge, it is hard to reconstruct gene regulatory networks merely by Bayesian learning from scarce data. To overcome this problem, integrating biological knowledge into the modelling process becomes necessary [8–11]. In molecular biology, the expression of genes is believed to be controlled by transcription factors, which lead to the gene expression changes observed in microarray data. Therefore, networks between transcription factors and their target genes are important for understanding the complex regulatory mechanisms in a cell.
Our idea is to develop an initial gene network from the microarray data by combining independence testing and transcription factor analysis. We then revise and infer the gene networks using the d-separation criterion and conditional independence to distinguish direct from indirect interactions in the network. Many biological databases and information services are available over the internet, and they allow us to gather information about biological sequences and to predict their functions and promoter regions to some extent. We apply web services technology to integrate tools and databases developed by ourselves and others in order to automatically carry out the workflow of all tasks needed in the computational analysis.
Coping with missing values in microarray data
A microarray dataset consisting of N genes and M experiments can be represented as an M × N matrix whose entries Xij (i ∈ M, j ∈ N) are the gene expression levels. Gene expression (either over-expressed or under-expressed) is revealed by two colors in the microarray data, with the symbol "R" representing the red dye and the symbol "G" representing the green dye. The ratio between the two colors reflects the relative degree of expression of a gene. We extract the Log2 [R/G Normalized Ratio (Median)] value of each gene because the mean of the normalized ratio is much more easily affected by noise than the median.
- 1) Consider gene A with a missing value in experiment t, and calculate the Euclidean distance between gene A and each other gene that has no missing value, using the other experiments. Suppose (p1, p2, ..., pt-1, pt+1, ..., pN) and (q1, q2, ..., qt-1, qt+1, ..., qN) are the expression values of gene A and of another gene in the other N-1 experiments. The Euclidean distance between the two gene expression profiles is as follows:
$$d(p, q) = \sqrt{\sum_{i=1,\, i \neq t}^{N} (p_i - q_i)^2}$$
- 2) Select the k most similar genes according to the Euclidean distance to impute the missing expression value.
- 3) Use the Euclidean distances as weights to average the expression values of the k genes.
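The three imputation steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function name `knn_impute` is hypothetical, and the weights are taken to be inversely proportional to the Euclidean distance, which is one common choice that the text does not spell out.

```python
import math

def knn_impute(target, others, t, k):
    """Estimate the missing value target[t] from the k nearest complete genes.

    target : expression values of gene A, with target[t] missing (None)
    others : expression profiles of genes without missing values
    t      : index of the experiment with the missing value
    k      : number of neighbours to use
    """
    # Experiments other than t are the only ones used for the distance.
    idx = [i for i in range(len(target)) if i != t]
    scored = []
    for q in others:
        d = math.sqrt(sum((target[i] - q[i]) ** 2 for i in idx))
        scored.append((d, q[t]))
    scored.sort(key=lambda pair: pair[0])
    nearest = scored[:k]
    # An identical profile (distance 0) supplies the value directly.
    for d, value in nearest:
        if d == 0.0:
            return value
    # Assumption: weight = 1 / distance, so closer genes count more.
    weights = [1.0 / d for d, _ in nearest]
    total = sum(w * v for w, (_, v) in zip(weights, nearest))
    return total / sum(weights)
```

With weights of this form, a neighbour at distance 1 contributes twice as much as a neighbour at distance 2; other weighting schemes would change only the last three lines.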
Gene expressions of the microarray experiment
In particular, we transformed the continuous expression levels into discrete expression levels to determine the under-expression and over-expression of genes. The expression value of a gene is mapped to one of two binary values: positive (+) or negative (-). We set the reference expression value to the average of all gene expression values in the cancer and normal microarray data. If the gene expression value Xij is greater than the reference expression value, we regard it as positive (+); otherwise, we regard it as negative (-). In our experiment, we set -0.06 as the reference value.
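The discretization rule can be written as a one-line threshold. The sketch below assumes the expression values are plain floating-point log ratios; the function name `discretize` is illustrative, and the default reference of -0.06 is the value quoted in the text.

```python
def discretize(values, reference=-0.06):
    """Map continuous expression values to '+' (over) or '-' (under).

    A value strictly greater than the reference is '+'; anything else,
    including a value equal to the reference, is '-'.
    """
    return ["+" if x > reference else "-" for x in values]
```

For example, `discretize([0.5, -0.06, -1.2])` treats only the first value as over-expressed, because a value equal to the reference is not greater than it.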
Constructing initial gene networks by transcription factor analysis
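The independence of a pair of genes i and j is tested with the G2 statistic:
$$G^2 = 2 \sum_{a,b} N_{ab}^{ij} \ln \frac{N_{ab}^{ij}\, M}{N_a^{i}\, N_b^{j}}$$
where
$N_a^{i}$ = the number of times the expression level of gene i = a (and likewise $N_b^{j}$ for gene j = b),
$N_{ab}^{ij}$ = the number of times the expression levels of gene i = a and gene j = b, respectively,
M = the total number of data.
G2 has the chi-square distribution with degrees of freedom f = (r1-1)(r2-1), where r1 and r2 are the numbers of expression levels of the two data spaces.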
Sample data for independency test
The number of degrees of freedom is (2-1)(2-1) = 1, and thus the statistic has a chi-square distribution with 1 degree of freedom. Only two variables are involved in a single independence hypothesis test with the chi-square method, so the significance threshold remains 0.05. The p-value is calculated as P(U > .54) ≅ .47. Because this p-value is larger than 0.05, we cannot reject the hypothesis that gene1 and gene2 are independent. If the p-value were less than 0.05, there would be enough evidence to conclude that gene1 and gene2 are not independent. We treat each pair of genes as an individual independence test and do not apply the Bonferroni correction when reconstructing the networks: because of the large number of genes and gene pairs, the threshold given by the Bonferroni correction would be too small and too conservative. Therefore, by using statistical hypothesis testing of a transcription regulator gene against all other genes in the microarray, we obtain a set of candidate genes that are dependent on the transcription regulator gene.
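As an illustration of the pairwise test, the following sketch computes a G2 statistic from two discretized expression vectors and converts it to a p-value for 1 degree of freedom. It assumes the standard log-likelihood-ratio form of G2; the function names are hypothetical, and the closed form `erfc(sqrt(x/2))` is valid only for the chi-square distribution with 1 degree of freedom.

```python
import math
from collections import Counter

def g2_statistic(x, y):
    """G2 statistic for two equally long sequences of discrete levels."""
    m = len(x)                      # total number of data
    n_a = Counter(x)                # counts of each level of gene 1
    n_b = Counter(y)                # counts of each level of gene 2
    n_ab = Counter(zip(x, y))       # joint counts
    g2 = 0.0
    for (a, b), n in n_ab.items():  # empty cells contribute nothing
        g2 += 2.0 * n * math.log(n * m / (n_a[a] * n_b[b]))
    return g2

def chi2_sf_1df(g2):
    """P(U > g2) for a chi-square variable U with 1 degree of freedom."""
    return math.erfc(math.sqrt(g2 / 2.0))

def is_dependent(x, y, alpha=0.05):
    """Reject independence when the p-value falls below alpha."""
    return chi2_sf_1df(g2_statistic(x, y)) < alpha
```

A pair of genes would be kept as a candidate dependency when `is_dependent` returns `True`; no multiple-testing correction is applied at this stage, matching the description above.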
Bioinformatics tools that have been wrapped
The functional profiles of each tool (surviving table entries: promoter DNA sequence)
Revising and inferring the gene networks using conditional independency
The direction of the link from a transcription factor to its dependent gene represents the causal relationship between the two nodes in the network structure. We consider two nodes that are separated by other nodes and determine whether the relationship between them is direct or indirect. If a pair of genes is connected not only by a direct link but also by other paths, the dependency between the pair may not be due to the direct link. The conditional independence test can be used to verify whether the relationship between the pair of genes is direct or indirect once the d-separation set has been determined [19, 20].
Only a transcription regulator gene can link to its dependent genes, and a transcription regulator gene can be an active node.
If the dependent gene is a collider node in the sub-network, as in type I in Figure 4, the path must be a closed path and will be deleted.
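The conditional independence of X and Y given Z is tested with the statistic
$$G^2 = 2 \sum_{a,b,c} N_{abc}^{XYZ} \ln \frac{N_{abc}^{XYZ}\, N_c^{Z}}{N_{ac}^{XZ}\, N_{bc}^{YZ}}$$
where
$N_{abc}^{XYZ}$ means the number of times the expression level of X = a, the expression level of Y = b and the expression level of Z = c,
$N_{ac}^{XZ}$ means the number of times the expression level of X = a and the expression level of Z = c (and likewise $N_{bc}^{YZ}$ for Y = b and Z = c),
$N_c^{Z}$ means the number of times the expression level of Z = c.
The number of degrees of freedom used in the test is
$$f = (r_X - 1)(r_Y - 1) \prod_{Z_i \in Z} r_i$$
where $r_X$ and $r_Y$ are the numbers of expression levels of X and Y, and $r_i$ is the number of expression levels of the space of each conditioning variable $Z_i$.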
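A conditional version of the test can be sketched as follows. It assumes the usual conditional G2 form, in which the counts are stratified by the joint state of the conditioning genes; the function names are hypothetical. The p-value helper uses a closed form that holds only for an even number of degrees of freedom, which is the case for binary expression levels with at least one conditioning gene.

```python
import math
from collections import Counter

def conditional_g2(x, y, z):
    """G2 statistic for X and Y given Z.

    x, y : sequences of discrete levels
    z    : sequence of hashable states of the conditioning set
           (a single level, or a tuple of levels for several genes)
    """
    n_abc = Counter(zip(x, y, z))
    n_ac = Counter(zip(x, z))
    n_bc = Counter(zip(y, z))
    n_c = Counter(z)
    g2 = 0.0
    for (a, b, c), n in n_abc.items():
        g2 += 2.0 * n * math.log(n * n_c[c] / (n_ac[(a, c)] * n_bc[(b, c)]))
    return g2

def degrees_of_freedom(r_x, r_y, r_z):
    """(r_x - 1)(r_y - 1) times the product of the levels of each Z_i."""
    f = (r_x - 1) * (r_y - 1)
    for r in r_z:
        f *= r
    return f

def chi2_sf_even(g2, df):
    """P(U > g2) for a chi-square variable with an even df >= 2."""
    half = g2 / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total
```

For binary levels and one conditioning gene, `degrees_of_freedom(2, 2, [2])` gives 2, so the p-value reduces to `exp(-G2 / 2)`.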
With the traditional constraint-based method, there is no way to avoid an exponential number of CI tests over every pair of nodes to decide whether each edge should be kept or removed. After we find the minimum d-separating sets, we determine whether an edge between two nodes is needed by repeated tests of conditional independence given the minimum d-separating sets. Because the minimum d-separating sets are small, we can apply the Bonferroni correction for multiple testing, adjusting the significance threshold of each of the n individual tests to maintain the experiment-wise error rate. Compared with the whole set of nodes in the network, only a small set of nodes needs to be tested in the conditional independence tests with Bonferroni correction. As an example, in Figure 6 we want to verify whether the direct link between nodes X and Y in the sub-network should be deleted. We use the procedure in Figure 5 to extract the minimum d-separating genes, nodes T and U, to help us determine whether the edge between the two nodes should be removed. Because the set of d-separating genes is small, we test each of the two predicted conditional independence relations, CI(X, Y|T) and CI(X, Y|U), and reject the null hypothesis of independence in each test when the p-value is less than 0.05/2 = 0.025. For different numbers of d-separating genes, we obtain different significance thresholds accordingly.
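The Bonferroni-adjusted decision in the X, Y example can be sketched as below. The decision rule is an assumption made for illustration: the edge is kept as a direct link only when independence is rejected in every CI test, and it is removed otherwise. The function names and the p-values in the usage note are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold for n_tests comparisons."""
    return alpha / n_tests

def edge_is_direct(p_values, alpha=0.05):
    """Decide whether an edge survives the CI tests.

    p_values : one p-value per d-separating gene, e.g. for
               CI(X, Y | T) and CI(X, Y | U)
    Assumption: the edge is kept only if every test rejects
    conditional independence at the Bonferroni-adjusted level.
    """
    threshold = bonferroni_threshold(alpha, len(p_values))
    return all(p < threshold for p in p_values)
```

With two d-separating genes the threshold is 0.025, so hypothetical p-values of 0.01 and 0.02 would keep the edge, while 0.01 and 0.03 would remove it.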
X activates Y: If the expression level of X is over-expressed (+), then Y is also over-expressed; if the expression level of X is under-expressed (-), then Y is also under-expressed.
X inhibits Y: If the expression level of X is over-expressed (+), then Y is under-expressed; if the expression level of X is under-expressed (-), then Y is over-expressed.
However, genes may have inconsistent values across similar samples because of changes in the environment and experimental error. The relation between two genes is not always the same in different experiments under the same conditions. To determine the relation between two genes with the support of a large amount of microarray data, we apply a heuristic rule and choose the relation that occurs most often between the pair of genes across the experiments. For example, Table 2 shows the binary expression levels of genes A and B from 8 microarray experiments. The number of activation events is 3 and the number of inhibition events is 5. Because the number of inhibition events is higher than the number of activation events, the system identifies the link between genes A and B as inhibition. However, it may be hard to determine the relation if the numbers of activation and inhibition events are equal. We assume that a pair of genes is expected to have the same relation under the same condition in the microarray data. In our microarray data, 66% of the genes have above 80% consistent expression and 99.4% of the genes have above 50% consistent expression across similar samples; more genes with consistent expression help us identify the relations between pairs of genes correctly.
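The majority rule for labelling a link can be sketched as follows. The function name and the example vectors are hypothetical (the values of Table 2 are not reproduced here); the vectors are chosen only so that they contain 3 activation events and 5 inhibition events, as in the example above. Returning "undetermined" for a tie is an assumption, since the text only notes that ties are hard to resolve.

```python
def infer_relation(expr_x, expr_y):
    """Label the link X -> Y by counting agreeing and opposing levels.

    expr_x, expr_y : equally long sequences of '+' / '-' levels,
                     one entry per microarray experiment
    """
    # Same sign in an experiment counts as an activation event,
    # opposite sign as an inhibition event.
    activation = sum(1 for a, b in zip(expr_x, expr_y) if a == b)
    inhibition = len(expr_x) - activation
    if activation > inhibition:
        return "activation"
    if inhibition > activation:
        return "inhibition"
    return "undetermined"  # assumption: ties are left unlabelled

# Hypothetical levels for 8 experiments: 3 agreements, 5 disagreements.
gene_a = ["+", "+", "-", "-", "+", "-", "+", "-"]
gene_b = ["+", "-", "+", "-", "-", "+", "+", "+"]
relation = infer_relation(gene_a, gene_b)
```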
Average degree:
$$\langle k \rangle = \frac{2l}{N}$$
Where l = the total number of links in the network
N = the total number of vertices in the network
Degree distribution:
$$P(k) = \frac{N(k)}{N}$$
Where N(k) = the number of nodes which have k links
N = the total number of vertices in the network
In directed graphs, the in-degree, k_in, is the number of incoming edges of the vertex and the out-degree, k_out, is the number of outgoing edges of the vertex.
Clustering coefficient:
$$C = \frac{2n}{k(k-1)}$$
Where n = the number of triangles that go through the vertex with k links.
k = the number of nearest neighbours of the vertex
Average clustering coefficient:
$$\langle C \rangle = \frac{1}{N} \sum_{i=1}^{N} C_i$$
Where the clustering coefficients of all N nodes are averaged over index i.
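The topology measures defined above can be computed directly from an edge list. The sketch below is an illustration under stated assumptions: the network is treated as undirected, the average degree is taken as 2l/N, vertices with fewer than two neighbours are given a clustering coefficient of 0, and the function name `topology` is hypothetical.

```python
from collections import Counter

def topology(edges):
    """Return (average degree, P(k), per-node C, average C)."""
    # Build undirected adjacency sets, ignoring self-loops.
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    n_nodes = len(adj)
    n_links = sum(len(nb) for nb in adj.values()) // 2
    avg_degree = 2 * n_links / n_nodes
    # Degree distribution P(k) = N(k) / N.
    counts = Counter(len(nb) for nb in adj.values())
    p_k = {k: c / n_nodes for k, c in counts.items()}
    # Clustering coefficient C = 2n / (k (k - 1)) for each vertex.
    clustering = {}
    for node, nb in adj.items():
        k = len(nb)
        if k < 2:
            clustering[node] = 0.0
            continue
        nb_list = sorted(nb)
        triangles = sum(
            1
            for i, a in enumerate(nb_list)
            for b in nb_list[i + 1:]
            if b in adj[a]
        )
        clustering[node] = 2 * triangles / (k * (k - 1))
    avg_clustering = sum(clustering.values()) / n_nodes
    return avg_degree, p_k, clustering, avg_clustering
```

For a directed regulatory network, the same adjacency construction could be split into separate in-degree and out-degree counts; the undirected form is used here only to keep the sketch short.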
We applied our methods to analyze the microarray dataset from "Gene expression profiling identifies clinically relevant subtypes of prostate cancer". It consists of 62 primary tumors and 41 normal prostate tissues. The detailed pathological and clinical data are provided in . We extracted the ratio value Log2 [R/G Normalized Ratio (Median)] of each gene by using the normalization function provided by the Stanford Microarray Database (SMD).
Microarray data pre-processing
The number of genes at different steps of processing
Biological knowledge processing
We map the genes remaining after microarray data pre-processing to the 2665 genes that belong to the "transcription regulator activity" category specified in GO. Each gene that matches this category is regarded as a transcription regulator gene. For example, gene SRF in the normal dataset is treated as a transcription regulator gene, and 494 genes are first found to be dependent on SRF by the statistical test. The transcription factor analysis then filters out links that possibly lack biological significance, finally leaving 13 dependent genes that can be considered to "effectively" interact with SRF. Since the biological toolkits and databases are not complete, they tend to miss transcription factors that have not yet been found; this may leave the inferred interaction networks incomplete and thus reduce the recall of the inference method. Nevertheless, the gene interaction networks that are found are at least supported by current biological knowledge of transcription factors to a reasonable extent.
Revising gene regulatory networks based on Bayesian network
Number of links between two networks
Initial gene regulatory network (Before CI test)
Revised gene regulatory network (After CI test without Bonferroni correction)
Revised gene regulatory network (After CI test with Bonferroni correction)
The p-value and d-separation set of SRF gene and its Transcription factor dependent gene
Dependent gene of SRF
Complex diseases depend on altered interactions among multiple genes and on expression changes in critical genes compared with normal cells. We take two points of view on the differences between the normal and cancer networks: one global and one detailed. The global view comprises the network topology approach described above and the overall function and pathway enrichment using the DAVID and GSEA toolkits. The detailed view can reveal new and interesting genes involved in specific network motifs that may relate to cancer, although such findings are often quite subjective.
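$$P(k) \sim k^{-s}$$
where s = the scaling exponent of the degree distribution of the network.
$$C(k) \sim k^{-w}$$
where w = the scaling exponent of the clustering coefficient of the network.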
The parameters of the degree distribution and clustering coefficient (scaling exponents s and w) for the normal and cancer networks
Comparison of gene regulatory networks between cancer and normal data
In the comparison of the cancer and normal networks, the transcription regulator genes and their dependent target genes are those that pass the significance threshold in the statistical hypothesis testing and the promoter analysis. The transcription factors identified as biomarkers (PBX1, EP300, STAT6, SREBF1, NFKB1, STAT3, EGR1, E2F3, NR2F2; see Additional file 3) are involved only in the cancer network, and those genes are annotated as cancer-related transcription regulatory factors (p-value 1.18E-9). In contrast, E2F4 exists only in the normal network. The regulation of the transcription regulator gene E2F4 plays a key role in the control of normal development and proliferation. There are 561 extra dependent genes in the normal network and 3495 extra dependent genes in the cancer network, and 2,283 genes interact with the biomarkers. SREBF1 has been shown to be up-regulated in prostate cancer, and early growth response 1 (EGR-1) is a transcription factor that regulates the expression of its dependent genes involved in cell growth or survival. We took the 2,283 dependent genes that are affected by the biomarkers (PBX1, EP300, STAT6, SREBF1, NFKB1, STAT3, EGR1, E2F3, NR2F2) and do not exist in the normal network and performed functional annotation with the DAVID online toolkit; 2,110 of these genes can be annotated in DAVID. We filtered the results to keep functional categories with at least 3 members, P-value < 0.05 with Bonferroni correction and FDR < 0.25 (see Additional file 4). The functional annotation clustering results show that the cancer network is associated with biological functions and processes including regulation of progression through cell differentiation, cell death, the I-kappaB kinase/NF-kappaB cascade, vesicle-mediated transport and apoptosis. We also performed pathway enrichment with the GSEA online tool, which is calculated by the hypergeometric distribution method; 2,259 genes can be annotated in the GSEA online tool.
We filtered the enriched canonical pathways from the gene set in our networks, keeping those with at least 3 members in each functional category and P-value < 0.05. The results indicate cell adhesion molecules, androgen and estrogen metabolism, smooth muscle contraction and some GO-annotated pathways (see Additional file 5). The genes in the cancer network are involved in significant pathways such as the Toll-like receptor, PPAR, ERBB, P53 and WNT signaling pathways (see Additional file 6). In summary, we have identified the androgen-related gene TMPRSS2, which is regulated by SREBF1, PBX1 and ETS family members that are associated with prostate cancer; TMPRSS2 has been found in 80% of tumor experiments.
For a more general evaluation, we used the Atlas of Genetics and Cytogenetics in Oncology and Haematology database, which collects genes in cancers and divides them into two groups: annotated "Match" genes and genes possibly implicated in cancer. There are 37.5% "Match" genes and 62.5% "Possible" genes in our predicted results (see Additional file 12). This indicates that our method is useful for detecting genes possibly implicated in cancer and that the gene regulatory networks constructed by our method appear to be modelled effectively. Although verifying the modelling process through literature reports is an indirect way to evaluate gene regulatory networks, it at least shows that the gene regulatory networks modelled by our method are compatible with existing findings in the literature. For more detailed verification based on literature reports, see Additional file 13.
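The hypergeometric calculation behind this kind of pathway enrichment can be sketched as an upper-tail probability. This is a generic illustration and not the code of the GSEA online tool; the function name, the parameter names and the numbers used in the usage note are all hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(population, annotated, selected, overlap):
    """P(at least `overlap` annotated genes among `selected` draws).

    population : number of genes in the background
    annotated  : number of background genes in the pathway
    selected   : number of genes in the list being tested
    overlap    : number of selected genes that are in the pathway
    """
    upper = min(annotated, selected)
    total = comb(population, selected)
    tail = sum(
        comb(annotated, i) * comb(population - annotated, selected - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total

def passes_filter(p_value, members, min_members=3, alpha=0.05):
    """Filter described in the text: at least 3 members and P < 0.05."""
    return members >= min_members and p_value < alpha
```

With a hypothetical background of 10 genes, 5 of which are in a pathway, selecting 5 genes that all fall in the pathway gives a p-value of 1/252; such a category would pass the filter because it has at least 3 members and a p-value below 0.05.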
Comparison of gene regulatory networks with different clinical data
Besides comparing the normal and cancer networks, we also identified the significant networks of tumor differentiation at different grades and stages. Using the provided clinical data, we divided the Gleason grades of the 62 primary prostate tumors into two classes, low grade and high grade (≦3+4 vs. ≧4+3), with 39 samples in the low grade class and 23 samples in the high grade class. We detected genes SREBF1, STAT6 and PBX1 that exist only in the high grade class and SP1, Elk1, JUN and EGR2 that exist only in the low grade class. The expression of the ETS transcription factor Elk1 decreases from low grade to high grade samples. We predict the differential expression of several transcription regulator genes (including HSF2, ARNT, MEF2A, ATF2 and YY1) that are strongly related to the cancer grade.
In the other experiment, we divided the pathological stage into two parts: 28 samples belong to the early stage (≦T2) and the other 34 samples belong to the late stage (≧T3). GATA3 regulation occurs in the early stage, whereas the outcome of prostate cancer at the late stage of tumor development is related to genes MZF1, SREBF1, PRL and DDIT3. The SP1, MAX, RUNX1 and STAT3 sub-networks are involved in both the early and late stages of the tumor but show different gene expression. Enhancer of Zeste Homolog 2 (EZH2) expression is regulated by RUNX1, STAT3 and E2F3; high expression of EZH2 is associated with tumor death and is also correlated with the pathological stage. For example, gene SLC22A3, regulated by our predicted significant markers (EP300, STAT6, SRF, PBX1), is strongly related to high grade and late stage. This finding is also reported in the literature associated with tumor progression.
We provide a computational framework to reconstruct the genetic regulatory network from microarray data using biological knowledge and constraint-based inference. The method is helpful in verifying possible gene interaction relations in gene networks and in filtering out incorrect relations inferred by imperfect methods. We not only predicted individual genes related to cancer but also discovered significant gene regulatory networks, and the predicted results are validated by published journal articles and experimental results. However, several problems remain to be solved. Since the biological toolkits and databases are not complete, they tend to miss transcription factors that have not yet been found. For example, the PSSMs from the TFSEARCH database are not complete enough to detect all necessary transcription factors and binding sites. This can reduce the recall of the inference method, which then misses genes in the interaction networks. In future work, we could use different cancer microarray datasets to test our method and further integrate protein-protein interaction information to construct more complete gene and protein networks. Biologists armed with information on the discovered upstream and downstream interaction mechanisms of genes and proteins could then understand more clearly the reaction pathways by which organisms respond to various diseases. We also want to explore the network variation underlying different conditions and develop a network-based method to classify clinical heterogeneity.
This research is partially supported by the Bioresources Collection and Research Center of Linko Chung Gang Hospital and National Tsing Hua University of Taiwan R. O. C. under the grant number CGTH96-T13 (CGMH-NTHU Joint Research No.13).
- Parkin DM, Bray FI, Devesa SS: Cancer burden in the year 2000. The global picture. European Journal of Cancer. 2001, 37 (Supplement 8): 4-66. 10.1016/S0959-8049(01)00267-2.
- Lapointe J, Li C, Higgins JP, Rijn van de M, Bair E, Montgomery K, Ferrari M, Egevad L, Rayford W, Bergerheim U, Ekman P, DeMarzo AM, Tibshirani R, Botstein D, Brown PO, Brooks JD, Pollack JR: Gene expression profiling identifies clinically relevant subtypes of prostate cancer. Proceedings of the National Academy of Sciences of the United States of America. 2004, 811-816. 10.1073/pnas.0304146101.
- Akutsu T, Miyano S, Kuhara S: Algorithms for identifying Boolean networks and related biological networks based on matrix multiplication and fingerprint function. Proceedings of the fourth annual international conference on Computational molecular biology, New York, NY, USA. 2000, 8-14.
- de Hoon MJL, Imoto S, Miyano S: Inferring Gene Regulatory Networks from Time-Ordered Gene Expression Data Using Differential Equations. Proceedings of the 5th International Conference on Discovery Science. 2002, London, UK: Springer-Verlag, 267-274.
- Friedman N, Linial M, Nachman I, Pe'er D: Using Bayesian networks to analyze expression data. Journal of Computational Biology. 2000, 7 (3): 601-620. 10.1089/106652700750050961.
- Mayo M: Learning Petri net models of non-linear gene interactions. Biosystems. 2005, 82: 74-82. 10.1016/j.biosystems.2005.06.002.
- Cheng J, Bell D, Liu W: Learning bayesian networks from data: An efficient approach based on information theory. Technical Report. 1998, University of Alberta
- Segal E, Barash Y, Simon I, Friedman N, Koller D: From promoter sequence to expression: A probabilistic framework. Proceedings of Sixth Annual International Conference on Computational Molecular Biology. 2002, 263-272.
- Haverty PM: Computational inference of transcriptional regulatory networks from expression profiling and transcription factor binding site identification. Nucleic Acids Research. 2004, 32: 179-188. 10.1093/nar/gkh183.
- Wei H, Kaznessis Y: Inferring gene regulatory relationships by combining target-target pattern recognition and regulator-specific motif examination. Biotechnology and Bioengineering. 2005, 1: 53-77. 10.1002/bit.20305.
- Tamada Y, Kim S, Bannai H, Imoto S, Tashiro K, Kuhara S, Miyano S: Estimating gene networks from gene expression data by combining Bayesian network model with promoter element detection. Bioinformatics. 2003, 19: 227-236. 10.1093/bioinformatics/btg1082.
- Troyanskaya O, Cantor M, Sherlock G, Brown P, Hastie T, Tibshirani R, Botstein D, Altman RB: Missing value estimation methods for DNA microarrays. Bioinformatics. 2001, 17 (6): 520-525. 10.1093/bioinformatics/17.6.520.
- Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G: Gene Ontology: Tool for the unification of biology. The Gene Ontology Consortium. Nature Genetics. 2000, 25 (1): 25-29. 10.1038/75556.
- Neapolitan RE: Learning Bayesian Networks. 2003, Prentice Hall
- Bairoch A: The ENZYME database in 2000. Nucl Acids Res. 2000, 28: 304-305. 10.1093/nar/28.1.304.
- Curwen V, Eyras E, Andrews TD, Clarke L, Mongin E, Searle SM, Clamp M: The Ensembl Automatic Gene Annotation System. Genome Research. 2004, 14 (5): 942-950. 10.1101/gr.1858004.
- TFSEARCH. [http://www.cbrc.jp/research/db/TFSEARCH.html]
- Lopez-Serra L, Esteller M: Proteins that bind methylated DNA and human cancer: reading the wrong words. Br J Cancer. 2008, 98 (12): 1881-1885. 10.1038/sj.bjc.6604374.
- David JC, Bell DA, Liu W: An Algorithm for Bayesian Belief Network Construction from Data. Proceedings of AI & STAT. 1997, 83-90.
- Acid S, Campos LMD: An Algorithm for Finding Minimum d-Separating Sets in Belief Networks. Proceedings of the twelfth Conference of Uncertainty in Artificial Intelligence. 1996, 3-10.
- Barabasi AL, Oltvai ZN: Network biology: Understanding the cell's functional organization. Nature Reviews Genetics. 2004, 5 (2): 101-13. 10.1038/nrg1272.
- Shen-Orr SS, Milo R, Mangan S, Alon U: Network motifs in the transcriptional regulation network of Escherichia coli. Nat Genet. 2002, 31 (1): 64-68. 10.1038/ng881.
- Sherlock G, Hernandez-Boussard T, Kasarskis A, Binkley G, Matese JC, Dwight SS, Kaloper M, Weng S, Jin H, Ball CA, Eisen MB, Spellman PT, Brown PO, Botstein D, Cherry JM: The stanford microarray database. Nucleic Acids Research. 2001, 29: 152-155. 10.1093/nar/29.1.152.
- Huang DW, Sherman B, Lempicki R: Systematic and integrative analysis of large gene lists using DAVID Bioinformatics Resources. Nat Protoc. 2009, 4 (1): 44-57. 10.1038/nprot.2008.211.
- Subramanian A, Tamayo Po, Mootha V, Mukherjee S, Ebert Bn, Gillette M, Paulovich A, Pomeroy S, Golub T, Lander E, Mesirov J: Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. PNAS. 2005, 102 (43): 15545-15550. 10.1073/pnas.0506580102.
- Potapov A, Voss N, Sasse N, Wingender E: Topology of Mammalian Transcription Networks. Genome Informatics. 2005, 16 (2): 270-278.
- Solé RV, Ferrer-Cancho R, Montoya JM, Valverde S: Selection, tinkering, and emergence in complex networks. Complex. 2002, 8: 20-33. 10.1002/cplx.10055.
- Wagner A, Wright J: Alternative routes and mutational robustness in complex regulatory networks. Biosystems. 2007, 88 (1-2): 163-172. 10.1016/j.biosystems.2006.06.002.
- Dijkstra EW: A note on two problems in connexion with graphs. Numerische Mathematik. 1959, 1: 269-271. 10.1007/BF01386390.
- Benson M, Breitling R: Network Theory to Understand Microarray Studies of Complex Diseases. Current Molecular Medicine. 2006, 6 (6): 695-701. 10.2174/156652406778195044.
- Kanehisa M, Goto S, Kawashima S, Okuno Y, Hattori M: The KEGG resource for deciphering the genome. Nucleic Acids Res. 2004, 1: 277-80. 10.1093/nar/gkh063.
- Humbert P, Rogers O, Ganiatsas C, Landsberg S, LTrimarchi R, Dandapani JM, Brugnara S, Erdman C, Schrenzel S, M Bronson RT: E2F4 is essential for normal erythrocyte maturation and neonatal viability. Molecular cell. 2000, 6 (2): 281-291. 10.1016/S1097-2765(00)00029-0.
- Tomlins SA, Rhodes DR, Perner S, Dhanasekaran SM, Mehra R, Sun XW, Varambally S, Cao X, Tchinda J, Kuefer R, Lee C, Montie JE, Shah RB, Pienta KJ, Rubin MA, Chinnaiyan AM: Recurrent Fusion of TMPRSS2 and ETS Transcription Factor Genes in Prostate Cancer. Science. 2005, 310 (5748): 644-648. 10.1126/science.1117679.
- Gordon S, Akopyan G, Garban H, Bonavida B: Transcription factor YY1: structure, function, and therapeutic implications in cancer biology. Oncogene. 2005, 25 (8): 1125-1142. 10.1038/sj.onc.1209080.
- Atlas of Genetics and Cytogenetics in Oncology and Haematology database. [http://atlasgeneticsoncology.org/Genes/Geneliste.html]
- Tomlins SA, Mehra R, Rhodes DR, Cao X, Wang L, Dhanasekaran SM, Kalyana-Sundaram S, Wei JT, Rubin MA, Pienta KJ, Shah RB, Chinnaiyan AM: Integrative molecular concept modeling of prostate cancer progression. Nat Genet. 2007, 39 (1): 41-51. 10.1038/ng1935.
- The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1755-8794/2/70/prepub
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.