Fig. 2 | BMC Medical Genomics

Fig. 2

From: Comprehensive discovery of subsample gene expression components by information explanation: therapeutic implications in cancer

Correlation Explanation (CorEx) algorithm applied to tumor RNA-seq. For training, CorEx is given only a matrix of normalized gene expression values for the available tumor samples. The number of possible labels for each latent factor is specified in advance; here it is set to three. In this application, we also set the number of layer-one latent factors to 200. CorEx finds probabilistic assignments of genes to latent factors by maximizing the total correlation of the genes within each group while simultaneously minimizing dependence between latent factor groups. The factor labels from each layer are used as input to the layer above, so the output from CorEx is a hierarchical model. This model is specified as a set of probabilities characterizing the association of genes or factors with latent factors at the next-higher layer, together with probabilities that a given tumor sample's expression pattern is explained by each factor in a particular label state. The three label probabilities for a given tumor sample and factor can be usefully summarized by a single value: the natural logarithm of the probability difference for the factor labels corresponding to extremal expression. Genes with high mutual information relative to a latent factor show clear patterns of correlation when viewed on expression heat maps with tumor samples ordered by this summary latent factor score.
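The two information-theoretic quantities named in the caption, total correlation (the objective CorEx maximizes within a gene group) and mutual information (used to pick the genes shown in the heat maps), can be illustrated with the minimal sketch below. It is not the CorEx algorithm itself, which works with probabilistic assignments learned from continuous normalized expression; it assumes expression has already been discretized into a few levels per gene, uses simple plug-in (maximum-likelihood) estimators, and all function names (`entropy`, `total_correlation`, `mutual_information`, `rank_genes_by_mi`) are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, List, Mapping, Sequence, Tuple


def entropy(samples: Sequence[Hashable]) -> float:
    """Plug-in Shannon entropy, in nats, of one discrete variable."""
    n = len(samples)
    counts = Counter(samples)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def total_correlation(columns: Sequence[Sequence[Hashable]]) -> float:
    """TC(X_1..X_k) = sum_i H(X_i) - H(X_1, ..., X_k).

    Each element of `columns` holds one variable's values over the same samples
    (e.g. one gene's discretized expression across tumors; an assumption here).
    """
    joint = list(zip(*columns))  # one tuple of values per sample
    return sum(entropy(col) for col in columns) - entropy(joint)


def mutual_information(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """I(X; Y) = H(X) + H(Y) - H(X, Y), i.e. total correlation of two variables."""
    return total_correlation([x, y])


def rank_genes_by_mi(
    genes: Mapping[str, Sequence[Hashable]], labels: Sequence[Hashable]
) -> List[Tuple[str, float]]:
    """Sort genes by mutual information with one factor's labels, highest first."""
    scores = {name: mutual_information(vals, labels) for name, vals in genes.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

For example, with a hypothetical three-label factor over nine tumors, a gene that tracks the factor exactly scores ln 3 (about 1.0986 nats), while a gene whose levels are spread evenly across every factor label scores essentially zero:

```python
labels = [0, 0, 0, 1, 1, 1, 2, 2, 2]   # hypothetical factor label per tumor
gene_a = [0, 0, 0, 1, 1, 1, 2, 2, 2]   # follows the factor
gene_c = [0, 1, 2, 0, 1, 2, 0, 1, 2]   # independent of the factor
print(rank_genes_by_mi({"gene_c": gene_c, "gene_a": gene_a}, labels))
```

Plug-in estimators are biased upward on small samples, so real analyses with many genes and few tumors need more careful estimation than this sketch provides.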
