Fig. 1 | BMC Medical Genomics

Fig. 1

From: Considerations for feature selection using gene pairs and applications in large-scale dataset integration, novel oncogene discovery, and interpretable cancer screening

Strategies for using gene pairs to identify salient oncogenes and construct cancer screening tools. a Traditionally, differential gene expression is measured by comparing group means of individual genes. Gene pair analysis instead uses within-patient pairwise comparisons to obtain relative gene expression levels before making group comparisons. Thus, gene pairs depend only on gene rankings, not on actual expression values. b This ranking-based methodology allows data to be integrated across platforms. Next-generation sequencing (NGS), microarray, and qPCR all report gene expression in different units, but all three forms of data can be adapted to the gene pair framework. First, genes are ranked within each sample, enabling datasets to be combined. Next, pairwise comparisons are made exhaustively and feature selection is performed using filtering methods. The selected gene pairs can then be used for classification with ensemble methods, and the top features can be examined for their role in carcinogenesis. Using gene pairs thus facilitates dataset integration, which increases sample size and statistical power for robust oncogene detection. c Another application of gene pairs is in transparent and interpretable clinical screening tools: circulating miRNA pairs can be used for cancer screening. First, miRNAs in blood samples are quantified and ordered by expression level within each patient. Pairwise comparisons are made, feature selection is performed, and ensemble classifiers are built. To create more interpretable models, important rules are extracted and simplified tree ensemble learners (STEL) are constructed. This test is highly practical in a clinical setting because it is noninvasive and because within-patient comparisons are invariant to measurement platform and do not require specific cutoffs or standard values.
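The within-sample comparison step described in panels a and b can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `pair_features` and `pair_scores`, the toy expression values, and the simple proportion-difference filter are all assumptions chosen for illustration, since the caption does not specify which filtering method is used. The sketch shows that binary gene pair features are unchanged by a monotone rescaling of each sample, which is what allows data in different units to be combined.

```python
from itertools import combinations

import numpy as np


def pair_features(expr):
    """Binary gene pair features for a (samples x genes) expression array.

    Feature k is 1 when gene i is expressed above gene j within the same
    sample, for the k-th pair (i, j) with i < j. Hypothetical helper.
    """
    n_genes = expr.shape[1]
    pairs = list(combinations(range(n_genes), 2))  # exhaustive pairing
    i_idx = np.array([i for i, _ in pairs])
    j_idx = np.array([j for _, j in pairs])
    feats = (expr[:, i_idx] > expr[:, j_idx]).astype(int)
    return pairs, feats


def pair_scores(feats, labels):
    """Illustrative filter score (an assumption, not the paper's method):
    absolute difference between groups in the fraction of samples where
    the pair feature equals 1."""
    labels = np.asarray(labels, dtype=bool)
    return np.abs(feats[labels].mean(axis=0) - feats[~labels].mean(axis=0))


# Toy data (made up): two samples, three genes, in count-like units.
counts = np.array([[10.0, 200.0, 35.0],
                   [50.0, 20.0, 400.0]])
# A monotone transform stands in for a platform with different units.
log_scale = np.log2(counts + 1.0)

pairs, f_counts = pair_features(counts)
_, f_log = pair_features(log_scale)

# Within-sample orderings are preserved, so the features are identical.
print(np.array_equal(f_counts, f_log))

# Score each pair given hypothetical labels (0 = control, 1 = case).
print(pair_scores(f_counts, [0, 1]))
```

Because each feature depends only on the ordering of two genes within one sample, no cross-platform normalization is applied before the datasets are stacked. Note that exhaustive pairing grows quadratically with the number of genes, so a filtering step of some kind is needed before classification.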
