Gene expression profiling in whole blood identifies distinct biological pathways associated with obesity
© Ghosh et al. 2010
Received: 18 June 2010
Accepted: 1 December 2010
Published: 1 December 2010
Obesity is reaching epidemic proportions and represents a significant risk factor for cardiovascular disease, diabetes, and cancer.
To explore the relationship between increased body mass and gene expression in blood, we conducted whole-genome expression profiling of whole blood from seventeen obese and seventeen well matched lean subjects. Gene expression data was analyzed at the individual gene and pathway level and a preliminary assessment of the predictive value of blood gene expression profiles in obesity was carried out.
Principal components analysis of whole-blood gene expression data from obese and lean subjects led to efficient separation of the two cohorts. Pathway analysis by gene-set enrichment demonstrated increased transcript levels for genes belonging to the "ribosome", "apoptosis" and "oxidative phosphorylation" pathways in the obese cohort, consistent with an altered metabolic state including increased protein synthesis, enhanced cell death from proinflammatory or lipotoxic stimuli, and increased energy demands. A subset of pathway-specific genes acted as efficient predictors of obese or lean class membership when used in Naive Bayes or logistic regression based classifiers.
This study provides a comprehensive characterization of the whole blood transcriptome in obesity and demonstrates that the investigation of gene expression profiles from whole blood can inform and illustrate the biological processes related to regulation of body mass. Additionally, the ability of pathway-related gene expression to predict class membership suggests the feasibility of a similar approach for identifying clinically useful blood-based predictors of weight loss success following dietary or surgical interventions.
While excess energy intake and declining energy expenditure are clearly important contributors, individual susceptibility to obesity is also strongly influenced by genetic factors. Twin, adoption, and family studies have indicated that 40-70% of inter-individual variation in body mass index (BMI) is heritable [1, 2]. A compendium of evidence for the genetic bases of obesity has been accrued from single-gene mutation studies, Mendelian inheritance patterns, transgenic and knockout murine models, animal and human quantitative trait loci (QTL), candidate-gene association studies, and genome scan linkages, and has been incorporated into the Obesity Gene Map database. More recently, a number of genome-wide association studies (GWAS) have demonstrated associations of single-nucleotide polymorphisms (SNPs) with qualitative and quantitative indices of adiposity in several populations [2, 4–10]. A combination of independent studies and meta-analysis of existing GWAS data has implicated a total of 18 genetic loci as relevant for body weight regulation to date.
In addition to DNA sequence variants, genetic influences are also manifested through differences in gene transcription, leading to differential messenger RNA levels. While such differences might be expected to occur in biologically relevant tissues (muscle and adipose tissue in obesity, for example), several recent studies have demonstrated an alteration in the peripheral blood transcriptome in diseases of non-hematologic origin. These include disorders such as chronic fatigue syndrome, schizophrenia and colon cancer [12–17]. Additionally, the blood transcriptome has also been found to be responsive to diverse environmental and socio-economic stimuli including ionizing radiation in cancer therapy, benzene exposure, socio-economic status, etc. [18–21]. These findings raise the intriguing possibility that blood transcriptome profiles might provide a valid biological readout for otherwise hard to study disease processes in humans and additionally generate information of high predictive and diagnostic content. In line with this argument, we postulated that differences in transcript abundance might also occur in blood from obese subjects compared to lean subjects, as a consequence of either pre-existing genetic variations, or as an adaptive response to obesity, independent of the genetic background. To test this hypothesis, we have carried out transcriptional profiling of peripheral blood from obese subjects and well-matched lean controls and conducted enrichment analysis to identify biological pathways that are preferentially associated with obesity. Our study demonstrates significant gene expression differences in blood from obese subjects compared to lean controls, particularly along the lines of differential expression of genes in key metabolic pathways regulating cell survival, protein synthesis and energy harvest. These findings are important on three levels. 
First, our results demonstrate the importance of blood as a biologically informative tissue in the elucidation of the obese state. Second, as differences in gene expression are often driven by sequence variants in gene regulatory regions, our study provides a mechanism for the selection of obesity-associated candidate genes for the determination of possible regulatory sequence variants. Finally, the identification of adiposity related gene expression differences in a clinically accessible tissue such as blood leads the way for the determination of biomarkers of weight regulation that could be implemented in a clinical setting.
Demographic and phenotypic characteristics of the study population. Variables reported: BMI at baseline (kg/m²); diastolic and systolic blood pressure (mm Hg); waist circumference (cm); body fat (%); fat-free mass (kg); fat mass (kg); fasting glucose (mmol/L); thyroid stimulating hormone (mU/L).
Genes showing differential expression between the obese and lean subjects were identified via the Comparative Marker Selection module in GenePattern, using the signal-to-noise algorithm for ranking genes. Permutation testing was performed to compute the significance (nominal p-value) of the rank assigned to each gene. A false discovery rate (FDR) was also calculated to control for multiple testing. A total of 12127 probesets were detected above background (set to 50 units), among which 374 probesets were overexpressed (2-fold or greater) and 75 probesets were underexpressed (2-fold or greater) in the obese samples compared to the lean samples. The results of the differential gene analysis are presented in Additional Files 3 and 4. Inspection of the gene list showed that a majority of the genes upregulated in the obese subjects were genes known to be selectively expressed in erythrocytes/reticulocytes. These included genes such as carbonic anhydrase, ferrochelatase, synuclein, glycophorin B, etc. This finding is consistent with previous observations of higher red blood cell counts (hematocrit) in obesity [23–26] and provides evidence for the expansion of transcriptionally active reticulocytes in obesity. Conversely, several genes related to immune function showed reduced expression in the obese subjects.
The transcriptome data was next subjected to bioinformatic pathway analysis by the Gene Set Enrichment Analysis (GSEA) algorithm. The values for the GSEA algorithmic parameters used in the current study are indicated in Additional File 5, and details about the GSEA algorithm are explained in Materials and Methods. Pathway analysis was conducted either with the Kyoto Encyclopedia of Genes and Genomes (KEGG) metabolic pathway database, or with a user-created custom database consisting of pathways drawn from several sources (Additional File 6). Pathways were evaluated by their normalized enrichment score (NES), nominal p-values (permuted) and false discovery rates, as described in .
In addition to investigating pathway enrichment based on the KEGG database, we also subjected a set of 'custom' pathways to analysis by GSEA (Additional File 6). Analysis of the custom pathways identified 2 pathways as significantly upregulated in the obese subjects, at a nominal p-value < 5% and FDR < 5%. These were the 'electron transport chain pathway' and the 'erythrocyte/reticulocytespecific_affytechnote' pathways (Additional File 9). The 'electron transport chain pathway' (National Cancer Institute Pathway Interaction Database) is a subset of the KEGG 'oxidative phosphorylation' pathway. The 'erythrocyte/reticulocytespecific_affytechnote' pathway consists of genes reported to be selectively enriched for expression in erythrocytes/reticulocytes (Affymetrix, [31, 32]). Identification of this gene-set as an obesity-upregulated pathway further supports our earlier observation of increased expression of individual erythrocyte/reticulocyte specific genes in the obese subjects. Details are provided in Additional Files 10 and 11.
Since our study cohort contained both male and female subjects, the contribution of gender to pathway enrichment was investigated. To determine whether pathway ranks were influenced by gender, we carried out independent gene-set enrichment analyses on subgroups comprised of female or male subjects only. We compared the relative ranks of the KEGG pathways in the three analyses as an indication of their sensitivity to gender. 'Apoptosis' was ranked 7th, 8th and 3rd and 'oxidative phosphorylation' was ranked 10th, 12th and 18th for All subjects, Females and Males respectively. The 'ribosome' pathway was the top ranked pathway for All subjects and Females analysis, but was ranked 27th in the analysis involving the Males. We repeated the same subgroup analyses on the custom pathway set and in all cases the 'electron transport chain pathway' and the 'erythrocyte/reticulocytespecific_affytechnote' pathways remained the top 2 ranked pathways for all groups tested. Details are provided in Additional File 12.
Since whole-blood consists of a mixture of various cell types, we investigated the relation between the observed enrichment in "ribosome", "apoptosis" and "oxidative phosphorylation" pathways in the obese and enrichment of reticulocytes/erythrocytes in obese subjects as previously reported [23–26]. We scaled the gene expression data independently by the expression of 2 erythrocyte-specific transcripts, hemoglobin D (HBD) and erythrocyte membrane protein, band 2 (EMPB2) and subjected the scaled data to gene-set enrichment analysis. Of the original 3 pathways found to be enriched in the obese subjects, the "ribosome" pathway was still the top differentially expressed pathway with both unscaled and scaled data. However, the "apoptosis" and "oxidative phosphorylation" pathways were no longer significantly enriched, with either of the scaled datasets. Pathway enrichment results with scaled data are provided in Additional File 13.
Classification of lean and obese subjects. Metrics reported: true positive rate (sensitivity); false positive rate; true negative rate (specificity); false negative rate.
Identity of genes constituting the 11-gene classifier.
cytochrome c oxidase subunit VIIb
ATP synthase, H+ transporting, mitochondrial F0 complex, subunit G
ATPase, H+ transporting, lysosomal 42kDa, V1 subunit C1
Fas (TNF receptor superfamily, member 6)
cytochrome c oxidase subunit VIIc
v-rel reticuloendotheliosis viral oncogene homolog A
baculoviral IAP repeat-containing 2
ATPase, H+ transporting, lysosomal 13kDa, V1 subunit G isoform 1
protein phosphatase 3 (formerly 2B), catalytic subunit, alpha isoform
DNA fragmentation factor, 40kDa, beta polypeptide
protein kinase, cAMP-dependent, regulatory, type II, alpha
Our study demonstrates significant gene expression differences in whole blood from age-matched obese and lean subjects of Northern European White genetic ancestry. These differences further lead to the identification of differentially enriched biological pathways in obesity and to an increased appreciation and understanding of genomic changes in whole blood related to body mass expansion. The current study is not designed to resolve whether the observed transcriptional differences are causal or caused, i.e. whether the differences in gene expression are related to the development of obesity or reflect an adaptive mechanism in response to increased body mass. Although blood is usually not considered to be a target organ for obesity, certain observations are pertinent. First, the physiological role of blood as a sentinel tissue and a systemic integrator of tissue and organ-level perturbations could lead to adaptive responses to major metabolic perturbations such as excessive build-up of body mass and the attendant increases in the demand for nutrient and oxygen transport. Second, the chronic low-grade tissue inflammation observed in obesity is expected to have a direct effect on circulating leukocytes, including immune dysfunction and apoptosis. Finally, macrophages in blood share many functional and antigenic properties with preadipocytes and adipocytes, and transcriptome profiles of preadipocytes are reportedly closer to those of macrophages than to those of adipocytes. In this context, our study provides the first detailed investigation of the blood transcriptome in relation to obesity and provides evidence in favor of its dynamic involvement in the process. It is important to note here that the between-group differences in gene expression were usually small and there was considerable heterogeneity in individual gene expression values among subjects in the obese or lean categories. 
However, the between-group variation exceeded the within-group variation for several genes leading to statistically significant differences between the groups. Additionally, as demonstrated by principal components analysis, blood gene expression profiles were able to distinguish lean subjects from obese subjects even when the subject classes were not exposed a priori (unsupervised clustering). Since gene expression measures were used as input for the PCA analysis, these results suggest that the differences in blood transcript levels between obese and lean subjects were significant and informative enough to cause a separation between the two classes.
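The unsupervised separation described above can be illustrated with a small sketch. It assumes a toy matrix of samples by genes and extracts only the first principal component by power iteration; the function name `first_principal_component` and the toy values are hypothetical, and the study's own PCA software and settings are not specified here. Class labels are never passed to the function, which is the sense in which the separation is unsupervised.

```python
import math
import random

def first_principal_component(samples, n_iter=200, seed=0):
    """Return (axis, scores) for the first principal component.

    samples: list of rows, one row of expression values per subject.
    The axis is found by power iteration on the centered data, so no
    class labels are used at any point.
    """
    n, p = len(samples), len(samples[0])
    # Center each gene (column) on its mean across subjects.
    means = [sum(row[j] for row in samples) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in samples]

    rng = random.Random(seed)
    axis = [rng.gauss(0, 1) for _ in range(p)]
    for _ in range(n_iter):
        # Project subjects onto the current axis, then map back to gene space.
        proj = [sum(a * b for a, b in zip(row, axis)) for row in centered]
        new_axis = [sum(proj[i] * centered[i][j] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(x * x for x in new_axis))
        axis = [x / norm for x in new_axis]

    scores = [sum(a * b for a, b in zip(row, axis)) for row in centered]
    return axis, scores

# Toy data (hypothetical): three subjects per group, three genes.
group_a = [[5.0, 5.0, 1.0], [5.2, 4.9, 1.1], [4.8, 5.1, 0.9]]
group_b = [[1.0, 1.0, 1.0], [1.1, 0.9, 1.05], [0.9, 1.2, 0.95]]
axis, scores = first_principal_component(group_a + group_b)
```

In this toy case the scores of the two groups fall on opposite sides of zero along the first component. With real array data, more than one component would normally be inspected.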
The application of pathway analysis provided additional information and insight into the biological processes that are differentially regulated in obese and lean blood samples. Some of the pathways with increased component transcript abundances included the "ribosome", "apoptosis" and "oxidative phosphorylation" pathways. Upregulation of the ribosomal pathway in the obese subjects was due to an increased expression of several ribosomal protein-encoding genes, indicative of enhanced protein synthesis in blood cells, possibly as a consequence of enhanced metabolic demands in the obese state. This observation is consistent with a recent report that links ribosomal RNA synthesis to cellular energy supply through activation of the AMP-activated protein kinase. The presence of increased apoptosis in the obese phenotype has also been well documented in animal and human cell culture models. For example, increased cardiomyocyte apoptosis has been reported in leptin-deficient ob/ob mice and leptin-resistant db/db mice. Prolonged exposure to free fatty acids also has pro-apoptotic effects on human pancreatic islets, and circulating cytokines such as tumor necrosis factor alpha (TNF-α) have been reported to induce apoptosis in cultured human preadipocytes and adipocytes. Our findings now provide evidence for activation of a similar apoptotic program in blood from obese subjects. While the current study does not allow us to pinpoint the cause of the enhanced apoptosis, we speculate that obesity-associated chronic inflammation [39, 40] or lipotoxicity are contributing factors. Finally, the observed upregulation of the 'oxidative phosphorylation' pathway in obese subjects is consistent with a response to increased energy demands in obese subjects. Functional and gene expression studies have previously indicated impairment in oxidative phosphorylation and mitochondrial function in subjects with type 2 diabetes compared to controls [29, 41, 42]. 
Our findings are consistent with Takamura et al., who demonstrated an upregulation of oxidative phosphorylation genes in the livers of obese, type 2 diabetic patients compared to non-obese diabetics . More interestingly, our findings now point to a similar involvement of energy-harvesting mechanisms in obese blood and provide further evidence in favor of a role for mitochondrial dysfunction in obesity [44, 45]. A gender-based sub-analysis demonstrated relative stability of the "apoptosis" and "oxidative phosphorylation" pathway ranks in both genders; in contrast, the "ribosome" pathway differed significantly in rank between females and males, suggesting a gender-specific effect (Additional File 7). Since a majority of genes upregulated in the obese subjects are highly expressed in erythrocytes and reticulocytes, we scaled the gene expression data independently by the expression of two erythrocyte-specific transcripts, hemoglobin D (HBD) and erythrocyte membrane protein, band 2 (EMPB2) and subjected the scaled data to gene-set enrichment analysis. Of the three pathways found to be differentially upregulated in the obese subjects, the "ribosome" pathway remained the top differentially expressed pathway (with the scaled data) whereas the "apoptosis" and "oxidative phosphorylation" pathways were no longer significantly enriched, with either of the scaled datasets. These findings suggest that an increase in erythrocyte/reticulocyte numbers in the obese (differential hematocrit) is a possible explanatory mechanism for the observed increase in transcript levels for "apoptosis" and "oxidative phosphorylation" in the obese subjects. The results for the "ribosome" pathway, in contrast, suggest a significant upregulation of the transcripts for the component genes of this pathway in the obese subjects, even after adjustment for erythrocyte-specific gene expression. We note one caveat to the scaling approach used here for investigating cell number effects. 
Since the same amount of cRNA was used from each sample for hybridization, the relative enrichment of cell types is expected to have a real effect on gene expression only for genes that are differentially expressed among the cell types (e.g. hemoglobin transcripts that are expressed only in reticulocytes and not lymphocytes). For genes expressed at comparable levels across cell types, the differential cell type representation should not have an effect on expression unless there is a true upregulation or downregulation of these genes between the two groups (although the cellular origin for the differential expression may not be known). Scaling the gene expression data by the expression of reticulocyte/erythrocyte specific genes cannot distinguish between the above two mechanisms of enhanced gene expression and can lead to potentially incorrect conclusions. However, our results clearly demonstrate that inter-individual variations in hematocrit, especially between obese and lean subjects, may affect interpretation of expression data and should be considered as an important co-variate in future studies.
Several recent publications have reported on the successful application of gene expression signatures as classifiers or predictors of phenotypic class, disease progression and therapeutic prognosis, primarily in the area of diagnosis and treatment of several types of cancers [16, 46–48]. However, the biological mechanisms linking the predictive genes to the outcomes being predicted are not always clear. This lack of mechanism has often been criticized as a barrier to the clinical utility of the gene predictors. One solution to the problem is to choose gene predictors from biological pathways associated a priori with the phenotype or outcome of interest. This approach was pursued in this study and led to the identification of an 11-gene based classifier that could distinguish and predict obese and lean subjects with high accuracy. Our motivation for this exercise was to provide proof-of-concept data to test if blood gene expression patterns can have predictive value in the context of obesity. While such prediction is not necessarily required for distinguishing obesity from leanness, blood based gene biomarkers can significantly advance the clinical management of obesity by, for example, allowing the prediction of weight loss success from diet or bariatric surgery.
One potential downstream application of differential gene expression analysis in whole blood is the selection of candidate genes with possible regulatory polymorphisms (single nucleotide polymorphisms in promoter regions, for example) that associate with obesity and help explain the observed differences in expression. Comprehensive sequencing of the regulatory regions of such candidate genes is expected to yield additional insights into the genetics of obesity, such as the identification of expression QTLs (eQTLs). While a direct subject-level association of gene regulatory polymorphisms to gene expression levels is outside the scope of the current work, we conducted a preliminary analysis of the existence of putative regulatory variants in the 11 gene predictors identified in our analysis. Based on data from the NCBI dbSNP database (Build 131), several genes contained common sequence variants near the 5'-end of the gene, spanning a region 2000 bases upstream of the start codon (SNPs rs2515192 and rs3019164 for ATP6V1C1, rs1317775 and rs1318199 for BIRC2, rs11709092 for PRKAR2A, etc.). It is reasonable to speculate that a subset of these upstream sequence variants could influence transcription.
Our study relied on whole blood collected in PAXgene tubes instead of peripheral blood mononuclear cells (PBMCs), consistent with our ultimate goal of identifying clinically relevant and useful predictors of weight loss success. This procedure, however, has the disadvantage of investigating a relatively heterogeneous cell population where noise could mask gene expression differences in specific cell types. PBMCs, consisting of lymphocytes and monocytes, provide a consistent and homogeneous sample for transcriptome analysis. However, the extra fractionation procedure for PBMCs requires a prolonged period before RNA stabilization, leading to significant ex vivo changes in gene expression profiles. Additionally, compared to whole blood, several cell types including neutrophils, basophils, eosinophils, platelets, reticulocytes and erythrocytes are depleted in PBMCs, which leads to loss of important transcription information. On the other hand, PAXgene samples show a decrease in the number of expressed genes and lower gene expression values with higher variability compared to PBMCs, primarily due to the high abundance of globin transcripts that constitute over 70% of whole blood mRNA. However, the PAXgene system provides an easy way to collect, store, transport and stabilize RNA from whole blood and, based on our overall goals, was the method of choice for our analysis. In this context, the ability of gene expression signatures from biologically relevant pathways to accurately classify and predict obese and lean classes, as observed in this study, provides further validation of our approach and suggests future suitability of the PAXgene based whole blood transcriptome for yielding clinically usable biomarkers related to weight regulation. Additional sensitivity could be obtained in future studies via selective reduction of the globin transcript from whole blood RNA samples [52, 53].
The current study has the following limitations. First, since the study employed whole blood, the relative contributions of cell numbers and of transcriptional programs in specific cell types to the observed gene expression differences cannot be clearly delineated. Second, the relatively small sample sizes reduced the power for detection of subtle differences in expression. Also, due to small sample numbers, we had to rely on cross-validation methods for calculation of prediction errors instead of testing candidate predictors on new samples. The possibility of over-fitting cannot, therefore, be entirely ruled out.
Gene expression profiling in whole blood demonstrated significant differences in transcript levels that were capable of separating obese and lean phenotypes in multivariate analysis. Gene-set enrichment analysis further identified differences in biological pathways relating to cell survival, protein synthesis and energy harvest between the obese and lean groups. A subset of genes responsible for pathway enrichment also acted as efficient predictors of phenotype (obese or lean) when their expression signatures were used as inputs to Naive Bayes or logistic regression based classifiers. Together, our study is the first to investigate the information content in whole blood in relation to obesity. Our findings demonstrate that the investigation of gene expression profiles from whole blood can inform and illustrate the biological processes related to regulation of body mass. Additionally, the ability of pathway-related gene expression to predict class membership suggests the feasibility of a similar approach to identify blood-based robust predictors of weight loss success in response to dietary and surgical interventions.
Twenty consecutive obese subjects enrolled in the Ottawa Hospital Weight Management Program at the Ottawa Hospital, Ottawa, with a body mass index (BMI) of 30-50 kg/m2, were recruited for the study. All subjects were of Northern European White genetic ancestry. Patients were excluded on the basis of medical conditions possibly affecting whole blood gene expression, including out of normal range thyroid indices (TSH, free T3) at week 1 or week 13, diabetes mellitus treated with insulin or oral hypoglycemic agents, cigarette smoking, congestive heart failure, obstructive sleep apnea, and active malignancy. Patients treated with weight-altering medications including tricyclic antidepressants, paroxetine, mirtazapine, lithium, valproate, gabapentin and typical and atypical antipsychotics, fluoxetine in doses greater than 20 mg, bupropion, topiramate, systemic glucocorticoids and weight management drugs were also excluded. Blood samples were collected at baseline prior to initiation of weight loss therapy. Twenty lean subjects from the same genetic ancestry (Northern European White), with a BMI ≤ the 10th percentile for age and sex and no prior history of having had a BMI > 25th percentile for more than a 2 year consecutive period, were recruited from the Ottawa community. Lean subjects were excluded if they had any medical conditions affecting weight gain such as hyperthyroidism, anorexia nervosa, bulimia, major depression, or malabsorption syndromes. BMI for obese and lean subjects was categorized according to the population percentiles for age and sex using the Canadian Heart Health Survey data for subjects over the age of 18 years (data on file; Health Canada). The study protocol was approved by the Human Research Ethics Committees of the Ottawa Hospital and the University of Ottawa Heart Institute, and informed consent was obtained from all participants prior to their enrolling into the program.
2.5 ml of fasting whole blood was drawn from study subjects by standard venipuncture and directly transferred to PAXgene blood RNA tubes (Qiagen, Santa Clara, CA). PAXgene tubes were processed at designated times after phlebotomy by the PAXgene protocol. Isolation of total RNA was accomplished according to the manufacturer's instructions. Prior to further processing, RNA quality was ascertained by electropherograms on the Agilent 2100 Bioanalyzer. Extracted RNA from all samples was stored at -70°C until processed for microarray hybridizations.
Hybridization of 100 nanograms of labeled cRNA from each sample was carried out on Affymetrix GeneChip® Human Genome U133 Plus 2.0 Arrays according to the manufacturer's instructions. Microarray data was deposited in the Gene Expression Omnibus data repository (accession number GSE18897). Gene expression signals were generated from hybridized and scanned Affymetrix arrays by the GC-RMA algorithm. Probesets with a normalized average expression level of less than 50 units in all of the tested groups were eliminated from further analysis. Significance of differential gene expression was ascertained via the signal-to-noise algorithm from the GenePattern Comparative Marker Selection module, employing a permutation-based t-test and false discovery rate (FDR) control. The Signal-to-Noise feature selection method is a variation of the more commonly used t-test statistic and looks at the difference of the class means scaled by the sum of the class standard deviations: Sx = (μ0 - μ1)/(σ0 + σ1), where μ0 and σ0 are the mean and standard deviation of class 0, and μ1 and σ1 are the mean and standard deviation of class 1. The Signal-to-Noise statistic penalizes genes that have higher variance in each class more than those genes that have a high variance in one class and a low variance in another.
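As an illustration of the statistic defined above, the following minimal sketch computes the signal-to-noise score for a single gene and a nominal p-value by shuffling class labels. It is not the GenePattern implementation: the function names, the use of the sample standard deviation, the two-sided test and the toy expression values are all assumptions made for the example, and any variance adjustment applied by the module is omitted.

```python
import random
from statistics import mean, stdev

def signal_to_noise(class0, class1):
    """S = (mu0 - mu1) / (sigma0 + sigma1) for one gene.

    Assumption: sample standard deviation (n - 1 denominator).
    """
    return (mean(class0) - mean(class1)) / (stdev(class0) + stdev(class1))

def permutation_p_value(class0, class1, n_perm=1000, seed=0):
    """Two-sided nominal p-value obtained by shuffling class labels."""
    rng = random.Random(seed)
    observed = abs(signal_to_noise(class0, class1))
    pooled = list(class0) + list(class1)
    n0 = len(class0)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        # Re-split the shuffled values into two pseudo-classes of the same sizes.
        if abs(signal_to_noise(pooled[:n0], pooled[n0:])) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)

# Hypothetical log-scale expression values for one gene.
obese = [8.1, 7.9, 8.4, 8.0, 8.3]
lean = [6.9, 7.2, 7.0, 7.1, 6.8]
score = signal_to_noise(obese, lean)
p_nominal = permutation_p_value(obese, lean)
```

In a genome-wide setting the same calculation would be repeated for every probeset, and the resulting p-values would then be corrected for multiple testing, for example with an FDR procedure as described in the text.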
Bioinformatic pathway analysis was conducted with the Gene Set Enrichment Analysis (GSEA) software package [27, 55]. GSEA is a computational method to detect statistically significant, concordant differences in a priori defined gene sets (pathways) between two biological states. GSEA accomplishes this task by calculating a weighted Kolmogorov-Smirnov statistic, adjusted for gene-set size (known as the Normalized Enrichment Score, NES) for each gene-set, based on the over-representation of members of a gene-set towards the top or bottom of a list of genes ranked by the strength of their correlation (positive or negative) to one of the two phenotypes. The statistical significance of NES score is estimated by a permutation test based on random shuffling of the phenotype or tag (gene) labels. GSEA addresses the problem of multiple testing (testing hundreds of gene-sets simultaneously) by calculating a false-discovery rate and a family-wise error rate on the ES p-values.
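The running-sum statistic behind the enrichment score can be sketched as follows. This is a simplified illustration rather than the GSEA software itself: the function names are hypothetical, the null distribution is built by drawing random gene sets of the same size (one form of gene-label permutation), and the normalization divides the observed score by the mean magnitude of null scores with the same sign.

```python
import random

def enrichment_score(ranked_genes, scores, gene_set, weight=1.0):
    """Weighted Kolmogorov-Smirnov running sum over a ranked gene list.

    ranked_genes: gene ids sorted by decreasing correlation with the phenotype.
    scores: the correlation of each gene, in the same order.
    Returns the maximum deviation of the running sum from zero (signed).
    """
    in_set = [g in gene_set for g in ranked_genes]
    n_miss = len(ranked_genes) - sum(in_set)
    hit_norm = sum(abs(s) ** weight for s, hit in zip(scores, in_set) if hit)
    if hit_norm == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the list without covering it")
    running, best = 0.0, 0.0
    for s, hit in zip(scores, in_set):
        if hit:
            running += abs(s) ** weight / hit_norm   # step up at a set member
        else:
            running -= 1.0 / n_miss                  # step down otherwise
        if abs(running) > abs(best):
            best = running
    return best

def gene_permutation_null(ranked_genes, scores, set_size, n_perm=1000, seed=0):
    """Null enrichment scores from random gene sets of the same size."""
    rng = random.Random(seed)
    return [
        enrichment_score(ranked_genes, scores, set(rng.sample(ranked_genes, set_size)))
        for _ in range(n_perm)
    ]

def normalized_es(es, null_scores):
    """Divide ES by the mean magnitude of same-sign null scores."""
    same_sign = [abs(x) for x in null_scores if (x >= 0) == (es >= 0)]
    if not same_sign:
        return float("nan")
    return es / (sum(same_sign) / len(same_sign))

# Toy ranked list (hypothetical): six genes, two of them in the pathway.
genes = ["g1", "g2", "g3", "g4", "g5", "g6"]
corr = [3.0, 2.0, 1.0, -1.0, -2.0, -3.0]
es_top = enrichment_score(genes, corr, {"g1", "g2"})
```

A gene set concentrated at the top of the ranked list gives a positive score, and one concentrated at the bottom gives a negative score. Phenotype-label permutation, the other option mentioned above, would instead recompute the gene ranking after shuffling sample labels.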
Whole blood was collected in PAXgene™ blood tubes (Qiagen, Santa Clara, CA) and total RNA was extracted using the PAXgene™ blood kit. All RNA was treated with DNase I to remove genomic DNA contamination. The RNA was converted to cDNA in a 96-well microtiter plate on an ABI PRISM 7700 Sequence Detector System (Applied Biosystems, Foster City, CA) using the Applied Biosystems High Capacity cDNA archive kit. Gene expression was measured on the Applied Biosystems 7900 using TaqMan® RT-PCR technology. A global median absolute deviation (MAD) was computed from the gene expression values by taking the median deviation for each set of technical replicates, using either the Ct values or log2 calculated abundances. Outliers were defined as values deviating by more than five times the global MAD. Following technical and biological outlier identification, the data was normalized using reference housekeeper genes. The mean Ct value of all reference genes across all samples ("global mean Ct") was subtracted from the mean Ct value of all reference genes within each sample ("sample reference mean") to determine a normalization factor for each sample. The normalization factor for a given sample was then subtracted from each of its Ct values, resulting in normalized Ct values. All Ct values were then converted to log2 abundances.
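The outlier screen and reference-gene normalization described above can be sketched as follows. This is one possible reading of the procedure, not the authors' code: the function names, the gene labels `REF1`, `REF2` and `TARGET`, and the Ct values are hypothetical, and deviations are taken from the median of each technical replicate set.

```python
from statistics import mean, median

def global_mad(replicate_sets):
    """Median of absolute deviations from each replicate set's own median."""
    deviations = [abs(v - median(reps)) for reps in replicate_sets for v in reps]
    return median(deviations)

def flag_outliers(replicate_sets, k=5.0):
    """Mark replicates deviating by more than k times the global MAD."""
    mad = global_mad(replicate_sets)
    return [[abs(v - median(reps)) > k * mad for v in reps] for reps in replicate_sets]

def normalize_ct(ct, reference_genes):
    """Normalize Ct values with reference genes.

    ct: {sample: {gene: Ct}}.
    factor(sample) = sample reference mean - global mean Ct
    normalized Ct  = Ct - factor(sample)
    """
    global_mean = mean(genes[r] for genes in ct.values() for r in reference_genes)
    normalized = {}
    for sample, genes in ct.items():
        sample_ref_mean = mean(genes[r] for r in reference_genes)
        factor = sample_ref_mean - global_mean
        normalized[sample] = {g: v - factor for g, v in genes.items()}
    return normalized

# Hypothetical Ct values for two samples.
ct_values = {
    "s1": {"REF1": 20.0, "REF2": 22.0, "TARGET": 28.0},
    "s2": {"REF1": 22.0, "REF2": 24.0, "TARGET": 28.0},
}
normalized = normalize_ct(ct_values, ["REF1", "REF2"])
```

After normalization the reference genes have the same mean Ct in every sample, so remaining differences in the target gene reflect differences relative to the reference genes rather than differences in input amount.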
Class prediction (obese or lean) from gene expression data was carried out through the WEKA Explorer and WEKA Experimenter applications. First, 183 genes belonging to the 3 obese-upregulated pathways (ribosome, apoptosis and oxidative phosphorylation) were used to identify a subset of maximally informative features (genes) for classifier testing while removing irrelevant or redundant features that could negatively impact algorithm performance. Feature selection was accomplished by two independent 'filtering-based' algorithms (Information Gain and Cfs Subset Evaluator), using 10-fold cross validation for each method [56, 57]. We did not use 'wrapper-based' feature selection because we wanted the selected features to be independent of classification algorithms. Both procedures resulted in a list of genes that were then ranked based on their importance in each feature selection method. From these ranked lists, we selected a total of 11 genes that were ranked within the top 20 genes in both lists. Gene expression signals for these 11 genes were then used as input in 4 different classifiers (Naïve Bayes, Logistic Regression, Random Forests and ZeroR) representing 4 different algorithmic approaches (Bayesian, regression, decision trees and rule-based, respectively), which were independently tested for predictive performance (Additional File 13) [59, 60]. Classifier-specific parameters were kept at the defaults provided in WEKA Experimenter. Each classifier used 66% of the samples for training (from a total of 34 obese plus lean subjects) and 33% for testing (chosen at random for each round) for a total of 100 iterations. For each classifier, the true positive rate, true negative rate, false positive rate, and false negative rate were calculated (average plus standard deviation over 100 iterations) and the values were used to compare individual classifiers for their predictive performance.
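The selection and evaluation scheme described above can be sketched in a few lines. The study used WEKA; the code below is an independent illustration with a hand-written Gaussian Naive Bayes classifier standing in for WEKA's implementation, and all function names, class names and data are hypothetical. It shows the intersection of two top-20 rankings and the repeated random 66%/33% split with averaged rates.

```python
import math
import random
from statistics import mean, pvariance, stdev

def shared_top_genes(ranking_a, ranking_b, top_n=20):
    """Genes ranked within the top_n of both feature-selection lists."""
    top_b = set(ranking_b[:top_n])
    return [g for g in ranking_a[:top_n] if g in top_b]

class GaussianNaiveBayes:
    """Minimal stand-in classifier (not the WEKA implementation)."""

    def fit(self, X, y):
        self.params = {}
        for label in set(y):
            rows = [x for x, t in zip(X, y) if t == label]
            cols = list(zip(*rows))
            self.params[label] = (
                math.log(len(rows) / len(X)),          # log prior
                [mean(c) for c in cols],               # per-gene mean
                [pvariance(c) + 1e-6 for c in cols],   # per-gene variance, smoothed
            )
        return self

    def predict(self, x):
        def log_posterior(label):
            prior, mus, variances = self.params[label]
            return prior + sum(
                -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
                for xi, m, v in zip(x, mus, variances)
            )
        return max(self.params, key=log_posterior)

def repeated_split_rates(X, y, positive, n_iter=100, train_frac=0.66, seed=0):
    """Mean and standard deviation of TPR, FNR, TNR, FPR over random splits."""
    rng = random.Random(seed)
    idx = list(range(len(y)))
    n_train = round(train_frac * len(y))
    tprs, tnrs = [], []
    for _ in range(n_iter):
        rng.shuffle(idx)
        train, test = idx[:n_train], idx[n_train:]
        model = GaussianNaiveBayes().fit([X[i] for i in train], [y[i] for i in train])
        tp = fn = tn = fp = 0
        for i in test:
            predicted_positive = model.predict(X[i]) == positive
            if y[i] == positive:
                tp += predicted_positive
                fn += not predicted_positive
            else:
                fp += predicted_positive
                tn += not predicted_positive
        if tp + fn:
            tprs.append(tp / (tp + fn))
        if tn + fp:
            tnrs.append(tn / (tn + fp))
    return {
        "TPR": (mean(tprs), stdev(tprs)),
        "FNR": (1 - mean(tprs), stdev(tprs)),
        "TNR": (mean(tnrs), stdev(tnrs)),
        "FPR": (1 - mean(tnrs), stdev(tnrs)),
    }
```

Because every sample is eventually used for both training and testing across iterations, rates obtained this way remain internal estimates; as noted in the limitations, they do not replace testing on new samples.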
This work was conducted with grant support from GlaxoSmithKline. Part of the study was supported by NIH grants NHLBI-5R25HL059868-10 and NIDDK-1R21DK088319-01 (Ghosh) and a grant from the Heart & Stroke Foundation of Ontario (NA-5413; McPherson, Dent and Harper).
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.