A comparison of statistical methods for the detection of hepatocellular carcinoma based on serum biomarkers and clinical variables

Background Currently, a surgical approach is the best curative treatment for those with hepatocellular carcinoma (HCC). However, this requires HCC detection and removal of the lesion at an early stage. Unfortunately, most cases of HCC are detected at an advanced stage because of the lack of accurate biomarkers that can be used in the surveillance of those at risk. It is believed that biomarkers that could detect HCC early will play an important role in the successful treatment of HCC.
Methods In this study, we analyzed serum levels of alpha fetoprotein, Golgi protein, fucosylated alpha-1-anti-trypsin, and fucosylated kininogen from 113 patients with cirrhosis and 164 serum samples from patients with cirrhosis plus HCC. We utilized two different methods, namely, stepwise penalized logistic regression (stepPLR) and model-based classification and regression trees (mob), along with the inclusion of clinical and demographic factors such as age and gender, to determine if these improved algorithms could be used to increase the detection of cancer.
Results and discussion The performance of multiple biomarkers was found to be better than that of individual biomarkers. Using several statistical methods, we were able to detect HCC in the background of cirrhosis with an area under the receiver operating characteristic curve of at least 0.95. stepPLR and mob demonstrated better predictive performance relative to logistic regression (LR), penalized LR and classification and regression trees (CART) used in our prior study based on three-fold cross-validation and leave-one-out cross-validation. In addition, mob provided unparalleled intuitive interpretation of results and potential cut-points for biomarker levels. The inclusion of age and gender improved the overall performance of both methods among all models considered, while the stratified male-only subset provided the best overall performance among all methods and models considered.
Conclusions In addition to multiple biomarkers, the incorporation of age and gender into statistical models significantly improved their predictive performance in the detection of HCC.


Background
The major etiology of hepatocellular carcinoma is infection with hepatitis B virus (HBV) and/or hepatitis C virus (HCV) [1][2][3][4][5], which can lead to liver cirrhosis, the main risk factor for HCC. Worldwide, it is estimated that between 500,000-700,000 people die as a result of HCC every year [2,5,7].
Surgical treatments, such as tumor ablation, resection and transplantation still offer the best hope for long term survival but work best when tumors are caught at an early stage. Thus, the screening of the cirrhotic patient population for early detection is thought to be an important step to increase survival.
Currently, patients at risk for HCC are monitored by imaging and/or by serum levels of the glycoprotein alpha-fetoprotein (AFP) or the core fucosylated glycoform of AFP (AFP-L3). However, AFP can have poor sensitivity and specificity [8][9][10], and is not present in many patients with HCC. Therefore, the use of AFP as the primary screen for HCC has been questioned [11], and more specific and sensitive serum biomarkers for HCC are urgently needed [12][13][14][15][16].
We have previously observed increased levels of fucosylated proteins in the serum of those with HCC and through the use of fucose specific lectins we identified many of the proteins that become fucosylated with liver disease [17][18][19]. In the current study we have analyzed the performance of several of these potential biomarkers in the serum from 113 patients with cirrhosis and 164 serum samples from patients with cirrhosis plus HCC.
In an effort to maximize the detection of patients with cancer, we applied several novel biostatistical tools to determine if improved algorithms would aid in the detection of cancer. This included combining biomarker values with clinical and demographic factors such as age and gender to improve diagnosis. Using several of these methods, we are able to detect HCC in the background of cirrhosis with an area under the receiver operating characteristic curve of at least 0.95, a significant improvement relative to that of any marker when used alone. The potential benefit of using this combination of markers and clinical variables is discussed in this paper.

Patients
Serum samples were obtained from Saint Louis University School of Medicine or the University of Michigan. For samples obtained from the University of Michigan, the study protocol was approved by the University of Michigan's Institutional Review Board and written informed consent was obtained from each subject. Demographic and clinical information was obtained, and a blood sample was collected from each subject. Patients with HCC, and patients with cirrhosis who were age, gender, and race/ethnicity matched to the HCC patients, were enrolled from the Liver Clinic during this period. The diagnosis of HCC was made by histopathology, including all T1 lesions, and, if histopathology was not available, by two imaging modalities (ultrasound [US], magnetic resonance imaging [MRI], or computed tomography [CT]) showing a vascular enhancing mass > 2 cm [5]. Diagnosis of cirrhosis was based on liver histology or clinical, laboratory and imaging evidence of hepatic decompensation or portal hypertension [15]. Each of the patients with cirrhosis had a normal US and, if serum AFP was elevated, an MRI of the liver within 3 months prior to enrollment and another one 6 months after enrollment that showed no liver mass. The cirrhotic controls have been followed for a median of 12 months (range 7-18 months) after enrollment, and no one has developed HCC. A 20-ml blood sample was drawn from each subject, spun, aliquoted, and serum stored at -80°C until testing. Blood samples were drawn prior to initiation of HCC treatment. AFP was tested using commercially available immunoassays utilizing enhanced chemiluminescence at the University of Michigan Hospital Clinical Diagnostic Laboratory. The upper limit of normal was 8 ng/ml.
For samples obtained from Saint Louis University School of Medicine, the study protocol was approved by the Saint Louis University Institutional Review Board and written informed consent was obtained from each subject.
Demographic and clinical information was obtained, and a blood sample was collected from each subject in a serum separator tube, spun within 2 hours and serum stored at -80°C until testing. showing features characteristic of HCC and either an increase in size over time after initial discovery (at least doubling if less than 1 cm) or an increase in AFP to > 200 ng/ml. For the cirrhosis group, patients with Hepatitis C and biopsy proven cirrhosis were enrolled. All cirrhotic controls were screened for HCC using US, CT or MRI prior to enrollment.

Lectin FLISA
Monoclonal antibodies are fucosylated and are reactive with fucose binding lectins. Hence they must be modified prior to analysis via the Lectin-FLISA. Briefly, to remove the fucosylation of the capture antibody (Mouse antihuman A1AT or rabbit anti-human LMW kininogen, Bethyl Laboratories, Montgomery, TX), antibody was incubated with 10mM sodium periodate for 1 hour at 37°C. An equal volume of ethylene glycol was added and the oxidized antibody brought to a concentration of 10 μg/mL with sodium carbonate buffer, pH 9.5. Antibody (1 μg/well) was added to the plate and following incubation washed with 0.1% Tween 20/PBS 7.4 and blocked overnight with 3% BSA/PBS. For analysis, 5 μl of serum was diluted in 95 μL of Heterophilic Blocking tubes (Scantibodies Laboratory, Inc. Santee, CA 92071 USA) and incubated at room temperature for 1 hour. Subsequently, samples were added to the plates for 2 hours and washed 5 times in lectin incubation buffer (10mM Tris pH 8.0, 0.15M NaCl, 0.1%Tween 20) before fucosylated protein was detected with a biotin conjugated Aleuria aurantia (AAL) lectin (Vector Laboratories, Burlingame, CA). Bound lectin was detected using IRDye™ 800 Conjugated streptavidin and signal intensity measured using the Odyssey ® Infrared Imaging System (LI-COR Biotechnology, Lincoln, Nebraska) as described in [20,21]. In all cases sample intensity was compared to commercially purchased human serum (Sigma Inc., St Louis, MO.).

Immunoblotting for GP73
Equal volumes of patient sera were resolved by SDS-PAGE on 10% polyacrylamide gels and the proteins transferred to a PVDF membrane by immunoblotting. The membranes were blocked by incubating with a blocking buffer of 1x TBS (50 mM Tris-HCl, pH 7.6, 150 mM sodium chloride), 5% non-fat dried milk, and 0.1% Tween 20 for 1 hour at room temperature. The blots were incubated overnight with polyclonal anti-GP73 antibody (1:2000) and then with rocking at room temperature for 2 hours. Blots were subsequently washed 3 x 10 min in 0.1% Tween-PBS, and GP73 was visualized using an IRDye™ 700 Conjugated mouse anti-rabbit secondary antibody (1:10,000). Signal intensity was measured using the Odyssey ® Infrared Imaging System (LI-COR Biotechnology, Lincoln, Nebraska). In all cases sample intensity was compared to commercially purchased human serum (Sigma Inc., St Louis, MO).

Statistical methods
Univariate statistical analyses were performed using Fisher's exact test for categorical variables and the Mann-Whitney test for continuous variables. Univariate logistic regression analyses were also performed for each individual biomarker separately. Details of univariate analysis results are presented in [26]. A variety of methods and models were used in multivariable analyses for associating the incidence of HCC with biomarker levels and clinical/demographic variables such as age and gender. Specifically, two different but related methods were investigated in this approach: stepwise PLR (stepPLR) and model-based CART (mob). These two methods are extensions of PLR and CART described in our previous work [26]. A variety of models were considered for each method. Details of these methods are provided in the ensuing paragraphs. All tests were two-sided and used a Type I Error of 0.05 to determine statistical significance.
PLR is a variant of logistic regression based on a quadratic penalty that is ideal for associating discrete factors and continuous variables such as gender, age and biomarker levels with a binary response such as HCC incidence. In PLR, we maximize the log-likelihood subject to a size constraint on the L2-norm of the coefficients (excluding the intercept) [22]. This penalized likelihood can be written as

$$l(\beta_0, \beta) - \frac{\lambda}{2}\lVert\beta\rVert_2^2,$$

where l indicates the binomial log-likelihood, \beta_0 is the intercept, \beta is the vector of the remaining coefficients and λ is a positive constant. The use of quadratic penalization provides stability to the model fit by overcoming collinearity among variables. Even though the number of variables in our application is limited, PLR is well suited for modeling a large number of variables. The sample size does not limit the number of such variables, and variable selection can be done using a forward stepwise approach. PLR is implemented in the open-source R package stepPlr (http://www.r-project.org) [23]. The standard PLR approach is applied to a fixed set of biomarkers, clinical and/or demographic variables and was used in our previous work [26]. We extended this method to incorporate stepwise model selection in this paper. stepPLR provides the functionality for stepwise model selection based on PLR for a fixed value of λ. It tests for interactions between biomarkers, demographic and clinical variables and removes all non-significant terms. Stepwise regression is then performed for the pre-specified λ, and the remaining significant terms are included in the final model. stepPLR is typically repeated for various pre-specified values of λ and the best performing model is chosen. Three different values of the penalty parameter λ (0.1, 1, 10) were considered in our approach.
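As a minimal sketch of the penalized fit described above (not the implementation in the R package), the following Python code maximizes the binomial log-likelihood minus (λ/2) times the squared L2-norm of the non-intercept coefficients by plain gradient ascent. The function name fit_plr, the step size, the iteration count and the toy data are illustrative assumptions, and the factor 1/2 on the penalty is a common convention assumed here.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_plr(X, y, lam, steps=5000, lr=0.1):
    """Gradient ascent on l(b0, b) - (lam / 2) * ||b||^2 (hypothetical helper).

    X is a list of feature rows, y a list of 0/1 labels (1 = HCC).
    The intercept b0 is left unpenalized, as in the text.
    """
    n, p = len(X), len(X[0])
    b0, b = 0.0, [0.0] * p
    for _ in range(steps):
        g0, g = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            # Residual y - p(x) is the per-observation log-likelihood gradient.
            r = yi - sigmoid(b0 + sum(bj * xj for bj, xj in zip(b, xi)))
            g0 += r
            for j in range(p):
                g[j] += r * xi[j]
        b0 += lr * g0 / n
        # The quadratic penalty contributes -lam * b_j to the gradient.
        b = [bj + lr * (gj - lam * bj) / n for bj, gj in zip(b, g)]
    return b0, b

# Toy, non-separable data (made up for illustration): one marker, binary outcome.
X_toy = [[0.2], [0.5], [1.0], [1.5], [2.0], [2.4]]
y_toy = [0, 0, 1, 0, 1, 1]
fits = {lam: fit_plr(X_toy, y_toy, lam) for lam in (0.1, 1, 10)}
```

Comparing the fitted slopes across the three values of λ shows the expected shrinkage: larger penalties pull the coefficient toward zero, which is what stabilizes the fit when markers are collinear.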
CART is based on decision trees and is a non-parametric approach. A decision tree is a logical model represented as a binary tree that shows how the value of a response variable can be predicted by using the values of a set of clinical variables. If the response variable is binary, such as whether a patient developed HCC or not, then a classification tree is generated that predicts the probability of developing HCC. The unified CART framework based on conditional inference trees embeds recursive binary partitioning into the theory of permutation tests [24]. This methodology is applicable to all types of regression settings and overcomes the problem of over-fitting and selection bias towards variables with many possible splits or missing values. The conditional distribution of statistics used in this approach results in unbiased selection among covariates measured at different scales. Significance tests determine whether any covariate is significantly associated with the response; when none is, the recursion stops. The function ctree() in the open-source R package party (http://www.r-project.org) [25] implements this non-parametric approach and was used in our previous work [26]. In this paper, we extended this approach to incorporate parametric modeling. Specifically, it borrows strength from binary recursive partitioning in CART and the parametric approach in LR. This model-based approach to CART is based on generalized linear models [24] and is implemented in the function mob() of the package party. It was used to model the effects of biomarker levels, gender and age associated with the development of HCC.
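The permutation-test idea behind the stopping rule can be sketched as follows. This is a simplified Monte Carlo illustration, not the procedure used by ctree(): it tests a single numeric covariate against a binary response using a centered linear statistic, and the function name, the number of permutations and the seed are assumptions made for the example.

```python
import random

def permutation_pvalue(x, y, n_perm=2000, seed=0):
    """Monte Carlo permutation p-value for association between x and binary y."""
    rng = random.Random(seed)
    mean_x = sum(x) / len(x)
    n_cases = sum(y)

    def stat(labels):
        # Sum of x over cases, centered at its expectation under permutation.
        return abs(sum(xi for xi, yi in zip(x, labels) if yi == 1) - n_cases * mean_x)

    observed = stat(y)
    labels = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # break any link between covariate and response
        if stat(labels) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)
```

In a tree-growing loop, a p-value of this kind would be computed for every candidate covariate at a node; the node is split on the covariate with the smallest p-value if it falls below the significance level, and the recursion stops otherwise.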
In order to evaluate the performance of statistical models combining multiple biomarkers and/or clinical variables, values of multiple biomarkers were entered into the model from the appropriate method, and in each case the output (predicted value) was between 0 and 1, with 0 being cirrhosis and 1 being cancer. A cut-off of 0.5 was used for the predicted probability p: patients were classified as HCC positive when p >= 0.5 and as cirrhotic otherwise (p < 0.5). To determine the optimal cutoff value for each biomarker or a combination of biomarkers and/or clinical variables, Receiver Operating Characteristic (ROC) curves were constructed using all possible cutoffs for each method. Sensitivity and specificity (along with 95% confidence interval (CI)) were used to characterize the precision of binary predictions from stepPLR and mob. Area under the ROC curve (AUC) (along with 95% CI), prediction accuracy (ACC), positive predictive value (PPV) and negative predictive value (NPV) were used to characterize the predictive value of models from these methods. For each model considered, the Akaike Information Criterion (AIC) was calculated.
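The performance measures named above can be illustrated with a short Python sketch. The function names and the example probabilities are hypothetical; the AUC is computed through its rank (Mann-Whitney) interpretation, which equals the area under the empirical ROC curve, and confidence intervals are omitted.

```python
def confusion_metrics(probs, labels, cutoff=0.5):
    """Sensitivity, specificity, PPV, NPV and ACC at a probability cutoff.

    labels: 1 = HCC, 0 = cirrhosis. Assumes every denominator is non-zero.
    """
    tp = fp = tn = fn = 0
    for p, y in zip(probs, labels):
        predicted = 1 if p >= cutoff else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "acc": (tp + tn) / (tp + fp + tn + fn),
    }

def auc(probs, labels):
    """Probability that a random HCC case outscores a random control (ties count 1/2)."""
    pos = [p for p, y in zip(probs, labels) if y == 1]
    neg = [p for p, y in zip(probs, labels) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

# Made-up predicted probabilities for three HCC cases and three cirrhotic controls.
example_probs = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
example_labels = [1, 1, 1, 0, 0, 0]
example_metrics = confusion_metrics(example_probs, example_labels)
example_auc = auc(example_probs, example_labels)
```

Sensitivity and specificity depend on the chosen cutoff, whereas the AUC summarizes performance over all cutoffs, which is why both kinds of measure are reported.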
In addition, the performance of each model was evaluated using leave-one-out cross validation (LOOCV) and three-fold cross validation (3CV). For details on LOOCV and 3CV, the interested reader is referred to our previous work [26]. Using results from LOOCV, an ROC curve and its AUC (with 95% CI) was computed based on the predicted probabilities. This is the cross-validated AUC. Likewise, sensitivities at set specificities from this ROC curve can be estimated. In order to evaluate the performance of each model on independent data in the absence of a validation set, 3CV was used. Using 200 random partitions of the dataset based on 3CV, the mean AUC, its standard deviation and 95% CI were computed.
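The two cross-validation schemes can be sketched in Python as follows. This is an illustration under stated assumptions rather than the analysis code: a toy one-marker model (class means) stands in for stepPLR or mob, the AUC for each random partition is computed from the pooled out-of-fold scores (the text does not say whether pooled or per-fold AUCs were averaged), and all function names are hypothetical.

```python
import random
import statistics

def auc(scores, labels):
    # Rank-based AUC; ties between a case and a control count 1/2.
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

def fit_means(xs, ys):
    # Toy stand-in model: class means of one marker (needs both classes in xs).
    m0 = statistics.mean([x for x, y in zip(xs, ys) if y == 0])
    m1 = statistics.mean([x for x, y in zip(xs, ys) if y == 1])
    return m0, m1

def predict_score(model, x):
    # Position of x between the two class means, clipped to [0, 1].
    m0, m1 = model
    if m1 == m0:
        return 0.5
    return min(1.0, max(0.0, (x - m0) / (m1 - m0)))

def out_of_fold_scores(xs, ys, folds):
    # Each observation is scored by a model that never saw it during fitting.
    scores = [None] * len(ys)
    for test_idx in folds:
        held_out = set(test_idx)
        train = [i for i in range(len(ys)) if i not in held_out]
        model = fit_means([xs[i] for i in train], [ys[i] for i in train])
        for i in test_idx:
            scores[i] = predict_score(model, xs[i])
    return scores

def loocv_auc(xs, ys):
    # Leave-one-out: every observation forms its own test fold.
    return auc(out_of_fold_scores(xs, ys, [[i] for i in range(len(ys))]), ys)

def repeated_kfold_auc(xs, ys, k=3, repeats=200, seed=0):
    # Mean and standard deviation of the AUC over random k-fold partitions.
    rng = random.Random(seed)
    aucs = []
    for _ in range(repeats):
        idx = list(range(len(ys)))
        rng.shuffle(idx)
        folds = [idx[i::k] for i in range(k)]
        aucs.append(auc(out_of_fold_scores(xs, ys, folds), ys))
    return statistics.mean(aucs), statistics.stdev(aucs)
```

Repeating the three-fold split many times reduces the dependence of the estimate on any single random partition, which is the motivation for averaging over 200 partitions.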

Results and discussion
Univariate analysis A significant association between gender and the incidence of HCC was found, with a significantly increased odds of HCC in males (odds ratio = 1.75) compared to females. A statistically significant association between age and incidence of HCC was also observed. Results of univariate analyses are reported in detail in our previous study [26].

Multivariable analysis
Data obtained across two sites were used in the analyses. In order to adjust for any potential differences in biomarker levels obtained at different sites, a dichotomous, nominal variable site (indicating the site where the data was obtained for each observation) was incorporated into the modeling as a covariate. For each statistical method used, four different models were considered based on the inclusion of age and gender in multivariable analysis. These are listed in Table 1. The stratified dataset consisting of males only (with or without age) was of particular importance due to the known higher incidence of HCC in male patients [2]. Results from multivariable analyses (presented in Tables 1, 2, 3 and Figures 1, 2, 3, 4, 5, 6, 7) were compared with those from univariate LR (presented in Tables 1A, B and C and Figure 1 of [26]) and multivariable LR, PLR and CART analyses (presented in Tables 2, 3, 4 and Figures 2, 3, 4, 5 of [26]) reported in our previous study.
It is evident from the results reported in our previous study [26] that univariate LR models performed uniformly worse than multivariable models that utilized multiple biomarkers using any of the three methods considered in that study, namely, multivariable LR, PLR and CART. The best performing univariate model (GP73) produced a model-based AUC of 0.87 (95% CI (0.84, 0.91)) and ACC of 0.78, a result that fell far short of those of multivariable models, and thus emphasized the need for including multiple biomarkers and additional confounding clinical variables into the model. In addition, among the three multivariable methods considered, PLR and CART outperformed LR. PLR provided the best overall performance while CART served as a useful alternative by providing useful cut-points for biomarker levels. In this study, we improve upon the performance of these two methods by implementing a stepwise PLR (stepPLR) and a model-based CART (mob) approach, respectively.
In the following paragraphs, the performance of various models is compared for each method and the results summarized and interpreted. Performance measures such as AUC, ACC, PPV, NPV, sensitivity and specificity are compared between age-adjusted and age-unadjusted models when gender effect is considered and also for the stratified male-only subset. With the exception of AUC, which is expressed on the [0,1] scale, each quantity is measured on a [0,100] scale. We re-scale each measure to [0,1] in our comparisons for the sake of uniformity and convenience. Difference between models for each quantity is expressed as actual difference (indicating better or worse performance) and not as relative difference, i.e., a difference of 5 units on the [0,100] scale is equivalent to 5% (or 0.05) on the [0,1] scale.
In particular, stepPLR and mob showed significant improvements in predictive performance when age was included in the model after adjusting for gender differences compared to the model excluding age (Table 1; Figures 2, 4). stepPLR showed a median increase in AUC (ACC) of 2% (3.02%) (across all choices of λ) while mob showed a median increase of 4% (2.1%) across the four models considered. When the stratified subset consisting of only males was used in the analysis, this difference increased to 5% (4.81%) and 5.5% (5.22%), respectively, for stepPLR and mob (Table 1; Figures 1, 3). For this subset, the mob model based on the biomarker AFP conditional on the tree analysis using GP73, AAT, Kininogen and age, after controlling for site, resulted in the maximum increase of 10% in AUC and 10.36% in ACC due to the inclusion of age (Table 1).
A considerable increase of 6% in AUC and 7.63% in ACC was also noted due to the inclusion of age in the mob model for GP73 conditional on the tree analysis using AFP, AAT, Kininogen and age, after controlling for site. In addition, this model resulted in the maximum increase of 4% in AUC and 5.57% in ACC when controlled for gender effect, among all methods and models considered. These are significant improvements over our previous findings in which all three multivariable methods used (LR, PLR and CART) showed improvements in AUC and ACC in excess of 4% for this data, with PLR (λ = 1) showing the best overall increase in ACC of only about 5% [26]. Consistent with our recent findings [26], a marked improvement was observed in the predictive performance of each method based on this stratified dataset independent of whether age is included in the model. The inclusion of age, however, resulted in the best predictive model across all combinations of methods and models considered (Table 1). In addition, the inclusion of age resulted in a substantial decrease in Akaike Information Criterion (AIC) for stepPLR (across all choices of λ) and mob (across all models considered) both for the stratified male only dataset and when gender differences are accounted for in the model (Table 1). This finding underscores the significant role played by the variable age in model selection and in the predictive performance of the final model. Furthermore, PPV and NPV capture other critical aspects of the performance of a model. For our application, PPV represents the proportion of patients predicted to have HCC who truly have HCC, while NPV represents the proportion of patients predicted to have cirrhosis who truly have cirrhosis. A high PPV means that the model only rarely classifies a patient with cirrhosis as having HCC, and is therefore a desirable characteristic in a model. Table 1 lists the best performing models and methods in terms of PPV and NPV.
Table notes: 1) Method utilized for analysis: stepPLR, stepwise penalized logistic regression with or without age and/or gender; mob, model-based CART (classification and regression trees) developed with or without age and/or gender; 2) LOOCV AUC, leave-one-out cross-validation area under the curve; 3) LOOCV ACC, leave-one-out cross-validation prediction accuracy; 4) 3CV AUC, three-fold cross-validation area under the curve; 5) 3CV ACC, three-fold cross-validation prediction accuracy.
Table notes: 1) Method utilized for analysis: stepPLR, stepwise penalized logistic regression with or without age and/or gender; mob, model-based CART (classification and regression trees) developed with or without age and/or gender; 2) optimal sensitivity and 95% confidence interval; 3) optimal sensitivity following leave-one-out cross-validation; 4) optimal specificity; 5) optimal specificity following leave-one-out cross-validation.
Models that adjusted for age effect generally showed a higher median PPV or NPV compared to those that did not (across all choices of λ and models considered), a result consistent with our previous findings [26]. A significantly higher increase in NPV was observed in models adjusting for age, compared to PPV, using both methods (median increase of 4.75% and 8.29% for stepPLR, and 3.3% and 5.78% for mob, respectively, when adjusted for gender effect and in the stratified male-only subset). For the stratified male only subset, mob improved PPV by 5.22% with the inclusion of age. The mob model based on AFP conditional on the tree analysis using GP73, AAT, Kininogen and age, after controlling for site, resulted in the maximum improvement in PPV of 8.93% and in NPV of 13.53% due to the inclusion of age for this subset. These compare with maximum increases of 3.57% for PPV (LR and PLR, λ = 0.1) and 9% for NPV (PLR, λ = 10) from our previous study [26]. When gender effect was adjusted for, the above mob model also showed the maximum increase in PPV (2.43%) due to the inclusion of age. On the other hand, stepPLR (λ = 10) resulted in the maximum increase in NPV of 6.1% compared to the 4.7% maximum increase achieved by PLR (λ = 10) in our previous study [26]. The highest PPV (93.1%) was achieved for the stratified male only data by stepPLR (λ = 10) across both methods and all models considered, also an improvement over the maximum 91.96% achieved in our previous study [26].
Figure 1 ROC curves based on multivariable stepwise penalized logistic regression models (stepPLR) using the stratified male-only subset. The age-adjusted final model for λ = 0.1 showed the best performance in terms of AUC. A clear distinction is seen in the ROC curves for age-adjusted models compared to age-unadjusted models. Age-adjusted models demonstrated superior performance overall across all choices of λ. See Table 1 for detailed results and the text for discussion of these results.
In terms of model-based sensitivity and specificity, both stepPLR and mob produced an improvement due to the inclusion of age in stratified male only data. stepPLR showed the highest overall increase in sensitivity (median increase of 5.94% across choices of λ) while mob showed the highest overall increase in specificity (median increase of 9.83% across all models considered). Once again, the mob model based on the biomarker AFP resulted in the maximum improvement in both sensitivity (6.49%) and specificity (17.16%) due to the inclusion of age for this subset. In comparison, LR and PLR (λ = 10) produced the greatest improvement (of over 6% each) in our previous study [26]. Model-based and cross-validation based sensitivities and specificities are displayed in Table 3. When gender effect was adjusted for in the model, a more sensitive model (increase of 5.04%) was afforded by stepPLR (λ = 1, 10) while mob provided a more specific (increase of 3.81%) model due to the inclusion of age (Table 1).
Figure 2 ROC curves based on multivariable stepwise penalized logistic regression models (stepPLR) adjusting for gender effect. Models that are also adjusted for age effect outperformed those that did not control for age, across all choices of the parameter λ. The age-adjusted final model for λ = 0.1 showed the best performance in terms of AUC. See Table 1 for detailed results and the text for discussion of these results.

Predictive performance of multivariable models using cross-validation
While model based metrics such as AUC, ACC, PPV and NPV provide a measure of the predictive performance of a model, equivalent versions of these quantities based on cross-validation are based on blinded, independent datasets and therefore provide the true predictive performance of the model. Table 2 presents the AUC (with 95% CI) and ACC for each model and method used based on LOOCV and 3CV. A considerable improvement in AUC was observed in models that included age across both methods for the stratified male only data. The median value of this increase was 5.5% for AUC based on LOOCV and 4% for AUC based on 3CV. When gender is accounted for in the model, the inclusion of age also results in an improvement in AUC of 3% for stepPLR (median value across choices of λ, based on both LOOCV and 3CV) and a 2.5% increase for mob. In terms of prediction accuracy, a significant improvement in ACC was observed in models that included age for both stepPLR and mob for the stratified male only data. For stepPLR, the median value of this increase was around 5.88% for ACC based on LOOCV and around 5.32% for ACC based on 3CV. For mob, the median value of this increase was around 7.51% for ACC based on LOOCV and around 1.93% for ACC based on 3CV. The mob model based on AFP conditional on the tree analysis using GP73, AAT, Kininogen and age, after controlling for site showed increases of 8.5% and 7.12% in ACC based on LOOCV and 3CV, respectively. When gender effect was accounted for in the model, the inclusion of age also resulted in improvements of 3.41% and 3.75% for stepPLR (median value across choices of λ) based on LOOCV and 3CV, respectively. On the other hand, the performance of mob was observed to vary between models and cross-validation methods. These results show an overall improvement in the predictive performance of models based on stepPLR and mob over our previous findings based on multivariable LR, PLR and CART [26]. 
Age-adjusted models demonstrated superior performance in terms of AUC. A clear distinction is seen in the ROC curves for age-adjusted models (solid lines) compared to age-unadjusted models (dotted lines). See Table 1 for detailed results and the text for discussion of these results.

Interpretation of model-based CART (mob) results
Multivariable model-based CART (mob) analyses revealed several different and interesting aspects of the data that more traditional methods such as multivariable LR and PLR are not capable of exposing. To a lesser extent, methods like stepPLR and CART (ctree) also suffer from this issue. Four different statistical models, one for each biomarker conditional on the tree analysis based on the remaining biomarkers, age and/or gender (controlling for site) were considered in this analysis. As noted earlier, mob combines a parametric approach based on generalized linear models with CART. In this case, the outcome variable is binary, i.e., whether a patient has HCC or not, and hence the parametric method of choice is logistic regression.
The model based on the biomarker GP73 conditional on the tree analysis using AFP, AAT, Kininogen and age, after controlling for site, showed excellent performance in terms of cross-validated measures, particularly when gender effect was also included (Table 2). This model also showed a substantial improvement in AUC and ACC due to the inclusion of age. We first use this model to illustrate mob results. Figures 5 and 6 graphically represent the results for this model when controlling for gender effect and for the stratified male only subset, respectively. When gender effect is controlled for in the model it is evident (from Figure 5) that age alone, independent of other biomarkers, plays a significant role in the incidence of HCC (p < 0.001) (n = 75 patients corresponding to node pair (1,9) in Figure 5). Older patients (>61) are at an increased risk of HCC incidence. Among those aged 61 or younger, a higher level of AFP (>1.48) is significantly associated with increased incidence of HCC (p = 0.017) irrespective of GP73 level (n = 66 patients corresponding to node pairs (1,2), (2,6) and (6,8)). The subgroup of 20 patients aged 61 or younger whose AFP level lies in the range (1.29,1.48] represents varying incidence of HCC depending on GP73 level.
Figure 4 ROC curves based on multivariable model-based CART analyses (mob) incorporating gender and/or age. Age-adjusted models demonstrated superior performance in terms of AUC when gender effect is accounted for in each model. A clear distinction is seen in the ROC curves for age-adjusted models (solid lines) compared to age-unadjusted models (dotted lines). Table 1 lists the performance measures for these models. A detailed discussion of the results is provided in the text.
For the stratified male only subset it is evident (from Figure 6) that age alone, independent of other biomarkers, plays a significant role in the incidence of HCC (p < 0.001) (n = 29 men corresponding to node pair (1,5) in Figure 6). Older men (>65) are at an increased risk of HCC incidence. Among those aged 65 or younger, higher level of AFP (>1.48) is significantly associated with increased incidence of HCC (p = 0.001) irrespective of GP73 level (n = 59 men corresponding to node pairs (1,2) and (2,4)). Even among younger men with lower levels of AFP (age <= 65, AFP <= 1.48), the incidence of HCC increases with higher GP73 levels (n = 93 corresponding to node pair (2,3)) as indicated by the increasing red curve.
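The partition described for the male-only subset can be written as a small decision rule. The Python sketch below uses only the cut-points and node numbers quoted in the text for Figure 6; the function name is hypothetical, AFP is assumed to be on the same scale as the reported cut-point, and the node-specific logistic models of HCC on GP73 are not reproduced because their coefficients are not given here.

```python
def male_subset_node(age, afp):
    """Terminal node of the Figure 6 tree for a male patient.

    Cut-points (age 65, AFP 1.48) and node numbers are those quoted in the text.
    """
    if age > 65:
        return 5  # older men: increased HCC incidence
    if afp > 1.48:
        return 4  # age <= 65 with higher AFP: increased HCC incidence
    return 3      # age <= 65 with AFP <= 1.48: incidence rises with GP73
```

In the mob framework each terminal node carries its own logistic regression of HCC status on GP73, so a full prediction would first route the patient to a node as above and then apply that node's fitted model.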
In terms of overall and consistent improvement in performance (evaluated by the various measures) due to the inclusion of age, the mob model based on AFP conditional on the tree analysis using GP73, AAT, Kininogen and age (after controlling for site) is the best performer for this subset. Figure 7 graphically illustrates this model. Once again, age alone plays a significant role in the incidence of HCC (p < 0.001) (n = 42 men corresponding to node pair (1,7) in Figure 7). Older men (>61) are at an increased risk of HCC incidence. This is consistent with the finding based on the gender-adjusted model shown in Figure 5. For men 61 years of age or younger, a higher level of GP73 (>6.3) is significantly associated with increased HCC incidence (p < 0.001) independent of AFP and AAT levels (n = 49 men corresponding to node pair (2,6) in Figure 7). However, men with a lower GP73 (<= 6.3) and higher Kininogen (>1.55) levels in this subgroup have an increased incidence of HCC with higher levels of AFP (n = 30 men corresponding to node pairs (2,3) and (3,5) in Figure 7). This is indicated by the steep increasing red line below node 5 in Figure 7.
Figure 5 Model-based CART analysis based on the biomarker GP73 conditional on the tree analysis using AFP, AAT, Kininogen and age, after controlling for site, for the complete data set. Variables that appear in the tree were involved in a statistically significant split (based on p-value < 0.05). Any two (or more) bins that appear at the bottom child nodes in this tree sharing the same mother node represent disjoint sub-groups of patients identified by this method to be (statistically) significantly different. The sub-groups are defined by the respective cut-points for biomarker levels and age. For example, when gender effect is controlled for in the model it is evident that age alone, independent of other biomarkers, plays a significant role in the incidence of HCC (p < 0.001). The node pair (1,9) represents the sub-group of 75 patients older than 61 years that have a significantly higher incidence of HCC compared to younger patients. This approach provides a unique, visual representation of complex interactions between biomarkers, age and gender, though gender is not found to be statistically significant in any of the interactions. In addition, it identifies potential cut-points for biomarker levels that are significantly associated with the incidence of HCC. A detailed interpretation of this tree is provided in the text.

Conclusions
HCC, like many cancers, is characterized by a large degree of heterogeneity. This makes the detection of cancer by serum biomarkers difficult, which results in late detection and poor outcomes. Given this genetic heterogeneity, it is generally assumed that no single serum biomarker will be able to detect all cases of HCC. Currently, serum levels of AFP are used in combination with several imaging methodologies to identify HCC. However, the clinical usefulness of AFP is limited by its poor sensitivity: AFP is elevated in only 60-70% of individuals with HCC. Importantly, AFP-negative cancers are thought to be fundamentally different, at the genetic level, from AFP-positive tumors. Thus, serum AFP is useful for detecting a specific type of HCC, and multiple markers are expected to be required to detect all cases.
Figure 6 Model-based CART analysis based on the biomarker GP73 conditional on the tree analysis using AFP, AAT, Kininogen and age, after controlling for site, for the stratified male-only subset. For this subset, age alone, independent of other biomarkers, plays a significant role in the incidence of HCC (p < 0.001). The node pair (1,5) represents the sub-group of 29 men aged >65 that have a significantly higher incidence of HCC. The node pairs (1,2) and (2,4) represent the sub-group of 59 men aged 65 or younger for whom a higher AFP level (>1.48) is significantly associated with increased incidence of HCC (p = 0.001) irrespective of GP73 level. A detailed interpretation of this tree is provided in the text.

In this paper, we demonstrated the usefulness of incorporating multiple biomarkers and relevant clinical variables into a statistical model for predicting the incidence of HCC. We built on the foundation provided by our recent work [26] and investigated the predictive performance of two different yet related methods, namely stepPLR and mob, in distinguishing HCC patients from cirrhotic patients. These two methods are extensions of PLR and CART discussed in our previous study, the former incorporating stepwise model selection in PLR and the latter incorporating a model-based approach to CART. Both approaches provided significantly improved results, not only compared to the use of single and multiple biomarkers (univariate and multivariable LR) but also compared to their counterparts, PLR and CART.

A novel aspect of our previous approach was the application of CART for analyzing and interpreting biomarker data for HCC. This non-parametric approach is a useful alternative to traditional parametric methods like LR and PLR, and it automatically incorporates interactions between multiple biomarkers and/or clinical variables. The extension of this approach using mob in this paper borrows strength from the binary recursive partitioning approach in CART as well as the parametric approach in traditional multivariable LR and is based on generalized linear models. This is reflected in the significantly improved predictive performance of this method relative to those based on PLR and CART presented in our recent study [26]. This flexible modeling approach provided potentially useful cut-offs for biomarkers and clinical variables alike that indicated a statistically significant association with increased HCC incidence in an interpretable and systematic manner. The two methods outlined in this paper can be seen as complementary to PLR and CART, and they set the stage for further evaluation and validation of the clinical significance of these results in future, larger studies.

An important finding in this study, as in our previous study, is the marked improvement in predictive performance due to the inclusion of clinical factors such as age and gender. This improvement was seen to be independent of the method used in the analysis. The inclusion of other clinical factors such as alanine transaminase (ALT), aspartate transaminase (AST) and alkaline phosphatase (ALK) levels may increase performance even further. This is currently under investigation.
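The way mob pairs recursive partitioning with a parametric model in each terminal node, as described in these conclusions, can be illustrated with a deliberately simplified sketch. It assumes a single, fixed cut-point and fits a separate one-variable logistic regression in each resulting sub-group. It omits the statistical tests that mob uses to select split variables and cut-points, and it is not the implementation used in this study. All names (`fit_logistic`, `fit_leaf_models`, `toy_rows`, the cut-point of 60) are hypothetical, and the data are synthetic values invented purely for illustration.

```python
import math


def fit_logistic(xs, ys, steps=2000, lr=0.1):
    """Fit P(y=1 | x) = sigmoid(b0 + b1*x) by gradient ascent on the log-likelihood."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            g0 += y - p          # gradient with respect to the intercept
            g1 += (y - p) * x    # gradient with respect to the slope
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1


def fit_leaf_models(rows, split_var, cut, model_var, outcome):
    """Partition rows at a fixed cut-point, then fit one logistic model per leaf."""
    leaves = {"le": [], "gt": []}
    for r in rows:
        leaves["gt" if r[split_var] > cut else "le"].append(r)
    return {
        name: fit_logistic([r[model_var] for r in members],
                           [r[outcome] for r in members])
        for name, members in leaves.items()
    }


# Synthetic toy data (NOT from the study): two age groups, one marker, binary outcome.
toy_rows = (
    [{"age": 50, "marker": m, "hcc": y}
     for m, y in zip(range(6), [0, 0, 1, 0, 1, 1])]
    + [{"age": 70, "marker": m, "hcc": y}
       for m, y in zip(range(6), [1, 1, 0, 1, 1, 1])]
)

models = fit_leaf_models(toy_rows, split_var="age", cut=60,
                         model_var="marker", outcome="hcc")
```

Each entry of `models` holds the intercept and slope for one sub-group, which mirrors how the trees in Figures 5 to 7 show a separate fitted curve beneath each terminal node.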