
Table 2 Demographic and clinical characteristics of training and test sets focusing on within-indication subjects

From: Improving lung cancer risk stratification leveraging whole transcriptome RNA sequencing and machine learning across multiple cohorts

Characteristic  Training AEGIS (N = 189)  Training Registry (N = 122)  Test AEGIS (N = 246)  Test Registry (N = 166)  P-value
Sex      0.36
 Female 72 65 83 84  
 Male 117 57 163 82  
Median age (IQR) 62 (54–70) 64 (57–71) 62 (54–70) 65 (58–71) 0.45
Race      0.59
 White 141 106 192 132  
 Black 34 14 42 29  
 Other 11 2 12 4  
 Unknown 3 0 0 1  
Smoking status      0.45
 Current 79 48 107 73  
 Former 110 74 139 93  
Median cumulative tobacco use (IQR) – pack-year 40 (18–57) 35 (20–50) 35 (20–56) 35 (20–56) 0.82
Lesion size      < 0.001
 Infiltrate 0 0 25 0  
 < 2 cm 42 61 88 80  
 2 to 3 cm 30 29 48 29  
 > 3 cm 41 26 75 44  
 Unknown 60 6 10 13  
Lesion location      0.47
 Central 50 9 72 10  
 Peripheral 78 107 108 144  
 Central and peripheral 46 0 53 0  
 Unknown 15 6 13 12  
Lung-cancer histologic type      0.025
 Small-cell 8 3 8 1  
 Non-small-cell 69 48 100 43 0.18
  Adenocarcinoma 30 25 58 25  
  Squamous 28 12 26 10  
  Large-cell 6 1 4 0  
  Non-small-cell not otherwise specified 5 10 12 8  
 Other 0 2 0 2  
 Unknown 21 3 3 6  
Diagnosis of a benign condition      < 0.001
 Fibrosis 1 0 1 0  
 Granuloma 15 6 26 10  
 Infection 30 15 36 15  
 Inflammation 4 2 1 2  
 Multiple 6 0 8 0  
 Other 17 4 25 2  
 Resolution or stability 18 39 38 40  
 Clinically benign 0 0 0 45