The 777 study participants consisted of non-Hispanic white adults (322 male and 455 female) from 357 sibships that were initially enrolled in the Genetic Epidemiology Network of Arteriopathy (GENOA) study, a community-based study of hypertensive sibships that aims to identify genes influencing blood pressure (BP) [9, 13]. The study was approved by the Institutional Review Board of Mayo Clinic, Rochester MN, and written informed consent was obtained from each participant. In the initial phase of the GENOA study (9/1995 to 6/2001), sibships containing ≥ 2 individuals with essential hypertension diagnosed before age 60 years were selected for participation. Participants returned for a second phase of the study (12/2000 to 6/2004) which included a physical examination and measurement of conventional and novel risk factors.
As an ancillary study of GENOA conducted between August 2001 and May 2006, the Genetics of Microangiopathic Brain Injury (GMBI) study was undertaken to determine susceptibility genes for ischemic brain injury. Leukoaraiosis was quantified by magnetic resonance imaging (MRI) in 916 non-Hispanic white subjects who participated in the second phase of the GENOA study, had a sibling willing and eligible to participate in the GMBI study, and had no history of stroke or neurological disease and no implanted metal devices. The median time between the second GENOA examination and the GMBI brain MRI was 11.9 months. Brain MRIs were suitable for analysis in 883 of the 916 participants; in the 33 without analyzable data, the most common reasons were unsuspected prior brain infarctions, masses, metallic artifacts, and failure to complete the MRI. After removing individuals who did not have genotyping data available, the final analysis subset consisted of 777 GMBI participants.
Clinical Assessments and Covariate Definitions
The diagnosis of hypertension was established based on BP levels measured at the study visit (>140 mmHg average systolic BP or >90 mmHg average diastolic BP) or a prior diagnosis of hypertension and current treatment with antihypertensive medications. Height was measured by stadiometer, weight by electronic balance, and body mass index (BMI) was calculated as weight in kilograms divided by the square of height in meters. Resting systolic blood pressure (SBP) and diastolic blood pressure (DBP) were measured by a random zero sphygmomanometer, and pulse pressure was calculated as the difference between SBP and DBP. A participant was considered to have ever smoked if they had smoked more than 100 cigarettes in their lifetime, to have coronary heart disease if they had ever experienced a myocardial infarction or had surgery for a blocked artery in the heart or neck (carotid artery), and to be obese if they had a BMI > 30 kg/m2.
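The covariate definitions in this paragraph can be written down as a minimal Python sketch. The `Visit` record and its field names are hypothetical and chosen only for illustration; the thresholds are the ones stated in the text.

```python
from dataclasses import dataclass


@dataclass
class Visit:
    """One participant's exam data (hypothetical field names)."""
    sbp: float                 # average systolic BP, mmHg
    dbp: float                 # average diastolic BP, mmHg
    weight_kg: float
    height_m: float
    prior_htn_dx: bool         # prior diagnosis of hypertension
    on_bp_meds: bool           # current antihypertensive treatment
    lifetime_cigarettes: int


def is_hypertensive(v: Visit) -> bool:
    # Elevated measured BP, or a prior diagnosis with current treatment.
    return v.sbp > 140 or v.dbp > 90 or (v.prior_htn_dx and v.on_bp_meds)


def bmi(v: Visit) -> float:
    # Weight in kilograms divided by the square of height in meters.
    return v.weight_kg / v.height_m ** 2


def pulse_pressure(v: Visit) -> float:
    # Difference between systolic and diastolic BP.
    return v.sbp - v.dbp


def is_obese(v: Visit) -> bool:
    return bmi(v) > 30


def ever_smoked(v: Visit) -> bool:
    # More than 100 cigarettes over the lifetime.
    return v.lifetime_cigarettes > 100
```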
Blood was drawn by venipuncture after an overnight fast. Serum triglycerides (TG), creatinine, total cholesterol, and high-density lipoprotein (HDL) cholesterol were measured by standard enzymatic methods on a Hitachi 911 Chemistry Analyzer (Roche Diagnostics, Indianapolis IN), and low-density lipoprotein (LDL) cholesterol levels were calculated using the Friedewald formula. Five novel vascular risk factors (C-reactive protein, homocysteine, fibrinogen, Lp(a), and LDL particle size) were also measured. C-reactive protein was measured by a highly sensitive immunoturbidimetric assay, fibrinogen was measured by the Clauss (clotting time based) method, and plasma homocysteine was measured by high-pressure liquid chromatography. Lp(a) in serum was measured by an immunoturbidimetric assay using the SPQ™ Test System (Diasorin, Stillwater MN) as previously described, and LDL particle size was measured by polyacrylamide gel electrophoresis. Level of physical activity was calculated as a continuous variable based on the self-reported average number of hours per day that the subject engaged in heavy, moderate, and sedentary activities according to the following formula: 2*Heavy + Moderate - 2*Sedentary.
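As an illustration of the two derived variables in this paragraph, the sketch below computes LDL cholesterol with the Friedewald formula (LDL = total cholesterol - HDL - TG/5) and the physical activity score. It assumes all lipid values are in mg/dL, which the text does not state. The refusal to compute LDL when triglycerides reach 400 mg/dL is the conventional restriction on the formula, not something the text specifies. Function names are hypothetical.

```python
def friedewald_ldl(total_chol: float, hdl: float, triglycerides: float) -> float:
    """Estimate LDL cholesterol (mg/dL) as TC - HDL - TG/5.

    Assumes mg/dL units. The TG >= 400 mg/dL cutoff is the conventional
    limit of the formula and is an assumption here, not taken from the text.
    """
    if triglycerides >= 400:
        raise ValueError("Friedewald estimate is unreliable for TG >= 400 mg/dL")
    return total_chol - hdl - triglycerides / 5.0


def activity_score(heavy_h: float, moderate_h: float, sedentary_h: float) -> float:
    """Physical activity score from average hours per day in each category."""
    return 2 * heavy_h + moderate_h - 2 * sedentary_h
```

Under this scoring, a mostly sedentary day yields a negative score, so the variable is centered on the balance between active and sedentary time rather than on total activity.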
Leukoaraiosis volume (cm3) was obtained via magnetic resonance imaging (MRI) in a separate clinical visit. All MRI scans were performed on identically equipped Signa 1.5 T MRI scanners (GE Medical Systems, Waukesha, WI, USA) and images were centrally processed at the Mayo Clinic. Symmetric head positioning with respect to orthogonal axes was verified by a series of short scout scans. Total intracranial volume (head size) was measured from T1-weighted spin echo sagittal images, each set consisting of 32 contiguous 5 mm thick slices with no interslice gap, field of view = 24 cm, matrix = 256 × 192, obtained with the following sequence: scan time = 2.5 min, echo time = 14 ms, repetitions = 2, repetition time = 500 ms. Total brain and leukoaraiosis volumes were determined from axial fluid-attenuated inversion recovery (FLAIR) images, each set consisting of 48 contiguous 3-mm interleaved slices with no interslice gap, field of view = 22 cm, matrix = 256 × 160, obtained with the following sequence: scan time = 9 min, echo time = 144.8 ms, inversion time = 2,600 ms, repetition time = 26,002 ms, bandwidth = +/- 15.6 kHz, one signal average. A FLAIR image is a T2-weighted image with the signal of the cerebrospinal fluid nulled, such that brain pathology appears as the brightest intracranial tissue. Interactive image processing steps were performed by a research associate who had no knowledge of the subjects' personal or medical histories or biological relationships. A fully automated algorithm was used to segment each slice of the edited multi-slice FLAIR sequence into voxels assigned to one of three categories: brain, cerebrospinal fluid, or leukoaraiosis. The mean absolute error of this method is 1.4% for brain volume and 6.6% for leukoaraiosis volume, and the mean test-retest coefficient of variation is 0.3% for brain volume and 1.4% for leukoaraiosis volume.
White matter hyperintensities in the corona radiata and periventricular zone, as well as central gray infarcts (i.e., lacunes), were included in the global leukoaraiosis measurements. Brain scans with cortical infarctions were excluded from the analyses because of the distortion of the leukoaraiosis volume estimates that would be introduced in the automated segmentation algorithm.
One thousand nine hundred and fifty-six SNPs from 268 genes known or hypothesized to be involved in blood pressure regulation, lipoprotein metabolism, inflammation, oxidative stress, vascular wall biology, obesity and diabetes were identified from the genetic association literature and positional candidate gene studies. SNPs were chosen based on a number of different criteria including the published literature, non-synonymous SNPs with a minor allele frequency (MAF) > 0.02, and tag SNPs identified using public databases such as dbSNP (http://www.ncbi.nlm.nih.gov/SNP/) and the Seattle SNPs database (http://pga.mbt.washington.edu).
DNA was isolated using the PureGene DNA Isolation Kit from Gentra Systems (Minneapolis MN). Genotyping, based on polymerase chain reaction (PCR) amplification techniques, was conducted at the University of Texas-Health Sciences Center at Houston using the TaqMan assay and ABI Prism® Sequence Detection System (Applied Biosystems, Foster City CA). Primers and probes are available from the authors upon request. Quality control measures for genotyping assays included robotic liquid handling, separate pre- and post-PCR areas, standard protocols and quality control analyses including 5% duplicates, positive and negative controls, computerized sample tracking, and data validity checks. After removal of SNPs that were monomorphic in the study sample, 1649 SNPs remained for analysis (see Additional file 1).
All analyses were carried out using the R statistical language, version 2.8. Covariate correlations were estimated using Pearson's product moment correlation. Linkage disequilibrium (LD), as measured by r2, was estimated using an expectation maximization (EM) algorithm. Hardy-Weinberg Equilibrium was assessed using a chi-square test, or Fisher's exact test if a genotype class had fewer than 5 individuals. Variables that showed a large deviation from a normal distribution in diagnostic plots, including leukoaraiosis, were transformed by taking the natural logarithm. The outcome variable for all models is the residual value of the natural logarithm of leukoaraiosis volume (cm3) after adjustment for age, sex, and total brain volume.
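The chi-square test of Hardy-Weinberg Equilibrium mentioned above can be sketched as follows for a biallelic SNP. This is an illustrative stdlib-only Python version, not the authors' R code, and the function names are hypothetical. It uses the 1-degree-of-freedom goodness-of-fit test, whose upper-tail probability equals erfc(sqrt(chi2 / 2)). The exact test used for sparse genotype classes is not implemented, only flagged.

```python
import math


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int):
    """Chi-square test (1 df) of Hardy-Weinberg proportions for genotype counts."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)      # frequency of allele A
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        raise ValueError("monomorphic SNP: the test is undefined")
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


def needs_exact_test(n_aa: int, n_ab: int, n_bb: int) -> bool:
    # The text switches to Fisher's exact test when a class has fewer than 5.
    return min(n_aa, n_ab, n_bb) < 5
```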
In the first stage of the analysis, we tested for association between leukoaraiosis and each of the predictor variables (SNPs and quantitative covariates) using least-squares linear regression methods [24, 25]. Categorical covariates were modeled using logistic regression. We also tested for association between each SNP and covariate to identify potential confounders. To determine whether interactions among predictors explained additional variation in the outcome, we tested pairwise interactions among all possible pairs of predictors (i.e., SNP-SNP, SNP-covariate, and covariate-covariate interactions) for all covariates and the 444 SNPs that had a model p-value < 0.2 in the association testing described above. Associations involving interactions were assessed with a partial F test, which compares a full model that includes both the interaction terms and the main effects of the variables comprising the interaction terms to a reduced model that includes only the main effects. Models with a p-value < 0.1 (for single variable models) or a partial F p-value < 0.1 (for models with interaction terms) were evaluated in the next stage of analysis.
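The partial F test can be illustrated with a small stdlib-only Python sketch. It assumes both predictors are numeric (for a SNP, an additive 0/1/2 coding), so the interaction contributes a single degree of freedom. The text does not state the genotype coding, and a genotype modeled as a three-level factor would add more interaction terms. The helper names are hypothetical, and the p-value would come from the upper tail of an F distribution with the returned degrees of freedom.

```python
def ols_sse(X, y):
    """Residual sum of squares from a least-squares fit of y on X plus an intercept."""
    rows = [[1.0] + [float(v) for v in r] for r in X]
    k = len(rows[0])
    # Augmented normal equations [X'X | X'y].
    M = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, k):
            f = M[r][col] / M[col][col]
            for j in range(col, k + 1):
                M[r][j] -= f * M[col][j]
    # Back substitution for the coefficients.
    beta = [0.0] * k
    for i in reversed(range(k)):
        tail = sum(M[i][j] * beta[j] for j in range(i + 1, k))
        beta[i] = (M[i][k] - tail) / M[i][i]
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, yi in zip(rows, y))


def partial_f_interaction(x1, x2, y):
    """F statistic comparing y ~ x1 + x2 + x1*x2 (full) with y ~ x1 + x2 (reduced)."""
    reduced = [[a, b] for a, b in zip(x1, x2)]
    full = [[a, b, a * b] for a, b in zip(x1, x2)]
    sse_reduced = ols_sse(reduced, y)
    sse_full = ols_sse(full, y)
    df_num = 1                     # one extra term in the full model
    df_den = len(y) - 4            # n minus the full model's 4 coefficients
    f_stat = ((sse_reduced - sse_full) / df_num) / (sse_full / df_den)
    return f_stat, df_num, df_den
```

A large F statistic indicates that the interaction term explains variation beyond the two main effects.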
To reduce false positives, we used three different approaches: 1) adjustment for multiple testing using the False Discovery Rate (FDR) < 0.30, 2) internal replication with two subsets of the data (constructed so individuals were unrelated within subset), and 3) four-fold cross-validation (repeated 10 times). To create internal replication subsets, we randomly selected one sibling from each sibship without replacement to create subset 1 and then randomly selected another sibling from each sibship to create subset 2. The GMBI cohort contained a small number of singletons (i.e., subjects who had no enrolled sibling) that were equally divided between the two samples. Associations that had a p-value < 0.1 in both subsets were considered internally replicated if the effect of the genotype was homogeneous among subsets (the partial F p-value > 0.05 from a test of the interaction between subset designation and the predictor(s) under consideration).
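The construction of the two internal replication subsets might look like the following Python sketch. The input layout (a dictionary from sibship ID to member IDs), the function name, and the fixed random seed are assumptions made for illustration. Singletons are shuffled and dealt alternately, which is one way to divide them equally between the two subsets.

```python
import random


def make_replication_subsets(sibships, seed=0):
    """Split participants into two subsets of mutually unrelated individuals.

    sibships: dict mapping a sibship ID to a list of participant IDs
    (hypothetical layout). Returns (subset1, subset2).
    """
    rng = random.Random(seed)
    subset1, subset2, singletons = [], [], []
    for sib_id in sorted(sibships):
        members = list(sibships[sib_id])
        if len(members) == 1:
            singletons.append(members[0])
            continue
        # Draw two different siblings without replacement.
        first, second = rng.sample(members, 2)
        subset1.append(first)
        subset2.append(second)
    # Divide singletons as evenly as possible between the two subsets.
    rng.shuffle(singletons)
    subset1.extend(singletons[0::2])
    subset2.extend(singletons[1::2])
    return subset1, subset2
```

Because each sibship contributes at most one member to each subset, ordinary regression within a subset does not need to account for family structure.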
Cross-validation significantly reduces false positive results by eliminating associations that lack predictive ability in independent test samples. For each association, we performed four-fold cross-validation by dividing the full sample into four equally sized groups. Three of the four groups were combined into a training dataset, and the modeling strategy outlined above was carried out to estimate model coefficients. These coefficients were then applied to the fourth group, the testing dataset, to predict the value of the outcome variable for each individual in this independent test sample. This process was repeated for each of the four testing sets. Predicted values for all individuals in the test set were then subtracted from their observed values, yielding the total residual variability (SSE),

$$\mathrm{SSE} = \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2 ,$$

where $y_i$ is the observed value and $\hat{y}_i$ the predicted value of the outcome for individual $i$. The total variability in the outcome (SST) – the difference between each individual's observed value and the mean value for the outcome – was then calculated,

$$\mathrm{SST} = \sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2 ,$$

where $\bar{y}$ is the mean value of the outcome. In order to estimate the proportion of variation in the outcome predicted in the independent test samples, the cross-validated R2 (CV R2) was calculated as follows:

$$\mathrm{CV}\ R^2 = \frac{\mathrm{SST} - \mathrm{SSE}}{\mathrm{SST}} .$$

This cross-validation method provides a more accurate measure of the predictive ability of the genetic models and will be negative when the model's predictive ability is poor. Because random variations in the sampling of the four mutually exclusive test groups can potentially impact the estimates of CV R2, this procedure was repeated 10 times and the CV R2 values were averaged.
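The repeated four-fold cross-validation can be sketched in stdlib-only Python for the simplest case of a single numeric predictor. Here SSE is taken as the sum of squared differences between observed and out-of-fold predicted values, SST as the sum of squared deviations from the overall mean of the outcome, and CV R2 as (SST - SSE) / SST. Using the overall mean rather than a per-fold mean, and varying the random seed across repeats, are assumptions of this sketch. The function names are hypothetical.

```python
import random


def fit_line(x, y):
    """Least-squares intercept and slope for one predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)   # zero if x is constant in this fold
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - slope * mx, slope


def cv_r2(x, y, k=4, seed=0):
    """Cross-validated R2 = (SST - SSE) / SST from one k-fold split."""
    rng = random.Random(seed)
    idx = list(range(len(y)))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]   # k mutually exclusive test groups
    sse = 0.0
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        a, b = fit_line([x[i] for i in train], [y[i] for i in train])
        # Squared prediction errors on individuals not used for fitting.
        sse += sum((y[i] - (a + b * x[i])) ** 2 for i in test)
    y_mean = sum(y) / len(y)
    sst = sum((yi - y_mean) ** 2 for yi in y)
    return (sst - sse) / sst


def mean_cv_r2(x, y, k=4, repeats=10, seed=0):
    """Average CV R2 over repeated random splits."""
    return sum(cv_r2(x, y, k, seed + r) for r in range(repeats)) / repeats
```

When a predictor carries no information about the outcome, the out-of-fold predictions tend to be worse than the overall mean, so SSE exceeds SST and the CV R2 is negative. A fold in which the predictor is constant makes the fit undefined, which is the kind of failure small genotype classes can cause.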
Univariate associations were considered cross-validated if the average percent variation predicted in independent test samples was greater than 0.5% and interactions were considered cross-validated if the difference in average percent variation predicted in independent test samples between the full model containing the interaction term and the reduced model containing only main effect terms was greater than 0.5%. Using permutation testing on the models investigated in this paper, we found that the probability of observing a CV R2 × 100 greater than 0.5% by chance alone was less than 5%. That is, Pr(CV R2 × 100 > 0.5%) < 0.05 under the null hypothesis of no association. Due to small cell sizes (<4 subjects in a particular class), 0.3% of the SNP-covariate interaction models and 2.3% of the SNP-SNP interaction models were unable to complete the cross-validation procedure.
All single SNP or interaction models that passed the three different approaches for reducing false positives (FDR, internal replication, and cross-validation) were modeled using linear mixed effects (LME) modeling, which accounts for the sibship structure among GMBI study participants while retaining a valid type I error rate. Associations with a p-value < 0.1 in the F test (described above) but a p-value > 0.1 from the likelihood ratio test of the appropriate full and reduced mixed effects models were considered to be associations due to family structure and were removed from the results.
To visualize the genetic architecture of leukoaraiosis volume, we applied a novel data visualization scheme, the KGraph, described in Kelly et al. The KGraph was developed for the visualization of genetic association results and the underlying relationships among predictors such as SNP-SNP frequency correlations (i.e., LD), SNP-covariate associations, and covariate-covariate correlations. It simultaneously displays both significant univariate associations and pairwise interactions with the outcome of interest, leukoaraiosis volume, as well as the underlying correlation structure among the predictor variables.
In the final step, multivariable linear regression models combining the most predictive SNPs, covariates, and their interactions were constructed. The top four single SNP, SNP-covariate, and SNP-SNP interaction models were chosen for multivariable modeling based on the following criteria: 1) passed all three filters to reduce false positive associations (FDR, internal replication, and cross-validation), 2) had the highest CV R2 values of the particular modeling strategy, and 3) did not involve SNPs in strong LD with SNPs already included in the multivariable model. Percent variation in leukoaraiosis volume explained by each model was assessed with the adjusted R2 value, and predictive ability of the models was assessed by four-fold, ten-iteration cross-validation (CV R2 value).