Validation of PhenX measures in the Personalized Medicine Research Project for use in gene/environment studies

Background: The purpose of this paper is to describe the data collection efforts and validation of PhenX measures in the Personalized Medicine Research Project (PMRP) cohort.
Methods: Thirty-six measures were chosen from the PhenX Toolkit within the following domains: demographics; anthropometrics; alcohol, tobacco and other substances; cardiovascular; environmental exposures; cancer; psychiatric; neurology; and physical activity and physical fitness. Eligible subjects were living PMRP participants with known addresses who had consented to future contact, were not currently living in a nursing home, and had available GWAS data from eMERGE I, in which age-related cataract, HDL, dementia and resistant hypertension were the primary phenotypes; this requirement biased the sample toward the older PMRP participants. The questionnaires were mailed twice. Data from the PhenX measures were compared with information from PMRP questionnaires and data from Marshfield Clinic electronic medical records.
Results: Completed PhenX questionnaires were returned by 2271 subjects for a final response rate of 70%. The mean age reported on the PhenX questionnaire (73.1 years) was greater than on the PMRP questionnaire (64.8 years) because the data were collected at different time points. The mean self-reported weight, and subsequently calculated BMI, were lower on the PhenX survey than the values measured at the time of enrollment into PMRP (PhenX means 173.5 pounds and BMI 28.2 kg/m^2 versus PMRP 182.9 pounds and BMI 29.6 kg/m^2). There was 95.3% agreement between the two questionnaires about having ever smoked at least 100 cigarettes. A total of 139 subjects (6.2%) indicated on the PhenX questionnaire that they had been told they had a stroke. Of them, only 15 (10.8%) had no electronic indication of a prior stroke or TIA. All of the age- and gender-specific 95% confidence limits around point estimates for major depressive episodes overlap and show that 31% of women aged 50-64 reported symptoms associated with a major depressive episode.
Conclusions: The approach employed resulted in a high response rate and valuable data for future gene/environment analyses. These results and the high response rate highlight the utility of the PhenX Toolkit to collect valid phenotypic data that can be shared across groups to facilitate gene/environment studies.


Background
The National Human Genome Research Institute funded the development of consensus measures for Phenotypes and eXposures (PhenX) [1,2]. The goal of PhenX was to develop 15 measures for each of 21 different phenotypic domains. Data collection worksheets are available through the PhenX Toolkit (www.phenxtoolkit.org), with the hope that broad acceptance and use of the PhenX measures will allow for cross-study comparisons and improve the statistical power for gene/environment analyses in the context of genome-wide association studies (GWAS). PhenX measures were selected by working groups of domain experts using a consensus process that included input from the scientific community.
The eMERGE network (www.gwas.net), also funded by the National Human Genome Research Institute, is a national consortium formed to develop, disseminate, and apply approaches to research that combine DNA biorepositories with electronic medical record (EMR) systems for large-scale, high-throughput genetic research [3]. The Marshfield Clinic Personalized Medicine Research Project (PMRP) [4] was one of the five initial eMERGE sites, with cataract, HDL and diabetic retinopathy as the primary phenotypic outcomes.
An administrative supplement funded by NHGRI to the eMERGE grant allowed PMRP investigators to collect PhenX measures for subjects with available GWAS data from eMERGE. The PMRP team was one of seven sites making up the PhenX RISING network, which was funded through administrative supplements to incorporate PhenX measures into existing population-based genomic studies (https://www.phenxtoolkit.org/index.php?pageLink=phenxrising). In total, the seven groups incorporated 76 PhenX measures, representing a quarter of the 295 measures present in the Toolkit as of July 2011. The measures encompass demographics, psychosocial risk factors, psychiatric assessments, and a variety of exposures. Each group is adding between 4 and 37 measures, with five groups, including PMRP, adding more than 20 measures. In all, 55 of these 81 measures are shared by two or more groups, providing common ground for future cross-study analysis.
The purpose of this paper is to describe the data collection efforts and validation of the PhenX measures in the PMRP cohort.

Methods
The Marshfield Clinic Personalized Medicine Research Project (PMRP) is a population-based biorepository linked to the comprehensive electronic medical record of Marshfield Clinic, details of which have been published previously [4]. Self-administered questionnaire data are available for the cohort to facilitate gene/environment analyses, including the detailed Dietary History Questionnaire [5].
As part of the initial written informed consent to participate in PMRP, subjects were given the option to opt out of future contact. Fewer than 1% of subjects elected this option. Eligibility criteria for the current study included: living PMRP subjects with known addresses who consented to future contact and were not currently living in a nursing home. In addition, subjects were required to have available GWAS data from eMERGE I, where age-related cataract, HDL, dementia and resistant hypertension were the primary phenotypes [6], thus biasing the sample toward the older PMRP participants.
The current study was reviewed and approved by the institutional review boards at Marshfield Clinic and Essentia Institute of Rural Health. The PhenX Toolkit (www.phenxtoolkit.org) was accessed to develop a self-administered questionnaire that included the 36 items listed in Table 1. Also listed in Table 1 are all data elements available for comparison with PMRP. Some of the PhenX measures were included because of the potential for gene/environment associations with age-related cataract (smoking, alcohol, ultraviolet light exposure). Others were included because data were available for validation by comparison with prior PMRP questionnaire data and medical history information (demographics, physical activity, family history of heart attack, history of stroke). The rest were included because of the potential for future research and cross-site collaborations (hypomania/mania symptoms, hand dominance) within the PhenX RISING network funded through administrative supplements to collect PhenX measures. The time to complete the questionnaire ranged from 20 to 40 minutes in pre-testing, depending on how many questions were logical skips.
The 32-page self-administered questionnaire was mailed to all eligible subjects with a cover letter and a return-addressed envelope. A second mailing was employed to increase the response rate. Subjects were offered $10 for their time to complete the questionnaire. The mailings occurred at the end of 2011 and in the first months of 2012. The majority of PMRP participants were enrolled between September 2002 and April 2004, so there is a considerable time lag between completion of the two questionnaires.
PhenX survey data were entered and merged with prior PMRP questionnaire information and data about prior stroke from clinical diagnoses in the Marshfield Clinic electronic medical record. Analyses in this report include standard descriptive statistics and approximate confidence limits. For validation purposes, the clinical diagnoses and measurements from the electronic medical record were considered to be the gold standard when they were used for comparison. The signed-ranks test was used to compare PhenX self-reported weight and BMI with measurements at PMRP enrollment. Simple kappa statistics and 95% CL were calculated for nominal categories, and Fleiss-Cohen weighted kappas and 95% CL for ordinal classifications.
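To make the agreement statistics named above concrete, the following Python sketch computes a simple kappa and a Fleiss-Cohen (quadratic) weighted kappa from a square agreement table. It is an illustration only, not the software used in the study: the function name and the example tables are hypothetical, equally spaced ordinal categories are assumed for the weights, and the confidence limits are not computed here.

```python
from typing import Sequence


def agreement_kappa(table: Sequence[Sequence[float]], weighted: bool = False) -> float:
    """Kappa for a square k x k agreement table (hypothetical helper).

    table[i][j] counts subjects placed in category i by source A and in
    category j by source B. With weighted=True, Fleiss-Cohen (quadratic)
    agreement weights w_ij = 1 - (i - j)^2 / (k - 1)^2 are used, which
    assumes equally spaced ordinal categories. Otherwise only exact
    matches count as agreement (simple kappa).
    """
    k = len(table)
    n = float(sum(sum(row) for row in table))
    row_prop = [sum(row) / n for row in table]
    col_prop = [sum(table[i][j] for i in range(k)) / n for j in range(k)]

    def weight(i: int, j: int) -> float:
        if weighted:
            return 1.0 - ((i - j) ** 2) / float((k - 1) ** 2)
        return 1.0 if i == j else 0.0

    # Observed and chance-expected (weighted) agreement.
    p_obs = sum(weight(i, j) * table[i][j] / n
                for i in range(k) for j in range(k))
    p_exp = sum(weight(i, j) * row_prop[i] * col_prop[j]
                for i in range(k) for j in range(k))
    return (p_obs - p_exp) / (1.0 - p_exp)


# Hypothetical counts, not study data.
nominal_table = [[40, 10],
                 [5, 45]]
ordinal_table = [[20, 5, 0],
                 [5, 20, 5],
                 [0, 5, 20]]

simple_kappa = agreement_kappa(nominal_table)
unweighted_ordinal = agreement_kappa(ordinal_table)
weighted_ordinal = agreement_kappa(ordinal_table, weighted=True)
print(simple_kappa, unweighted_ordinal, weighted_ordinal)
```

Because kappa subtracts the agreement expected by chance from the marginal totals, two comparisons with the same percent agreement can yield different kappa values when the categories are unevenly distributed. The weighted form gives partial credit to near misses on an ordinal scale, so it is typically higher than the simple kappa for the same table.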

Results
Questionnaires were mailed to 3344 PMRP participants with GWAS data. The denominator decreased to 3246 after participants were removed for eligibility reasons (no known address, current nursing home residence, deceased). Completed questionnaires were returned by 2271 subjects for a final response rate of 70%. Upon comparing age and gender responses with Marshfield Clinic EMR data, it was determined that two of the respondents were the spouses of intended respondents who had participated in the PMRP biobank but for whom GWAS data were not available. Fifty-nine percent of the respondents were female, reflecting a similar response rate by gender (Table 2). The vast majority of the study population reported being White (96.2%) and of non-Hispanic (93.3%) ethnicity. The gender and race/ethnicity of the respondents to the PhenX survey are nearly identical to those of the original PMRP cohort, which is similar to the general population of central Wisconsin, other than an under-representation of men, who were less likely to participate initially in the PMRP biobank [4]. There was good agreement between the PhenX Toolkit questions and the PMRP questionnaire on demographics. The mean age at completion of the PhenX questionnaire (73.1 years) was greater than the PMRP age at enrollment from the EMR (64.8 years) because the data were collected at different time points. The mean self-reported weight, and subsequently calculated BMI, were significantly lower on the PhenX survey than the values measured at the time of enrollment into PMRP (PhenX means 173.5 pounds and BMI 28.2 versus PMRP 182.9 pounds and BMI 29.6, each p < 0.001).
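The response rate arithmetic can be checked directly from the counts reported above. The Python sketch below does so and, as an illustrative assumption, attaches an approximate 95% confidence interval using the normal (Wald) approximation; that interval is not reported in the study, and the function name is hypothetical.

```python
import math


def proportion_with_wald_ci(successes: int, total: int, z: float = 1.96):
    """Return (proportion, lower, upper) using the normal approximation.

    Hypothetical helper; the Wald interval is an assumption made for
    illustration, not the method stated in the paper.
    """
    p = successes / total
    half_width = z * math.sqrt(p * (1.0 - p) / total)
    return p, p - half_width, p + half_width


# Counts taken from the text.
mailed = 3344
eligible = 3246
returned = 2271

excluded = mailed - eligible  # removed for eligibility reasons
rate, lower, upper = proportion_with_wald_ci(returned, eligible)

print(f"excluded after mailing: {excluded}")
print(f"response rate: {rate:.1%} (approx. 95% CL {lower:.1%}, {upper:.1%})")
```

The denominator used here is the eligible count (3246), not the number mailed (3344), which matches how the 70% figure is described in the text.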
The smoking and alcohol questions are far more detailed in the PhenX measures than in the PMRP questionnaire. Table 3 presents a comparison of responses to identical smoking questions from the two sources, queried on average eight years apart. There was 95.3% agreement between the two questionnaires about having ever smoked at least 100 cigarettes. The agreement between the two questionnaires for frequency of current smoking was also 95.3%. Kappa statistics reflect the lower agreement for current smoking than for ever smoked (0.673 versus 0.905). The agreement for self-reported alcohol intake was not as strong as for smoking and, judged by kappa, was lower for usual drinks per day than for drinking in the past 30 days (69.6% agreement for drinking in the past 30 days, Table 4, kappa = 0.666; and 73.9% for usual number of drinks per day, Table 4, kappa = 0.507). This is expected because the two questionnaires referred to different 30-day periods.
Table note (smoking comparison): The numbers are the actual counts (and percent) of people responding "yes" or "no" on the two questionnaires about whether they had ever smoked 100 cigarettes in their lifetime and whether they were currently smoking. Agreement (no/no or yes/yes) is noted in bold.
Table 5 summarizes the comparison of PhenX measures with PMRP questionnaire and Marshfield Clinic medical record data. The PhenX questionnaire included a question about whether the respondent had ever been told by a physician that they had a stroke, as well as a series of questions about symptoms associated with stroke. The PhenX responses were compared with diagnosis codes for stroke and transient ischemic attack (TIA) from Marshfield Clinic electronic medical records.
A total of 139 subjects (6.2%) indicated on the PhenX questionnaire that they had been told they had a stroke. Of them, only 15 (10.8%) had no electronic indication of a prior stroke or TIA. The agreement for no report of physician-reported stroke on the PhenX questionnaire with no stroke or TIA codes appearing in the Marshfield Clinic EMR was 99.2%. The negative predictive value of self-reported physician-diagnosed stroke (1875/1912, 98.1% when no TIA code was found) was higher than the positive predictive value (92/113, 81.4% when a TIA code was present in the EMR).
Table 6 includes data to compare self-reported family history of myocardial infarction between PhenX and PMRP. The simple kappa statistic for the agreement was 0.352 (95% CL = 0.317, 0.386). In the PMRP enrollment questionnaire, subjects were asked if they had two or more first-degree relatives, including themselves, who had ever had heart attack or angina. In the current study, 589 subjects reported a family history of heart attack or angina on the PMRP questionnaire, whereas 1108 subjects reported in the PhenX questionnaire that at least one of their first-degree relatives had a myocardial infarction. It is difficult to compare the two responses because the questions were asked differently, included different people (self in the PMRP questionnaire), and there was a time gap of eight years on average between administration of the two questionnaires.
Table 7 summarizes age- and sex-specific prevalence of major depressive disorder from the PhenX measure and previously published data [7][8][9] using the WHO CIDI-SF (the selected PhenX measure of depression). All of the stratum-specific 95% confidence limits overlap and show that 31% of women aged 50-64 reported symptoms associated with a major depressive episode. Current symptom severity for respondents reporting lifetime major depression symptoms was moderate or greater in 4.9% of respondents, while 75.6% of participants reported no current symptoms of depression (Table 8).
Table 5 Comparison of stroke history as reported on PhenX survey and as in medical records
Table note (alcohol comparison): The numbers are the actual counts (and percent) of people reporting on the two questionnaires the number of days that they had an alcoholic drink in the previous 30 days and how many drinks they had on a typical day in the previous 30 days. Agreement is noted in bold.
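The predictive values reported for self-reported stroke can be reproduced from the fractions given in the text (92/113 and 1875/1912). The short Python sketch below shows the arithmetic; the function and argument names are hypothetical, and the EMR diagnosis codes are treated as the gold standard, as stated in the Methods.

```python
def predictive_values(confirmed_yes: int, reported_yes: int,
                      confirmed_no: int, reported_no: int):
    """Return (PPV, NPV) for a self-report checked against a gold standard.

    confirmed_yes / reported_yes: "yes" self-reports confirmed by the record.
    confirmed_no / reported_no: "no" self-reports confirmed by the record.
    """
    ppv = confirmed_yes / reported_yes
    npv = confirmed_no / reported_no
    return ppv, npv


# Counts as given in the text.
ppv, npv = predictive_values(92, 113, 1875, 1912)
print(f"PPV = {ppv:.1%}, NPV = {npv:.1%}")  # PPV = 81.4%, NPV = 98.1%
```

A high negative predictive value alongside a lower positive predictive value is the usual pattern when the condition is uncommon in the sample, because even a small number of unconfirmed "yes" reports is large relative to the number of true cases.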

Discussion
To our knowledge, this is one of the first large-scale implementations of PhenX Toolkit measures since their release. The use of standardized tools is vital to discovery efforts in the field of medical genomics. We quickly discovered in the eMERGE network that larger sample sizes than were originally anticipated were needed for straight GWAS analyses, in part because of different technologies and phenotype definitions used across the network [3]. Gene/environment analyses are further compromised when standardized tools are not used, because data cannot be reliably merged across studies to allow for the necessary validation or for the increased sample sizes that meta-analyses need to yield statistically significant results. Use and incorporation of PhenX data into dbGaP along with GWAS data will facilitate large-scale gene/environment studies, and we support these efforts. The PhenX data for the current study have been submitted to dbGaP (dbGaP study accession: phs000170.v1.p1) to be merged with other phenotypic data and GWAS genotypes already available in dbGaP to the research community. The dbGaP website contains information about how to access data (www.ncbi.nlm.nih.gov/gap).
Many of the items that we selected from the Toolkit were intended for interviewer administration. We selected items based on content, not mode of administration, and had to remove interviewer instructions prior to administration. With feedback from the PhenX RISING network, the Toolkit has been amended to allow researchers to select a self-administered option. After completion of formatting to allow self-administration, we found the PhenX Toolkit easy to use, with minimal queries from participants about how to complete the forms. Most questions were related to the Family Health History section for heart attack or myocardial infarction because of difficulty in understanding the table format. Some people needed clarification about how to fit the type of dwelling they lived in into one of the category options listed. A few queries were related to depression, stroke follow-up questions and sun exposure. The data are being mapped in dbGaP to the PhenX Toolkit measures to allow other researchers to combine PhenX data across studies to increase statistical power for gene/environment studies.
Observed differences between the PhenX and PMRP data were expected for some variables, such as age, because of the time difference between enrollment into PMRP and completion of the PhenX questionnaire. The lower mean weight and concomitant BMI in PhenX would not be expected because average weight generally increases as a population ages. However, the mode of data collection was different. At the time of enrollment into PMRP, participants had standardized measurements of height and weight from which BMI was calculated [4]. For PhenX, weight and height were self-reported. A systematic review of studies comparing self-reported and measured height and weight found a trend toward under-reporting of weight and over-reporting of height, although the trend was inconsistent [10]; this would explain the lower mean weight observed in the PhenX questionnaire when compared with the direct measurement at enrollment into PMRP. Specific instructions within the PhenX Toolkit warn researchers that "Self-reported weight values are considered to be less accurate. Self-reported weight is subject to error and is used when measured weight cannot be obtained". Because of the inconsistency in the inaccuracy of self-report, it is not possible to create rules to adjust self-reported weight or to assume that the relative position of weight in a population is constant. Our data support the PhenX Toolkit cautionary note to use self-reported weight only when it is not possible to obtain a measured weight.
There was a large difference in self-reported family history of heart attack between the two questionnaires in the current study (52.8% versus 28.1%), and there are several potential reasons for this difference. First, the time difference between administration of the two questionnaires provided more opportunity for first-degree relatives to experience a heart attack by the time of the PhenX questionnaire, and in fact the rate was higher in that survey. Second, the questions were not asked identically. The PMRP question included both angina and heart attack. Accuracy of self-reported family history has been shown to vary by personal health history [11].
The positive predictive value of self-reported physician-diagnosed stroke was found to be lower than the negative predictive value in the present study (81.4% versus 98.1%). A study conducted in Olmsted County, Minnesota revealed positive and negative predictive values for stroke including TIA of 67.4% and 99.2% respectively, with higher levels of agreement observed in older ages, women, and more educated individuals [12]. In addition to the difference in disease definition, mode of administration may have led to observed differences. The PhenX stroke protocol was intended to be interviewer-administered but was self-administered in the current study, and the gold standard for the current study was physician assessment. Consideration should be given to making the PhenX question more specific so that respondents understand the difference between TIA and stroke, because the terms are not identical.
Data for direct validation of the major depressive episode (MDE) PhenX questions were not available, but a comparison of the MDE rates documented in PMRP using the PhenX Toolkit measure with previously published age- and gender-specific rates from the WHO World Mental Health Survey Initiative [8,9] revealed markedly similar rates. This lends external validity to the results.

Conclusions
In conclusion, we demonstrated the ease and utility of the PhenX Toolkit to quantify exposures that can be used to facilitate gene/environment analyses. Future studies will leverage available GWAS data for this cohort of participants.