PhenX RISING: real world implementation and sharing of PhenX measures

Background: The purpose of this manuscript is to describe the PhenX RISING network and the site experiences in the implementation of PhenX measures into ongoing population-based genomic studies. Methods: Eighty PhenX measures were implemented across the seven PhenX RISING groups, thirty-three of which were used at more than two sites, allowing for cross-site collaboration. Each site used between four and 37 individual measures, and five of the sites are validating the PhenX measures through comparison with other study measures. Self-administered and computer-based administration modes are being evaluated at several sites, which required changes to the original PhenX Toolkit protocols. A network-wide data use agreement was developed to facilitate data sharing and collaboration. Results: PhenX Toolkit measures have been collected for more than 17,000 participants across the PhenX RISING network. The process of implementation provided information that was used to improve the PhenX Toolkit. The Toolkit was revised to allow researchers to select self- or interviewer administration when creating the data collection worksheets, and ranges of specimens necessary to run biological assays have been added to the Toolkit. Conclusions: The PhenX RISING network has demonstrated that the PhenX Toolkit measures can be implemented successfully in ongoing genomic studies. The next step will be to conduct gene/environment studies.


Background
The PhenX (consensus measures for Phenotypes and eXposures) Toolkit (www.phenxtoolkit.org) is a set of validated measures across 21 research domains that can be used to facilitate cross-study comparisons to increase statistical power to study gene/environment interactions [1,2]. The National Human Genome Research Institute (NHGRI) issued administrative supplements for the addition of PhenX measures into existing population-based genomic studies sponsored by NIH to evaluate the usefulness of the PhenX measures and to stimulate their uptake (http://grants.nih.gov/grants/guide/notice-files/NOT-HG-11-009.html). Seven research groups were funded through this granting mechanism, coming together to form the PhenX RISING (Real world Implementation and ShaRING) consortium. The purpose of this manuscript is to describe the network and the site experiences in the implementation of PhenX measures into ongoing population-based genomic studies. The information gained will be used to further improve the PhenX Toolkit and to provide guidance to other scientists seeking to incorporate PhenX measures in their studies.

Methods
The PhenX RISING consortium comprises seven groups. A network-wide data use agreement was written and implemented to facilitate transfer of de-identified data among the seven groups and NHGRI. It is available on the PhenX Toolkit website (https://www.phenxtoolkit.org/index.php?pageLink=phenxrising). Research Triangle Institute (RTI) International (Research Triangle Park, North Carolina) serves as the administrative coordinator for the PhenX RISING network. NHGRI and RTI International documented PhenX protocol changes at each site. Monthly teleconferences between NHGRI, RTI International and the seven groups were used to share implementation findings and to discuss cross-study collaborations. Institutional certification was obtained from all sites to share de-identified data collected for this project with dbGaP (database of Genotypes and Phenotypes).
Site-specific information is summarized in Table 1. Eighty PhenX measures were implemented across the seven PhenX RISING groups, thirty-three of which were used at more than two sites, allowing for cross-site collaboration (Table 2). The PhenX Toolkit contains ID numbers for the measures and separate numbers (often ending in 1) for the detailed protocols for specific measures.
Each site used between four and 37 individual measures, and five of the sites validated the PhenX measures against other study measures (all but the Asian Indian Diabetic Heart Study and the Chinese Longitudinal Healthy Longevity Survey). Eight of the measures were only collected at a single site. Measures were selected to augment the data already available for the specific study cohorts and outcomes. Some sites also included additional measures to allow comparison across PhenX RISING sites. The following section contains descriptions of the seven sites, the PhenX measures employed and the administration of protocols for each site.

The Asian Indian Diabetic Heart Study/Sikh Diabetes Study (AIDHS/SDS)
The AIDHS/SDS was established in India in 2002 and was funded by the Fogarty International Center of the National Institutes of Health (NIH) [3]. Of the currently available 4,510 subjects from Phases I & II of the AIDHS/SDS, 1,200 subjects belong to a family cohort and the remaining 3,310 subjects are unrelated diabetic and healthy individuals recruited from India and the US. The goals of AIDHS/SDS are to discover unique genetic markers associated with type 2 diabetes (T2D) and related metabolic and lipid traits by performing genome-wide association scans (GWAS) and validation studies. All participants signed a written informed consent for these investigations. The AIDHS/SDS was reviewed and approved by the University of Oklahoma Health Sciences Center's Institutional Review Board, as well as the Human Subject Protection Committees at the participating hospitals and institutes in India. Institutional certification was obtained for the submission of genotype and phenotype data of AIDHS to dbGaP.
Men and women aged 25-79 years participated. The diagnoses of T2D were confirmed by reviewing medical records for symptoms, use of medication, and measuring fasting blood glucose (FBG) levels following the guidelines of the American Diabetes Association (2004) [4], as described previously [5]. The 2 h oral glucose tolerance test (OGTT) was performed following the criteria of the World Health Organization (WHO) (75 g oral load of glucose). BMI was calculated as weight [kg]/height [meter]². Subjects with type I diabetes, or those having a family member with type I diabetes, or rare forms of T2D sub-types (maturity onset diabetes of young [MODYs]), or secondary diabetes (from e.g. hemochromatosis, pancreatitis) were excluded from the study. Controls were selected on the basis of a fasting glycemia < 100.8 mg/dL (< 5.6 mmol/L) or 2 h glucose < 141.0 mg/dL (< 7.8 mmol/L) and were clinically free of T2D and impaired glucose tolerance (IGT).
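The two numeric rules in this paragraph can be illustrated with a minimal Python sketch. It assumes the standard definition of BMI (weight in kilograms divided by the square of height in meters) and reads the control criterion literally as "fasting glucose below 100.8 mg/dL or 2 h glucose below 141.0 mg/dL". The function names are hypothetical and are not part of the study's software; the clinical exclusions (for example, being free of T2D and IGT on record review) are not modeled.

```python
from typing import Optional

# Glucose cut-offs for controls, in mg/dL, as stated in the text.
FASTING_CUTOFF_MGDL = 100.8
TWO_HOUR_CUTOFF_MGDL = 141.0


def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight (kg) divided by height (m) squared."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


def meets_control_glucose_criteria(
    fasting_mgdl: Optional[float] = None,
    two_hour_mgdl: Optional[float] = None,
) -> bool:
    """Return True if either available glucose value is below its cut-off.

    A missing measurement (None) cannot satisfy its criterion.
    """
    fasting_ok = fasting_mgdl is not None and fasting_mgdl < FASTING_CUTOFF_MGDL
    two_hour_ok = two_hour_mgdl is not None and two_hour_mgdl < TWO_HOUR_CUTOFF_MGDL
    return fasting_ok or two_hour_ok
```

For example, `bmi(70, 1.75)` is 70 / 3.0625, and a participant with a fasting value of 95.0 mg/dL and no OGTT result would pass the glucose criterion.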
Fasting blood samples (overnight, 12 hr) were drawn by trained assistants and serum and plasma aliquots were prepared for storage at −80°C. Blood pressure, anthropometric measurements (height, weight, and waist to hip ratio), FBG, insulin, serum cholesterol (total, HDL-C and LDL-C, and triglycerides) have been measured on all participants as described previously [5,6].
A GWAS was performed on 1,983 AIDHS/SDS subjects (980 T2D cases and 1,003 controls) from the Punjabi Sikh community using Human660W-Quad BeadChip arrays (Illumina, USA). Frozen serum samples of the 1,983 subjects with GWAS data are used to perform biomarker estimations in the PhenX RISING study. We measured biomarkers related to beta cell function (c-peptide, total amylin), obesity (leptin), inflammation (TNF-α, MCP-1), T2D (vitamin D-25-OH), and kidney function (creatinine). These assays were performed following protocols and basic specifications in the PhenX Toolkit (http://www.phenxtoolkit.org/) to aid compatibility across different studies. The multiplex assays for c-peptide, leptin, total amylin, TNF-α, and MCP-1 were performed using the Magnetic MILLIPLEX Human Metabolic panel from Millipore (St. Charles, Missouri) on the Luminex platform (PhenX protocol #141201). The assays for 25-OH vitamin D (PhenX protocol #051100) were performed using standard monoclonal antibody-based fluorescence ELISA assay kits from ALPCO Diagnostics (Salem, NH). Serum creatinine was measured at Oklahoma University Medical Center Laboratory using standard Jaffe rate methodology according to the PhenX protocol (141201). All assay kits for each biomarker were obtained from a single source.

Detroit Neighborhood Health Study - University of Michigan
The Detroit Neighborhood Health Study (DNHS) is a prospective, representative longitudinal cohort study of predominantly African American adults living in Detroit, Michigan. The overall goal of the DNHS is to identify how genetic variation, lifetime experience of stressful and traumatic events, and features of the neighborhood environment predict psychopathology and behavior. As such, the study includes two parts: a neighborhood assessment and a participant cohort. A systematic evaluation of Detroit neighborhoods was conducted June-July 2008. Data were collected on various aspects of neighborhoods, such as external building condition, sidewalk/street condition, presence of graffiti, presence of community gardens, and number of vacant lots. Cohort participants were selected with a dual-frame probability design, using telephone numbers obtained from the U.S. Postal Service Delivery Sequence Files as well as a list-assisted random-digit-dial frame [7]. Individuals without listed landlines or telephones and individuals with only a cell phone listed were invited to participate through a postal mail effort. Participants completed a 40-minute structured telephone interview annually between 2008 and 2012 to assess perceptions of participants' neighborhoods, mental and physical health status, social support, exposure to traumatic events, and alcohol and tobacco use; each participant was compensated $25 USD [7,8]. All survey participants were offered the opportunity to provide a specimen (venipuncture, blood spot, or saliva) for immune and inflammatory marker testing as well as genetic testing of DNA [9]. Participants received an additional $25 USD if they elected to give a sample. Informed consent was obtained at the beginning of each interview and again at specimen collection. The Institutional Review Board of the University of Michigan reviewed and approved the study protocol.
Of the 1,547 participants in Wave 1 (Sept. 2008-April 2009) and Wave 2 (Sept. 2009-June 2010), 917 provided a biospecimen sample yielding DNA; 800 were randomly selected for GWAS testing using the Illumina OmniExpress GWAS chip. The fourth wave of the DNHS interview incorporated PhenX Toolkit phenotype measures (Table 2) and targeted these 800 individuals. Institutional certification was obtained for the deposition of genetic and phenotypic data into dbGaP.

Identifying and modifying a risk phenotype for self-regulation failure - Duke University
The parent study was designed to validate a hypothesized gene/environment/self-regulation risk phenotype (a combination of individual differences in regulatory focus, COMT genotype, and chronic failure to attain a particular kind of personal goal) that is believed to confer vulnerability to failures of self-regulation, which in turn increase risk for psychopathology with significant public health implications such as aggression, gambling, and excessive use of alcohol and other drugs. The parent study includes the best-validated measures in the field of imaging genetics for quantifying the phenotypes of interest. This list shares little overlap with the specific measures included in the PhenX Toolkit, but significant overlap in the domains of interest including Psychiatric, Psychosocial, and Alcohol, Tobacco, and Other Substances. Conceptually overlapping measures from these domains were integrated into our existing protocol, allowing for evaluation of relationships between PhenX Toolkit measures and behavioral, clinical, neural, hormonal and genetic variables of clear significance for psychopathology. Thus, the broad goals of our proposed research are (1) to add measures from the PhenX Toolkit that overlap with phenotypic measures in the existing study and (2) to add measures from the PhenX Toolkit that represent important areas of measurement that were not included in the parent grant because they were not specifically related to the aims of that investigation. We will then evaluate the utility of PhenX Toolkit measures on several criteria including validation against intermediate neurobiological phenotypes identified through functional neuroimaging.
Over the 1-year funding period, we collected data on N = 200 subjects from a college sample and N = 50 subjects from an adolescent sample. All subjects were recruited through existing protocols approved by the Duke University Medical Center Institutional Review Board and provided written informed consent before participation. We will now work with collaborators to combine our data sets with others that have used the same PhenX tools to provide the greatest power possible to address questions of genetic influences on phenotypes of interest to our colleagues in the field as well as those phenotypes most directly related to our own work. Of note, we anticipate continuing to use the added PhenX measures for the duration of the parent project, giving a total sample size of N = 400 college students and N = 100 adolescents.

Marshfield Clinic Personalized Medicine Research Project (PMRP)
The Marshfield Clinic Personalized Medicine Research Project (PMRP) is a population-based biobank linked to the electronic health records of Marshfield Clinic [10]. After providing written informed consent, subjects aged 18 years and older completed questionnaires that included questions on demographics, family health history, smoking and alcohol exposure and dietary intake [11] and physical activity questionnaires. The biobank was reviewed and approved by the institutional review board (IRB) of Marshfield Clinic. The PhenX RISING project was reviewed and approved by the IRBs at Essentia Institute of Rural Health, Marshfield Clinic and Pennsylvania State University.
The Marshfield Clinic PMRP is a member of the NHGRI-funded eMERGE network (www.gwas.net) [12]. The goal of eMERGE1 was to conduct genome-wide association studies using electronic health records to define phenotype. The primary Marshfield phenotypic outcomes used to identify subjects for GWAS genotyping were age-related cataract and HDL. Additional subjects were genotyped for dementia, resistant hypertension and open-angle glaucoma. The subjects with GWAS data who were alive with known, non-institutionalized addresses and who had given consent for re-contact were eligible for participation in the PhenX RISING study.
The PhenX measures listed in Table 2 from the PhenX Toolkit were incorporated into a 32-page, self-administered questionnaire. The questionnaires were mailed with a cover letter to eligible subjects with a stamped, self-addressed envelope. A second mailing was employed to maximize the response rate. Subjects were offered $10 for their time to complete the questionnaire. PhenX responses were validated using data from PMRP questionnaires and Marshfield Clinic electronic health records [13].

Pediatric Imaging, Neurocognition, and Genetics (PING)
Pediatric Imaging, Neurocognition, and Genetics (PING) is a multi-site cross-sectional study of typically developing children, adolescents, and young adults ranging in age from 3 to 20 (see Acknowledgements for a description of participating members from the PING infrastructure) funded by the National Institute on Drug Abuse (NIDA) and the National Institute of Child Health and Human Development (NICHD). The primary goal of PING is to create a pediatric imaging-genomics database of approximately 1400 cases that is freely available to the scientific community. Participants aged 18 and up provided written informed consent to undergo approximately three hours of neurocognitive testing and a one-hour neuroimaging session, and to provide a saliva sample for genetic analysis [14,15]. The majority of participants also consented to allow these data to be shared in the publicly available database. For participants under the age of 18, parent versions of this consent were signed and the children and adolescents provided their assent where appropriate. This study structure was approved by IRBs at all participating PING sites. Six of the 9 PING sites chose to participate in the PhenX RISING project, and each participating site's IRB approved this project as well.
Initially, only self-report PhenX measures were chosen for inclusion in PING. Given the reading limitations of the youngest children in the PING age range, only participants ages 9 and above were asked to complete these measures. Although the original PING age range was 3 to 20, a few of the participants were 22 years old by the time they were brought back to complete the PhenX measures. Across the 6 sites that opted to participate in the PhenX RISING project, 585 subjects met the initial age criteria for inclusion. Subsequently, the UC San Diego site opted to include data from one PhenX parent-report measure (Childhood Behavior Questionnaire; CBQ) that they were already administering in the lab for 3- to 7-year-old participants prior to beginning the PhenX RISING project. Table 2 lists all the PhenX measures that were chosen for inclusion. Not all of the original measures were deemed appropriate for all ages. As such, study arms were created for different age ranges, and measures were included in each age range as appropriate. Table 3 indicates which measures were used in each study arm. Several PhenX instruments were available in both child/adolescent and adult versions. With one exception, child/adolescent versions were used for all participants in order to maximize consistency across PING cases. Separate versions were used only for the General Self-Efficacy scale because the child version asked a large number of school-related questions that were not appropriate for young adults. Some questions were modified from their original form in order to broaden their applicability to all participants within the PING age range (see Table 4 for specific modifications).
Although PING recruitment strategies varied by site, the general approach that was taken was to enroll and complete participants in the older age ranges first. This strategy allowed investigators to observe responses to testing and imaging, and to better anticipate and plan for any challenges that seemed likely to arise when running younger subjects. PhenX Toolkit measures were not incorporated into PING data acquisition protocols until most children over age 8 were already completed. As such, the time between collection of the initial PING deliverables and the collection of PhenX data varied greatly for participants who were enrolled in PING after the addition of the PhenX protocol. Overall, the time difference ranged from 0 to 2.5 years (M = 0.93, SD = 0.76). Participants who were adolescents when they assented to participation in PING, but then turned 18 prior to PhenX completion, were asked to complete an additional adult consent form.
In order to improve the response rate from participants who had already completed PING, it was decided that PhenX data would be collected in a web-based format. An NIH-sponsored web-based data collection tool called Assessment Center (www.assessmentcenter.net) was used for online data collection. A multi-arm study was created in Assessment Center, with each arm representing an age range, and the PhenX instruments were added to the study arms as appropriate. Items appeared on the screen one at a time, and participants could choose a response option or press "Next" to skip to the next question. Participants could also click "Previous" to go back and change a response to a previous item within an instrument. The structure of the alcohol and substance abuse instruments was changed slightly, and skip logic was employed, in order to adapt them to the web-based format.
Participant recruitment and reimbursement strategies for the PhenX RISING project varied by PING site. For retrospectively collected cases, some sites called or emailed participants or parents and offered an opportunity to participate in exchange for reimbursement. Other sites brought participants back into the lab for additional studies, and asked them to complete the web-based PhenX study at that time. Sites also differed on prospective data collection procedures, where some collected PhenX data in lab and others allowed participants to complete the questionnaire from home. When login information was emailed to participants, a username was sent in an email with the study link, and a password was sent in a separate email for security purposes. Reimbursements were sent after verification of completion, and ranged from $20 to $40.

UCLA Consortium for Neuropsychiatric Phenomics
The Consortium for Neuropsychiatric Phenomics comprises eight linked grants awarded under the aegis of the NIH Roadmap Initiative. The PhenX supplement grant was awarded to the Human Translational Applications Core, a center core that conducted extensive phenotyping of more than 1000 healthy volunteers aged 21 to 50 in the Los Angeles metropolitan area from 2007 to 2012.
The phenotyping efforts focused on two primary themes (memory mechanisms and response inhibition mechanisms), and participants completed approximately 12 hours of cognitive phenotyping; a subset of these participants also received several hours of neuroimaging procedures to examine brain structure and function (descriptions of these procedures are available at www.phenomics.ucla.edu). The PhenX supplement to this protocol focused on behavioral and cognitive variables, and involved two components: (1) a Web-based component comprising participant self-report questionnaires, which was offered to all English-speaking completers of the parent study who agreed to be recontacted; and (2) an in-laboratory study offered to English-speaking completers who were willing to have additional procedures conducted in the laboratory. These measures are listed in Table 2. Participants received $15/hour for participating, and those who came for in-lab procedures additionally received reimbursement for public transportation or parking. The project was approved by the UCLA IRB.

[Table 4: modifications to PhenX protocols, by site and PhenX protocol ID]

Did not modify "Protocol Text" field but modified other fields (frequency, used "every day" instead of "every 30 days")

CLHLS 010100
Did modify "Protocol Text" field for specific applications (added "animal year" for birth year)

CLHLS 011000
Did modify "Protocol Text" field for specific applications (added "the year of attending school" instead of the degree of education)

CLHLS 011300
Did modify "Protocol Text" field for specific applications (most of respondents are retired at present)

CLHLS 010500
Did modify "Protocol Text" field for specific applications (respondents are Chinese, not Americans)

CLHLS 101100
Did not modify any fields but used a subset of the protocols

CLHLS 030800
Did not modify "Protocol Text" field but modified other fields (frequency, used "every day" instead of "every 30 days")

CLHLS 150703
Did modify "Protocol Text" field for specific applications (added "playing mah-jong")

CLHLS 011100
Unfortunately, we had to drop the PhenX family income measure. If the participant does not outright answer their best estimate for total family income, a series of higher or lower questions are asked which relies on poverty threshold information determined by the US Census Bureau. However, the poverty threshold levels contradict the pre-determined series of higher or lower questions such that a family could be making more than $30 K a year but still be under the poverty threshold depending on the number of members living in the household.

Changed "refused" response option to "decline to state"

PING 010700
Changed "refused" response option to "decline to state"

PING 010900
Changed "refused" response option to "decline to state"

PING 180400
Did not modify any protocol text, but modified the age range such that the child version will be given to all participants (up to age 21)

PING 211000
Added instructions and a question at the beginning asking for an education level so that the questionnaire could be administered to only the participants who are still in school. Also changed wording of items 7 ("help us children with our…" to "help students with their…"), 9 ("teacher" to "teachers"), and 37 ("when we play" to "when we do activities") in order to make it apply to entire age range.

PING 120500
Administering the child protocol to all participants (up to age 21). Also modified wording of items 4 ("other kids" to "others") and 15 ("kids" to "people") in order to make it applicable to entire age range.

PING 120200
Administering child version to all participants (up to age 21). Changed the wording of items 6 ("I want that things are in a fixed order" to "I want things to be in a fixed order"), 52 ("I worry that bad happens to my parents" to "I worry that bad things happen to my parents"), and 64 ("I have unbidden thoughts about a very aversive event I once experienced" to "I have unwanted thoughts about a very unpleasant event I once experienced") in order to make them more easily understood by young children.

PING All Alcohol, Tobacco, & Substance Questionnaires
Administered all questions, but added questions and modified order when necessary to allow for the skip logic to work properly in Assessment Center. Reference cards will be emailed to participants as separate files.

PING 180500
Did not modify any protocol text, but modified the age range such that the child version will be given to all participants

Chinese Longitudinal Healthy Longevity Survey (CLHLS)
To gain a better understanding of how social, behavioral and genetic factors and their interactions may affect healthy longevity, as well as to provide a database for academic research and health and aging policy analysis, the Chinese Longitudinal Healthy Longevity Survey (CLHLS) conducted about 80,000 face-to-face interviews with participants in 1998, 2000, 2002, 2005, and 2008. Supported by the administrative supplement awarded by the National Human Genome Research Institute, the CLHLS team added 13 PhenX measures (including 32 data items) in the new CLHLS 2011/2012 wave. These additional standard phenotypic and environmental exposure measures related to healthy aging, selected from the NIH PhenX Toolkit, will be used together with other internationally standardized data that have been collected in CLHLS to address scientific questions on the effects of genetic, social, behavioral, and environmental factors and their interactions on healthy aging at old ages.
The CLHLS study protocols (such as the informed consent forms and other relevant materials) were reviewed and approved by the Institutional Review Boards of Duke University and Peking University.
In addition to the site-specific projects outlined, cross-network analyses are being undertaken for three projects where two or more sites have collected the same PhenX measures. Data harmonization for race/ethnicity was undertaken across all seven sites. This measure was chosen because it was used by all sites and several sites used more than one race/ethnicity measure. Also, race/ethnicity is important for gene/environment analyses and the administrative supplement was specifically made available to support gene/environment studies. The process employed to harmonize the measures was first to compare the questions asked and the mode of administration. The PhenX Toolkit measures were considered to be the common measure for harmonization. The next step was to determine if the race/ethnicity categories were the same for all sites. The category "other" served as the common denominator where sites did not have the same level of detail. Finally, variable names and codes were checked for consistency.
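The category-mapping step of this harmonization can be sketched in Python as follows. This is only an illustration under stated assumptions: the site names, the raw category labels, and the common category list are all hypothetical placeholders (the actual PhenX categories and site codings are not reproduced here), and collapsing any unmapped label into "Other" is one simple reading of using "other" as the common denominator.

```python
# Hypothetical common categories; "Other" is the fallback described in the text.
COMMON_CATEGORIES = ("Category A", "Category B", "Other")

# Hypothetical per-site mappings from raw labels to the common categories.
SITE_MAPPINGS = {
    "site_1": {"a": "Category A", "b": "Category B"},
    "site_2": {"a (detailed 1)": "Category A", "a (detailed 2)": "Category A"},
}


def harmonize(site: str, raw_label: str) -> str:
    """Map a site-specific label to a common category.

    Labels with no counterpart in the common scheme fall back to "Other".
    """
    mapping = SITE_MAPPINGS.get(site, {})
    return mapping.get(raw_label.strip().lower(), "Other")


def check_codes(records):
    """Final consistency check: every harmonized value is a known category."""
    return all(value in COMMON_CATEGORIES for value in records)
```

For instance, `harmonize("site_2", "A (detailed 2)")` would return `"Category A"`, while a label that a site collected at a finer level than the common scheme supports would return `"Other"`.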

Results
Study-specific protocols and data will eventually be available in dbGaP for all of the PhenX RISING sites.
The various implementation strategies employed across the sites and different study populations resulted in different response rates and knowledge gained (Table 4). Site-specific experiences follow.

Asian Indian Diabetic Heart Study/Sikh Diabetes Study (AIDHS/SDS)
Quantification of serum biomarkers using PhenX Toolkit measures was performed on frozen serum samples of participants with genotyping data available from GWAS. Informed consent for participation in genetic and biomarker studies was obtained from each individual upon initial inclusion into these investigations; therefore, no additional contact was required. Results for each biomarker were included in an extensive database for analysis. Enrichment of GWAS data with additional biomarkers could lead to identification of variants regulating important metabolic pathways through cross-study analysis.
Two planned assays were not run because they would have used too much of the remaining biological sample. After discussion with RTI, ranges of sample volume requirements were added to the PhenX Toolkit.
Our study strongly recommends biomarker assay optimization (especially those measured using different platforms) to reduce inter-study variability.

Detroit Neighborhood Health Study
The PhenX Toolkit measures were incorporated into the fourth interview wave of the DNHS. The measures, originally written for paper administration, required formatting for telephone administration and CATI programming. Questions were re-numbered to fit into the existing annual survey structure. Response coding was adjusted to match existing survey codes for consistency.
The PhenX Toolkit measure for Annual Family Income was not included in the final version of the survey. The Annual Family Income measure requires the interviewer to have information on current poverty levels from the U.S. Census Bureau. We found the question structure was not compatible with certain poverty threshold scenarios based on 2008 poverty data for Detroit, MI. For example, it would be possible for a participant to have an annual family income above $35,000, yet still be below the poverty threshold based on their family size. As a consequence, their response would not trigger the poverty threshold specific income component of the PhenX question because they fell into a previously asked income category. Due to this inconsistency and potential for incorrect classification, we reverted to an annual family income question structure successfully implemented in previous survey waves.
We also found it necessary to change the administration of substance use questions for both the lifetime use and 30-day frequency. These questions were originally developed to be asked at an in-person interview and the materials on the PhenX Toolkit include a "flashcard" describing the various types of substances included in this measure. To effectively adapt these questions for telephone administration and distinguish licit from illicit use, we modified the structure to ask: (1) whether the participant had ever used the substance in their lifetime; (2) if the participant answered yes to #1, the 30-day frequency of use; (3) if the participant answered yes to #1, whether they had ever used the substance illicitly in their lifetime, for the drug categories sedatives, tranquilizers, painkillers, stimulants, and marijuana; and (4) if the participant answered yes to #3, the 30-day frequency of illicit use, again for the drug categories sedatives, tranquilizers, painkillers, stimulants, and marijuana. All reported use of the drug categories cocaine, hallucinogens, inhalants/solvents, and heroin was assumed to be illicit as they are controlled substances. These alterations kept the essence of the original measure yet teased out the difference between licit and illicit use in a telephone interview format.
The survey was administered to participants by Abt SRBI (New York, NY) from September 2011 to February 2012. The average administration length was 32.3 minutes, and a response rate of 80% (845 of 1050) was achieved.
The Aiello Group identified some limitations associated with a few PhenX Toolkit measures in their survey when applied to their population of participants in the Detroit Neighborhood Health Study. Certain validated measures, such as substance use and annual family income, had to be altered to be successfully administered by telephone in the DNHS population. Though the Aiello Group supports the use of standardized measures to foster collaboration and analysis between studies, further refinement of the PhenX Toolkit measures will be needed to reflect the diverse settings in which they may be used, such as phone, personal computer, or in-person interviews.
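The four-step question flow described above can be sketched as skip logic in Python. This is a minimal illustration, not the CATI program used in the study: the function name, the question wording, and the record field names are hypothetical, and treating the overall 30-day frequency as the illicit frequency for the assumed-illicit categories is an assumption made here for completeness.

```python
from typing import Callable, Dict, Optional, Union

Answer = Optional[Union[bool, int]]

# Categories for which licit and illicit use were asked about separately.
ASKED_ABOUT_ILLICIT_USE = {
    "sedatives", "tranquilizers", "painkillers", "stimulants", "marijuana",
}
# Categories for which all reported use was assumed to be illicit.
ASSUMED_ILLICIT = {"cocaine", "hallucinogens", "inhalants/solvents", "heroin"}


def substance_module(
    substance: str,
    ask_yes_no: Callable[[str], bool],
    ask_frequency: Callable[[str], int],
) -> Dict[str, Answer]:
    """Run the four-step flow for one substance; None means 'skipped'."""
    record: Dict[str, Answer] = {
        "ever_used": None, "freq_30d": None,
        "ever_illicit": None, "illicit_freq_30d": None,
    }
    # Step 1: lifetime use.
    record["ever_used"] = ask_yes_no(f"Ever used {substance}?")
    if not record["ever_used"]:
        return record  # skip steps 2-4
    # Step 2: 30-day frequency of use.
    record["freq_30d"] = ask_frequency(f"Use of {substance} in past 30 days?")
    if substance in ASSUMED_ILLICIT:
        # Assumption: all reported use counts as illicit use.
        record["ever_illicit"] = True
        record["illicit_freq_30d"] = record["freq_30d"]
    elif substance in ASKED_ABOUT_ILLICIT_USE:
        # Step 3: lifetime illicit use.
        record["ever_illicit"] = ask_yes_no(f"Ever used {substance} illicitly?")
        if record["ever_illicit"]:
            # Step 4: 30-day frequency of illicit use.
            record["illicit_freq_30d"] = ask_frequency(
                f"Illicit use of {substance} in past 30 days?"
            )
    return record
```

In use, `ask_yes_no` and `ask_frequency` would be supplied by the interviewing software; passing simple stubs makes it easy to check that a "no" at step 1 leaves every later field skipped.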

Duke University Imaging Genetics Study
The response rate for PhenX measures in this study was 100%. The high response rate was likely due to the conditions of the study. Participants required minimal additional instruction from research staff, suggesting that online administration of the PhenX measures is viable.
The PhenX Toolkit measures required time to format for computerized administration, including automated skip-logic (i.e., creating computerized instructions to ignore some questions if previous answers suggest they are irrelevant) and custom formatting of some items. Once the questionnaires were converted to electronic format, however, this initial investment of resources resulted in significant advantages over paper administration. The value of computerized administration increases with sample size, such that in any large-scale study it is difficult to imagine using a paper format unless absolutely required. We would like to emphasize, however, that PhenX measures requiring an interviewer were generally avoided for this study and would pose unique challenges to computerized administration.

Marshfield Clinic Personalized Medicine Research Project
The strategy of two mailings with a modest financial incentive has been used successfully in Marshfield previously for self-administered questionnaires [11]. With the 32-page PhenX self-administered questionnaire, this strategy resulted in a 70% response rate.
The PhenX Toolkit measures required substantial time to format for self-administration (Table 4). The instructions for a person to administer the questions and the instructions for scoring some of the sections had to be removed. Distracting notations for data entry were deleted. Response order was changed to be consistent between questions, such that "yes" always came before "no". "Refused" was deleted as a response category. Numbering was changed to reflect the total number of questions included. RTI responded by creating an option to select "self-administered" within the PhenX Toolkit when creating the data worksheets.
A number of rules for coding non-standard responses were developed and shared with RTI International and NHGRI program staff to be incorporated into the PhenX Toolkit. Where subjects entered numbers that were not integers and the PhenX measure only allowed for integers, numbers were rounded up. When subjects indicated two education levels, the highest was selected for data entry. Improbable responses, such as a height of 11 feet, were changed to missing data. In the section with questions about health problems related to drinking, if a health problem not clearly related to drinking (such as a hiatal hernia) was indicated, that response was not used. In the depression symptom assessment, if more than one response category was entered, the more severe level was entered. If number ranges were given, the mid-point of the range was entered and rounded up if an integer was required.
In the data cleaning process, several gender errors were discovered. After checking the medical records for these subjects, it became clear that the spouse of the intended subject had completed the PhenX questionnaire.
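Several of these coding rules are mechanical enough to express as small functions. The following is a minimal sketch under stated assumptions: the function names are hypothetical, response levels are assumed to be coded as ordered numbers (higher meaning more education or greater severity), and the height cutoff is an illustrative assumption, since the text gives only 11 feet as an example of an improbable value.

```python
import math

# Assumed plausibility cutoff in feet; the source does not state the threshold used.
MAX_PLAUSIBLE_HEIGHT_FT = 8


def code_integer(value):
    """Round a non-integer response up when the measure allows only integers."""
    return math.ceil(value)


def code_range(low, high, integer_required=True):
    """Enter the mid-point of a reported range, rounded up if an integer is required."""
    midpoint = (low + high) / 2
    return math.ceil(midpoint) if integer_required else midpoint


def code_multiple_levels(levels):
    """Keep the highest level marked (education) or the most severe (depression)."""
    return max(levels)


def code_height_feet(height_ft):
    """Set improbable heights to missing (None)."""
    return None if height_ft > MAX_PLAUSIBLE_HEIGHT_FT else height_ft
```

Under these rules a reported range of 3 to 6 drinks is entered as 5 (the mid-point 4.5 rounded up), and a reported height of 11 feet is entered as missing.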
One of the biggest issues, discussed from the beginning, was the way the domains had to be transferred from their original form to questionnaire form, which was very time- and labor-intensive.
The domains could have been written in a more basic, easy-to-follow manner that was ready to be inserted into questionnaire form, with very explicit wording to help eliminate the replies that result in outliers.
Scoring was a major issue and should be more consistent across all the sites. Our site overcame the inconsistent answers by looking at each situation individually and creating a rule for each one.

Pediatric Imaging, Neurocognition and Genetics (PING)
As previously noted, strategies for acquiring data from participants who had already completed the rest of their PING assessment varied by site. As such, response rates also varied by site. Of the original 585 participants targeted at the 6 participating PING sites, 286 completed the PhenX measures (49%). The UC San Diego site added 77 CBQ parent-report measures in the 3 to 7 age range, for a total of 361 cases (2 participants who completed the CBQ at age 7 were also among the participants who completed the web-based assessment when they turned 9). The length of the online study varied according to the age-based study arm for which the participant qualified; completion time ranged from 20 minutes for children ages 9-10 to 1.5 hours for adolescents who endorsed use of a variety of substances.
Creation of the study in Assessment Center had a number of strengths and weaknesses. Assessment Center was designed for the purpose of secure data collection, and this made it an ideal medium for collecting this small amount of data and adding it to the larger set of data that was already collected for these participants. Creation of the short form instruments was simple, and once an instrument was created, it could be placed in as many study arms as necessary, or even shared with other studies in Assessment Center. Creation of the substance use forms was somewhat more difficult. The skip logic options in Assessment Center are relatively basic, allowing the instrument only to skip ahead on the basis of specific responses to the current item. It would not allow for more complex branching involving decisions based on responses to previous questions. Therefore, it was sometimes necessary to change the order of items or add additional items to allow the instrument to flow continuously from beginning to end. In addition, some of the substance use questionnaires came with reference materials describing alcoholic beverages and substances for the participant. Such materials could not be provided on-screen with the relevant questions using Assessment Center. Because of this, a PDF file was created with the reference materials and emailed to the relevant participants when sending them the study link.
For the purpose of ongoing quality assurance, data were scored using the PhenX protocol. One issue that was discovered relating to online data collection was that some items were skipped. Items that were skipped by design due to the program's skip logic were denoted by missing values in the output table. However, skipped items for which a response was expected were denoted with the word "SKIP" in the output. When a participant skipped an item, there was no way of knowing if this was accidental or whether s/he chose to skip it, and if so, why. Some may have skipped items because they did not feel comfortable answering, but others may have skipped because they did not understand the question. This may be one potential drawback to collecting data online rather than in a lab where a researcher can answer any questions and ensure that any missing data was intentional and/or unavoidable. Because most of the scoring instructions for the PhenX short form measures involve summing items across subdomains, missing items heavily impacted scores. As such, it may be necessary to develop scoring protocols that either compute mean scores rather than summed scores, or impute missing data.
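The effect of skipped items on summed versus mean scores can be illustrated with a short sketch. This is not the PhenX scoring protocol; the function names and the `min_answered` parameter are hypothetical, and only the "SKIP" marker and the use of missing values for items skipped by design come from the text.

```python
SKIP = "SKIP"  # marker reported in the output for skipped items where a response was expected


def answered_items(responses):
    """Drop items skipped by design (None) and items marked SKIP."""
    return [r for r in responses if r is not None and r != SKIP]


def summed_score(responses):
    """Sum of answered items; every skipped item lowers the total."""
    return sum(answered_items(responses))


def mean_score(responses, min_answered=1):
    """Mean of answered items, or None if too few items were answered."""
    items = answered_items(responses)
    if len(items) < min_answered:
        return None
    return sum(items) / len(items)
```

With responses of 3, 4, SKIP, and 3 on a four-item subdomain, the summed score is 10, which understates a participant who would have scored 13 had the skipped item been answered as 3, while the mean score of about 3.33 remains comparable to that of participants who answered every item.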
A number of challenges arose in the implementation of the PhenX RISING supplement to the PING study. Because PING is itself a multisite initiative taking place at 9 different sites across the U.S., we ran into a number of problems throughout the data collection and sharing process. Each site had specific language regarding what measures would be administered and how that data could be shared in their consent forms and IRB protocols, and this language was not standardized across sites. Because the measures given through the PhenX RISING initiative were added on to an already existing protocol that was specific to each site, the process of amending IRB protocols was time consuming. It turned out that it was not feasible for each site to amend its IRB to accommodate the collection of PhenX RISING data. As a result, three of our nine sites opted not to participate in PhenX RISING. Additionally, another two sites determined after data collection was complete that they were unable to share the data that was collected. Because of this, we would strongly recommend that multisite studies standardize their IRB protocols as early as possible, paying particularly close attention to data sharing language.
Another challenge associated with PhenX data collection was that a large number of PING participants had already completed their visit before the PhenX initiative was implemented. As a result, we felt that the best chance we had of maximizing our response rate to PhenX measures was to offer the battery of questionnaires as an online survey. We spent a great deal of time converting PhenX measures to Assessment Center, and we ran into a number of problems with questionnaires that used loops and skip logic. Overcoming these obstacles took some time. This barrier and the IRB difficulties described above were the primary reasons why so many PING participants had completed their visits by the time we were able to launch PhenX data collection. Despite our efforts to make it as easy as possible to respond to our PhenX questionnaires, we ultimately overestimated the number of participants from whom we would be able to acquire the added PhenX data. Even if we had resolved our IRB issues earlier, it would have been very useful to have an already existing web-based mechanism for acquiring data. One way of accomplishing this would be to have standard versions of these questionnaires in Assessment Center.
In addition to the challenges described above, we also learned a valuable lesson about the use of these measures in a developmental sample. The PhenX Toolkit has a number of measures that have child and adult versions, and this is useful for studies with more narrow age ranges. However, our sample ranged in age from 3 to 20. We were not able to find measures that could be given across our entire age range. As a result, we ended up with much smaller sample sizes than we had hoped for many of the measures, even when taking the other challenges we faced into consideration. Combining versions was often not possible because the administration format and domain scores are often quite different. We attempted to get around this to some extent by modifying the wording of some questions so we could expand the age range of a single form. However, to the extent possible, it would be very helpful if some measures could be identified for inclusion that span a wider age range for children and adolescents.

UCLA Consortium for Neuropsychiatric Phenomics
Paralleling other sites, we found that formatting the PhenX measures for our unique Web-based platform involved more effort than we would have hoped, particularly for certain branching questionnaires. Given the high likelihood that future studies will move towards increasing Web-based data acquisition, it may be useful to consider developing a centralized Web service that would help better standardize the acquisition process and data capture, because the current model is going to involve yet another "translation" to integrate with other PhenX data even though the data are designed to be compatible.
For the in-laboratory components of examination (in our case, for neurocognitive phenotyping), we think standardization would be enhanced if PhenX were to provide standard instructions and training guidelines. Our group has extensive experience with the PhenX instruments, but as we organized the training it became clear that there are many "devils in the details" of test administration training and quality assurance that we are familiar with as a site primarily dedicated to cognitive assessment, but sites with less experience aiming to "add on" some cognitive phenotype measures will likely benefit from more guidance. For example, the different vendors of the psychological tests do not uniformly provide instructions on key elements of the examination procedure including stimulus presentation, response collection, and scoring of ambiguous responses.

Chinese Longitudinal Healthy Longevity Survey (CLHLS)
Data entry and cleaning for the CLHLS 2011 survey in all other sampled areas have been completed. We conducted interviews with 7,375 surviving CLHLS participants aged 65+ and 4,918 interviews with a close family member of deceased CLHLS participants aged 65+. The response rate of the CLHLS 2011 survey was 86.1%, the loss to follow-up rate (mainly due to outmigration, where the interviewers could no longer find the participants) was 11.5%, and the refusal rate was 2.4%. The refusal rate was fairly close to that in previous waves, which may show that the 13 newly added PhenX measures (with 32 data items) are in general workable among the Chinese elderly population. The interview refusal rate among the Chinese elderly, especially the oldest-old, was low. The low refusal rate is likely due to the fact that Chinese elders, especially the oldest-old, in general like to talk to outside people and stay at home without a job or other duties. Many of the oldest-old and their family members may also feel honored to participate in survey interviews concerning healthy longevity, as they may be proud of being members of a long-lived group.
Age reporting among Han Chinese, who make up 94.4% of the total sample of the CLHLS 2011 survey, is acceptably accurate, which is rather unusual compared with many other developing countries. Acceptably accurate age reporting among the Han Chinese elderly, including the oldest-old, is due to their cultural tradition of memorizing their date of birth for determining important life events such as dates of engagement, marriage, starting to build a residential house, and even long-distance travel. This has been confirmed previously [16]. We have conducted evaluations of the data quality of the CLHLS 2011 survey, including the newly added PhenX measures. The evaluations include assessments of mortality rate, proxy use, non-response rate, sample attrition, reliability and validity of major health measures, and the rates of logically inconsistent answers, with generally satisfactory results compared to other major aging studies. Factor analyses on cognitive functioning, physical performance, and functional limitations demonstrate that the interviewees' answers to questions concerning different aspects of the same category are generally consistent. The rates of logically inconsistent answers and incomplete data are relatively low. Careful assessments have led us to believe that, similar to previous CLHLS waves, the data quality of the CLHLS 2011 survey is generally good. However, we realize that some problems also exist in the datasets, which will be addressed in our forthcoming technical reports.
As the first batch of results from our CLHLS PhenX study component, we have produced a 35-page report including 34 tables containing the 34 data items in the 2011 CLHLS questionnaires corresponding to the 13 newly added PhenX measures, supported by the NIH administrative supplement grant awarded to the CLHLS research team. These PhenX measures are based on the items relevant to healthy aging from the internationally well-known PhenX Toolkit (https://www.phenxtoolkit.org/) and adapted to Chinese culture and social reality.
As previously planned, our CLHLS 2012 survey in 8 longevity areas (counties or cities) where the density of centenarians is exceptionally high is still ongoing. We adopted the same study protocol, with additional and more sophisticated components, in our 2012 survey in these 8 longevity areas, as compared to the survey in the other sampled areas of the 22 provinces surveyed in 2011. We expect to complete all field work of face-to-face interviews around the end of October 2012 (note: our previous fifth wave of CLHLS was conducted in 2008/2009, and thus the current sixth wave is in 2011/2012). We will conduct data analysis on the newly collected PhenX measures with two aims: (1) to enhance interdisciplinary research on genetics and its interactions with social and behavioral factors; and (2) to broaden the scope of our CLHLS study and combine it with other investigations using the same or similar PhenX measures to increase the power and efficiency of discoveries on the effects of genetic, social, and behavioral factors and their interactions on healthy aging.
As demonstrated in Table 5 by the large number of study subjects representing diverse racial/ethnic groups, the PhenX RISING network was able to successfully implement PhenX measures into ongoing studies in a relatively short time frame (one-year administrative supplements to parent grants).

Discussion and conclusions
There are a number of consortium efforts to standardize phenotypic measures to facilitate large-scale data sharing and comparison for genomic studies. The eMERGE network has shown that electronic algorithms can be developed and applied to electronic medical records to produce valid phenotypes for use in genome-wide association studies [12]. Similarly, the Phenotype Standardization Project is developing valid phenotypes for pharmacogenetic studies of serious adverse drug reactions [17][18][19]. The goal of the PhenX RISING network was to evaluate implementation of PhenX measures into ongoing genomic studies. We have shown the PhenX measures to be useful for large-scale studies linking genotypes and phenotypes and we identified a number of issues in the use of the PhenX Toolkit that were addressed to improve the Toolkit for future users. Advantages include the large number of measures employed and the diversity of administration and study cohorts. The diversity could also be viewed as a disadvantage because there was little replication for specific measures and study cohort types. Ongoing validation efforts at many of the sites will provide information about the accuracy of the data collected in various formats and with any modifications implemented at the sites.
Several cross-network analyses are ongoing between the groups that have collected the same PhenX measures. The Data Use Agreement and the standardized PhenX measures will facilitate these collaborations. The within- and between-group gene/environment analyses will be the ultimate test of the PhenX measures. Other researchers who use the PhenX measures are encouraged to provide feedback to RTI for continual improvement of the Toolkit.