The Impact of Personal Genomics (PGen) Study
The PGen Study is a collaboration between academic researchers and two PGT companies; the academic-industry partnership [18] and the recruitment and data collection methods [12] have been described in detail elsewhere. Briefly, new customers of 23andMe and Pathway Genomics [19] (Pathway) were recruited online after placing an order for direct-to-consumer PGT between March and July 2012. Following an online consent process, participants were invited to complete three web-based surveys administered by Survey Sciences Group, LLC (Ann Arbor, Michigan): at baseline, after they had ordered testing but before receiving their results (BL); 2 weeks after viewing their results (2 W); and 6 months after viewing their results (6 M). Results were returned to customers according to the standard practice of each company and were linked to survey data at the end of survey administration. In total, 1464 participants completed the baseline survey and were eligible for follow-up; of these, 1046 (71.4 %) and 1042 (71.2 %) submitted the 2-week and 6-month surveys, respectively. Institutional approval was obtained from the Partners Human Research Committee and the University of Michigan School of Public Health Institutional Review Board.
Exposure variables
Participants received a genetic risk estimate, based on genotyping of multiple SNPs (Additional file 1: Table S1), for each of breast (women only), prostate (men only), colorectal and lung cancer. 23andMe customers were provided with a report that included: (1) a baseline, 10-year age-adjusted risk of each cancer, assigned prior to genetic testing; (2) an age-adjusted relative risk for each cancer based on the customer’s genetic profile; and (3) a revised 10-year age-adjusted risk of each cancer computed by multiplying together values (1) and (2). Risk estimates were additionally adjusted for biological sex in the case of colorectal cancer (Additional file 1: Table S2). These results were presented in the form of two diagrams each with 100 human figures, the first with a proportion shaded in to represent the general population risk (quantity 1, above), and the second with a proportion shaded in to represent the genetics-adjusted risk (quantity 3, above).
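The arithmetic behind quantity (3) can be sketched as follows. This is a minimal illustration rather than 23andMe's implementation: the function names, the cap at 1.0, the rounding to whole figures, and the example values (a 4 % baseline risk and an RR of 1.25) are all assumptions introduced here.

```python
import math


def revised_risk(baseline_risk: float, relative_risk: float) -> float:
    """Genetics-adjusted absolute risk: quantity (1) multiplied by quantity (2).

    baseline_risk is a proportion in [0, 1]; relative_risk is non-negative.
    The cap at 1.0 is an assumption added so the result stays a valid proportion.
    """
    if not 0.0 <= baseline_risk <= 1.0:
        raise ValueError("baseline_risk must be a proportion between 0 and 1")
    if relative_risk < 0.0:
        raise ValueError("relative_risk must be non-negative")
    return min(baseline_risk * relative_risk, 1.0)


def shaded_figures(risk: float, n_figures: int = 100) -> int:
    """Number of figures to shade in a diagram of n_figures people.

    Rounding to a whole figure is an assumption; the report's rule is not stated.
    """
    return round(risk * n_figures)


# Hypothetical customer: 4 % baseline 10-year risk and an RR of 1.25.
baseline = 0.04
adjusted = revised_risk(baseline, 1.25)
assert math.isclose(adjusted, 0.05)
print(shaded_figures(baseline), shaded_figures(adjusted))
```

In this hypothetical case the first diagram would shade four figures and the second five.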
Pathway customers generally received results on a 5-category scale corresponding to increasing relative risk (RR) of disease; however, in the case of the four cancers being studied, all results provided were in either the second lowest category (Learn More: “Your genetic profile gives you an average predisposition to these conditions, and most people fall in this category. You should focus on disease prevention, learn about your family history and how lifestyle choices influence disease onset”) or the middle category (Be Proactive: “Your genetic profile shows increased susceptibility for these health conditions. You should make an effort to learn the warning signs, contributing lifestyle factors, and your family history for these conditions. Speak with your doctor about developing a prevention plan.”) (Additional file 1: Table S3).
In order to harmonize genetic risk information across companies, a threshold RR level was selected to distinguish elevated from average genetic risk. This process was undertaken during the data cleaning stage of the PGen Study and prior to any analysis of study data, including but not limited to the analysis presented here. Based upon consideration of their results reporting standards, each company advised PGen Study researchers on determination of this threshold. A threshold of RR ≥ 1.2 to distinguish elevated genetic risk was ultimately chosen based on three considerations: (1) 23andMe representatives indicated that the company generally considers average risk as within 20 % of general population risk; (2) Pathway representatives agreed that this threshold would in most instances correspond well to the cut-point between their risk categories of “Learn More” and “Be Proactive”; and (3) PGen Study researchers agreed that in the context of DTC-PGT and genetic testing of common, low-penetrance variants, an RR ≥ 1.2 was appropriately indicative of an elevated genetic risk. Results across both companies were therefore dichotomized into two categories: average genetic risk (RR < 1.2; “Learn More” category) and elevated genetic risk (RR ≥ 1.2; “Be Proactive” category). Due to the restricted distribution of Pathway results, we were unable to discriminate in our analyses between average and reduced genetic risk results.
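The dichotomization rule can be expressed compactly. The sketch below is illustrative only: the function name, argument names, and string labels are hypothetical, and it assumes that 23andMe results arrive as a numeric RR and Pathway results as a category name.

```python
from typing import Optional

RR_THRESHOLD = 1.2  # threshold stated in the text: RR >= 1.2 counts as elevated

# Pathway results for the four cancers fell only in these two categories.
PATHWAY_CATEGORY_TO_RISK = {
    "Learn More": "average",
    "Be Proactive": "elevated",
}


def harmonize_risk(company: str,
                   relative_risk: Optional[float] = None,
                   pathway_category: Optional[str] = None) -> str:
    """Return 'average' or 'elevated' for one cancer-specific result."""
    if company == "23andMe":
        if relative_risk is None:
            raise ValueError("23andMe results require a numeric RR")
        # Reduced-risk results (RR well below 1) are grouped with average risk.
        return "elevated" if relative_risk >= RR_THRESHOLD else "average"
    if company == "Pathway":
        if pathway_category not in PATHWAY_CATEGORY_TO_RISK:
            raise ValueError(f"unexpected Pathway category: {pathway_category!r}")
        return PATHWAY_CATEGORY_TO_RISK[pathway_category]
    raise ValueError(f"unknown company: {company!r}")


print(harmonize_risk("23andMe", relative_risk=1.2))
print(harmonize_risk("Pathway", pathway_category="Learn More"))
```

Note that an RR of exactly 1.2 falls in the elevated category, and that any RR below 1.2, including reduced-risk results, is treated as average.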
Outcome variables
Participants were asked on all surveys to rate their chances of developing each type of cancer “compared to the average [man or woman] of [the same] age.” Responses were recorded on a 5-point risk perception scale ranging from much lower than average (1) to much higher than average (5) [20, 21]; alternatively, participants could select “I have been diagnosed with this condition.” Perceived risk (PR) was operationalized as a continuous variable, with each step on the 5-point scale corresponding to a 1-unit change.
Other variables
Age, race/ethnicity [22], gender, annual household income, highest education level, interest in cancer-specific PGT results (‘very interested,’ ‘somewhat interested,’ ‘not at all interested,’ for each cancer type), smoking status (‘never,’ ‘past,’ or ‘current’), and cancer family history were measured at baseline. Participants were asked to report on each blood relative with cancer, including both maternal and paternal families, and to indicate the type of cancer diagnosed in that relative. A series of conditional survey branches was used to obtain this information: e.g., (1) “Which of your blood relatives have ever had [cancer]?” (choose all that apply from a list of relation types); (2) if “brother or sister” was selected, the participant was prompted with “Please select the type(s) of [cancer] that [a brother or sister] has had” (choose all that apply from a list of cancer types). Additional conditional branches were generated for each relative reported to have a history of cancer, but we did not collect information on age at diagnosis or family size. Using these data, a site-specific, 3-level ordinal family history variable was created for each of the four cancers, with levels corresponding to “no family history,” “family history in 2nd degree relative(s) only,” and “family history in 1st degree relative(s).” A general cancer family history variable, inclusive of all cancer types, was created in the same way. Use of cancer screening services since undergoing PGT was queried at 6 months with questions from the 2011 Behavioral Risk Factor Surveillance System Questionnaire, modified to reflect a 6-month window of interest [23].
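The derivation of the 3-level family history variable can be sketched as follows. Only “brother or sister” is taken from the survey wording quoted above; the other relation labels, the numeric coding 0/1/2, and the function name are assumptions made for illustration.

```python
from typing import Dict, Optional, Set

# Hypothetical relation labels; the survey's actual list is not reproduced here.
FIRST_DEGREE = {"mother", "father", "brother or sister", "son or daughter"}
SECOND_DEGREE = {"grandparent", "aunt or uncle", "niece or nephew", "grandchild"}


def family_history_level(reports: Dict[str, Set[str]],
                         cancer: Optional[str] = None) -> int:
    """Ordinal family history: 0 = none, 1 = 2nd degree only, 2 = 1st degree.

    reports maps a relation type to the set of cancer types reported for it.
    With cancer=None the general variable (any cancer type) is returned.
    """
    def affected(relation: str) -> bool:
        cancers = reports.get(relation, set())
        return bool(cancers) if cancer is None else cancer in cancers

    if any(affected(r) for r in FIRST_DEGREE):
        return 2
    if any(affected(r) for r in SECOND_DEGREE):
        return 1
    return 0


# Hypothetical participant: a sibling with colorectal cancer, a grandparent with lung cancer.
reports = {"brother or sister": {"colorectal"}, "grandparent": {"lung"}}
print(family_history_level(reports, "colorectal"),
      family_history_level(reports, "lung"),
      family_history_level(reports))
```

Relations outside the two sets (for example, more distant relatives) are ignored in this sketch, which is a simplifying assumption.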
Statistical analyses
Data for this analysis were obtained from 1155 participants who completed the BL survey plus at least one follow-up survey. Primary analysis samples were restricted to participants with an available genetic risk estimate for the cancer being studied; no missing data for BL-, 2 W-, or 6 M-PR (as necessary) for that cancer; and no reported diagnosis of the cancer being studied at any time during the data collection period.
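The inclusion rules for a given cancer and follow-up time can be sketched as a simple filter that also returns the change score used in the regressions. The record layout and names are hypothetical, and the sign convention (follow-up PR minus baseline PR) is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CancerRecord:
    """One participant's data for one cancer and one follow-up survey."""
    has_genetic_estimate: bool   # a genetic risk estimate is available
    ever_diagnosed: bool         # reported a diagnosis at any survey
    pr_baseline: Optional[int]   # perceived risk 1-5 at BL, None if missing
    pr_followup: Optional[int]   # perceived risk 1-5 at 2 W or 6 M, None if missing


def delta_pr(record: CancerRecord) -> Optional[int]:
    """Return follow-up PR minus baseline PR, or None if the record is excluded."""
    if not record.has_genetic_estimate or record.ever_diagnosed:
        return None
    if record.pr_baseline is None or record.pr_followup is None:
        return None
    return record.pr_followup - record.pr_baseline


records = [
    CancerRecord(True, False, 3, 4),     # included, change of +1
    CancerRecord(True, False, 3, None),  # excluded: missing follow-up PR
    CancerRecord(True, True, 2, 2),      # excluded: reported diagnosis
]
print([delta_pr(r) for r in records])
```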
We first performed linear regressions of change in PR from BL to 2 W (ΔPR2W) and from BL to 6 M (ΔPR6M) for each cancer. Analyses were adjusted for BL-PR, age, gender, race (White vs. non-White), Hispanic/Latino ethnicity, education (4 categories), smoking status (lung cancer only), and testing company; cancer family history and interest in cancer-specific results were evaluated as possible confounders. From the resulting linear regression models, least squares-adjusted mean ΔPRs were computed, stratified by genetic risk estimate (elevated risk versus average risk).
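One way to write the adjusted model for a single cancer and follow-up time t is given below. This is a sketch of the general form, not the exact specification that was fitted: G_i (1 for an elevated and 0 for an average genetic risk estimate), X_i (the vector of adjustment covariates), the coefficient symbols, and the sign convention for ΔPR are notation introduced here.

```latex
\begin{align*}
\Delta \mathrm{PR}_{i}^{t} &= \mathrm{PR}_{i}^{t} - \mathrm{PR}_{i}^{\mathrm{BL}},
  \qquad t \in \{\mathrm{2W}, \mathrm{6M}\} \\
\Delta \mathrm{PR}_{i}^{t} &= \beta_0 + \beta_1 G_i + \beta_2\, \mathrm{PR}_{i}^{\mathrm{BL}}
  + \boldsymbol{\gamma}^{\top} X_i + \varepsilon_i \\
\widehat{\Delta \mathrm{PR}}^{t}(g) &= \hat{\beta}_0 + \hat{\beta}_1 g
  + \hat{\beta}_2\, \overline{\mathrm{PR}}^{\mathrm{BL}}
  + \hat{\boldsymbol{\gamma}}^{\top} \bar{X},
  \qquad g \in \{0, 1\}
\end{align*}
```

The third line is the least squares-adjusted mean for genetic risk group g, evaluated at reference values of the other covariates. In a model of this form, without interaction terms, the difference between the adjusted means for the elevated and average groups equals the estimated coefficient on G_i.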
We used generalized estimating equations (GEEs) to account for the expected correlation between ΔPR2W and ΔPR6M and to evaluate the hypothesis that the effect of genetic risk estimate on ΔPR varied by follow-up time. In each model we included an interaction term between follow-up survey time and genetic risk estimate, and used a Wald test of significance to evaluate our hypothesis.
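For a single interaction coefficient, the Wald test reduces to comparing the squared ratio of the estimate to its standard error against a chi-square distribution with 1 degree of freedom. The sketch below shows only that final step; the estimate and standard error are hypothetical numbers, and in practice they would come from the fitted GEE with its robust variance estimate.

```python
import math
from typing import Tuple


def wald_test(estimate: float, std_error: float) -> Tuple[float, float]:
    """Single-coefficient Wald test of H0: coefficient = 0.

    Returns (chi-square statistic with 1 df, two-sided p-value).
    For 1 df, P(chi2 > w) equals erfc(sqrt(w / 2)).
    """
    if std_error <= 0.0:
        raise ValueError("std_error must be positive")
    statistic = (estimate / std_error) ** 2
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical interaction estimate (time x genetic risk) and robust standard error.
statistic, p_value = wald_test(estimate=-0.12, std_error=0.05)
print(f"Wald chi-square = {statistic:.2f}, p = {p_value:.3f}")
```

A p-value below the chosen significance level would indicate that the effect of the genetic risk estimate on ΔPR differs between the two follow-up times.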
We next investigated effect modification by baseline participant characteristics. In order to maximize power, we conducted interaction analyses in the relevant 2 W follow-up samples, which had the smallest amount of missing data, and used the corresponding linear regression model (described above). Wald tests of significance were used to evaluate, in turn, interaction terms between genetic risk estimate and each of the following: baseline interest, age, gender (for colorectal and lung cancers), cancer family history, and smoking status (lung cancer only). Significant interaction terms were retained, and a final regression model was obtained for each cancer. From these, least squares-adjusted mean ΔPRs, stratified by genetic risk estimate and significant effect modifiers, were computed.
Sensitivity analyses
To assess the impact on our results of the decision to use linear regression modeling of ΔPR, which assumes a constant effect of genetic risk information on any 1-unit change in PR, we alternatively performed each of the 2 W regression analyses using generalized logistic regression with ΔPR = 0 as the reference category for the outcome. To evaluate our selection of relative risk rather than absolute risk as the main predictor of ΔPR, we evaluated the correlation between RR and absolute risk results for each cancer. We also transformed RR into a four-category variable corresponding to quartiles (RRq), performed a linear trend test of RRq in the ΔPR6M linear regression models for each cancer, and then evaluated RRq as a categorical variable to observe the pattern of effect across quartiles. Because absolute risk and RR values were not available for Pathway Genomics participants, they were excluded from these analyses.
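The construction of RRq can be sketched as below. The quantile definition (Python's default “exclusive” method) and the handling of values that fall exactly on a cut point are assumptions, as are the RR values; the statistical software used for the analysis may define quartile boundaries differently.

```python
import bisect
import statistics
from typing import List


def rr_quartiles(rr_values: List[float]) -> List[int]:
    """Assign each RR to a quartile category 1-4 based on the sample distribution."""
    # Three cut points that split the sample into four groups.
    cuts = statistics.quantiles(rr_values, n=4)
    # bisect_left counts the cut points strictly below the value, so a value
    # equal to a cut point is placed in the lower category.
    return [bisect.bisect_left(cuts, rr) + 1 for rr in rr_values]


# Hypothetical RR values for eight participants.
rr = [0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4, 1.5]
print(rr_quartiles(rr))
```

For the linear trend test, the resulting category (1 to 4) would enter the regression as a single numeric score; for the categorical analysis, it would enter as a 4-level factor.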
To evaluate the possibility that ΔPR was affected by cancer screening undertaken as a result of PGT, we repeated the ΔPR6M linear regressions, excluding anyone who had received screening for the relevant cancer since receiving their results. To evaluate the impact of informative censoring due to missing PR data at follow-up, we repeated the ΔPR6M linear regressions on a pseudo-population created using inverse probability-weighting (IPW) for missing data [24].
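The idea behind inverse probability weighting can be illustrated with a deliberately simplified sketch. Here the probability of having follow-up PR data is estimated as the observed proportion within strata of a single baseline characteristic; this stands in for whatever missingness model was actually used, which is not described in this section. All names and data are hypothetical.

```python
from collections import defaultdict
from typing import Hashable, List, Tuple


def ipw_weights(records: List[Tuple[Hashable, bool]]) -> List[float]:
    """Weights for (stratum, observed) records.

    Observed records get 1 / P(observed | stratum); unobserved records get 0.
    """
    totals = defaultdict(int)
    observed = defaultdict(int)
    for stratum, is_observed in records:
        totals[stratum] += 1
        if is_observed:
            observed[stratum] += 1

    weights = []
    for stratum, is_observed in records:
        if is_observed:
            weights.append(totals[stratum] / observed[stratum])
        else:
            weights.append(0.0)
    return weights


# Hypothetical strata: half of stratum "a" is lost to follow-up, none of "b".
records = [("a", True), ("a", True), ("a", False), ("a", False),
           ("b", True), ("b", True)]
weights = ipw_weights(records)
print(weights, sum(weights))
```

Each observed participant is up-weighted to also represent similar participants who were lost to follow-up, so the weights sum to the size of the full baseline sample whenever every stratum contains at least one observed participant. The weighted records form the pseudo-population on which the regression is then refitted.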
All analyses were conducted using SAS software (version 9.3; SAS Institute, Cary, NC), and models were fitted using PROC GLM (linear regressions) and PROC GENMOD (longitudinal analyses and IPW). Statistical significance for all analyses was set at p < 0.05.