Personal Genome Project UK (PGP-UK): a research and citizen science hybrid project in support of personalized medicine

Background Molecular analyses such as whole-genome sequencing have become routine and are expected to be transformational for future healthcare and lifestyle decisions. Population-wide implementation of such analyses is, however, not without challenges, and multiple studies are ongoing to identify what these are and explore how they can be addressed. Methods Defined as a research project, the Personal Genome Project UK (PGP-UK) is part of the global PGP network and focuses on open data sharing and citizen science to advance and accelerate personalized genomics and medicine. Results Here we report our findings on using an open consent recruitment protocol, active participant involvement, open access release of personal genome, methylome and transcriptome data and associated analyses, including 47 new variants predicted to affect gene function and innovative reports based on the analysis of genetic and epigenetic variants. For this pilot study, we recruited 10 participants willing to actively engage as citizen scientists with the project. In addition, we introduce Genome Donation as a novel mechanism for openly sharing previously restricted data and discuss the first three donations received. Lastly, we present GenoME, a free, open-source educational app suitable for the lay public to allow exploration of personal genomes. Conclusions Our findings demonstrate that citizen science-based approaches like PGP-UK have an important role to play in the public awareness, acceptance and implementation of genomics and personalized medicine. Electronic supplementary material The online version of this article (10.1186/s12920-018-0423-1) contains supplementary material, which is available to authorized users.


Background
The sequencing of the first human genome in 2001 [1,2] catalysed a revolution in technology development, resulting in around 1 million human genomes having been sequenced to date at ever decreasing costs [3]. This still expanding effort is underpinned by a widespread consensus among researchers, clinicians and politicians that 'omics' in one form or another will transform biomedical research, healthcare and lifestyle decisions. For this transformation to happen successfully, the provision of choices that accommodate the differing needs and priorities of science and society are necessary. The clinical need is being addressed by efforts such as the Genomics England 100 K Genome Project [4] and the US Precision Medicine Initiative [5] (recently renamed to ' All of Us') whilst the public's desire for direct-to-consumer genetic testing is met by a growing number of companies [6]. However, little of the data from these sources are being made available for research under open access which, in the past, has been a driving force for discovery and tool development [7]. This important research need for unrestricted access to data was first recognised by the Human Genome Project and implemented in the 'Bermuda Principles' [8]. The concept proved highly successful and was developed further by personal genome projects such as PGP [9][10][11][12] and iPOP [13] and, more recently, has also been adopted by some medical genome projects like TXCRB [14] and MSSNG [11] the latter of which uses a variant of registered access [15]. PGP-UK is part of the Global PGP Network (see Links) which was founded by George Church and colleagues at Harvard University. The Network currently comprises five active PGPs in the United States (Boston, since 2005), Canada (Toronto, since 2012), United Kingdom (London, since 2013), Austria (Vienna, since 2014) and China (Shanghai, since 2017). In Europe, PGP-UK was the first project to implement the open consent framework [16] pioneered by the Harvard PGP for participant recruitment and data release. Under this ethics approved framework, PGP-UK participants agree for their omics and associated trait, phenotype and health data to be deposited in public databases under open access. Despite the risks associated with sharing identifiable personal information, PGP-UK has received an enthusiastic response by prospective participants and even had to pause enrolment after more than 10,000 people registered interest within a month of launching the project. The rigorous enrolment procedure includes a single exam to document that the risks as well as the benefits of open data sharing have been understood by prospective participants and the first 1,000 have been allowed to fully enrol and consent.
Taking advantage of PGP-UK being a hybrid between a research and a citizen science project, we (the researchers and participants) describe here our initial findings from the pilot study of the first 10 participants and the resulting variant reports. Specifically, this includes the description of variants identified in the participants' genomes and methylomes as well as our interpretation relating to ancestry, predicted traits, self-reported phenotypes and environmental exposures. Citizen Science and Citizen Scientist has many definitions and facets [17]. As examples of citizen science, which we define here as activity that encourages members of the public to participate in research by taking on the roles of both subject and scientist [18], we describe the first three genome donations received by PGP-UK and we present GenoME, the first app developed as an educational tool for the lay public to better understand personalized and medical genomics. Fronted by four PGP-UK ambassadors, GenoME's audio-visual effects provide an intuitive interface to learn about genome interpretation in the contexts outlined above. Mobile apps have become the method of choice for the public to engage with complex information and processes such as navigation using global positioning systems, internet shopping/banking and a variety of educational and health-related activities [19]. The open nature of the PGP-UK data make them an attractive resource for investigating interactions between genomics, environmental exposures, health-related behaviours and outcomes in health and disease. For example, the MedSeq Project recently trialled the impact of whole-genome sequencing (WGS) on the primary care and outcomes of healthy adult patients and identified sample size as one of the limiting factors [20].

Ethics
The research conformed to the requirements of the Declaration of Helsinki, UK national laws and to UK regulatory requirements for medical research. All participants were informed, consented, subjected to an online entrance exam and enrolled as described on the PGP-UK sign-up web site (www.personalgenomes.org.uk/sign-up). The study was approved by the UCL Research Ethics Committee (ID Number 4700/001) and is subject to annual reviews and renewals. PGP-UK participants uk35C650, uk33D02F, uk481F67 and uk4CA868 all self-identified and consented for their names, photos, videos and data to be used in the manuscript and GenoME app.

Genome donations
Ethics approval for PGP-UK to receive genomes, exomes and genotypes (e.g. 23andMe) and associated data generated elsewhere was obtained from the UCL Research Ethics Committee through an amendment of ID Number 4700/001. Enrolment in PGP-UK is accepted as proof that prospective donors have been adequately informed and have understood the risks of holding and donating their genome and associated data. Equal to regular participants, donors agree for their data and associated reports to be made publicly available under open access by PGP-UK. Once a genome donation has been received, the data are processed and reports produced as for genomes generated by PGP-UK. Donors are also eligible to provide samples for the generation of additional data and reports as implemented here for 450K methylome analysis.

Participant input and communication
The interests of the pilot participants were considered in two ways. First, through personal telephone conversations at the beginning of the project and through in-person/ skype meetings at the completion stage of their genome and methylome reports. Examples of expressed interests and motivations for participating are included in the ambassador's videos in the GenoME app. Second, through social media and communications accounts on a range of public platforms (Twitter, Facebook, YouTube and WordPress; see LINKs, PGP-UK social media) which can be viewed, followed, and subscribed to by participants and members of the public. These media provide different platforms for participants to communicate their wishes or concerns and aim to increase public awareness of PGP-UK, personal genomics and citizen science, consequently engaging participants and members of the public further in developments relating to PGP-UK and encouraging involvement as citizen scientists. Through these platforms, updates on PGP-UK activity, research and events can be published directly to the public, alongside the sharing of additional content to inform and educate the public in relevant fields (e.g. personalised medicine). Examples of posts authored by participant citizen scientists can be viewed in the PGP-UK blog. The communication platforms are complemented by the PGP-UK YouTube channel as a more visual and engaging form of communication. Through Twitter and Facebook, short and accessible posts are published as frequent updates on PGP-UK which regularly link to the blogs and videos on other platforms, as well as relevant external content, for example journal articles, to promote further engagement in scientific progress.

Samples
Blood samples (2 × 4 ml) were taken by a medical doctor at the UCL Hospital using EDTA Vacutainers (Becton Dickinson). Saliva samples were collected using Oragene OG-500 self-sampling kits (DNA Genotek). All samples were processed and stored at the UCL/UCLH Biobank for Studying Health and Disease (http://www.ucl.ac.uk/ human-tissue/hta-biobanks/UCL-Cancer-Institute) using HTA-approved standard operating procedures (SOPs).

Data generation and analysis
Whole-genome sequencing (WGS) was subcontracted to the Kinghorn Centre for Clinical Genomics (Australia) and conducted on an Illumina HiSeq X platform. Illumina TruSeq Nano libraries were prepared according to SOPs and sequenced to an average depth of 30×. The sequenced reads were trimmed using TrimGalore (http://www.bioinformatics.babraham.ac.uk/projects/trim_galore/) and mapped against the hg19 (GRCh37) human reference genome using the BWA-MEM algorithm from BWA v0.7.12 [21]. After removing ambiguously mapped reads (MAPQ < 10) with SAMtools 1.2 [22] and marking duplicated reads with Picard 1.130 (http://broadinstitute.github.io/picard/), genomic variants were called following the Genome Analysis toolkit (GATK 3.4-46; https://software.broadinstitute.org/gatk/) best practices, which involves local realignment around indels, base quality score recalibration, variant calling using the GATK HaplotyeCaller, variant filtering using the variant quality scoring recalibration (VQSR) protocol, and genotype refinement for high-quality identification of individual genotypes. Additionally, variants of phenotypic interest identified from SNPedia [23] that were not called using the above pipeline due to being identical to the human reference genome (homozygous reference variants), were obtained by preselecting a list of phenotypically interesting variants and requesting the GATK HaplotypeCaller to emit genotypes on these chromosomal locations. The WGS data (FASTQ and BAM files) have been deposited in the European Nucleotide Archive (ENA) under accession number PRJEB24961. The variant files (VCFs) have been deposited in the European Variant Archive (EVA) under accession number PRJEB17529 and linked to the Global Alliance for Genomics and Health (GA4GH) Beacon project (https://www.ebi.ac.uk/eva/ ?GA4GH) under the same accession number.
Whole-genome bisulfite sequencing (WGBS) was subcontracted to the National Genomics Infrastructure Science for Life Laboratory (Sweden) and conducted on an Illumina HiSeq X platform. Bisulfite conversion and library preparation were carried out using a TruMethyl Whole Genome Kit v2.1 (Cambridge Epigenetix, now marketed by NuGEN) and libraries sequenced to an average depth of 15×. The resulting FASTQ files were analysed using GEMBS [24]. As reported previously [25], WGBS on the Illumina HiSeq X platform is not straightforward as the data are of inferior quality to those that can be obtained on other HiSeq or NovaSeq platforms. In our case, the average unique mapping quality was 63.86% for paired-end (PE) and 86.18% for single-end (SE, forward) reads as assessed with GEMBS [24]. The WGBS data have been deposited in ENA under accession number PRJEB24961.
Genome-wide DNA methylation profiling was conducted with Infinium HumanMethylation450 (450K) BeadChips (Illumina). Genomic DNA (500 ng) was bisulfite-converted using an EZ DNA Methylation Kit (Zymo Research) and processed by UCL Genomics using SOPs for hybridisation to 450K BeadChips, singlenucleotide extension followed by immunohistochemistry staining using a Freedom EVO robot (Tecan) and imaging using an iScan Microarray Scanner (Illumina). The resulting data were quality controlled and analysed using the ChAMP [26,27] and minfi [28] analysis pipelines. The 450K data have been deposited in ArrayExpress under accession number E-MTAB-5377.
Smoking scores were generated using the method developed by Elliott et al. [29]. The smoking score was calculated using weighted methylation values of 187 well established smoking-associated CpG sites [30] and has been shown to accurately predict whether individuals are past/never or current smokers [29]. This study showed that a smoking score of more than 17.55 for Europeans, or more than 11.79 for South Asians indicated that an individual is a current smoker, while values below these thresholds indicate that individuals are past or never smokers. The threshold for classification of smoking status in European populations was applied to the smoking scores generated from both saliva and blood samples for the PGP-UK pilot project.
Epigenetic age was calculated using the multi-tissue Horvath clock [31], which uses the weighted average of 353 CpG sites associated with ageing to predict age and has been extensively validated [32]. Age acceleration and deceleration are calculated as the difference between chronological age and epigenetic age. As the epigenetic clock is accurate to within 3.6 years, individuals are considered to have age acceleration if their epigenetic age is > 3.6 years above their chronological age, and are considered to have age deceleration if their epigenetic age is > 3.6 years below their chronological age [31].
RNA sequencing (RNA-seq) was carried out on RNA extracted from blood using both targeted and whole RNA-seq. For targeted RNA-seq, library preparation was carried out using AmpliSeq (Thermo Fisher Scientific). A barcoded cDNA library was first generated with SuperScript VILO cDNA Synthesis kit from 20 ng of total RNA treated with Turbo DNase (Thermo Fisher Scientific), followed by amplification using Ion Ampli-Seq technology. Amplified cDNA libraries were QC-analysed using Agilent Bioanalyzer high sensitivity chips. Libraries were then diluted to 100 pM and pooled equally, with two individual samples per pool. Pooled libraries were amplified using emulsion PCR on Ion Torrent OneTouch2 instruments (OT2) following manufacturer's instructions and then sequenced on an Ion Torrent Proton sequencing system, using Ion PI kit and chip version 2.
For whole RNA-seq, the libraries were prepared from 20 ng of total RNA with Illumina-compatible SENSE mRNA-Seq Library Prep Kit V2 (Lexogen, NH, USA) according to the manufacturer's protocol. The resulting double-stranded library was purified and amplified (18 PCR cycles) prior to adding the adaptors and indexes. The final PCR product (sequencing library) was purified using SPRI (Solid Phase Reversible Immobilisation) beads followed by library quality control check, quantified using Qubit fluorometer (Thermo Fisher Scientific) and QC-analysed on Bioanalyzer 2100 (Agilent) and further quantified by qPCR using KAPA library quantification kit for Illumina (Kapa Biosystems). The libraries were sequenced on HiSeq 4000 (Illumina) for 150 bp paired-end chemistry according to manufacturer's protocol. The average raw read per sample was 36,632,921 reads and the number of expressed transcripts per sample was 25,182.
The RNA-seq data have been deposited in ArrayExpress under accession number E-MTAB-6523.

Private variants
We define single nucleotide variants (SNVs) as private (e.g. unique to individuals or families) in line with ACMG standards and guidelines [33] if the variant has not been recorded in any public database based on the Beacon Network (https://beacon-network.org/) after being corrected for batch effects. Such private SNVs were then additionally filtered to be coding and analysed with four orthogonal effect predictor methods CADD [34]), DANN [35], FATHMM-MKL [36] and ExAC-pLI [37] using default thresholds of 20, 0.95, 0.5 and 0.95, respectively to identify private SNVs with the highest possible confidence.

Generation of reports
The genome reports were generated using variant calls derived from the WGS data as described above. A whole genome overview of the variant landscape of each participant was obtained by running the Variant Effect Predictor (VEP) v84 [38] with hg19 (GRCh37) cache. The called variants were interpreted in conjunction with public data from SNPedia (as of 02-Aug-2018) [23], gnomAD v2.0.2 [37], GetEvidence (as of 10-Aug-2018) [10] and ClinVar (as of 10-Aug-2018) [39] for potentially beneficial and potentially harmful traits. A visual summary of the ancestry of each participant was obtained by merging the genotypes of each participant with genotypes from 2504 unrelated samples from 26 worldwide populations from the 1000 Genomes Project (phase 3 v20130502) [40] and applying principal component analysis on the merged genotype matrix. Population membership proportions were inferred using the Admixture v1.3.0 software [41] on the same genotype matrix.
The methylome reports were generated from the 450K data in conjunction with the epigenetic clock [31] for predictions on ageing and for predictions of exposure to smoking [29].

Data access
All data reported here are available under open access from the PGP-UK data web page (https://www.personalgenomes.org.uk/data) which provides direct links to the corresponding public databases. However, as it is increasingly difficult to transfer data to the user, even under open access, there is a growing need for the analytics to be moved to where the relevant data are being stored. This concept is being addressed by cloud-based computing platforms e.g. through public-private partnerships offering a variety of models from open to fee-based data access [42][43][44], and easy access to training in big data analytics such as the online DataCamp programme [45]. Therefore, the reported PGP-UK data can also be accessed free of charge for non-commercial use on the Seven Bridges Cancer Genomics Cloud (CGC) [46], where PGP-UK data are hosted alongside relevant analyses tools enabling researchers to compute over these data in a cloud-based environment (see Links).

GenoME app
GenoME was developed as an educational app for Apple iPads running iOS 9+. The app provides the public with a means to explore and better understand personal genomes. The app is fronted by four volunteer PGP-UK ambassadors, who share their personal genome stories through embedded videos and animated charts/information. All the features within the app that illustrate ancestry, traits and environmental exposures are populated by actual PGP-UK data from the corresponding participants. GenoME is freely available from the Apple App Store (https://itunes.apple.com/gb/app/genome/id1358680703?mt=8).

Data types and access options
To demonstrate the feasibility of citizen science-driven contributions to personalized medicine, we actively engaged the first 10 participants and first three Genome Donors in all aspects of this PGP-UK pilot study. Table 1 summarizes the matrix of 9 types of information (WGS, whole exome sequencing (WES), genotyping (e.g. 23andMe), 450K, WGBS, RNA-seq, Baseline Phenotypes, Reports and GenoME) which was generated for six categories (genome, methylome, transcriptome, phenotype, reports and GenoME app) and, where appropriate, the biological source from which the information was derived. The matrix comprises 103 datasets (~2.5 TB) which were deposited according to data type in four different databases (ENA, EVA, ArrayExpress and PGP-UK), as there was no single public database able to host all data under open access. While easy access is facilitated through the PGP-UK data portal (see Links), the time required to download all the data can present a challenge that is common to many large-scale omics projects. The time to download all the datasets from Table 1 using broadband with UK national average download speed of 36.2Mbps (according to official UK communication regulator Ofcom, 2017) would be more than 140 h, indicating that faster solutions are required. To address this, we joined forces with Seven Bridges Genomics Inc. (SBG), a leading provider of cloud-based computing infrastructure who pioneered such a platform for The Cancer Genome Atlas (TCGA). The Cancer Genomics Cloud (CGC, see Links) [46], funded as a pilot project by the US National Cancer Institute (NCI), allows academic researchers to access and collaborate on massive public cancer datasets, including the TCGA data. Researchers worldwide can create a free profile online or log in via their eRA Commons or NIH Center for Information Technology account to gain access to nearly 3 petabytes of publicly available data and relevant tools to analyse them. Following a successful trial and the open ethos of PGP, the data generated by the PGP-UK consortium for the first 13 participants are now easily accessible through the CGC for rapid, integrated and scalable cloud-based analysis using publicly available or custom-built pipelines.

Genome reports
While data are the most useful information for the wider research community, reports were the most anticipated and intelligible information for the PGP-UK participants themselves. Great consideration was given to the content and format of the reports, taking on board valuable feedback from individual participants and the entire pilot group. At all times, the participants were made aware that both the data and reports were for information and research use only and not clinically accredited or actionable.
For the reporting of genetic variants, we opted for strict criteria so that the reported variants are low in number but as informative as possible (see Methods). On average, this resulted in over 200 incidental variants being reported for possibly beneficial and harmful traits. Additional file 1 shows an exemplar genome report for participant uk35C650. In total, 4,105,373 SNVs were identified of which 97.5% were known and 2.5% (103,667 SNVs) appeared to be novel and thus private to this Table 1 Information matrix of the PGP-UK pilot study. Ticks [✔] indicate the types of information available for each of the participants, the colour code depicts the biological source from which the information was derived and boxing highlights the information provided via Genome Donations. Genotype data are from 23andMe but other formats can also be donated. WGS refers to whole-genome sequencing, WES to whole-exom sequencing, WGBS to whole-genome bisulfite sequencing and 450K to Infinium HumanMethylation450 BeadChip.
participant. Similar numbers were found for the other participants sequenced by PGP-UK, which is consistent with previous findings [40]. Of the known variants of participant uk35C650, 63 were associated with possibly beneficial traits (e.g. 9 SNVs associated with higher levels of high density lipoprotein (HDL) which is the 'good' type of cholesterol) and 217 with possibly harmful traits (e.g. 14 SNVs associated with Crohn's disease, according to previously published studies). Taking advantage again of the open nature of PGP-UK, we shared the reports among all 10 participants, which helped them to better understand the concept and meaning of beneficial or harmful SNV frequencies and distributions in the population. Since any genome report has the potential to uncover unexpected and even disconcerting information, the opportunity for participants to view other reports alongside their own provides context and reduces the likely anxiety if such reports are received and viewed in isolation. This was indeed confirmed as a positive aspect by all participants in the pilot study. In addition to learning about possibly beneficial and harmful variants, the participants were also interested to learn more about the 'novel' and potentially 'private' variants for which, by definition, nothing is yet known. This prompted us to investigate them in more detail.

Private variants
A definition of what we consider private variants is described in the Methods section. Additional file 2 shows the number of all, novel and private SNVs identified in 10 of the participants using the PGP-UK analysis pipeline and additional, more stringent filtering against all openly accessible resources (see Methods). While this approach is imperfect due to some variants being represented in different ways and therefore easily missed [47], this effort reduced the number SNVs that are likely to be private to < 20,000 per participant on average. To obtain a first insight into their possible functions we used multiple independent methods (see Methods and Additional file 2) to predict their effects. Of the 177,804 private SNVs identified, 29,558 (16.6%) passed the detection thresholds described in Methods and Additional file 3. As private SNVs cannot be validated in the traditional way, we used a Venn diagram (Fig. 1) to assess the level of concordance/discordance between the four orthologous methods used. Fourty seven SNVs were predicted to have significant impact by all four methods (Fig. 1), providing the highest level of confidence that these are novel SNVs affecting gene function. Finally, we mapped these 47 private SNVs to their respective coding exons to reveal the affected genes (Additional file 4). The majority (41 SNVs) were predicted to have moderate impact, one was predicted to have high impact and four were predicted to have a modifier impact (Additional file 4). See Definitions section for descriptions of moderate, high and modifier impact.

Methylome reports
There are currently no national or international policies or guidelines in place for the reporting of incidental epigenetic findings, including those based on DNA methylation [48,49]. We limited our reports to categories for which findings had been independently validated and replicated, including the prediction of sex [31,50], age [31] and smoking status [51]. Additional file 5 shows an exemplar methylome report and Additional file 6 summarizes our reported incidental epigenetic findings for the participants of the PGP-UK pilot. While the current methods for prediction of chronological age and sex are already well established and were highly accurate for all participants compared to the self-reported data, methods for an accurate prediction and interpretation of age deviation are still experimental. Averaged over two samples of different origin (blood and saliva), three of the thirteen participants showed significant age acceleration whereby the DNA methylation age is higher than the actual (chronological) age by more than 3.6 years, and three showed age deceleration (DNA methylation age is lower than the actual age by more than 3.6 years). Age deviation has already proved to be an informative biomarker. For instance, age acceleration has been reported to predict all-cause mortality in later life [52,53] as well as cancer risk [54] and age deceleration has been linked to longevity [55,56]. The final category which was reported back to participants was exposure to smoking. Epigenetic associations with environmentally mediated exposures are typically measured through epigenome-wide association studies (EWAS) [57]. Based on the analysis of DNA methylation in saliva and blood samples, all participants in the PGP-UK pilot study were predicted to be past or never smokers. The prediction was correct for 12 out of the 13 participants who self-reported as either past or never smokers. However, one participant (uk0C72FF) self-reported as an 'occasional smoker'. This aberrant prediction could be explained by the study population in which the threshold was set; the individuals considered 'current smokers' smoked a mean of 23 cigarettes per day for Europeans and 13 per day for South Asians. Consequently, very occasional smoking may not classify as 'current smoking' using the thresholds of Elliott et al. [29]. Another limitation is that the smoking score was tested in European and South Asian populations, thus it may be less accurate in other ethnicities.

GenoME app
To make genome and methylome reports more accessible and understandable to the lay public, we developed GenoME as a free and open source genome app for Apple iPads. The main purpose was to have actual people presenting real incidental findings in an innovative and engaging way. For that, we recruited four volunteers (ambassadors) from the pilot cohort who were willing to self-identify and share their personal genome story through embedded videos, specifically composed music and artistically animated examples of incidental findings from their genomes. To illustrate this, we selected two traits (eye colour and smoking status) for which we reported genetic and epigenetic variants, respectively. Figure 2 shows three screen shots of how SNVs associated with eye colour are communicated. Figure 2a shows one of the ambassadors and explanatory text in the left panel and a whirling cloud of colour representing all possible eye colours in the right panel. Figure  2b shows an intermediate state of the colours coalescing into the eye colour predicted by the SNVs for this participant and Fig. 2c shows the final stage of zooming in on the ambassador's actual eye colour for comparison with the predicted eye colour which was correct in this case. In GenoME, the sequence of screens is complemented by integrated music elements to enable people with compromised sight to experience genetic variation through sound. Figure 3 shows a similar sequence of three screens for the prediction smoking status based on epigenetic (DNA methylation) variants. In this case, a cloud of smoke coalesces into 'never/past' or 'current' smoker icons depending on the epigenetic profile of the participant. Other features (not shown) include variants associated with ancestry using an animated world map and disease using population-specific allele frequency graphics.

Discussion
In this study, we report the study design, data processing and findings of the PGP-UK pilot, and demonstrate the suitability of PGP-UK as a hybrid between a research and a citizen science project. For the latter, we enlisted 11 citizen scientists who made up a third of the named authors and contributed vitally to the assessment of our reporting strategy, features of GenoME and advocacy of citizen science in general. As part of our citizen science programme, PGP-UK encourages such interactions also on the international level through membership of the global PGP Network, DNA. Land [58] and Open Humans (see Links), a project which enables individuals to connect their data with research and citizen science worldwide.
The resource value of PGP-UK will become more apparent as more participants are enrolled and data released. Towards this, a second batch of data and reports has already been released (see Links) for another 94 participants and the ultimate goal is to eventually reach the 100 K participants mark for which ethics approval has been obtained. Considering the scale of past and on-going UK sequencing projects [4,59], we believe this is achievable especially though utilizing the genome donation procedure described here. In the meantime, the PGP-UK data also contribute to the global PGP resource. According to Repositive (see Links), a platform linking open access data across 49 resources, the global PGP network has collectively generated over 1,121 data sets. Additionally and in the N = 1 context of personalised medicine, each data set is of course highly informative in its own right [60].
To our knowledge, the methylome reports described here are the first of their kind issued for any incidental epigenetic findings. The open nature of PGP-UK makes it possible to explore appropriate frameworks and guidelines in a controlled environment [48,49]. Based on our experience, there is high interest and acceptance for adequately validated and replicated epigenetic findings to be reported back alongside genetic findings, particularly those that capture environmental exposures such as tobacco smoke, alcohol consumption and air pollution. Accordingly, we are now evaluating if any EWAS-derived variants are yet appropriate for inclusion. Another area of potential future interest is the prediction of a participant's suitability as donor in the context of transplant medicine [61]. Another innovation reported here is GenoME, an Fig. 3 Time series (a-c) of screens showing how GenoME communicates epigenetic SNVs associated with the participant's smoking status. PGP-UK participant uk35C650 self-identified and consented for his photos to be used app for exploring personal genomes. Apps can easily reach millions of people and thus are an ideal stepping-stone to engage with citizen science, which plays an important role in making personal and medical genomics acceptable to the public. A recent study -Your DNA, Your Sayconcluded that "Genomic medicine can only be successfully integrated into healthcare if there are aligned programmes of engagement that enlighten the public to what genomics is and what it can offer" [62]. This is particularly important as we reach the cusp of widespread implementation of genomic and personalised medicine. To gauge the level and types of possible concerns or problems experienced by participants as result of their participation in PGP-UK, our tracking system automatically prompts them for feedback twice a year. So far, we have received 1423 responses of which 1414 (99.3%) had nothing to report. Of the nine participants who did file a report, four reported interest expressed by family members and friends, three reported personal health-related problems and two simply acknowledged having been contacted. None of the 103 participants (including the 13 pilot participants) who have received their genome reports have filed a report or withdrawn consent. Of the 1105 participants who were so far allowed to consent and enrol, seven have withdrawn their consent and participation, five stating no reason, one stating too long waiting time as reason and one stated change of mind. Prior to the enrolment opening, PGP-UK experienced an email incident in 2014 which tested the resolve of those who had expressed interest in participating in an open project like PGP-UK.
Our study also highlighted some limitations. For instance, adequate public databases for multi-dimensional open access data, equivalent to dbGaP [63] or EGA [64] for controlled access data, are currently still lacking. Consequently, the PGP-UK data were submitted to multiple open access public databases, depending on type of data. Furthermore, most public databases are not built to host or enable downloading of TB-scale datasets and don't enable easy access and analysis without downloading the data. To overcome these current limitations, we made the PGP-UK pilot data also available in a cloud-based system [46]. The high level of automation implemented for the PGP-UK analysis pipeline allows updates to be generated and released as and when required and new reports (e.g. based on WGBS and RNA-seq data) to be added in the future. At the time of submission, over 100 genomes and associated reports had been generated, released (see Links) and deposited into public databases.

Conclusion
Our results demonstrate that omics-based research and citizen science can be successfully hybridised into an open access resource for personal and medical genomics.
The key features that allowed this were transparency and interoperability on the people and data levels, resulting in a degree of openness that is not generally found in medical research and thus provides an alternative to traditional research models. The introduction of the GenoME app and the framework for Genome Donations provide two novel modes for the public to engage with personal and medical genomics. and renewals. PGP-UK participants uk35C650, uk33D02F, uk481F67 and uk4CA868 all self-identified and consented for their names, photos, videos and data to be used in the manuscript and GenoME app.

Consent for publication
All authors have approved the manuscript for publication. In addition, PGP-UK participant uk35C650 has self-identified and consented for his photo to be used.