PhenoVar: a phenotype-driven approach in clinical genomics for the diagnosis of polymalformative syndromes

Background: We propose a phenotype-driven analysis of encrypted exome data to facilitate the widespread implementation of exome sequencing (ES) as a clinical genetic screening test. Twenty test-patients with varied syndromes were selected from the literature. For each patient, the mutation, phenotypic data, and genetic diagnosis were available. Next, control exome files, each modified to include one of these twenty mutations, were assigned to the corresponding test-patients. A geneticist blinded to the diagnoses used these data to test the efficiency of our software, PhenoVar. The score PhenoVar assigned to any genetic diagnosis listed in OMIM (Online Mendelian Inheritance in Man) took into consideration both the patient's phenotype and all variations present in the corresponding exome. The physician did not have access to the individual mutations. PhenoVar filtered the search using a cut-off phenotypic match threshold to prevent undesired discovery of incidental findings, and ranked the OMIM entries according to diagnostic score.

Results: When assigning the same weight to all variants in the exome, PhenoVar predicted the correct diagnosis in 10/20 patients, and in 15/20 the correct diagnosis was among the 4 highest-ranked diagnoses. When assigning a higher weight to variants known, or bioinformatically predicted, to cause disease, PhenoVar's yield increased to 14/20 (18/20 in the top 4). No incidental findings were identified using our cut-off phenotypic threshold.

Conclusion: The phenotype-driven approach described could make widespread use of ES more practical, ethical and clinically useful. Implications for the identification of novel diseases, research on complex diseases, and personalized medicine are discussed.


Background
Exome sequencing (ES) allows simultaneous screening for variants in the coding portion of all genes present in a patient's genome. Over the last few years, ES has helped elucidate the genetic basis of multiple genetic syndromes (for a review of some examples, see Ku et al. [1]). The relatively low cost of ES and its high diagnostic yield have stimulated discussion about its promising role in the clinic [2][3][4]. However, despite the unprecedented success of ES as a research tool, its use as a genetic screening test in the clinic remains largely impractical. The obstacles are challenges associated with consent, incidental findings, and the management of the massive amounts of data generated (see "Challenges of integrating ES in clinic" subsection). Furthermore, many families have only a single affected individual available. This further complicates the analysis of the results [5], unless the disease-causing variant arose de novo and can be recognized by its absence in the parents.

Challenges of integrating ES in clinic
Adapted from Trakadis [6], published with the permission of the author.
To address these challenges, two kinds of solution have been suggested. The first is variant prioritization using bioinformatic tools (e.g. Berg et al. [7]; Berg et al. [8]). The second is practice guidelines and recommendations (e.g. Christenhusz et al. [9]; ACMG Policy Statement on Genomic Sequencing, May 2012 [10,11]). These approaches, however, do not adequately address all the challenges summarized in the "Challenges of integrating ES in clinic" subsection. Examples include incidental findings, findings of uncertain clinical significance, risk of genetic discrimination, and time and resource requirements. Moreover, these approaches are limited by how accurately bioinformatic tools can predict the clinical impact of different variants [12,13]. At present, different tools often make opposite predictions about the functional impact of the same variant [14]. Nonetheless, the ability of ES to facilitate diagnosis and inform therapy will likely lead to its premature introduction in the clinic, following a path similar to that of chromosomal microarray [15][16][17][18][19][20].
In light of rapid developments in genomic technologies, medical genetics is shifting from the present "phenotype-first" medical model to a "data-first" model, which introduces multiple complexities. An alternative, phenotype-driven approach was recently put forward [6]. This approach, Individualized Mutation-weighed Phenotype On-line Search (I-MPOS), aims to address the above-mentioned issues and facilitate widespread clinical use of ES. Here we present PhenoVar, a software tool consistent with this phenotype-driven approach, and provide preliminary evidence of its potential benefits.

Implementation
PhenoVar and phenotype-driven analysis of exome data
Figure 1 summarizes the overall workflow of PhenoVar. In brief, PhenoVar automatically prioritizes diagnoses for validation based on both the phenotypic and genomic information of a proband. It calculates a patient-specific diagnostic score for each OMIM entry (Online Mendelian Inheritance in Man; http://www.ncbi.nlm.nih.gov/omim) with a known molecular basis. The diagnostic score assigned to a given syndrome is the sum of its phenotypic and genotypic weights.

Calculation of phenotypic weight
For each syndrome listed in the Human Phenotype Ontology (HPO; http://www.human-phenotype-ontology.org), the phenotypic weight is determined by calculating the similarity between the proband and the different (simulated) patients available in a local database, as described below.
To compare the phenotype of a patient with an unknown diagnosis to phenotypes of known genetic syndromes, we simulated a large number of sample patients (hereafter, simulated patients) using the HPO and OMIM databases. For each syndrome listed in the HPO database, twenty to twenty-five simulated patients were randomly generated from the phenotypic traits of that diagnosis. This information was stored in a local database (Phenobase). On average, 5 traits of the respective disease were assigned to each simulated patient. The probability of each trait being present in a given simulated patient was chosen to be proportional to the prevalence of that trait in the respective disease, as available in the HPO database. Only simulated patients representing a syndrome with a known molecular basis, according to the OMIM database, are considered in the subsequent steps of the analysis.
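The generation step above can be sketched as follows. This is a minimal illustration, not PhenoVar's code. It assumes each syndrome is given as a mapping from HPO term IDs to trait frequencies (0-1). It also assumes "proportional to prevalence" can be modelled by drawing a fixed number of distinct traits (here five) by frequency-weighted sampling without replacement. The function names and the frequencies in the toy syndrome are hypothetical; the HPO IDs are real terms used only as labels.

```python
import random


def weighted_sample_without_replacement(rng, items, weights, k):
    """Draw up to k distinct items; each draw picks an item with probability
    proportional to its weight among the items not yet drawn."""
    pool = [(item, w) for item, w in zip(items, weights) if w > 0]  # drop zero-frequency traits
    chosen = []
    for _ in range(min(k, len(pool))):
        total = sum(w for _, w in pool)
        r = rng.uniform(0, total)
        acc = 0.0
        idx = len(pool) - 1  # fallback guards against floating-point round-off
        for j, (_, w) in enumerate(pool):
            acc += w
            if r <= acc:
                idx = j
                break
        chosen.append(pool.pop(idx)[0])
    return chosen


def simulate_patients(syndrome_traits, n_min=20, n_max=25, traits_per_patient=5, seed=None):
    """Generate n_min..n_max hypothetical simulated patients for one syndrome.

    syndrome_traits maps an HPO term ID to the trait's frequency (0-1) in the
    syndrome; more frequent traits are more likely to be drawn.
    """
    rng = random.Random(seed)
    terms = list(syndrome_traits)
    freqs = [syndrome_traits[t] for t in terms]
    n_patients = rng.randint(n_min, n_max)
    return [
        frozenset(weighted_sample_without_replacement(rng, terms, freqs, traits_per_patient))
        for _ in range(n_patients)
    ]


# Toy syndrome: real HPO IDs, hypothetical frequencies.
toy_syndrome = {
    "HP:0001249": 0.9,  # Intellectual disability
    "HP:0000252": 0.8,  # Microcephaly
    "HP:0001250": 0.5,  # Seizure
    "HP:0001263": 0.7,  # Global developmental delay
    "HP:0000316": 0.3,  # Hypertelorism
    "HP:0000486": 0.2,  # Strabismus
    "HP:0000175": 0.0,  # Cleft palate (frequency 0, so never sampled)
}
patients = simulate_patients(toy_syndrome, seed=42)
```

Weighted sampling without replacement keeps each simulated patient free of duplicate traits. An alternative reading of the text would draw each trait independently with a probability scaled to give about five traits on average. That variant would yield patients with varying numbers of traits.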
PhenoVar analyzes each trait the user enters for a given proband using the HPO ontology, a directed acyclic graph that represents the relationships between traits. For each node (trait) in the graph, parent nodes are more generic traits and child nodes are more specific descriptions of the trait. This gives the clinician flexibility in the choice of terms while still enabling recognition of similarity between the proband and each (simulated) patient in Phenobase. For a given syndrome, "accurate" and "general" hits are limited to terms one level away from a trait listed in HPO for that syndrome, i.e. a direct child or parent node. The PhenoVar algorithm first compares the proband to all patients in Phenobase (currently mostly simulated, but also some real patients with known diagnoses). It then calculates a phenotypic similarity weight for each Phenobase patient relative to the proband. For every syndrome (represented by different patients in Phenobase), these similarity weights are averaged across the patients with that syndrome to obtain its final phenotypic weight. A higher weight indicates a higher likelihood, based on phenotype only, that the proband is affected by this syndrome.
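The one-level hit classification described above can be sketched as below. This is an illustration under stated assumptions, not PhenoVar's implementation. The ontology is assumed to be available as a mapping from each term to its direct parents. An exact match is assumed to take precedence over an accurate (child) match, which in turn takes precedence over a general (parent) match. The labels in the toy ontology fragment are hypothetical and do not reproduce real HPO structure or IDs.

```python
def classify_traits(proband, simulated, parents):
    """Count hit types between a proband and one simulated patient.

    proband, simulated: sets of ontology terms.
    parents: maps each term to the set of its direct parent terms (one edge up
    the graph); only one-level relationships are considered.
    """
    counts = {"hit": 0, "accurate": 0, "general": 0, "miss": 0}
    for term in proband:
        if term in simulated:
            counts["hit"] += 1       # identical trait
        elif parents.get(term, set()) & simulated:
            counts["accurate"] += 1  # proband term is a direct child (more specific)
        elif any(term in parents.get(s, set()) for s in simulated):
            counts["general"] += 1   # proband term is a direct parent (more general)
        else:
            counts["miss"] += 1      # no match within one level
    return counts


# Toy ontology fragment (hypothetical labels, not real HPO structure).
toy_parents = {
    "Focal seizure": {"Seizure"},
    "Seizure": {"Abnormality of the nervous system"},
    "Aplasia/hypoplasia of the ulna": {"Abnormality of the ulna"},
}
proband = {"Focal seizure", "Microcephaly", "Abnormality of the ulna", "Cleft palate"}
simulated = {"Seizure", "Microcephaly", "Aplasia/hypoplasia of the ulna"}
print(classify_traits(proband, simulated, toy_parents))
# → {'hit': 1, 'accurate': 1, 'general': 1, 'miss': 1}
```

A term two edges away (e.g. "Abnormality of the nervous system" against "Focal seizure") counts as a miss, reflecting the one-level limit stated in the text.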
The following formula summarizes the details of this process:
where nbPatient is the number of simulated patients with the same syndrome; nbHit_i is the number of traits shared between the i-th simulated patient and the proband; nbAccurateHit_i is the number of proband traits that correspond to more accurate (more specific) versions of a trait in the i-th simulated patient; nbGeneralHit_i is the number of proband traits that correspond to more general versions of a trait in the i-th simulated patient; nbMiss_i is the number of proband traits not matched with the i-th simulated patient; and KH, KA, KG and KP are constant parameters determined using an independent cohort of test patients (KH = KA = KG = 2; KP = 1).
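The formula itself is not reproduced in this text, so the sketch below illustrates only the averaging structure around it. A per-patient similarity is computed from nbHit_i, nbAccurateHit_i, nbGeneralHit_i and nbMiss_i, and then averaged over the nbPatient simulated patients of a syndrome. The linear `placeholder_score` is a hypothetical stand-in that reuses the constants KH, KA, KG and KP. It is not PhenoVar's published formula, and its values are not on the scale of the 0.9705 threshold discussed below.

```python
KH, KA, KG, KP = 2, 2, 2, 1  # constant values reported in the text


def placeholder_score(c):
    """HYPOTHETICAL per-patient similarity; NOT the published PhenoVar formula.

    Rewards exact, more specific and more general matches and penalizes
    misses, only to make the averaging step concrete.
    """
    return KH * c["hit"] + KA * c["accurate"] + KG * c["general"] - KP * c["miss"]


def phenotypic_weight(per_patient_counts, score_fn=placeholder_score):
    """Average a per-patient similarity over the nbPatient patients of one syndrome."""
    counts = list(per_patient_counts)
    if not counts:
        return 0.0
    return sum(score_fn(c) for c in counts) / len(counts)


# Hit counts for two hypothetical simulated patients of one syndrome.
syndrome_counts = [
    {"hit": 2, "accurate": 0, "general": 1, "miss": 0},
    {"hit": 1, "accurate": 1, "general": 0, "miss": 1},
]
print(phenotypic_weight(syndrome_counts))  # → 4.5
```

Passing the scoring function as a parameter keeps the averaging step independent of the exact per-patient formula. The published expression could therefore be substituted without changing the rest of the sketch.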
Figure 1 Workflow of PhenoVar. PhenoVar automatically prioritizes diagnoses for validation based on both the phenotypic and genomic information of a proband. It calculates a patient-specific diagnostic score for each OMIM entry with a known molecular basis. The diagnostic score assigned to a given syndrome is the sum of its phenotypic and genotypic weights. For each syndrome listed in the HPO database, the phenotypic weight is the similarity between the proband and the different patients available in a local database (Phenobase). Phenobase includes patients simulated using HPO and real patients (here denoted as "local patients"). The genotypic weight for each syndrome corresponds to the (predicted) pathogenicity of any variants in the proband's exome that fall in the gene(s) causing the respective syndrome. When no variation is found in these genes, the genotypic weight for that syndrome is automatically set to zero. Otherwise, the variants are sorted into known disease-causing variants (DC var) and possibly pathogenic variants (other var), each assigned a different score. The genotypic and phenotypic weights are summed to obtain the diagnostic score for each syndrome, and the syndromes are then ranked by this score.

Calculation of genotypic weight
The genotypic weight for each syndrome corresponds to the (predicted) pathogenicity of any variants in the proband's exome that fall in the gene(s) causing the respective syndrome. Hence, the genotypic weight is generated in parallel for each syndrome using the proband's exome VCF (Variant Call Format) data file. Predetermined weights are assigned to each variation in the VCF file according to SnpEff annotation or known disease-causing status (ClinVar, HGMD). Two different models are used in the present paper. In the first model, PhenoVar assigns all filtered variations the same weight (an arbitrary value of 1000), irrespective of zygosity. The final genotypic weight for each syndrome is the greatest variation weight across all genes causing the disorder. When no variation is found in these genes, the genotypic weight for that syndrome is automatically set to zero.
The only difference in the second model is that the variants are filtered and sorted into two groups: known disease-causing variants (group 1) and possibly pathogenic variants (group 2). Group 1 holds the disease-causing variants listed in ClinVar and HGMD, as described above. Group 2 holds genetic variations with moderate or high functional impact according to SnpEff predictions (i.e. frameshift, missense, nonsense and splice-site variants). In this model, variants in groups 1 and 2 are arbitrarily assigned weights of 1000 and 100, respectively, and all remaining variants a weight of zero.
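The two weighting models can be sketched as follows, taking the weights (1000 and 100) and the "greatest weight across the syndrome's genes" rule from the text. The `Variant` record and the function names are hypothetical. Variants are assumed to be already filtered and annotated with SnpEff's impact classes (HIGH, MODERATE, LOW, MODIFIER) and a ClinVar/HGMD disease-causing flag. The gene names in the usage example are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    gene: str
    known_disease_causing: bool  # e.g. listed in ClinVar or HGMD
    impact: str                  # SnpEff impact class: HIGH, MODERATE, LOW or MODIFIER


def variant_weight(v, model):
    """Weight of one (already filtered) variant under model 1 or model 2."""
    if model == 1:
        return 1000  # model 1: every filtered variant counts equally
    if v.known_disease_causing:
        return 1000  # model 2, group 1
    if v.impact in ("HIGH", "MODERATE"):
        return 100   # model 2, group 2 (frameshift, missense, nonsense, splice site)
    return 0


def genotypic_weight(syndrome_genes, variants, model=2):
    """Greatest variant weight across all genes causing the syndrome; 0 if none."""
    weights = [variant_weight(v, model) for v in variants if v.gene in syndrome_genes]
    return max(weights, default=0)


exome = [
    Variant("FGFR3", False, "MODERATE"),
    Variant("WNT7A", True, "HIGH"),
    Variant("TTN", False, "LOW"),
]
print(genotypic_weight({"FGFR3"}, exome, model=1))  # → 1000
print(genotypic_weight({"FGFR3"}, exome, model=2))  # → 100
print(genotypic_weight({"MLH1"}, exome, model=2))   # → 0
```

Because only the maximum is kept, several low-impact variants in the same gene never outweigh a single known disease-causing variant. As in the text, zygosity is ignored.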

Phenotype and genotype score integration, ranking of possible diagnoses and filter for incidental findings
The genotypic and phenotypic weights described above are summed to obtain the diagnostic score for each syndrome. The different syndromes are then ranked according to their diagnostic score. The syndrome with the highest diagnostic score represents PhenoVar's prediction of the most likely diagnosis. An option to filter the ranked syndromes by the suspected mode of inheritance is also available. Finally, disorders unrelated to the proband's phenotype are filtered out using an empirically determined phenotypic threshold.
To determine this phenotypic threshold (cut-off: 0.9705), we used simulated patients whose exome VCF files had been modified to include both a pathogenic variant corresponding to their diagnosis and an incidental finding.
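The integration, filtering and ranking steps can be sketched as below. This is a minimal illustration, not PhenoVar's code. It assumes that syndromes whose phenotypic weight falls below the threshold are dropped before ranking, and that the optional inheritance filter keeps only syndromes whose mode is in an allowed set. It also assumes the phenotypic weights are on the same scale as the 0.9705 cut-off. Syndrome names, weights and mode labels in the example are hypothetical.

```python
def rank_diagnoses(phenotypic, genotypic, threshold=0.9705, modes=None, allowed_modes=None):
    """Sum phenotypic and genotypic weights, drop syndromes below the phenotypic
    threshold (incidental-finding filter), optionally filter by inheritance mode,
    and return (syndrome, score) pairs sorted from highest to lowest score."""
    ranked = []
    for syndrome, p_weight in phenotypic.items():
        if p_weight < threshold:
            continue  # phenotype unrelated to the presentation: never reported
        if allowed_modes is not None and (modes or {}).get(syndrome) not in allowed_modes:
            continue  # optional mode-of-inheritance filter
        ranked.append((syndrome, p_weight + genotypic.get(syndrome, 0)))
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked


phenotypic = {"Syndrome A": 1.8, "Syndrome B": 2.4, "BRCA1 cancer predisposition": 0.2}
genotypic = {"Syndrome A": 1000, "BRCA1 cancer predisposition": 1000}
modes = {"Syndrome A": "AD", "Syndrome B": "AR"}
print([s for s, _ in rank_diagnoses(phenotypic, genotypic)])
# → ['Syndrome A', 'Syndrome B']
print([s for s, _ in rank_diagnoses(phenotypic, genotypic, modes=modes, allowed_modes={"AR"})])
# → ['Syndrome B']
```

In this toy example the BRCA1 entry carries a high genotypic weight but a phenotypic weight below the cut-off. It is therefore never ranked, which is how a phenotype threshold can suppress incidental findings.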

Test-patients selection and preparation of variants files
Ten test-patients with different polymalformative genetic syndromes were randomly selected from previously published case reports (patients 1a to 10a in Table 1). For each patient selected, the mutation, phenotypic data, and genetic diagnosis were available in the published manuscript. Next, the VCF files of 10 control exomes were obtained from the National Institute of Environmental Health Sciences (NIEHS) Environmental Genome Project (EGP) (http://evs.gs.washington.edu). These control exome VCF files were modified so that each of them included the disease-causing variant of one of the test-patients (1a to 10a) in Table 1.
Next, ten genetic syndromes whose phenotypic features had a documented prevalence in HPO were identified. Each of these syndromes was then searched in OMIM for previously published case reports, and one patient representing each syndrome was selected from the literature (1b to 10b in Table 1). The ten original (unmodified) control exome files were then modified so that each of them included the disease-causing variant of one of the test-patients 1b to 10b in Table 1.
Each test-patient was thus assigned a specific exome VCF file modified to include his/her disease-causing variant. The resulting files were first annotated for variant functional impact using SnpEff (version 2.0.5). They were then annotated for known disease-causing variants as classified in the ClinVar database (clinical significance "4", probable pathogenic, or "5", pathogenic) and in the professional version of the Human Gene Mutation Database (HGMD) (disease mutation, "DM", variants). The files were further filtered to exclude non-disease-causing intronic or synonymous variants, as well as variants with >5% frequency in dbSNP (build 135).
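The annotation-based filtering above can be sketched as follows. This illustration assumes variants have already been parsed from the VCF into simple records. The `AnnotatedVariant` record, the helper names and the SnpEff 2.x-style effect names ("INTRON", "SYNONYMOUS_CODING") are assumptions. The sketch also assumes the frequency filter applies to all variants, including known disease-causing ones, which the text leaves open. Gene names are placeholders.

```python
from dataclasses import dataclass
from typing import Optional

# SnpEff 2.x-style effect names assumed for intronic and synonymous variants.
INTRONIC_OR_SYNONYMOUS = {"INTRON", "SYNONYMOUS_CODING"}


def is_disease_causing(clnsig_codes, hgmd_class=None):
    """ClinVar clinical significance 4 or 5, or an HGMD "DM" classification."""
    return bool({4, 5} & set(clnsig_codes)) or hgmd_class == "DM"


@dataclass(frozen=True)
class AnnotatedVariant:
    gene: str
    effect: str             # SnpEff effect, e.g. "NON_SYNONYMOUS_CODING"
    disease_causing: bool   # result of is_disease_causing()
    freq: Optional[float]   # dbSNP allele frequency; None if not listed


def keep_variant(v, max_freq=0.05):
    """Apply the frequency and intronic/synonymous filters described in the text."""
    if v.freq is not None and v.freq > max_freq:
        return False  # common variant (>5% in dbSNP by default)
    if v.effect in INTRONIC_OR_SYNONYMOUS and not v.disease_causing:
        return False  # non-disease-causing intronic or synonymous variant
    return True


variants = [
    AnnotatedVariant("GENE1", "INTRON", False, None),                 # dropped
    AnnotatedVariant("GENE2", "INTRON", True, 0.001),                 # kept: disease-causing
    AnnotatedVariant("GENE3", "NON_SYNONYMOUS_CODING", False, 0.10),  # dropped: common
    AnnotatedVariant("GENE4", "NON_SYNONYMOUS_CODING", False, 0.05),  # kept: not above 5%
    AnnotatedVariant("GENE5", "STOP_GAINED", False, None),            # kept
]
print([v.gene for v in variants if keep_variant(v)])
# → ['GENE2', 'GENE4', 'GENE5']
```

Exposing `max_freq` as a parameter makes it easy to try the stricter 1% cut-off discussed later in the paper (`keep_variant(v, max_freq=0.01)`).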
A medical geneticist "blinded" to the diagnoses of the test-patients was given the clinical descriptions of the twenty patients. In our previous experience, PhenoVar performs best when three or more traits are used. Using HPO terms, the geneticist selected for each case three traits that he considered significant and most specific. For each patient, he then entered the selected terms, along with the respective modified exome VCF file, into PhenoVar's web-based interface. To test the efficiency of PhenoVar, team members aware of the diagnoses then recorded the rank of the correct diagnosis in the results.
Next, the medical geneticist, still blinded to the diagnoses, was asked to select different keywords, ensuring that 2-3 of them were present in Phenobase, and the analysis was repeated. Of note, PhenoVar displays after each analysis how many of the entered traits matched traits in Phenobase.

Incidental findings
For each test-patient, we reviewed genes known to cause mendelian disorders that harboured either previously reported disease-causing variants (ClinVar, HGMD) or variants predicted to be likely pathogenic (nonsense, frameshift, consensus splice site). When reporting incidental findings, we focused mostly on the ACMG minimal list [11]. Moreover, we further modified each of the control exome VCF files mentioned above to include a known pathogenic BRCA1 variant (an incidental finding) and analyzed these files. Entering each of these VCF files into PhenoVar, along with the corresponding set of three phenotypic traits summarized in Table 1, allowed us to test the phenotype filter for incidental findings.
Table 1 summarizes the characteristics of the test-patients selected from the literature. The first column lists the identification number assigned to each patient. The second column lists the phenotypic traits selected by the medical geneticist "blinded" to the diagnoses, and the reference articles. The table also gives the affected gene, exact mutation, and corresponding diagnosis for each test-patient.

Analysis of real patients with unknown genetic disorders
To illustrate that PhenoVar can be used with data from real patients, we analyzed four patients with multiple congenital anomalies who had previously been diagnosed with a known mendelian disorder via exome sequencing. For exome sequencing, DNA libraries were prepared for each patient (TruSeq, Illumina). Libraries then underwent target enrichment (Agilent SureSelect All Exon kit v4) and sequencing on a HiSeq2000 (Illumina) with 3 exomes per lane, giving an average coverage of ~100X. Sequencing data were analyzed with the GATK v2 package following the Broad Institute's recommendations [41]. SnpEff 2.0.5 (GRCh37/hg19) was used for variant annotation and the ClinVar database for identification of known pathogenic variants. The variants identified were filtered using a 1% cut-off for minor allele frequency (dbSNP build 137 and local control exomes). Of the remaining variants, those predicted to alter the amino acid sequence or a consensus splice-site junction were examined further. Those that PolyPhen-2 or SIFT also predicted to be not tolerated were manually reviewed under the supervision of a medical geneticist (SL). After the diagnosis was confirmed, the data for these patients were reanalyzed with PhenoVar, as illustrated in Table 2.
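The candidate-selection logic for the real patients can be sketched as follows. The 1% cut-off, the two frequency sources and the "PolyPhen-2 or SIFT" rule are taken from the text. The record layout and function name are hypothetical. Two choices are assumptions: that consensus splice-site variants are kept without a SIFT/PolyPhen-2 call (these tools score amino-acid substitutions), and that a frequency of exactly 1% fails the cut-off.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    gene: str
    maf_dbsnp: float          # minor allele frequency in dbSNP (0 if absent)
    maf_local: float          # frequency in local control exomes (0 if absent)
    alters_protein: bool      # amino-acid-changing variant
    consensus_splice: bool    # hits a consensus splice-site junction
    sift_deleterious: Optional[bool] = None   # None: no prediction available
    polyphen_damaging: Optional[bool] = None  # None: no prediction available


def needs_manual_review(v, max_maf=0.01):
    """Keep rare, protein- or splice-altering variants for expert review."""
    if max(v.maf_dbsnp, v.maf_local) >= max_maf:
        return False  # fails the 1% MAF cut-off in either source
    if v.consensus_splice:
        return True   # assumption: splice variants kept without SIFT/PolyPhen-2
    if not v.alters_protein:
        return False
    # kept when either predictor calls the change not tolerated
    return bool(v.sift_deleterious) or bool(v.polyphen_damaging)


example = Candidate("GENE_X", 0.002, 0.0, True, False, sift_deleterious=False, polyphen_damaging=True)
print(needs_manual_review(example))  # → True
```

Requiring only one of the two predictors to call a change damaging keeps the filter permissive. Given that different tools often disagree about the same variant, this seems a reasonable choice before manual review.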

Results
Table 1 summarizes the information on the patients selected from the literature, including only the phenotypic traits selected by the medical geneticist blinded to their diagnoses. The genetic syndromes represented in Table 1 include both autosomal recessive and autosomal dominant conditions. Table 3 summarizes the diagnosis predictions for the test-patients. When assigning the same weight to all variants in the exome (Table 3, Column 4), PhenoVar predicted the correct diagnosis in 10 out of 20 patients, and in 15 out of 20 the correct diagnosis was among the 4 highest-ranked possible diagnoses. In only two patients was the correct diagnosis not ranked among the top 20 diagnoses. When using PhenoVar's option to automatically assign a higher weight to variants known, or bioinformatically predicted, to cause disease, PhenoVar's diagnostic yield increased to 14/20, with the correct diagnosis among the top 4 highest-ranked diagnoses in 18/20 patients. More specifically, in the second set of patients (1b-10b), PhenoVar predicted the diagnosis in 8/10 patients and ranked it within the top 4 diagnoses in the other two (Table 3, Column 5). Optimizing the selected keywords so that two or three of the traits used were present in Phenobase further improved the diagnostic yield. PhenoVar then identified the correct diagnosis in 17 out of 20 patients and ranked it in the top 3 in 20/20 (data not shown).

Table 2 summarizes four examples illustrating that PhenoVar can be used with real patients' data. The first column lists the identification letter assigned to each patient. The second column lists the phenotypic traits used when running PhenoVar. The next three columns give the affected gene, exact mutation, and corresponding diagnosis for each patient. These diagnoses were determined by standard analysis of all the data, i.e. without using PhenoVar. Finally, the last column gives the rank PhenoVar assigned to the correct diagnosis.
With regard to incidental findings, manual analysis of the test-patients' VCF files identified two variants unrelated to the presenting complaint. One had previously been reported to cause Lynch syndrome (MLH3) and the other to cause renal cell carcinoma/MODY type 3 (HNF1A). Neither incidental finding was identified when using PhenoVar's cut-off phenotypic threshold. Furthermore, the PhenoVar analysis was repeated for all patients using VCF files further modified to include a known pathogenic BRCA1 variant. This incidental finding was likewise not identified using our cut-off phenotypic threshold.
Table 2 illustrates that PhenoVar can also be used with real patients' data. PhenoVar ranked the diagnoses of all four patients highly; these diagnoses had previously been identified by standard bioinformatic analysis of exome sequencing results. Moreover, two incidental findings were known to be present in these real patients. Patient B carried a disease-causing variant for pigmented nodular adrenocortical disease (OMIM 610475), and patient D a variant in BRCA2. Neither was found using our cut-off phenotypic threshold.

Discussion
With conventional approaches, multiple genetic tests are typically required before a molecular diagnosis is reached, which increases cost and time. Widespread use of clinical ES could accelerate genetic diagnosis on an unprecedented scale at low cost. PhenoVar prioritizes diagnoses (mendelian disorders whose molecular bases are known) for validation based on both the phenotypic and genomic information of a proband. In this study, it performed well with only a limited number of phenotypic traits (three traits). Our data show that, by taking into consideration both the patient's phenotype and encrypted exome data, PhenoVar prioritized the correct diagnosis more efficiently than when relying solely on the patient's phenotype. This held for patients with different clinical presentations (compare columns 3 and 5 in Table 3). It was true when using different variant databases (ClinVar or HGMD, data not shown). To a lesser but still appreciable degree, it was also true when assigning the same weight to all variants present in an exome. Assigning the same weight to all variations in a patient's exome, rather than prioritizing known pathogenic variants, has an important advantage. It minimizes the impact of erroneous variant classification (as benign or pathogenic) on the efficiency of the software.

In Table 3, the first column lists the identification number assigned to each test-patient. The second column gives the number of variants with a global minor allele frequency (GMAF) below 5% in the modified exome assigned to each patient. The next three columns give the rank of the correct diagnosis for each patient under three PhenoVar options. The first option uses only the selected phenotypic traits of the respective patient (third column). The second integrates the phenotypic traits with the variants in the patient's exome, assigning the same weight to all variants (fourth column). The third assigns a higher weight to mutations known or predicted to cause disease (fifth column). The last column indicates how many of the traits selected by the medical geneticist "blinded" to the correct diagnoses matched traits in Phenobase. *Mutation annotated incorrectly (see Discussion).
Interestingly, PhenoVar also performed well in cases where the selected phenotypic keywords were not specific for the correct diagnosis. For instance, in patient 3b, the phenotype-based ranking was very poor (Table 3, Column 3), yet the final ranking by PhenoVar was not compromised (Table 3, Columns 4 and 5). This suggests that PhenoVar could be particularly helpful when an important clinical trait is missed during the genetic evaluation, or when dealing with atypical presentations of known genetic syndromes. In turn, diagnosing more patients with atypical presentations of known genetic syndromes could help define the spectrum of clinical characteristics of these conditions more precisely. Prospective studies using real patient data are needed, in collaboration with the people who will use this software, both to achieve this goal and to further validate the tool. Table 2 summarizes four real patients whose previously established diagnoses PhenoVar ranked highly when their data were reanalyzed. This table illustrates that PhenoVar can be used with real patient data. However, a follow-up study with a large cohort of patients will be needed to evaluate how effectively PhenoVar prioritizes the correct diagnosis in a real clinical setting.
One of PhenoVar's major advantages is that it takes the patient's exome data into account when prioritizing possible diagnoses, without requiring additional bioinformatics personnel in the clinical setting. This could allow widespread use of ES in clinical practice as a screening test for known mendelian conditions. Moreover, the optional cut-off phenotypic weight threshold lets the clinician focus the analysis on genetic causes that could explain the specific phenotype or medical issue at hand. This prevents the undesired discovery of incidental findings. As a result, the approach described simplifies pre-test counselling and informed consent for exome sequencing as a clinical screening tool. It does not contradict but rather complements the binning approach previously put forward [7,10]. Widespread use of ES in the clinic will help evaluate the significance of different variants, including their penetrance/expressivity. It will thus help identify appropriate genes to target for screening [42,43] and improve the interpretation of incidental findings of interest to the patient.
As illustrated by cases 5a, 6a and 8a (Table 3), correct gene annotation (e.g. correct exon/intron boundaries) is crucial for the software to run properly. This limitation is not specific to PhenoVar, as it similarly affects current standard ES data analysis. Moreover, the HPO database and Phenobase must be properly curated for the software to work efficiently. For instance, whether HPO records the prevalence of phenotypic features for a given genetic syndrome has a significant impact on efficiency. PhenoVar's efficiency in patients 1b-10b (Table 3), whose syndromes had documented trait prevalences, illustrates this. Including more real patients in Phenobase should, in time, overcome this problem. Also, an option could be added to explore the presence or absence of a given variant in affected and unaffected family members, based on simultaneous comparison of their encrypted genomes. This would allow the weights assigned to different variants to be adjusted and could further improve PhenoVar's efficiency. Finally, the VCF files in our study were filtered to exclude only variants with >5% frequency in dbSNP (build 135). This threshold was intentionally set higher than usual to illustrate the efficiency of PhenoVar. However, since most genetic conditions are rare (low carrier frequency), a lower threshold (e.g. 1%) could be used instead. This would filter out more benign variants and thus improve the efficiency of PhenoVar.
The proposed approach follows the existing "phenotype-first" medical model and allows better prioritization of the genes to be tested in a clinical lab. It is particularly useful for phenotypes caused by multiple different genes (e.g. evaluation of global developmental delay). Recent studies have provided evidence for the high diagnostic yield of exome sequencing [44][45][46]. Using ES as a screening test can increase the diagnostic yield of a clinical evaluation cost-effectively and shorten the time to diagnosis [2][3][4][47][48]. If used properly, PhenoVar can help address many of the challenges of integrating genomic technologies into clinical practice (see "Challenges of integrating ES in clinic" subsection). The physician remains responsible for seeking a confirmatory clinical diagnostic test for the suspected diagnosis. For unresolved cases, the physician must also clinically prioritize further testing across the whole spectrum of available clinical genetic testing modalities.

Future directions
In the future, Phenobase could be expanded to incorporate dedicated databases containing phenotypic and genomic data of real patients [49,50], enabling encrypted online data sharing from consenting individuals. Such interconnected encrypted databases could include patients with both known and not-yet-identified diagnoses, and the method described above could then have numerous benefits. First, once a large cohort of real patient data is available in Phenobase, PhenoVar's diagnostic yield for known syndromes would likely improve, particularly when atypical features are present. Second, continuously updating the database could allow the software to evolve and contribute to the identification of new syndromes, as illustrated in Figure 2. In brief, the software could automatically match a patient evaluated in clinic (the target patient) with patients in the database based on phenotypic similarity. It could then automatically compare the exomes of the matched patients with that of the target patient. Suppose the target patient's exome contains a variant shared by phenotypically matched patients whose diagnosis is known; the target patient's diagnosis could then be inferred. If the diagnosis of the matched patients is not yet known, an identical variant shared by them and the target patient could point to a candidate gene explaining their shared phenotype. This could represent a newly described syndrome. All of this can be achieved while keeping the genomic information of all participating patients in the database encrypted at all times.

Figure 2 Future directions for PhenoVar or similar software using the I-MPOS approach and data from real patients. A database containing phenotypic and encrypted genomic information of real patients with known or not-yet-identified diagnoses can be made available (1). A patient with an unknown diagnosis presents in clinic. His encrypted ES data are obtained and his phenotype is assessed (2 α, β). The software automatically searches the "Encrypted Patients Database" using the target patient's assessed phenotype (3). This provides a first ranking of possible genetic conditions based on "phenotype weight" (4), (5). For all patients in the database who meet a specific phenotype-similarity threshold in relation to the proband, the software compares the changes in their genomes with those in the genome of the patient seen in clinic (6). The proband is then matched with the phenotypically similar subjects based on the similarity of their genetic changes ("mutation weight"). This matching is used to adjust the first ranking into a second ranking (6), (7). Subjects who share adequate phenotypic characteristics and also share a genetic variant cluster together. A given match therefore indicates that the target subject may share the same genetic condition with the matched subject(s). After the second ranking, information about the shared phenotype and genotype of the patients clustering together is accessible and can aid in reaching the diagnosis. Note that "phenotype" (steps 2-5) is not limited to clinical traits but also covers other levels of phenotype, such as a metabolomic profile. Also, "mutation" (steps 6, 7) can refer to variants at more than one genetic locus that are simultaneously present in all matched patients, allowing exploration of possible gene-gene interactions.
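The matching idea above (select phenotypically similar patients, then look for variants they share with the target) can be sketched as below. This is a plaintext illustration only. It does not model the encryption layer described in the text, and it assumes that phenotypic similarities to the target have already been computed. It also assumes that a candidate variant must be present in the target and in every matched patient. The function name, patient IDs and variant coordinates are hypothetical.

```python
def match_and_intersect(target_variants, database, similarity, threshold):
    """Return (matched patient IDs, variants shared by the target and all matches).

    database: patient ID -> set of variant keys, e.g. (chrom, pos, ref, alt).
    similarity: patient ID -> phenotypic similarity to the target patient.
    """
    matched = sorted(pid for pid, s in similarity.items() if s >= threshold)
    if not matched:
        return [], set()
    shared = set(target_variants)
    for pid in matched:
        shared &= database[pid]  # keep only variants present in every match
    return matched, shared


database = {
    "P1": {("7", 100, "A", "G"), ("1", 5, "C", "T")},
    "P2": {("7", 100, "A", "G"), ("3", 1, "T", "C")},
    "P3": {("2", 9, "G", "A")},
}
similarity = {"P1": 0.95, "P2": 0.91, "P3": 0.20}
target = {("7", 100, "A", "G"), ("3", 1, "T", "C")}
print(match_and_intersect(target, database, similarity, threshold=0.9))
# → (['P1', 'P2'], {('7', 100, 'A', 'G')})
```

Intersecting across all matches is the strictest option. A looser variant could count, for each target variant, how many matched patients carry it, so that partial sharing among a phenotypic cluster is still reported.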
Such an approach would also have implications for subdividing heterogeneous complex genetic diseases with high heritability (e.g. schizophrenia [51][52][53][54]). Subgroups of patients in the database could delineate more homogeneous endophenotypes. Finally, the software could ultimately evolve to use other levels of phenotypic information besides clinical traits (e.g. metabolomic, transcriptomic, miRNomic data) when matching the target patient with patients in the database. The metabolome and other levels of phenotype can be accurately quantified and followed over time. They constitute the downstream effect of unknown gene-gene or gene-environment interactions. If the metabolomic profiles of the database patients and the target patient were available, such software could integrate this information during the phenotype-matching step. This approach would thus indirectly factor in differences in genomic backgrounds and environmental exposures, both of which may influence the pathogenic role of a specific shared variant. Hence, affected patients with the shared variant would be prioritized while accounting for differences in the penetrance and/or expressivity of genetic conditions. As a result, this approach could facilitate screening, even in the newborn period, for genetic diseases whose biochemical phenotype (e.g. metabolomic profile) precedes the clinical presentation. Similarly, such a tool could in time be used at regular intervals during routine visits to a general clinic throughout a patient's lifetime. This could facilitate the transition towards a more personalized practice of medicine [55].

Conclusions
PhenoVar follows the existing "phenotype-first" medical model. It facilitates the diagnostic approach by considering both the patient's phenotype and all variations in his/her exome when ranking possible diagnoses (see Additional file 1). It is particularly useful for phenotypes caused by multiple different genes (e.g. evaluation of global developmental delay). Besides addressing many of the challenges of integrating genomic technologies into clinical practice, it could in the future provide the infrastructure needed to advance these tools safely and effectively.

Table 1
Characteristics of the test patients selected from the literature
… and fibula, absence of, with severe limb deficiency (276820); Aplasia/hypoplasia of the fibula; Aplasia/hypoplasia of the ulna (Woods et al. [31])



Table 2
Four real patients analyzed by PhenoVar

Table 3
Diagnosis prediction for test-patients using PhenoVar