
Table 5 Definitions of selected performance metrics used for algorithm evaluation. The metrics are calculated from the four variables of a 2 × 2 contingency table and assess model performance: true positives (TP, variants predicted and validated), true negatives (TN, variants not predicted and not validated), false positives (FP, variants predicted but failed in validation), and false negatives (FN, variants not predicted but validated).

From: NeoMutate: an ensemble machine learning framework for the prediction of somatic mutations in cancer

| Metric | Formula | Definition |
| --- | --- | --- |
| Accuracy | \( \frac{TP+TN}{TP+TN+FP+FN} \) | The ratio of correct calls out of the total number of positions. |
| Precision | \( \frac{TP}{TP+FP} \) | The ratio of correct variant calls out of the total number of variant calls. Synonyms: positive predictive value (PPV). |
| Recall | \( \frac{TP}{TP+FN} \) | The ratio of correct variant calls out of the total number of true variant positions. Synonyms: sensitivity, true-positive rate (TPR). |
| False discovery rate (FDR) | \( \frac{FP}{TP+FP} \) | The ratio of incorrect variant calls out of the total number of variant calls. |
| F1-score | \( \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision}+\mathrm{Recall}} \) | The harmonic mean of precision and recall, where 1 is the best score and 0 the worst. Synonyms: F-score. |
| Matthews correlation coefficient (MCC) | \( \frac{TP \times TN - FP \times FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}} \) | A measure of the quality of binary (two-class) classifications. The MCC is the correlation coefficient between the observed and predicted binary classifications, where −1 indicates a completely wrong classifier and 1 a completely correct one. |
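To make the formulas concrete, the sketch below computes all six metrics from the four contingency-table counts. It is a minimal Python illustration, not part of the NeoMutate codebase; the function name `contingency_metrics` and the example counts are hypothetical, and the code assumes all denominators are nonzero.

```python
import math

def contingency_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Compute the Table 5 metrics from 2 x 2 contingency-table counts.

    tp: variants predicted and validated
    tn: variants not predicted and not validated
    fp: variants predicted but failed in validation
    fn: variants not predicted but validated
    """
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)   # positive predictive value (PPV)
    recall = tp / (tp + fn)      # sensitivity, true-positive rate (TPR)
    fdr = fp / (tp + fp)         # false discovery rate
    f1 = 2 * precision * recall / (precision + recall)
    # MCC: correlation between observed and predicted classes, in [-1, 1].
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "FDR": fdr,
        "F1": f1,
        "MCC": mcc,
    }

# Hypothetical counts, chosen only to exercise the formulas.
print(contingency_metrics(tp=90, tn=80, fp=10, fn=20))
```

Note that FDR is the complement of precision: FDR = FP/(TP+FP) = 1 − TP/(TP+FP) = 1 − precision, so the two metrics carry the same information.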