Table 4 List of biological and sequencing features selected for downstream ML analysis

From: NeoMutate: an ensemble machine learning framework for the prediction of somatic mutations in cancer

| Feature name | Source | Group | Description |
| --- | --- | --- | --- |
| indel_or_snp | BAM | 3 | Variant type: SNP, insertion, or deletion |
| ts_or_tv | BAM | 3 | Substitution type: transition or transversion |
| depth_TUM | BAM | 1 | Coverage in the tumor sample at the given variant position |
| alt_counts_TUM | BAM | 1 | Alternative read counts (number of reads supporting the variant) |
| alt_avg_MQ_TUM | BAM | 2 | Average mapping quality of reads containing the variant; mapping quality quantifies the probability that a read is misplaced |
| alt_avg_BQ_TUM | BAM | 2 | Average base quality of reads containing the variant; base quality reflects the accuracy of a base called by the sequencing machine |
| alt_plus_TUM | BAM | 1 | Number of reads on the plus/forward strand supporting the variant |
| alt_minus_TUM | BAM | 1 | Number of reads on the minus/reverse strand supporting the variant |
| ref_plus_TUM | BAM | 1 | Number of reads on the plus/forward strand supporting the reference allele |
| ref_minus_TUM | BAM | 1 | Number of reads on the minus/reverse strand supporting the reference allele |
| VAF | BAM | 1 | Variant allele frequency |
| depth_WT | BAM | 1 | Coverage in the normal sample at the given variant position |
| alt_counts_WT | BAM | 1 | Number of reads supporting the variant in the normal sample (germline risk) |
| ref_counts_WT | BAM | 1 | Number of reads supporting the reference in the normal sample |
| num_of_indels_closeby | BAM | 3 | Are there indels nearby? (false-positive risk factor) |
| GC_content | BAM | 3 | Number of GC bases relative to the total number of bases within ±20 bp of the given variant position |
| shannon_entropy | BAM | 3 | Measure of the degree of randomness in the sequence; the smaller the entropy value, the less complex the sequence |
| detection_status | VCF | 4 | Classification status (“somatic” or “non somatic”) reported by the given variant caller |
| “Tool”_F | VCF | 4 | Quality tag in the FILTER column (“PASS” or “non PASS”) |
| “Tool”_alt_counts | VCF | 1, 4 | Number of reads supporting the variant, as reported by the specific tool |
| “Tool”_ref_counts | VCF | 1, 4 | Number of reads supporting the reference, as reported by the specific tool |
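
Several of the features above are simple functions of the alleles, the read counts, and the reference sequence around the variant. The following Python sketch illustrates how indel_or_snp, ts_or_tv, VAF, GC_content, and shannon_entropy could be computed. It is not the NeoMutate implementation: all function names are hypothetical, equal-length substitutions are lumped together as SNPs, VAF is taken as alt_counts_TUM divided by depth_TUM, the ±20 bp window is assumed to include the variant position itself, and the entropy is assumed to be computed over single-base frequencies in bits.

```python
import math
from collections import Counter

# Purine<->purine and pyrimidine<->pyrimidine substitutions are transitions.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def variant_type(ref: str, alt: str) -> str:
    """indel_or_snp: classify a variant by comparing allele lengths."""
    if len(ref) == len(alt):
        return "snp"  # assumption: equal-length substitutions count as SNPs
    return "insertion" if len(alt) > len(ref) else "deletion"


def ts_or_tv(ref: str, alt: str) -> str:
    """ts_or_tv: 'ts' for a transition, 'tv' for a transversion (SNPs only)."""
    return "ts" if (ref.upper(), alt.upper()) in TRANSITIONS else "tv"


def vaf(alt_counts: int, depth: int) -> float:
    """VAF: fraction of reads at the position that support the variant."""
    return alt_counts / depth if depth > 0 else 0.0


def flanking_window(reference: str, pos: int, flank: int = 20) -> str:
    """Reference bases within +/- flank bp of a 0-based position.

    Assumption: the window includes the variant position itself and is
    truncated at the ends of the reference sequence.
    """
    return reference[max(0, pos - flank): pos + flank + 1]


def gc_content(window: str) -> float:
    """GC_content: number of G/C bases relative to all bases in the window."""
    window = window.upper()
    if not window:
        return 0.0
    return sum(base in "GC" for base in window) / len(window)


def shannon_entropy(window: str) -> float:
    """shannon_entropy: base-composition entropy in bits (lower = less complex)."""
    counts = Counter(window.upper())
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


if __name__ == "__main__":
    # Toy reference sequence; real input would come from a reference FASTA.
    reference = "ACGTACGTGGGGCCCCATATATATACGT"
    window = flanking_window(reference, pos=10, flank=5)
    print(variant_type("A", "G"), ts_or_tv("A", "G"), vaf(12, 60))
    print(window, gc_content(window), shannon_entropy(window))
```

A homopolymer window such as "AAAA" yields an entropy of 0, while a window with equal proportions of all four bases yields the maximum of 2 bits, which matches the table's reading that smaller values indicate less complex sequence.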