Table 4 List of biological and sequencing features selected for downstream ML analysis

From: NeoMutate: an ensemble machine learning framework for the prediction of somatic mutations in cancer

| Feature name | Source | Group | Description |
|---|---|---|---|
| indel_or_snp | BAM | 3 | Variant type: SNP, insertion or deletion |
| ts_or_tv | BAM | 3 | Whether the variant is a transition or a transversion |
| depth_TUM | BAM | 1 | Coverage in the tumor sample at the variant position |
| alt_counts_TUM | BAM | 1 | Number of reads supporting the variant (alternative allele) in the tumor sample |
| alt_avg_MQ_TUM | BAM | 2 | Average mapping quality of the reads containing the variant; mapping quality quantifies the probability that a read is misplaced |
| alt_avg_BQ_TUM | BAM | 2 | Average base quality of the reads containing the variant; base quality reflects the accuracy of a base call made by the sequencing machine |
| alt_plus_TUM | BAM | 1 | Number of reads on the plus (forward) strand supporting the variant |
| alt_minus_TUM | BAM | 1 | Number of reads on the minus (reverse) strand supporting the variant |
| ref_plus_TUM | BAM | 1 | Number of reads on the plus (forward) strand supporting the reference allele |
| ref_minus_TUM | BAM | 1 | Number of reads on the minus (reverse) strand supporting the reference allele |
| VAF | BAM | 1 | Variant allele frequency |
| depth_WT | BAM | 1 | Coverage in the normal sample at the variant position |
| alt_counts_WT | BAM | 1 | Number of reads supporting the variant in the normal sample (germline risk) |
| ref_counts_WT | BAM | 1 | Number of reads supporting the reference allele in the normal sample |
| num_of_indels_closeby | BAM | 3 | Number of indels close to the variant position (false-positive risk factor) |
| GC_content | BAM | 3 | Fraction of G and C bases within ±20 bp of the variant position |
| shannon_entropy | BAM | 3 | Shannon entropy, a measure of the degree of randomness in a sequence; the lower the entropy, the less complex the sequence |
| detection_status | VCF | 4 | Classification status (“somatic” or “non-somatic”) assigned by the given variant caller |
| “Tool”_F | VCF | 4 | Quality tag in the FILTER column (“PASS” or “non-PASS”) |
| “Tool”_alt_counts | VCF | 1, 4 | Number of reads supporting the variant, as reported by the specific tool |
| “Tool”_ref_counts | VCF | 1, 4 | Number of reads supporting the reference allele, as reported by the specific tool |
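The group 3 sequence-context features (indel_or_snp, ts_or_tv, GC_content, shannon_entropy) depend only on the variant alleles and the sequence around the variant position. The Python sketch below shows one way they could be computed; it is not NeoMutate's implementation. The function names are hypothetical, the GC window is assumed to be centred on the variant, to include the variant base and to be clipped at the sequence ends, and the entropy is assumed to be measured in bits over the base composition of whatever sequence is passed in.

```python
import math
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def variant_type(ref: str, alt: str) -> str:
    """Classify a variant from its REF/ALT alleles (cf. indel_or_snp)."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(alt) > len(ref):
        return "insertion"
    if len(alt) < len(ref):
        return "deletion"
    # Equal-length multi-base changes fall outside the three classes in the table
    raise ValueError("multi-nucleotide substitutions are not handled in this sketch")


def ts_or_tv(ref: str, alt: str) -> str:
    """Transitions swap purine<->purine or pyrimidine<->pyrimidine; all others are transversions."""
    pair = {ref.upper(), alt.upper()}
    if len(pair) != 2 or not pair <= PURINES | PYRIMIDINES:
        raise ValueError("expected two different single bases")
    return "transition" if pair <= PURINES or pair <= PYRIMIDINES else "transversion"


def gc_content(seq: str, pos: int, flank: int = 20) -> float:
    """Fraction of G/C bases within +/- flank bp of the 0-based position pos.

    Assumption: the window includes the variant base and is clipped at the sequence ends.
    """
    window = seq[max(0, pos - flank): pos + flank + 1].upper()
    if not window:
        raise ValueError("empty window")
    return sum(base in "GC" for base in window) / len(window)


def shannon_entropy(seq: str) -> float:
    """Shannon entropy (bits) of the base composition; lower values mean a less complex sequence."""
    if not seq:
        raise ValueError("empty sequence")
    n = len(seq)
    counts = Counter(seq.upper())
    # sum over bases of p * log2(1/p), with p = c / n
    return sum((c / n) * math.log2(n / c) for c in counts.values())


print(variant_type("C", "T"), ts_or_tv("C", "T"))       # SNP transition
print(gc_content("AAGCC", 2, flank=1))                   # 2 of 3 bases in window "AGC"
print(shannon_entropy("ACGT"), shannon_entropy("AAAA"))  # 2.0 0.0
```

A uniform four-base sequence reaches the maximum entropy of 2 bits, while a homopolymer scores 0, which matches the table's description of low entropy as low sequence complexity.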
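Several group 1 count features are linked: the variant read count should equal the sum of its plus- and minus-strand counts, and the VAF is the fraction of reads at the position that support the variant. The Python sketch below illustrates these relationships with a hypothetical StrandCounts container. It assumes that depth counts only reads supporting the reference or the variant allele; a depth taken directly from the BAM file may also include reads carrying other alleles, so this is an approximation rather than the paper's definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StrandCounts:
    """Per-strand read counts at one variant position in one sample (hypothetical container)."""
    ref_plus: int
    ref_minus: int
    alt_plus: int
    alt_minus: int

    @property
    def alt_counts(self) -> int:
        # Reads supporting the variant on either strand (cf. alt_counts_TUM / alt_counts_WT)
        return self.alt_plus + self.alt_minus

    @property
    def ref_counts(self) -> int:
        # Reads supporting the reference allele on either strand
        return self.ref_plus + self.ref_minus

    @property
    def depth(self) -> int:
        # Assumption: depth counts only REF- and ALT-supporting reads
        return self.alt_counts + self.ref_counts

    @property
    def vaf(self) -> float:
        # Variant allele frequency; 0.0 when the position has no coverage
        return self.alt_counts / self.depth if self.depth else 0.0


tumor = StrandCounts(ref_plus=30, ref_minus=28, alt_plus=12, alt_minus=10)
normal = StrandCounts(ref_plus=25, ref_minus=24, alt_plus=0, alt_minus=1)

print(tumor.depth, tumor.alt_counts, round(tumor.vaf, 3))  # 80 22 0.275
print(normal.alt_counts, normal.vaf)                       # 1 0.02
```

A non-zero variant count in the normal sample, as in `normal` above, is the kind of signal that alt_counts_WT records as a germline risk.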