Global-scale GWAS associates a subset of SNPs with animal-adapted variants in M. tuberculosis complex

Background While Mycobacterium tuberculosis complex (MTBC) variants are clonal, variant tuberculosis is a human-adapted pathogen, and variant bovis infects many hosts. Despite nucleotide identity between MTBC variants exceeding 99.95%, it remains unclear what drives these differences. Markers of adaptation into variants were sought by bacterial genome-wide association study of single nucleotide polymorphisms extracted from 6,362 MTBC members from varied hosts and countries. Results The search identified 120 genetic loci associated with MTBC variant classification and certain hosts. In many cases, these changes are uniformly fixed in certain variants while absent in others in this dataset, providing good discriminatory power in distinguishing variants by polymorphisms. Multiple changes were seen in genes for cholesterol and fatty acid metabolism, pathways previously proposed to be important for host adaptation, including Mce4F (part of the fundamental cholesterol intake Mce4 pathway), 4 FadD and FadE genes (playing roles in cholesterol and fatty acid utilization), and other targets like Rv3548c and PTPB, genes shown essential for growth on cholesterol by transposon studies. Conclusions These findings provide a robust set of genetic loci associated with the split of variant bovis and variant tuberculosis, and suggest that adaptation to new hosts could involve adjustments in uptake and catabolism of cholesterol and fatty acids, like the proposed specialization to different populations in MTB lineages by alterations to host lipid composition. Future studies are required to elucidate how the associations between cholesterol profiles and pathogen utilization differences between hosts and MTBC variants, as well as the investigation of uncharacterized genes discovered in this study. This information will likely provide an understanding on the diversification of MBO away from humans and specialization towards a broad host range. Supplementary Information The online version contains supplementary material available at 10.1186/s12920-023-01695-5.


Background
The Mycobacterium tuberculosis complex (MTBC) has afflicted human and animal health since the dawn of civilization.This ancient pathogen, typified by M. tuberculosis variant tuberculosis (MTB), infects humans primarily and is considered specialized for this niche [1,2].Its subversion of host immune responses, dormancy in granulomas for years or decades, and transmissibility suggest fine adaptation to humans, potentially to perlineage adaptation to different human populations [3,4].MTB infects non-human hosts, including primates, and other animals (such as cattle) more infrequently [5,6].On the other hand, M. tuberculosis variant bovis (MBO) is a generalist pathogen -its host range includes foxes, seals, cattle, cervids, lions, dogs, mustelids, badgers, and others [1,[7][8][9][10].The eponymous bovine reservoir is one of several for MBO [10], including white-tailed deer, elk, or bison in the US and Canada [9,11,12], red deer and wild boar populations in Spain [7], European badgers in the UK and Ireland [13,14], and possums in New Zealand [14].Despite host range differences, MTBC variants show a rigid population structure [15].From the initial whole genome sequencing (WGS), researchers were surprised to find MTB and MBO shared 99.95% nucleotide identity, excluding genomic deletions in MBO [16].Within a few years of the first MBO genome being sequenced, research began on what might drive variant differences, including gene expression (17), omics analysis [8], and metabolism [18,19], among others.Meaningful variations have been reported, but it remains uncertain how MBO has evolved towards a generalist lifestyle away from a presumed MTB-like specialist ancestor.
To help address this gap in knowledge, WGS datasets were collected for MTBC variants from diverse hosts and countries.Paired-end read SRA datasets with metadata including country and host of isolation, and MTBC variant (n = 6,360 taxa, plus reference and outgroup) were used to create a set of 9,755 SNPs for a bacterial genomewide association study (bGWAS).This sought to detect loci associated with classification as MTB or MBO, as well as any detectable host-specific markers (e.g., SNPs associated with isolation from cervids).Using RAxML-NG [20], prewas [21], and TreeWAS [22], bGWAS was performed with isolates classified by MBO (1) or not (0).

Results
Core SNP extraction was successful for 6,362 isolates.Isolates were from 27 countries (Fig. 1A); included 2,096 MTB (including reference), 4,105 MBO, 152 variant caprae, and 8 variant orygis (Fig. 1B); and across 30 hosts (Fig. 1C).We additionally added M. canettii as the MTBC outgroup (n = 1).All sequence data used in this project are publicly available through NCBI and ENA.Accession IDs for all data are recorded in the table Additional file 1, with BioProjects in Column A, and corresponding SRA identifiers in Column B per sequence.The masked core SNP alignment was used for phylogenetic tree generation by RAxML-NG (Additional file 2), which shows splits based on MTBC variant, but is only used for GWAS and not intended for visualization due to its scale.The unmasked core SNP alignment is also provided for reference (Additional file 3).After prewas and maximum likelihood-based ancestral reconstruction, a final set of 7,524 variants over the 6,362 taxa was used for treeWAS input.
TreeWAS runs three tests for statistical significance -the terminal, simultaneous, and subsequent tests.The terminal test identifies broad associations between genotype and phenotype looking only at terminal nodes in the tree; the simultaneous test more stringently identifies deterministic relationships of genotype and phenotype, without necessitating the relationship occur at all branches; the subsequent test utilizes the terminal test but adds ancestral state reconstruction to analyze all nodes of the tree [22].A thorough explanation of these tests is provided by Dr. Collins on the treeWAS GitHub page [23].The simultaneous and subsequent tests were used initially, as an ancestral reconstruction was available.When analyzing by phenotype of MBO (1) and Not MBO (0), treeWAS produced significant loci for both simultaneous and subsequent statistical scoring metrics (Fig. 2).Analysis by phenotype of Bovidae also produced the same 120 significant loci by the same subsequent test but did not produce loci for the simultaneous test (Fig. 3).Another search by phenotype of Homo sapiens produced the same 120 loci again by subsequent test and no loci by simultaneous test (Fig. 4).A phenotype of "non-standard hosts" (where Homo sapiens is the standard host for MTB, Bovidae for MBO, Capra for MCP, and Oryx for MOR) yielded no significant hits (Additional file 4).Analysis by phenotype Cervidae returned no significance (Additional file 5), but phenotype Meles meles (European badger) showed an unusual pattern by subsequent statistical test where nearly all SNPs clustered just around the significance cutoff, yielding hundreds of loci technically exceeding the cutoff yet are tightly clustered with those under it (Additional file 6A).The simultaneous score did not similar hits (Additional file 6B).A similar pattern was seen for Sus (Additional file 7).To investigate if these associations reflected geographic effects for Meles meles, as all badger samples were from the UK, GWAS analysis was performed by phenotype of UK origin, which yielded no significance by the subsequent test, and a single locus by simultaneous test (Additional file 8).This hit, for a variant present in only two isolates, is a spurious result.For Meles meles and Sus, it would appear the dataset composition makes it difficult to detect significance against a background already tightly associated with specific genotypes.MCP produced no significant hits (Additional file 9), and MOR was not attempted due to low representation on M. orygis samples in the dataset (n = 8).
The 32 loci identified by the simultaneous test for MBO are listed in Table 1.
As mentioned for the spurious hit for UK samples, the simultaneous test can report loci as associated even if a SNP is only present in a few isolates.These false positives are included in the data table but are marked by bolding to indicate they are not meaningful.Subtracting these spurious hits, the simultaneous test identifies 22 loci, all of which are also identified by the subsequent test.The 120 loci concordantly identified as associated by Bovidae, MBO, and Homo sapiens by the subsequent test are listed in Table 2.
The subset of loci called by both tests in MBO (n = 22) are presented in Table 3.
After GWAS, several apparent genotypic edge cases arose.FadD11 was highlighted as significantly associated by a single non-synonymous variant, FadD11 L286S, which appeared fixed in MBO and MCP, while MTB and MOR showed WT nearly exclusively.Of 4,105 MBO isolates and 152 MCP isolates, only 1 MBO isolate showed WT at this position, suggesting a reversion in this genome.Likewise, of 2,087 MTB and 8 MOR isolates, only 9 MTB isolates showed L286S.These genotypic exceptions were checked further: a Chinese cattle isolate SRR16278270 for the 1 MBO outlier, and 9 UK MTB isolates from humans for the MTB outliers (Table 4).
These 10 isolates' VCF files were checked against the SNP barcode [24], with lineage-determining positions searched per VCF through Unix command "grep" and the SNP coordinate.The MBO isolate bore no MBO lineage-determining SNPs, and instead was cleanly typable as MTB lineage 2.2.1 (Table 5).Evidently, this isolate is a case of bovine MTB being incorrectly identified as MBO when uploaded to NCBI.Likewise, of the 9 human MTB cases that stood out, all bore the 3 lineage-determining SNPs for MBO (Table 4), and no MTB lineage-determining markers.In these cases, 9 Fig. 1 MTBC isolate collection metadata.A Geographical distribution of isolates worldwide, where darker colors represent more isolates from that country.B MTBC variant makeup of the collection.Values are based on NCBI/ENA SRA designations.C Host origin for the collection.Bos taurus is the dominant host type, followed by Homo sapiens and Meles meles cases of human MBO were misclassified as MTB.These results were validated using SNP-IT software [25], which also typed ERR387001 as a BCG strain, suggesting a case of BCG-osis misdiagnosed or mislabeled as TB.After correcting these calls, there is a perfect divide between MTB/MOR and MBO/MCP, with 100% of isolates bearing the WT for MTB/MOR, and 100% of isolates bearing the SNP for MBO/MCP.This discrepancy may have affected robustness of GWAS based on phenotype of "MBO variant." However, it is noted that comparisons for "host Bovidae" and "host Homo sapiens" are unaffected by this, and all 3 analyses produced perfect concordance of their 120 associated loci by subsequent tests, suggesting this mislabeling had minimal influence.In summary, a set of 120 discriminatory loci was identified, which was also identified by bGWAS of phenotypes classified as a host of Bovidae (1) or not (0), and Homo sapiens (1) or not (0).

Discussion
Despite remarkable similarity between MTB and MBO, evidence of clear divides was present in SNPs highlighted as associated by treeWAS analysis.
The Fad family of proteins are important in MTBC, with MTB known to carry 36 FadD and FadE loci [26,27].GWAS identified SNPs in FadD11, FadE5, FadE27, and FadE32 associated with differentiation into MBO.These genes are involved in fatty acid and cholesterol handling inside the environment of the host [28].Mycobacterial reliance on cholesterol is known to be critical for pathogenesis, and MTB features around 80 genes involved with cholesterol balance and metabolism [29].Disruption of cholesterol import is severely disruptive to infection and persistence [29,30].
Cholesterol intake in MTBC requires a functional Mce4 system [30].A missense mutation (A734G, Asp254Gly) in Mce4F was seen in all MBO/MCP isolates, and only a single MTB isolate from Russia (ERR108427) which bore a unique SNP signature: G3836739A (lineage 4.8), and G1759252T (lineage 4.9).No other SNPs for lineage 4 or any other lineage were identified.A literature search turned up Congo type MTB that can present lineage 4.8 and 4.9 SNPs, but only in combination with 4.7 SNPs that were absent in this sample [31].Except this atypical MTB specimen, another split by this mce4f SNP separated MTB/MOR and MBO/MCP.
Catabolized cholesterol products fuel core acyl-CoA metabolism pathway, as well as polyketide synthesis, a pathway already known to differ between MBO and MTB [32].GWAS identified two separate missense mutations in ppsD, and synonymous changes in ppsB and pks15, all polyketide synthase genes.Genes annotated in roles of cholesterol and fatty acid metabolism or in pathways downstream of these processes are among all loci identified by the MBO subsequent tests are shown in Table 6.
These associated SNPs are scattered across lipid metabolic pathways and include members whose exact function is unclear.Ten out of fourteen of these SNPs are non-synonymous.
Other identified loci with functions separate from cholesterol and lipid metabolism include AccD1, involved in leucine degradation [33], which bore a fixed SNP of Phe343Leu.SNP 1739294 in the essential isoleucyl-tRNA synthetase IleS causes a Pro926Ala substitution, SNP 3152421 in the essential prolyl-tRNA synthetase ProS yields His177Arg, SNP 3371365 in the essential glutamyl-tRNA amidotransferase subunit GatA causes Ala-24Thr, and a synonymous change is seen at 1,260,537 for methionine synthesis gene MetE.Related to translational machinery, SNP 3198332 causes Thr259Met in essential elongation factor Tsf. SNP 1129160 impacts both RpfB (Resuscitation promoting factor B) and KsgA (a dimethyladenosine transferase) genes, resulting in RpfB Fig. 3 Manhattan plots of bGWAS results for phenotype "Bovidae" host.A Simultaneous test of association, showing no significantly ranked loci.B Subsequent test of association, showing 120 loci are ranked to be significant.The 120 loci identified here are identical to those seen in Table 2 and Fig. 2 Ala357Val and a synonymous mutation in KsgA.RpfB is thought to be involved in the transition from dormancy to active replication, and is co-transcribed with ksgA and ispE, genes involved in ribosome maturation and cell wall synthesis, respectively [34].After accounting for mislabeling cases in deposited isolates, these SNPs are all fixed and exclusive in this dataset either in MBO and MCP, or MBO alone.
MTBC physiology and function remain uncertain, and 48/120 loci identified are in genes annotated only by locus identifier and generically, like "conserved protein" or "possible oxidoreductase" even after PE/PPE gene filtering, removing a largely uncharacterized family comprising ~ 10% of MTBC genes.Even among genes with fuller annotations, nearly all include "probable" in their descriptions.The genes presented in Table 6 are not comprehensive and given the uncertainty in function across many loci, other important genes both inside and outside lipid metabolism almost certainly exist in the bGWAS output of Table 2. Loci associated with adaptation towards new hosts and lifestyles are useful then to highlight for characterization, as it narrows the still vast pool of MTBC genes with uncertain functions towards a subset of genes with fixed changes in some variants.
Any associated loci may signify adaptive roles in differentiation from a specialist infection by MTB and a pathogen with a much broader host range, like MBO.It is well-reported that members of the MTBC are clonal, and not only is horizontal gene transfer vanishingly rare, mutation rates in members of this complex are low (~ 2 × 10 -10 mutations per cell division) [35].Given 99.95% genetic identity between MTB and MBO, < 2000 polymorphisms differentiate divergent variants (disregarding RDs/large sequence polymorphisms), and fixed changes are an even smaller subset.While this research cannot draw conclusions about        1 and 2 how specific changes might alter metabolism or virulence to better reflect new host environs, it does highlight multiple SNPs across multiple genes in MTBC metabolic pathways.Metabolic differences are known to exist between variants and even between lineages of MTB, including in lipid profile [36,37].While these data are only associations, they may support findings by Griffin et al. reporting cholesterol utilization in MTB is key to host adaptation [29].MTB drives macrophages to import lipids for utilization as an energy and carbon source [38][39][40][41].The human cholesterol profile is LDL-dominant [42], as is the guinea pig [42], a model of tuberculosis that better recapitulates human disease [43] vs. the mouse model [5,43,44], an animal model with an HDL-dominant cholesterol profile and a lower overall cholesterol load [45,46].The MBO bovine host is HDL-dominant, for comparison [47].Others have reported MTB infections are influenced differently by HDL vs. LDL cholesterol [48].MTB is known to exploit lipid-rich "foamy macrophages, " and research has shown MTB trehalose dimycolate and other factors are associated with lipid droplet and foamy macrophage formation [38][39][40].Foamy macrophage formation is associated with higher levels and intake of circulating LDL cholesterol, but recent research found MTBinfected macrophages have different lipid profiles from foamy macrophages characterized in atherosclerosis and other diseases [45], indicating disease-specific responses lead to buildup of certain lipids [41].Finally, higher HDL levels are known to counteract foamy macrophage formation through classical LDL intake, HDL suppresses TNFα production in MTB-infected macrophages, and mice are more resistant to foamy macrophage formation, compared to humans, guinea pigs, rabbit, or primate models [42,45,48].Variation in use of cholesterol and fatty acids is known to exist between MTBC variants and lineages.Finally, though research is more limited in this area, studies have demonstrated mice are more susceptible to disease and death by MBO infection than by MTB [49,50].Biological reality is undoubtedly far more complex, but from existing literature, host lipid profiles differ, lipid availability and sequestration are key to MTB virulence, animal models with a lipid profile closer to humans better reproduce "classic" granulomatous tuberculosis by MTB as seen in humans, and MTB lineages and MTBC variants utilize lipids differentially.Cholesterol/fatty acid metabolic pathways associated by bGWAS showing variant-specific between MTB and MBO are suggestive of a potential contributor towards host adaptation.Adaptation to specific hosts was not detectable with this approach, which could be improved through routine sequencing of isolates from non-standard host types, which are currently rare and geographically biased.While efforts were taken to select from countries worldwide, the dataset itself is necessarily a biased sampling as well -very few isolates of the vast number of infections worldwide are ever sequenced, and of those that are, even fewer meet the inclusion criteria for metadata and sequencing coverage.Furthermore, in many clinical MTB genomes utilized in this work, evolutionary pressures imparted by antibiotic use are a powerful influence on SNP distribution, and drug-resistance associated SNPs were not filtered from the input dataset.This adds an additional layer of bias in that antibiotic selection

Table 4 SNP typing improperly labeled MTBC isolates by MBOlineage markers
Nine pathogen isolates from humans (underlined) were classified when deposited into NCBI as MTB, but their genotypes by GWAS did not align with other MTB isolates.These isolates were checked for three MBO lineagedetermining SNPs as reported by Coll et al. [24], and all 9 were found to possess these SNPs, indicating a misclassification in the database.This was confirmed by SNP-IT [25], which also reported one isolate was a BCG strain (starred).Conversely, one isolate (SRR16278270) was classified as MBO, but was shown not to possess any MBO lineage-determining SNPs, supporting a misclassification of an MTB isolate as MBO * BCG strain

Lineage Marker Bovis
Accession ID C1427476T A2831482G C3624593T  pressures vary wildly across MTBC members.For future studies and adaptations of these findings, it is critical to note that our use of MTB H37Rv as a reference to study the entire MTBC and its evolutionary history is necessarily missing data.While the use of this reference as a standard is common practice and the H37Rv coordinates are typically how MTB SNPs are defined, the H37Rv lineage is a modern one, and any regions of difference absent in H37Rv but present in other MTBC members will not be SNP-called by our method, excluding potentially tens of thousands of base pairs.Future studies should consider using more "ancestral" variants like M. canettii, performing an MBO-specific bGWAS to try to identify host adaptation within the variant, or even by constructing an MTBC pangenome to call SNPs against for the maximum possible number of informative protein-coding changes.
We hope our research serves as a launching point for these studies, for more routine sequencing and better metadata classification of isolates across the MTBC, and for better understanding of MTBC divergence, including through generation of time-measured phylogenies to estimate SNP occurrences through evolutionary history.

Conclusions
Analysis of the genetic differences between MTBC isolates and their association with classification as certain variants or isolation from certain hosts by GWAS was performed.The 120 SNPs identified through this analysis provide a trove of genes and pathways implicated in adaptation towards a generalist lifecycle, including loci across cholesterol and fatty acid uptake, catabolism, and downstream processing pathways, important for central metabolism in MTBC organisms and critical for pathogenesis [28,36,38,39,51].The understanding of host adaptation in Mycobacterium is a major outstanding knowledge gap.While tremendous work has been performed by groups worldwide, the mechanisms of how exactly MTBC members present different and infect different hosts is unresolved.Detail transcriptomic profiling has been performed in cattle infections by MTB and MBO, which demonstrate that macrophages have very distinct response profiles to each variant, as well as what appears to be a fine-tuned engagement with the bovine innate immune system for MBO [8].It is known that MTB is attenuated in cattle [8], and that cellular-level differences manifest in overall disease course differences in other hosts like mice [49].As discussed above, one rare aspect of TB pathogenesis is its critical utilization of host cholesterol for survival and success [30], and prior research within MTB hypothesizes human populationspecific cholesterol utilization adaptation has occurred across MTB lineages [4].Multiple studies have found that cholesterol utilization is in fact fundamental to pathogenesis, as covered in the review by Moopanar and Mvubu [40].While the findings in this work are only suggestive, they again highlight cholesterol and fatty acid-associated genetic pathways, and future research in the lab should assess how loci identified herein may functionally differ in the context of different host pathways for cholesterol metabolism.Available lipid and cholesterol pools in bovine, murine, and other non-human hosts, differences between host cholesterol and lipid profiles, and the possible lineage-specific adaptation identified in the loci in this analysis could contribute to the elusive mechanisms of pathogen host preferences.In summary, these SNPs reliably differentiate MTBC variants, and importantly, they may inform research into genes that differ between variants, narrowing the pool of uncharacterized proteins to study in the MTBC.

Methods
Existing datasets were collected, including those where bGWAS was performed to answer other questions.Dong et al. (2022) genome-sequenced 74 Chinese cattle MBO isolates, and performed bGWAS analysis on 3,227 total MBO isolates from around the world [52].Additionally, sequences used by Coll et al. in designing the MTBC SNP barcode are a validated set of primarily human MTB isolates [24].Both datasets were included in this analysis, along with many smaller sets.FASTQ download URLs were acquired through SRA-Explorer [53], formatted for Globus-CLI [54,55] and downloaded to MSU's High Performance Computing Center (HPCC) for processing.After transfer, single-end read data were excluded, and Snippy [56] run with default parameters (BWA base quality = 13, minimum SNP coverage = 10 reads, minimum VCF SNP quality = 100), paired-end read input, and MTB H37Rv as the reference genome (AL123456.3).A complete list of accession IDs for all sequences used are provided in Additional file 1. Rare cases of genomes with unusually high numbers of SNPs (> 3,000) were excluded, as were genomes with alignment coverage < 90% of the reference length (cutoff < 3.96mbp).Additionally, metadata were compiled for all isolates (Additional file 1), and only datasets with host, MTBC variant, and country of isolation were included.Taxonomic classification by Kraken 2 [57] revealed several isolates primarily (> > 50%) contained plant or insect genomes, and were excluded, at which point remaining samples contained classified reads mapping primarily to Mycobacterium.Remaining paired-end read sets were selected (n = 6,360) to build the final "snippy-core" core SNP alignment, along with the H37Rv reference and M. canettii (GCF_000253375.1)as an outgroup, while masking PE/PPE genes using the H37Rv-specific.bedfile provided by default in Snippy.Core SNPs were used for phylogenetic tree generation in RAxML-NG [20] (substitution model GTR + G selected by ModelTest-NG [58]; bootstrap analysis: seed = 774,900,118, bootstrap trees = 300; for tree search analysis: seed = 4,949,250,770, 50 parsimony-based, 50 random-based starting trees for tree search; applying bootstrap support to best ML tree: -consense MRE).On a Windows 10 desktop PC, RStudio (2022.07.2 + 576) [59], R (v4.0.5) [60], and the R package vcfR (v1.12.0) [61] were used to generate a vcfR object for import with prewas [21].In prewas (v1.1.1),the VCF object containing variant calls was processed with an input tree generated from RAxML-NG, the H37Rv GFF3 file, and with ancestral reconstruction flag set to TRUE, for maximum likelihood-based reconstruction through the ape R package it uses.On an HPCC cluster, a Conda [62] environment was created containing GCC (v11.2.0) [63], OpenMPI (v4.1.1)[64], and R (v4.1.2).The R package devtools (v2.4.5) [65] was installed, and used to install prewas (v1.1.1)[66] and treeWAS (v1.1) [67] from Github.The RData object containing prewas output from the desktop PC was uploaded to HPCC and used for ancestral reconstruction state, binary variant matrix, and phylogenetic tree inputs, along with binary metadata phenotype matrices.All other parameters were left at their defaults.For MTBC lineage determination, the Coll et al.SNP barcode and SNP-IT tool were used [24,25].SnpEff (v4.2) was used to annotate variants separately [68].Tree-WAS generated default Manhattan plots and distribution graphics, and text output was collected in.csv files.

Fig. 2
Fig. 2 Manhattan plots of bGWAS results for phenotype "M.tuberculosis variant bovis."A Simultaneous test of association, showing 32 loci ranked to be significant, of which 10 are of dubious quality.B Subsequent test of association, showing 120 loci are ranked to be significant.Details for each locus are available in Tables 1 (simultaneous) and 2 (subsequent)

Table 1
SNPs significantly associated with classification as MBO by Simultaneous statistical test

Rv1559 IlvA Probable threonine dehydratase IlvA In vitro essential (DeJesus 2017; Griffin 2011), non- essential in rich media (Minato 2019)
GWAS results by treeWASshowing single nucleotide polymorphisms (coordinate relative to MTB H37Rv in SNP column) associated with classification of MTBC isolates as M. tuberculosis variant bovis (MBO).For SNPs within genes or ORFs, the classification and putative function is listed, as well as select information about essentiality by transposon mutagenesis studies from Mycobrowser [https:// mycob rowser.epfl.ch/].Likely false positive calls of association for SNPs with only 1-2 occurrences are highlighted by bolded rows Table 1 (continued)

Table 2
SNPs significantly associated with classification as MBO by Subsequent statistical test

Table 2 (continued) SNP p-value score G1P1 G0P0 G1P0 G0P1 Locus Protein Description Essentiality Notes from Mycobrowser
GWAS results by treeWAS showing single nucleotide polymorphisms (coordinate relative to MTB H37Rv in SNP column) associated with classification of MTBC isolates as M. tuberculosis variant bovis (MBO).For SNPs within genes or ORFs, the classification and putative function is listed, as well as select information about essentiality by transposon mutagenesis studies from Mycobrowser [https:// mycob rowser.epfl.ch/]

Table 3
SNPs concordantly significantly associated with classification as MBO by Subsequent and Simultaneous statistical tests GWAS results by treeWAS showing single nucleotide polymorphisms (coordinate relative to MTB H37Rv in SNP column) associated with classification of MTBC isolates as M. tuberculosis variant bovis (MBO).For SNPs within genes or ORFs, the classification and putative function is listed, as well as select information about essentiality by transposon mutagenesis studies from Mycobrowser [https:// mycob rowser.epfl.ch/].This list is a subset of only variants called in both Tables

Table 5
[25]typing improperly labeled MBO isolate by MTB-lineage markersIsolate SRR16278270 was deposited in NCBI as an MBO isolate from cattle, but its genotype by GWAS did not align with other MBO isolates.This isolate was checked for MTB lineage-determining SNPs as reported by Coll et al.[24], and was found to contain all SNPs for MTB lineage 2.2.1 and none for MBO (Table4), indicating a misclassification in the database.This was confirmed by SNP-IT[25]

Table 6
Genes identified by GWAS associated with fatty acid and cholesterol metabolismA subset of genes in the pathways of lipid and cholesterol intake, metabolism, and utilization were identified with SNPs by GWAS, with a split roughly between one genotype in MTB ± MOR and a divergent genotype in MBO ± MCP.Columns 2-5 indicate presence of genotype G (SNP