- Open Access
Rules of co-occurring mutations characterize the antigenic evolution of human influenza A/H3N2, A/H1N1 and B viruses
© The Author(s). 2016
- Published: 5 December 2016
Human influenza viruses evolve rapidly, especially in hemagglutinin (HA), a glycoprotein on the surface of the virus, which enables the virus population to constantly evade the human immune system. The vaccine therefore has to be updated every year to stay effective. There is a need to characterize the evolution of influenza viruses for better selection of vaccine candidates and prediction of pandemic strains. Studies have shown that the evolution of influenza hemagglutinin is driven by simultaneous mutations at antigenic sites. Here, we analyze simultaneous, or co-occurring, mutations in the HA protein of human influenza A/H3N2, A/H1N1 and B viruses to predict potential mutations and characterize the antigenic evolution.
We obtain the rules of mutation co-occurrence using association rule mining after extracting HA1 sequences and detect co-mutation sites under strong selective pressure. Then we predict the potential drifts with specific mutations of the viruses based on the rules and compare the results with the “observed” mutations in different years.
The sites under frequent mutations are in antigenic regions (epitopes) or receptor binding sites.
Our study demonstrates that the co-occurring site mutations obtained by rule mining can capture the evolution of influenza viruses, and confirms that cooperative interactions among sites of the HA1 protein drive influenza antigenic evolution.
- Influenza virus
- A/H3N2/H1N1 and B
- Antigenic evolution
- Co-occurring mutation
- Influenza vaccine
Influenza has been a major and persistent threat to public health for centuries, causing millions of deaths and huge economic losses worldwide every year. Among the three types of human influenza viruses, denoted A, B and C, influenza A viruses are the most virulent owing to their high mutation rate, frequent genetic reassortment and short generation time, and they have caused several pandemics in recent history [1, 2]. These pandemics include the 1918 Spanish flu (A/H1N1), the 1957 Asian flu (A/H2N2), the 1968 Hong Kong flu (A/H3N2) and the 2009 swine flu (A/H1N1). Influenza B viruses, which have evolved into the B/Yamagata and B/Victoria lineages and frequently exchange their segments, have been co-circulating since 2001 and cause an appreciable share of infections [7, 8]. Under surveillance and monitoring by the World Health Organization (WHO), influenza activity was found to be associated with the co-circulation of influenza A/H1N1 pdm09, A/H3N2 and B viruses. Therefore, to predict and prevent potential pandemics in the future, it is important to analyze and compare the evolutionary patterns of the three types of viruses.
Hemagglutinin (HA) is a surface glycoprotein of the influenza virus responsible for binding specificity and for initiating viral entry. It can be cleaved into two polypeptides, the HA1 and HA2 subunits, which are covalently linked by a disulfide bond. HA1 contains the sialic acid receptor binding sites and is considered one of the main targets by which the immune system detects influenza virus, as well as the primary protein component of the vaccine [10–12]. Mutating rapidly (substitution rate estimated at 5.7 × 10⁻³ per site per year), the HA1 domain accumulates mutations that cause viral antigenic drift and thus preclude effective vaccination with existing vaccines. Identifying the evolutionary trajectories and predicting future mutations would be very helpful for recommending efficient influenza vaccines before a potential variant causes an influenza outbreak.
Many studies have therefore attempted to track and predict the antigenic evolutionary dynamics of the HA protein. Phylogenetic tree analysis is a traditional technique in this field. Studies based on phylogenetic tree analysis revealed that a single predominant trunk lineage persists through time while side branches persist for 1–5 years before going extinct [15–17], indicating a strong selection preference in the evolutionary path. Many methods have been proposed to identify single mutation sites under positive selection and thereby understand the antigenic evolution of HA [18–20]. Statistical analysis and machine learning approaches have also been applied to reveal more information about the mutational dynamics in the viral sequences. The pioneering work by Smith et al. characterized the antigenic evolution of HA1 (A/H3N2) based on hemagglutination inhibition (HI) assays, and mapped the antigenic evolution (phenotype) to the phylogenetic tree based on HA1 sequences (genotype) using a maximum-likelihood (ML) approach. Smith’s method was enhanced by Bedford et al., who simultaneously characterized antigenic and genetic evolution using a diffusion model over a shared virus phylogeny. Plotkin et al. adopted a clustering technique to investigate the spatio-temporal evolution of antigenic clusters. A Bayesian approach has also been applied to predict the antigenic relationships of H3N2 viruses, which were used to identify the antigenic clusters and infer the dynamics of antigenic evolution. The relationship between the antigenic distances based on sequences and those calculated from HI titer data was further discussed in work that also provided an online tool named “nextflu” for real-time tracking. Although those studies have obtained insightful results, most of them focus on the clusters of antigenic mutations. Currently, very few studies address the interactions among site mutations in the HA proteins and their impact on the direction of antigenic evolution.
It has been observed that simultaneous multi-site mutations (or co-occurring mutations) at antigenic sites can cumulatively enhance antigenic drift [26, 27]. Co-occurring mutations can be categorized into stochastic co-evolution, functional co-evolution and interaction evolution. One site on a protein may compensate for another during evolution; mutations at these sites are then under positive selection pressure and occur simultaneously (i.e. co-occurring mutations). Identifying co-occurring mutations can help uncover possible interactions among them and thereby improve our understanding of the mutational dynamics of proteins. Mutual information has been used to estimate the correlations between two site mutations [29, 30]. A correlation network based on mutual information, named the site transition network (STN), can be used to predict future mutations of sites in the HA protein; reported results showed that the STN can predict site mutations with 70% accuracy. However, mutual information is limited to pairwise relationships. How multiple sites interact with each other is yet to be discovered.
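As a minimal illustration of the pairwise measure mentioned above, the following Python sketch computes the mutual information between two alignment columns. The function name `mutual_information` and the toy columns are hypothetical and are not taken from the cited studies, which may use different estimators or corrections.

```python
from collections import Counter
from math import log2


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two equally long alignment columns."""
    if len(col_a) != len(col_b):
        raise ValueError("columns must have the same length")
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), n_ab in count_ab.items():
        # p(a, b) * log2( p(a, b) / (p(a) * p(b)) ), written with raw counts
        mi += (n_ab / n) * log2(n_ab * n / (count_a[a] * count_b[b]))
    return mi


# Made-up residues at three sites across six sequences.
site_x = list("KKKEEE")
site_y = list("NNNDDD")  # changes together with site_x
site_z = list("NNDNND")  # statistically independent of site_x

mi_xy = mutual_information(site_x, site_y)
mi_xz = mutual_information(site_x, site_z)
print(mi_xy, mi_xz)
```

The measure takes exactly two columns as input, which is why it cannot by itself describe interactions among three or more sites.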
Here, we propose a method based on association rule mining to identify co-occurring patterns of multiple-site mutations. Association rule mining has been shown to be a promising technique in bioinformatic analysis [32, 33]. Our approach offers a flexible way to discover the interactions of multiple site mutations and is not limited to pairwise interactions, as mutual-information-based approaches are. In addition, the rules of co-occurring mutations are interpretable, making it easy for humans to understand the underlying process of antigenic evolution. Furthermore, our rules can also be used to predict potential mutations at the sites of HA1.
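To make the notion of a co-occurrence rule concrete, the sketch below mines rules from a handful of made-up "transactions", each being the set of sites mutated together. It is a brute-force illustration only: the function names, the `max_size` cap, the restriction to single-site consequents and the toy data are assumptions, and the study itself uses the LCM algorithm rather than this enumeration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=3):
    """Return {itemset: count} for site sets found in >= min_support transactions."""
    counts = {}
    for transaction in transactions:
        items = sorted(set(transaction))
        for k in range(1, max_size + 1):
            for combo in combinations(items, k):
                key = frozenset(combo)
                counts[key] = counts.get(key, 0) + 1
    return {s: c for s, c in counts.items() if c >= min_support}


def association_rules(itemsets, min_confidence):
    """List (antecedent, consequent_site, support, confidence) tuples."""
    rules = []
    for itemset, support in itemsets.items():
        if len(itemset) < 2:
            continue
        for site in itemset:
            antecedent = itemset - {site}
            # Every subset of a frequent itemset is itself frequent,
            # so the antecedent count is always available.
            confidence = support / itemsets[antecedent]
            if confidence >= min_confidence:
                rules.append((antecedent, site, support, confidence))
    return rules


# Made-up transactions of co-mutated HA1 sites.
toy_transactions = [
    {145, 156, 158},
    {145, 156},
    {145, 156, 189},
    {189},
    {145},
]
toy_itemsets = frequent_itemsets(toy_transactions, min_support=3)
toy_rules = association_rules(toy_itemsets, min_confidence=0.8)
print(toy_rules)
```

Here support is an absolute count of transactions, and confidence is the fraction of transactions containing the antecedent that also contain the consequent.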
Rules of co-occurring mutations
Overview of extracted rules
Number of Extracted Rules
Then we classified the flu-B HA1 sequences into two lineages based on their distances from two standard HA1 sequences of the Victoria and Yamagata lineages. After that, we applied our approach to the sequences of each lineage separately. The resulting rules are visualized in Additional file 4: Figure S4 and Additional file 5: Figure S5 for the Yamagata and Victoria lineages, respectively. After separating the flu-B sequences into two lineages, the number of mutations within each lineage decreases significantly. Here we set the support threshold for rules to 1500 (versus 5000 before classifying the lineages) and obtained 1110 and 69 rules for the Yamagata and Victoria lineages, respectively. The results also suggest that the Yamagata lineage may mutate more quickly than the Victoria lineage.
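The lineage assignment described above can be sketched as a nearest-reference classification. The text does not state which distance measure was used, so the Hamming distance below is an assumption, and the reference strings are short made-up placeholders rather than real Victoria or Yamagata HA1 sequences.

```python
def hamming(seq_a, seq_b):
    """Number of mismatching positions between two aligned, equally long sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def assign_lineage(seq, references):
    """Return the name of the reference sequence closest to seq."""
    return min(references, key=lambda name: hamming(seq, references[name]))


# Placeholder reference sequences (hypothetical, for illustration only).
toy_references = {
    "Victoria": "ACDEFGHIKL",
    "Yamagata": "ACDQFGHMKL",
}
lineage_1 = assign_lineage("ACDEFGHIKV", toy_references)
lineage_2 = assign_lineage("ACDQFGHMKV", toy_references)
print(lineage_1, lineage_2)
```

A sequence equally distant from both references is assigned to whichever name comes first in the dictionary, so a real pipeline would need an explicit tie-breaking rule.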
The analysis of co-occurring mutation patterns in A/H3N2 is given in the section “Predictions of influenza evolution” below.
Co-Mutation sites under strong selection pressure
Sites detected as frequently co-mutated with other sites
Number of residues at antigenic sites/Total number of sites
50, 53, 62, 137, 144, 145, 155, 156, 158, 189, 244, 260, 275
128, 183, 186, 205, 216, 249, 272
69, 97, 143, 163, 197, 256, 260, 283
48, 56, 75, 116, 182, 183, 266
75, 88, 175, 199, 330, 235
Distribution of detected sites on epitope regions
Predictions of influenza evolution
The prediction results (for H3N2 in different years)
We also compare our results with those of Xia’s method, using the same dataset (i.e. the sequences of H3N2 from 1968 to 2002). The rules of co-mutation sites are plotted as a network in Additional file 6: Figure S6. The following site mutations are predicted by both our method and Xia’s method: 50, 155 and 156. The mutation at site 144 is predicted only by our method. Since the “benchmark” for site mutations may not be unique, we introduce another set of “observed” mutations generated by BII-FluSurver. We submitted all HA1 sequences of H3N2 from 2003 to BII-FluSurver (using default parameters) and counted the frequencies of all site mutations it returned. In total, 4543 site mutations (with duplicates) were obtained. The occurrence of the predicted mutations in the BII-FluSurver results is compared in Additional file 7: Table S1, which shows that the overlap between our prediction and the BII-FluSurver results is similar to that of Xia’s prediction.
In this paper, we propose a method based on association rule mining to identify the co-occurring site mutations of human influenza A(H3N2), A(H1N1) and B viruses. The rules of co-mutation sites characterize the antigenic evolution of influenza viruses. We show that the co-mutation sites in HA1 are all in the epitope regions, indicating strong selection pressure exerted by the human immune system on those sites. Furthermore, the rules obtained by our method can be used to predict potential future mutations of influenza viruses.
There are several directions in which this study could be improved. First, we could increase the number of sampling rounds (i.e. N in the Methods section) to increase the statistical power of association rule mining. Second, instead of randomly sampling two sequences from two adjacent years, we could select two sequences with a closer phylogenetic distance to calculate the mutations. Of course, experimental data in which the phylogenetic relationships of sequences are known would give better results. In addition, different weights could be assigned to different years during sampling, e.g. by sampling more often from years with more sequences. Finally, aside from analyzing the co-occurring mutations in the HA protein, we could also explore co-evolving mutation patterns in other proteins, where mutations may compensate for each other. For example, the HA and NA proteins of influenza viruses are responsible for viral binding to, and cleavage from, host cells. It would be very interesting to detect the co-occurring mutations in these two proteins, which are under selection pressure, to study how they cooperate at the genetic level.
All HA protein sequences of human influenza A/H3N2, A/H1N1 and B viruses available up to October 8, 2015 were retrieved from the Influenza Virus Resource at NCBI. The search covered the years 1918 to 2015. We excluded records without year information and sequences shorter than the full length of HA1 (327, 312 and 345 residues for H1N1, H3N2 and B viruses, respectively).
Because some years have no record for a particular type of virus, we used the sequences from 1976 to 2015 for H1N1, from 1968 to 2015 for H3N2, and from 1975 to 2015 for flu-B virus to ensure continuity. After cleaning, a total of 18,450 sequences were obtained for H1N1, 18,019 for H3N2, and 6538 for flu-B virus. We then aligned these sequences using MEGA6 and extracted the HA1 sequences for each of the three types of viruses.
After obtaining the HA1 sequences, we divided them into bins according to year. Sampling with replacement was then applied to randomly select one sequence from each bin (i.e. year). We repeated the sampling process N times to obtain enough statistics. After that, the sequences from every two adjacent years were aligned to obtain the mutations between the two sequences. The records of site mutations were treated as “transactions” in association rule mining, which was applied to find the rules of co-occurring site mutations. Here LCM was used to carry out association rule mining.
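The sampling step above can be sketched as follows. This is a minimal sketch under stated assumptions: the sequences are already aligned to a common length, site positions are reported 1-based, each adjacent-year pair yields one transaction, and empty transactions are dropped. The function name `sample_transactions`, the `seed` parameter and the toy sequences are hypothetical.

```python
import random


def sample_transactions(seqs_by_year, n_rounds, seed=0):
    """Build transactions of mutated sites from repeated yearly sampling."""
    rng = random.Random(seed)
    years = sorted(seqs_by_year)
    transactions = []
    for _ in range(n_rounds):
        # Draw one sequence per year; sequences can be drawn again in later rounds.
        picked = [rng.choice(seqs_by_year[year]) for year in years]
        for earlier, later in zip(picked, picked[1:]):
            # 1-based positions where the two aligned sequences differ.
            sites = frozenset(
                pos + 1
                for pos, (a, b) in enumerate(zip(earlier, later))
                if a != b
            )
            if sites:
                transactions.append(sites)
    return transactions


# Made-up three-residue "HA1" sequences, one per year, so the result is deterministic.
toy_bins = {2000: ["AKN"], 2001: ["AEN"], 2002: ["AED"]}
toy_result = sample_transactions(toy_bins, n_rounds=2)
print(toy_result)
```

The resulting list of site sets is the kind of input a frequent-itemset miner such as LCM expects, with one transaction per line.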
From the rules obtained by association rule mining, we infer which sites tend to be co-mutated during the evolution of the influenza virus. To predict potential mutations in the future, following previous work, we first find the sites under positive selection and obtain the sites co-evolving with them, which are predicted as the sites to be mutated. In that work, a positive selection site is defined as “a site that has been mutated between successive years and then remains fixed in the population for at least 1 year”. To obtain the sites under positive selection, we need to determine which sites are mutated in a particular year. Unfortunately, there is currently no standard way to obtain the yearly site mutations (i.e. the benchmark for our prediction), which makes it difficult to determine the positive selection sites. Therefore, here we treat the sites occurring frequently in our rules (i.e. the sites frequently co-mutated with other sites, or with large in-degree) as the sites to be mutated.
Phylogenetic trees were constructed from 1000 randomly selected sequences from the corresponding dataset mentioned above, using the NCBI “Influenza Virus Sequence Tree” tool with the neighbor-joining method and mPAM distance. Co-evolved sites output by our method were mapped to the virus HA protein structures retrieved from the Protein Data Bank [39–42] using Chimera (v1.11). The epitope regions of H1N1 and H3N2 are marked according to [44–47]. The epitope information for flu-B is from [48, 49].
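One way to read the in-degree criterion above is sketched below. The exact network construction is not spelled out in the text, so this sketch assumes that each rule with antecedent sites A and consequent site b contributes one directed edge a → b for every a in A, and that duplicate edges are counted once. The function name and the toy rules are hypothetical.

```python
from collections import Counter


def rank_sites_by_in_degree(rules):
    """Rank consequent sites by the number of distinct sites pointing at them."""
    edges = set()
    for antecedent, consequent in rules:
        for site in antecedent:
            edges.add((site, consequent))
    in_degree = Counter(target for _, target in edges)
    # Highest in-degree first; ties broken by site number for reproducibility.
    return sorted(in_degree.items(), key=lambda item: (-item[1], item[0]))


# Made-up rules: (set of antecedent sites, consequent site).
toy_rule_list = [
    ({145}, 156),
    ({158}, 156),
    ({145, 158}, 156),
    ({156}, 145),
    ({189}, 145),
    ({189}, 155),
]
toy_ranking = rank_sites_by_in_degree(toy_rule_list)
print(toy_ranking)
```

Sites near the top of the ranking would be the candidates treated as "sites to be mutated" under this reading.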
This article has been published as part of BMC Medical Genomics Volume 9 Supplement 3, 2016. 15th International Conference On Bioinformatics (INCOB 2016): medical genomics. The full contents of the supplement are available online https://bmcmedgenomics.biomedcentral.com/articles/supplements/volume-9-supplement-3.
Publication of this article was funded by AcRF Tier 2 grant MOE2014-T2-2-023, Ministry of Education, Singapore.
Availability of data and materials
Data, code, and Additional files are available at: https://github.com/Xinrui0523/comutation.
HC conceived and directed the project. HC and XZ performed experiments, interpreted results, and wrote the manuscript. JZ and CK revised the paper, provided overall supervision, direction and leadership to the research. All authors have read and approved the final manuscript.
The authors declare that they have no competing interests.
Consent for publication
Ethics approval and consent to participate
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Kilbourne ED. Influenza pandemics of the 20th century. Emerg Infect Dis. 2006;12(1):9.
- Tscherne DM, García-Sastre A. Virulence determinants of pandemic influenza viruses. J Clin Invest. 2011;121(1):6–13.
- Taubenberger JK, Morens DM. 1918 influenza: the mother of all pandemics. Rev Biomed. 2006;17:69–79.
- Henderson DA, Courtney B, Inglesby TV, Toner E, Nuzzo JB. Public health and medical responses to the 1957–58 influenza pandemic. Biosecur Bioterror. 2009;7(3):265–73.
- Viboud C, Grais RF, Lafont BA, Miller MA, Simonsen L. Multinational impact of the 1968 hong kong influenza pandemic: evidence for a smoldering pandemic. J Infect Dis. 2005;192(2):233–48.
- Viboud C, Simonsen L. Global mortality of 2009 pandemic influenza a h1n1. Lancet Infect Dis. 2012;12(9):651–3.
- Ambrose CS, Levin MJ. The rationale for quadrivalent influenza vaccines. Hum Vaccin Immunother. 2012;8(1):81–8.
- Dudas G, Bedford T, Lycett S, Rambaut A. Reassortment between influenza b lineages and the emergence of a coadapted pb1–pb2–ha gene complex. Mol Biol Evol. 2015;32(1):162–72.
- World Health Organization (WHO). Recommended composition of influenza virus vaccines for use in the 2016–2017 northern hemisphere influenza season. Geneva: WHO; 2016.
- Imai M, Kawaoka Y. The role of receptor binding specificity in interspecies transmission of influenza viruses. Curr Opin Virol. 2012;2(2):160–7.
- Suzuki Y. Predictability of antigenic evolution for h3n2 human influenza a virus. Genes Genet Syst. 2013;88(4):225–32.
- Wilks S, de Graaf M, Smith DJ, Burke DF. A review of influenza haemagglutinin receptor binding as it relates to pandemic properties. Vaccine. 2012;30(29):4369–76.
- Chen R, Holmes EC. Avian influenza virus exhibits rapid evolutionary dynamics. Mol Biol Evol. 2006;23(12):2336–41.
- Hensley SE, Das SR, Bailey AL, Schmidt LM, Hickman HD, Jayaraman A, Viswanathan K, Raman R, Sasisekharan R, Bennink JR, et al. Hemagglutinin receptor binding avidity drives influenza a virus antigenic drift. Science. 2009;326(5953):734–6.
- Bush RM, Bender CA, Subbarao K, Cox NJ, Fitch WM. Predicting the evolution of human influenza a. Science. 1999;286(5446):1921–5.
- Fitch WM, Bush RM, Bender CA, Cox NJ. Long term trends in the evolution of h (3) ha1 human influenza type a. Proc Natl Acad Sci. 1997;94(15):7712–8.
- Volz EM, Koelle K, Bedford T. Viral phylodynamics. PLoS Computational Biology. 2013;9(3):1002947.
- Yang Z, Swanson WJ. Codon-substitution models to detect adaptive evolution that account for heterogeneous selective pressures among site classes. Mol Biol Evol. 2002;19(1):49–57.
- Suzuki Y. New methods for detecting positive selection at single amino acid sites. J Mol Evol. 2004;59(1):11–9.
- Zhou R, Das P, Royyuru AK. Single mutation induced h3n2 hemagglutinin antibody neutralization: a free energy perturbation study. J Phys Chem B. 2008;112(49):15813–20.
- Smith DJ, Lapedes AS, de Jong JC, Bestebroer TM, Rimmelzwaan GF, Osterhaus AD, Fouchier RA. Mapping the antigenic and genetic evolution of influenza virus. Science. 2004.
- Bedford T, Suchard MA, Lemey P, Dudas G, Gregory V, Hay AJ, McCauley JW, Russell CA, Smith DJ, Rambaut A. Integrating influenza antigenic dynamics with molecular evolution. Elife. 2014;3:01914.
- Plotkin JB, Dushoff J, Levin SA. Hemagglutinin sequence clusters and the antigenic evolution of influenza a virus. Proc Natl Acad Sci. 2002;99(9):6263–8.
- Du X, Dong L, Lan Y, Peng Y, Wu A, Zhang Y, Huang W, Wang D, Wang M, Guo Y, et al. Mapping of h3n2 influenza antigenic evolution in china reveals a strategy for vaccine strain recommendation. Nat Commun. 2012;3:709.
- Neher RA, Bedford T, Daniels RS, Russell CA, Shraiman BI. Prediction, dynamics, and visualization of antigenic phenotypes of seasonal influenza viruses. Proc Natl Acad Sci. 2016;1701–9.
- Shih AC-C, Hsiao T-C, Ho M-S, Li W-H. Simultaneous amino acid substitutions at antigenic sites drive influenza a hemagglutinin evolution. Proc Natl Acad Sci. 2007;104(15):6283–8.
- Du X, Wang Z, Wu A, Song L, Cao Y, Hang H, Jiang T. Networks of genomic co-occurrence capture characteristics of human influenza a (h3n2) evolution. Genome Res. 2008;18(1):178–87.
- Codoñer FM, Fares MA. Why should we care about molecular coevolution? Evol Bioinformatics Online. 2008;4:29.
- Xia Z, Jin G, Zhu J, Zhou R. Using a mutual information-based site transition network to map the genetic evolution of influenza a/h3n2 virus. Bioinformatics. 2009;25(18):2309–17.
- Gong Y-N, Chen G-W, Suchard MA. A novel empirical mutual information approach to identify co-evolving amino acid positions of influenza a viruses. Comput Biol Chem. 2012;39:20–8.
- Agrawal R, Imieliński T, Swami A. Mining association rules between sets of items in large databases. ACM SIGMOD Record. 1993;22(2):207–16.
- Chen Q, Chen Y-PP. Mining frequent patterns for AMP-activated protein kinase regulation on skeletal muscle. BMC Bioinformatics. 2006;7:394.
- Chen H, Lonardi S, Zheng J. Deciphering histone code of transcriptional regulation in malaria parasites by large-scale data mining. Comput Biol Chem. 2014;50:3–10.
- Bao Y, Bolotov P, Dernovoy D, Kiryutin B, Zaslavsky L, Tatusova T, Lipman D. The influenza virus resource at the National Center for Biotechnology Information. J Virol. 2008;82(2):596–601.
- Neher RA, Bedford T. nextflu: real-time tracking of seasonal influenza virus evolution in humans. Bioinformatics. 2015;381.
- BII Flusurver – Prepared for the next wave. http://flusurver.bii.a-star.edu.sg/ Accessed 29 May 2016.
- Tamura K, Stecher G, Peterson D, Filipski A, Kumar S. MEGA6: molecular evolutionary genetics analysis version 6.0. Mol Biol Evol. 2013;197.
- Uno T, Asai T, Uchida Y, Arimura H. LCM: An efficient algorithm for enumerating frequent closed item sets. In: FIMI, vol. 90. Citeseer; 2003.
- Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Bourne PE. The protein data bank. Nucleic Acids Res. 2000;28(1):235–42.
- Cho KJ, Lee JH, Hong KW, Kim SH, Park Y, Lee JY, Seok JH. Insight into structural diversity of influenza virus haemagglutinin. J Gen Virol. 2013;94(8):1712–22.
- Lin YP, Xiong X, Wharton SA, Martin SR, Coombs PJ, Vachieri SG, Gamblin SJ. Evolution of the receptor binding properties of the influenza A (H3N2) hemagglutinin. Proc Natl Acad Sci. 2012;109(52):21474–9.
- Ni F, Mbawuike IN, Kondrashkina E, Wang Q. The roles of hemagglutinin Phe-95 in receptor binding and pathogenicity of influenza B virus. Virology. 2014;450:71–83.
- Pettersen EF, Goddard TD, Huang CC, Couch GS, Greenblatt DM, Meng EC, Ferrin TE. Ucsf chimera – a visualization system for exploratory research and analysis. J Comput Chem. 2004;25(13):1605–12.
- Deem MW, Pan K. The epitope regions of h1-subtype influenza a, with application to vaccine efficacy. Protein Eng Des Sel. 2009;027.
- Lee M-S, Chen JS-E. Predicting antigenic variants of influenza a/h3n2 viruses. Emerg Infect Dis. 2004;10(8):1385.
- Huang J-W, Lin W-F, Yang J-M. Antigenic sites of h1n1 influenza virus hemagglutinin revealed by natural isolates and inhibition assays. Vaccine. 2012;30(44):6327–37.
- Xu R, Ekiert DC, Krause JC, Hai R, Crowe JE, Wilson IA. Structural basis of preexisting immunity to the 2009 h1n1 pandemic influenza virus. Science. 2010;328(5976):357–60.
- Wang Q. Influenza type b virus haemagglutinin: antigenicity, receptor binding and membrane fusion. Influenza: Molecular Virology. 2010:29–52.
- Wang Q, Cheng F, Lu M, Tian X, Ma J. Crystal structure of unliganded influenza b virus hemagglutinin. J Virol. 2008;82(6):3011–20.