A systematic analysis of a broadly neutralizing antibody AR3C epitopes on Hepatitis C virus E2 envelope glycoprotein and their cross-reactivity

Background Hepatitis C virus (HCV) belongs to the Flaviviridae family of viruses. HCV represents a major challenge to public health since its estimated global prevalence is 2.8% of the world's human population. The design and development of an HCV vaccine has been hampered by the rapid evolution of viral quasispecies, which results in antibody escape variants. The HCV envelope glycoproteins E1 and E2, which mediate fusion and entry of the virus into host cells, are primary targets of the host immune responses. Results Structural characterization of the E2 core protein and a broadly neutralizing antibody AR3C, together with E1E2 sequence information, enabled the analysis of B-cell epitope variability. The E2 binding site of AR3C and its surrounding area were identified from the crystal structure of the E2c-AR3C complex. We clustered HCV strains using the concept of "discontinuous motif/peptide" and classified B-cell epitopes based on their similarity. Conclusions The assessment of antibody neutralizing coverage provides insights into potential cross-reactivity of the AR3C neutralizing antibody across a large number of HCV variants.


Background
Hepatitis C virus (HCV) is a major cause of viral hepatitis, liver cirrhosis, and liver cancer. It was discovered in 1989 as a novel causative agent of hepatitis [1]. HCV is a growing health concern since it affects about 2.8% of the world population and its prevalence is rising [2,3]. Each year, there are more than 500,000 new HCV infections in Egypt, the country with the highest HCV prevalence [4]. In the United States, more people die from HCV than from human immunodeficiency virus 1 (HIV-1) related disease [5]. Six genotypes and multiple subtypes of HCV have been identified to date. Approximately 75% of Americans with HCV have genotype 1 of the virus (subtypes 1a or 1b), and 20-25% have genotypes 2 or 3, with small numbers of patients being infected with genotypes 4, 5, or 6 [6]. Effective vaccination would provide protection against this global disease. However, the development of an HCV vaccine and the identification of broadly neutralizing antibodies have been hampered for three reasons: HCV sequences mutate rapidly, generating escape variants [7]; non-neutralizing antibodies to HCV envelope proteins interfere with neutralizing antibodies [8]; and there is a lack of the 3D structural information needed for vaccine development [9]. The first crystal structure of a broadly neutralizing antibody against HCV was published in 2013 [10].
The HCV envelope glycoproteins E1 and E2 form a heterodimer E1E2 that facilitates virus attachment and entry into host cells, and they are targets for neutralizing antibodies [11]. Recent progress in isolating and characterizing HCV-neutralizing antibodies is instrumental for vaccine discovery and design [12]. These HCV-neutralizing antibodies were isolated from immunized mice [13][14][15], or from patients chronically infected with HCV [16][17][18][19][20]. Giang et al. [4], using an exhaustive panning strategy, identified five distinct antigenic regions on HCV E1E2 that were recognized by 73 human monoclonal antibodies (mAbs) from an HCV immune phage-display antibody library. Many of these antibodies showed broadly neutralizing ability.
Structural characterization of HCV envelope glycoproteins is challenging because of the difficulty in obtaining homogeneous protein preparations [10][21][22][23]. Recently, the crystal structure of the E2 core bound to neutralizing antibody AR3C has been determined [10]. The antibody AR3C belongs to a group of broadly neutralizing antibodies that recognize antigenic region 3 (AR3) of the E2 protein and cross-neutralize HCV genotypes by blocking the CD81 receptor binding site [14].
In this study, we characterized the B-cell epitope from the E2c-AR3C structure. By mapping this B-cell epitope to HCV E2 protein sequences, all strains available in the HCV database have been catalogued and compared with the known neutralized HCV strains. We examined the B-cell epitope diversity among the HCV variants, assessed potential cross-neutralization of the broadly neutralizing antibody across all sequences, and provided suggestions for selection of representative strains for future analysis of diversity and cross-recognition of HCV neutralizing B-cell epitopes.

Materials and methods
Structures of neutralizing antibody-E2 core protein complex
HCV envelope glycoproteins E1 and E2 mediate fusion and entry into host cells and are the primary targets of the humoral immune responses. The structure of the E2 core bound to a broadly neutralizing antibody was first determined at 2.65 Å resolution [10] and deposited in the PDB [19] database (PDB ID: 4MWF).

Sequences of E2 protein from Hepatitis C virus
All E2 envelope protein sequences of HCV strains were retrieved from the HCV database [24] (http://hcv.lanl.gov/content/index), a database that provides annotated data about HCV sequences. We retrieved 5589 E2 sequences from the HCV database. Of these, 5340 sequences with translated protein sequences were retained in the E2 protein dataset, with 3723, 275, 995, 70, 22 and 87 sequences labeled as genotypes 1-6, respectively. The remaining 168 sequences were genotype-unclassified isolates or representatives of recombinant strains. Five of the seven neutralizing motifs studied in [18] were represented in this E2 dataset.

Consistency of strain sequence numbering
All sequences in E2 protein dataset were aligned using MAFFT multiple alignment server [26]. The multiple sequence alignment (MSA) results provided a consistent sequence numbering scheme for further analysis of all sequences.
For each validated strain (Table 1), sequence similarity to all sequences in E2 protein dataset was assessed using BLAST [27] search. The sequence from E2 protein dataset with the highest identity score was used as the reference sequence. This step also provided a consistent sequence numbering scheme of positions within the MSA results for validated strains.
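The role of the alignment as a shared numbering scheme can be illustrated with a minimal sketch. It assumes each aligned sequence is available as a gapped string with `-` as the gap character; the function name `residue_to_column` and the toy sequence are hypothetical and are not taken from the study.

```python
def residue_to_column(aligned_seq, gap="-"):
    """Map 1-based residue indices of the ungapped sequence
    to 0-based column indices of the alignment."""
    mapping = {}
    residue_index = 0
    for column, char in enumerate(aligned_seq):
        if char != gap:
            residue_index += 1
            mapping[residue_index] = column
    return mapping


# Toy aligned reference row (illustrative only, not real E2 data).
reference_row = "MK-TA--LLG"
col_of = residue_to_column(reference_row)

# Residue 3 of the ungapped reference ("T") sits in alignment column 3,
# residue 5 (the first "L") sits in column 7.
print(col_of[3], col_of[5])
```

Once residue positions of a reference sequence are expressed as alignment columns, the same columns can be read from every other row of the alignment, which is what makes the numbering consistent across strains.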

Identification of B-cell epitope and surrounding area
Using crystal structures of the antigen-antibody complex, we defined antigen-binding sites (B-cell epitopes) as described previously [28,29]. This was done using both the residue Accessible Surface Area (ASA) and the minimum atom distance to the antibody.
a) For each residue on the antigen protein, the ASA value was calculated using Naccess [28] software for the free antigen and for the antigen coupled with the corresponding antibody. Residues $r_i$ with an ASA loss of more than 20% were selected as designated epitope residues.
b) The majority of contacts between two interacting atoms occur at <5Å separation. The Euclidean distance between atoms $a_i$ and $a_j$, with coordinates $a_i(x_i, y_i, z_i)$ and $a_j(x_j, y_j, z_j)$ in the PDB structure data, is
$$d_{ij} = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2 + (z_i - z_j)^2}$$
Antigen residues $r_i$ whose minimum atom distance to the binding antibody is less than 4Å were also incorporated in the epitope. The minimum atom distance was defined as
$$d_{\min} = \min d_{ij}, \quad a_i \in \text{antigen residue } r_i,\ a_j \in \text{antibody residue } r_j$$
The residues that satisfy either of these two conditions (the ASA loss or the minimum distance threshold) were considered to constitute a B-cell epitope.
To define the surrounding area, we again used a distance-based method: antigen residues whose minimum atom distance to the binding antibody is less than 6Å, and that are not B-cell epitope residues, were assigned to the surrounding area.
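The two epitope criteria and the surrounding-area rule can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names and toy coordinates are hypothetical, ASA values are assumed to have been computed beforehand (for example with Naccess), and the ASA loss is assumed here to be the relative loss with respect to the free antigen, which the text does not state explicitly.

```python
import math


def min_atom_distance(antigen_atoms, antibody_atoms):
    """Smallest Euclidean distance between any antigen-residue atom
    and any antibody atom (coordinates as (x, y, z) tuples, in Å)."""
    return min(math.dist(a, b) for a in antigen_atoms for b in antibody_atoms)


def classify_residue(asa_free, asa_bound, d_min,
                     asa_loss_cutoff=0.20, epitope_cutoff=4.0,
                     surround_cutoff=6.0):
    """Label a residue as 'epitope', 'surrounding' or 'other'."""
    # Assumption: ASA loss is measured relative to the free-antigen ASA.
    loss = (asa_free - asa_bound) / asa_free if asa_free > 0 else 0.0
    if loss > asa_loss_cutoff or d_min < epitope_cutoff:
        return "epitope"
    if d_min < surround_cutoff:
        return "surrounding"
    return "other"


# Toy data: one antibody atom at the origin and four antigen residues.
antibody = [(0.0, 0.0, 0.0)]
residues = {
    "A": {"atoms": [(3.0, 0.0, 0.0), (5.0, 0.0, 0.0)], "asa": (100.0, 95.0)},
    "B": {"atoms": [(5.0, 0.0, 0.0)], "asa": (100.0, 100.0)},
    "C": {"atoms": [(10.0, 0.0, 0.0)], "asa": (50.0, 50.0)},
    "D": {"atoms": [(4.5, 0.0, 0.0)], "asa": (80.0, 40.0)},
}

labels = {}
for name, res in residues.items():
    d_min = min_atom_distance(res["atoms"], antibody)
    labels[name] = classify_residue(res["asa"][0], res["asa"][1], d_min)
print(labels)
```

In the toy data, residue A qualifies by distance alone, residue D by ASA loss alone, and residue B falls between 4Å and 6Å without meeting either epitope criterion, so it is assigned to the surrounding area.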

Extraction of discontinuous motifs (functional motifs)
Based on the BLAST result, residue positions of the B-cell epitope and its surrounding area identified on the crystal structure were mapped onto the reference sequence, and then transferred onto all validated strain sequences (Figure 1). For the structure sequence and for each validated strain sequence, the residue string formed by these epitope positions was recognized as a discontinuous motif. Since we do not have negative data (escape variants), all discontinuous motifs extracted from these strains were classified as neutralized motifs, which were recognized as functional in neutralizing assays.

Extraction of discontinuous peptides
The concept of a discontinuous peptide [31] describes a virtual linear residue string that combines the residues forming a B-cell epitope, which are not contiguous in the protein sequence. Discontinuous peptides were extracted from the E2 protein dataset. Based on the BLAST and MSA results, the residue positions of the B-cell epitope and its surrounding area were mapped onto the reference strain sequence, and then mapped onto all sequences in the E2 protein dataset (Figure 1). Patterns of discontinuous peptides were used to catalog all strains in the dataset, and they were compared to the functional neutralized motifs. Each discontinuous peptide that has a unique sequence was termed a discontinuous motif.
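As an illustration of the extraction step, the sketch below reads a fixed set of alignment columns from every aligned sequence and groups strains by the resulting string. It assumes the epitope positions have already been converted to 0-based alignment columns; the names `discontinuous_peptide` and `catalog_strains` and the toy alignment are hypothetical.

```python
from collections import defaultdict


def discontinuous_peptide(aligned_seq, columns):
    """Concatenate the residues found at the given alignment columns."""
    return "".join(aligned_seq[c] for c in columns)


def catalog_strains(alignment, columns):
    """Group strain names by their discontinuous peptide."""
    groups = defaultdict(list)
    for name, seq in alignment.items():
        groups[discontinuous_peptide(seq, columns)].append(name)
    return dict(groups)


# Toy alignment (illustrative only, not real E2 sequences).
alignment = {
    "ref": "AC-DEFG",
    "s1": "ACQDEYG",
    "s2": "TC-DEFG",
    "s3": "AC-DEFG",
}
epitope_columns = [0, 3, 5]  # hypothetical epitope columns

catalog = catalog_strains(alignment, epitope_columns)
print(catalog)
```

Strains that fall into the same group share an identical discontinuous peptide, so validation data for one member of the group can be compared directly with the others.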

Neutralizing antibody against HCV E2c protein
The mAb AR3C is known to neutralize HCV genotypes 1, 2, 4 and 5. We analyzed the structure of mAb AR3C complexed with HCV E2c. The B-cell epitope and its surrounding area in structure  [18]).
Comparing the discontinuous motifs on B-cell epitope, residues differing from the consensus are underlined, and the highly variable position lacking a consensus residue is underlined and in italics.

Figure 1
The workflow used in this study. The steps included: identification of a B-cell epitope and its surrounding area (key residues) from the crystal structure, extraction of discontinuous motifs and peptides at the key positions, and cataloging and neutralization assessment of strains in the E2 protein dataset by discontinuous peptides.

Functional motifs on B-cell epitopes and its surrounding area
The positions of B-cell epitope residues were extracted and mapped to all validated strain sequences. Functional motifs were retrieved with corresponding neutralizing information. Seven distinct discontinuous motifs (identical motifs were present across different strains) were extracted from the sequences of E2 protein structure and 10 validated strains.

Discontinuous peptides derived from B-cell epitopes
The positions of epitope residues were mapped onto all sequences in the E2 protein dataset. An amino acid string representing the discontinuous peptide was extracted from each E2 protein sequence. Among all 5340 sequences in the E2 protein dataset, there were 402 different discontinuous peptides (patterns), which reflects the high sequence variability of HCV. Five discontinuous peptides identical to discontinuous motifs from neutralized strains covered 14.06% of the strain population (Figure 3A). The discontinuous peptides were further sorted according to their frequencies in the E2 protein dataset. Ranked by frequency, the top 10 most frequent discontinuous peptides covered more than 50% of the strains in the dataset, and the top 25 covered nearly 80% of the total strain population (Figures 3B and 3C). Top ranked discontinuous peptides and those identical to the discontinuous motifs extracted from the E2 protein dataset are listed in Table 2 along with their frequencies. The most frequent discontinuous peptide covers 754 strains, while the second most frequent covers 320 strains. There is no validation data for the 3 most frequent discontinuous peptides, while the discontinuous motifs ranked 4th, 6th, 11th, 12th, and 26th in the list were shown to be neutralized. The neutralization potential of the untested discontinuous motifs could be estimated by comparing their constituent amino acids with those of the validated motifs. The 1st ranked discontinuous peptide (ILNCNDSLGIALFYKCW) differs from the 4th ranked discontinuous peptide (ILNCNDSLGLALFYRCW, which is a neutralized motif) at two positions: the 10th residue (L->I) and the 15th residue (R->K). Since the residues in each pair share similar chemical features, it is possible that HCV strains with the 1st ranked discontinuous peptide could be neutralized by mAb AR3C.
Also, both substituted residues occur in other validated neutralized motifs: the 26th ranked (ILNCNDSLGIALFYSCW) and the 6th ranked (ILNCNESLGLALFYKCW) discontinuous peptides. From the reported neutralizing data, we derived the consensus sequence for B-cell epitope ILNCNDSLGIALFYKCW and experimentally verified E2 neutralizing motif I-L-N-C
WebLogo [29] and BlockLogo [30] were generated for all the discontinuous peptides extracted from the E2 protein dataset. Within the 17-residue B-cell epitope, most positions are well conserved, as shown in the WebLogo (Figure 4A). However, the BlockLogo shows a large number of different residue combinations, reflecting the high diversity of this binding site (Figure 4B). The neutralized motifs cover 14.06% of strain sequences in the E2 protein dataset, while the other discontinuous peptides, which cover 85.94% of the strains, lack validation data (Figure 5). Viewed by genotype, the neutralizing coverage of genotypes 1, 2 and 4 is approximately 20% (18.48%, 22.18% and 17.14%, respectively), higher than that of genotypes 3, 5 and 6. The overall known neutralized coverage of the dataset is low. Of 402 discontinuous peptides, 379 had a complete B-cell epitope and 15 had ambiguities in sequence (residue X). Eight sequences had a disrupted B-cell epitope (patterns 38, 65, 93, 180, 214, 285, 385, and 387, Additional file 2), most likely representing non-viable viruses.
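The frequency ranking, cumulative coverage, and position-by-position comparison described in this section can be sketched as below. The helper names and the toy peptide list are hypothetical; only the two 17-residue sequences at the end are taken from the text.

```python
from collections import Counter
from itertools import accumulate


def rank_peptides(peptides):
    """Return (peptide, count, cumulative fraction) sorted by frequency."""
    ranked = Counter(peptides).most_common()
    total = len(peptides)
    running = accumulate(count for _, count in ranked)
    return [(pep, count, cum / total)
            for (pep, count), cum in zip(ranked, running)]


def validated_coverage(peptides, validated):
    """Fraction of strains whose peptide matches a validated motif."""
    return sum(pep in validated for pep in peptides) / len(peptides)


def differences(reference, other):
    """1-based positions where two equal-length peptides differ."""
    return [(i, a, b)
            for i, (a, b) in enumerate(zip(reference, other), start=1)
            if a != b]


# Toy population of discontinuous peptides (illustrative only).
toy = ["AAA"] * 5 + ["AAB"] * 3 + ["ABB"] * 2
ranking = rank_peptides(toy)
coverage = validated_coverage(toy, {"AAB"})

# Sequences quoted in the text: 4th ranked (neutralized) vs 1st ranked.
neutralized = "ILNCNDSLGLALFYRCW"
top_ranked = "ILNCNDSLGIALFYKCW"
diffs = differences(neutralized, top_ranked)
print(ranking, coverage, diffs)
```

For the two quoted sequences, `differences` reports positions 10 (L to I) and 15 (R to K), matching the comparison made in the text. A residue-similarity check on such pairs would be a natural next step, but which similarity scheme to use is left open here.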

Discontinuous peptides on B-cell epitope surrounding area
Antibody binding and neutralization ability may be affected by the area surrounding the B-cell epitope. Identical discontinuous peptides on the B-cell epitope alone cannot fully guarantee the same neutralization result. The analysis of the surrounding area aims to provide a more detailed assessment of the potential neutralizing properties of AR3C. The frequency distribution of different discontinuous peptides on the surrounding area was similar to that observed for the B-cell epitope (Figure 6). For strains that share identical discontinuous peptides on the B-cell epitope, the discontinuous peptides on the surrounding area have dominant patterns: the top 5 patterns cover as much as 50% of the strains. The result indicates that the residues that define the AR3C epitope surrounding area do not affect the B-cell epitope/antibody interaction independently of the actual B-cell epitope.

Figure 3
An overview of discontinuous peptides in the E2 protein dataset. (A) The number of discontinuous peptides and the number of discontinuous motifs generated from the E2 protein dataset; (B) The frequency distribution of all discontinuous peptide patterns. The yellow and grey bars represent discontinuous peptides identical to the neutralized motifs and those without validation data, respectively, while the red line is their cumulative frequency; (C) A zoomed-in view of the frequencies of the top ranked discontinuous peptides from (B).

Conclusions and discussion
Hepatitis C virus, with its extreme variability of sequence repertoire, is a difficult target for vaccine design. Compared to envelope glycoproteins in other viruses, such as hemagglutinin protein from influenza virus and E protein from dengue virus (DENV), the   [33]. In the current study, 5340 E2 protein sequences from HCV, a sequence set similar in size to that of dengue viruses, generated almost an order of magnitude larger diversity: 402 different discontinuous peptides at the mAb AR3C binding site have been identified. We assembled an HCV strain cataloguing method in this study. Strains with identical discontinuous peptides on the B-cell epitope site were grouped and estimated to have similar neutralizing activity.
For mAb AR3C, the discontinuous peptides on the B-cell epitope from validated strains, ranked 4th, 6th, 11th, 12th and 26th, covered 14.06% of all 5,340 strains in the E2 protein dataset. The list of discontinuous peptides and their frequencies could be used as guidance for the selection of representative strains for future systematic neutralizing antibody tests. For example, the most dominant discontinuous peptides in the population should be prioritized for neutralization assays. For mAbs generated in the future, the neutralization coverage among strains with the most dominant discontinuous peptides could indicate how broadly neutralizing the mAb is.
The neutralizing motif indicates that conservative replacements at positions 430 (N/Q), 431 (D/E) and 438 (L/I) will likely not affect binding affinities sufficiently to abolish neutralization. In addition, position 446 has multiple residues observed in neutralized variants (K, N, S, R) and appears not to affect antibody binding. By observation of common discontinuous peptides we argue
The results presented here are based on the existing data. More comprehensive conclusions will be generated as additional neutralizing antibody structures are crystallized and more neutralization assays are performed in the future. Advances in computation and biotechnology enable more comprehensive analysis where all combinations of antibodies and antigens can be assessed in silico. The new methodology of Big Data analysis [35] enables the analysis of diverse data types where protein, nucleotide, structure, and functional data can be analyzed in combination. The well-annotated data are combined with specialized analytical tools, including statistical analyses, sequence analysis, and mathematical models to gain insights into biological processes, generate knowledge, and inform decisions about validation experiments. This study has shown that the majority of common HCV variants have not been studied in antibody neutralization studies. The knowledge of cross-neutralization is, therefore, incomplete and there is an urgent need for designing libraries of viruses that will be representative of the majority of HCV strains. These libraries will enable systematic testing of strains against panels of antibodies and enable the design of universal broadly protective HCV vaccines.

Table 2
The top 26 most frequent (including discontinuous peptides identical to discontinuous neutralized motifs) among 402 different patterns of discontinuous peptides are listed. This table lists the discontinuous peptides, their frequencies, cumulative frequency and neutralization validation status for mAb AR3C. The neutralized discontinuous motifs are underlined. The residues in italics and underlined indicate the amino acids that are not present at the specific position in the known neutralized discontinuous motifs (see Figure 1(c)).

Figure 5
MAb AR3C neutralization assessment sorted by HCV genotype. The virus population coverage is shown for each genotype and for unclassified isolates; the number in brackets after each genotype indicates the number of strain sequences for that genotype (based on data from the E2 protein dataset). The proportion of discontinuous peptides identical to neutralized motifs is colored in pink, while the non-validated proportion is in grey. For each genotype from left to right, the numbers of different motif patterns among these sequences are 280, 51, 70, 25, 12, 45 and 30.