Analysis of genetic variation in human papillomavirus type 16 E1 and E2 in women with cervical infection in Xinjiang, China

Background Xinjiang is one of the regions with a high incidence of cervical cancer, and genetic variation of human papillomavirus (HPV) may increase its ability to infect the human body and enhance virus-mediated immune escape. Methods The HPV16 genome from 165 samples positive for HPV16 infection was analyzed by Sanger sequencing, and phylogenetic analysis of the E1 and E2 genes was used to characterize the gene polymorphism of HPV16 in Xinjiang. Results Variations in HPV16 E1 were found in 109 samples at 48 nucleotide sites (19 missense and 29 synonymous variations), and variations in HPV16 E2 were found in 91 samples at 25 nucleotide sites (20 missense and five synonymous variations). Conclusions According to the phylogenetic tree, 149 samples were of the European variant and 16 samples were of the Asian variant. No African or North American/Asian variant types were found.

of the virus is generally divided into four major variant lineages: the A lineage, comprising the European prototype EUR (A1-A3) and the Asian type As (A4); the B lineage, African type I (AF-1); the C lineage, African type II (AF-2); and the D lineage, comprising the Asian-American type (AA) and the North American type (NA) [6]. The epidemiology of infection with HPV strains varies between countries and regions; the rate of HPV16 infection in women in Xinjiang is high, and most infections involve European strains [7,8]. The HPV16 genome consists of six early genes (E1, E2, E4, E5, E6, E7), two late genes (L1, L2), and an upstream regulatory region (URR). E1 and E2 are highly conserved genes that regulate viral genome replication [9]. HPV E1, one of the most conserved proteins, plays a central role in initiating HPV DNA replication. The E2 protein is a key protein in the viral life cycle and plays an important role in transcriptional regulation, initiation of DNA replication, and viral genome segregation [10,11]. The structure of the HPV16 E2 protein is similar to that of a typical transcription factor, with a transactivation domain and a carboxy-terminal DNA-binding domain separated by a variable hinge region. At the origin of DNA replication, E2 interacts with the HPV E1 replication helicase to promote viral DNA replication [12]. Through interaction with chromatin adapter proteins, HPV E2 tethers the episomal viral genome to host mitotic chromosomes during the division of infected cells, playing an important role in the segregation of the viral genome [13]. Deletion of E2 in HPV16 is associated with worse clinical outcomes in patients with HPV16-positive tumors [14]. The integration of viral DNA, the disruption of the E1 or E2 gene, and the loss of the inhibitory activity of the E2 protein on the early promoter are important steps in the process of malignant transformation [15-17].
The E1 and E2 proteins of HPV16 play an important role in the process of HPV infection of host cells. Studying how amino acid changes in these proteins affect HPV infection is therefore important for understanding the occurrence and development of cervical cancer.
Variations at nucleotide sites of HPV16 E1 and E2 are associated with the development of cervical cancer [18,19]. Analyzing HPV16 gene polymorphism in women with cervical infection in Xinjiang therefore helps us to better understand the relationship between HPV gene variation and cervical cancer (Table 1).

Variation analysis of HPV16 E1 and E2 genes
A total of 165 DNA samples positive for HPV16 infection were sequenced, and 115 samples with variations in E1 and/or E2 were obtained. The HPV16 prototype (European prototype, GenBank accession number NC_001526.2) was used as the reference strain for comparison. The polymorphic sites are shown in Tables 2 and 3 (end of article).
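The comparison against a reference strain described above can be illustrated with a minimal sketch. It assumes two already aligned, equal-length, in-frame coding sequences without insertions or deletions, and it labels each substitution in the same style used in this article (for example, a nucleotide change written as A978G and an amino acid change written as I326M). The function name `call_variants`, the `orf_start` parameter, and the toy sequences are hypothetical and are not taken from the study.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def call_variants(reference, sample, orf_start=1):
    """List substitutions between two aligned, in-frame ORF sequences.

    Assumes uppercase A/C/G/T only, equal length, and no indels.
    orf_start is the genome coordinate of the first ORF base (hypothetical).
    Returns tuples of (nucleotide change, amino acid change, variant class).
    """
    if len(reference) != len(sample) or len(reference) % 3:
        raise ValueError("sequences must be aligned, equal-length and in frame")
    variants = []
    for i, (ref_base, alt_base) in enumerate(zip(reference, sample)):
        if ref_base == alt_base:
            continue
        codon_start = i - i % 3
        ref_aa = CODON_TABLE[reference[codon_start:codon_start + 3]]
        # The sample codon carries every change present in that codon.
        alt_aa = CODON_TABLE[sample[codon_start:codon_start + 3]]
        residue = codon_start // 3 + 1
        if ref_aa == alt_aa:
            kind = "synonymous"
        elif alt_aa == "*":
            kind = "nonsense"
        else:
            kind = "missense"
        variants.append(
            (f"{ref_base}{orf_start + i}{alt_base}", f"{ref_aa}{residue}{alt_aa}", kind)
        )
    return variants


# Toy sequences for illustration only (not HPV16 data).
for variant in call_variants("ATGATAGAC", "ATGATGGAT"):
    print(variant)
```

In practice the sequences would first be aligned to the reference with a dedicated tool; the sketch only shows how a substitution is classified as missense or synonymous once the alignment is fixed.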

Phylogenetic tree analysis of nucleotide sequence of HPV16 E1 and E2
The phylogenetic tree was constructed using the neighbor-joining (N-J) method with the Kimura two-parameter model, and node support was assessed by bootstrap analysis (1000 replications). A bootstrap value > 50% indicates credibility, and a bootstrap value > 70% indicates high credibility. Nodes with bootstrap values < 50% were hidden in the evolutionary tree diagram in Fig. 1. [...] 50 were all associated with the T2410G variation, and all were Asian strains.
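As an illustration of the Kimura two-parameter model named above, the following minimal sketch computes the pairwise distance d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitional and transversional differences between two aligned sequences. The function name `k2p_distance` and the example sequences are hypothetical; the study's tree-building software is not specified in this section, and the sketch covers only the distance calculation, not tree construction or bootstrapping.

```python
import math

PURINES = {"A", "G"}


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned sequences.

    Sites where either sequence has a gap or ambiguous base are skipped.
    Raises ValueError if no comparable site exists; very divergent pairs
    can make the logarithm undefined (math domain error).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        # Purine<->purine or pyrimidine<->pyrimidine is a transition.
        if (a in PURINES) == (b in PURINES):
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# Toy sequences for illustration only: one transition in ten sites.
print(round(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"), 4))
```

A matrix of such pairwise distances is the input that a neighbor-joining algorithm clusters into a tree; bootstrap support is then obtained by resampling alignment columns and repeating the procedure.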

Genetic variation of genomic HPV16 E1 in case and control groups
Samples with pathological information were counted, comprising 66 cases in the normal group and 50 cases in the cervical cancer group. All variations listed in Table 4 are missense variations. There were 9 variations in the normal group and 11 variations in the cancer group. Two sites differed significantly between the case group and the control group (P < 0.05): A978G (P = 0.047) and G1473A (P < 0.001), with amino acid changes I326M and M491I, respectively. The E1 protein is a viral replication protein conserved in papillomaviruses, and it plays roles in inducing the DNA damage response and disturbing the normal cell cycle, depending on its DNA-binding and ATPase/helicase domains [20]. The difference between the case group and the control group suggests that the variations at these two sites may promote the expression of the replication protein E1, thereby reducing host cell defense capacity and ultimately contributing to the occurrence and development of cervical cancer.
Table 5 shows the HPV16 E2 gene variations in the normal group and the cervical cancer group. All variations listed in Table 5 are missense variations. There were 13 variations in the normal group and 16 variations in the cancer group. Several sites differed significantly between the case group and the control group (P < 0.05). In the transactivation domain, the variations G1964A (P = 0.017), C2295A (P = 0.009) and G2385A (P = 0.004) corresponded to the amino acid changes D25N, T135K and R165Q, respectively. In the hinge region, the variations T2520C (P = 0.002), C2546T (P < 0.001) and G2585A (P = 0.009) corresponded to the amino acid changes I210T, P219S and E232K, respectively. In the DNA-binding domain, the variations C2820A (P = 0.002), C2873T (P = 0.044) and C2923A (P = 0.001) corresponded to the amino acid changes T310K, H328Y and D344E, respectively.
Because the E1 protein binds to the transactivation domain of the E2 protein and the two co-localize at the transcription start site, they co-regulate viral transcription [21,22]. Variation in the hinge region and the DNA-binding domain will affect the binding of the E2 protein to the LCR and the inhibition of E6 and E7 protein expression [23].

Discussion
Some nucleotide variations in HPV16 cause amino acid changes and may also affect protein expression, thereby affecting the development of cervical cancer. Most studies on HPV nucleotide polymorphism have focused on E6 and E7, and variation of E6 and E7 can affect the occurrence and development of cervical cancer [7]. Variations that disrupt the open reading frame (ORF) of HPV16 E1 or E2 lead to a further increase in the immortalization potential of human keratinocytes [16]. A 63-base insertion between nt510 and nt511 of HPV16 E1 has been described, and this mutant is associated with viral oncogenic activity and viral integration [24]. The T310K variation in the HPV16 E2 DNA-binding domain may alter the interaction with cellular transcription factors at the site where the E2 protein interacts with the long control region (LCR), resulting in enhanced expression of the E6 and E7 proteins [25]. E232K is a linked variation in the E2 hinge region of HPV16 As that enhances dose-dependent inhibition of LCR activity and may affect the potential of the virus to cause cancer [26].
Of the 165 samples with HPV16 infection analyzed in Xinjiang, 149 samples were European strains and 16 samples were Asian strains; no African or North American/Asian strains were found. Nucleotide variation of HPV16 E1 occurred in 109 samples at 48 variation sites, 19 of which changed the amino acid. The most common nucleotide variation sites were A189C (18/109, 16 [...] In our research, two sites in E1 differed significantly between the case group and the control group (P < 0.05): A978G (P = 0.047) and G1473A (P < 0.001), with amino acid changes I326M and M491I, respectively. In the E2 transactivation domain, the variations G1964A (P = 0.017), C2295A (P = 0.009) and G2385A (P = 0.004) corresponded to the amino acid changes D25N, T135K and R165Q, respectively. In the hinge region, the variations T2520C (P = 0.002), C2546T (P < 0.001) and G2585A (P = 0.009) corresponded to the amino acid changes I210T, P219S and E232K, respectively. In the DNA-binding domain, the variations C2820A (P = 0.002), C2873T (P = 0.044) and C2923A (P = 0.001) corresponded to the amino acid changes T310K, H328Y and D344E, respectively. These base variations lead to amino acid variations, which may affect protein expression. In the future, we will study the role of HPV16 E1 and E2 variants in the occurrence and development of cervical cancer through cell function experiments.

Conclusion
According to the phylogenetic tree, 149 samples were of the European variant type and 16 samples were of the Asian variant type. No African or North American/Asian variant types were found.

Phylogenetic analysis of HPV16 variants
The sequencing results were analyzed for single nucleotide polymorphisms (SNPs) using the PolyPhred software by comparison with the European standard prototype.

Statistical analysis
The frequency of each variation in the HPV16 E1 and E2 genes was determined by direct enumeration. A chi-square test was performed to assess the association between HPV16 E1 and E2 variants and cervical cancer. Statistical analysis was performed using SPSS 17. P-values < 0.05 were considered statistically significant.
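The chi-square test described above can be sketched for a single variation site as a 2 x 2 table (variant present or absent, by cancer or normal group). The sketch below is a minimal illustration that assumes a Pearson chi-square without continuity correction; whether SPSS was run with a correction or with Fisher's exact test for sparse tables is not stated here. The function name `chi_square_2x2` and the counts are hypothetical and are not the study's data.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Rows: cancer group, normal group. Columns: variant present, variant absent.
    Returns (chi-square statistic, two-sided P-value with 1 degree of freedom).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be non-zero")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom the chi-square survival function equals
    # erfc(sqrt(chi2 / 2)), so no external statistics library is needed.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: variant in 10 of 50 cancer and 3 of 66 normal samples.
chi2, p = chi_square_2x2(10, 40, 3, 63)
print(f"chi2 = {chi2:.3f}, P = {p:.4f}, significant at 0.05: {p < 0.05}")
```

When expected cell counts are small, an exact test is usually preferred over the chi-square approximation, which is one reason the uncorrected statistic here should be read only as an illustration.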