In silico design of recombinant multi-epitope vaccine against influenza A virus

Maleki, Avisa; Russo, Giulia; Parasiliti Palumbo, Giuseppe Alessandro; Pappalardo, Francesco

doi:10.1186/s12859-022-04581-6

Volume 22 Supplement 14

Selected papers from the 4th International Workshop on Computational Methods for the Immune System Function (CMISF 2020)

Research
Open access
Published: 02 February 2022

BMC Bioinformatics volume 22, Article number: 617 (2021)

Abstract

Background

Influenza A virus is one of the leading causes of annual mortality. The emergence of novel escape variants of the influenza A virus remains a considerable challenge in the annual process of vaccine production. The development of vaccines ranks among the most critical successes in medicine and has eradicated numerous infectious diseases. Recently, multi-epitope vaccines, which are based on the selection of epitopes, have been increasingly investigated.

Results

This study used an immunoinformatic approach to design a recombinant multi-epitope vaccine based on highly conserved epitopes of the hemagglutinin, neuraminidase, and membrane matrix proteins, which change or mutate less over time. Potential B cell, cytotoxic T lymphocyte (CTL), and CD4 T cell epitopes were identified. The recombinant multi-epitope vaccine was designed using specific linkers and a suitable adjuvant. Moreover, several online bioinformatics servers and datasets were used to evaluate the immunogenicity and chemical properties of the selected epitopes. In addition, the Universal Immune System Simulator (UISS) in silico trial computational framework was run after influenza exposure and recombinant multi-epitope vaccine administration, showing a good immune response in terms of immunoglobulins of class G (IgG), T helper 1 cells (TH1), epithelial cells (EP), and interferon gamma (IFN-g) levels. Furthermore, after a reverse translation (i.e., conversion of the amino acid sequence to a nucleotide one) and codon optimization phase, the optimized sequence was placed between the EcoRV and MscI restriction sites in the pET32a⁺ vector.

Conclusions

The proposed recombinant multi-epitope vaccine was predicted to have unique and acceptable immunological properties. This recombinant multi-epitope vaccine can be successfully expressed in a prokaryotic system and is suitable for immunogenicity studies against the influenza virus at the in silico level. The multi-epitope vaccine was then tested with the Universal Immune System Simulator (UISS) in silico trial platform. The simulation revealed slight immune protection against the influenza virus, indicating that a multistep bioinformatics approach covering both the molecular and cellular levels is mandatory to avoid inappropriate vaccine efficacy predictions.

Background

Influenza has been a significant contributor to mortality for centuries and continues to be a significant threat to public health worldwide [1, 2]. The influenza virus belongs to the Orthomyxoviridae family and is divided into four types: A, B, C, and D [3]. The influenza virus genome consists of several RNA segments, which facilitates viral variation through genetic reassortment [4]. Influenza A viruses have been responsible for flu pandemics [5]. Influenza A virus structural proteins include hemagglutinin (HA) and neuraminidase (NA), which appear extensively on the lipid envelope and serve to classify the virus. Currently, 18 HA and 11 NA subtypes are known, and 131 subtypes have been identified in nature [6]. The HA protein can be divided into two functional domains, head and stem, which encompass highly conserved regions: the receptor-binding site (RBS) and the fusion peptide, respectively [7]. There are also two internal proteins: matrix protein (M1) and membrane matrix protein (M2). The M2 protein of the influenza A virus is crucial for infection. While the influenza A virus evolves rapidly with frequent mutation, the M2 protein, compared with other proteins encoded by the genome, comprises highly conserved residues [8]. The variations in the virus originate from two mechanisms, antigenic shift and antigenic drift, which allow the influenza virus to evade the human immune system [9]. Antigenic shift is caused by the substitution of hemagglutinin and sometimes neuraminidase through gene reassortment. New subtypes have not appeared in human viruses for a long time. Antigenic drift is caused by frequent point mutations during virus replication, affecting the antibody-binding sites in the HA protein, NA protein, or both.

Several vaccines have been developed for prophylaxis against human influenza viruses, with HA as the main target. However, the effectiveness of these vaccines is limited by the high mutation rate in the antigenicity of HA, the short time available for production, and the host's immune system. Consequently, vaccines must be frequently reformulated [10, 11]. Moreover, the antigenicity of the vaccine sometimes does not match the epidemic viruses. One approach to improving vaccine efficacy is to predict the specific influenza A subtype that will be prevalent in a particular year. Prediction accuracy has decreased because of random genetic drift, incomplete sampling of the viruses that cause epidemics, and a lack of knowledge regarding the mechanisms of sequence evolution [12].

During the last decade, complex computational techniques have been developed for predicting virus lineages and for detecting genetic variations and their functional impact. These techniques, such as in silico trials or thermostatted kinetic theory methods [13], should also be instrumental for vaccine design [14]. In silico trials use individual computer simulations to develop or evaluate a pharmaceutical product, medical device, or medical intervention. In the medical context, they play a significant role in all aspects of disease: prevention through the design and development of vaccines, diagnosis, prognostic appraisal, and prediction of the efficacy of specific treatment strategies [15]. In particular, considering the high mutation rate and evolutionary dynamics of HA and NA, it is assumed that their conserved parts play a remarkable role in vaccine design [16]. In addition, the highly conserved M2 protein is valuable for the stability and improvement of vaccine function, as it has 23 residues located outside the virus that assist the M2 protein in its virion function [17, 18]. In this work, we evaluated the conserved parts of HA, NA, and M2 among seven pathogenic strains found especially in Asia (H1N1, H1N2, H3N2, H5N1, H7N3, H7N9, and H9N2) using in silico methods, and combined them into a single protein that can activate human humoral and cellular immunity [19,20,21].

The combination of epitope prediction tools and vaccine design methodologies alone often does not produce sufficient evidence to evaluate the global immune response elicited by the vaccine under investigation. Agent-based modeling can provide additional information useful for assessing the immune response elicited at the cellular and organ level, closing the circle. For example, the dynamics of immune entities are also revealed in an antigenic competition environment, which is not clearly predictable using epitope prediction tools alone.

Results

The results of each step of the immunoinformatic procedure are reported below.

Retrieving influenza protein sequences and multiple alignments

Amino acid sequences in FASTA format for the HA, NA, and M2 proteins of the selected strains were retrieved from the NCBI database (Additional file 1). After multiple alignment with Jalview, the consensus sequences for HA, NA, and M2 consisted of 582, 257, and 487 amino acids, respectively.

B-cell epitopes prediction

Epitopes with a length of 10 to 20 residues were extracted from IEDB, and from SVMTriP only epitopes with a score above 0.5 were collected. Finally, 15 epitopes for HA, 11 epitopes for M2, and 12 epitopes for NA were chosen from these B-cell prediction tools.
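
The two selection rules described above can be illustrated with a minimal Python sketch. The function names, the inclusive handling of the 10 to 20 length bounds, and the peptide strings are assumptions made for illustration; they are placeholders, not output from IEDB or SVMTriP.

```python
def filter_by_length(peptides, min_len=10, max_len=20):
    """Keep peptides whose length lies in [min_len, max_len] (bounds assumed inclusive)."""
    return [p for p in peptides if min_len <= len(p) <= max_len]


def filter_by_score(scored_peptides, min_score=0.5):
    """Keep peptides whose score is strictly above min_score."""
    return [seq for seq, score in scored_peptides if score > min_score]


# Hypothetical placeholder data, not real predictions.
iedb_hits = [
    "ACDEFGHIK",              # 9 residues: too short
    "ACDEFGHIKL",             # 10 residues: kept
    "ACDEFGHIKLMNPQRSTVWY",   # 20 residues: kept
    "ACDEFGHIKLMNPQRSTVWYA",  # 21 residues: too long
]
svmtrip_hits = [
    ("KLMNPQRSTVWYACDEFGHI", 0.73),  # score above 0.5: kept
    ("GHIKLMNPQRSTVWYACDEF", 0.50),  # score not above 0.5: dropped
]

length_filtered = filter_by_length(iedb_hits)
score_filtered = filter_by_score(svmtrip_hits)
print(length_filtered)
print(score_filtered)
```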

CTL epitopes prediction

Using the NetCTL 1.2 server, 15 supertype A2 ligands, 18 supertype A3 ligands, and 11 supertype B7 ligands were predicted for the HA, M2, and NA proteins (consensus peptide sequences). The epitope identification threshold was set to 1; the weights on C-terminal cleavage and TAP transport efficiency were left at their default values.

CD4 T cell epitopes prediction

A total of 40 unique strong-binding epitopes were predicted using NetMHCIIpan-4.0 for the human alleles HLA-DR, HLA-DQA1, and HLA-DQB1 (DRB1_1303, DRB1_1302, DRB1_1401, DRB1_0701, HLA-DQA10103-DQB10603, HLA-DQA10102-DQB10604, HLA-DQA10104-DQB10503, HLA-DQA10201-DQB10202, and HLA-DQA10201-DQB10303). Epitopes were selected from the NetMHCIIpan-4.0 web server output based on their IC50 scores, and all parameters were set to default.

Antigenicity and allergenicity prediction of CTL, CD4 T cell, and B cell epitopes

To select epitopes for the final recombinant vaccine, we evaluated the antigenicity, allergenicity, and toxicity of all 122 peptides (Additional file 2); we then retained the antigenic, non-allergenic, and non-toxic epitopes for the recombinant vaccine. The antigenicity score provided by VaxiJen under the virus model was 0.73, while the AllerTOP 2.0 server predicted that the final recombinant vaccine is non-allergenic.
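
As a rough illustration of this screening step, the sketch below first checks that the per-class counts reported in the preceding subsections add up to the 122 screened peptides, then applies a three-way filter. The 0.4 antigenicity cut-off is the VaxiJen default quoted in the Discussion; the `Peptide` record, the `passes_screen` helper, and the toy peptides are hypothetical and do not come from Additional file 2.

```python
from typing import NamedTuple


class Peptide(NamedTuple):
    sequence: str
    antigenicity: float  # antigenicity score (hypothetical values below)
    allergenic: bool
    toxic: bool


ANTIGENICITY_THRESHOLD = 0.4  # default threshold quoted in the Discussion


def passes_screen(p: Peptide) -> bool:
    """Antigenic (score above threshold), non-allergenic, and non-toxic."""
    return p.antigenicity > ANTIGENICITY_THRESHOLD and not p.allergenic and not p.toxic


# Counts reported in the Results: B cell (HA, M2, NA), CTL (A2, A3, B7), CD4 T cell.
predicted = {"B cell": 15 + 11 + 12, "CTL": 15 + 18 + 11, "CD4 T cell": 40}
total_screened = sum(predicted.values())

# Hypothetical placeholder peptides, for illustration only.
toy = [
    Peptide("PEPTIDEAAA", 0.85, False, False),  # kept
    Peptide("PEPTIDECCC", 0.31, False, False),  # below threshold
    Peptide("PEPTIDEDDD", 0.92, True, False),   # allergenic
    Peptide("PEPTIDEEEE", 0.64, False, True),   # toxic
]
selected = [p.sequence for p in toy if passes_screen(p)]
print(total_screened, selected)
```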

Human population coverage analysis

Worldwide human population coverage analysis predicted that the T-cell epitopes, based on the combination of HLA-I and HLA-II, can cover 90.78% of the human population.

Recombinant multi-epitope vaccine

The final vaccine, chosen after comparing several parameters (pI, weight, half-life, etc.) for three adjuvants, has 813 amino acids and consists of a total of 40 epitopes, including 11 CTL, 16 CD4 T cell, and 13 B cell peptide sequences (Table 1) (Additional file 3). The adjuvant (50S ribosomal protein L7/L12) was linked to the N-terminus by an EAAAK linker, and the CTL, CD4 T cell, and B cell epitopes were merged using AAY, GPGPG, and KK linkers, respectively. AAY linkers significantly affect the expression of the target proteins and improve the immunogenicity of the multi-epitope vaccine. The significant feature of the GPGPG linker is its ability to break junctional immunogenicity, which arises from changes to the immunogenicity of each epitope at the junctions; GPGPG linkers have also been shown to induce CD4 T cell responses, which are essential for a multi-epitope vaccine. The KK linker, in turn, decreases junctional immunogenicity by preventing the induction of antibodies against the peptide sequence that the epitopes can form when joined linearly [22]. All linkers have pivotal roles in providing an extended conformation (flexibility), assisting folding, separating protein domains, and generally making the recombinant multi-epitope vaccine structure more stable [23]. Hence, to the best of our knowledge, the possibility of introducing new "fake" epitopes in the linking regions does not represent a concrete issue. A 6xHis tag was added to the C-terminus of the generated vaccine to facilitate protein purification and identification. The recombinant multi-epitope vaccine comprises several ectodomain locations, glycosylation sites, and solvent-accessible regions, and the selected B-cell epitopes show an average score of about 0.2, which indicates the presence of suitable relative surface accessibility (RSA) regions.
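
The layout of the construct can be sketched as a simple string assembly. This is only an illustration under stated assumptions: the block order (CTL, then CD4 T cell, then B cell), the choice of linker at the boundary between two epitope classes, and the `assemble_construct` helper are not specified in the text, and the short sequences are placeholders rather than the epitopes of Table 1.

```python
def assemble_construct(adjuvant, ctl, cd4, b_cell, his_tag="HHHHHH"):
    """Join adjuvant, epitope blocks, and a C-terminal 6xHis tag.

    Assumptions (not stated in the source): blocks follow the order
    CTL -> CD4 -> B cell, and each block is introduced by its own linker.
    """
    parts = [
        adjuvant,
        "EAAAK",              # adjuvant-to-epitope linker
        "AAY".join(ctl),      # CTL epitopes joined by AAY
        "GPGPG",              # assumed boundary linker before the CD4 block
        "GPGPG".join(cd4),    # CD4 T cell epitopes joined by GPGPG
        "KK",                 # assumed boundary linker before the B cell block
        "KK".join(b_cell),    # B cell epitopes joined by KK
        his_tag,              # 6xHis tag at the C-terminus
    ]
    return "".join(parts)


# Hypothetical placeholder sequences, for illustration only.
construct = assemble_construct(
    adjuvant="MAD",
    ctl=["AAA", "CCC"],
    cd4=["DDD", "EEE"],
    b_cell=["FFF", "GGG"],
)
print(construct, len(construct))
```

With the real adjuvant and the 40 selected epitopes, the same assembly pattern would be expected to yield the full-length sequence, provided the assumed order and boundary linkers match the actual design.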

Table 1 List of all the epitopes used in the construction of the recombinant multi-epitope vaccine

Evaluation of physicochemical properties and solubility prediction

The molecular weight (MW) of the final vaccine is 87.3 kDa. The predicted theoretical pI is 9.35, which indicates that the protein is basic. The vaccine contains 83 negatively charged residues and 108 positively charged residues. The half-life was estimated to be 30 h in mammalian reticulocytes in vitro, > 20 h in yeast in vivo, and > 10 h in Escherichia coli in vivo. The formula is C3878H6146N1088O1171S18, and the total number of atoms is 12301. The instability index (II) was computed to be 27.74, which classifies the protein as stable, since a protein with an instability index greater than 40 is considered unstable. The aliphatic index was estimated to be 70.69, indicating thermostability. Finally, the grand average of hydropathicity (GRAVY) was predicted to be −0.547. A negative GRAVY value indicates that the protein is hydrophilic. The recombinant vaccine was evaluated as a soluble protein, with a solubility score of 0.49.
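
The reported atom count and molecular weight can be cross-checked directly from the molecular formula. The sketch below is a consistency check, not the procedure used by the physicochemical server; the helper name and the rounded standard atomic masses are the only inputs added here.

```python
import re

# Rounded standard atomic masses in g/mol.
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "S": 32.06}


def parse_formula(formula):
    """Split a formula such as 'C2H6O1' into {element: count}."""
    return {el: int(n) for el, n in re.findall(r"([A-Z][a-z]?)(\d+)", formula)}


counts = parse_formula("C3878H6146N1088O1171S18")
total_atoms = sum(counts.values())
mw_kda = sum(ATOMIC_MASS[el] * n for el, n in counts.items()) / 1000.0

print(total_atoms)        # total number of atoms in the formula
print(round(mw_kda, 1))   # molecular weight in kDa
```

Both values agree with the figures quoted in the text (12301 atoms and about 87.3 kDa).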

Secondary structure prediction of the recombinant vaccine

According to the data obtained from PSIPRED, the final vaccine consists of 16% alpha-helix, 21% beta-sheet, and 61% coil, with 137 (16%) positions predicted as disordered. The prediction of disordered regions is based on a cut-off value of 0.25 (Fig. 1). Another property is solvent accessibility, which is divided into three states by two cut-off values, 10% and 40%. This means that the three states have an equal distribution: buried for less than 10%, exposed for more than 40%, and medium for between 10 and 40%. Solvent accessibility was predicted to be 53% exposed, 24% medium exposed, and 22% buried.
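
The three-state solvent accessibility rule can be written as a small classifier. This is a sketch of the thresholding only: the function names are hypothetical, values of exactly 10% or 40% are assumed to fall in the medium state, and the per-residue values are invented placeholders rather than predictor output.

```python
from collections import Counter


def accessibility_state(rsa_percent):
    """Map a relative solvent accessibility (in %) to one of three states."""
    if rsa_percent < 10:
        return "buried"
    if rsa_percent > 40:
        return "exposed"
    return "medium"  # 10% to 40%, boundaries assumed inclusive


def state_fractions(values):
    """Percentage of residues in each state."""
    counts = Counter(accessibility_state(v) for v in values)
    return {s: 100.0 * counts[s] / len(values) for s in ("buried", "medium", "exposed")}


# Hypothetical per-residue accessibility values, for illustration only.
toy_rsa = [3.0, 10.0, 25.0, 40.0, 41.5, 88.0, 62.0, 7.5]
fractions = state_fractions(toy_rsa)
print(fractions)
```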

Codon adaption and in silico cloning of recombinant vaccine

The Java Codon Adaptation Tool was used to optimize the codon usage of the vaccine in E. coli (strain K12) for high protein expression. The optimized codon sequence for the 813-aa multi-epitope recombinant vaccine was 2439 nucleotides long. The CAI value of the optimized nucleotide sequence was 0.97, and its GC content was 50.88%, indicating a high likelihood of expression of the recombinant vaccine in the E. coli host. SnapGene software was used to insert the adapted codon sequence into the pET32a⁺ vector using the EcoRV and MscI restriction enzymes. The final product (vector and optimized codon sequence) consists of 8194 bp (Fig. 2).
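
Two of the figures above are easy to sanity-check: the coding length follows from three nucleotides per residue, and GC content is a simple base count. The sketch below uses hypothetical helper names and a short made-up reading frame, not the optimized vaccine sequence; the CAI is not reproduced because it depends on a host-specific codon usage table.

```python
def coding_length(n_residues):
    """Nucleotides needed to encode n_residues (no stop codon counted)."""
    return 3 * n_residues


def gc_content(dna):
    """Percentage of G and C bases in a DNA string."""
    dna = dna.upper()
    if not dna:
        raise ValueError("empty sequence")
    gc = dna.count("G") + dna.count("C")
    return 100.0 * gc / len(dna)


vaccine_nt = coding_length(813)

# Hypothetical toy reading frame, for illustration only.
toy_orf = "ATGGCGAAAGGCCTGTAA"
toy_gc = gc_content(toy_orf)
print(vaccine_nt, toy_gc)
```

The first value matches the 2439 nucleotides reported for the 813-residue construct.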

In silico trial immune simulation

The UISS computational platform was used to simulate the immune response to the final recombinant multi-epitope vaccine. Here, we show in silico results for three specific scenarios in an average patient: (i) immune system dynamics after influenza exposure, (ii) immune system dynamics after vaccine administration, and (iii) immune system response to recombinant multi-epitope vaccine administration in the presence of influenza exposure. In the first scenario, the peak level of IFN-g is about 1 × 10⁶ molecules at day 50 (Fig. 3, panel A), while in the second one, its level (about 1.6 × 10⁶ molecules) is considerably higher than after influenza exposure at day 25 (Fig. 3, panel B). Figure 3, panel C, shows a higher second peak, highlighting the effect of the vaccination in response to influenza challenge. Furthermore, the recombinant multi-epitope vaccine response is characterized by high IgG levels, with titers of approximately 130,000 (Fig. 4, panel B), whereas after influenza exposure the IgG level is lower (24,000) (Fig. 4, panels A–C). The recombinant multi-epitope vaccine response also shows a notable increase in the number of TH1 cells, about 16,000 at day 30 (Fig. 5, panel B). After influenza exposure, however, this number is approximately 1000 cells at day 50 (Fig. 5, panel A). Figure 5, panel C, shows a higher second peak, highlighting the effect of the vaccination in response to influenza challenge.

Still, after influenza exposure, the number of infected lung epithelial cells is slightly higher than in the vaccine administration scenario (Fig. 6, panels A-B). This means that the proposed multi-epitope vaccine could elicit an immune response that partially protects from the infection.
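
To put the simulated peak levels side by side, the approximate values quoted above can be expressed as vaccine-to-infection ratios. This is only a rough comparison: the numbers are the rounded peaks given in the text, the peaks occur on different days, and the dictionary layout is an illustrative choice.

```python
# Approximate peak values quoted in the text: (after influenza exposure, after vaccination).
peaks = {
    "IFN-g (molecules)": (1.0e6, 1.6e6),
    "IgG (titer)": (24_000, 130_000),
    "TH1 (cells)": (1_000, 16_000),
}

# Ratio of the vaccine-scenario peak to the infection-scenario peak.
fold_change = {
    name: vaccine / infection for name, (infection, vaccine) in peaks.items()
}

for name, ratio in fold_change.items():
    print(f"{name}: {ratio:.1f}-fold")
```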

Discussion

Influenza is one of the most significant contagious respiratory infections and, despite vaccination, it is still one of the leading causes of mortality and a threat to public health worldwide [24]. The generation of new multi-epitope vaccines brings various advantages in comparison to other approaches. Infectious substances or hazardous sequences can be excluded, thus reducing the risk of undesired host reactions. Furthermore, multi-epitope vaccines do not carry the risk of reversion associated with attenuated or live vaccines [25]. Also, from a pharmaceutical point of view, multi-epitope vaccines have some desirable properties. Because they are based on chemically well-characterized peptides, they can be produced efficiently and cost-effectively. A multi-epitope vaccine can cover a wide range of pathogens or strains of a particular pathogen, which is especially useful for highly variable pathogens such as the influenza virus, which undergoes frequent mutation and generates novel variants [26].

Animal studies demonstrate that T lymphocytes can induce a protective immune response against the influenza virus by recognizing proteins processed and presented by MHC molecules. CTLs can detect several epitopes in the HA structure. The response of CTLs to epitope vaccines is entirely dependent on the structure of the HLA molecule. Therefore, in designing multi-epitope vaccines, T lymphocyte epitopes should be selected according to their ability to elicit a response in most of the population [27, 28]. In addition to cytotoxic T lymphocytes, the importance of CD4+ cells during the immune response to the influenza virus has also been considered [29]. By recognizing the peptides presented by MHCII molecules, they initiate and amplify the dependent responses of CD8+ and B lymphocytes against influenza virus infection [30]. Conserved regions in HA, NA, and M are the main targets for designing a recombinant protein as a multi-epitope vaccine that can be presented by both MHCI and MHCII and can activate cellular or humoral responses.

A trial platform such as the UISS computational framework is helpful in evaluating the efficacy of vaccines designed with available bioinformatics tools, enhancing their probability of success when tested in pre-clinical and clinical settings. However, a multi-epitope vaccine has some limitations. One significant limitation, which most epitope prediction tools do not suitably consider, is the need to identify proper antigen processing sites that lead to the processing and presentation of the predicted epitopes. Because the composition of the antigen processing machinery varies with proinflammatory signals and can differ among cell classes, currently existing prediction algorithms may not be suited to evaluating the processing effectiveness of viral antigens in an infected target cell [31].

Here, we evaluated the HA, NA, and M2 proteins of pathogenic strains in Asia (H1N1, H1N2, H3N2, H5N1, H7N3, H7N9, and H9N2). Consensus sequences for each protein were identified after extracting and aligning the sequences of the HA, NA, and M2 proteins for the seven pathogenic strains. The consensus sequences comprise highly conserved residues. Then, linear B-cell, CTL, and CD4 T cell epitopes were predicted, and epitopes with high scores and high affinity were selected for calculating antigenicity, allergenicity, and toxicity, both for the individual peptides and for the entire vaccine. The VaxiJen v2.0 default threshold for antigenicity is 0.4; therefore, non-toxic and non-allergenic epitopes with scores above 0.4 were chosen for designing the recombinant vaccine. To select a suitable adjuvant, three peptides were evaluated: 50S ribosomal protein L7/L12, H9E, and MDA5. L7/L12 appeared to be the most appropriate choice. As in a previous study, AAY, GPGPG, and EAAAK linkers were used between the predicted epitopes to generate a sequence with minimized junctional immunogenicity, allowing the rational design of a potent recombinant multi-epitope vaccine. Codon optimization was carried out to achieve high-level expression of the recombinant multi-epitope vaccine in the K12 strain of E. coli. The CAI value of the optimized nucleotide sequence was 0.97, and the GC content was 50.88%, indicating a high likelihood of expression of the multi-epitope vaccine.

Conclusions

This study deals with the design of a recombinant vaccine against influenza A, especially against seven pandemic strains in Asia (H1N1, H1N2, H3N2, H5N1, H7N3, H7N9, and H9N2), based on conserved residues of the HA, NA, and M2 proteins. Linear B cell, CTL, and CD4 T cell epitopes were predicted using online servers, and after screening for high-scoring and high-affinity epitopes, antigenic, non-allergenic, and non-toxic epitopes were selected for the recombinant vaccine. Epitopes were linked together by several different linkers to reduce junctional immunogenicity. Population coverage was calculated, and this recombinant vaccine can cover 90.78% of the worldwide population. Then, codon optimization was carried out for cloning and expression of the vaccine in E. coli (strain K12). The CAI and GC content indicate a high level of expression in E. coli. The recombinant vaccine was then inserted into the pET32a⁺ vector using the EcoRV and MscI restriction enzymes for cloning. The resulting vaccine formulation was found to have a high immunogenicity score. However, further investigations conducted with the UISS in silico platform highlighted only partial immune protection elicited by the designed multi-epitope vaccine formulation. A multistep bioinformatic approach would hence improve the vaccine development pipeline, enhancing the probability of obtaining good results in pre-clinical and clinical settings. The recombinant multi-epitope vaccine is an entirely hypothetical protein construct with no experimentally verified epitopes; therefore, all positive results obtained are limited to the in silico level. Further experimental studies, along with epitope confirmation, should be performed.

Methods

In this section, the specific steps involved in designing the recombinant multi-epitope vaccine against influenza are reported in detail in dedicated subsections. In parallel, a sketch of the entire multi-step bioinformatic workflow is depicted in Fig. 7.

All online services were accessed on 10 August 2021.

Retrieving influenza protein sequences and multiple alignments

The amino acid sequences of the HA, NA, and M2 proteins for seven strains (H1N1, H1N2, H3N2, H5N1, H7N3, H7N9, and H9N2) were retrieved from the NCBI database [32]. These seven strains include chicken, swine, and goose sequences to cover a wide range of influenza viruses. Multiple alignments were performed separately with Jalview software, based on the MUSCLE algorithm, for the seven strains of HA, the seven strains of NA, and the seven strains of M2 to identify consensus sequences for each protein [33] (Additional file 1).
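
The idea of deriving a consensus from an alignment can be illustrated with a simple per-column majority vote. This is a minimal sketch under stated assumptions: Jalview's own consensus calculation may treat ties and gaps differently, the `consensus` function is hypothetical, and the three aligned toy sequences are placeholders rather than influenza proteins.

```python
from collections import Counter


def consensus(aligned):
    """Majority-vote consensus of equal-length aligned sequences.

    Columns whose most common symbol is a gap ('-') are skipped
    (an assumption made for this sketch).
    """
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("aligned sequences must have equal length")
    residues = []
    for column in zip(*aligned):
        symbol, _ = Counter(column).most_common(1)[0]
        if symbol != "-":
            residues.append(symbol)
    return "".join(residues)


# Hypothetical toy alignment, for illustration only.
toy_alignment = ["MKT-IL", "MKTAIL", "MRT-IL"]
toy_consensus = consensus(toy_alignment)
print(toy_consensus)
```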