Bioinformatics Analysis of Spike Proteins of Porcine Enteric Coronaviruses

This article analyzes the structure and function of the spike (S) proteins of porcine enteric coronaviruses, including transmissible gastroenteritis virus (TGEV), porcine epidemic diarrhea virus (PEDV), porcine deltacoronavirus (PDCoV), and swine acute diarrhea syndrome coronavirus (SADS-CoV), using bioinformatics methods. The physical and chemical properties, hydrophilicity and hydrophobicity, transmembrane regions, signal peptides, phosphorylation and glycosylation sites, epitopes, functional domains, and motifs of the S proteins were predicted and analyzed with online software. The results showed that the S proteins of TGEV, PEDV, SADS-CoV, and PDCoV all contained transmembrane regions and signal peptides. TGEV S protein contained 139 phosphorylation sites, 24 glycosylation sites, and 53 epitopes. PEDV S protein had 143 phosphorylation sites, 22 glycosylation sites, and 51 epitopes. SADS-CoV S protein had 109 phosphorylation sites, 20 glycosylation sites, and 43 epitopes. PDCoV S protein had 124 phosphorylation sites, 18 glycosylation sites, and 52 epitopes. Moreover, TGEV, PEDV, and PDCoV S proteins each contained two functional domains, spike_rec_binding and corona_S2, and two motifs. The corona_S2 domain comprised the S2 subunit heptad repeat 1 (HR1) and S2 subunit heptad repeat 2 (HR2) region profiles. In contrast, SADS-CoV S protein was predicted to contain only one functional domain, corona_S2. This analysis of the biological functions of porcine enteric coronavirus spike proteins can provide a theoretical basis for the design of antiviral drugs.


Introduction
The pathogenic coronaviruses porcine transmissible gastroenteritis virus (TGEV), porcine epidemic diarrhea virus (PEDV), swine acute diarrhea syndrome coronavirus (SADS-CoV), and porcine deltacoronavirus (PDCoV) infect the intestinal tract of pigs [1]. The intestinal diseases caused by these viruses are widespread worldwide and cause serious losses to the pig industry. These four viruses are collectively referred to as porcine enteric coronaviruses.
Porcine enteric coronaviruses are enveloped, single-stranded, positive-sense RNA viruses of the order Nidovirales, family Coronaviridae, and the coronavirus subfamily. The subfamily is further divided into four genera according to differences in genome sequence: α, β, γ, and δ coronavirus. TGEV, PEDV, and SADS-CoV are members of the α genus, while PDCoV belongs to the δ genus [2,3]. Coronavirus genomes encode four structural proteins: nucleocapsid (N), spike (S), envelope (E), and membrane (M). The S glycoprotein, a typical type I viral membrane fusion protein, forms an 18-23 nm tall spike on the surface of the virion [4,5]. It binds specifically to host cell receptors and mediates viral entry into susceptible cells, and thereby determines the tissue tropism and host range of the virus [6].
Glycosylation changes the apparent size of the coronavirus S protein, and its molecular weight changes accordingly. S protein is a homotrimer, and each monomer is divided into two regions, S1 and S2. S1 has a globular structure and contains two independent functional subdomains, the S1 N-terminal domain (S1-NTD) and the S1 C-terminal domain (S1-CTD). S1 contains the receptor binding domain (RBD), which binds to the corresponding receptor on the host cell membrane [7]. The S1 C-terminal domain brings the virus close to the surface of the host cell. The carboxy-terminal of S2 constitutes the stem of the spike protein and is mainly a highly conserved helical structure. S2 consists of two heptad repeat regions (HR1 and HR2), a transmembrane region (TM), and a fusion peptide (FP) region. S2 participates in the fusion between viral and host cellular membranes, cytopathic changes, and virus replication, as well as virus assembly and release [8]. The receptor used by a coronavirus is an important factor in determining its host range and tissue tropism. Identifying coronavirus receptors and the mechanism by which the virus binds to them therefore supports the prevention of emerging viruses and the development of related therapeutic drugs.
Studies have shown that TGEV enters cells by binding to porcine aminopeptidase N (pAPN) on the target cell membrane, with sialic acid as a cobinding factor [9]. pAPN was also identified as a functional receptor for PEDV, but later studies contradict the earlier reports [10,11]. PDCoV can use the APN of multiple species as a functional receptor, which is the main basis of its cross-species transmission [12]. SADS-CoV, however, does not use APN as its entry receptor [13].
This article analyzed and compared the biological functions of the S proteins of these four porcine enteric coronaviruses using bioinformatics software. The physical and chemical properties, transmembrane regions, signal peptides, functional domains, protein modifications, and antigenic epitopes of each S protein were analyzed. Analyzing the biological functions of the S protein helps to characterize the biology of porcine enteric coronaviruses and provides data for modifying the spike protein and designing antiviral drug molecules.

Bioinformatics Software. The physical and general biological characteristics of the porcine enteric coronavirus S proteins were calculated with the ProtParam and ProtScale tools on the ExPASy server. The TMHMM Server v.2.0, SignalP 4.0, NetPhos 3.1 Server, and NetNGlyc 4.0 Server were used to predict the transmembrane region (transmembrane helix (TMH)), signal peptide, phosphorylation sites, and glycosylation sites of each S protein, respectively. The amino acid sequences of the S proteins were also submitted to Predicting Antigenic Peptides, SMART, and PROSITE to predict the epitopes, functional domains, and motifs of each sequence. Multiple sequence alignment of the porcine enteric coronavirus S proteins was performed with Clustal Omega (Table 2).

Physical and Chemical Properties of Porcine Enteric Coronavirus Spike Proteins. The amino acid sequences of the porcine enteric coronavirus spike proteins were each uploaded to the ProtParam online software. TGEV H16 S protein comprised 1448 amino acids, and its molecular weight and isoelectric point were 159888.38 Da and 5.26, respectively. The protein contained 126 negatively charged residues and 100 positively charged residues. The instability index of TGEV H16 S protein was 30.3, the aliphatic index was 90.97, and the grand average of hydropathicity was 0.035. PEDV CV777 S protein comprised 1383 amino acids, and its molecular weight and isoelectric point were 151352.74 Da and 5.11, respectively. The protein contained 117 negatively charged residues and 85 positively charged residues. The instability index of PEDV CV777 S protein was 32.6, the aliphatic index was 93.21, and the grand average of hydropathicity was 0.123. SADS-CoV GDS04 S protein comprised 1130 amino acids, and its molecular weight and isoelectric point were 125996.51 Da and 6.46, respectively. The protein had 92 negatively charged residues and 87 positively charged residues. The instability index of SADS-CoV GDS04 S protein was 31.83, the aliphatic index was 84.77, and the grand average of hydropathicity was -0.029.
PDCoV HNZK-04 S protein comprised 1159 amino acids, and its molecular weight and isoelectric point were 128074.64 Da and 5.67, respectively. The protein contained 89 negatively charged residues and 73 positively charged residues. The instability index of PDCoV HNZK-04 S protein was 31.94, the aliphatic index was 93.96, and the grand average of hydropathicity was 0.027. ProtParam generates the list of physical and chemical properties for each protein automatically, and the results are summarized in Table 3.
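Several of the quantities reported above follow from amino acid composition alone. The following is a minimal Python sketch, not the ProtParam implementation itself, assuming the Kyte-Doolittle hydropathy scale for the grand average of hydropathicity (GRAVY), the standard aliphatic index formula, and the convention that Asp and Glu count as negatively charged and Arg and Lys as positively charged. The function name `protparam_like_summary` is hypothetical; molecular weight, isoelectric point, and instability index are omitted.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def protparam_like_summary(seq: str) -> dict:
    """Composition-based properties of a protein sequence (illustrative only)."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")
    count = {aa: seq.count(aa) for aa in KYTE_DOOLITTLE}
    if sum(count.values()) != n:
        raise ValueError("sequence contains non-standard residues")

    def mole_percent(aa: str) -> float:
        return 100.0 * count[aa] / n

    # GRAVY: mean hydropathy over all residues.
    gravy = sum(KYTE_DOOLITTLE[aa] for aa in seq) / n
    # Aliphatic index: X(Ala) + 2.9 * X(Val) + 3.9 * (X(Ile) + X(Leu)).
    aliphatic = (mole_percent("A") + 2.9 * mole_percent("V")
                 + 3.9 * (mole_percent("I") + mole_percent("L")))
    return {
        "length": n,
        "negative": count["D"] + count["E"],   # Asp + Glu
        "positive": count["R"] + count["K"],   # Arg + Lys
        "gravy": gravy,
        "aliphatic_index": aliphatic,
    }


if __name__ == "__main__":
    # Short N-terminal fragment used only as a demonstration input;
    # a full-length S protein sequence would be passed in practice.
    print(protparam_like_summary("MQRALLIMTLLCLVRAKFAD"))
```

Applied to a full-length S protein sequence, a sketch like this should approximate the residue counts, aliphatic index, and GRAVY values listed in Table 3, although small differences from the server output are possible.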

Hydrophilicity and Hydrophobicity of Porcine Enteric Coronavirus Spike Proteins. The ExPASy ProtScale tool was used to analyze the hydrophilicity and hydrophobicity of the amino acid sequences of the porcine enteric coronavirus spike proteins. In TGEV H16 S protein, the asparagine (Asn) residues at positions 953 and 954 had the strongest hydrophilic value of -2.967, and the leucine (Leu) at position 1398 had the strongest hydrophobic value of 3.467 (Figure 1).
There was a possible signal peptide in the range of residues 1-17 of the N-terminal of TGEV H16 S protein. The signal peptide sequence is MRSLIYFWLLLPVLPTLSLPQ. The raw cleavage site score (C score) and the combined cleavage site score (Y score) both peaked at position 17, while the signal peptide score (S score) began to decline at position 17. The cleavage site was most likely to lie immediately before the maximum of the Y score, that is, between amino acid 16 and amino acid 17 (LYG-DN) (Figure 3(a)).
There was a possible signal peptide in the range of residues 1-21 of the N-terminal of PEDV CV777 S protein. The signal peptide sequence is MRSLIYFWLLLPVLPTLSLPQ. The raw cleavage site score (C score) and the combined cleavage site score (Y score) both peaked at position 21, while the signal peptide score (S score) began to decline at position 21. The cleavage site was most likely to lie immediately before the maximum of the Y score, that is, between amino acid 20 and amino acid 21 (SLP-QD) (Figure 3(b)).
There was a possible signal peptide in the range of residues 1-20 of the N-terminal of SADS-CoV GDS04 S protein. The signal peptide sequence is MKLFTVFTLLASIRVLYGCE. The raw cleavage site score (C score) and the combined cleavage site score (Y score) both peaked at position 20, while the signal peptide score (S score) began to decline at position 20. The cleavage site was most likely to lie immediately before the maximum of the Y score, that is, between amino acid 18 and amino acid 19 (LYG-CE) (Figure 3(c)).
There was a possible signal peptide in the range of residues 1-20 of the N-terminal of PDCoV HNZK-04 S protein. The signal peptide sequence is MQRALLIMTLLCLVRAKFAD. The raw cleavage site score (C score) and the combined cleavage site score (Y score) both peaked at position 20, while the signal peptide score (S score) began to decline at position 20. The cleavage site was most likely to lie immediately before the maximum of the Y score, that is, between amino acid 19 and amino acid 20 (KFA-DD) (Figure 3(d)).
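The hydropathy extremes reported at the start of this section come from a sliding-window profile. The sketch below illustrates that kind of calculation under stated assumptions: the Kyte-Doolittle scale, a window of 9 residues, and equal weighting across the window, which are believed to match the ProtScale defaults but are not confirmed by the text. The function names `hydropathy_profile` and `extremes` are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 9) -> list:
    """Return (1-based center position, mean hydropathy) for each full window."""
    seq = seq.upper()
    if window % 2 == 0 or window < 1 or window > len(seq):
        raise ValueError("window must be odd and no longer than the sequence")
    half = window // 2
    scores = [KYTE_DOOLITTLE[aa] for aa in seq]
    profile = []
    for center in range(half, len(seq) - half):
        values = scores[center - half:center + half + 1]
        profile.append((center + 1, sum(values) / window))
    return profile


def extremes(profile: list) -> tuple:
    """Return the most hydrophilic and the most hydrophobic window."""
    most_hydrophilic = min(profile, key=lambda item: item[1])
    most_hydrophobic = max(profile, key=lambda item: item[1])
    return most_hydrophilic, most_hydrophobic


if __name__ == "__main__":
    # N-terminal sequence quoted in the text for PDCoV HNZK-04 S protein.
    profile = hydropathy_profile("MQRALLIMTLLCLVRAKFAD")
    print(extremes(profile))
```

In a profile of this kind, the hydrophobic core that is typical of a signal peptide appears as a run of positive window scores near the N-terminus.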

Phosphorylation Sites of Porcine Enteric Coronavirus Spike Proteins. Almost all proteins undergo chemical modification during and after synthesis, such as cleavage of the peptide backbone and modification of the side chains of specific amino acids. Protein phosphorylation occurs mainly on tyrosine, serine, and threonine residues in the peptide chain. The NetPhos 3.1 Server was used to predict the phosphorylation sites of the porcine enteric coronavirus spike proteins. Positions with a score higher than 0.5 were counted as phosphorylation sites. TGEV H16 S protein had 60 serine (Ser), 52 threonine (Thr), and 27 tyrosine (Tyr) modification sites (Figure 4(a)). PEDV CV777 S protein contained 80 serine (Ser), 39 threonine (Thr), and 24 tyrosine (Tyr) modification sites (Figure 4(b)). SADS-CoV GDS04 S protein had 52 serine (Ser), 20 threonine (Thr), and 37 tyrosine (Tyr) modification sites (Figure 4(c)). PDCoV HNZK-04 S protein contained 64 serine (Ser), 44 threonine (Thr), and 16 tyrosine (Tyr) modification sites (Figure 4(d)). The online prediction agreed with the result of the online database query.
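The per-residue totals above result from applying the 0.5 score threshold to the predictor output. The following sketch shows one way such a tally could be made, assuming the predictions have already been parsed into (position, residue, score) tuples. The input format, the example values, and the function name `count_phosphosites` are all hypothetical, and taking the highest score per site is an assumption about how multiple scores for one residue would be combined.

```python
from collections import Counter


def count_phosphosites(predictions, threshold: float = 0.5) -> Counter:
    """Count predicted Ser/Thr/Tyr sites whose best score exceeds the threshold.

    predictions: iterable of (position, residue, score) tuples. A site may
    appear several times (for example once per kinase); the highest score
    for each site is the one compared with the threshold.
    """
    best = {}
    for position, residue, score in predictions:
        if residue not in ("S", "T", "Y"):
            continue  # only Ser, Thr, and Tyr are considered
        key = (position, residue)
        best[key] = max(score, best.get(key, 0.0))
    return Counter(res for (_, res), score in best.items() if score > threshold)


if __name__ == "__main__":
    # Made-up scores for illustration only; not taken from the paper.
    example = [
        (10, "S", 0.91), (10, "S", 0.42),
        (25, "T", 0.50), (31, "Y", 0.77), (40, "S", 0.12),
    ]
    counts = count_phosphosites(example)
    print(counts["S"], counts["T"], counts["Y"])
```

Summing the three counts for a protein gives its total number of predicted phosphorylation sites, for example 60 + 52 + 27 = 139 for TGEV H16.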

Glycosylation Sites of Porcine Enteric Coronavirus Spike Proteins. Glycosylation, which includes N-linked and O-linked sugar chains, can regulate protein function. We used the NetNGlyc/NetOGlyc 4.0 Server online software to predict N-linked and O-linked glycosylation sites in the porcine enteric coronavirus spike proteins. The predictions showed that the spike proteins did not have O-glycosylation sites. TGEV H16 S protein contained 24 N-glycosylation sites (Figure 5(a)). PEDV CV777 S protein contained 22 N-glycosylation sites (Figure 5(b)). SADS-CoV GDS04 S protein contained 20 N-glycosylation sites (Figure 5(c)). PDCoV HNZK-04 S protein contained 18 N-glycosylation sites (Figure 5(d)). The specific glycosylation positions of the porcine enteric coronavirus spike proteins are shown in Table 4.
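N-linked glycosylation occurs at the sequon Asn-X-Ser/Thr, where X is any residue except proline, so candidate sites can be listed directly from the sequence. The sketch below does only this candidate scan and is not the NetNGlyc method, which additionally scores each sequon; the presence of a sequon is necessary but not sufficient for glycosylation. The function name `find_sequons` and the example sequence are hypothetical.

```python
import re

# Lookahead so that overlapping sequons (e.g. "NNTS") are all reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(seq: str) -> list:
    """Return (1-based Asn position, tripeptide) for every N-X-S/T sequon, X != Pro."""
    seq = seq.upper()
    return [(m.start() + 1, seq[m.start():m.start() + 3])
            for m in SEQUON.finditer(seq)]


if __name__ == "__main__":
    # Made-up peptide for illustration; "NPT" is skipped because X is proline.
    for position, tripeptide in find_sequons("MNGSANPTNNTS"):
        print(position, tripeptide)
```

The number of sequons found this way is an upper bound on the number of predicted N-glycosylation sites, since a predictor may reject some candidates.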

Epitopes of Porcine Enteric Coronavirus Spike Proteins.
The specificity of the S proteins depends on the type, nature, number, and spatial configuration of antigenic determinants. We used Predicting Antigenic Peptides online software to perform epitope prediction for porcine enteric coronavirus spike protein. The results showed that TGEV H16 S protein had 53 epitopes (Figure 6(a)). PEDV CV777 S protein had 51 epitopes (Figure 6(b)). SADS-CoV GDS04 S protein had 43 epitopes (Figure 6(c)). PDCoV HNZK-04 S protein had 52 epitopes (Figure 6(d)).

Structural Domains of Porcine Enteric Coronavirus Spike Proteins. Different regions of the S proteins evolve at different rates, and some amino acids must remain sufficiently conserved during evolution to maintain the corresponding biological functions. A structural domain is a functional subunit of a protein that can exist independently. The porcine enteric coronavirus S proteins were analyzed with the Simple Modular Architecture Research Tool (SMART). TGEV H16 S protein contained two typical functional domains, namely, spike_rec_binding and the highly conserved functional domain corona_S2 (Figure 7(a)). PEDV CV777 S protein contained two functional domains, namely, spike_rec_binding between amino acids 330 and 583 and the highly conserved functional domain corona_S2 between amino acids 671 and 1270 (Figure 7(b)). SADS-CoV GDS04 S protein contained one typical functional domain, namely, the highly conserved functional domain corona_S2 between amino acids 535 and 1129 (Figure 7(c)). PDCoV HNZK-04 S protein also had two functional domains, namely, spike_rec_binding between amino acids 330 and 583 and the highly conserved functional domain corona_S2 between amino acids 671 and 1270 (Figure 7(d)).

Functional Motifs of Porcine Enteric Coronavirus Spike Proteins. A motif is a subunit within a structural domain and is associated with specific biological functions. Analysis against the PROSITE database showed that the functional motifs of TGEV H16 S protein included the S2-HR1 region profile from amino acids 1036 to 1155 and the S2-HR2 region profile from amino acids 1304 to 1401 (Figure 8(a)). The functional motifs of PEDV CV777 S protein included the S2-HR1 region profile from amino acids 969 to 1088 and the S2-HR2 region profile from amino acids 1240 to 1336 (Figure 8(b)). The functional motifs of SADS-CoV GDS04 S protein included the S2-HR1 region profile from amino acids 750 to 855 and the S2-HR2 region profile from amino acids 1001 to 1082 (Figure 8(c)). The functional motif of PDCoV HNZK-04 S protein included the S2-HR1

Discussion
All coronaviruses are similar in genome composition and protein structure, including the structural proteins N, S, E, and M at the 3′-end and nonstructural proteins 1-16 (nsp1 to nsp16). The RBD in the S protein binds to the corresponding receptor on the host cell membrane, after which S2 mediates membrane fusion with the host cell [6]. S protein can also induce the host's immune response and is a key protein for vaccine development. Changes in the spatial structure of the S protein directly affect the virulence of the virus. Tissue tropism and host range also differ among coronaviruses of the same genus. Even TGEV H16, PEDV CV777, and SADS-CoV GDS04, which belong to the same genus, do not necessarily use the same receptor [9,13].