Analysis of PfEMP 1 — var Gene Sequences in Different Plasmodium falciparum Malarial Parasites

Plasmodium falciparum synthesizes P. falciparum erythrocyte membrane protein-1 (PfEMP-1), a product of the multicopy var gene family, which localizes on the surface of infected erythrocytes. This protein plays an important role in cytoadherence and immune evasion. Comparative analysis of the molecular sequences of the DBLα domain of the var gene from different isolates of the parasite reveals variations in the number of cysteines and the presence of small conserved motifs such as DGEA, RGD, and GAG-binding motifs. Phylogenetic analysis highlights the extensive diversity and clusters the sequences of individual isolates in separate clades far apart from each other. Discriminant factor analysis of physicochemical properties of the amino acid sequences revealed that the aliphatic index, isoelectric point, and instability index contribute most to the variance among sequences from different isolates. The diverse repertoire of the DBLα domain in the parasites highlights the complexity of the host-parasite relationship in the context of parasite survival.


Introduction
Malaria has emerged as a major health problem, especially in tropical and subtropical regions of the world [1]. P. falciparum erythrocyte membrane protein-1 (PfEMP-1) is a product of the multicopy var gene family [2]. The gene family has 60 copies; however, only one gene is expressed at a time. This variant surface antigen has a major role in evasion of the host immune response and in cytoadherence [3][4][5][6]. PfEMP1 acts as a ligand for host endothelial epithelial receptors. This results in sequestration of infected erythrocytes in the microvasculature of the brain, placenta, and other organs, thereby causing cerebral malaria and severe malaria. PfEMP1 is a virulence factor that plays an important role in the pathophysiology of the disease (severe malaria and cerebral malaria) and enhances survival of the parasite. Thus, it has been considered as one of the vaccine targets [7]. PfEMP1 is composed of four domains: an N-terminal segment (NTS), Duffy binding-like (DBL) domains, cysteine-rich interdomain region (CIDR), and C2 domains [8]. The DBLα domain is also becoming a target of immunoepidemiological and vaccine production studies that analyze diversity in parasite populations worldwide [9].
We have reported distinct size polymorphism of the DBLα domain of the var gene in cultured and clinical isolates of malaria parasites [10]. There are several reports that characterize the extent of sequence diversity of var genes in different geographical regions [11][12][13][14][15][16][17][18][19][20][21]. It seems likely that the extent of diversity reported to date indicates only the tip of the var gene diversity iceberg [22]. Malaria is prevalent and reemerging in India. To date, there are only scanty reports regarding var gene diversity from India [21,23]. Thus, in order to understand the divergence in the sequence and structural motifs of var genes in Indian and Thai parasite lines and in field isolates from the western part of India, we have analyzed sequences from the DBLα domain of the var gene, whose flanking regions are highly conserved, and compared our sequence data with other reported Indian sequences deposited in GenBank.

Isolation of Genomic DNA.
Parasites were collected by centrifugation from cultures with 20% parasitemia at the mature trophozoite stage and washed with PBS. Parasitized RBCs (p-RBC) were lysed with saponin (0.15%), and the parasite pellet was washed with PBS. DNA was isolated by treatment with proteinase K in the presence of sodium dodecyl sulphate followed by phenol chloroform extraction and subsequent ethanol precipitation [25].

var Gene Primers and PCR Amplification.
DBLα primers (synthesized by Invitrogen, Md, USA) of the var gene were used for amplification of parasite genomic DNA (forward primer: CGACACCGGCGACATTATAAGAGG (primer1) and reverse primer: TCGCAGGTATTGTGGCACGTAGTC (primer2)). The primers were specific for the two highly conserved sites in the DBLα domain and flank the polymorphic segment of the DBLα domain. The corresponding amino acid sequences were DTGDIIRG and DYVPQYLR (see Figure 1 in Supplementary Material available online at doi:10.1155/2009/824949). PCR was carried out as described earlier [10].
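The correspondence between the two primers and the conserved amino acid motifs stated above can be checked in silico. The following is a minimal illustrative sketch in Python, not part of the original protocol: the function names are hypothetical, the standard genetic code is assumed, and the reading frames (offset 1 for the forward primer, reverse complement for the reverse primer) are our assumption about how the primers map onto the motifs.

```python
# Illustrative check only; names and reading-frame choices are assumptions.
BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons only; trailing 1-2 bases are ignored."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


FORWARD = "CGACACCGGCGACATTATAAGAGG"  # primer1, from the text
REVERSE = "TCGCAGGTATTGTGGCACGTAGTC"  # primer2, from the text

# Frame offset 1: the first base is skipped so that codons start at GAC.
forward_peptide = translate(FORWARD[1:])
# The reverse primer anneals to the opposite strand, so translate its
# reverse complement.
reverse_peptide = translate(reverse_complement(REVERSE))

print(forward_peptide, reverse_peptide)
```

Under these assumptions the forward primer covers seven complete codons (DTGDIIR) followed by the bases GG; any codon beginning with GG encodes glycine, which is consistent with the stated motif DTGDIIRG. The reverse complement of the reverse primer translates to DYVPQYLR.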

Cloning and Sequencing.
The PCR products were purified using a PCR purification system (Roche). The DNA was ligated into the pGEM-T easy system (Promega) and used to transform E. coli TOP10 cells (Invitrogen). The recombinants were sequenced using T7 and Sp6 primers with the BigDye Terminator v3.1 cycle sequencing kit (Applied Biosystems 3730 sequencing machines).

Bioinformatic Analysis.
The nucleotide sequences derived from the parasite lines and field isolates were analyzed for sequence similarities by NCBI BLAST (http://www.ncbi.nlm.nih.gov/Blast.cgi). The nucleotide sequences were translated into amino acid sequences using the ExPASy Translate tool. Pepstats analysis was performed to derive information about the amino acid sequences. Local similarity analysis was carried out with the SIM Alignment Tool (http://ca.expasy.org/tools/sim-prot.html). Multiple sequences were aligned using the MultAlin program (http://bioinfo.genotoul.fr/multalin/multalin.html). Pairwise sequence alignments were generated using ISHAN, Integrated Software for Homology Analysis (http://physics.unipune.ernet.in/~pbv/ishan.html) [26]. The percentage of amino acid identity between sequences was calculated. An identity matrix analysis was performed, in which each sequence from a parasite was compared to all other sequences from the same isolate and from different isolates. The resultant matrix contained the percent sequence identity of each sequence relative to all other sequences. The mean and range values for the sequence comparison data were calculated. The percent similarity values for the sequences were plotted in MS Excel. The sequences (38) from other Indian isolates (ICGEB-R1, R15, R35, and MRC20) and 3D7 were obtained from the GenBank database and were also used for comparative analysis of the data. Multiple sequence alignment for phylogenetic analysis of the parasite isolates was carried out using ClustalW (http://www.ebi.ac.uk/clustalw/). The phylogenetic tree was constructed using WebPhylip. The trees were reconstructed using the protpars program, which implements the maximum parsimony method. The phylogenetic analysis was performed to determine the relationships between the var gene sequences. T- and B-cell epitope prediction was carried out as described earlier [27]. Although B-cell epitope prediction was carried out by us for only a few sequences, we have studied the T-cell epitopes extensively for all the sequences.
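The identity matrix described above can be illustrated with a short Python sketch. This is not the ISHAN implementation: the function names are hypothetical, the three aligned sequences are invented toy data, and the convention of skipping any column that contains a gap in either sequence is our assumption (other tools divide by alignment length or by the shorter sequence, which gives different percentages).

```python
from itertools import combinations


def percent_identity(a, b, gap_chars=".-"):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    compared = matches = 0
    for x, y in zip(a, b):
        if x in gap_chars or y in gap_chars:
            continue  # assumption: gapped columns are not counted
        compared += 1
        matches += x == y
    return 100.0 * matches / compared if compared else 0.0


def identity_matrix(aligned):
    """Map each unordered pair of sequence names to its percent identity."""
    return {
        (m, n): percent_identity(aligned[m], aligned[n])
        for m, n in combinations(aligned, 2)
    }


# Hypothetical aligned fragments; dots mark alignment gaps as in MultAlin output.
toy = {
    "seqA": "DIGDIVRGKD",
    "seqB": "DIGDIIRGKD",
    "seqC": "DIGD..RGRD",
}

matrix = identity_matrix(toy)
values = list(matrix.values())
mean_identity = sum(values) / len(values)
identity_range = (min(values), max(values))
print(matrix, mean_identity, identity_range)
```

The mean and range computed at the end correspond to the summary statistics mentioned in the text; with real data the dictionary would hold every within-isolate and between-isolate comparison.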

Statistical Analysis.
The physicochemical properties of each amino acid sequence were determined using ProtParam. The data were analyzed using multivariate statistics, which offer the advantage of taking all the variables into account in a single analysis, thus making it possible to assess variation in the molecular weight, pI, amino acid index, aliphatic index, hydropathic index, and instability index of an amino acid sequence. The sequences were divided into four groups, namely, Indian lab adapted, Indian field, ICGEB sequences [23], and MRC-20 sequences [21]. Discriminant factor analysis (DFA) was done to find the variables that were most useful for discriminating between the sequences of different groups. We used Pillai's trace statistic to test whether the clusters were significantly different from each other [28].
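Two of the ProtParam descriptors used above have simple closed forms and can be sketched directly. The Python below is an illustration under stated assumptions, not the ProtParam code itself: it uses the aliphatic index as defined by Ikai (mole percent of Ala, plus 2.9 times Val, plus 3.9 times the sum of Ile and Leu) and the grand average of hydropathicity (GRAVY) on the Kyte-Doolittle scale, which is how we understand ProtParam to define them. The function names and the test peptide are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mole_percent(seq, residue):
    """Percentage of positions in seq occupied by the given residue."""
    return 100.0 * seq.count(residue) / len(seq)


def aliphatic_index(seq):
    """Aliphatic index: X(Ala) + 2.9 * X(Val) + 3.9 * (X(Ile) + X(Leu))."""
    return (
        mole_percent(seq, "A")
        + 2.9 * mole_percent(seq, "V")
        + 3.9 * (mole_percent(seq, "I") + mole_percent(seq, "L"))
    )


def gravy(seq):
    """Grand average of hydropathicity: mean Kyte-Doolittle value per residue."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


# Hypothetical four-residue peptide used only to exercise the functions.
example = "AVIL"
print(aliphatic_index(example), gravy(example))
```

Per-sequence values computed this way, together with molecular weight, pI, and instability index, would form the variable matrix that a discriminant analysis takes as input.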

Results
The nucleotide sequences of DBLα were translated into amino acid sequences and analyzed by Pepstats, which provides information about various features of an amino acid sequence. The majority of the sequences had small, polar, and charged residues (see Table 1). The amino acid sequences were aligned using MultAlin (see Figures 1 and 2). The analysis of the aligned DBLα sequences indicates the presence of universally conserved blocks in the sequences (Supplementary Figure 1). Comparison of the sequences showed marked conservation in the positions of certain residues in the DBLα domain. The conserved blocks were interspersed with variable blocks, which varied extensively in both sequence and length. The block LREDWW was conserved throughout the lab adapted parasite lines (Indian and Thai), whereas it was semiconserved in the field isolates from India. The sequences showed conserved large, bulky aromatic amino acids such as phenylalanine (F), tryptophan (W), and tyrosine (Y), as well as both hydrophobic and hydrophilic residues. It is interesting to note that a few sequences are shared and show similarity with other parasite lines, while most sequences show extensive variation.
Although the DBLα domain has an average of 4 cysteine residues, the number varied from 2 to 9 cysteines. The parasite line from Thailand (S61) had 2 cysteines and was shorter in length (120 amino acids). The cysteines at the CRC position were conserved throughout the Indian parasite lines, whereas in the field isolates and the parasite lines from Thailand they were semiconserved. The cysteine position in the homology block VWKAiTC is conserved in all sequences, whereas in the homology block YFraTC it is semiconserved in the parasite lines from Thailand. In the parasite line from India, FAN5 HS, a single sequence had 3 cysteines, and the cysteine in the YFraTC block was absent.
Two of the FAN5 HS sequences were extra long, with 276 and 250 amino acids, including an additional 156 and 148 amino acids at the 5′ end, respectively. A similar case was noted in the field isolate NS6. A single sequence of the NS6 field isolate was extra long, with a length of 165 amino acids including an additional 35 amino acids at the 5′ end of the sequence. There was a duplication of 6 amino acid residues (DTGDII) at the 5′ end of the sequence. The FAN5 HS sequences contained the conserved blocks GACAPYRRLH and CTLARSFADIGDI in the DBLα domain, while the NS6 sequence contained the CTLARSFADIGDI motif (Supplementary Figure 1).
The presence of small motifs (GAG, that is, glycosaminoglycan binding motifs, DGEA motifs, and RGD motifs) was observed in both semiconserved and highly variable regions (see Table 2). The number of GAG binding motifs varied in all the sequences. The DGEA motif was observed only in the Indian parasite line FMN-17 (F1, F2), while the RGD motif was observed only in a parasite line from Thailand, namely, SOHS (S64).
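The kind of tabulation summarized above (sequence length, cysteine count, and occurrences of short adhesion motifs) can be sketched in Python as follows. This is a hypothetical illustration rather than the procedure actually used: the function name and the example peptide are invented, and only the exact-match motifs DGEA, RGD, and LDV are scanned. GAG binding motifs are deliberately left out because the text does not state the consensus pattern used to define them.

```python
import re

# Exact-match motifs named in the text; positions are reported 1-based.
MOTIFS = ("DGEA", "RGD", "LDV")


def scan_sequence(seq):
    """Return length, cysteine count, and 1-based start positions of each motif."""
    hits = {
        motif: [m.start() + 1 for m in re.finditer(f"(?={motif})", seq)]
        for motif in MOTIFS
    }
    return {"length": len(seq), "cysteines": seq.count("C"), "motifs": hits}


# Invented peptide for demonstration; it is not one of the study sequences.
example = "DIGDIIRGDCRCAADGEAFF"
summary = scan_sequence(example)
print(summary)
```

The lookahead pattern `(?=...)` is used so that overlapping occurrences would also be reported, which matters little for these motifs but keeps the scan general.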
Pairwise sequence comparisons were carried out to determine the average sequence identity. Although most of the sequences varied in their degree of identity between 19% and 62%, a few sequences were 99% identical to each other (see Table 3). There is low sequence identity among the var sequences of the Indian lab isolate FAN5 HS. The sequences showed variation and were separated into different peaks (see Figure 3). The sequences analyzed in this study grouped closer together than the other reported Indian sequences, whereas all Indian isolates showed similarity to each other when compared to 3D7. A given parasite line appears to contain sequences that cluster in distinctly different groups, showing more similarity to sequences from other isolates while differing extensively from each other. The unrooted phylogenetic tree was constructed using amino acid sequences from the parasite lines of India and Thailand and from the field isolates (see Figure 4). These sequences were compared to 3D7 sequences and to other previously reported Indian sequences available from the GenBank database. The sequences of the individual isolates were clustered markedly in separate clades apart from each other. The parasite lines from India, FAN5 HS and PUNE-1, were closely related to each other, whereas FMN-17 was far away from them. The parasite lines from Thailand and the field isolates were clustered in distinctly separate groups. A few field isolates were close to the parasite lines from India and Thailand. However, the 3D7 sequences fall in separate clusters apart from the parasite lines from India and Thailand and from the field isolates.
Discriminant factor analysis of physicochemical properties of amino acid sequences derived from different parasite lines revealed that the four groups, that is, Indian lab adapted, Indian field, ICGEB isolates, and MRC-20, do not form significantly different clusters (see Figure 5(a)). The aliphatic index, isoelectric point, and instability index contribute most to the variance among sequences from different isolates, whereas the molecular weight of the protein, its amino acid index, and its hydropathicity index do not contribute significantly to the variation (see Figure 5(b)).
The T-cell epitope prediction revealed that the majority of the sequences showed weak binding motifs; however, a few sequences (11) showed strong binding motifs (see Table 4). The predicted T-cell binding ability thus appears to differ among the various genes. B-cell epitopes were present in all sequences.

Discussion
Diversity in the var gene of the malarial parasite has been reported previously. In the present study, we have critically analyzed var gene sequences in terms of motifs, amino acid composition, and physicochemical properties, and constructed phylogenetic trees from var gene sequences derived from parasite lines from India and Thailand and from field isolates from the western part of India. In this analysis, the variation in the DBLα domain in sequences derived from Indian and Thai laboratory adapted parasite lines and field isolates was compared to sequences reported in GenBank from other Indian isolates and 3D7. Extensive polymorphism and variation were observed in the DBLα domain. The multicopy DBLα domain seems to be highly diversified while conserving the salient features of the domain. Thus we have observed a highly dynamic and variable picture of var gene organization. Polymorphism within the hypervariable region leads to length variation and sequence variability. var gene diversity at the genomic level among parasite isolates, both within and among endemic areas, has been reported earlier [11][12][13][14][15][16][17][18][19][20][21]. The parasite line from India (FAN5 HS) and the field isolate (NS6) both contain extra long sequences in which the DTGDII motif is repeated, which suggests that events such as duplication and DNA recombination may have occurred. Gene recombination and duplication have been described as important mechanisms for generating diversity in var genes [12,13,29,30,31]. It has been shown that children with symptomatic infections had a greater repertoire of variant-specific antibodies [16]. Additionally, the var genes of P. falciparum have been reported to undergo constant changes due to frequent recombination or rearrangements that generate a vast repertoire of var genes in nature.
Natural Plasmodium falciparum populations are genetically diverse, to the extent that within some geographic regions nearly all isolates contain unique parasite genotypes with regard to polymorphic single copy genes [11]. The repertoire is so immense that it raises questions about the molecular genetic mechanisms instrumental in creating such high variability while conserving the functionally important regions. The possible implications of this for immune evasion and persistence of parasites in the host are of great importance to pathogenicity and virulence. It was reported that the global var gene repertoire was immense even among geographically close isolates. A high degree of similarity was observed among sequences from 3D7 [14]. In three parasite lines of different geographical origin, highly divergent var gene sequences were observed [20]. A vast amount of global var gene diversity was reported, whereas a limited amount of diversity was observed in PNG isolates [21]. Limited diversity was observed in India, as the isolates were collected from an area having a low malaria transmission rate. However, we observed considerable diversity among clones, even within a single isolate. It was reported that the var COMMON type gene family is found in a few Kenyan isolates and in the Indian ICGEB isolate R35 [32]. However, the sequences derived from our parasite lines and field isolates do not show the var COMMON type gene. It was reported that the DBLα domain comprises 400 amino acids and contains 16-18 conserved cysteine residues [8,12,33]. The number of cysteine residues is important for classification of the DBLα domain and for severity of disease [8,16]. Cysteine 8 was conserved in all sequences. The position of the cysteine at the CRC motif was conserved in all Sudanese var genes. We also observed similar conserved cysteines (CRCs) in the Indian isolates. Variants with an unusual number of cysteines may form a subset with altered antigenic and adhesive properties. Sequences predominantly expressed in patients with severe malaria could be subgrouped on the basis of the number of cysteine residues. These sequences were commonly found in the var gene repertoire of parasites from patients with mild malaria [34]. This suggests that the positions of cysteines are important for DBLα structure, possibly because they are involved in disulphide bonding or other aspects of conformation or folding. The cysteine disulphide bonds may be important for stabilizing the surface-located variable loops and protuberances, which appear to play a critical role in folding or structural conformation of the whole PfEMP1 protein. The presence of conserved cysteines and hydrophobic amino acid residues suggests that they are structurally important.
The var genes in the Plasmodium falciparum genome are classified into 17 different protein architectural types based on domain composition [35]. The genes are divided into 3 major subgroups (A, B, C) and two intermediate groups (B/A, B/C) on the basis of 5′ upstream (Ups) sequence and chromosomal location [36,37]. Group A and B/A genes have the DBLα1 type domain, whereas groups B, C, B/C, or B/A have the DBLα domain. Based on the number of conserved cysteine residues in the sequences, var genes are grouped as having DBLα or DBLα1 domains. The DBLα domain has 4 conserved cysteine residues, whereas DBLα1 has 2 cysteine residues [38]. We likewise report that the majority of the sequences have the DBLα domain, that is, 4 cysteine residues, and all the isolates are from severe malaria cases.
A number of charged, potentially exposed amino acid residues are also present in these segments. Divergent segments contain multiple hydrophilic residues, suggesting that they are likely to be exposed and may serve as epitopes for agglutinating antibodies. The presence of common segments even in divergent regions of DBLα domains, together with the hydrophilic nature of the sequences, suggests that they are both exposed and form epitopes for antibody recognition. Cysteine and tryptophan (W) residues are amongst the most highly conserved residues, suggesting that DBL domains share conserved three-dimensional structures. We have shown that there is variation in T-cell and B-cell epitopes and also in their 3D structures. The difference in number and position of cysteine residues suggests a rearrangement of disulfide bonds leading to different three-dimensional structures [39].
Sequences consisting of positions of limited variability (PoLVs 1-4) and the number of cysteine residues have been suggested as signature sequences of the DBLα domain [16]. Each PoLV was four amino acids long and situated adjacent to conserved amino acid residues at the boundary of previously defined islands of homology. In the sequences from parasite lines from India, PoLV2 (LREDW) is conserved, whereas in parasite lines from Thailand and in field isolates all PoLVs 1-4 are different. Thus the sequences derived from parasite lines and field isolates differed from each other, and variation was also observed within sequences of a single parasite line.
The glycosaminoglycan (GAG) binding motifs are clusters of positively charged amino acid residues. These small amino acid motifs have been linked to the severe states of malaria [22]. We observed them in sequences obtained from parasite lines from India and Thailand and from field isolates. These motifs are responsible for involvement of the protein in rosette formation of iRBC. In contrast, another study observed that similar motifs could not be linked to severity of the disease [15]. It has been suggested that intravenous injection of glycosaminoglycan disrupts rosettes and releases already sequestered parasites into circulation. Thus glycosaminoglycan could act as a candidate for adjunct therapy in severe malaria [40]. It has been proposed that heparin sulfate on the surfaces of uninfected RBC may act as a receptor for rosetting and that GAG-binding motifs of PfEMP1 mediate this binding [39]. DGEA mediates integrin-collagen interactions in platelets. It has been suggested that it is important in cell adhesion molecules and can therefore be a target for therapeutic intervention [41]. The RGD motif was observed in the parasite line from Thailand, SOHS, and in 3D7 sequences. The LDV motif was observed in 3D7 and other Indian isolates (ICGEB-R1, R15, R35), but it was not present in any of our sequences. The RGD and LDV motifs are associated with protein-protein interaction and cell attachment [42]. They are cell adhesion motifs involved in ligand-receptor interactions involving the integrin family [43]. Thus these motifs may serve as potential therapeutic targets for the treatment of disease.
The T- and B-cell epitopes in different sequences of the DBLα domain of parasites show extensive variation. The epitopes varied in their location and amino acid composition. Such variation makes it difficult to design an effective vaccine. Thus conserved epitopes are required for effective vaccine design against malaria.
The evolutionary relationships of conserved cysteine rich motifs in adhesive molecules of malaria parasites have been studied [44]. The report suggests that rapid divergence originated from multiple gene duplication events. There is a sequential pattern of repeated duplication and diversification. The unrooted phylograms showed that the genes are diverse even among the lab adapted and field isolates from India. The sequences in the genes of Indian and Thai isolates are quite divergent from those of 3D7. This pattern demonstrates the variation in the sequence organization of the var gene domains and raises the interesting possibility of a role for mutation and recombination in the generation of this diversity. The role of host-induced immune pressure remains to be elucidated.
The DBLα domain of PfEMP1 displays extensive divergence in both sequence and length, even within the same parasite line. Despite the extent of sequence diversity in DBLα domains, it is predicted that, due to the presence of conserved cysteines and homology blocks between invariant cysteines, the DBL domain may have a common fold. As a result, the receptor-binding pockets may lie in the same region of diverse DBL domains. The variation in small motifs among var gene sequences from different geographical parasite lines and field isolates seems to play an important role in parasite survival as well as in immune evasion.

Figure 1: Multiple sequence alignment of sequences derived from Indian lab adapted isolates. Multiple alignments of sequences derived from the DBLα domain of the parasite lines from India (FAN5 HS = Fa, PUNE-1 = Pu, and FMN-17 = F). Dots indicate gaps necessary for alignment. The MultAlin program was used to align amino acid sequences, with the program default parameters. The DGEA motif was observed in the FMN-17 isolate (F1 and F2), and GAG binding motifs were observed in all the sequences. Position of limited variability 2 (PoLV2 = LREDWW) and CRC are conserved in all sequences.

Figure 2: Multiple sequence alignment of the DBLα domain from field isolates from India. Multiple alignments of sequences derived from the DBLα domain of the Indian field isolates (C, D, NS1, NS2, and NS6). Dots indicate gaps necessary for alignment. The MultAlin program was used to align amino acid sequences, with the program default parameters. Position of limited variability 2 (PoLV2 = LREDWW) and CRC are not conserved in all sequences.

Figure 3: Pairwise sequence alignment graph of Indian isolates and 3D7. Pairwise sequence alignment was performed using ISHAN software, and the matrix was plotted using MS Excel. The x-axis shows the isolate sequences, whereas the y-axis shows percent identity.

Figure 4: Phylogenetic tree of sequences from Indian lab adapted parasite lines and field isolates, ICGEB and MRC20 sequences, and 3D7. The unrooted phylogram was constructed using the WebPhylip protpars program, which implements the maximum parsimony method. The yellow block represents 3D7 DBLα sequences, the light blue block represents Indian lab adapted parasite lines (FAN5 HS, Pune-1, FMN-17), the green block represents Indian field isolates (C, D, NS1, NS2, NS6), the dark blue block represents ICGEB sequences, and the pink block represents MRC-20 sequences. The sequences of the isolates are clustered in separate groups apart from each other.

Figure 5: Discriminant factor analysis of physicochemical properties of amino acid sequences. (a) Clusters of different groups, (b) variables which discriminate between the clusters. Group 1 = Indian lab adapted isolates (red), 2 = Indian field isolates (green), 3 = ICGEB isolates (pink), and 4 = MRC-20 (blue). Aliphatic index, isoelectric point, and instability index have more effect in deciding the variance of different isolates, whereas molecular weight, amino acid index, and hydropathicity index are not significant.

Table 1: Comparative summary of Pepstats analysis.

Table 2: Summary of motifs present.

Table 3: Pairwise comparison of var genes from Plasmodium falciparum Indian and Thai parasite lines and Indian field isolates.

Table 4: Summary of strong binding T-cell epitopes in Indian isolates.