Recognition of Errors in the Refinement and Validation of Three-Dimensional Structures of AC1 Proteins of Begomovirus Strains by Using ProSA-Web

The structural model of the begomovirus AC1 protein is useful for understanding biological function at the molecular level and for docking studies. In this study we used the ProSA (Protein Structure Analysis) tool to evaluate the structure prediction and modeling of the protein. This tool is used for the refinement and validation of experimental protein structures. ProSA readily identifies potential problems in protein structures on the basis of energy plots and displays them in a three-dimensional manner. In the present study we selected AC1 proteins of different begomovirus strains (YP 003288785, YP 002004579, and YP 003288773) for structural analysis and for the display of energy plots that highlight potential problems spotted in the protein structures. The 3D models of Rep proteins with recognized errors can be effectively used in in silico docking studies for the development of potential ligand molecules against begomovirus infection.


Introduction
Geminiviruses were recognized in 1978 by the International Committee on the Taxonomy of Viruses on the basis of their unique virion morphology and their possession of single-stranded DNA (ssDNA) as genomic material [1,2]. Geminiviridae is one of the largest plant virus families; its members have a circular ssDNA genome of approximately 2.7-5.2 kb encapsidated within twinned (geminate) icosahedral virions. The protein coat of Geminiviridae consists of a single type of protein molecule with a molecular weight of about 28 kDa. Based on their genome arrangement and biological properties, geminiviruses are classified into one of four genera: Mastrevirus, Curtovirus, Topocuvirus, and Begomovirus [3].
Computational methods can be applied to predict unknown structures of virus proteins [6,7], but a central problem in structural biology is the recognition of errors in experimental and theoretical models of protein structures. The ProSA tool (https://prosa.services.came.sbg.ac.at/) checks three-dimensional experimental and theoretical models of protein structures for potential errors.
The application of computational methods [8,9] and servers (e.g., NAR web servers) for the prediction of unknown structures adds a plethora of structural models [10,11] to the study. The analysis of protein structures is generally a difficult and cumbersome exercise. The ProSA-web service used here is a straightforward and easy-to-use extension of the classic ProSA program; it exploits the advantages of interactive web-based applications to display scores and energy plots that highlight potential problems spotted in protein structures. ProSA [12] is a widely used tool for checking 3D models of protein structures for potential errors. Its range of application includes error recognition in experimentally determined structures [13][14][15], theoretical models [16][17][18][19], and protein engineering [20,21]. For the in silico design of ligands intended as effective inhibitors, the Rep protein of selected begomovirus strains (YP 003288785, YP 002004579, and YP 003288773), which is responsible for replication, was used. This is the highlight of this study.

Materials and Methods
For the present study, different bioinformatics tools and databases were used for molecular modeling of the Rep protein of begomovirus strains (YP 003288785, YP 002004579, and YP 003288773), for example, GenBank-NCBI, PDB (Protein Data Bank), UCLA-DOE, and the RAMPAGE server.
Rep protein sequences of the begomovirus strains (YP 003288785, YP 002004579, and YP 003288773) were retrieved in FASTA format from the NCBI database for homology modeling. Homology modeling was performed in four sequential steps: template selection, target-template alignment, model construction, and model assessment; the ProSA tool was used to detect potential errors [22]. ProSA-web requires the atomic coordinates of the model to be evaluated. The z-score indicates overall model quality and measures the deviation of the total energy of the structure with respect to an energy distribution derived from random conformations. Z-scores outside a range characteristic for native proteins indicate erroneous structures. To facilitate interpretation of the z-score of the specified protein, its value is displayed in a plot that contains the z-scores of all experimentally determined protein chains. [23] outcomes are displayed in the form of profile search and Ramachandran plots. The models were checked with the Verify3D server [24] and with the Ramachandran plot at the RAMPAGE server [25]. PDB files of the Rep proteins were evaluated through ProSA-web (https://prosa.services.came.sbg.ac.at/prosa.php/), which requires the atomic coordinate file of the protein.
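The z-score described above can be illustrated with a minimal sketch. It assumes the conventional form (model energy minus the mean of the reference distribution, divided by its standard deviation); the function name z_score and all numbers are hypothetical and are not ProSA output or ProSA code.

```python
import statistics


def z_score(model_energy, reference_energies):
    """Deviation of a model's total energy from a reference energy
    distribution, in units of that distribution's standard deviation.

    Assumption: z = (E_model - mean) / stdev, with the reference energies
    standing in for energies of random conformations.
    """
    mean = statistics.fmean(reference_energies)
    stdev = statistics.pstdev(reference_energies)
    if stdev == 0:
        raise ValueError("reference energies have zero spread")
    return (model_energy - mean) / stdev


# Hypothetical energies, for illustration only.
reference = [-10.0, -12.0, -8.0, -11.0, -9.0]
print(round(z_score(-20.0, reference), 3))
```

Under this convention, a model whose energy lies far below the reference distribution receives a strongly negative z-score, whereas a model with an energy close to the reference mean scores near zero.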

Results and Discussion
A particular intention of the ProSA-web application is to encourage structure depositors to validate their structures before they are submitted to PDB and to use the tool in early stages of structure determination and refinement. The 3D models of Rep proteins with recognized errors were used for the development of potential ligand molecules against begomovirus infection through the docking process. A good-quality Ramachandran plot has over 90% of residues in the most favored regions [26], but the Ramachandran plot of YP 003288785.pdb has only 87.3% of residues in the most favored regions. It is therefore close to a good-quality model (Table 1, Figure 1(a)). Similarly, the Ramachandran plots of YP 002004579 (Figure 1(b)) and YP 003288773 (Figure 1(c)) have 79.7% and 85.5% of residues in the most favored regions, respectively. Figure 2 shows the results for a monomer of the Rep proteins of Sweet potato leaf curl Lanzarote virus, Sweet potato leaf curl Spain virus, and Sweet potato leaf curl Canary virus [27]. The ProSA-web results indicate that the Rep proteins have features characteristic of native structures. Figure 2(a) depicts the ProSA-web z-scores of all protein chains in PDB (Table 2) determined by X-ray crystallography (light blue) or NMR spectroscopy (dark blue) with respect to their length [22]. The plot shows only chains with fewer than 1,000 residues and a z-score ≤ 10. The z-scores of the Rep proteins are highlighted as large dots. To further narrow down those regions in the model that contribute to a bad overall score, ProSA-web visualizes the 3D structure of the protein using the molecule viewer Jmol. Figure 2(c) illustrates the Jmol Cα trace of the Rep proteins. Residues are colored from blue to red in the order of increasing residue energy.
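The comparison of the reported Ramachandran percentages with the 90% criterion can be written out as a short sketch. The percentages are those given in the text; the names FAVORED_THRESHOLD, favored_percent, and is_good_quality are hypothetical and chosen only for illustration.

```python
FAVORED_THRESHOLD = 90.0  # percent; "over 90%" criterion cited in the text [26]

# Percent of residues in the most favored regions, as reported in the text.
favored_percent = {
    "YP 003288785": 87.3,
    "YP 002004579": 79.7,
    "YP 003288773": 85.5,
}


def is_good_quality(percent, threshold=FAVORED_THRESHOLD):
    """True if strictly more than `threshold` percent of residues are favored."""
    return percent > threshold


for accession, percent in favored_percent.items():
    shortfall = FAVORED_THRESHOLD - percent
    status = "meets" if is_good_quality(percent) else "is below"
    print(f"{accession}: {percent:.1f}% {status} the threshold "
          f"(shortfall {max(shortfall, 0.0):.1f} points)")
```

None of the three models exceeds the threshold; YP 003288785 comes closest, which is consistent with its description as close to a good-quality model.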

Conclusion
PDB files sometimes contain errors, which generally remain unknown until the corresponding revisions are made available to the structural community. ProSA is a diagnostic tool that is based on the statistical analysis of all available protein structures. By using subsequent independent X-ray analysis, we studied Rep proteins of Sweet potato leaf curl Lanzarote virus, Sweet potato leaf curl Spain virus, and Sweet potato leaf curl Canary virus that are known to be incorrect, yielding a completely different conformation. The 3D models of Rep proteins with recognized errors can be effectively used in in silico docking studies for the development of potential ligand molecules against begomovirus infection.

Figure 2(b) shows the energy plot of Rep proteins. The energy plot shows the local model quality by plotting energies as a function of amino acid sequence position.

Figure 2: Investigation of three Rep protein structures of (i) Sweet potato leaf curl Lanzarote virus, (ii) Sweet potato leaf curl Spain virus, and (iii) Sweet potato leaf curl Canary virus using the ProSA-web service (YP 003288785, YP 002004579, and YP 003288773). (a) ProSA-web z-scores of all protein chains. (b) Energy plot of all three Rep proteins. (c) Jmol Cα trace of Rep proteins.

Table 2: Z-score of selected Rep proteins.

Energies averaged over a sliding window are plotted as a function of the central residue in the window. A window size of 80 is used due to the large size of the Rep protein chain (default: 40). In general, positive values correspond to problematic or erroneous parts of a model. A plot of single residue energies usually contains large fluctuations and is of limited value for model evaluation. Hence the plot is smoothed by calculating the average energy over each 40-residue fragment from position i to i + 39, which is then assigned to the "central" residue of the fragment at position i + 19.
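The window averaging described above can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the ProSA code: the function name smooth_energies is hypothetical, positions are 0-based, and the toy energies are invented.

```python
def smooth_energies(energies, window=40):
    """Average per-residue energies over each `window`-residue fragment.

    The mean of the fragment covering positions i .. i + window - 1 is
    assigned to the "central" residue at position i + (window - 1) // 2,
    which is i + 19 for a window of 40 and i + 39 for a window of 80.
    Returns a list of (central_position, mean_energy) tuples.
    """
    if window < 1 or window > len(energies):
        raise ValueError("window must be between 1 and the chain length")
    offset = (window - 1) // 2
    smoothed = []
    for i in range(len(energies) - window + 1):
        fragment = energies[i:i + window]
        smoothed.append((i + offset, sum(fragment) / window))
    return smoothed


# Toy energies for a 100-residue chain, for illustration only.
toy = [float(k) for k in range(100)]
profile = smooth_energies(toy, window=40)
print(profile[0], profile[-1], len(profile))

# Positions whose smoothed energy is positive would be flagged as
# potentially problematic, following the rule of thumb in the text.
flagged = [pos for pos, energy in smooth_energies(toy, window=80) if energy > 0]
```

A larger window, such as the value of 80 used for the Rep proteins, yields a smoother profile but fewer plotted positions, because no mean is assigned to residues near either end of the chain.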