Benchmarking B-Cell Epitope Prediction for the Design of Peptide-Based Vaccines: Problems and Prospects

To better support the design of peptide-based vaccines, refinement of methods to predict B-cell epitopes necessitates meaningful benchmarking against empirical data on the cross-reactivity of polyclonal antipeptide antibodies with proteins, such that the positive data reflect functionally relevant cross-reactivity (which is consistent with antibody-mediated change in protein function) and the negative data reflect genuine absence of cross-reactivity (rather than apparent absence of cross-reactivity due to artifactual masking of B-cell epitopes in immunoassays). These data are heterogeneous in view of multiple factors that complicate B-cell epitope prediction, notably physicochemical factors that define key structural differences between immunizing peptides and their cognate proteins (e.g., unmatched electrical charges along the peptide-protein sequence alignments). If the data are partitioned with respect to these factors, iterative parallel benchmarking against the resulting subsets of data provides a basis for systematically identifying and addressing the limitations of methods for B-cell epitope prediction as applied to vaccine design.


Introduction
The timely development of new vaccines is imperative to address the complex and rapidly evolving global burden of disease [1][2][3][4][5][6][7]. Vaccines typically induce protective immunity by eliciting antibodies that neutralize the biological activity of proteins (e.g., bacterial exotoxins) [6]. These proteins comprise B-cell epitopes, that is, molecular substructures whose defining feature is their capacity for binding by antibodies. In turn, each B-cell epitope comprises spatially proximate amino acid residues or atoms thereof [8]; but its physical boundaries cannot be precisely delineated due to the limited specificity of molecular recognition by antibodies [9].
A peptide may induce antipeptide antibodies that cross-react with a cognate protein; if the antibodies neutralize the biological activity of the protein and thereby confer protective immunity, the peptide is a candidate vaccine component [6]. Such peptides are routinely designed to contain B-cell epitopes that have been predicted (i.e., presumptively identified) through computational analysis of cognate protein sequence or higher-order structure [3,10].
For this application, the refinement of methods to predict B-cell epitopes necessitates benchmarking against empirical data [8].
Empirical data for benchmarking B-cell epitope prediction are customarily organized into individual records, each of which contains three key components, namely structural data on an immunogen, structural data on an antigen, and data on the outcome of an antibody-antigen binding assay [11][12][13]; the immunogen (e.g., peptide or protein conjugate thereof) induces antibodies while the antigen (e.g., cognate protein or biological source thereof) is used in the assay to determine the binding capacity of the antibodies. In many cases, the only structural data available are the sequences of both the immunogen and antigen while the outcome of the assay is expressed as either positive or negative binding even when the original outcome variable (e.g., inhibition of biological activity) is continuous rather than dichotomous. For a single record containing these minimal data, the task actually benchmarked is the exhaustive identification of putative epitopes as sequences that are predicted to both induce antibodies as part of the immunogen and act as targets for binding by the antibodies as part of the antigen. If the immunogen is found to contain at least one such putative epitope, positive binding is the predicted outcome of the assay; otherwise, negative binding is the predicted outcome of the assay.
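The record format and prediction rule just described can be sketched in Python as follows. This is only an illustrative sketch: the names BenchmarkRecord, putative_epitopes and predicted_positive, the fixed window length, the toy sequences and the toy predictor are all assumptions introduced here, not part of any published benchmark.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class BenchmarkRecord:
    immunogen_seq: str       # sequence of the immunogen (e.g., peptide)
    antigen_seq: str         # sequence of the antigen (e.g., cognate protein)
    observed_positive: bool  # dichotomized outcome of the binding assay


def putative_epitopes(record: BenchmarkRecord,
                      is_epitope: Callable[[str], bool],
                      window: int = 6) -> List[str]:
    """List immunogen windows that also occur in the antigen and that the
    (hypothetical) predictor flags as epitopes."""
    seq = record.immunogen_seq
    hits = []
    for start in range(len(seq) - window + 1):
        segment = seq[start:start + window]
        if segment in record.antigen_seq and is_epitope(segment):
            hits.append(segment)
    return hits


def predicted_positive(record: BenchmarkRecord,
                       is_epitope: Callable[[str], bool],
                       window: int = 6) -> bool:
    """Positive binding is predicted if at least one putative epitope exists."""
    return bool(putative_epitopes(record, is_epitope, window))


def toy_predictor(segment: str) -> bool:
    # Placeholder rule for illustration only; not a real prediction method.
    return "D" in segment


# Toy sequences (hypothetical, not taken from any dataset).
record = BenchmarkRecord("ACDEFG", "MKACDEFGHIK", observed_positive=True)
print(predicted_positive(record, toy_predictor))
```

Any sequence-based predictor could be substituted for toy_predictor; the benchmarked quantity is then the agreement between predicted_positive and observed_positive across records.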
In the discussion of approaches to benchmark B-cell epitope prediction, a major source of confusion is the superficial parallelism between cross-reaction of antipeptide antibodies with proteins and cross-reaction of antiprotein antibodies with peptides. B-cell epitope prediction for both types of cross-reaction may be benchmarked against data in records of the same format, with the core of each record containing data on an immunogen, an antigen and the outcome of an antibody-antigen binding assay; but the roles of peptide and cognate protein are reversed for the latter type of cross-reaction, wherein cognate protein serves as immunogen while peptide serves as antigen for the binding assay. The physicochemical ramifications of this difference [14] imply that cross-reaction of antiprotein antibodies with peptides is mechanistically irrelevant to peptide vaccination and, by extension, that data on this type of cross-reaction are inappropriate for benchmarking B-cell epitope prediction where the intended application is the design of peptide-based vaccines [15].
Against an unprecedentedly large set of empirical data from high-throughput peptide-scanning experiments, benchmarking has revealed apparent underperformance of methods for B-cell epitope prediction that are based solely on sequence [12]. This outcome has long been anticipated from the gross oversimplification of modeling proteins as if they were unidimensional entities [16]. However, the data used for the analysis are irrelevant to peptide vaccination because they pertain exclusively to cross-reaction of antiprotein antibodies with peptides [15]; furthermore, the analysis itself neglects the multiplicity of factors that complicate B-cell epitope prediction, which merit closer scrutiny considering the pitfalls of reductionism in vaccine design [17][18][19][20]. In light of the fact that conclusions drawn from benchmarking are highly dataset-dependent [21], the present work explores the ensuing problems and suggests how to avoid them through judicious selection and partitioning of empirical data.

Conceptual Basis
B-cell epitope prediction can be employed to arrive at a computational result on the capacity of antipeptide antibodies to cross-react with a protein, but a definitive empirical result is established only by observing evidence of actual cross-reaction in a real system [8]. The essence of benchmarking is appraisal of the computational result against the empirical result: If these two results are in agreement, the computational result is deemed true; otherwise, it is deemed false. By convention, each result is either positive if it affirms cross-reaction or negative if it negates cross-reaction. Hence, the computational result falls into one of four mutually exclusive categories, namely true-positive, true-negative, false-positive and false-negative (hereafter denoted by TP, TN, FP and FN, respectively) [22,23].
Benchmarking entails computation of sensitivity as TP/(TP + FN) and specificity as TN/(TN + FP), where each algebraic symbol represents the number of computational results falling within the denoted category [22,23]. Sensitivity and specificity both range from zero to one, and would both be equal to one for a perfect predictive method; but in practice, either may be greater than zero only if the other is less than one. The accuracy of B-cell epitope prediction is increased by simultaneously increasing both sensitivity and specificity, or by increasing either without decreasing the other.
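As a minimal sketch of these definitions, the following Python code tallies TP, TN, FP and FN from paired computational and empirical results and then computes sensitivity and specificity. The function names and the example data are hypothetical.

```python
import math


def confusion_counts(predicted, observed):
    """Count TP, TN, FP and FN from paired boolean results."""
    tp = tn = fp = fn = 0
    for p, o in zip(predicted, observed):
        if p and o:
            tp += 1
        elif p:
            fp += 1
        elif o:
            fn += 1
        else:
            tn += 1
    return tp, tn, fp, fn


def sensitivity(tp, fn):
    # TP / (TP + FN); undefined when there are no empirically positive results
    return tp / (tp + fn) if (tp + fn) else math.nan


def specificity(tn, fp):
    # TN / (TN + FP); undefined when there are no empirically negative results
    return tn / (tn + fp) if (tn + fp) else math.nan


# Hypothetical results for five records.
predicted = [True, True, False, False, True]
observed = [True, False, False, True, True]
tp, tn, fp, fn = confusion_counts(predicted, observed)
print(sensitivity(tp, fn), specificity(tn, fp))
```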
The mathematical definitions of sensitivity and specificity clarify the inherent problems of attempting to benchmark B-cell epitope prediction against irrelevant empirical results; insofar as the design of peptide-based vaccines is concerned, the most obvious of these results are from experiments that do not even simulate vaccination with peptides (e.g., where antibodies are never elicited by peptides in the first place). The designation of such results as either positive or negative is meaningless; at worst, it leads to erroneous appraisal of computational results that translates to miscalculation of both sensitivity and specificity. Consequently, methods for B-cell epitope prediction can be either underrated or overrated, thereby compromising efforts to assess their performance and address their limitations accordingly.

Selection of Data for Benchmarking
B-cell epitope prediction has been largely benchmarked against data acquired by probing antiprotein antibodies for cross-reactivity with peptides [11,12]; yet for the design of vaccine peptides, reliable data are acquired only by probing antipeptide antibodies for cross-reactivity with proteins. Cross-reaction of antipeptide antibodies with proteins is distinct from cross-reaction of antiprotein antibodies with peptides [24], for which reason antibodies elicited by a protein do not necessarily cross-react with a peptide even if antibodies elicited by the peptide cross-react with the protein [25,26]; because of this phenomenological asymmetry, benchmarking against data on antiprotein antibodies inevitably leads to misclassification of computational results that apply to antipeptide antibodies, with true-positives misclassified as false-positives (risking calculation of erroneously low values for both sensitivity and specificity) and false-negatives misclassified as true-negatives (risking calculation of erroneously high values for both sensitivity and specificity). Methods for B-cell epitope prediction may thus be either underrated or overrated for peptide-based vaccine design if they are benchmarked against data on antiprotein antibodies. Similar problems can arise with data acquired using monoclonal antibodies: If a monoclonal antipeptide antibody fails to cross-react with a protein, this result by itself may not be representative of a polyclonal antibody response in vivo [27]. Such sampling errors are avoided by using a sufficiently large panel of different monoclonal antibodies [27], which is virtually equivalent to polyclonal antibody.
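The direction of these errors can be checked with a short Python sketch. The starting counts are hypothetical and chosen only for arithmetic convenience; the helper names are likewise assumptions.

```python
def sens_spec(tp, tn, fp, fn):
    """Return (sensitivity, specificity) from the four counts."""
    return tp / (tp + fn), tn / (tn + fp)


def tp_relabelled_as_fp(counts, k):
    """k true-positives are scored as false-positives."""
    return dict(tp=counts["tp"] - k, tn=counts["tn"],
                fp=counts["fp"] + k, fn=counts["fn"])


def fn_relabelled_as_tn(counts, k):
    """k false-negatives are scored as true-negatives."""
    return dict(tp=counts["tp"], tn=counts["tn"] + k,
                fp=counts["fp"], fn=counts["fn"] - k)


# Hypothetical counts under correct (antipeptide) reference data.
true_counts = dict(tp=40, tn=30, fp=20, fn=10)

baseline = sens_spec(**true_counts)                           # (0.8, 0.6)
lowered = sens_spec(**tp_relabelled_as_fp(true_counts, 10))   # (0.75, 0.5)
raised = sens_spec(**fn_relabelled_as_tn(true_counts, 5))     # both above baseline
print(baseline, lowered, raised)
```

In this example, relabelling true-positives as false-positives lowers both measures, whereas relabelling false-negatives as true-negatives raises both, in line with the argument above.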
Therefore, the fundamental criterion for selecting empirical data is their generation through experiments that are adequate to detect cross-reactions of polyclonal antipeptide antibodies with proteins; but other criteria must also be invoked to avoid the problems of functionally irrelevant cross-reactivity (in relation to positive data) and apparent absence of cross-reactivity (in relation to negative data).
Functionally relevant cross-reactivity is cross-reaction of antipeptide antibodies with proteins that alters protein function (e.g., inhibiting enzymes). This is implicit in cross-protective immunogenicity, that is, the capacity of vaccine peptides to induce antipeptide antibodies that confer protective immunity by cross-reacting with proteins [9,10]. In contradistinction, functionally irrelevant cross-reactivity is cross-reaction of antipeptide antibodies with proteins that does not alter protein function. This is conceptually analogous to apparent cross-reaction, which stems from the notion that cross-reaction of antipeptide antibodies with proteins is either genuine if the proteins are in native form or merely apparent if the proteins are denatured [28][29][30]; but the focus on functional correlates irrespective of underlying protein conformations obviates the classical dichotomy between native and denatured proteins, which is increasingly difficult to reconcile with the emerging paradigm of structural and functional versatility among proteins [31][32][33][34] that encompasses intrinsic protein disorder [35][36][37][38][39] as well as coupled protein folding and binding [40][41][42][43]. Positive data that reflect only functionally irrelevant cross-reactivity compromise benchmarking if they lead to misclassification of true-negatives as false-negatives and false-positives as true-positives. This problem is avoided if all the positive data used for benchmarking have been validated by assays that detect antibody-mediated change in protein function (e.g., enzyme inhibition by antibodies), as opposed to immunoassays that merely detect antibody-protein binding without regard to protein function (e.g., binding of an enzyme by antibodies without regard to its catalytic activity) [15,44].
Whereas the interpretation of positive data is confounded by functionally irrelevant cross-reactivity, the interpretation of negative data is confounded by apparent absence of cross-reactivity, that is, failure to detect cross-reactions of antipeptide antibodies with proteins that is due to artifactual masking of B-cell epitopes rather than genuine lack of capacity for binding. Apparent absence of cross-reactivity occurs in the setting of solid-phase immunoassays for which proteins are immobilized onto solid surfaces (e.g., by passive adsorption) in a way that renders B-cell epitopes physically inaccessible to paratopes. This problem accounts for conflicting results observed among different solid-phase immunoassays [45,46] and between solid-phase and fluid-phase immunoassays [46,47]; it is avoided if all the negative data used for benchmarking have been validated by fluid-phase immunoassays (e.g., immunoprecipitation) for which proteins are dispersed in the fluid phase prior to their encounter with antibodies [46], provided that artifactual masking of B-cell epitopes has not occurred by alternative mechanisms (e.g., through interactions with blocking reagents that might have been used to attenuate nonspecific antibody binding).
Taken together, the preceding considerations suggest the following approach to the selection of data for benchmarking: Admit only those data that pertain to cross-reactivity of polyclonal antipeptide antibodies with proteins, choosing the positive data that reflect antibody-mediated change in protein function and the negative data that are from fluid-phase immunoassays.
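Expressed as a filter over annotated records, this selection rule might look like the Python sketch below. The record fields and the function name admit are assumptions made for illustration; real datasets would need curation to supply these annotations, and a sufficiently large panel of monoclonal antibodies is not treated here as equivalent to polyclonal antibody, which is a simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AssayRecord:
    immunogen_type: str       # "peptide" or "protein"
    antibody_clonality: str   # "polyclonal" or "monoclonal"
    observed_positive: bool   # dichotomized assay outcome
    functional_assay: bool    # outcome reflects antibody-mediated change in protein function
    fluid_phase_assay: bool   # outcome obtained by fluid-phase immunoassay


def admit(record: AssayRecord) -> bool:
    """Apply the selection criteria to one (hypothetical) annotated record."""
    if record.immunogen_type != "peptide":
        return False  # antiprotein antibodies are excluded
    if record.antibody_clonality != "polyclonal":
        return False  # single monoclonal antibodies are excluded
    if record.observed_positive:
        return record.functional_assay   # positive data need functional validation
    return record.fluid_phase_assay      # negative data need fluid-phase validation


records = [
    AssayRecord("peptide", "polyclonal", True, True, False),    # admitted
    AssayRecord("peptide", "polyclonal", True, False, False),   # binding only: rejected
    AssayRecord("peptide", "polyclonal", False, False, True),   # admitted
    AssayRecord("protein", "polyclonal", True, True, True),     # antiprotein: rejected
]
selected = [r for r in records if admit(r)]
print(len(selected))
```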
Data thus selected are suitable for benchmarking B-cell epitope prediction in support of applications for which peptide-based immunogens are designed to induce antipeptide antibodies that cross-react with native proteins; these applications include active and passive immunization for prophylaxis against and treatment of disease, and also the production of antipeptide antibodies as immunoaffinity reagents for protein purification (e.g., in the preparation of functional recombinant gene products). The same data must be used with caution for benchmarking B-cell epitope prediction where antipeptide antibodies are produced as diagnostic probes to detect proteins (e.g., of pathogens in clinical specimens). For this purpose, use of the data is appropriate only if the proteins are in native form when encountered by the antibodies; otherwise, both sensitivity and specificity are likely to be inaccurately estimated due to protein denaturation, as may occur during the collection, storage and processing of biological samples. This problem arises in relation to negative data on antipeptide antibodies that fail to bind native proteins yet bind denatured forms of the same proteins [28]; if the envisioned diagnostic procedure relies even partly on the detection of denatured protein, benchmarking against these negative data risks misclassification of positive predictions as false rather than true and of negative predictions as true rather than false.
As for data that pertain to cross-reactivity of antiprotein antibodies with peptides, these data are suitable for benchmarking where the intended application is the design of peptide-based diagnostic probes to detect antiprotein antibodies (e.g., for serodiagnosis of infectious diseases). For this application, published B-cell epitope prediction algorithms are found wanting as they perform only marginally better than random [12,48,49], although those based on three-dimensional structure outperform those based on sequence alone [48,49]. As a B-cell epitope contains residues that must simultaneously interact with a paratope for binding to occur, the superior performance of the structure-based algorithms is plausibly realized through bypassing inaccurate sequence-based prediction of both spatial proximity among protein residues and their accessibility to antibodies [50,51]; yet the hitherto unrealized success of these algorithms is at least partly due to misplaced emphasis on structure. Application of the structure-based algorithms is focused on an assumed protein structure supposedly encountered by antiprotein antibodies in the course of immunization; if this structure (e.g., from protein crystallography) differs from the actual immunogenic structure (e.g., of denatured protein in vivo), inaccurate predictions may be inevitable [10]. Published B-cell epitope prediction algorithms are therefore incomplete in that they fail to explicitly model the immunogenic structure of proteins in vivo. Attempts to address this deficiency by structural modeling are presently impossible to validate given the lack of experimental data on protein structure in vivo; while structures have been elucidated for complexes consisting of proteins already bound by antibodies or fragments thereof, the actual structure of the proteins in vivo as they are encountered by antibodies is unknown [10].
Granting that this impasse is somehow resolved as new structural data become available, the problem of underperformance could still persist due to an exclusive focus on protein structure prior to binding by antibodies considering that their binding may bring even solvent-inaccessible antigen residues into contact with paratope residues [19,52]; if so, B-cell epitope prediction might be substantially improved only through detailed computational modeling of antibody-antigen binding itself (e.g., by means of molecular docking analyses that allow for mutually induced fit between antibody and antigen, for antiprotein antibodies both reacting with proteins and cross-reacting with peptides). By the same token, a similarly elaborate computational approach might be required for peptide-based vaccine design in order to accurately predict B-cell epitopes for antipeptide antibodies both reacting with peptides and cross-reacting with proteins.
At any rate, valid claims regarding the practical utility of a method for B-cell epitope prediction are based on benchmarking against data that correspond well to the intended application; where this application is peptide-based vaccine design, the results of benchmarking are open to question if any of the data pertain to antiprotein rather than antipeptide antibodies or can be explained by either functionally irrelevant cross-reactivity or apparent absence of cross-reactivity.

Partitioning of Selected Data
Data selected as described above are heterogeneous in view of multiple factors that complicate B-cell epitope prediction (e.g., factors that define key structural differences between immunizing peptides and their cognate proteins). If the data are partitioned with respect to these factors, parallel benchmarking against the resulting subsets of data opens the possibility of discovering context-dependent variations in predictive performance. Knowledge of such variations could aid in identifying and appropriately addressing the limitations of methods to predict B-cell epitopes, thereby improving efficiency in both utilization and refinement of these methods.
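Parallel benchmarking against subsets can be sketched in Python as follows. The record layout (dictionaries with 'predicted' and 'observed' fields plus one field per partitioning factor), the factor name terminal_charge and the example data are all hypothetical.

```python
from collections import defaultdict


def benchmark_by_factor(records, factor):
    """Compute sensitivity and specificity separately for each level of a
    partitioning factor. Measures that are undefined for a subset are None."""
    counts = defaultdict(lambda: {"TP": 0, "TN": 0, "FP": 0, "FN": 0})
    for r in records:
        if r["predicted"] == r["observed"]:
            category = "TP" if r["observed"] else "TN"
        else:
            category = "FP" if r["predicted"] else "FN"
        counts[r[factor]][category] += 1

    report = {}
    for level, c in counts.items():
        positives = c["TP"] + c["FN"]
        negatives = c["TN"] + c["FP"]
        report[level] = {
            "sensitivity": c["TP"] / positives if positives else None,
            "specificity": c["TN"] / negatives if negatives else None,
        }
    return report


# Hypothetical records partitioned by whether terminal charges are matched.
records = [
    {"predicted": True, "observed": True, "terminal_charge": "matched"},
    {"predicted": False, "observed": False, "terminal_charge": "matched"},
    {"predicted": True, "observed": False, "terminal_charge": "unmatched"},
    {"predicted": True, "observed": True, "terminal_charge": "unmatched"},
    {"predicted": False, "observed": False, "terminal_charge": "unmatched"},
]
report = benchmark_by_factor(records, "terminal_charge")
print(report)
```

A difference in specificity between the two subsets, as in this toy example, is the kind of context-dependent variation that pooled benchmarking would conceal.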
From a purely physicochemical perspective, the data can be partitioned with respect to factors that are correlated with structural similarity between immunizing peptides and their cognate proteins. Though antibodies to a peptide may cross-react with a protein of apparently unrelated sequence, the likelihood of cross-reaction diminishes with decreasing sequence similarity [53]; and even where sequences are exactly matched, cross-reaction fails to occur if conformations are sufficiently dissimilar [54]. Cross-reaction thus tends to be disfavored by structural differences between peptides and proteins at the levels of both sequence and conformation.
Peptide and protein sequences are conventionally represented as strings of symbols for the twenty canonical proteinogenic amino acids, such that a difference in sequence is inferred from mismatched symbols in a pairwise sequence alignment; but this approach overlooks more subtle differences that are nonetheless relevant to the cross-reaction of antipeptide antibodies with proteins. A case in point is the mismatching of backbone charges between peptides and their cognate proteins: even if a peptide and a protein segment appear to share exactly the same sequence, they can differ from one another in terms of electrical charge at their N- or C-terminal ends. Ordinarily, both ends of the peptide backbone are charged, as when the peptide is derived from the cognate protein by hydrolysis of peptide bonds; in contrast, the corresponding ends of the protein segment lack backbone charges if they are at internal sequence positions of the protein. If a charged end of the peptide backbone corresponds to an internal sequence position of the protein, the charge on that end is unmatched in the protein.
Unmatched charges of this nature can be eliminated by chemically blocking the ends of the peptide backbone (e.g., by acetylating the N-terminal amino group and amidating the C-terminal carboxyl group); but such blocking can itself result in an unmatched charge if a blocked end of the peptide backbone corresponds to an unblocked end of the protein backbone. An unmatched charge can also result from variability in the protonation state of the histidine sidechain at physiologic pH; antibodies elicited by peptides bearing this sidechain often bind the peptides only if it is unprotonated [55], yet it is frequently protonated as part of a folded protein [56][57][58]. As the placement of charges on epitopes is critical for binding by antibodies [55,59], the backbone and sidechain charge mismatches just described may preclude cross-reaction [60,61].
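The backbone charge mismatches described above can be flagged with a simple check, sketched here in Python. The sketch assumes that the protein termini themselves are unblocked and that the peptide matches the protein at its first occurrence; it ignores sidechain charges (including histidine protonation). The function name and the toy protein sequence are hypothetical.

```python
def unmatched_terminal_charges(peptide, protein,
                               n_acetylated=False, c_amidated=False):
    """Report whether each peptide backbone end carries a charge state that
    differs from the corresponding position in the protein.

    Assumptions: protein termini are unblocked; the first occurrence of the
    peptide sequence in the protein is the relevant one.
    """
    start = protein.find(peptide)
    if start < 0:
        raise ValueError("peptide sequence not found in protein sequence")
    end = start + len(peptide)

    peptide_n_charged = not n_acetylated      # free amino group is charged
    peptide_c_charged = not c_amidated        # free carboxyl group is charged
    protein_n_charged = start == 0            # only the true N-terminus is charged
    protein_c_charged = end == len(protein)   # only the true C-terminus is charged

    return {
        "N": peptide_n_charged != protein_n_charged,
        "C": peptide_c_charged != protein_c_charged,
    }


# Toy protein whose C-terminal residues are IRGERA (the prefix is made up).
toy_protein = "MKTAIRGERA"
free = unmatched_terminal_charges("IRGERA", toy_protein)
blocked = unmatched_terminal_charges("IRGERA", toy_protein,
                                     n_acetylated=True, c_amidated=True)
print(free, blocked)
```

For a peptide matching the C-terminus of its protein, the free peptide has an unmatched charge only at its N-terminal end, whereas blocking both ends removes that mismatch but creates a new one at the C-terminal end.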
Apart from charge mismatches, conventional sequence analysis also overlooks structural differences related to the propensity of cysteine for oxidative cross-linkage via disulfide bond formation to yield cystine. As cysteine and cystine are chemically nonequivalent, a paratope optimized for binding a peptide that contains cystine in place of cysteine might fail to cross-react with a cognate protein that contains cysteine in place of cystine. Moreover, loops of residues are conformationally constrained by formation of intramolecular disulfide bonds between cysteines, with loops of identical sequence possibly adopting different conformations; although cyclization (i.e., formation of a covalently closed loop) thermodynamically favors binding by antibody through a decrease in conformational entropy [62], a paratope optimized for binding a loop in a particular conformation may fail to cross-react with a loop of identical sequence in a different conformation [9].
While conformational differences between a peptide and its cognate protein tend to disfavor cross-reaction, elimination of such differences by itself cannot ensure cross-reaction; even if the peptide closely resembles a segment of the protein at the levels of both sequence and conformation, the structural placement of the segment in the whole protein poses steric barriers to binding by the antipeptide antibodies if prospective interaction surfaces on the protein are buried or located within concavities that are inaccessible to paratopes [63]. As these barriers are overcome through structural adjustment to realize induced fit between antigen and antibody, cross-reaction may actually be favored more by conformational flexibility (i.e., dynamic disorder) of both peptide and cognate protein rather than their similarity in the sense of rigid conformation (e.g., expressed as root-mean-square atomic displacements between superposed crystallographic structures) [9,52].
The variety of structural differences between immunizing peptides and their cognate proteins is increased by both chemical treatment of peptides and posttranslational modification of proteins. Chemical treatment of peptides is widely applied to elicit strong antipeptide antibody responses [64]. This commonly involves conjugation of peptides to carrier proteins using covalent linker reagents such as glutaraldehyde [65]. The aldehyde groups of glutaraldehyde react with amino groups of lysine sidechains and N-terminal residues of immunizing peptides, potentially creating charge mismatches between the peptides (whose amino groups are covalently modified to uncharged derivatives) and their cognate proteins (whose amino groups are positively charged by virtue of protonation). A greater number of charge mismatches may be produced by carbodiimide reagents, such as 1-ethyl-3-(3-dimethylaminopropyl)-carbodiimide, which covalently modify both amino and carboxyl groups to uncharged derivatives [66]. More generally, the chemical moieties invariably formed through conjugation can themselves induce antibodies that dominate the immune response but fail to cross-react with protein [65]. Posttranslational modification of proteins (e.g., glycosylation) poses essentially the same problem as chemical treatment of peptides: Both processes can give rise to structural differences between immunizing peptides and their cognate proteins, thereby disfavoring cross-reaction [67].
Beyond structural differences between peptides and their cognate proteins at the level of single molecules, cross-reaction may be disfavored by protein localization in supramolecular assemblies such as biological membranes. Antipeptide antibodies that cross-react with a protein in solution may fail to cross-react with the same protein when it is physically associated with a membrane [68]. Possible mechanisms for such context-dependent cross-reactivity are steric shielding (e.g., along transmembrane protein segments) and even membrane-induced conformational transitions that themselves may be contingent upon membrane composition (e.g., lipid content) [69]. As an intact membrane is itself a mechanical barrier to the passage of macromolecules, proteins and segments thereof sequestered within a membrane-bound compartment are inaccessible to antibodies from outside the compartment unless the membrane has been adequately permeabilized (e.g., with organic solvent or surfactant) [70][71][72][73][74][75][76][77][78].
A corollary to these complexities of antibody-antigen interactions in biological systems is that various features of functionally relevant cross-reactivity are themselves factors for data partitioning. Functionally relevant cross-reactivity has already been introduced herein using the example of enzyme inhibition through binding by cross-reactive antipeptide antibodies. The enzyme could be a soluble single-domain protein for which B-cell epitope prediction would be much more straightforward than if the enzyme were membrane-bound and comprised multiple protein subunits that each contained multiple domains; in the latter case, separate polypeptide chains might even contribute residues to form a single neotope, that is, a B-cell epitope whose existence depends on the integrity of protein quaternary structure. B-cell epitope prediction is far more complicated for the neutralization of viruses and other infectious agents, in which case functionally relevant cross-reactivity may reflect emergent properties of the host-pathogen system that are left unaccounted for by analyses of the isolated pathogen or structural components thereof [79]. Synergy among pathogen virulence factors suggests the potential benefit of including more than one antigen in a vaccine; but it is difficult to predict which antigen combinations might afford protection against disease, as the interactions between immune responses to different antigens range from synergistic to antagonistic [80]. Much of this problem is due to uncertainty regarding the biological consequences of antibody-antigen binding, which tend to be highly context-dependent in ways that defy the intuitive notion of neutralization capacity as a monotonically increasing function of both antibody concentration and affinity for antigen [81].
For instance, occupancy of pathogen surfaces by antibodies often blocks host-pathogen surface interactions critical for infection while facilitating immune clearance of pathogens, yet such occupancy may itself lead to enhancement of infection (e.g., by mediating viral entry into host cells expressing receptors for the Fc portions of antibodies [82]). Divergent biological effects may also result from the binding of antigen by antibodies with similar affinities but via different molecular mechanisms (e.g., irreversibly inactivating virus only when binding induces certain conformational changes in capsid proteins [83]). The unifying theme of such phenomena is the coevolution of pathogens and the host responses against them that generates diverse antibodies to impair pathogen survival processes (e.g., disrupting various stages of viral replication cycles [84]) while it simultaneously decreases pathogen vulnerabilities to these antibodies (e.g., by transient assembly of viral surface neotopes only at the time that they are required to mediate entry into a host cell, thereby minimizing their exposure to neutralizing antibodies [85]). Taking soluble single-domain proteins from non-pathogen sources as a reference class of antigens for which B-cell epitope prediction is presumably least difficult, each additional complicating feature (e.g., quaternary structure, membrane association, origin from a pathogen) represents a factor for data partitioning.
From a broader biological perspective, numerous other factors merit investigation as well (e.g., genetic background, environmental influences, physiological status, mode of immunization). Carried to the extreme, data selected for benchmarking might be so extensively partitioned that each empirical result is placed in a class of its own; while it would not be of immediate practical value for benchmarking due to the consequent sparseness of the data, such partitioning would properly emphasize the fact that each empirical result is the product of a unique combination of circumstances.
The most compelling experimental observations in support of data partitioning as outlined above are discordant results indicating that cross-reaction of antipeptide antibodies with proteins is critically dependent on details of immunogen chemistry other than the sequences shared by immunizing peptides and their cognate proteins (insofar as these sequences are conventionally represented in terms of the twenty canonical proteinogenic amino acids). This is exemplified by discordant results obtained for the model hexapeptide of sequence IRGERA, which corresponds to the C-terminal residues 130-135 of histone H3; conjugation of this peptide to ovalbumin yields immunogen that induces rabbit polyclonal antipeptide antibodies capable of binding histone H3 if glutaraldehyde is used as a covalent linker reagent, but not if carbodiimide is used instead of glutaraldehyde [86]. Other discordant results have been documented among monoclonal antipeptide antibodies to polyhistidine, which are used to detect recombinant proteins bearing His-tags (i.e., sequences of consecutive histidine residues that often facilitate protein purification by affinity chromatography on metal-chelate resin columns); using different His-tagged cognate proteins (each bearing either C-terminal or N-terminal hexameric His-tag), the observed pattern of cross-reaction with the proteins varies considerably among the antibodies contrary to the expectation of uniformly positive cross-reaction [87]. Yet another notable case of discordant results concerns a peptide corresponding to a covalently closed loop (residues 24-41, with N- and C-terminal cysteines linked by a disulfide bond) in toxin alpha of Naja nigricollis; antibodies capable of neutralizing the toxin are elicited by the peptide in a cyclic but not a linear form [88].
Such discordant results bring to attention the problem of discordant benchmark data, particularly where predictive methods consider only protein sequence or structure (as is the usual case); unless a predictive method is sophisticated enough to properly utilize other pertinent data (e.g., on conjugation chemistry), overtly discordant benchmark data necessarily limit the apparent performance of the method because the same prediction (i.e., either positive or negative) is rendered for any two discordant results, such that the prediction must always be deemed incorrect (i.e., either false-positive or false-negative) for one of these two results. The problem may persist in a latent form even after excluding overtly discordant benchmark data if, for example, any of the remaining data have been derived from studies wherein conjugation chemistry produced charge mismatches between immunizing peptides and cognate proteins, in which case discordance may be manifest as negative results that could otherwise have been positive had creation of the charge mismatches been avoided [15]; if a predictive method yielded positive predictions that were benchmarked against the negative results, the predictions would be labeled as false-positive even though they could just as well be labeled as true-positive.
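The ceiling that overtly discordant data impose on apparent performance can be illustrated with a minimal sketch. It assumes a predictor that sees only a peptide-protein key and therefore renders one call per key; the function name and all records other than the IRGERA pair [86] are hypothetical placeholders, not benchmark data.

```python
from collections import defaultdict

def max_apparent_accuracy(records):
    """Upper bound on accuracy for any predictor that sees only the key.

    records: iterable of (key, outcome) pairs, where key identifies a
    peptide-protein pair and outcome is True (cross-reactive) or False.
    """
    counts = defaultdict(lambda: [0, 0])  # key -> [n_negative, n_positive]
    total = 0
    for key, outcome in records:
        counts[key][int(outcome)] += 1
        total += 1
    if total == 0:
        raise ValueError("no records")
    # One call per key: the best possible choice sides with the majority.
    best_correct = sum(max(neg, pos) for neg, pos in counts.values())
    return best_correct / total

# Illustrative records; only the IRGERA discordance is taken from the text.
records = [
    (("IRGERA", "histone H3"), True),    # glutaraldehyde conjugate
    (("IRGERA", "histone H3"), False),   # carbodiimide conjugate
    (("PEPTIDE_A", "protein A"), True),  # hypothetical
    (("PEPTIDE_B", "protein B"), False), # hypothetical
]
bound = max_apparent_accuracy(records)
```

With one discordant pair among four records, no sequence-only method can be scored as correct on more than three of them, however well it models the shared sequence.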
The problem of discordant benchmark data is addressed by partitioning the benchmark dataset into two or more subsets, of which the primary subset is defined by excluding data whose interpretation is complicated by one or more experimental conditions known to disfavor production of antipeptide antibodies that cross-react with proteins; among these conditions are those resulting in charge mismatches between immunizing peptide and cognate protein (e.g., by virtue of peptide synthesis and postsynthetic chemical treatment [15]). Revisiting the above-mentioned example of the peptide IRGERA with cognate protein histone H3 [86], the primary subset could include a positive result for the glutaraldehyde-treated peptide (which would likely mimic the C-terminus of histone H3 due to loss of positive charge on isoleucine with maintenance of the negative charges on glutamate and alanine) but not a discordant negative result for the carbodiimide-treated peptide (which would likely differ markedly from the C-terminus of histone H3 due to loss of the negative charges). Recalling the influence of histidine protonation on the binding of antipeptide antibodies [55], the variable protonation state of histidine at physiologic pH [56][57][58] and the discordant results for cross-reactivity of anti-polyhistidine antibodies with His-tagged proteins [87], data on histidine-rich and possibly all histidine-containing sequences might best be excluded from the primary subset. Likewise, data on cysteine-containing sequences might best be excluded from the primary subset given the potential difficulty of ascertaining disulfide linkage and loop conformation between cysteine residues in peptides and proteins, notwithstanding the potential for cross-protective immunogenicity of certain cysteine-containing peptides [88].
By thus partitioning the benchmark data, the problem of discordant benchmark data is mitigated within the primary subset; and by benchmarking against the primary subset, the complexity of the predictive task is restricted for assessment of performance under relatively few and simple constraints (e.g., protein structure). Compared with benchmarking against the unpartitioned benchmark data, this decreases the risk of underrating predictive methods. Actual performance is expectedly poorer in the presence of any additional constraints (e.g., charge mismatches between immunizing peptides and their cognate proteins), but these can often be avoided in practice (e.g., through appropriate synthesis and postsynthetic chemical treatment of peptides); more importantly, poorer performance in their presence suggests the possibility of improving performance through more accurate modeling of their effects.

Implications for B-Cell Epitope Prediction
The overall process just described for selecting and subsequently partitioning data drastically limits the sizes of datasets for benchmarking compared with the entire body of available B-cell epitope data. This is evident on searching the Immune Epitope Database (IEDB), a comprehensive online repository of curated empirical data on epitopes [89,90]. Over sixty thousand individual B-cell responses are represented in IEDB, comprising positive and negative subsets of comparable size. By selecting the polyclonal antipeptide responses associated with antibody binding leading to change in biological activity (a surrogate qualifier for antibody-mediated change in protein function), about a thousand positive responses are found; by selecting the polyclonal antipeptide responses analyzed by either immunoprecipitation or radioimmunoassay (which represent fluid-phase immunoassays), about a hundred negative responses are found. For both positive and negative responses, the discrepancies between numbers prior to and after selection largely reflect the exclusion of data acquired by probing antiprotein antibodies for cross-reactivity with peptides in high-throughput peptide-scanning experiments based on PEPSCAN technology [91][92][93]; the more pronounced discrepancy with negative responses probably also reflects a bias towards solid-phase immunoassays due to their greater convenience compared with fluid-phase immunoassays. These discrepancies highlight a relative scarcity of selected data and the extent to which any subsequent partitioning of these data further limits the sizes of datasets for benchmarking.
Besides facilitating the selection of data for benchmarking, IEDB also supports partitioning of the selected data with respect to factors that are represented as customizable input fields on the web-based user interface for searching among B-cell responses (http://www.immuneepitope.org/advancedQueryBcell.php). These factors include various aspects of host biology (e.g., species, sex, age) and contextual qualifiers for assayed antibodies (e.g., source material, heavy chain type). As for factors relevant to structural differences between immunizing peptides and their cognate proteins, the partitioning of data requires supplementary information that is accessible via links within IEDB to external databases of the National Center for Biotechnology Information (NCBI) [94] for the original references in published literature as well as annotated records on cognate proteins and their biological sources. As a general rule, primary sources from literature must be reviewed to ascertain the structures of both an immunizing peptide and its cognate protein before structural differences between the two can be thoroughly evaluated; this calls for attention to details of peptide chemistry (encompassing synthesis and postsynthetic covalent modification) and protein structural biology (encompassing posttranslational modification and molecular localization).
In principle, supplementary data on peptide chemistry and protein structural biology could be curated and distributed within the framework of a revised IEDB; but given the currently limited amount of selected data, a more expedient alternative is extraction of readily available data from IEDB (e.g., raw sequences of peptides and proteins) combined with independent manual curation of the supplementary data (e.g., from literature retrieved via links within IEDB). The relative scarcity of selected data motivates their provisional partitioning with respect to a few physicochemically plausible factors. To a first approximation, each factor may be qualitatively defined as the presence or absence of a certain feature posited to complicate B-cell epitope prediction on the basis of reasoning from physicochemical principles. This is illustrated using the specific example of histidine content as a factor defined by the presence of histidine in an immunizing peptide or the corresponding segment of the cognate protein: In the presence of histidine, epitope prediction is complicated by the difficulty of ascertaining the sidechain protonation state of histidine at physiologic pH. Other factors that could likewise be considered are cysteine content (defined by analogy to histidine content); unmatched backbone charge between immunizing peptide and cognate protein (due to the manner of peptide synthesis or postsynthetic chemical treatment); structural difference between immunizing peptide and cognate protein due to posttranslational modification (other than oxidation of cysteine to cystine); and localization of cognate protein in a supramolecular assembly (e.g., biological membrane or other multimeric aggregate structure). Partitioning the data with a number $n$ of such factors yields a maximum of $2^n$ non-overlapping populated datasets. Of these datasets, the most valuable is that which is free of all posited complicating features (e.g., histidine and cysteine).
It is logically the main reference dataset for initial benchmarking of methods for B-cell epitope prediction to assess their performance characteristics prior to refinement; if parallel benchmarking of a predictive method against the other datasets reveals poorer performance relative to this dataset, refinement of the method can focus on those factors found to be associated with performance deficits. Cycles of refinement based on parallel benchmarking could be repeated as deemed necessary to address the performance deficits; and as the body of selected data grows with the availability of new empirical results, the entire analysis from initial data partitioning onward could itself be repeated to increase its statistical power and broaden its scope to subsume additional factors.
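The partitioning scheme described above can be sketched as follows. This is only an illustration under stated assumptions: the record fields, the factor names and the example records are hypothetical, and the histidine and cysteine tests inspect only the peptide sequence (the text also considers the corresponding segment of the cognate protein).

```python
from collections import defaultdict

# Hypothetical binary factors; each maps a record (a dict) to True or False.
FACTORS = {
    "histidine": lambda r: "H" in r["peptide"],
    "cysteine": lambda r: "C" in r["peptide"],
    "charge_mismatch": lambda r: r["charge_mismatch"],
}

def partition(records, factors=FACTORS):
    """Group records by their tuple of factor flags (at most 2**n groups)."""
    subsets = defaultdict(list)
    for rec in records:
        key = tuple(bool(test(rec)) for test in factors.values())
        subsets[key].append(rec)
    return dict(subsets)

# Illustrative records, not benchmark data.
records = [
    {"peptide": "IRGERA", "charge_mismatch": False},
    {"peptide": "IRGERA", "charge_mismatch": True},
    {"peptide": "HHHHHH", "charge_mismatch": False},
    {"peptide": "ACDEFG", "charge_mismatch": False},
]
subsets = partition(records)

# The main reference dataset is the subset with every factor absent.
reference_key = (False,) * len(FACTORS)
reference = subsets.get(reference_key, [])
```

Only populated subsets appear in the result, so the number of keys never exceeds $2^n$ and is usually smaller when data are scarce.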
From a practical standpoint, factors for data partitioning appear amenable to alternative casting as exclusion criteria for data selection, thus obscuring the crucial distinction between data selection and data partitioning. For instance, the factor of unmatched backbone charge between immunizing peptides and their cognate proteins appears amenable to alternative casting as a criterion for excluding data during data selection on the grounds that the attempt at molecular mimicry is flawed [15]; yet the attempt may nevertheless succeed [47][95][96][97][98][99][100], which is not at all surprising given the molecular mimicry of protein epitopes by peptides of apparently unrelated sequence [101,102]. Excluding data is tantamount to rejecting them as meaningless, whereas retaining them acknowledges their validity. By excluding data that arguably could be retained instead, the conceivable domain of B-cell epitope prediction is artificially restricted to problems of arbitrarily limited complexity, which creates the mistaken impression that more complex problems are either nonexistent or intractable. For example, if posttranslational modification were a criterion for excluding data on the grounds that it complicates B-cell epitope prediction [14,67], posttranslationally modified sequences might be avoided altogether in the design of peptide-based vaccines; yet the synthesis of peptides that structurally mimic these sequences is presently feasible [103], and the expanding repertoire of techniques for inducing antibodies that bind haptens [104] offers the prospect of peptide-based vaccines to induce protective antibody responses directed against a diverse array of posttranslationally modified epitopes.
In retrospect, the performance of methods to predict B-cell epitopes has yet to be rigorously evaluated for the design of peptide-based vaccines. At the same time, the long and unbroken history of failed attempts to develop clinically proven and commercially viable peptide-based vaccines [105,106] points to the inadequacy of the underlying design strategies. A common albeit unjustified working assumption of these strategies is the preponderance of protein segments whose sequences can, as isolated peptide-based immunogens, mimic neutralization epitopes to induce protective antibody-mediated immunity. Typical neutralization epitopes are discontinuous; each comprises atoms that are not all located on residues of a single contiguous sequence [106]. Such an epitope may span one or more contiguous sequences, but each of these sequences as an isolated peptide immunogen may fail to induce neutralizing antipeptide antibodies. This outcome is likely if binding of cognate protein by antipeptide antibodies is impeded by steric barriers, especially where the paratopes are structurally optimized for binding peptide sequences in conformations unlike those of the corresponding sequences in native protein [54]. To induce neutralizing antipeptide antibodies, immunization with peptides identical in sequence to cognate protein segments is probably a reasonable approach only in exceptional cases, as when the cognate protein segments are conformationally unconstrained N- or C-terminal sequences entirely accessible to antibodies in biologically relevant structural contexts (e.g., on extracellular surfaces). In other cases, neutralizing antipeptide antibodies might be elicited, if at all, only by mimotopes lacking obvious sequence similarity to neutralization epitopes recognized by antiprotein antibodies [102].
Where neutralizing antipeptide antibodies can be elicited, protective immunity might be possible only with the synergistic action of such antibodies directed against multiple distinct binding sites in vivo, in analogy to synergistic protective immunity conferred by a combination of monoclonal antibodies to different neutralization epitopes [106]. Clinically informative prediction of peptides that induce neutralizing antibodies itself requires a systems view to correctly evaluate potential for both antibody-antigen binding in vivo and the possible biological consequences thereof (e.g., neutralization of viral infectivity versus enhancement of viral infection). If ever the routine design of safe and efficacious peptide-based vaccines becomes feasible, it may necessitate combining mimotopes of various neutralization epitopes for synergistic cross-protective immunogenicity, based on thorough knowledge of pathophysiological mechanisms that vaccination seeks to suppress, disrupt or otherwise circumvent.
In any event, the development of B-cell epitope prediction methods to design peptide-based vaccines remains encumbered by the problems of benchmarking discussed herein. Presently, the most challenging of these problems is the paucity of available benchmark data; in this regard, the current situation is reminiscent of the period during which the classic sequence-based methods for B-cell epitope prediction were initially developed [107]. Since then, accumulation of data gleaned from numerous studies has been the default strategy for building benchmark datasets [11,89]. While this strategy could eventually yield ample positive data that reflect functionally relevant cross-reactivity, historical trends suggest that it would be much less effective for negative data that reflect genuine absence of cross-reactivity, owing to an overall bias towards generation of positive rather than negative data. This bias follows from the prioritization of predicted B-cell epitopes for evaluation as immunizing peptides, in keeping with the aim of peptide-based vaccine development; with the negative data generated as unintended byproducts, incentive is lacking for their confirmation as genuine using fluid-phase immunoassays.
Yet, even assuming future abundance of both positive and negative benchmark data, their use as such is subject to the criticism that they are for the most part defined by dichotomization of continuous outcome variables (e.g., the extent to which biological activity is attenuated by antibody binding, or the fraction of antigen immunoprecipitated in a fluid-phase immunoassay). This dichotomization is accomplished by applying some invariably arbitrary cut point (i.e., threshold value), which is often implicit and unknown (e.g., as determined by the limit of detection for a qualitative assay). Dichotomization of continuous data, which permits calculation of sensitivity and specificity, entails potentially significant loss of both information and statistical power [108]; this is worsened by failure to use optimal cut points [108], which are unlikely to have been consistently used among different studies from which benchmark data are pooled.
To avoid dichotomization of continuous data, benchmarking must be performed without resorting to calculation of sensitivity and specificity. Such an alternative approach could be developed by supplanting dichotomous benchmark data with continuous dose-response data on antibody-mediated modulation of biological activity, thereby redefining the predictive task as computational estimation of biological activity as a function of antibody concentration and other variables. For antibody-mediated attenuation of biological activity, the dose-response data could be expressed as the observed fraction $f_{\mathrm{obs}}$ of residual activity, given by

$$f_{\mathrm{obs}} = \frac{A_{\mathrm{obs}}}{A_{\max}} \tag{1}$$

where $A_{\mathrm{obs}}$ is the observed activity at some specified antibody concentration and $A_{\max}$ is the maximal activity in the absence of antibody, holding constant all variables other than antibody concentration; the corresponding predicted fraction $f_{\mathrm{pre}}$ of residual activity would be calculated for each antibody concentration used to obtain $f_{\mathrm{obs}}$, and benchmarking would be performed by evaluating the correlation between $f_{\mathrm{pre}}$ and $f_{\mathrm{obs}}$. For a perfect predictive method, plotting $f_{\mathrm{pre}}$ against $f_{\mathrm{obs}}$ would yield points all falling on the diagonal line defined by $y = x$, with Pearson correlation coefficient of 1.
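As a minimal sketch of such correlation-based benchmarking, the following code computes the observed residual fraction as the ratio of observed to maximal activity and then compares predicted with observed fractions. All activity values and predicted fractions are hypothetical, and the function names are illustrative. A root-mean-square deviation is included as an assumption of this sketch, because a Pearson coefficient of 1 is attained by any increasing linear relationship and does not by itself establish agreement with the line $y = x$.

```python
import math

def residual_fraction(a_obs, a_max):
    """Observed fraction of residual activity, A_obs / A_max."""
    if a_max <= 0:
        raise ValueError("a_max must be positive")
    return a_obs / a_max

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / math.sqrt(sxx * syy)

def rmsd(xs, ys):
    """Root-mean-square deviation from the diagonal y = x."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))

# Hypothetical activities at increasing antibody concentration.
a_max = 100.0
f_obs = [residual_fraction(a, a_max) for a in (100.0, 80.0, 50.0, 20.0)]
f_pre = [0.95, 0.85, 0.45, 0.25]  # hypothetical predictions

r = pearson_r(f_pre, f_obs)
deviation = rmsd(f_pre, f_obs)
```

Reporting both quantities distinguishes a method that merely ranks conditions correctly from one whose predicted fractions are also correct in magnitude.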
To demonstrate how $f_{\mathrm{pre}}$ might be estimated, a simple conceptual model is that of a reversible bimolecular association reaction between a catalytically active enzyme E and an inhibitory antigen-binding antibody fragment I bearing a single paratope that binds a unique epitope on E to yield a catalytically inactive complex EI. At equilibrium, the association constant $K_a$ is given by the law of mass action as

$$K_a = \frac{[\mathrm{EI}]}{[\mathrm{E}][\mathrm{I}]} \tag{2}$$

where each symbol with enclosing square brackets ([]) denotes the molar concentration of the corresponding species. If EI is initially absent, (2) may be rewritten as

$$K_a = \frac{[\mathrm{EI}]}{\left([\mathrm{E}]_0 - [\mathrm{EI}]\right)\left([\mathrm{I}]_0 - [\mathrm{EI}]\right)} \tag{3}$$

where the subscript of 0 denotes initial value. If catalytic activity is directly proportional to [E], $f_{\mathrm{pre}}$ may be computed as

$$f_{\mathrm{pre}} = \frac{[\mathrm{E}]_0 - [\mathrm{EI}]}{[\mathrm{E}]_0} \tag{4}$$

by analogy to (1). The value of $K_a$ may itself be estimated using the relationship

$$K_a = \exp\left(-\frac{\Delta G^\circ}{RT}\right) \tag{5}$$

where $\Delta G^\circ$ is the standard free energy change for the association of E with I to form EI, $R$ is the gas constant and $T$ is the absolute temperature; if the structure of E is known, protein structural energetics [109][110][111] provides a means to estimate $\Delta G^\circ$ from anticipated changes in solvent-accessible surface area (ASA) for putative B-cell epitopes of E upon their binding by paratopes [14,15].
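The equilibrium estimate can be sketched numerically. The code below assumes the mass-action form $K_a = [\mathrm{EI}] / (([\mathrm{E}]_0 - [\mathrm{EI}])([\mathrm{I}]_0 - [\mathrm{EI}]))$, the standard relation $K_a = \exp(-\Delta G^\circ / RT)$ with a 1 M standard state, and a residual fraction of $([\mathrm{E}]_0 - [\mathrm{EI}]) / [\mathrm{E}]_0$. Function names and numerical inputs are hypothetical.

```python
import math

R = 8.314462618  # gas constant, J/(mol K)

def association_constant(delta_g0, temperature=298.15):
    """K_a (relative to a 1 M standard state) from delta_g0 in J/mol."""
    return math.exp(-delta_g0 / (R * temperature))

def complex_concentration(e0, i0, ka):
    """Equilibrium [EI] in mol/L for E + I <-> EI, with EI initially absent.

    Rearranging the mass-action expression gives the quadratic
    x**2 - (e0 + i0 + 1/ka) * x + e0 * i0 = 0; the smaller root is the
    physical one because it cannot exceed e0 or i0.
    """
    b = e0 + i0 + 1.0 / ka
    return (b - math.sqrt(b * b - 4.0 * e0 * i0)) / 2.0

def predicted_residual_fraction(e0, i0, ka):
    """Fraction of enzyme left uncomplexed, taken as residual activity."""
    return (e0 - complex_concentration(e0, i0, ka)) / e0

# Hypothetical inputs: 1 nM enzyme, 10 nM fragment, delta_g0 = -50 kJ/mol.
ka = association_constant(-50_000.0)
f_pre = predicted_residual_fraction(1e-9, 1e-8, ka)
```

Repeating the last call over the antibody concentrations used experimentally would give one predicted fraction per observed fraction for the correlation analysis.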
The main disadvantage of reliance on (3) is the assumption of an equilibrium state that may never be reached in practice; in particular, $f_{\mathrm{pre}}$ may overestimate $f_{\mathrm{obs}}$ if EI is thermodynamically stable but formed very slowly. This error might be avoided by using an alternative approach based on a kinetic description of [EI], such as

$$\frac{d[\mathrm{EI}]}{dt} = k_{\mathrm{on}}\left([\mathrm{E}]_0 - [\mathrm{EI}]\right)\left([\mathrm{I}]_0 - [\mathrm{EI}]\right) - k_{\mathrm{off}}[\mathrm{EI}] \tag{6}$$

where $t$ is reaction time, with $k_{\mathrm{on}}$ and $k_{\mathrm{off}}$ being the respective rate constants for association and dissociation. If values of $k_{\mathrm{on}}$ and $k_{\mathrm{off}}$ are also available in addition to $[\mathrm{E}]_0$ and $[\mathrm{I}]_0$, solving (6) for [EI] at the appropriate time $t$ (e.g., for preincubation of E with I before assaying catalytic activity) allows calculation of $f_{\mathrm{pre}}$ using (4). Estimation of $k_{\mathrm{on}}$ could be attempted on the basis of transition state theory [112], as

$$k_{\mathrm{on}} = \frac{k_B T}{h}\exp\left(-\frac{\Delta G^\ddagger}{RT}\right) \tag{7}$$

where $k_B$ is the Boltzmann constant, $h$ is the Planck constant and $\Delta G^\ddagger$ is the free energy change of transition state formation; in turn, $k_{\mathrm{off}}$ could be estimated using the relationship

$$k_{\mathrm{off}} = \frac{k_{\mathrm{on}}}{K_a} \tag{8}$$

in conjunction with (5) and structural-energetic methods for calculating $\Delta G^\circ$. The value of $\Delta G^\ddagger$ can be crudely approximated as an energetic penalty for structural adjustment of E upon binding by I, as calculated for the unfolding of a putative B-cell epitope on E [15]; better approximation of $\Delta G^\ddagger$ might be feasible through detailed computational modeling of the antibody-antigen binding process, which may occur as a multistep series of conformational changes [113].
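A minimal sketch of the kinetic alternative follows. It assumes the rate law $d[\mathrm{EI}]/dt = k_{\mathrm{on}}([\mathrm{E}]_0 - [\mathrm{EI}])([\mathrm{I}]_0 - [\mathrm{EI}]) - k_{\mathrm{off}}[\mathrm{EI}]$ with EI initially absent, integrates it with a fixed-step fourth-order Runge-Kutta scheme (a choice made here for self-containment, not one prescribed by the text), and includes the Eyring expression $(k_B T / h)\exp(-\Delta G^\ddagger / RT)$. The Eyring prefactor has units of 1/s, so treating the result as a bimolecular rate constant presumes a 1 M standard-state convention. All numerical values are hypothetical.

```python
import math

K_B = 1.380649e-23   # Boltzmann constant, J/K
H = 6.62607015e-34   # Planck constant, J s
R = 8.314462618      # gas constant, J/(mol K)

def eyring_rate(delta_g_act, temperature=298.15):
    """Transition-state-theory rate from an activation free energy in J/mol."""
    return (K_B * temperature / H) * math.exp(-delta_g_act / (R * temperature))

def complex_at_time(e0, i0, k_on, k_off, t_end, steps=10_000):
    """[EI] at time t_end (s), starting from [EI] = 0, by classical RK4."""
    def rate(x):
        return k_on * (e0 - x) * (i0 - x) - k_off * x

    dt = t_end / steps
    x = 0.0
    for _ in range(steps):
        k1 = rate(x)
        k2 = rate(x + 0.5 * dt * k1)
        k3 = rate(x + 0.5 * dt * k2)
        k4 = rate(x + dt * k3)
        x += dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return x

# Hypothetical system: 1 nM each of E and I, K_a = k_on / k_off = 1e9 per M.
e0 = i0 = 1e-9
k_on, k_off = 1e5, 1e-4           # per (M s) and per s
x_short = complex_at_time(e0, i0, k_on, k_off, 600.0)   # 10 min preincubation
x_long = complex_at_time(e0, i0, k_on, k_off, 1e5)      # near equilibrium
f_pre_short = (e0 - x_short) / e0
f_pre_long = (e0 - x_long) / e0
```

In this illustration the short preincubation leaves more enzyme uncomplexed than the equilibrium treatment would suggest, which is the situation in which an equilibrium-based prediction misestimates the observed residual activity.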
If benchmarking were to be redefined as evaluating the correlation between $f_{\mathrm{pre}}$ and $f_{\mathrm{obs}}$, data selection would be redefined as the admission of continuous dose-response data comprising biological activity data (from which $f_{\mathrm{obs}}$ could be computed) and other pertinent empirical data (e.g., reactant concentrations and protein structure, from which $f_{\mathrm{pre}}$ could be computed); meanwhile, data partitioning would be applicable not only as already described for dichotomous benchmark data (e.g., with respect to charge mismatches between immunizing peptides and cognate proteins), but also with respect to the manner in which $f_{\mathrm{pre}}$ is computed. More elaborate quantitative models than suggested by (6) describe real antibody-antigen interactions [81]. Typical antibody molecules each bear two or more paratopes while an antigen molecule may bear two or more similar if not identical B-cell epitopes; cooperative antibody-antigen binding phenomena do occur, and antibody-mediated modulation of biological activity may be nonlinearly related to the extent of antibody binding (e.g., where neutralization of viral infectivity is enhanced by antibody-mediated aggregation of virions, which may occur only near the equivalence point of antibody-virion interaction [84]). Under these various circumstances, uniformly accurate prediction is unlikely if based on a first-approach method for computing $f_{\mathrm{pre}}$. Data partitioning could therefore be performed according to the computational complexity demanded by modeling of the antibody-antigen binding process and its functional consequences, such that relatively simple methods for computing $f_{\mathrm{pre}}$ are benchmarked against data on correspondingly simple systems (e.g., a single-domain enzyme having but a single active site); this could facilitate refinement of the methods that possibly extends their applicability to more complex systems (e.g., an enzyme having multiple active sites, or a virion having multiple binding sites for receptors on host cells).
At a deeper level, the underlying logic of B-cell epitope prediction methods may itself provide additional insights on how problems might be avoided through data selection and partitioning. As applied to the design of peptide-based vaccines, the impact of selecting data on excessively long immunizing peptides is, for example, clarified by examining the implications of a structural-energetic approach to B-cell epitope prediction [14,15]; this models immunodominance among B-cell epitopes of immunizing peptides as a thermodynamically determined hierarchical steric-exclusion phenomenon, based on the premise that antibodies are preferentially elicited by an immunodominant epitope due to its higher affinity for antibody relative to other epitopes with which it physically overlaps [15]. If an immunizing peptide contains exactly one predicted immunodominant epitope, any observed functionally relevant cross-reactivity can be readily attributed to the epitope; but if the peptide contains more than one such epitope, a problem of ambiguous attribution arises that can lead to erroneous benchmarking results (e.g., if functionally relevant cross-reactivity is incorrectly attributed to a functionally irrelevant epitope for computation of $f_{\mathrm{pre}}$).
This implies the existence of an upper limit on the length of an immunizing peptide for data on the peptide to be reliably informative; if immunizing peptides are assumed to be completely unfolded such that each hexapeptide sequence thereof is regarded as a candidate epitope [14,15], ambiguous attribution seems possible for an immunizing peptide as short as 12 residues (in which case two predicted immunodominant epitopes might occur in tandem) and may be unavoidable for an immunizing peptide as short as 17 residues (in which case two predicted immunodominant epitopes may be accommodated regardless of how the first one is positioned). Moreover, even if an immunizing peptide is so short that it contains only one predicted immunodominant epitope, ambiguous attribution is still a problem if the epitope sequence occurs in nonidentical structural contexts as part of the cognate protein (e.g., as sequence repeats within a single protein domain, each of which yields a distinct value of $f_{\mathrm{pre}}$). Data selection could avoid such problems of ambiguous attribution by admitting data on functionally relevant cross-reactivity only where the immunizing peptide contains exactly one predicted immunodominant epitope sequence that does not occur in nonidentical structural contexts as part of the cognate protein; this would also avoid the problem of having to experimentally define the structural boundaries of epitopes, which is impossible to accomplish by analyzing polyclonal antibodies through the use of immunoassays [9].
Subsequent partitioning of the data thus selected could be performed with respect to repetition of the predicted immunodominant epitope sequence in the cognate protein (e.g., in a soluble symmetric homodimer, for which the sequence occurs in two structurally identical contexts that yield the same value of $f_{\mathrm{pre}}$); such repetition may enable the formation of large immune complexes through cross-linkage of antibodies by antigen, which could complicate the computational analysis of antibody-antigen binding (e.g., by leading to antibody-mediated aggregation of virions).
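The peptide lengths of 12 and 17 residues cited above follow from simple window arithmetic, which the sketch below reproduces. It assumes that candidate epitopes are contiguous hexapeptide windows and that two epitopes are distinct only if their windows do not overlap; the function names are illustrative.

```python
def second_epitope_fits(length, start, size=6):
    """True if a second non-overlapping window of `size` residues fits
    to the left or right of a first window beginning at `start` (0-based)."""
    left_room = start
    right_room = length - (start + size)
    return left_room >= size or right_room >= size

def ambiguity_thresholds(size=6, max_length=50):
    """Return (shortest length at which two windows can coexist,
    shortest length at which they can coexist for every first position)."""
    possible = unavoidable = None
    for length in range(size, max_length + 1):
        flags = [second_epitope_fits(length, s, size)
                 for s in range(length - size + 1)]
        if possible is None and any(flags):
            possible = length
        if unavoidable is None and all(flags):
            unavoidable = length
    return possible, unavoidable

possible, unavoidable = ambiguity_thresholds()
```

For hexapeptide windows the function returns 12 and 17: at 16 residues a first window starting at the sixth residue leaves only five residues on either side, whereas at 17 residues every placement leaves room for a second window.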
Progress in B-cell epitope prediction might therefore be realized through development of quantitative methods that is guided by physicochemical principles; but in the final analysis, success is an unrealistic expectation for peptide-based vaccine design without a comprehensive contextual basis for application-specific computational modeling of immune responses and their biological consequences. Even if a computational method were developed that could accurately predict functionally relevant cross-reactivity in general (e.g., as benchmarked by evaluating $f_{\mathrm{pre}}$ against $f_{\mathrm{obs}}$ for a wide variety of systems), only profound context-specific knowledge (e.g., on molecular recognition of host cell receptors by pathogen virulence factors) would ensure correct identification of peptide sequences that could induce protective immunity against a particular disease (e.g., by mimicking appropriate neutralization epitopes). This knowledge would encompass aspects of host immunobiology such as immune tolerance that result from the interplay of numerous genetic and environmental factors in both healthy and diseased states; immune tolerance deserves special attention in this regard, as problems could arise due to either immune tolerance itself (e.g., complicating the prediction of immunogenic epitopes) or vaccine-induced loss thereof (e.g., leading to autoimmunity or other forms of hypersensitivity [114,115]). Perhaps the most serious defect of attempts to design peptide-based vaccines has been the naive understanding of B-cell epitope prediction as a generic procedure that requires only physicochemical knowledge of antigens.

Conclusions
The development of peptide-based vaccines demands refinement of methods to predict B-cell epitopes that is based on benchmarking against judiciously selected and partitioned empirical data. The data are meaningful for this purpose if they pertain to cross-reactivity of polyclonal antipeptide antibodies with proteins, with the positive data reflecting functionally relevant cross-reactivity and the negative data reflecting genuine absence of cross-reactivity. These data can be partitioned with respect to multiple factors that complicate B-cell epitope prediction, of which the most important are those that define key structural differences between immunizing peptides and their cognate proteins. Parallel benchmarking against the resulting subsets of data is a foundational strategy to discover context-dependent variations in the performance of methods for B-cell epitope prediction, serving to identify and address the limitations of these methods through iterative cycles of benchmarking and refinement. Ultimately, this could lead to the establishment of an integrative computational platform for B-cell epitope prediction that reliably guides the design of peptide-based vaccines. In the meantime, B-cell epitope prediction must transcend its de facto role of a rudimentary screening tool whose utility is severely limited by a narrow reductionist view of vaccine design.