Towards Defining Molecular Determinants Recognized by Adaptive Immunity in Allergic Disease: An Inventory of the Available Data

Adaptive immune responses associated with allergic reactions recognize antigens from a broad spectrum of plants and animals. Herein a meta-analysis was performed on allergy-related data from the immune epitope database (IEDB) to provide a current inventory and highlight knowledge gaps and areas for future work. The analysis identified over 4,500 allergy-related epitopes derived from 270 different allergens. Overall, the distribution of the data followed expectations based on the nature of allergic responses. Namely, the majority of epitopes were defined for B cells/antibodies and IgE-mediated reactivity, with relatively fewer T-cell epitopes, mostly CD4+/class II. Interestingly, the majority of food allergen epitopes were B-cell epitopes, whereas a fairly even number of B- and T-cell epitopes were defined for airborne allergens. In addition, epitopes from nonhuman hosts were mostly T-cell epitopes. Overall, coverage of known allergens is sparse, with data available for only ~17% of all allergens listed by the IUIS database. Thus, further research would be required to provide a more balanced representation across different allergen categories. Furthermore, inclusion of nonpeptidic epitopes in the IEDB also allows for inventory and analysis of immunological data associated with drug and contact allergen epitopes. Finally, our analysis also underscores that only a handful of epitopes have thus far been investigated for their immunotherapeutic potential.


Introduction
It is estimated that 50 million people in the US are affected by airborne allergens, including approximately 35 million affected by upper respiratory allergies (allergic rhinitis, hay fever and pollinosis) [1], and 16 million affected by asthma [2,3]. The cost of allergies in the US (treatment and loss of work) is estimated to be more than $18 billion per year [4]. Food allergies, representing the second largest category after respiratory allergies, are thought to affect 6-8% of children and nearly 4% of adults. In the US, there are ∼30,000 episodes of food-induced anaphylaxis, associated with 100-200 deaths per year [5,6]. Finally, skin contact allergies and allergies to insect venoms also occur with significant incidence and are thus an important component of allergic diseases in humans. These figures underscore the growing societal impact of allergy-related disease both in terms of human suffering and in terms of annual cost burden.
The immunological basis of allergy-related disease is universally recognized. At the level of adaptive immunity, the recognition of specific allergens by antibodies and T cells plays major roles both as effectors and regulators of allergic diseases. Several bioinformatics resources, cataloging and describing allergen protein sequences, are available to the scientific community such as the Allergome, which provides information on allergenic molecules causing IgE-mediated disease, and Allergen.org, which is the official site for the systematic allergen nomenclature approved by the World

Data Inclusion Criteria.
This analysis includes data for antibody and T-cell epitopes associated with allergic disease in human and nonhuman (animal models) hosts. To identify within the IEDB the subset of data which is allergy related, we followed the process described in more detail by Davies et al. [10]. More specifically, we define herein allergy-related data on the basis of the source from which the epitope is derived (known allergen), and also on the basis of the type of response and/or clinical presentation. Accordingly, we included IgE-mediated, type I (immediate) hypersensitivity, atopic, "allergen-sensitization," exposure-based asthma, allergic rhinitis, pollinosis, contact dermatitis, atopic dermatitis, documented anaphylaxis, and all data from allergy-related animal models.
The allergy-related epitopes represent both peptidic and non-peptidic structures from a wide range of sources, including pollens, dust mites, molds, dander and foods, nonprotein moieties of plants (carbohydrates), as well as drugs, haptens, metals, and chemical substances from occupational exposures. The curated data were obtained using a variety of different assays, such as ELISA, Western blot, proliferation assays, surface plasmon resonance (SPR), radioimmunoassay (RIA), and X-ray crystallography, and describe epitope-related reactivity such as histamine release, hypersensitivity (PCA), delayed-type hypersensitivity (DTH), and immunotherapy assays.
All of the data described herein were captured directly from the peer-reviewed literature (PubMed) by Ph.D. level scientists or through direct submission to the IEDB by research groups. Antibody and T-cell epitope definitions (length and mass restrictions) as well as IEDB inclusion criteria can be found at http://tools.immuneepitope.org/wiki/index.php/MainPage. For the purpose of this report, the total number of epitopes reported in each case represents the total number of unique molecular structures experimentally shown to react with a B-cell or T-cell receptor (no predictions included). The IEDB captures these structures as they are defined in the literature and thus includes data describing structures categorized as minimal/optimal epitopes (11-15 residues), larger less well-defined regions (20-50 residues), and key residues identified as being involved in binding (1-2 residues).

Analysis Approach.
The entirety of the allergy-related data identified as described above was first inventoried to identify the total number of structures (positive and negative epitopes), their chemical nature (peptidic or non-peptidic), the total number of antibody/B-cell versus T-cell epitopes, as well as the effector cell phenotype or antibody isotype, and finally the total number of peer-reviewed references from which the data were derived. The second step involved investigating the distribution of epitopes among hosts: those epitopes defined in humans versus those identified using nonhuman animal models of allergy. In each case, the inventory of epitopes per host species included a breakdown according to reactivity: B cell (linear or conformational) or T cell (CD4, CD8 or unspecified).
Following the initial inventory, the data were categorized according to the following established allergy categories: food, airborne (respiratory), contact, drug, and allergies to biting insects. This categorization was based on the allergen and genus species of the organism from which the epitope was derived. These main categories were then further parsed into subcategories on the basis of taxonomic origins (plant, animal or fungus) and included a subcategory for the most commonly encountered species in that main category.
The individual compounds representing drugs/pharmaceuticals were parsed into 21 subcategories on the basis of their chemical type (e.g., beta-lactam antibiotic) or by the way the compound is used to treat a particular condition (e.g., muscle relaxant). Contact allergen data were also further parsed into subcategories based on their species of origin (plants), chemical type (metals, model haptens), or mode of exposure (chemical agents from occupational exposure).
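The categorization step described above can be illustrated with a minimal sketch. It assumes each epitope record carries the allergen name, the source organism, and a flag for drug compounds; the field names (allergen, source_organism, is_drug), the lookup table, and the sample records are hypothetical and are not taken from the IEDB schema or from the curated data.

```python
from collections import Counter

# Illustrative lookup keyed on (allergen, genus species); these assignments
# are assumptions for the sketch, not the mapping used in the analysis.
CATEGORY_BY_ALLERGEN = {
    ("Ara h 1", "Arachis hypogaea"): "food",
    ("Phl p 1", "Phleum pratense"): "airborne",
    ("Bet v 1", "Betula verrucosa"): "airborne",
    ("Hev b 1", "Hevea brasiliensis"): "contact",
}


def categorize(record):
    """Return the main allergy category for one (hypothetical) epitope record."""
    if record.get("is_drug"):
        return "drug"
    key = (record.get("allergen"), record.get("source_organism"))
    return CATEGORY_BY_ALLERGEN.get(key, "uncategorized")


def inventory(records):
    """Count epitope records per main allergy category."""
    return Counter(categorize(r) for r in records)


# Made-up records, used only to exercise the functions.
DEMO_RECORDS = [
    {"allergen": "Ara h 1", "source_organism": "Arachis hypogaea"},
    {"allergen": "Phl p 1", "source_organism": "Phleum pratense"},
    {"allergen": "Bet v 1", "source_organism": "Betula verrucosa"},
    {"allergen": None, "source_organism": None, "is_drug": True},
    {"allergen": "Xyz 1", "source_organism": "Unknown species"},
]

if __name__ == "__main__":
    for category, count in sorted(inventory(DEMO_RECORDS).items()):
        print(category, count)
```

Keying the lookup on both allergen and organism, rather than organism alone, reflects the statement that categorization was based on the allergen and the genus species; a single organism such as cow can contribute both food and airborne allergens.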

Computational Methods.
The allergy-related data extracted from the IEDB (http://www.immuneepitope.org/) were stored in a MySQL database. The use of MySQL allows the database schema to be tailored to the specific analysis and the data to be kept synchronized with updates of the IEDB data production database. Data were periodically checked against the IEDB webpage using simple or advanced query interfaces for consistency and accuracy. Results from each query were exported as Excel files and further analyzed in that format. Tables and figures were generated from Excel. Data exclusions included structures for which only MHC binding data were available, as well as those instances in which the epitope was simultaneously used as both immunogen and assay antigen.
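The two data exclusions described above can be sketched as a single query. The example below uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs without a server; the table name, column names, and rows are hypothetical and do not reproduce the IEDB schema or the authors' queries.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory table with a hypothetical, simplified schema.

    assay_class is 'bcell', 'tcell' or 'mhc_binding'; the two flag columns
    are 1 when the epitope itself served as immunogen or as assay antigen.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE assay ("
        "epitope_id INTEGER, assay_class TEXT, "
        "epitope_is_immunogen INTEGER, epitope_is_antigen INTEGER)"
    )
    rows = [
        (1, "tcell", 0, 1),        # kept
        (2, "mhc_binding", 0, 1),  # only MHC binding data: excluded
        (3, "bcell", 1, 1),        # epitope is immunogen and antigen: excluded
        (4, "mhc_binding", 0, 1),  # MHC binding plus a B-cell assay: kept
        (4, "bcell", 0, 1),
    ]
    conn.executemany("INSERT INTO assay VALUES (?, ?, ?, ?)", rows)
    return conn


def retained_epitopes(conn):
    """Return epitope ids that survive both exclusion rules."""
    query = (
        "SELECT DISTINCT epitope_id FROM assay "
        "WHERE assay_class IN ('bcell', 'tcell') "
        "AND NOT (epitope_is_immunogen = 1 AND epitope_is_antigen = 1) "
        "ORDER BY epitope_id"
    )
    return [row[0] for row in conn.execute(query)]


if __name__ == "__main__":
    print(retained_epitopes(build_demo_db()))
```

In this sketch an epitope is retained if at least one of its B-cell or T-cell assays passes the filter, which is one possible reading of the exclusion criteria.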

Data Overview.
An overview of all allergy-related data captured by our analysis is provided in Tables 1 and 2.
Consistent with the importance of immunoglobulin-related responses as effectors of allergy responses, the majority of epitopes (both peptidic and non-peptidic) were defined for antibody responses, including both linear (∼3,000) and conformational (or discontinuous) determinants (peptidic only) (Table 1). A total of 2,205 IgE epitopes were reported for all allergens; other reactivities were less numerous and related to total IgG, followed distantly by IgG1, IgG4, IgM, IgA, IgG2b, IgG3, IgG2a, and IgG2c (Table 2). As can be seen, the majority of antibody determinants were defined in humans. In animal models of disease, not only were relatively fewer epitopes defined, but only about 10% of them are recognized by IgE. This highlights a crucial knowledge gap and suggests that more research could be directed at the definition of the epitopes recognized by IgE in animal models of allergy.
A relatively smaller number of T-cell epitopes have been identified (1,646 epitopes) (Table 1). Of the T-cell epitopes defined in both peptidic and non-peptidic allergens, CD4+/class II epitopes were most numerous, and far fewer CD8+/class I epitopes were reported. Given their potential role in contact dermatitis and other delayed-type hypersensitivity reactions, it is likely that more effort could be devoted to the definition of class I epitopes. Supplementary Figure 1 (see Figure S1 in supplementary material available online at doi: 10.1155/2007/628026) provides a response summary for all epitope data.
The host distribution of epitopes can be found in Table 3. Not surprisingly, the vast majority of epitopes were defined in humans. However, epitopes were also described for monkeys, pigs, dogs, rabbits, guinea pigs, rats, and mice. Of the nonhuman species, epitopes defined in mice represented the largest group, second only to humans overall. Within epitopes defined in mice, more than 30 different strains were represented (data not shown). BALB/c predominated, followed distantly by C57BL/6 and C3H/He. Data from human HLA transgenic strains (HLA-A, DR4, DQ6, DQ8, and DR3-DQ2) were also reported. Table 1 also describes a breakdown of epitope numbers categorized as related to food allergies, airborne or respiratory allergies, allergies to stinging insects, drug allergies, and contact allergies. Food allergens represent by far the largest group of data in the IEDB. There are currently 2,322 (53%) B- and T-cell epitopes identified from this group. After food allergens, epitopes defined for aeroallergens represent the second largest group, accounting for 40% of the records. To date, the database contains 125 antibody and T-cell epitopes related to the venom of stinging insects, which make up 3% of the epitope total. Drug allergies account for ∼2% of epitopes. Contact allergies manifested through the skin account for ∼2% of allergy epitopes. The following sections describe each epitope category in more detail.

Food Allergies.
These include both peptidic and nonpeptidic determinants derived from both plants and animals. The data have been parsed into three broad categories: most common food allergen sources, other plant, and other animal species (Table 4). Peanut (Arachis hypogaea) allergens comprise nearly 40% of the total plant allergen epitopes. Epitopes described for food allergens derived from animals fall into six taxonomic categories. These include mammals (human and cow milk, beef, beef gelatin), bony fish (cod), birds (chicken eggs), mollusks (abalone and snails), crustaceans (shrimp and prawns), and nematodes (fish meat parasite). By far, the largest number of epitopes has been identified for allergens related to cow's milk allergy, followed by epitopes defined from eggs, representing 70% and 20% of the total, respectively. Non-peptidic food epitopes reported to date comprise carbohydrates derived from peanuts, sugar beets, celery, and sea squirt (see also supplemental Table 6).
According to the CDC [11], allergens derived from milk, eggs, peanuts, tree nuts, fish, shellfish, soy, and wheat account for 90% of all food allergies. The epitope data in general reflect this distribution. However, fewer epitopes were identified from fish (only 10 epitopes for one species) and shellfish other than shrimp. Conversely, there was a surprising number of epitopes described from fruit allergens, namely peaches, apples, and bananas. This observation may reflect the involvement of these species in the oral allergy syndrome (OAS) or pollen-food allergy and cross-reactions between foods (fruits, nuts) and inhaled allergens [12-15].

Airborne Allergies.
Epitopes defined for aeroallergens represent the second largest group within the IEDB, accounting for 40% of the records, including peptidic and nonpeptidic determinants derived from plants, animals, fungal allergens, and some industrial chemical agents. Here, the data were parsed into the categories of most common airborne sources, other plant, fungal, and animal species (Table 5). Epitopes identified from pellitory pollen, as well as those from birch and Japanese cedar pollen, are numerous. Epitopes reported for grass pollens come primarily from timothy grass, ryegrass species, and Kentucky bluegrass.
In the taxonomic grouping representing fungi, which includes yeasts and molds, epitopes identified in antigens from Aspergillus species dominate (70%). Finally, epitopes derived from aeroallergens from animals fall into three broad taxonomic categories: insects (cockroach and midge), arachnids (house dust mite, storage mite, and fodder mite), and mammals (cat, dog, horse, cow, rat, and mouse). Among the insects, epitopes derived from the midge are the most numerous, and within the arachnid class, European house dust mites are the most heavily studied.
Here again, the epitope data reflect the overall trends related to airborne allergy. Grass, tree, and weed pollen epitopes represent the majority of the data (∼60%), followed by pet dander and house dust mite allergens. These findings are consistent with the overall prevalence of hay fever and/or allergic rhinitis in the general population, affecting some 18 million people annually [16]. Perhaps somewhat unexpected was the fairly low number of epitopes defined for cat allergens. Interestingly, the majority of food allergen epitopes were B-cell epitopes (86%), whereas a fairly even number of B-cell (43%) and T-cell (57%) epitopes were defined for airborne allergens (data not shown).

Drug Allergies.
The IEDB currently contains curated data relating to immunological reactions to more than 90 different drugs associated with allergic disease. In most cases, the authors do not identify the exact reactive moiety of these non-peptidic chemical entities because the assays are carried out using the intact drug. These drugs can be further classified into 21 categories based primarily on biological function and structure (Figure 1). These include beta-lactam antibiotics (the penicillins), barbiturate anesthetics, bactericidal/antimicrobial agents, muscle relaxants, antihypertensives, antiparasitic drugs, neurotransmitters, sulfa-based antibiotics, local anesthetics, hormones, antifibrinolytics, antiemetics, antihistamines, antipsychotics, antitussives, muscle stimulants, opiates, radiocontrast media, spermicides, and a vasoactive agonist. Antibiotics as a whole comprise nearly half (49%) of the reported drug allergens, the vast majority of which are beta-lactam antibiotics.

Contact Allergies.
Thus far, more than 80 contact allergens have been captured by the IEDB, as summarized in Figure 2. Epitopes identified from latex-allergic individuals represent the largest number of contact allergen determinants, making up 59% of the total. The 207 reported latex epitopes include both linear and nonlinear antibody epitopes, as well as T-cell epitopes, primarily of the CD4+/class II phenotype. Three additional categories of contact allergens include non-peptidic entities such as metals, industrial chemicals encountered by way of occupational exposure, and model haptens. The seven different metals described as associated with allergic contact dermatitis are beryllium (beryllium sulfate tetrahydrate, beryllium sulfate), chromium (chromium trichloride), cobalt (cobalt dichloride), copper (copper sulfate, copper chloride), nickel (nickel chloride, nickel sulfate), palladium (palladium chloride), and zinc (zinc chloride). Of these, no single metal entity predominates, and as a group metals comprise only 7% of the contact allergens. Beryllium, chromium, zinc chloride, and cobalt are most often encountered in the industrial/manufacturing setting, whereas nickel, copper, and palladium allergies are most frequently associated with jewelry. Furthermore, the IEDB contains curated data relating to more than 70 compounds utilized in the manufacture of cosmetics, dyes, and certain constituents of manufacturing. A very large number of curated assays relate to model haptens, which include skin sensitizers such as trinitrophenyl (TNP), dinitrophenyl (DNP), 1-fluoro-2,4-dinitrobenzene (DNFB), and dinitrochlorobenzene (DNCB). These compounds have been used classically to define mechanisms of type IV contact hypersensitivity. Of these, DNCB appears to have received the greatest focus. A detailed list of all contact allergens can be found in supplemental Table 1.

[Figure 1 legend, partially recovered: Barbiturate anesthetic (12); Bactericidal/antiseptic (7); Muscle relaxant (7); Anti-hypertensive (4); Anti-parasitic (4).]

Epitope Distribution by Allergen.
As a further evaluation, we determined the relative epitope distribution by allergen for each source species (supplementary Tables 2-5). The total number of epitopes described per allergen varies greatly, and well-known allergens (e.g., Ara h 1, Bet v 1, or Phl p 1) tended to have greater numbers of defined epitopes compared to other allergens from the same organism (e.g., seed storage protein SSP2, Bet v 2, Bet v 4, Phl p 2, or Phl p 11). Similarly, the total number of T-cell versus B-cell epitopes varied greatly, with the vast majority of allergens heavily weighted toward one or the other phenotype and few having a relative balance of defined B and T epitopes (data not shown).
Next, we analyzed the extent to which the allergens comprising the epitope-related data represent all known allergens, as listed by the Allergen.org resource, the official site for the systematic allergen nomenclature (Linnean system) that is approved by the World Health Organization and International Union of Immunological Societies (WHO/IUIS) Allergen Nomenclature Sub-committee. This site maintains a list of all currently known (described) allergens derived from plant, animal, and fungal species. We found that the total number of allergens from which epitope data have been described varies from one allergen source to another. In some instances, epitope data are comprehensive, covering all allergens identified by the IUIS list for a given species (e.g., 9/9 Phl p allergens for timothy grass). In other cases, however, coverage is low, including only a few of the known allergens (e.g., 6/29 Der p and Der f allergens for house dust mite), whereas other species have intermediate coverage (e.g., 4/6 Lol p allergens from rye grass) (Table 7). Furthermore, when we compared the total number of allergens in the IUIS that match the allergy-related species reported in the IEDB, we found that ∼40% of the IUIS-designated allergens are represented in the epitope data (115 out of 297). However, for an additional 380 known IUIS allergens, no match could be found between the species in the IUIS and the species described in the papers in the scientific literature describing specific epitopes. Many of these include organisms from known genera, but with as yet nonlisted species, as well as other nomenclature inconsistencies. These results suggest that more efforts can be devoted to reconciling the origin of allergen-derived data.
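The coverage figures quoted above can be checked with a short calculation. The sketch below assumes that the overall IUIS denominator is the sum of the 297 allergens from matched species and the 380 allergens for which no species match was found; that reading is an inference from the text, and the function and variable names are illustrative only.

```python
def coverage_percent(represented, total):
    """Percentage of allergens that have at least one defined epitope."""
    return 100.0 * represented / total


REPRESENTED = 115        # allergens with epitope data (from the text)
MATCHED_SPECIES = 297    # IUIS allergens from species also present in the IEDB
UNMATCHED = 380          # IUIS allergens with no species match (from the text)

# Coverage restricted to species found in both resources.
species_level = coverage_percent(REPRESENTED, MATCHED_SPECIES)

# Coverage over all IUIS allergens, assuming the denominator is 297 + 380.
overall = coverage_percent(REPRESENTED, MATCHED_SPECIES + UNMATCHED)

# Per-species examples quoted in the text.
timothy_grass = coverage_percent(9, 9)
house_dust_mite = coverage_percent(6, 29)
rye_grass = coverage_percent(4, 6)

if __name__ == "__main__":
    print(f"species-level coverage: {species_level:.1f}%")
    print(f"overall coverage: {overall:.1f}%")
```

Under this assumption the species-level figure is close to 39% and the overall figure close to 17%, which agrees with the rounded values of ∼40% and ∼17% reported in the text.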

Epitopes Associated with Clinical Disease or Disease Models.
Isolated epitopes can be utilized to induce or modulate allergic reactions in animal models. The use of synthetic epitopes to modulate allergic reactions has also been proposed and tested in a limited number of clinical trials [17,18]. Indeed, the epitopes defined in the course of the study of human allergic conditions may enable the investigation of their potential in the immunotherapeutic setting.
To inventory which epitopes had been tested in these settings, we queried for antibody and T-cell epitopes that were tested either in vivo for their ability to decrease allergic reactivity (as measured by the reduction of symptoms) or in vitro for their ability to decrease markers of allergic disease. This was done by selecting all B-cell or T-cell contexts designated in the IEDB with an assay type of "Reduction of Disease after Treatment" (B cell) or "Treatment" (T cell). Here, the assay type assigned by the IEDB indicates the nature of the immune response, and the details of the type of assay used (lung function, DTH, PCA, etc.) can be found within the curated data from the assay comments field. Table 8 shows the PubMed identification, epitope name, epitope sequence, the host, the type of response, and allergy model classification for peptidic epitopes identified from the data as having a positive effect on disease in vivo or on markers of disease as measured in vitro.
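The selection rule described above can be sketched as a simple filter. Only the two assay type labels are taken from the text; the record layout (context, assay_type, epitope_name), the epitope names, and the sample rows are hypothetical and do not represent the IEDB data model or its query interface.

```python
# Assay type labels quoted in the text, keyed by a hypothetical context code.
TREATMENT_ASSAY_TYPES = {
    "bcell": "Reduction of Disease after Treatment",
    "tcell": "Treatment",
}


def is_treatment_context(ctx):
    """True if a context record carries the treatment assay type for its class."""
    expected = TREATMENT_ASSAY_TYPES.get(ctx.get("context"))
    return expected is not None and ctx.get("assay_type") == expected


def select_treatment_epitopes(contexts):
    """Return the sorted, unique epitope names tested in a treatment setting."""
    return sorted({c["epitope_name"] for c in contexts if is_treatment_context(c)})


# Made-up context records, used only to exercise the filter.
DEMO_CONTEXTS = [
    {"epitope_name": "peptide A", "context": "tcell", "assay_type": "Treatment"},
    {"epitope_name": "peptide A", "context": "tcell", "assay_type": "Proliferation"},
    {"epitope_name": "peptide B", "context": "bcell",
     "assay_type": "Reduction of Disease after Treatment"},
    {"epitope_name": "peptide C", "context": "bcell", "assay_type": "Treatment"},
    {"epitope_name": "peptide D", "context": "mhc", "assay_type": None},
]

if __name__ == "__main__":
    print(select_treatment_epitopes(DEMO_CONTEXTS))
```

In this sketch the label "Treatment" counts only for T-cell contexts, so a B-cell record carrying that label is not selected; whether the IEDB permits such a combination is not stated in the text.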

Discussion
The analysis presented herein identified over 4,500 allergy-related epitopes derived from 270 different allergens. Protein allergens were categorized according to their source organism, which included plants, animals, insects, parasites, and fungi. Non-peptidic allergens were categorized into four groups including drugs and biologicals, industrial compounds or those related to occupational exposure, metals, model haptens, and carbohydrates from plants.
Overall, the distribution of the data follows expectations based on the nature of adaptive responses involved in allergy. Namely, the vast majority of allergy epitopes were defined for B cells/antibodies (and in these records, IgE-mediated reactivity figured prominently), with relatively fewer T-cell epitopes (mostly defined as CD4+/class II, with very few being defined for CD8+/class I). Likewise, most of the records related to the study of allergic reactions in humans, with fewer epitopes defined for mice and occasional epitopes defined for other hosts such as monkeys, pigs, dogs, rabbits, guinea pigs, and rats. The majority of peptidic epitopes were defined for foods (cow's milk, wheat, peanuts) and plants (tree and grass pollens), while the majority of non-peptidic epitopes were defined for drugs and biologicals (antibiotics).
Interestingly, the vast majority of food allergen-related epitopes were described for B cells, whereas a fairly even number of B- and T-cell epitopes were defined for airborne allergens. It is not clear why this is the case, but it may have to do with the historical analysis of allergies to foods such as milk, peanuts, and eggs, which represent a large portion of that data. The distribution of epitopes varies greatly between allergens and species. This observation suggests that definition of T-cell epitopes involved in food allergies is lacking and could be the focus of further experimental investigations.
Another unexpected finding of our analysis was that the epitopes defined in hosts other than humans were mostly T-cell epitopes, and far fewer antibody epitopes were defined. It is surprising that so few of the nonhuman antibody responses are allergy-specific IgE; this may point to an important area for experimental investigation, namely providing investigators with animal models that faithfully reproduce human allergic reactions.
The current analysis also revealed that coverage of known human allergens by epitope definition studies is very sparse. The overall completeness of the epitope-specific allergy data with respect to known allergens on a species basis is about 40%. Furthermore, epitope data are available for only ∼17% of all allergens listed by IUIS. For certain species, the majority (if not all) of the known allergens have epitope-related data (e.g., timothy grass allergens), while other species have epitope data from only a small number of known allergens (e.g., apple).
The recent completion of curation of non-peptidic allergy-related epitopes in the IEDB allows a first-time inventory and assessment of important drug and contact allergens. The integration within the IEDB of representation and search capabilities based on the Chemical Entities of Biological Interest (ChEBI) database (http://www.ebi.ac.uk/chebi/) will further enable the scientific community to quickly retrieve and analyze the immunological data associated with these important classes of allergens.
Finally, our analysis also inventoried which epitopes have been used to actively induce allergic disease in animal models or to modulate disease. Only a handful of epitopes have been investigated for their immunotherapeutic potential. If the promising results from human clinical trials were to be verified in later phase trials, we anticipate that the data cataloged within the IEDB might provide a wealth of leads for therapeutic intervention regimens.

Figure 1: Drug allergens by functional category. Determinants identified under this category have been broadly classified into 21 groups according to their overall biological function. The chart presents these data as percentages with the total number of unique assays in parentheses.

Figure 2: Categories of contact allergen epitopes. The chart provides a broad overview of the contact allergen epitope distribution.

Table 1: Overview of allergy epitope data included in the IEDB.

Table 2: Antibody isotype associated with epitope reactivity.

Table 4: Epitope data related to food allergy. Genus species have been modified to match IUIS usage. Synonyms for querying the IEDB: tomato (Solanum lycopersicum) and brown shrimp (Farfantepenaeus aztecus).

Table 5: Epitope data related to airborne/respiratory allergy. Genus species have been modified to match IUIS usage. Synonyms for querying the IEDB: Arizona cypress (Hesperocyparis arizonica).

Table 6: Epitope data related to stinging insects.

Table 7: Summary of allergen coverage. This table provides a comparison of the total number of allergens designated by the IUIS and housed within the Allergen.org database that match the allergy-related species reported in the IEDB.