Building a Cell and Anatomy Ontology of Caenorhabditis elegans

We are endowed with rich knowledge about Caenorhabditis elegans. Its stereotyped anatomy and development have stimulated research and resulted in the accumulation of cell-based information concerning gene expression and the role of specific cells in developmental signalling and behavioural circuits. To make the information more accessible to sophisticated queries and automated retrieval systems, WormBase has begun to construct a C. elegans cell and anatomy ontology. Here we present our strategies and progress.


Introduction
Ontologies allow better organization of knowledge. By explicitly specifying semantics and relations, ontologies make it possible to effectively organize complex information. A successful ontology consists of factual statements organized in a strictly logical manner. Consequently, an ontology can be expressed in a computer-understandable language, dramatically increasing its utility.
Ontologies are useful to biologists; the great success of the Gene Ontology (GO; http://www.geneontology.org) is evidence of this. By organizing accumulated knowledge about biology into three orthogonal ontologies, GO provides a summary of what we know in a logical, machine-readable structure. GO has allowed improved interpretation of large-scale genomic analysis. The usefulness of ontologies goes up dramatically as the amount and the degree of complexity of biological knowledge increase.
There is detailed information on many aspects of the cells and anatomy of Caenorhabditis elegans: we know how many cells there are in C. elegans at all times of its life cycle [4,5,7,10-13]; we have a near-precise knowledge of most cells' lineage and developmental fates, and in a few cases the nature of developmental indeterminacy; we know, at the electron microscopy (EM) level, how most neurons are connected to each other [1,15]; we know the function of many cells in development or in mature animals; and we have detailed functional and morphological information on anatomy (generally reviewed in [6,16]). Thus, a C. elegans cell and anatomy ontology (CECAO) would be useful to organize all this information.

C. elegans cells and anatomy
C. elegans is a free-living soil nematode that feeds on bacteria in the laboratory. It is small (about 1 mm long and 80 µm in diameter when fully grown) and semitransparent. It has two sexes: a self-fertilizing hermaphrodite (morphologically female) and a male. C. elegans has seven major developmental stages: embryo, four larval stages (L1-L4), reproductive adult, and a dispersal dauer larval stage that is alternative to L3 and takes place only under certain harsh living conditions.
Like other nematodes, C. elegans has a general body plan that is made up of two concentric tubes separated by a pseudocoelom. The outer tube consists of a single-cell-layer epidermis, four body-wall muscle quadrants, and neurons. The inner tube contains the alimentary system, comprising (in anterior to posterior order) the pharynx, intestine and hindgut. Adults also have gonads in the pseudocoelomic space. The most complex organ system in the worm is the nervous system, which is organized in several ganglia. Neurons make contacts en passant and the major nerve bundles form the circumpharyngeal nerve ring and the dorsal and ventral cord (reviewed by [16]).
Because of its transparency, one can observe the process of C. elegans development in vivo with the aid of differential interference contrast (Nomarski) optics. Thus, Sulston et al. [13] painstakingly traced the full embryonic cell lineage, from single-cell zygotes to either 558-cell hermaphrodite or 560-cell male hatchlings. During embryogenesis, cells divide, migrate, differentiate and sometimes die. Confirming what others had noted (reviewed by [2]), Sulston et al. [13] found that the nematode follows a largely invariant cell lineage pattern, an important feature that enables the study of development via lineage analysis.
During post-embryonic development, somatic blast cells continue to divide and differentiate through larval stages, so that a mature hermaphrodite has 959 somatic nuclei, whereas an adult male has 1031 [16]. Post-embryonic cell lineages have been traced by Sulston and Horvitz (focus on somatic cells [12]) and by Kimble and Hirsh (focus on gonads [4]). Although most cell fates are rigid and invariable, there are a few exceptions, e.g. in the hermaphrodite lineages that ultimately form the somatic gonad, there are two alternative lineages involving the blast cells Z1.ppa, Z1.ppp, Z4.aaa, and Z4.aap (cells are named by their lineage; a, anterior; p, posterior; such that Z1.p is the posterior daughter of Z1). The two alternative lineage patterns are related by a two-fold rotational symmetry [4].
Although useful in lineage tracing experiments, Nomarski optics is insufficient for working out cell-cell contacts and some subcellular details. Using EM and reconstruction of serial thin sections, fine anatomical and cellular details have been delineated for many parts of the worm, including the anterior sensory organ [14], the pharynx [1], and the male sexual organs [11]. A particularly heroic body of work is the reconstruction of the entire nervous system from electron micrographs by White et al. [15], which provides an anatomical sketch of how neurons are connected to each other, to muscle and to other postsynaptic partner cells. Based on the anatomical analyses, neurons are grouped into classes, such that members of each class share similar anatomy and thus may also perform similar functions. Using the green fluorescent protein (GFP) labelling technique, one can now also observe subcellular anatomical features in live animals by light microscopy.
C. elegans researchers have been taking advantage of this deep knowledge of cells as the basis for experimental analyses. Consequently, C. elegans research is very much rooted in the knowledge of cells and anatomy, e.g. gene expression is routinely annotated to specific cells in addition to tissues; specific effects on cell lineages and fates are analysed for genetic and physical manipulations (such as laser microsurgery); and proposed neuronal pathways are tested for their roles in mediating specific animal behaviours.

Design of CECAO
The objective of CECAO is to provide an ontology that contains all the information about C. elegans cells and anatomy, so that the information can be parsed by computer programs. However, we would also like an ontology of controlled vocabularies that readily supports annotation of experimental results, such that the outcomes can be queried effectively, e.g. we would like the ontology to support complex queries such as, 'Which genes are expressed in the lineage parents of pharyngeal, but not of somatic sensory, neurons?', or 'Which cholinergic neurons are in the tail region of the male?' Our ontology will consider five major aspects: cell lineage, position, cell type, organ and function. A cell can be identified by one or more of these aspects, e.g. ASHL is the cell whose formation follows the lineage AB.plpaappaa, whose nucleus is in the left lateral ganglion, which is a neuron and part of the amphid sensilla, and which senses touch, high levels of osmolarity and other forms of noxious stimuli. The complex nature of this range of information precludes a simple, hierarchical tree format; instead, a more complex data structure is needed, i.e. a directed acyclic graph.
As we began to build this CECAO, we realized that we needed to apply new strategies to achieve logical consistency and to be able to represent all knowledge of C. elegans cells and anatomy. Here we discuss a few examples.
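As an illustration of why a directed acyclic graph is needed rather than a tree, the following is a minimal Python sketch in which ASHL has one parent per aspect. The relation labels, the `relate` and `ancestors` helpers, and the two higher-level edges are assumptions made for this example; they are not CECAO's actual vocabulary or implementation.

```python
from collections import defaultdict

# Typed edges: child term -> list of (relation, parent term).
# Relation labels are illustrative only, not CECAO's real relationship names.
edges = defaultdict(list)


def relate(child, relation, parent):
    """Record one typed edge from a child term to a parent term."""
    edges[child].append((relation, parent))


# ASHL described along the five aspects named in the text.
relate("ASHL", "has lineage", "AB.plpaappaa")                    # cell lineage
relate("ASHL", "has nucleus in", "left lateral ganglion")        # position
relate("ASHL", "is a", "neuron")                                 # cell type
relate("ASHL", "part of", "amphid sensilla")                     # organ
relate("ASHL", "has function", "sensing of noxious stimuli")     # function

# Two assumed higher-level edges, added only to give the graph some depth.
relate("neuron", "is a", "cell")
relate("left lateral ganglion", "part of", "nervous system")


def ancestors(term):
    """Return every term reachable from `term` by following edges upward."""
    seen = set()
    stack = [term]
    while stack:
        node = stack.pop()
        for _relation, parent in edges.get(node, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


print(sorted(ancestors("ASHL")))
```

Because ASHL has five direct parents, one in each aspect, no single-parent tree can hold all of these statements at once; a graph that allows multiple parents can.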

The distinction between a cell and its nucleus
• Lineage: Although usually referred to as the 'cell lineage', a lineage determined by observation with Nomarski optics primarily concerns nuclei. A nucleus divides to give rise to two nuclei. Whereas the identity of a cell is often established by a set of properties, a nucleus has a defined parentage and thus a precise position in a lineage. Therefore, CECAO uses nuclei to define nodes in lineages. A child nucleus has a DESCENDENT OF relationship with the parent nucleus.
• Syncytia: As in other metazoans, some cells in the worm are syncytia. A syncytium is usually the product of either cell fusions or incomplete cell divisions that result in a cell with multiple nuclei. Thus, in CECAO, we define a node for each nucleus, and it has a PART OF relationship with the syncytium that contains it.
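The two rules above can be sketched in a few lines of Python. This sketch assumes that a nucleus is named by its lineage (a blast cell name, a dot, then one letter per division, as in Z1.ppa), so that the parent in a DESCENDENT OF edge can be read off the name. The function names and the syncytium entries are hypothetical placeholders, not CECAO terms.

```python
def parent_nucleus(name):
    """Return the lineage name of the parent nucleus.

    Returns None when the name does not encode a parent, e.g. a blast
    cell name such as 'Z1' that carries no division letters.
    """
    blast, dot, path = name.partition(".")
    if not dot or not path:
        return None
    if len(path) == 1:
        return blast
    return blast + "." + path[:-1]


def descendent_of_chain(name):
    """List the nucleus followed by each successive DESCENDENT OF parent."""
    chain = [name]
    parent = parent_nucleus(name)
    while parent is not None:
        chain.append(parent)
        parent = parent_nucleus(parent)
    return chain


# PART OF edges from nuclei to the syncytium that contains them.
# These names are placeholders, not real C. elegans nuclei or syncytia.
part_of = {
    "nucleus_1": "example_syncytium",
    "nucleus_2": "example_syncytium",
}


def nuclei_in(syncytium):
    """Return the nuclei that have a PART OF edge to the given syncytium."""
    return sorted(n for n, s in part_of.items() if s == syncytium)


print(descendent_of_chain("Z1.ppp"))
print(nuclei_in("example_syncytium"))
```

Keeping one node per nucleus lets a multinucleate cell appear once in the cell branch while each of its nuclei keeps its own position in the lineage.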

Sexual dimorphism
C. elegans has two sexes: hermaphrodite and male.

Current progress and future plans
We have been constructing CECAO using the DAG-Edit tool provided by GO (http://sourceforge.net/project/showfiles.php?group_id=36855). We have imported sets of data from available resources (Anatace, developed by Sylvia Martinelli; and the 'parts list' provided by Leon Avery; personal communications) into DAG-Edit and manually reorganized the nodes by applying rules such as those mentioned above. We currently have an ontology with about 5000 nodes, one-third of which have definitions. We do not yet know precisely how many nodes there will be in the complete ontology. There are currently 3000 cell and 500 cell group terms in WormBase (incorporating data from Anatace). There are also 80 separate lineage trees, with a total of 6000 nodes. Thus, we estimate that CECAO needs to reconcile 15 000 relationships, assuming that each cell, on average, has five edges.

Figure 1. Schematic of a C. elegans lineage indeterminacy that occurs during the development of hermaphrodite somatic gonads (described by [4]), and its representation in the CECAO ontology. (a) Partial and simplified depictions of two alternative developmental lineage patterns (5R and 5L) that occur in a C. elegans hermaphrodite (described in detail by [4]). Ovals in the top and bottom parts represent two possible arrangements of nuclei that may be found in the somatic gonad primordium of a young animal. During larval development, one of the two arrangements, 5R (top) and 5L (bottom), takes place. This process is stochastic. However, once one pattern of nuclear and cell arrangement forms, the developmental process that ensues is fully determined, following either the 5R or 5L lineage pattern (represented by two trees with leaf nodes facing each other). Thus, nuclei that are found in the 5R arrangement will all develop according to the 5R lineage pattern as a group, whereas those in the 5L arrangement will all develop according to the 5L lineage pattern as a group. The 5R and 5L patterns are mutually exclusive in an animal and depend on highly reproducible cell-cell interactions. In this way, a full complement of 37 nuclei (in different cells) in the mature animal is ensured, e.g. Z1.ppp does not divide but becomes the nucleus of the anchor cell (AC nucleus) in 5R, whereas it generates 10 progenitors in the 5L lineage pattern. In contrast, Z4.aaa generates 10 nuclei in 5R, but is destined to become part of the anchor cell in 5L. Dotted arrows connect nuclei with their respective lineages. A dashed line connects the leaf nodes that lead to the anchor cell in the 5R and 5L lineage patterns, respectively. (b) A directed-acyclic graph view of CECAO showing parts of the ontology relevant to the Z1.ppx/Z4.aax lineage indeterminacy depicted in (a), from the perspective of the anchor cell nucleus (AC nucleus). Following from leaf nodes up, the graph shows that the node 'AC nucleus' is part of 'anchor cell' (in the 'Cell' branch) and develops from either 'Z4.aaa(5L)' or 'Z1.ppp(5R)' (in the 'Lineage' branch). 'Z4.aaa(5L)', in turn, develops from 'Z1.ppx/Z4.aax(5L)'. We use the node 'Z1.ppx/Z4.aax' to represent the indeterminate state, and 'Z1.ppx/Z4.aax(5R)' and 'Z1.ppx/Z4.aax(5L)' to represent an 'equivalence group', from which one path will be chosen for further development. 'Z1.ppx/Z4.aax(5R)' and 'Z1.ppx/Z4.aax(5L)' represent states of development corresponding to the 5R and 5L nuclear arrangement shown in (a), respectively. Each of 'Z1.ppx/Z4.aax(5L)' and 'Z1.ppx/Z4.aax(5R)' develops from 'Z1.ppx/Z4.aax', which is a top-level node in the Lineage branch of the ontology. A triangle represents the relationship 'descendent of'; triangle-H, 'descendent of in hermaphrodite only'; D, 'develops from'; P, 'part of'; and I, 'is a'.
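The 'develops from' edges described for Figure 1(b) can be sketched as a small graph in Python. The node names are taken from the caption; the edge from 'Z1.ppp(5R)' to 'Z1.ppx/Z4.aax(5R)' is inferred by symmetry with the 5L case rather than stated explicitly, and the dictionary layout and function name are assumptions made for this example.

```python
# 'develops from' edges: node -> list of nodes it may develop from.
develops_from = {
    "AC nucleus": ["Z4.aaa(5L)", "Z1.ppp(5R)"],
    "Z4.aaa(5L)": ["Z1.ppx/Z4.aax(5L)"],
    # Assumed by symmetry with the 5L case; not stated in the caption.
    "Z1.ppp(5R)": ["Z1.ppx/Z4.aax(5R)"],
    "Z1.ppx/Z4.aax(5L)": ["Z1.ppx/Z4.aax"],
    "Z1.ppx/Z4.aax(5R)": ["Z1.ppx/Z4.aax"],
}


def paths_to_root(node):
    """Enumerate every 'develops from' path from a node up to a top-level node."""
    parents = develops_from.get(node, [])
    if not parents:
        return [[node]]
    paths = []
    for parent in parents:
        for rest in paths_to_root(parent):
            paths.append([node] + rest)
    return paths


for path in paths_to_root("AC nucleus"):
    print(" -> ".join(path))
```

The two paths correspond to the two mutually exclusive lineage patterns, and both end at the single node that stands for the indeterminate state.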
Given the scale of this project, we started by developing an ontology for groups of cells that are of particular immediate use in WormBase. In particular, we have been focusing on supporting the annotation of gene expression and other cell-based experiments, e.g. we have ontologized information about cells in the pharynx, the feeding organ of the worm. From the bottom up, pharyngeal cells are grouped by anatomical location and by cell type, each having multiple layers of complexity (Figure 2). By using the pharynx ontology to annotate gene expression patterns, we will easily be able to support queries such as, 'Which genes are expressed in pharyngeal neurons whose nuclei are in the corpus, but not in the terminal bulb region?'. WormBase already has 2000 gene expression analyses annotated to 1530 cell and cell group terms. These are sufficient samples with which to test our prototype ontologies.
One important function of model organism ontologies should be to allow comparisons with ontologies of other organisms; CECAO is currently lacking comparative anatomy. We are collaborating with Worm Atlas (http://wormatlas.org), a project headed by David Hall that will provide an online encyclopedia of C. elegans cells and anatomy, to construct CECAO with a top-down approach. In addition, we are also joining forces with other model organism databases (MODs) to come up with a set of shared controlled vocabularies.
Cell lineage and other aspects of development implicitly contain temporal information, e.g. because we know when each and every cell divides, we can know how many cells there are at a given time in development. Wen Chen has constructed a C. elegans life stage ontology that relates the total number of cells to defined life stages (personal communication). In the future, we will merge the life stage ontology with CECAO.
Lastly, CECAO will support extension to other nematodes. Many nematodes are known to also have mostly invariant anatomy and cell lineage; however, the lineages differ from those of C. elegans (e.g. [3,8,9,13]). CECAO can be extended to include comparative developmental information, and thus support queries across nematode species.
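To make the example query concrete, here is a minimal Python sketch of how it could be answered once expression patterns are annotated to ontology terms. All cell names, gene names and annotations below are invented placeholders, and the reading of the query (expressed in at least one corpus neuron and in no terminal bulb neuron) is one possible interpretation, not a description of how WormBase evaluates it.

```python
# Hypothetical pharyngeal neurons, keyed by the region holding their nucleus.
nucleus_region = {
    "neuron_A": "corpus",
    "neuron_B": "terminal bulb",
    "neuron_C": "corpus",
}

# Hypothetical expression annotations: gene -> set of cells it is expressed in.
expressed_in = {
    "gene-1": {"neuron_A"},
    "gene-2": {"neuron_A", "neuron_B"},
    "gene-3": {"neuron_B"},
    "gene-4": {"neuron_C"},
}


def neurons_in(region):
    """Return the pharyngeal neurons whose nuclei lie in the given region."""
    return {cell for cell, where in nucleus_region.items() if where == region}


def genes_in_first_not_second(include_region, exclude_region):
    """Genes expressed in some neuron of one region and in none of the other."""
    include = neurons_in(include_region)
    exclude = neurons_in(exclude_region)
    return sorted(
        gene
        for gene, cells in expressed_in.items()
        if cells & include and not cells & exclude
    )


print(genes_in_first_not_second("corpus", "terminal bulb"))
```

The grouping by region comes from the ontology and the gene-to-cell mapping comes from annotation, so the query reduces to set operations over the two.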

Figure 2. A directed-acyclic graph view showing the multiple relations involving the pharyngeal nucleus MS.paapaaa in the CECAO ontology. MS.paapaaa is represented in three major threads: Lineage, Organ and Cell Type. A triangle represents the relationship 'descendent of'; P, 'part of'; and I, 'is a'.