Yeast Two-Hybrid Systems and Protein Interaction Mapping Projects for Yeast and Worm

The availability of complete genome sequences necessitates the development of standardized functional assays to analyse the tens of thousands of predicted gene products in high-throughput experimental settings. Such approaches are collectively referred to as ‘functional genomics’. One approach to investigate the properties of a proteome of interest is by systematic analysis of protein–protein interactions. So far, the yeast two-hybrid system is the most commonly used method for large-scale, high-throughput identification of potential protein–protein interactions. Here, we discuss several technical features of variants of the two-hybrid systems in light of data recently obtained from different protein interaction mapping projects for the budding yeast Saccharomyces cerevisiae and the nematode Caenorhabditis elegans.


Introduction
Several (near) complete genome sequences of model organisms have recently been released. Annotation of these genomes has led to the prediction of ~4000 protein-encoding open reading frames (ORFs) for Escherichia coli, ~6000 ORFs for the yeast S. cerevisiae, ~13 000 ORFs for Drosophila melanogaster and ~19 000 for C. elegans [1,3,7,13]. The human genome sequence is anticipated this year and is expected to lead to the prediction of more than 100 000 ORFs [6,26]. By themselves, such predicted ORF sequences offer little information about the function of their predicted protein products.
Comparative genomics can be used to annotate the function of large numbers of predicted and previously uncharacterized gene products. For example, a comparison between the complete set of predicted ORFs in S. cerevisiae and C. elegans resulted in the definition of orthologues for 2497 yeast ORFs and 3653 worm ORFs, respectively (E < 10⁻¹⁰) [4]. In addition, half of the predicted fly proteins show significant homology to mammalian proteins (E < 10⁻¹⁰) [22]. However, comparative genomics is not applicable to the many predicted gene products for which no recognizable conserved domains can be detected using current BLAST methods.
In order to accelerate functional annotations for such predicted gene products, several functional genomics projects need to be initiated. Such projects utilize standardized functional assays to analyse large sets of genes/proteins simultaneously. One of the most established functional genomics approaches used so far is expression profiling using DNA chips or microarrays [19,23]. For example, this technique has been used to annotate genes in yeast whose transcription changes during sporulation [5]. Even though expression-profiling experiments yield a wealth of information on gene expression, they offer little annotation for the protein complement, or proteome, of an organism. Hence, large-scale analyses of gene products (proteins) are required to add a level of complexity to data obtained from gene (DNA) analyses. Ultimately, data obtained from different functional genomics strategies should be integrated to allow the formulation of meaningful hypotheses [28; Walhout and Vidal, Protein interaction maps for model organisms. Trends Biochem Sci, in preparation].

Interaction-detection techniques
Most proteins require physical interactions with other proteins to fulfil their biological role. Therefore, it has been proposed that functional annotations for proteomes can be obtained by systematically identifying potential protein–protein interactions [18]. Several commonly used protein–protein interaction detection techniques have been described, including co-immunoprecipitation, glutathione-S-transferase (GST) pull-down experiments and yeast two-hybrid analyses (Figure 1). By co-immunoprecipitation (Figure 1A), endogenously interacting proteins can be purified using specific antibodies [14]. Protein–protein interactions observed using this method do not necessarily have to be direct but are deemed relevant interactions in vivo. For example, protein X may indirectly associate with Z through a bridging protein Y (Figure 1A). Although interaction data obtained from co-immunoprecipitation experiments are likely to be biologically relevant, it is not yet feasible to perform such experiments on a proteome-wide scale, since it remains technically challenging to generate antibodies for each predicted protein. For GST pull-down experiments, proteins are exogenously expressed as GST-fusion proteins (GST-X) and purified on glutathione-agarose (GA) beads (Figure 1B) [15]. In general, purified GST-X is subsequently incubated with cellular extracts and complexes are purified using GA beads. Both biochemical approaches make use of protein separation by gel electrophoresis and determination of associated protein identity by mass spectrometry [20,24]. Recently, the (near) complete yeast proteome was fused to GST and used in a biochemical approach, in which proteins were annotated by their associated enzymatic activities [21]. However, such a set of GST-X fusion proteins has not yet been used for large-scale protein interaction mapping.
Figure 1. (A) Co-immunoprecipitation. (B) GST pull-down. (C) Yeast two-hybrid system. In the two-hybrid system (Figure 1C), protein X is fused to the DNA binding domain (DB) of a transcription factor and the potential interaction partners are fused to a transcription activation domain (AD-Y). Upon an interaction between X and Y, a functional transcription factor is reconstituted that can activate the expression of specific reporter genes. In both co-immunoprecipitations and GST pull-down assays, multiple proteins can be precipitated that do not necessarily have to interact directly with the bait protein X. Here, an example is shown in which protein Y binds directly to X. Protein Z is precipitated with anti-X antibodies or GST-X via its interaction with protein Y. Theoretically, bridging proteins could also facilitate interactions in the context of the yeast two-hybrid system. However, in this case, the identity of such proteins remains elusive.
In the yeast two-hybrid system [8] (Figure 1C), a protein of interest, X, is fused to the DNA binding domain (DB) of a transcription factor, such as Gal4p. The second hybrid protein, Y, is fused to a transcriptional activation domain (AD). A physical interaction between X and Y results in the reconstitution of a functional transcription factor that can activate expression of reporter genes. Usually, reporter genes that allow growth selection on specific media are used. The two-hybrid system is carried out in vivo and only requires the manipulation of DNA. As a consequence, the two-hybrid system is more amenable to automation and can be used to analyse large sets of (predicted) proteins simultaneously. Recently, the yeast two-hybrid system has been used to initiate the generation of protein interaction maps for S. cerevisiae and C. elegans [16,25,29].

False positives and false negatives
One intrinsic caveat of the yeast two-hybrid system is the potential detection of spurious interactions that bear no biological significance. The occurrence of such false positives can be reduced by expressing the two hybrid proteins at low levels and by using multiple reporter genes driven by different promoters [27]. Because of the artificial nature of the two-hybrid system, interactions should be viewed as hypotheses until they are validated in the appropriate biological system. A number of proteins are frequently detected with multiple baits and may behave as notorious false positives. Although these proteins are themselves likely to interact with other proteins to fulfil their biological role, they should be treated with caution when found in any two-hybrid experiment.
In contrast to false positives, a number of reported interactions cannot be readily detected in the two-hybrid system and are therefore deemed false negatives. False negatives can be caused by different characteristics of the two-hybrid system. First, DB-X and/or AD-Y may fail to localize to the yeast nucleus. Second, X and/or Y may be unable to function within the context of a DB or AD fusion. Third, the interaction between X and Y may depend on post-translational modifications that are absent in yeast cells. Finally, it has been reported that a number of protein–protein interactions can only be detected in the two-hybrid system when either X or Y is truncated. Recently, we have estimated the percentage of false negatives in our two-hybrid system to be approximately 45% [29]. This suggests that large-scale two-hybrid analyses are useful to obtain partial coverage of protein–protein interactions within a proteome of interest. However, in order to attain (near) complete protein interaction contiguity, alternative large-scale approaches will have to be developed.

Two-hybrid variants
The two-hybrid system was initially developed to test known interactions between two proteins [8] (Figure 1C). Subsequently, it was applied as a method for the identification of novel potential protein–protein interactions, using cDNA libraries fused to AD (AD–cDNA) (Figure 2A). Often a single full-length DB-X bait protein of interest is used. Potential interaction partners are frequently retrieved as fragments, since AD–cDNA fusion libraries are generated by reverse transcription and therefore do not exclusively contain full-length ORFs. When working with model organisms for which a complete genome sequence is available, a single sequence tagging reaction is sufficient to identify the potential interactor. Thus we refer to potential interactions as ‘interaction sequence tags’ or ‘ISTs’. The AD–cDNA approach has the advantage of partially defining the region of protein AD-Y required for the interaction. Indeed, in many cases independently derived clones of the same ORF differ in length but share a common region required for the interaction. The detection of multiple overlapping fragments has been proposed as a criterion for the classification of interaction data [12]. This approach has been utilized on a larger scale for 27 C. elegans proteins involved in vulval development and resulted in the identification of 148 worm ISTs (Table 1) [29]. In addition, it has been used for yeast proteins involved in splicing, which resulted in the identification of 170 ISTs, and for the related Lsm proteins, which resulted in 263 ISTs (Table 1) [12].
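The idea that overlapping AD–cDNA fragments delimit a shared interaction region can be illustrated with a short sketch. It assumes each recovered clone has already been mapped to start and end residue positions on the same ORF; the coordinates and the function name `common_region` are hypothetical and are not taken from any of the cited projects.

```python
from typing import List, Optional, Tuple

# (start, end) residue positions on the ORF, 1-based and inclusive
Fragment = Tuple[int, int]


def common_region(fragments: List[Fragment]) -> Optional[Fragment]:
    """Return the region shared by every fragment, or None if there is none."""
    if not fragments:
        return None
    start = max(s for s, _ in fragments)  # latest start among the clones
    end = min(e for _, e in fragments)    # earliest end among the clones
    return (start, end) if start <= end else None


# Hypothetical AD-Y clones of one ORF, all recovered with the same DB-X bait
clones = [(1, 250), (80, 310), (120, 400)]
print(common_region(clones))  # (120, 250)
```

The shared region is only an upper bound on the interaction domain: the residues actually required may be a smaller stretch inside it.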
An alternative two-hybrid ‘matrix’ approach has recently been employed [29]. A matrix experiment utilizes large sets of full-length ORFs, often functionally related, that are each fused to DB and AD. Usually, DB-X and AD-Y are transformed into yeast of opposite mating types (MATa and MATα, respectively). Each pair-wise combination of DB-X and AD-Y is generated by mating and examined for two-hybrid interaction phenotypes (Figure 2B). This approach was initially used to identify protein–protein interactions between Drosophila proteins involved in cell cycle regulation [9]. Matrix experiments provide the advantage of knowing the identity of each DB-X and AD-Y pair, thus circumventing the need for sequencing. However, this strategy provides no information about the domain of the protein that confers interaction.
Figure 2. (A) In ‘classical’ two-hybrid screens, a single protein of interest is fused to DB and used to screen for potential interaction partners in an AD-Y cDNA library. (B) In a two-hybrid matrix experiment, proteins of interest are fused both to DB and to AD and expressed in yeast of opposite mating types. Subsequently, each pair-wise DB-X/AD-Y combination is generated by mating and tested for two-hybrid interaction phenotypes. Usually, the sequence of both X and Y has been determined prior to the two-hybrid experiment, circumventing the need for sequencing analysis. (C) The two-hybrid array approach resembles the matrix approach. However, large or complete sets of AD-Y expressing yeast strains are arrayed onto 96- or 384-well plates and mated with a yeast strain of the opposite mating type that expresses a single DB-X protein. Black squares indicate potential positives in which an interaction results in a selectable phenotype. (D) In the ‘mass mating’ approach, sets of DB-X proteins are pooled and mated with pools of AD-Y expressing yeast cells. ISTs are identified by plating diploids onto selective media, followed by sequencing of both the DB and AD inserts.
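To give a sense of how a matrix experiment scales, the following sketch enumerates every DB-X/AD-Y mating for a set of ORFs. The ORF names are hypothetical, and including self-pairs (the same ORF in both fusions) is an assumption of this sketch rather than something stated for the cited experiments.

```python
from itertools import product
from typing import List, Tuple


def matrix_pairs(orfs: List[str]) -> List[Tuple[str, str]]:
    """List every ordered DB-X/AD-Y combination, self-pairs included."""
    return [(f"DB-{x}", f"AD-{y}") for x, y in product(orfs, repeat=2)]


orfs = ["ORF1", "ORF2", "ORF3"]  # hypothetical ORF names
pairs = matrix_pairs(orfs)
print(len(pairs))  # 9
print(pairs[:3])   # [('DB-ORF1', 'AD-ORF1'), ('DB-ORF1', 'AD-ORF2'), ('DB-ORF1', 'AD-ORF3')]
```

Because each ORF is tested in both orientations, the number of matings grows with the square of the number of ORFs, which is why arrays and pooling become attractive at proteome scale.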
In order to perform matrix experiments on a proteome-wide scale, arrays have been generated containing each predicted protein fused to AD (AD-Y) in a 384-well format (Figure 2C) [25]. ISTs are identified by mating a single DB-X bait with the complete AD-Y array. This approach has been used for 192 DB-X yeast proteins and resulted in 281 interacting pairs (Table 1) [25].
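Because the identity of each AD-Y strain in an array is given by its position, a positive colony can be decoded without sequencing. The sketch below assumes strains are laid out row by row on standard 384-well plates (16 rows A to P, 24 columns); the layout order and the function names are assumptions made for illustration, not details of the published array.

```python
import string
from typing import Tuple

ROWS, COLS = 16, 24  # standard 384-well plate: rows A-P, columns 1-24


def well_position(index: int) -> Tuple[int, str]:
    """Map a 0-based strain index to (1-based plate number, well label)."""
    plate, offset = divmod(index, ROWS * COLS)
    row, col = divmod(offset, COLS)
    return plate + 1, f"{string.ascii_uppercase[row]}{col + 1}"


def plates_needed(n_strains: int) -> int:
    """Number of 384-well plates required to hold n_strains (ceiling division)."""
    return -(-n_strains // (ROWS * COLS))


print(well_position(0))     # (1, 'A1')
print(well_position(383))   # (1, 'P24')
print(well_position(384))   # (2, 'A1')
print(plates_needed(6000))  # 16
```

With roughly 6000 predicted yeast ORFs, a complete AD-Y array under these assumptions fits on 16 plates, each of which is mated once per DB-X bait.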
Another large-scale two-hybrid strategy that has been utilized recently is the mating of pools of DB-X and AD-Y fusion proteins and selection of potential interacting pairs on appropriate media (Figure 2D). In this approach, the identity of both X and Y has to be determined by sequencing. Two independent large-scale experiments using predicted yeast proteins resulted in 183 and 692 ISTs, respectively (Table 1) [16,25]. Data obtained from large-scale two-hybrid experiments are usually disseminated to the community via the Internet (http://www.pnas.org, http://portal.curagen.com and http://www.vidal.dfci.harvard.edu).

Making sense of interaction data
In order to ‘make sense’ of the wealth of protein interaction data that has been released, several interaction classification strategies have been proposed. First, protein–protein interactions can be systematically compared between different model organisms. It is believed that conserved interactions, or ‘interologs’ (Figure 3A), have a higher likelihood of being biologically relevant. Several interologs have been described for interactions involving vulval proteins in C. elegans [29]. For example, a number of interactions in the Ras/MAP kinase pathway, previously reported between the mammalian proteins, were also found with the orthologous C. elegans proteins [29].
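The interolog comparison can be sketched as a lookup through an orthologue table. All protein identifiers below are hypothetical placeholders, and the function `find_interologs` is an illustration under the assumption that a one-to-one orthologue mapping is available; it is not the procedure used in the cited work.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def find_interologs(ists_a: List[Pair], ists_b: List[Pair],
                    orthologs: Dict[str, str]) -> List[Pair]:
    """Return ISTs from organism A whose orthologous pair is also an IST in organism B."""
    # Interactions are treated as unordered, so (x, y) matches (y, x).
    known_b = {frozenset(pair) for pair in ists_b}
    hits = []
    for x, y in ists_a:
        if x in orthologs and y in orthologs:
            if frozenset((orthologs[x], orthologs[y])) in known_b:
                hits.append((x, y))
    return hits


# Hypothetical worm ISTs, mammalian interactions and orthologue assignments
worm_ists = [("w1", "w2"), ("w2", "w3"), ("w4", "w5")]
mammal_interactions = [("m2", "m1"), ("m7", "m8")]
orthologs = {"w1": "m1", "w2": "m2", "w3": "m3"}

print(find_interologs(worm_ists, mammal_interactions, orthologs))  # [('w1', 'w2')]
```

Pairs in which either protein lacks an orthologue are skipped rather than counted as non-conserved, since absence of an orthologue says nothing about the interaction itself.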
Second, analysis of protein interaction maps has led to the identification of interaction clusters. An interaction cluster can be viewed as a circular contig of protein interactions (Figure 3B). For example, X binds to Y, Y binds to Z, Z binds to W and W itself binds to X. Proteins found together within an interaction cluster may be part of a protein complex and therefore are more likely to function in a common process. For example, proteins comprising molecular machines, such as components of the RNA polymerase II holoenzyme [17], are likely to be found in a yeast two-hybrid interaction cluster.
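Viewed as a graph problem, finding interaction clusters amounts to finding closed loops in an undirected network of ISTs. The following is a minimal sketch using a depth-first search; the size limit `max_size` and the choice to report each cluster as a set of proteins are assumptions of this example, and the search is only practical for small networks.

```python
from collections import defaultdict
from typing import FrozenSet, Iterable, Set, Tuple


def find_clusters(interactions: Iterable[Tuple[str, str]],
                  max_size: int = 6) -> Set[FrozenSet[str]]:
    """Enumerate closed loops (simple cycles of >= 3 proteins) in an undirected IST graph."""
    graph = defaultdict(set)
    for a, b in interactions:
        if a != b:  # ignore self-interactions, they cannot close a loop
            graph[a].add(b)
            graph[b].add(a)

    cycles: Set[FrozenSet[str]] = set()

    def extend(path):
        start, last = path[0], path[-1]
        for nxt in graph[last]:
            if nxt == start and len(path) >= 3:
                cycles.add(frozenset(path))  # loop closed back to the start
            elif nxt not in path and nxt > start and len(path) < max_size:
                # only visit names sorting after the start, so each loop
                # is explored from its smallest member alone
                extend(path + [nxt])

    for node in sorted(graph):
        extend([node])
    return cycles


# The example from the text, plus a protein V that binds W but closes no loop
ists = [("X", "Y"), ("Y", "Z"), ("Z", "W"), ("W", "X"), ("W", "V")]
print([sorted(c) for c in find_clusters(ists)])  # [['W', 'X', 'Y', 'Z']]
```

Protein V is left out of the cluster because it is attached by a single interaction and does not lie on any closed loop.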

Where are we now?
The different protein interaction mapping projects published previously have generated relatively high numbers of ISTs. However, it is important to estimate the extent of coverage obtained from these different projects. It is widely accepted that most, if not all, proteins require interactions with other polypeptides to mediate their function. However, it has so far remained difficult to estimate the average number of interactors per protein in any proteome. The average number of ISTs found per bait for the different projects described here is approximately [...] (Table 1). Together, these observations indicate that the different two-hybrid strategies are complementary, but do not yield identical results. In addition, they show that the screening has not been performed to saturation, emphasizing the need for more of these experiments.
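As a worked illustration of the per-bait figure, the ratio can be computed for the two projects for which both the number of baits and the number of ISTs are quoted in the text above. The function name `ists_per_bait` is hypothetical, and the results are simple ratios of the quoted numbers, not values taken from Table 1.

```python
def ists_per_bait(baits: int, ists: int) -> float:
    """Average number of ISTs recovered per bait protein."""
    return ists / baits


# (number of baits, number of ISTs) as quoted in the text
projects = {
    "C. elegans vulval proteins, AD-cDNA screens [29]": (27, 148),
    "S. cerevisiae baits against the AD-Y array [25]": (192, 281),
}

for name, (baits, ists) in projects.items():
    print(f"{name}: {ists_per_bait(baits, ists):.1f} ISTs per bait")
# C. elegans vulval proteins, AD-cDNA screens [29]: 5.5 ISTs per bait
# S. cerevisiae baits against the AD-Y array [25]: 1.5 ISTs per bait
```

The difference between the two ratios is in line with the observation that library screens recover more ISTs per bait than array-based screens, although the bait sets and organisms differ.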

Conclusion
The data summarized above demonstrate that independent two-hybrid screens (Figure 2A) yield the largest numbers of ISTs. However, such screens are very time-consuming and elaborate. In contrast, the array (Figure 2C) and mass mating (Figure 2D) yeast two-hybrid strategies can be carried out at a higher throughput, but the rate of false negatives seems relatively high. ISTs merely represent hypotheses that should be validated in the appropriate biological system. Therefore, strategies for classification of ISTs have been developed to prioritize which ISTs are most likely to be biologically relevant. Furthermore, it should be noted that even if all different two-hybrid approaches were performed until saturation was achieved, not every existing protein–protein interaction would be detected. Therefore, future protein interaction maps should comprise a compilation of data obtained from different large-scale projects, including two-hybrid screens.
Figure 3. (A) When protein–protein interactions are conserved throughout evolution, i.e. they are detected in different model organisms, they are most likely to be biologically relevant. We refer to such interactions as ‘interologs’. Using databases such as WormPD and YPD (http://www.proteome.com), ISTs can be compared between yeast and worms. (B) Interaction clustering provides a classification of ISTs that are more likely to be relevant in vivo. An IST cluster is identified as a contig of interactions. For example, when X binds to Y, which binds to Z, which binds to W, and if W binds to X, the four proteins form an IST cluster.