Creating Hybrid Genes by Homologous Recombination

Recombination of homologous genes is a powerful mechanism for generating sequence diversity, and can be applied to protein analysis and directed evolution. In vitro recombination methods such as DNA shuffling are very flexible and can give hybrid genes with multiple crossovers; they have been used extensively to evolve proteins with improved and novel properties. In vivo recombination in both E. coli and yeast is greatly enhanced by double-strand breaks; for E. coli, mutant strains are often necessary to obtain high efficiency. Intra- and inter-molecular recombination in vivo have distinct features; both give hybrids with one or two crossovers, and have been used to study structure-function relationships of many proteins. Recently, in vivo recombination has been used to generate diversity for directed evolution, creating a large phage display antibody library. Recombination methods will become increasingly useful in light of the explosion in genomic sequence data and potential for engineered proteins.


Introduction
Hybrids created by recombination of homologous genes benefit from the accumulated experience of evolution. Although homologs have diverged in sequence and often in function, their differences must be compatible with the same protein fold. Thus hybrid genes allow a large jump in "sequence space" that is much more likely to give a correctly folded protein than a similar jump due to random mutations. On a less radical scale, once individual random mutations have been vetted by selection, recombination of these is more likely to give functional proteins than new random mutations. Evolution can proceed faster with than without recombination (here meaning homologous recombination, unless qualified).
These principles apply equally to genetic change in nature or directed by man. Recombination is pervasive throughout nature and for sexual organisms is obligatory at each generation. However, creation of hybrid genes is less common than recombination between alleles; in the vast majority of cases this is likely to be deleterious [1], even though functional proteins may result. Examples from human genetics include recombination between δ and β globin genes to give hemoglobin Lepore [2], and recombination between red and green visual pigment genes resulting in color-blindness [3].
When rapid change is at a premium, as in the competition between host and parasite, recombination can be adapted to produce hybrid genes at very high rates. Chickens (and also rabbits, pigs, and cows) generate antibody diversity by gene conversion, a recombination process that copies bits of sequence from multiple pseudogenes into the single active antibody locus to create many different antibody genes [4]. A strikingly similar process is used by the bacterium Borrelia burgdorferi during Lyme disease infection: mosaic versions of the antigenic surface lipoprotein vlsE are generated by gene conversion of the active vlsE locus with an array of silent vls cassettes [5,6]; switching between antigenic variants allows evasion of the host immune response. Recombination is also used for antigenic variation by other pathogens including Borrelia hermsii, Neisseria, and trypanosomes [7].
Just as nature finds recombination a useful tool, so does the molecular biologist. This review is about creating hybrid genes using in vitro recombination and in vivo recombination in Escherichia coli and the yeast Saccharomyces cerevisiae, those organisms in which recombination is best understood. There are two main reasons for creating hybrid genes. The first is analysis: hybrids are useful for studying protein function because they access genetic differences yet are likely to maintain three-dimensional structure. The second is directed evolution: recombination is an efficient way to create genetic variants that can be selected for improved or novel functions. Different recombination methods have advantages and disadvantages for these two types of applications.

Disease Markers 16 (2000) 3-13. ISSN 0278-0240 / $8.00 © 2000, IOS Press. All rights reserved.

In vitro recombination
Recombination in vitro was noted early in the history of PCR and was attributed to incomplete extension and re-annealing [8]. The percentage of PCR recombinants could be increased by shortening the extension time per cycle, and it was suggested that this might be useful for producing populations of hybrid DNAs [9]. DNA with breaks and other damage is particularly prone to PCR-induced recombination [10]. Clearly, the ability to undergo recombination is a direct consequence of the complementary structure of DNA.
Stemmer demonstrated the usefulness of in vitro recombination for directed evolution with his technique of "DNA shuffling" [11,12], also dubbed "sexual PCR" [13]. A gene is cut into small fragments with DNase I, and then re-assembled in a PCR-like process that also tends to introduce point mutations (Fig. 1(a)). The resulting products are subjected to selection or screening, and the survivors undergo another round of DNA shuffling. The first round is no different from error-prone PCR, but in subsequent rounds the evolutionary speed-up expected from recombination comes into play. A further important advance was to begin the shuffling process with a set of homologous genes ("family shuffling") rather than a single gene, tapping into sequence diversity derived from natural evolution [14].
Related methods not involving DNA degradation have been described. Random-priming in vitro recombination (RPR) creates the short fragments needed for re-assembly by primer extension of random oligonucleotides (Fig. 1(b)) [15]. Staggered extension process (StEP) comprises cycles of denaturation and polymerase extension where the amount of extension per cycle is deliberately kept short by using short times and lower temperature [16,17]. The gradually extending chains have an opportunity to switch templates with each cycle (Fig. 1(c)). Certain features are common to all these in vitro methods. Mutations are likely to arise because of the extensive use of in vitro polymerization. The usual protocols are intended to produce many crossovers; single crossovers can be favored by altering parameters (such as fragment size or cycling conditions), but this will also increase the percentage of non-recombinants. Similarly, as one recombines increasingly divergent genes, re-assembly of identical sequences will become strongly favored and the percentage of non-recombinants will increase.
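The trade-off between crossover number and non-recombinants can be illustrated with a toy simulation of StEP-style template switching. This is a minimal sketch, not a model of the real chemistry: it assumes two pre-aligned parents of equal length, a fixed extension length per cycle, and a purely random choice of template at every cycle. It ignores annealing, sequence identity, and polymerase errors. The names `step_recombine` and `count_crossovers` and the placeholder parent sequences are hypothetical.

```python
import random


def step_recombine(parents, extension, rng):
    """Toy StEP: extend one chain by `extension` bases per cycle.

    Assumption: parents are pre-aligned strings of equal length, and the
    growing chain picks a template at random at the start of every cycle.
    Returns the product and, for each position, the index of its template.
    """
    length = len(parents[0])
    if any(len(p) != length for p in parents):
        raise ValueError("parents must be aligned to the same length")
    if extension < 1:
        raise ValueError("extension must be at least 1")
    pieces, origins = [], []
    pos = 0
    while pos < length:
        template = rng.randrange(len(parents))  # chance to switch template
        end = min(pos + extension, length)
        pieces.append(parents[template][pos:end])
        origins.extend([template] * (end - pos))
        pos = end
    return "".join(pieces), origins


def count_crossovers(origins):
    """Number of positions where the template of origin changes."""
    return sum(1 for a, b in zip(origins, origins[1:]) if a != b)


if __name__ == "__main__":
    rng = random.Random(1)
    # Placeholder parents, chosen only so that the origin of each base is visible.
    parent_a, parent_b = "A" * 60, "G" * 60
    for extension in (5, 20, 60):
        product, origins = step_recombine([parent_a, parent_b], extension, rng)
        print(extension, count_crossovers(origins), product)
```

In this sketch, a longer extension per cycle gives fewer opportunities to switch template, so products have fewer crossovers and are more often identical to a parent, which mirrors the behaviour described above.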
In vitro recombination has been applied to the evolution of a wide range of proteins [18,19]. The majority of studies have shuffled only point mutations; the potential of family shuffling is just beginning to be explored. Shuffling of four cephalosporinases gave greatly increased activity against moxalactam [14]; shuffling of two biphenyl dioxygenases gave novel substrate ranges [20]; shuffling of two herpes thymidine kinases gave increased activity on zidovudine (AZT) [21]; shuffling of >20 human interferons gave better activity towards mouse cells [22]; and shuffling of 26 different subtilisin proteases gave combinations of improved properties: thermostability, solvent tolerance, and activity at acid or basic pH [23].
In vitro recombination has clearly been extremely useful for directed evolution. It can generate hybrids with many crossovers and it is very flexible. It should be very compatible with completely in vitro selection technologies such as ribosome display [24], RNA display [25], and man-made compartments [26]. On the other hand, for analysis of protein function the high mosaicism and mutations generated by in vitro methods complicate interpretation of results and increase the number of clones that need to be tested. They will probably be most useful when a selection or rapid screen exists for the function of interest.

Mismatch repair
An interesting partially in vivo approach relies not on recombination machinery but on mismatch repair. DNAs are constructed that are heteroduplex for the genes of interest; after transformation, mismatches are corrected in vivo (Fig. 2). If there is no repair, or if repair is concerted, only parental genes will be recovered. If only some mismatches are repaired or if not all repairs use the same strand as template ("independent repair"), then hybrid genes will result. The predominant mismatch repair pathway in E. coli does work by a concerted mechanism: one strand is targeted for replacement synthesis, and these replacement tracts can be several kilobases long [27]. Despite this, hybrid genes have been observed after transformation of multiply mismatched heteroduplexes, usually at low frequencies [28-31]. Heteroduplexes have typically been made by denaturing and re-annealing two plasmids cut at different sites; heteroduplexes re-form circles having nicks or single-strand gaps. Volkov et al. have tested several heteroduplex methods for producing hybrid genes [32]. Their most effective method was to anneal a gene heteroduplex from two asymmetric PCRs and ligate this into a cut vector; 30% of transformants had the desired crossover between two mutations 463 bp apart.

Fig. 1. For all methods, a smear is usually generated; a discrete-sized product is obtained by further cycling under conventional PCR conditions with specific primers. a) DNA shuffling: fragmentation is usually done with DNase I, and the desired size range obtained by preparative gel electrophoresis. Thermal cycling is similar to standard PCR. b) Random-primed recombination (RPR): random hexamer primers are extended with Klenow polymerase; the size of the extension products is inversely related to the hexamer concentration used. Template and primers are removed by ultrafiltration. c) StEP: Specific primers are used, and polymerization is kept short by the cycle conditions: for example, 30 seconds at 94 °C then 5 seconds at 55 °C, repeated 80 times.
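The distinction between no repair, concerted repair, and independent repair can be made concrete with a small sketch. It assumes a heteroduplex whose top strand carries parent A alleles and whose bottom strand carries parent B alleles at every mismatch, and it tracks only the genotypes of the two daughter molecules after one round of replication. The function names and the `'A'`/`'B'`/`None` encoding of repair events are hypothetical conventions chosen for illustration.

```python
def daughters(repairs):
    """Genotypes of the two daughter molecules from one heteroduplex.

    `repairs` has one entry per mismatch:
      'A'  -> mismatch corrected using strand A as template (both strands A)
      'B'  -> mismatch corrected using strand B as template (both strands B)
      None -> mismatch left unrepaired
    Assumption: the top strand starts as all-A, the bottom strand as all-B,
    and replication then copies each strand into its own daughter.
    """
    top = "".join("B" if r == "B" else "A" for r in repairs)
    bottom = "".join("A" if r == "A" else "B" for r in repairs)
    return top, bottom


def is_hybrid(genotype):
    """True if the genotype mixes alleles from both parents."""
    return len(set(genotype)) > 1


if __name__ == "__main__":
    cases = {
        "no repair": [None, None, None],
        "concerted repair": ["A", "A", "A"],
        "partial repair": ["A", None, None],
        "independent repair": ["A", "B", "A"],
    }
    for name, repairs in cases.items():
        top, bottom = daughters(repairs)
        print(name, top, bottom, is_hybrid(top) or is_hybrid(bottom))
```

Under these assumptions, only the partial and independent cases yield a daughter that mixes alleles from both parents, in line with the outcomes described in the text.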

In vivo recombination: Background
Some background about recombination in E. coli and yeast is helpful in understanding its application to making hybrid genes. The primary reason recombination evolved was probably not to generate sequence diversity, but to repair DNA damage. Recombination is usually greatly enhanced by the presence of DNA damage, especially double-strand breaks and crosslinks. The initiation and outcome of recombination depend greatly on the precise structure of the DNA substrates: the presence of gaps or breaks, the lengths (chromosome vs plasmid), and the positioning of the homologous segments; different substrates may have different genetic requirements for effective recombination. Recent models have emphasized the inter-relationships between DNA recombination and replication ([33] and the rest of that issue).

Fig. 2. Mismatch repair. Different outcomes of a heteroduplex. A molecule is generated in vitro that has mismatches (pictured as spikes) only in the genes of interest (thick lines), while the plasmid backbone (thin lines) is perfectly matched. Depending on how the molecule is assembled, there may also be nicks or single-strand gaps, which are not shown. The pictured products are those occurring after any mismatch repair and one round of replication.

Recombination in E. coli
Genetic analysis has defined three pathways of recombination in E. coli: the RecBCD, RecE, and RecF pathways. The pathway distinctions are useful but simplistic; in reality recombination is accomplished by a network of genes, many shared between the "pathways". The multifunctional RecA protein is critical for most recombination reactions; it coats single-stranded DNA and promotes its invasion into homologous duplex DNA and subsequent strand exchange. For reviews on recombination in E. coli see [34,35].
The RecBCD pathway is the major pathway for chromosomal recombination in wildtype cells. However, RecBCD is not normally involved in recombination of small plasmids. RecBCD is a potent helicase and nuclease that facilitates RecA loading onto the single strands it creates. Interestingly, when only its nuclease activity is inactivated (as in recD strains), intra-plasmid recombination is increased [36,37]. The other two recombination pathways were revealed by suppressor mutations that restored chromosomal recombination to recB recC strains. Both alternate pathways are much more proficient at intra-molecular plasmid recombination than wildtype, especially if a double-strand break is present [38].
One suppressor genotype (recBC sbcB) activates the RecF pathway for chromosomal recombination. Even in wildtype cells, the RecF pathway mediates plasmid recombination as this is inhibited by recF, recO, and recR mutations. These genes also are important in wildtype cells for recombinational repair of DNA lesions encountered during replication [35]. The other suppressor genotype discovered was recBC sbcA, activating the RecE pathway. The sbcA mutation turns on expression of recE and recT, a 5′→3′ exonuclease and homologous pairing protein, respectively [39]. These genes are functionally similar to the phage lambda recombination genes Redα and Redβ (also known as exo and bet), while the Redγ (or gam) gene inactivates RecBCD; thus recombination in a recBC sbcA cell is completely analogous to that in a cell infected by wildtype phage lambda. In fact, recE and recT are remnants of a lambda-like phage named Rac that integrated into the E. coli genome long ago.

Recombination in yeast
Unlike in E. coli where mutant strains have advantages for recombination, in yeast there is little need for special genotypes. Wildtype yeast are highly proficient at recombination, particularly at double-strand breaks. Double-strand breaks are important for initiating recombination during meiosis, and for switching of mating-type which occurs by gene conversion. Over 10 genes have been implicated in a recombination epistasis group that has RAD52 as its central member: all types of recombination appear to involve RAD52, while mutants in other members show only partial recombination defects. The group also includes RAD51, a homolog of bacterial recA that also catalyzes homologous strand invasion and exchange. These and other RAD52 group genes have counterparts in mammalian cells, so basic mechanisms are likely to be conserved from yeast to man. For a review on yeast recombination see [40].

In vivo recombination: Applications
An important distinction for recombination methods is whether the substrates lie on the same or different DNA molecules (intra- vs inter-molecular recombination). Intra-molecular recombination is more efficient since homology search is aided by the substrates being physically linked; however it usually involves a specially constructed DNA. Inter-molecular recombination is much more flexible but generally less efficient.

Intra-molecular plasmid recombination in E. coli
Weber and Weissman demonstrated the importance of double-strand breaks for efficient recombination [41]. They first created two plasmids with human interferon-α1 and α2 genes (∼80% nucleotide identity) having different resistance markers. Fragments from each, sharing homology only within the interferon genes, were ligated into linear concatamers and transformed (Fig. 3(a)). Circularization occurred via intra-molecular recombination in vivo, giving rise to doubly-resistant colonies. DNA sequencing identified 11 different single-crossover products.
Subsequent workers have chosen to first construct a plasmid containing both homologs in head-to-tail orientation, sometimes with intervening markers. Note that one needs to construct two plasmids (with the order of the genes reversed) to get hybrids of both polarities. This plasmid is then subjected to in vivo recombination, either as a circle or as linear DNA. If circular DNAs are used (Fig. 3(b)), it is important to have an intervening counter-selectable marker such as galK [42] or a unique restriction site [43] since the recombination frequency is usually low. More commonly, the plasmid is cut between the two homologs and transformed as a linear fragment (Fig. 3(c)) [44,45]; this provides double-strand ends that stimulate recombination and also selects for re-circularization since linear fragments cannot replicate. Abastado et al. studied various factors in this reaction [46]. Products were the result of single crossovers, which were much more frequent near the homology ends but distributed throughout the homologous region including more than 2.5 kb away from the break. Heterologous sequence at one end had no deleterious effect, but heterology on both ends completely blocked recombination. However, others observed recombination even with heterology on both ends [44,45]. Surprisingly, recA is not required or is less important when a double-strand break is present [46-49]. One group found it advantageous to use a recBC sbcA strain for creating hybrid genes by this approach [50].

Fig. 3. In vivo recombination: Intra-molecular. a) The original scheme of Weber and Weissman. Vectors with different antibiotic resistance markers are digested. The fragments are purified and ligated (the free ends have incompatible overhangs so a circle cannot be generated); this is transformed and re-circularization occurs by recombination. Inter-molecular recombination is also possible but less likely. In most cases an intermediate plasmid is constructed containing both genes of interest: b) recombination of circles is infrequent, so a counter-selectable marker is required ("X"). c) If a double-strand break is made between the genes, recombination can re-create a circle; most surviving plasmids will be hybrids.
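To show what "single-crossover products" and "both polarities" mean at the sequence level, the following sketch enumerates every distinguishable single-crossover hybrid between two aligned homologs. It is an illustration only: it assumes the two genes are pre-aligned without gaps, and the short sequences in the demo are made-up placeholders rather than real interferon sequences. The function name `single_crossover_hybrids` is hypothetical.

```python
def single_crossover_hybrids(first, second):
    """Map each distinct hybrid (first-gene start, second-gene end)
    to the list of crossover positions that produce it.

    Assumption: `first` and `second` are aligned, gap-free strings of
    equal length. Crossovers that fall within the same stretch of identity
    give the same product, so they are grouped under one key.
    """
    if len(first) != len(second):
        raise ValueError("sequences must be aligned to the same length")
    hybrids = {}
    for cut in range(1, len(first)):
        product = first[:cut] + second[cut:]
        if product != first and product != second:  # skip parental sequences
            hybrids.setdefault(product, []).append(cut)
    return hybrids


if __name__ == "__main__":
    gene_a = "ATGACCGTTAAG"  # placeholder sequence
    gene_b = "ATGTCCGTAAAC"  # placeholder homolog, 3 differences
    for label, (x, y) in {"A-B": (gene_a, gene_b), "B-A": (gene_b, gene_a)}.items():
        for product, cuts in single_crossover_hybrids(x, y).items():
            print(label, product, cuts)
```

With k sequence differences between the parents, this enumeration yields k - 1 distinguishable hybrids per polarity. Swapping the argument order gives the opposite polarity, which corresponds to constructing the second plasmid with the gene order reversed.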

Intra- and inter-molecular plasmid recombination in yeast
Pompon and Nicolas reported using both intra- and inter-molecular recombination in yeast to create hybrid cytochrome P450 genes [58]. Intra-molecular recombination (Fig. 3(c)) was more efficient than inter-molecular recombination (discussed below) [59]. As in E. coli, the majority of transformants were the result of single crossovers, which occurred most frequently near a double-strand end. Whereas rad52 mutation completely eliminates inter-molecular recombination, it has much less effect on intra-molecular recombination [59]. However, intra-molecular recombination in yeast has been little used for engineering purposes, presumably since inter-molecular recombination is reasonably efficient and is much more flexible.
Linear DNA fragments will readily undergo inter-molecular recombination with another homologous DNA in yeast [60]. Recombination of a fragment with homologous ends and a chromosomal locus (Fig. 4(a)) makes it simple to knock out genes in yeast. Recombination between a fragment and a cut plasmid (Fig. 4(b)) is useful for plasmid construction [61,62]; recently it was used to clone the entire complement of yeast coding sequences for two-hybrid and biochemical analyses [63,64]. A similar reaction can be used to map mutations in cloned genes [65,66]. Pompon and Nicolas used it to make hybrid genes [58]. The gene on one plasmid was cut at an internal restriction site and co-transformed with a homologous gene on another fragment, resulting in three-part hybrids (Fig. 4(c)). Two-part hybrids can be made by using a fragment with vector sequence at one end (Fig. 4(d)) [67]. Recombination between different cytochrome P450 genes (71% nucleotide identity) was 13-18-fold lower than between identical genes. Recombination between the yeast ARG4 gene and its human homolog (52% nucleotide identity) produced very few and mostly aberrant plasmids, although two in-frame, functional recombinants were identified [68]. Crossovers are most frequent near the double-strand breaks; treatment of ends with BAL-31 exonuclease results in a more even distribution of crossovers. More complex hybrids (five-part rather than three-part) can also occur [69]. Yeast recombination was recently used to shuffle peroxidase mutants for improved stability: when multiple donor fragments were co-transformed, these could recombine with each other in addition to the cut plasmid (Fig. 4(e)) [70]. There have been fewer examples of hybrid recombination in yeast than in E. coli; it has mainly been used when gene expression will also be in yeast.

Fig. 4. In vivo recombination: Inter-molecular. With two or more recombining DNAs, there are many possibilities. Genes are pictured as rectangles of various shades; identical vector sequences are thick lines. a) Recombination of a linear fragment into an intact DNA is an infrequent event, but can be selected for using a marker and can be designed to inactivate the target locus. The target is shown as a circular plasmid, but can also be a linear yeast chromosome. b) Plasmid construction by recombination (in vivo cloning). A gene flanked by sequences homologous to the vector (which may be appended by PCR) is co-transformed with linearized vector; recombination results in a circle. c) Three-part hybrids: a gene fragment recombines with a plasmid-borne homolog that is cut at an internal site. d) Two-part hybrids: a gene fragment together with vector sequence recombines with a homologous gapped plasmid. e) Multi-parent hybrids: multiple co-transformed mutant genes or homologs may first recombine with each other and then with the linearized vector.

Inter-molecular recombination in E. coli
Applications of inter-molecular recombination in E. coli have so far mainly been for plasmid construction. Recombination between two circular plasmids occurs at low frequencies (∼10⁻⁴ in wildtype bacteria [71,72]), but hybrid genes can be identified if one has a suitable selection [73]. Analogous to the yeast reaction above (Fig. 4(b)), insert and vector fragments whose ends shared 30 bp identity could recombine in vivo to create a circular plasmid [74]; even three overlapping fragments can be recombined [75]. Several strains have been used, including recA strains, but the reaction was more efficient in recBC sbcA [76] or recBC sbcB strains [77,78]. A linear fragment with short end identities can also recombine into an intact DNA (Fig. 4(a)), but only in recBC sbcA strains and at a low frequency; the linear fragment must confer an antibiotic resistance so that recombinants can be selected [79]. This allows targeted mutagenesis of large DNAs such as bacterial artificial chromosomes and the E. coli genome. Other strain backgrounds can be used if the recombination genes are provided on a plasmid [79,80]. Lambda Red genes and recD strains have also been used for recombination of linear DNAs with the genome [81,82].
Inter-molecular recombination in E. coli can be used for creating hybrid genes as in Fig. 4(c) and (d) [83]. We observed recombination only in recBC sbcA strains, and not in wildtype, recD, or recBC sbcB strains. The efficiency of recombination was high between identical sequences or ones with two small regions of mismatch, but dropped significantly for sequences sharing 82% nucleotide identity (human and mouse p53 DNA binding domains). On the other hand, correct hybrids were still obtained between sequences sharing only 56% identity (human p53 and p73). We generated a series of hybrids between mouse and chicken p53 that allowed us to map the epitopes of the two antibodies that specifically recognize the wildtype conformation of p53 and not cancer-derived p53 mutants. Because the epitopes require a properly folded p53 domain, they cannot be analyzed using deletion mutants or peptide scanning. No special plasmid constructs are needed, and both two-part and three-part hybrids can be directly created. In addition, recombination can easily be iterated to make hybrids of hybrids for more precise analysis.
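Overall percent identity, as quoted for these gene pairs, does not show how the identity is distributed along the sequence. The sketch below computes both the overall nucleotide identity and the uninterrupted runs of identity between two sequences. It assumes the sequences are already aligned without gaps; the function names, the `min_len` parameter, and the short demo sequences are hypothetical and are not taken from the cited work.

```python
def percent_identity(a, b):
    """Percent of aligned positions that are identical (gap-free alignment)."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def identity_runs(a, b, min_len=1):
    """Return (start, length) for each maximal run of identical positions
    that is at least `min_len` long."""
    if len(a) != len(b):
        raise ValueError("sequences must be of equal length")
    runs = []
    start = None
    for i, (x, y) in enumerate(zip(a, b)):
        if x == y:
            if start is None:
                start = i  # a new run of identity begins here
        else:
            if start is not None and i - start >= min_len:
                runs.append((start, i - start))
            start = None
    if start is not None and len(a) - start >= min_len:
        runs.append((start, len(a) - start))
    return runs


if __name__ == "__main__":
    seq_1 = "ATGGCCATTG"  # placeholder sequence
    seq_2 = "ATGGCGATAG"  # placeholder homolog
    print(percent_identity(seq_1, seq_2))
    print(identity_runs(seq_1, seq_2, min_len=2))
```

Two gene pairs with the same overall identity can differ greatly in the length of their longest identical runs, which is one reason overall identity alone may be an incomplete predictor of how readily two homologs recombine.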

Phage delivery of genes for in vivo recombination in E. coli
Though the number of recombinants is not usually a limiting factor when using recombination for analysis, the need for DNA transformation imposes a practical limit that is significant for directed evolution experiments: the more variants sampled, the better the chance of finding a variant with the desired characteristics. Numerical limitations can be partly overcome by using a more efficient way of getting DNA into cells, namely by phage infection. Lambda phage have long been used to study recombination, and can be used for creating hybrid genes [84,85]. Filamentous phage (Ff) infection is particularly suitable because any plasmid with an Ff origin (a "phagemid") can be packaged as infectious particles and because Ff phage display is a powerful selection method. Griffiths et al. created a very large antibody library by combining heavy- and light-chain sub-libraries using Ff infection and site-specific recombination [86]; recently a simpler format has been developed [87]. Site-specific recombination allows crossover at a fixed point, whereas homologous recombination allows crossover at many points and so can generate a wider range of diversity.
We have created a library by homologous recombination in vivo (PLW and G. Winter; manuscript in preparation). In order to provide the conditions needed for efficient inter-molecular recombination, we have made a recBC sbcA strain that expresses the EcoRI restriction system, allowing the creation of double-strand breaks in vivo. We have also designed two phagemid vectors for cloning antibody single-chain Fv (scFv) genes (Fig. 5). These are packaged as viral particles and co-infected into the recombination strain. Upon entry they are cut by EcoRI, releasing fragments that can recombine to generate a phage display vector. One crossover occurs within the antibody genes, the other within ampicillin resistance (AmpR) gene fragments, resulting in a hybrid antibody gene and a functional AmpR gene. Sequences of progeny from an in vivo recombination between two clones included a variety of hybrids, but crossovers were most common near the double-strand ends.

Fig. 5. DNAs are packaged as filamentous phage particles and efficiently delivered into bacteria by infection. Double-strand breaks are generated in vivo by action of a restriction endonuclease, EcoRI, expressed in the bacteria (irrelevant fragments that are also generated are not shown). Recombination of the homologous ends gives a hybrid antibody (Ab-3) and a complete resistance gene (Amp), and re-creates a plasmid circle.
Since chicken antibodies are natural products of homologous recombination, we made two small sublibraries of chicken scFv (<10⁷ clones) and then recombined them in vivo by co-infection into 1 liter of bacteria. This resulted in 8×10¹⁰ ampicillin-resistant recombinants, which were directly used to produce antibody-expressing phage. This library was selected for antibodies to several antigens including groEL, p53, Werner's syndrome protein, fluorescein, and Cy5. The antibodies work well in ELISAs and in western blots. Nucleotide polymorphisms allowed us to definitively identify several of the antibody genes as hybrid products of recombination. This completely in vivo recombination system should be generally useful for directed evolution of antibodies and other genes for which selection methods are available.

Conclusion
There are several ways to make hybrid genes, and the best one will vary depending on the specific application. All methods will perform less well as the sequences to recombine become more divergent. Natural gene families often have alternating conserved and variable regions, so despite low overall similarity they may recombine readily in the conserved regions. Intra-molecular recombination may be more effective than inter-molecular recombination for generating hybrids between highly diverged homologs. But certainly the lower reaches of similarity identifiable by sequence comparison (20-25% amino acid identity) or three-dimensional structure comparison will not be amenable to homologous recombination methods; perhaps nonhomologous methods may be useful for these cases [88].
In vivo recombination can be done in either yeast or E. coli. Recombination in yeast is highly efficient, but is most practical when the hybrid genes will also be expressed in yeast. Recombination in mutant strains of E. coli seems to be as powerful as recombination in yeast, and the hybrid genes created can either be analyzed directly in E. coli, for which many techniques are available, or be transfected into the species of interest. Both intra- and inter-molecular recombination can be used, with the latter being more flexible.
For analysis of gene function, both in vitro and in vivo recombination methods may be useful. In vivo methods give simple hybrids and do not introduce mutations; this means that an ordered strategy can be used, and fewer clones need to be tested and sequenced. If a functional selection or rapid screen exists, then in vitro recombination becomes more practical. And often analysis of function may shade into creating new function, that is, directed evolution.
For directed evolution, in vitro recombination methods like DNA shuffling have the major advantage that they can create hybrids with many crossovers. It is important to be able to bring together several beneficial mutations that may be anywhere in the sequence; if one is limited to one or two crossovers at a time this requires more than one round of recombination. In vitro recombination should be the method of choice for totally in vitro selection methods. On the other hand, for selections or screens requiring an in vivo step such as phage display, two-hybrid, or gene complementation, there may be practical advantages for in vivo recombination methods; for example, we have used in vivo recombination to create a large antibody phage display library.
Recombinational cloning and mutagenesis are proving to be important tools for genome-scale studies of function ("functional genomics"). The explosion of new homologs uncovered by genome sequencing projects should also greatly benefit recombinational approaches to gene analysis and directed evolution. They may even tell us something about natural evolution by re-creating possible ancestors of current-day molecules [89].