REGIA, An EU Project on Functional Genomics of Transcription Factors From Arabidopsis thaliana

Transcription factors (TFs) are regulatory proteins that have played a pivotal role in the evolution of eukaryotes and that also have great biotechnological potential. REGIA (REgulatory Gene Initiative in Arabidopsis) is an EU-funded project involving 29 European laboratories with the objective of determining the function of virtually all transcription factors from the model plant, Arabidopsis thaliana. REGIA involves: 1. the definition of TF gene expression patterns in Arabidopsis; 2. the identification of mutations at TF loci; 3. the ectopic expression of TFs (or derivatives) in Arabidopsis and in crop plants; 4. phenotypic analysis of the mutants and mis-expression lines, including both RNA and metabolic profiling; 5. the systematic analysis of interactions between TFs; and 6. the generation of a bioinformatics infrastructure to access and integrate all this information. We expect that this programme will establish the full biotechnological potential of plant TFs, and provide insights into their hierarchies, redundancies, interdependencies and evolution. The project involves the preparation of both a TF gene array for expression analysis and a normalised full-length open reading frame (ORF) library of TFs in a yeast two-hybrid vector; the applications of these resources should extend beyond the scope of this programme.


Introduction
The completion of the sequence of the genome of Arabidopsis [28] represents one of the most significant landmarks in the history of plant biology. The next step, the interpretation of this information in functional terms, is a demanding task in terms of resources. Transcription factors are one obvious class of proteins to prioritise.
These regulatory proteins provide the most common mechanism of regulation of co-ordinated gene activity, transcriptional control. Because of their power to control gene expression and consequently complex traits, TFs are believed to have played an important role in the evolution of plants and have been the targets for breeding and domestication [reviewed in 3]. Thus, many of the best characterised QTLs (quantitative trait loci) and agronomically important genes correspond to TFs (for example, [2,15,16,20,21,29,31]). Two examples of the significance of TFs in domestication and breeding are TEOSINTE BRANCHED1 and the GRAS genes. TEOSINTE BRANCHED1 is a bHLH type TF, responsible, through its regulatory activity, for most of the morphological difference between maize and its wild ancestor, teosinte [2]. The GRAS genes are responsible for the reduced size and increased grain yield of cereals bred for the 'green revolution' and encode mutant gibberellin response modulators which are thought to act as TFs [20,21]. Mutations in some genes encoding TFs can phenocopy inter- or intra-specific natural variants [9,1], emphasising their importance in evolution as well.
As a corollary of their seminal role in the determination of plant traits, transcription factors are considered to have enormous biotechnological potential for the manipulation of agronomic traits. In support of this are the established examples in which transcription factors have been used to manipulate plant metabolism (for example, the anthocyanin, phlobaphene and lignin pathways; [14,5,27]), development (for example, flowering time and cell shape; [4,17,19,30]) and responses to stress (for example, cold, salinity, and drought stresses; [6,13]).
TFs are endowed with characteristics that make them particularly suitable for driving evolution and for biotechnological exploitation. One of these characteristics is modularity, whereby domains within the proteins function independently, thus facilitating module exchange between TFs [10] and allowing for the engineering of new TF activities. Another characteristic of TFs is the large degree of functional redundancy in higher eukaryotes, including plants. Transcription factors are, in general, members of large families that often include closely related genes that are also functionally related. Within subfamilies different extents of partial redundancy are to be expected, in which redundant genes may diverge in their expression pattern, generally due to mutations in their cis-regulatory regions [10]. Extreme examples of redundancy and of divergence are the cases of SEP1, SEP2 and SEP3, and of GL1 and WER, respectively [12,18]. In the first case, only the triple mutant sep1, sep2, sep3 showed a phenotypic difference from the wild type (sepaloid flowers). In the second case, mutations at GL1 affect trichome formation, whereas wer mutations affect root hair formation, but their proteins are functionally equivalent. Finally, these two characteristics, modularity and redundancy, together with the fact that transcription factors tend to act downstream in signal transduction pathways, limit the pleiotropy of mutations in transcription factors, a necessary condition for evolutionary and biotechnological potential [3].

Structure of the REGIA project
Current functional studies are generally based on the identification of mutations at the loci of interest and the evaluation of the phenotypic effects of these mutations. This primary strategy is often complemented by the generation of transgenic plants ectopically expressing the corresponding gene (or a derivative) and their phenotypic characterisation. Our approach rests on similar principles, but has been adapted and extended to cope with and to exploit the characteristics common to many transcription factors, such as low abundance, (partial) redundancy, functional interdependency and modularity. The fact that TFs regulate gene expression, and that expression is particularly amenable to molecular analysis, means that we have also been able to include aspects of target gene identification in our functional analyses. Briefly, the activities (or workpackages) in the REGIA project are the following:
• The analysis of the Arabidopsis genome sequence, the analysis of all genes encoding recognisable TFs and phylogenetic analysis, and the isolation of unique identifier probes to prepare a TF gene array and the analysis of the expression patterns of TF genes (WP1)
• The identification of mutants of a large number of strategically identified TF genes through reverse genetic screens (WP2)
• The ectopic expression of selected TF genes (or derivatives, including inducible versions) in Arabidopsis and key crop species (WP3)
• Phenotypic analysis, including RNA and metabolic profiling, of plants mutated at TF loci or ectopically expressing TF genes (or their derivatives) to define their biological functions (WP4)
• The systematic analysis of physical (protein-protein) interactions between TFs (WP5)
• The bioinformatic analysis and management of data produced in the programme (WP6)
• The management and co-ordination of the scientific activities on the programme, their communication to other scientific groups and industry and the protection of IP generated during the programme (WP7)

Figure 1. Relationships between the activities in the REGIA project. The white boxes and dashed arrows reflect informational outputs of the programme on expression patterns, functions and biotechnological applications. The dark grey boxes reflect the activities towards obtaining this information and the light grey outlined boxes reflect the two resources that will be produced by the programme.
J. Paz-Ares and The REGIA Consortium

The relationships between the objectives of the project and the workpackages, as diagrammatically shown in Figure 1, are:
• Information on the function of TFs will come primarily from the phenotypic analysis (WP4) of Arabidopsis plants with mutations at TF loci (WP2) or lines ectopically expressing TFs (or their derivatives; WP3). Expression data on TFs (WP1) will assist phenotypic analysis. The bioinformatic exercises of genome mining and phylogenetic analyses have helped to simplify TF gene families into functionally related subfamilies. Functional characterisation of any one member of such subfamilies, coupled with expression analysis of all subfamily members should permit preliminary functional assignment to the entire subfamily membership.
• Information on regulatory hierarchies will come primarily from the analysis of the effects of mutants/overexpressors of a given TF on the expression of other TFs (WP4).
• Insights into redundancies among TFs will be obtained from the analysis of RNA profiles of TF mutants (WP4). If mutations in any of two structurally related TF genes, or their hyperexpression, influence the expression of a given (target) gene, these TFs are potentially redundant. Hints on redundancy will also come from detection of overlapping expression patterns of closely related TFs (WP1). Confirmation of redundancy will be obtained by preparing and analysing the corresponding double mutants.
• Insights into functional conservation and potential agronomic uses will be obtained from comparison of the effects of ectopic expression of a given TF (or a derivative) in Arabidopsis and in other species (tomato, rapeseed, maize and soybean; WP3&4).
• Information on functional interdependencies will be obtained primarily from the studies of the interactions between TFs, using the yeast two-hybrid system. Hints on interdependences (i.e., on indirect as well as direct interactions) will also come from the studies on functional links between TFs based on molecular phenotypes of the respective TF mutants (WP4). Thus, if mutations at two or more TF loci (or their hyperexpression) influence the expression of the same target gene, these TFs are possibly functionally interdependent. The exhaustive definition of interdependencies following this second criterion is beyond the scope of this proposal, as it would require the detailed functional characterisation of each of the more than 1500 TF genes present in Arabidopsis ([22]; REGIA, unpublished). Confirmation of interdependencies between two TFs identified by the yeast two-hybrid screen will be obtained through the analysis of transgenic plants hyperexpressing the two TFs.
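The two shared-target criteria stated above (structurally related TFs affecting the same target gene are potentially redundant; any TFs affecting the same target gene are possibly interdependent) can be illustrated with a minimal sketch. The function name, the TF and gene identifiers and the toy data below are all hypothetical and are not taken from REGIA results; the mutually exclusive labels are a simplification made for illustration only.

```python
from itertools import combinations


def classify_tf_pairs(affected_targets, subfamily):
    """Label TF pairs whose mutants (or overexpressors) alter a shared target gene.

    affected_targets: dict mapping TF name -> set of target genes whose
        expression changes when that TF is mutated or hyperexpressed.
    subfamily: dict mapping TF name -> subfamily label (structural relatedness).
    Returns a dict {(tf_a, tf_b): (label, sorted shared targets)}.
    """
    results = {}
    for tf_a, tf_b in combinations(sorted(affected_targets), 2):
        shared = affected_targets[tf_a] & affected_targets[tf_b]
        if not shared:
            continue  # no common target gene, so neither criterion applies
        related = (
            subfamily.get(tf_a) is not None
            and subfamily.get(tf_a) == subfamily.get(tf_b)
        )
        # Simplification: related pairs are reported only as redundancy candidates.
        label = "potentially redundant" if related else "possibly interdependent"
        results[(tf_a, tf_b)] = (label, sorted(shared))
    return results


# Hypothetical toy data (illustrative names only).
affected_targets = {
    "TF_A1": {"gene1", "gene2"},
    "TF_A2": {"gene2", "gene3"},
    "TF_B1": {"gene3"},
    "TF_C1": {"gene9"},
}
subfamily = {"TF_A1": "A", "TF_A2": "A", "TF_B1": "B", "TF_C1": "C"}

pairs = classify_tf_pairs(affected_targets, subfamily)
for pair, (label, shared) in pairs.items():
    print(pair, label, shared)
```

Such labels are only hypotheses: as described above, redundancy would still need to be confirmed with double mutants, and interdependency with two-hybrid data or plants hyperexpressing both TFs.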
Obviously, the biotechnological potential of each TF will depend very much on the trait it controls (WP4), but clearly the determination of functional interdependencies forms a foundation for the exploitation of the biotechnological potential of transcription factors. In fact, it is the functional and biotechnological relevance of TF interdependencies that, in our opinion, justifies the huge amount of work involved in their study, and consequently the large size and the European dimension to the Consortium.

Organisation
The Consortium includes one Project Manager and 29 research groups, 27 from academia and 2 from industry. One of the group leaders is the scientific coordinator and another seven group leaders are workpackage coordinators, forming, together with the Project Manager, the Coordination Committee. The activities are organised around several core centres which provide support to all groups on techniques which could not be efficiently implemented at the level of individual laboratories (for example, the preparation of TF cDNA and EST arrays, in situ RNA hybridisation analysis of TF gene expression, high throughput AFLP-based transcript profiling, metabolic profiling of mutant and/or transgenic plants, generation of transgenic crop plants, high throughput two-hybrid based study of TF interactions, and bioinformatic analysis; see Alonso-Allende et al., this issue). Some of the activities being undertaken in individual laboratories transcend the group's specific interests in particular types of transcription factor (for instance, they provide probes and full size ORFs cloned in two-hybrid vectors for a subset of TF genes). The group benefits from this additional investment because the tools they are developing provide a better understanding of the function of specific TFs in the context of transcriptional control in plants as a whole. In this way, there will be an efficient use of resources, which will also benefit the individual research interests of the different laboratories, reinforcing their intellectual freedom and creativity.

Progress
The programme is currently at its halfway point. Much of the data obtained so far is still fragmentary, but it already indicates the programme's potential to uncover TF gene function and application.
A significant result of our studies has been the thorough bioinformatic definition and phylogenetic analysis of some transcription factor families, which provide a basis for identifying redundancy and for functional assignment when information on function from highly related genes from other species or from the same species is available (see for instance, [7,26]).
One notable aspect of our approach is the exhaustive phenotypic analysis of TF mutants which, in addition to standard phenotypic screenings, includes RNA profiling, using DNA arrays complemented by AFLP-based techniques, and metabolic profiling. These techniques are currently operative in the Consortium. This strategy should allow functional links between TFs to be established, as well as links between 'genes and metabolites', using bioinformatics tools. In addition, such powerful phenotypic analysis should help to solve analytical problems with gene redundancy since we will be able to detect even the minor phenotypic effects which can arise when there is a (partially) redundant counterpart of the mutant TF under study. An example of this is AtMYB4, for which no phenotype was obvious in mutant plants grown under standard conditions. Expression profiling showed that its expression changed most in plants exposed to UV-B light. When mutants were grown under UV-B light they were more tolerant than wild type plants to this stress, establishing a role for this TF in the negative regulation of UV-B protection [11,8]. We have also prepared a TF gene array that will allow the determination of TF expression patterns. This will provide clues to functions (for example, TFs controlling the cell cycle) and assist phenotypic analysis of mutants. In addition, it will allow the detection of overlapping expression patterns among related and potentially redundant TFs, thereby providing a rational basis for the selection of the double mutants to be prepared and analysed.
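As an illustration of how overlapping expression patterns could guide the choice of double mutants, the following minimal sketch ranks pairs of TFs from the same subfamily by the overlap of the conditions or tissues in which they are expressed. The Jaccard index, the 0.5 threshold, the function names and the toy data are all assumptions made for this example; they are not the Consortium's actual analysis method or data.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def candidate_double_mutants(expressed_in, subfamily, min_overlap=0.5):
    """Rank same-subfamily TF pairs by overlap of their expression patterns.

    expressed_in: dict mapping TF name -> set of tissues/conditions in which
        the TF is detected (for example, from a TF gene array).
    subfamily: dict mapping TF name -> subfamily label.
    min_overlap: hypothetical cut-off on the Jaccard index.
    Returns a list of (tf_a, tf_b, score), highest overlap first.
    """
    candidates = []
    for tf_a, tf_b in combinations(sorted(expressed_in), 2):
        if subfamily[tf_a] != subfamily[tf_b]:
            continue  # only closely related TFs are redundancy candidates
        score = jaccard(expressed_in[tf_a], expressed_in[tf_b])
        if score >= min_overlap:
            candidates.append((tf_a, tf_b, score))
    return sorted(candidates, key=lambda c: -c[2])


# Hypothetical toy data (illustrative names only).
expressed_in = {
    "TF_A1": {"root", "leaf", "flower"},
    "TF_A2": {"leaf", "flower"},
    "TF_A3": {"seed"},
    "TF_B1": {"leaf", "flower"},
}
subfamily = {"TF_A1": "A", "TF_A2": "A", "TF_A3": "A", "TF_B1": "B"}

candidates = candidate_double_mutants(expressed_in, subfamily)
for tf_a, tf_b, score in candidates:
    print(f"{tf_a} x {tf_b}: overlap {score:.2f}")
```

With quantitative array data, a correlation between expression profiles across the tested conditions could be used instead of a set overlap; the ranking principle would be the same.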
The fact that TFs are usually functionally interdependent and act in combinations rather than alone [25] has been given special consideration in the context of this programme. The importance of characterising TF functional interdependency is twofold: first, it will help to define regulatory networks, and second, it is a necessary step to permit the full manifestation of TF regulatory (and biotechnological) potential. For instance, the maize C1 and R anthocyanin regulatory genes are known to interact and it has been shown that their coexpression in transgenic Arabidopsis is necessary for anthocyanin production in all tissues [14]. Functional interdependencies involve both direct and indirect interactions. Our approach to studying direct interactions depends on a novel iterative screening of interactions among TFs based on the use of the yeast two-hybrid system, and we are generating a normalised full size TF library (800 full length ORFs cloned at present) which will be made available to the scientific community for screening for additional interactions, particularly with non-TF proteins. Additional clues on direct and, especially, on indirect interactions will come from the studies on functional links based on the molecular phenotypes of TF mutants.
Also important in the context of this proposal is the modular organisation of TFs, whereby the selector, DNA-binding domain, and the effector (activation or repression) domain are to a great extent functionally independent, allowing module exchange and/or the addition of other modules (for instance, conferring chemical control, as demonstrated in several instances including the Arabidopsis AP3 gene, controlling petal and stamen formation, the CONSTANS gene, controlling flowering time, and the Arabidopsis STM gene, controlling meristem identity; [24,23], Sablowski, personal communication). We have taken advantage of TF modularity to prepare TFs whose activity can be posttranslationally controlled with glucocorticoids, or whose effector domain is replaced by a strong constitutive activation or repression domain. In this way, we expect to circumvent possible problems of redundancy, or those derived from expression of constitutively active transcription factors (such as lethality), as well as problems associated with factors whose activity depends on an unknown post-translational modification/interaction. Additionally, inducible constructs such as those activated post-translationally by supply of steroids will allow for the identification of direct target genes of particular TFs. Identification of target genes of particular TFs will be possible using the mutant and inducible lines and microarray analysis or cDNA-AFLP. These activities will help characterise function further. Once regulatory frameworks have been defined for Arabidopsis, it is anticipated that we will be able to modify the activity of relevant TFs in crop plants to engineer desirable traits.
In summary, we have already defined many transcription factor families through bioinformatic analysis, and the TF gene array has been prepared and is being used by the different groups for expression analysis under 72 defined conditions/treatments. More than 150 transcription factor mutants have been isolated, and more than 250 TF-derived constructs have been prepared and introduced into transgenic plants which are currently being analysed. Metabolic profiling techniques have been set up. In addition, more than 800 full length TF ORFs have been cloned in Gateway entry vectors and transferred to two-hybrid delivery vectors. It is expected that by the end of the programme these activities will have crystallised to provide many new ideas on plant transcription factor function. It is also important that a very significant degree of integration of the activities of 30 European laboratories has been achieved in a relatively short time. These integrated activities have been undertaken by groups separated by large distances, many of which have not previously worked in such co-ordinated programmes. We expect the full manifestation of the potential of this Consortium to be evident within the next three years.