Generation of RNAi Libraries for High-Throughput Screens

The completion of genome sequencing for several organisms has created a great demand for genomic tools that can systematically analyze the growing wealth of data. The classical reverse genetics approach of creating specific knockout cell lines or animals is time-consuming and expensive. In contrast, RNA-mediated interference (RNAi) has emerged as a fast, simple, and cost-effective technique for large-scale gene knockdown. RNAi was discovered in Caenorhabditis elegans (C elegans) as a gene-silencing response to double-stranded RNA (dsRNA) homologous to endogenous genes. Since then, RNAi technology has been adapted to various high-throughput screens (HTS) for genome-wide loss-of-function (LOF) analysis. Biochemical insights into the endogenous mechanism of RNAi have led to advances in RNAi methodology, including RNAi molecule synthesis, delivery, and sequence design. In this article, we briefly review the resulting RNAi library designs and discuss the benefits and drawbacks of each library strategy.


INTRODUCTION
RNA-mediated interference (RNAi) provides direct causal links between specific genes and observed loss-of-function (LOF) phenotypes. RNAi is an evolutionarily conserved phenomenon in which gene expression is suppressed by the introduction of homologous double-stranded RNAs (dsRNAs). After dsRNA molecules are delivered to the cytoplasm of a cell, they are cleaved by the RNase III-like enzyme Dicer into 21- to 23-nt small interfering RNAs (siRNAs) [1]. These siRNA duplexes are loaded into Argonaute2 (Ago2), the catalytic component of the RNA-induced silencing complex (RISC) [2]. Ago2 cleaves the passenger (sense) strand of the siRNA duplex, while the antisense strand remains bound to Ago2. In the now mature RISC, the antisense strand guides the sequence-directed destruction of homologous mRNA, resulting in silencing of the target gene [3]. In lower organisms such as C elegans and Drosophila, RNAi is typically induced by introducing a long dsRNA (up to 1-2 kb) produced by in vitro transcription. The core RNAi mechanism appears to be conserved among diverse organisms. Even so, this simple approach cannot be used in mammalian cells, where long dsRNA (> 30 nt) elicits a strong antiviral response that obscures any gene-specific silencing effect [4,5]. Much of this response is caused by activation of the dsRNA-dependent protein kinase PKR, which phosphorylates and inactivates the translation initiation factor eIF2α [6,7]. RNAi technology could be developed for mammalian systems only after the discovery that 21-nt siRNAs effectively trigger the RNAi silencing response without activating the antiviral response [8].
Originally limited to lower organisms, RNAi technology has advanced to accommodate a variety of organisms, including mammals, with methodologies that are readily adapted to high-throughput screens (HTS) [9,10]. Commercial RNAi libraries and improved RNAi delivery methods now make it possible to run genome-wide screens that evaluate any biological pathway. When deciding to use RNAi technology for a genome-wide screen, it is crucial to evaluate the characteristics of the selected RNAi library carefully. Only then can screens be performed efficiently, with excellent gene coverage and highly reproducible data. One must ensure that the selected library has been designed to maximize the efficiency of gene silencing. The delivery method must also suit both the type of RNAi molecule and the system of interest. Whether to screen the library in an arrayed format or as pools is another decision that should be carefully considered [11]. In this article, we review current RNAi methodologies in light of the present understanding of the RNAi biochemical process and briefly discuss developing features in library design.

Chemical synthesis
The initial RNAi libraries were directed solely at invertebrate genomes and consisted of long dsRNA fragments up to 1-2 kb in size, generated by in vitro transcription. These long dsRNAs proved to be highly specific and potent inducers of gene silencing in lower organisms. The antiviral response in mammalian systems, however, requires a different approach [8]. Once it was realized that siRNAs could avoid the antiviral response while still effectively triggering a LOF phenotype, many groups began chemically synthesizing siRNAs. Chemically synthesized RNAi molecules take the form of small duplex RNAs. The sense and antisense strands are synthesized separately and annealed. The duplexes are then delivered to cells by means such as transfection reagents, electroporation, or microinjection. An improved understanding of the RNAi mechanism has produced different RNAi molecule designs that enter the silencing pathway at different enzymatic points. Synthetic siRNA molecules can be designed to interact either with Dicer or with RISC upon cellular entry (Figure 1(a)). Initial siRNAs were designed to resemble 21- to 23-nt Dicer products. Once delivered into the system of interest, these Dicer product mimics load directly onto RISC and immediately guide the degradation of homologous mRNA. Kim et al recently demonstrated that 25- to 30-nt RNA duplexes, which first undergo Dicer cleavage, induce gene silencing more effectively than the analogous 21-mer siRNAs, with up to 100-fold greater potency [12]. Kim et al also found that some 27-mer duplexes effectively silenced target regions that were refractory to conventional 21-mer siRNAs. Chemically synthesized siRNAs are more widely used in HTS than plasmid-based reagents for three reasons: they are well characterized, they knock down the target mRNA immediately, and they achieve higher transfection efficiencies.
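The sketch below makes the Dicer-product mimic concrete by building the two strands of a 21-nt siRNA duplex from a chosen window of a target mRNA. It assumes the commonly used layout: a 19-bp paired core with a 2-nt 3' overhang on each strand. The function names and the default UU overhang are illustrative choices and are not part of any cited library design. Many synthetic siRNAs use deoxythymidine overhangs instead.

```python
# Minimal sketch: build a 21-nt Dicer-product mimic from a target mRNA window.
# Assumptions (illustrative only): 19-bp core + 2-nt 3' overhang on each strand,
# overhang "UU" by default; function names are hypothetical.

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def to_rna(seq: str) -> str:
    """Upper-case a sequence and convert DNA thymidine (T) to uridine (U)."""
    return seq.upper().replace("T", "U")


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def design_duplex(mrna: str, start: int, core_len: int = 19, overhang: str = "UU"):
    """Return (sense, antisense) strands targeting mrna[start:start + core_len].

    The sense strand matches the mRNA. The antisense (guide) strand is its
    reverse complement, so it base-pairs with the target mRNA.
    """
    mrna = to_rna(mrna)
    core = mrna[start:start + core_len]
    if len(core) != core_len:
        raise ValueError("target window runs past the end of the mRNA")
    sense = core + overhang
    antisense = reverse_complement(core) + overhang
    return sense, antisense


if __name__ == "__main__":
    sense, antisense = design_duplex("AAUGGCUACGUUAGCCAUGGCAUCGAUCG", 2)
    print("sense     5'-" + sense + "-3'")
    print("antisense 5'-" + antisense + "-3'")
```

Raising `core_len` gives a rough picture of the longer Dicer-substrate duplexes studied by Kim et al. The exact end structures of their 25- to 30-nt duplexes are not modeled here.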

Algorithm-based design
Knockdown with small siRNAs has since improved owing to a greater understanding of the silencing mechanisms of Dicer and RISC. The most crucial aspect of an RNAi library directed at mammalian systems is the choice of the sequence used to target each gene. This choice matters for two reasons: precise siRNA targeting depends on base-pairing specificity, and individual siRNAs directed at distinct regions of the same mRNA differ in silencing potency [13]. Ideally, the RNAi molecule effectively knocks down gene expression while avoiding off-target effects, which can be either sequence-independent or sequence-specific [14]. As mentioned above, siRNAs can trigger the mammalian antiviral response, inducing translation inhibition or cell death in a sequence-independent manner [6,7,15]. Additionally, sequence similarity to an off-target transcript can cause its inadvertent degradation [14] or translation inhibition [16]. Off-target effects are often concentration-dependent, so they can be minimized or avoided by using the lowest effective siRNA dose and unique siRNA sequences. This illustrates the need for effective siRNA sequence design.
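Sequence-specific off-target effects arise when an siRNA's target site closely matches some other transcript. The sketch below flags such near matches by sliding the 19-nt target site along each transcript and counting mismatches. The mismatch threshold, the ungapped comparison, and the toy transcript names are assumptions for illustration. Practical design pipelines typically use alignment tools and may weight mismatches by position.

```python
# Minimal sketch: flag near-identical matches between an siRNA target site and
# other transcripts (ungapped Hamming distance; threshold is an assumption).


def normalize(seq: str) -> str:
    """Upper-case and convert T to U so DNA and RNA inputs compare equally."""
    return seq.upper().replace("T", "U")


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def off_target_hits(target_site: str, transcripts: dict, max_mismatches: int = 2):
    """Return (transcript_name, position, mismatches) for every window that
    differs from target_site at no more than max_mismatches positions."""
    site = normalize(target_site)
    k = len(site)
    hits = []
    for name, seq in transcripts.items():
        seq = normalize(seq)
        for i in range(len(seq) - k + 1):
            d = hamming(site, seq[i:i + k])
            if d <= max_mismatches:
                hits.append((name, i, d))
    return hits


if __name__ == "__main__":
    site = "ACGUACGUACGUACGUACG"
    transcripts = {
        "geneB": "GG" + "ACGUACGUACGUACGUACC" + "CC",  # one mismatch at the 3' end
        "geneC": "A" * 30,                             # unrelated toy sequence
    }
    print(off_target_hits(site, transcripts))
```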
Many commercially available RNAi libraries are designed with siRNA algorithms. The algorithms used to determine siRNA sequences for mammalian genes comprise a number of parameters based on RNAi biochemical knowledge and empirical data on maximal silencing efficacy [17]. Some of the most common criteria [8,18-20] include:
- specific base compositions along the core siRNA duplex;
- differential base-pairing thermodynamics at the 5' ends of the sense and antisense strands [20], which ensures appropriate loading of the antisense strand into RISC;
- A-form helix formation between the siRNA and the target mRNA;
- no internal repeats or palindromes;
- 30-50% GC content;
- an absence of close homology to off-target gene sequences.

Algorithms can derive sequences for every gene predicted from the available genome sequence data. In theory, algorithm-designed libraries would therefore have the greatest genome coverage.
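The sketch below screens candidate 19-nt target sites against a simplified subset of the criteria listed above:
- 30-50% GC content;
- no internal repeats, approximated here as no run of four identical bases;
- no short palindromes, approximated as no self-complementary 6-mer;
- 5'-end asymmetry.

The asymmetry check is a crude stand-in for the nearest-neighbor thermodynamic calculations that real algorithms use. It requires fewer G/C bases at the 3' end of the sense core, which pairs with the 5' end of the antisense strand, than at its 5' end. All thresholds and window sizes are assumptions and do not come from any specific published algorithm.

```python
# Minimal sketch of rule-based siRNA site filtering. Thresholds, window sizes,
# and the GC-count proxy for end stability are illustrative assumptions.

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    return "".join(COMPLEMENT[b] for b in reversed(rna))


def gc_count(seq: str) -> int:
    return sum(b in "GC" for b in seq)


def has_homopolymer(seq: str, run: int = 4) -> bool:
    """True if any base repeats `run` or more times in a row."""
    count = 1
    for prev, cur in zip(seq, seq[1:]):
        count = count + 1 if cur == prev else 1
        if count >= run:
            return True
    return False


def has_palindrome(seq: str, k: int = 6) -> bool:
    """True if any k-mer equals its own reverse complement (could fold back)."""
    return any(seq[i:i + k] == reverse_complement(seq[i:i + k])
               for i in range(len(seq) - k + 1))


def end_asymmetry_ok(core: str, window: int = 5) -> bool:
    """Proxy for thermodynamic asymmetry: the sense 3' end (antisense 5' end)
    should be less G/C-rich, hence less stable, than the sense 5' end."""
    return gc_count(core[-window:]) < gc_count(core[:window])


def passes_rules(core: str) -> bool:
    core = core.upper().replace("T", "U")
    gc = gc_count(core) / len(core)
    return (0.30 <= gc <= 0.50
            and not has_homopolymer(core)
            and not has_palindrome(core)
            and end_asymmetry_ok(core))


def candidate_sites(mrna: str, core_len: int = 19):
    """Yield (position, core) for every mRNA window that passes all rules."""
    mrna = mrna.upper().replace("T", "U")
    for i in range(len(mrna) - core_len + 1):
        core = mrna[i:i + core_len]
        if passes_rules(core):
            yield i, core
```

A homology filter against off-target transcripts would normally be applied to the surviving sites as a separate step.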
The primary deficiency of algorithm-based siRNA design is our limited understanding of the RNAi mechanism. Ideally, every algorithm-designed siRNA in a library would be validated experimentally for its efficacy in silencing its endogenously expressed target, in cultured cells under strictly standardized conditions, although this has not yet proven practical. Genome-wide RNAi analysis is further limited by gene-prediction technology. Gene prediction has advanced greatly and represents the majority of genes in the genome well. Still, not all coding sequences have been identified [21], nor have all possible splice variants been predicted. Reboul et al showed that nine percent of genes identified from isolated cDNAs had not been predicted by computational analysis of genome sequences [22]. Because the mature mRNA is the target molecule in RNAi, misprediction of gene boundaries will reduce the knockdown potential of rationally designed siRNA molecules.
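The mature mRNA is the actual target, so one practical safeguard is to check which known transcript variants contain a candidate site. Another is to restrict design to sequence shared by all variants. The sketch below does both with exact 19-mer matching. The toy exon sequences and the function names are hypothetical. The check is only as complete as the set of transcripts supplied, which is precisely the gene-prediction limitation described above.

```python
# Minimal sketch: exact-match coverage of siRNA target sites across transcript
# variants. Exon sequences below are toy data, not real genes.


def normalize(seq: str) -> str:
    return seq.upper().replace("T", "U")


def kmers(seq: str, k: int) -> set:
    seq = normalize(seq)
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_sites(isoforms: dict, k: int = 19) -> set:
    """k-mers present in every supplied isoform (candidate constitutive sites)."""
    sets = [kmers(seq, k) for seq in isoforms.values()]
    return set.intersection(*sets) if sets else set()


def coverage(site: str, isoforms: dict) -> dict:
    """Map each isoform name to whether it contains the target site exactly."""
    site = normalize(site)
    return {name: site in normalize(seq) for name, seq in isoforms.items()}


if __name__ == "__main__":
    exon_a = "AUGGCUACGUUAGCCAUGGCAUC"
    exon_b = "GGAUCCGAUUACGAAUGCUUAGC"
    exon_c = "CCGUAAGCUUGACGGAUCAAGUA"
    isoforms = {"iso1": exon_a + exon_b + exon_c,  # full-length variant
                "iso2": exon_a + exon_c}           # exon B skipped
    print(coverage(exon_b[:19], isoforms))
    print(len(shared_sites(isoforms)), "19-mers shared by both variants")
```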
To mitigate problems such as misprediction and variable silencing capability, many libraries incorporate a degree of redundancy. They use multiple designed siRNA sequences directed at each gene to increase the likelihood of silencing it. Redundancy adds costs, such as siRNA synthesis and screening more samples. Even so, having multiple siRNA molecules for a particular gene can be advantageous in the screening and validation phases. It helps confirm that the observed phenotype results from silencing of the target gene rather than from an off-target effect [23]. Multiple siRNA oligos for each gene also make it possible to screen pools of oligos targeting the same gene. In certain scenarios, such pools may increase the chance of effectively knocking down target gene expression. They may also decrease the likelihood of off-target effects, because each individual siRNA is used at a lower concentration. Synthetic siRNA libraries raise other concerns as well: their cost, stability, and nonamplifiable nature make generating siRNA libraries by chemical synthesis financially impractical for individual laboratories.
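Simple arithmetic makes the trade-offs of redundancy concrete. Assume, purely for illustration, that each designed siRNA independently achieves adequate knockdown with probability p. The chance that at least one of n siRNAs works is then 1 - (1 - p)^n. Pooling n siRNAs at a fixed total concentration gives each one 1/n of that dose. The independence assumption, the example values, and the two-siRNA confirmation rule below are hypothetical. Real siRNA success rates vary by gene and design.

```python
# Illustrative arithmetic for redundant siRNA designs. All inputs are
# hypothetical; independence between siRNAs is an assumption.


def p_at_least_one(p: float, n: int) -> float:
    """Probability that at least one of n independent siRNAs is effective."""
    return 1.0 - (1.0 - p) ** n


def per_sirna_concentration(total_nM: float, n: int) -> float:
    """Concentration of each siRNA when n siRNAs share a fixed total dose."""
    return total_nM / n


def confirmed_hit(phenotype_calls: list, min_independent: int = 2) -> bool:
    """Call a gene a hit only if >= min_independent distinct siRNAs reproduce
    the phenotype, guarding against single-siRNA off-target effects."""
    return sum(bool(c) for c in phenotype_calls) >= min_independent


if __name__ == "__main__":
    for n in (1, 2, 4):
        print(n, round(p_at_least_one(0.5, n), 4), per_sirna_concentration(20.0, n))
    print(confirmed_hit([True, False, True, False]))
```

With p = 0.5, four siRNAs raise the chance of at least one effective reagent to 0.9375. In a 20 nM pool, each of the four is used at 5 nM.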

siRNAs from mRNA source
An alternative to algorithm-designed synthetic oligos is the use of pools of siRNAs randomly generated by enzyme-mediated cleavage of long dsRNA corresponding to the target mRNA [13,24]. Such siRNA cocktails can be generated from dsRNA with recombinant Dicer [25] or Escherichia coli (E coli) RNase III [26] (Figure 1(b)). In the endogenous RNAi pathway, Dicer is the enzyme that cleaves long dsRNAs into 21- to 23-bp siRNAs [10]. E coli RNase III can also cleave dsRNA into effective siRNAs that directly engage RISC. It may be preferred because in vitro cleavage by Dicer is inefficient [27]. Either enzyme processes dsRNAs into a pool of siRNAs targeting multiple sites on the mRNA of interest. Calegari et al knocked down galactosidase expression in the developing central nervous system (CNS) of day 10 mouse embryos. They used a complex pool of endoribonuclease-prepared siRNAs (esiRNAs) generated with RNase III [24]. Yang et al reduced endogenous c-myc protein levels in 293 cells by 70% with esiRNA and knocked down Cdk2 expression in a dose-dependent manner [27]. Dicer- and RNase III-generated siRNA pools silence genes about as well as well-designed individual siRNAs, yet they require no rational sequence design.
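A toy simulation shows why an enzymatically generated pool targets many sites at once. The sketch below cuts each dsRNA molecule into consecutive 21- to 23-bp fragments, starting at a random phase, and collects the fragments from many molecules into one pool. The uniform fragment-size choice and the random phases are simplifying assumptions. Neither Dicer nor RNase III is modeled mechanistically, and RNase III products in particular can fall outside this size range.

```python
# Toy model of an esiRNA-style pool: many copies of a long dsRNA, each cut into
# consecutive fragments starting at a random phase. Assumptions are illustrative.
import random


def digest_pool(dsrna: str, n_molecules: int = 50, sizes=(21, 22, 23), seed: int = 0):
    """Return a list of (start_position, fragment) pairs from all molecules."""
    rng = random.Random(seed)  # seeded for reproducibility
    pool = []
    for _ in range(n_molecules):
        pos = rng.randrange(max(sizes))  # random phase for this molecule
        while True:
            size = rng.choice(sizes)
            if pos + size > len(dsrna):
                break  # the leftover end is dropped
            pool.append((pos, dsrna[pos:pos + size]))
            pos += size
    return pool


if __name__ == "__main__":
    rng = random.Random(1)
    target = "".join(rng.choice("ACGU") for _ in range(300))
    pool = digest_pool(target)
    distinct_sites = {start for start, _ in pool}
    print(len(pool), "fragments covering", len(distinct_sites), "distinct start sites")
```

Every fragment is a perfect substring of the target, so the pool attacks many sites along the mRNA without any per-site design.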

RNAi expression systems
Model systems such as C elegans and Drosophila are well suited to chemically synthesized or enzymatically derived siRNAs because they endogenously amplify the RNAi signal [1]. In lower organisms, siRNAs prime dsRNA synthesis by RNA-dependent RNA polymerase (RdRP), with the target mRNA serving as a template [28], so that new dsRNAs are generated. The C elegans model system is especially well suited to siRNA silencing for two reasons. The first is this endogenous amplification mechanism. The second is systemic RNAi, in which gene silencing can be observed in parts of the body distant from the site of the initial dsRNA delivery [29]. Systemic RNAi depends on a multispan transmembrane protein known as SID-1, which enables intercellular transport of dsRNA. This feature is not shared by all invertebrates and does not exist in Drosophila, which has only cell-autonomous RNAi.
Mammalian systems possess neither endogenous amplification nor systemic RNAi. The effects of chemically synthesized RNAi molecules are therefore limited to transient knockdown of the target gene, because of cell division and/or degradation of the siRNA molecule. Most HTS experiments require only transient knockdown to produce an observable phenotype. Transient knockdown is insufficient, however, for studies of biological processes that require long-term gene silencing or for protocols that involve selection. To address this limitation, many groups express siRNA or short hairpin RNA (shRNA) molecules intracellularly from plasmid DNA. Expression is driven by either the small nuclear RNA (snRNA) U6 promoter or the human RNase P RNA H1 promoter [30] (Figure 2). U6 and H1 are RNA polymerase III (Pol III) promoters that are ideally suited for si/shRNA generation. Almost all of their regulatory elements are located upstream of the transcribed region, so most insert sequences shorter than 400 nucleotides can be transcribed. The U6 and H1 promoters share the same conserved protein-binding sites and transcription termination sequence. They differ in size and in the identity of the +1 nucleotide: guanosine for U6 and adenosine for H1 [31].
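The promoter constraints above translate into simple checks on a candidate insert. The sketch below flags three problems:
- inserts of 400 nt or more;
- internal runs of four or more thymidines, which Pol III treats as a termination signal, so such runs can truncate the transcript;
- a first base that does not match the promoter's +1 nucleotide.

Two details are assumptions about a hypothetical cloning layout rather than rules from the cited work: the insert's first base is treated as the +1 position, and four Ts are used as the cutoff.

```python
# Minimal sketch: sanity checks for a Pol III (U6 or H1) driven insert.
# The T-run cutoff and the "first base = +1" convention are assumptions.

PLUS_ONE = {"U6": "G", "H1": "A"}  # +1 nucleotide of each promoter's transcript


def check_pol3_insert(insert: str, promoter: str, max_len: int = 400) -> list:
    """Return a list of human-readable problems (an empty list means no flags)."""
    seq = insert.upper()
    problems = []
    if len(seq) >= max_len:
        problems.append(f"insert is {len(seq)} nt; aim for < {max_len} nt")
    if "TTTT" in seq:
        problems.append("internal T-run may terminate Pol III transcription early")
    expected = PLUS_ONE[promoter]
    if not seq.startswith(expected):
        problems.append(f"+1 base is {seq[:1] or 'missing'}; {promoter} expects {expected}")
    return problems


if __name__ == "__main__":
    print(check_pol3_insert("GCAGCTACATGATATACTA", "U6"))
    print(check_pol3_insert("GCAGCTACATGATATACTA", "H1"))
```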
Lee et al created an siRNA expression vector that transcribes the sense and antisense strands separately (Figure 2(a)). The sense and antisense sequences were arranged in tandem, each driven by its own U6 promoter. This tandem vector design induced 90% knockdown of EGFP in 293 cells [32]. They further showed that their siRNA expression strategy could inhibit HIV-1 in 293 cells, achieving up to 4 logs of inhibition as measured by HIV-1 p24 antigen levels. To simplify vector construction and expression, Paul et al created a single-promoter system that forms an shRNA structure [33] (Figure 2(c)). It transcribes the sense strand, then a UUCG tetraloop sequence, then the antisense strand. Endogenous Dicer cleaves the transcribed shRNA into siRNA molecules that load onto RISC and guide destruction of the homologous mRNA. The authors validated this vector-based shRNA expression by knocking down human lamin A/C in HeLa cells. To further simplify library construction, Zheng et al developed a dual-promoter siRNA expression vector (pDual) that allows facile construction of an siRNA expression library [34] (Figure 2(b)). The siRNA sequence is inserted between opposing U6 and H1 promoters and serves as the template for both the sense and antisense strands upon transfection. Zheng's construct yields an siRNA duplex with a uridine overhang on each 3' terminus, similar to Dicer-generated siRNAs. RISC can incorporate this duplex without further modification. Furthermore, a simple PCR protocol allows efficient, cost-effective, high-throughput production of siRNA expression cassettes on a genome scale.
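The two single-construct layouts described above can be sketched as string operations on the DNA template. `shrna_template` follows the Paul et al arrangement: sense, UUCG loop (encoded in DNA as TTCG), antisense, and a T-run terminator. `pdual_transcripts` models the pDual idea that one insert read from opposing promoters yields both strands. The five-T terminator, the two-U 3' overhang, and the function names are assumptions for illustration. Real constructs also need promoter, cloning-site, and spacing details that are not modeled here.

```python
# Minimal sketch of two vector layouts as string operations. Terminator length,
# overhang length, and all names are illustrative assumptions.

DNA_COMP = str.maketrans("ACGT", "TGCA")


def revcomp_dna(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(DNA_COMP)[::-1]


def shrna_template(sense: str, loop: str = "TTCG", terminator: str = "TTTTT") -> str:
    """DNA template for sense-loop-antisense, ending in a Pol III T-run.

    The antisense arm is the reverse complement of the sense arm, so the
    transcript folds back on itself into a hairpin closed by the loop.
    """
    sense = sense.upper()
    body = sense + loop + revcomp_dna(sense)
    if "TTTT" in body:
        raise ValueError("internal T-run would terminate transcription early")
    return body + terminator


def pdual_transcripts(insert: str, overhang: str = "UU"):
    """Sense and antisense RNAs from one insert flanked by opposing promoters.

    Each strand is assumed to end in a short U overhang left by termination.
    """
    top = insert.upper().replace("T", "U")
    bottom = revcomp_dna(insert).replace("T", "U")
    return top + overhang, bottom + overhang


if __name__ == "__main__":
    arm = "GCAGCTACATGATATACTA"
    print(shrna_template(arm))
    print(pdual_transcripts(arm))
```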
Groups interested in genome-wide shRNA vector libraries expanded the vector-based shRNA design strategy. Pools of shRNA expression constructs can be generated from cDNA by enzymatic digestion, for example with DNase I [35]. Several groups have developed methods to cleave cDNA into fragments of the appropriate size and quickly clone these fragments into DNA vectors that generate shRNA structures in cells [36-38] (Figure 3). Three such techniques have been reported (REGS [37], EPRIL [38], SPEED [36]), and they share the same underlying principles:
(1) restriction enzyme (RE) cleavage of cDNA into multiple fragments with nucleotide overhangs,
(2) ligation of a 3' loop containing an MmeI RE recognition sequence,
(3) further cleavage by MmeI to create fragments of the requisite size (20-21 bp),
(4) conversion of the dsDNA fragments into palindromic structures by PCR amplification, and
(5) insertion of the randomly generated sense-loop-antisense sequences into the desired vector backbone.
Shirane et al showed that such libraries, covering both known and unknown genes across the transcriptome, can be generated from random RE digests of cDNA libraries [38].
More recently, a better understanding of microRNA (miRNA) biogenesis in plants and animals has led to a second generation of shRNA expression libraries, shRNA-mir (Figure 2(d)). These shRNA-mir constructs transcribe silencing triggers that mimic natural miRNA primary transcripts. miRNAs were originally believed to be transcribed from the genome as shRNAs and processed directly by Dicer [39]. They are now understood to be transcribed as long, polyadenylated primary RNAs (pri-miRNAs) [40,41]. Drosha, an RNase III family enzyme, first cleaves these pri-miRNAs to create pre-miRNAs. Exportin-5 then transports the pre-miRNA to the cytoplasm [42,43], and only there does Dicer recognize and cleave it to produce a mature miRNA.
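A deliberately simplified string model can mimic the five-step principle shared by REGS, EPRIL, and SPEED. The sketch below works as follows:
- (1) it locates every occurrence of a 4-bp restriction site;
- (2-3) it takes the 20 nt immediately downstream of each site as a stand-in for the fragment that the ligated MmeI adapter would release;
- (4) it converts each fragment into a sense-loop-antisense palindrome;
- (5) it returns the resulting inserts.

The GATC site (the recognition sequence of enzymes such as MboI), the downstream-only fragment choice, and the TTCG loop are assumptions. Real protocols differ in adapter design, strand handling, and fragment orientation.

```python
# Toy string model of random shRNA library construction from cDNA.
# Site, fragment length, orientation, and loop are illustrative assumptions.

DNA_COMP = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(DNA_COMP)[::-1]


def find_sites(cdna: str, site: str):
    """Yield the start index of every (possibly overlapping) site occurrence."""
    i = cdna.find(site)
    while i != -1:
        yield i
        i = cdna.find(site, i + 1)


def toy_library(cdna: str, site: str = "GATC", frag_len: int = 20, loop: str = "TTCG"):
    cdna = cdna.upper()
    inserts = []
    for i in find_sites(cdna, site):                  # step 1: RE sites
        start = i + len(site)
        frag = cdna[start:start + frag_len]           # steps 2-3: fixed-length cut
        if len(frag) < frag_len:
            continue                                  # too close to the 3' end
        inserts.append(frag + loop + revcomp(frag))   # step 4: hairpin palindrome
    return inserts                                    # step 5: ready for cloning
```

The fragments come from wherever the enzyme happens to cut, so libraries built this way are not restricted to annotated genes.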
Silva et al designed an shRNA-mir library based on the miR-30 primary transcript [44], which was shown to be twelve times more efficient than first-generation shRNA expression systems [45].
One advantage of vector-based si/shRNA expression systems is that they facilitate hit deconvolution by PCR amplification or barcoding in selective screens. Selective screens with vector libraries can pick out effective, target-specific sequences from pools. A pooled shRNA expression library can be introduced into cells while a selective pressure is applied, so that negative control cells are eliminated from the culture [38]. The RNAi sequence selected in the resistant cells can then be identified by PCR amplification with primers based on the invariant vector backbone. Alternatively, a gene-specific sequence can be incorporated into each distinct shRNA vector in the library so that the selected gene target can be identified quickly. In this approach, termed "barcode" screening [46-48], the identifying sequence can be located within the vector backbone [48] or can be the short hairpin sequence of the shRNA itself [46]. After selection, the barcodes are labeled with fluorescent dyes and hybridized to microarrays. This allows rapid identification of the positive shRNA sequences within the surviving cell population.
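Hit deconvolution by PCR with backbone primers amounts to extracting the sequence between two invariant flanks and tallying how often each one appears before and after selection. The sketch below does this for text reads and ranks inserts by log2 enrichment. The flank sequences, the pseudocount, and the use of read counts are assumptions; the text above describes microarray hybridization signals instead. With microarray data, normalized spot intensities would take the place of counts.

```python
# Minimal sketch of pooled-screen deconvolution: extract the variable insert
# between invariant backbone flanks, count it, and compare selected vs reference.
import math
from collections import Counter


def extract_insert(read: str, left: str, right: str):
    """Return the sequence between the first `left` flank and the next `right`
    flank, or None if either flank is missing."""
    i = read.find(left)
    if i == -1:
        return None
    start = i + len(left)
    j = read.find(right, start)
    if j == -1:
        return None
    return read[start:j]


def tally(reads, left: str, right: str) -> Counter:
    inserts = (extract_insert(r, left, right) for r in reads)
    return Counter(x for x in inserts if x)


def log2_enrichment(selected: Counter, reference: Counter, pseudocount: float = 1.0) -> dict:
    """log2 of each insert's relative frequency after vs before selection."""
    keys = set(selected) | set(reference)
    n_sel = sum(selected.values()) + pseudocount * len(keys)
    n_ref = sum(reference.values()) + pseudocount * len(keys)
    return {k: math.log2(((selected[k] + pseudocount) / n_sel) /
                         ((reference[k] + pseudocount) / n_ref))
            for k in keys}


if __name__ == "__main__":
    left, right = "ACGT", "TTGA"  # hypothetical invariant backbone flanks
    before = tally(["GGACGTAAACCCTTGAGG", "ACGTGGGCCCTTGA"], left, right)
    after = tally(["ACGTAAACCCTTGA"] * 5 + ["ACGTGGGCCCTTGA"], left, right)
    for insert, score in sorted(log2_enrichment(after, before).items(),
                                key=lambda kv: -kv[1]):
        print(insert, round(score, 2))
```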
In contrast to synthetic siRNAs, vector-based siRNA expression systems are amplifiable and more cost-effective. However, their efficiency may be compromised in certain HTS assays. Synthetic siRNAs directly enter the RNAi pathway at the level of Dicer or RISC, whereas vector-based RNAi molecules must first be transcribed. In addition, the transfection efficiency of plasmids may be lower than that of synthetic siRNA oligos. For cell lines that resist classic transfection reagents, transduction with viral vectors should be considered. Furthermore, vector-based stable gene silencing may be affected by the integration site, resulting in poor knockdown or off-target effects.

CONCLUSION AND FUTURE DIRECTIONS
Since the discovery of RNAi, groups have adapted this technology to suit their model systems and assays of interest. Recent methodological developments include advances in viral delivery systems and the incorporation of features such as inducibility and fluorescent or selection markers. Several groups have developed adenoviral RNAi vector strategies [49,50] to achieve higher levels of transduction and intracellular shRNA expression. Lentiviral vector approaches have also been reported that enable transduction of RNAi constructs into nonproliferating cells as well as in vivo [51-53]. Several laboratories have also developed inducible RNAi vectors, in both plasmid [54-56] and retroviral/lentiviral [57,58] formats. RNAi libraries that incorporate fluorescent markers make it easier to evaluate transfection efficiency accurately. These library design features illustrate the adaptability of RNAi technology.
RNAi has proven to be a powerful tool in functional genomics. Its ability to induce the degradation of specific target mRNAs by sequence provides a direct relationship between a gene's expression level and its functional role [59]. RNAi-based methodologies are robust enough for HTS adaptation, allowing genome-scale applications. Advances that resolve the limitations discussed above should make cost-effective, validated genome-wide siRNA collections accessible. Such collections will further advance our ability to annotate gene functions and investigate complex biological processes.