Essential Notes Regarding the Design of Functional siRNAs for Efficient Mammalian RNAi

Short interfering RNAs (siRNAs) are widely used to bring about RNA interference (RNAi) in mammalian cells. Numerous siRNAs may be designed for any target gene, though most of them would be incapable of efficiently inducing mammalian RNAi. Moreover, certain highly functional siRNAs designed for knockout of a particular gene may render unrelated endogenous genes nonfunctional. These major bottlenecks should be properly eliminated when RNAi technologies are employed for any experiment in mammalian functional genomics. This paper thus presents essential notes and findings regarding the proper choice of siRNA-sequence selection algorithms and web-based online software systems.

Theoretically, (n-20) siRNAs targeting a gene n bp in length can be designed. In Drosophila, more than 90% of these siRNAs are capable of reducing target gene activity by more than 80% [29]. The design of siRNAs in the case of Drosophila as well as other lower eukaryotes would thus not involve any real difficulty. But about 80% of theoretically designable siRNAs would not be highly functional in the case of mammalian RNAi [29,30]. With certain target genes rich in GC, nonfunctional siRNAs may increase to 95% or more of the total designable siRNAs [Y N et al, unpublished].
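The (n-20) count follows from sliding a 21-nucleotide window along the target one base at a time. The following is a minimal sketch of that enumeration; the function name `candidate_sites`, the default window length of 21, and the toy sequence are illustrative assumptions, not part of any published tool.

```python
def candidate_sites(target: str, length: int = 21) -> list[str]:
    """Return every window of `length` nucleotides along the target sequence.

    Assumption: one candidate siRNA site per 21-nt window, so a target of
    n nucleotides yields n - 20 candidates.
    """
    seq = target.upper().replace("T", "U")
    return [seq[i:i + length] for i in range(len(seq) - length + 1)]


# Toy target sequence (hypothetical, for illustration only).
toy = "AUGGAAGACGCCAAAAACAUAAAGAAAGGC"
sites = candidate_sites(toy)
print(len(sites) == len(toy) - 20)
```

A target shorter than the window yields an empty list, which is the natural limit of the same formula.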
Mismatched siRNA may occasionally inactivate genes other than the target, an undesired side effect designated as the "off-target effect" [31,32]. The molecular basis for this remains to be clarified [33] though mRNA cleavage, the climax of the RNAi reaction [34][35][36][37][38], requires a nearly strict nucleotide sequence identity between the mRNA target portion and sense strand (SS) of siRNA [33,39]. Thus, at least some fraction of undesirable siRNAs, giving rise to the off-target effect through destabilization of mRNAs other than the target, may be eliminated by computer-based homology search [40][41][42][43][44][45].
The design of highly functional siRNAs for mammalian RNAi thus requires two things: suitable sequence conditions, or good algorithms, for the selection of highly functional siRNAs, and computer software suitable for genome-wide short-sequence homology search to minimize the off-target effect.
Many websites are available for functional siRNA search for mammalian RNAi, some of which are listed in Table 1. These websites may incorporate one or a few algorithms for functional siRNA selection that were previously determined based on biological validation data. Considerable mammalian RNAi data are now available, so that in some websites the original algorithms may have been replaced with modified, more effective versions that have not appeared in scientific journals, which makes it difficult to evaluate the reliability of individual websites [54]. Consequently, the present study directs attention to the basic frameworks of algorithms for the selection of highly functional siRNAs and to some related application problems.

RNAi-INDUCING ACTIVITY AS AN INTRINSIC PROPERTY OF THE siRNA SEQUENCE
RNAi activity induced in mammalian cells is highly dependent on the particular sequence of siRNA used [29,30] and may vary depending on transfected cell types or transfection efficiency. To examine these factors, various siRNAs targeting the firefly luciferase gene (luc) were synthesized and transfected together with luc-encoding plasmid DNA into a variety of mammalian cell lines, including human HeLa, HEK293, and colo205 cells, Chinese hamster CHO-K1 cells, and mouse E14TG2A ES cells [55]. The concentration of siRNA used in these experiments was 5-50 nM. siRNA-dependent RNAi activity was also examined in chicken embryos [29]. The transfection efficiency of colo205 is quite low, about 1/100 that of HeLa [55]. Neither the animal species from which the cell lines or embryos were derived nor the transfection efficiency had any significant effect on induced RNAi activity [29,55]. RNAi activity induced in mammalian and chicken cells upon siRNA transfection may thus be determined primarily by the transfected siRNA sequences themselves, as far as RNAi due to 10-50 nM siRNA is concerned.

THREE BASIC ALGORITHMS FOR SELECTING FUNCTIONAL siRNAs BASED ON BIOLOGICAL VALIDATION
Many experiments have been conducted to clarify possible sequence requirements of functional siRNAs for mammalian RNAi [29,56-61]. Only three representative algorithms, which may be widely used for functional siRNA search for mammalian RNAi, are presented and discussed in the following.
Algorithm 1. This algorithm was developed by Ui-Tei et al [29]. As shown in Figure 1(a1), all siRNAs satisfying the following four sequence conditions are defined as class I siRNAs in Algorithm 1: (1) A or U at the 5′ antisense-strand (AS) end, (2) G or C at the 5′ SS end, (3) an A/U-rich 5′-terminal one-third of AS, and (4) absence of a long G/C stretch from the 5′-terminal two-thirds of SS. Validation data obtained using luc as a target indicated that all 40 arbitrarily chosen class I siRNAs were capable of reducing target gene activity by more than 70% [29]. All RNAi experiments were conducted at 50 nM siRNA. siRNAs with features completely opposite to those of class I siRNAs, except for condition (4), are defined as class III siRNAs (Figure 1(a2)). Validation indicated that all 15 arbitrarily chosen class III siRNAs were incapable of inducing efficient mammalian RNAi [29]. Thus, most, if not all, class I siRNAs may serve as siRNAs highly functional in mammalian cells, whereas class III siRNAs are nearly incapable of inducing effective mammalian RNAi. With luc, the total number of theoretically designable siRNAs is 1631, and class I siRNAs represent about 17% of them, which is roughly comparable to the percentage (25%) of highly functional siRNAs estimated from validation data [29]. Class I siRNAs may thus constitute most, if not all, of the siRNAs highly functional in mammalian RNAi.
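The four conditions of Algorithm 1 can be expressed as a short classifier operating on the 19-nucleotide duplex region of the sense strand. The sketch below is illustrative only: the text does not state numerical thresholds, so the window of 7 nucleotides for the "5′-terminal one-third", the minimum of 5 A/U (or G/C) residues for "rich", and the maximum tolerated G/C run of 9 are assumptions exposed as parameters. Requiring condition (4) for class III as well is one reading of "except for condition (4)". All function names are hypothetical.

```python
AU = frozenset("AU")
GC = frozenset("GC")
_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def antisense(sense: str) -> str:
    """Reverse complement of the duplex region of the sense strand (RNA)."""
    return sense.translate(_COMPLEMENT)[::-1]


def longest_gc_run(seq: str) -> int:
    """Length of the longest uninterrupted stretch of G/C in seq."""
    best = run = 0
    for base in seq:
        run = run + 1 if base in GC else 0
        best = max(best, run)
    return best


def classify(sense: str, window: int = 7, min_rich: int = 5,
             max_gc_run: int = 9) -> str:
    """Return "I", "III", or "other" for a sense-strand duplex sequence.

    window, min_rich, and max_gc_run are assumed thresholds, not values
    given in the text.
    """
    ss = sense.upper().replace("T", "U")
    as_ = antisense(ss)
    au_5p = sum(b in AU for b in as_[:window])  # A/U in the 5' third of AS
    gc_5p = sum(b in GC for b in as_[:window])  # G/C in the 5' third of AS
    # Condition (4): no long G/C stretch in the 5'-terminal part of SS
    # (the part not paired with the 5' third of AS).
    no_long_gc = longest_gc_run(ss[:len(ss) - window]) <= max_gc_run
    if as_[0] in AU and ss[0] in GC and au_5p >= min_rich and no_long_gc:
        return "I"
    if as_[0] in GC and ss[0] in AU and gc_5p >= min_rich and no_long_gc:
        return "III"
    return "other"


# Toy sequences (hypothetical, not taken from the validation data).
print(classify("GGCUACGUCCAGGAAUAUA"))
print(classify("UAUAUUCCUGGACGUAGCC"))
```

The two toy sequences are reverse complements of each other, which illustrates that swapping the roles of the two strands turns a class I candidate into a class III candidate under these assumed thresholds.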
Algorithm 2. This algorithm was proposed by Reynolds et al [59, Figure 1B], who analyzed 180 siRNAs targeting the mRNA of two genes and found the following characteristics to be associated with siRNA functionality: low G/C content, a preference for low internal stability at the 3′-terminus of SS, and absence of inverted repeats. Furthermore, SS is presumed to prefer A, U, and A at SS positions 3, 10, and 19, respectively. The 5′ AS terminal should not be G/C, and G should not be present at position 13 (Figure 1(b)). In more than half of class I siRNAs, there are no base preferences at positions 3 and 10 [29,55], so that Algorithms 1 and 2 may predict considerably different siRNA sets to be functional.
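The positional preferences of Algorithm 2 lend themselves to a simple feature table. The sketch below only reports the features named in the text for a 19-nucleotide sense strand; it does not reproduce any scoring scheme, and it omits internal stability and inverted-repeat checks. Mapping "the 5′ AS terminal should not be G/C" onto SS position 19 assumes a perfectly paired duplex. The function name is hypothetical.

```python
def algorithm2_features(sense: str) -> dict:
    """Report Algorithm 2 sequence features for a 19-nt sense strand.

    Positions are 1-based SS positions as in the text. Internal stability
    and inverted repeats are not evaluated in this sketch.
    """
    ss = sense.upper().replace("T", "U")
    if len(ss) != 19:
        raise ValueError("expected the 19-nt duplex region of the sense strand")
    return {
        "gc_fraction": sum(b in "GC" for b in ss) / len(ss),
        "A_at_3": ss[2] == "A",
        "U_at_10": ss[9] == "U",
        "A_at_19": ss[18] == "A",
        # SS position 19 pairs with the 5' AS terminal base (assumption:
        # perfect complementarity), so "not G/C" is checked there.
        "no_GC_at_19": ss[18] not in "GC",
        "no_G_at_13": ss[12] != "G",
    }


# Toy sequence (hypothetical).
features = algorithm2_features("GGCUACGUCCAGGAAUAUA")
for name, value in features.items():
    print(name, value)
```

Returning the individual features rather than a verdict leaves the weighting of each criterion to the user, since the text does not specify how the criteria are combined.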
Algorithm 3. This algorithm was proposed by Amarzguioui and Prydz [60], who carried out statistical analysis on 46 siRNAs and found the following features to be required for functional siRNAs. The 5′ AS terminus and its SS partner are A/U, and the 5′ SS terminus and its AS partner are G/C. The opposite combination of terminal bases may give rise to inadequate functionality. These authors also found asymmetry in siRNA duplex end stability; that is, the A/U content differential for the three terminal nucleotides at both ends of the duplex may be considered essential to siRNA functionality. Furthermore, they noted a preference for A at position 6 of functional siRNAs (Figure 1(c)), although only a small fraction of class I siRNAs is associated with A at SS position 6 [29].
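The terminal-base and end-asymmetry features of Algorithm 3 can likewise be tabulated. In the sketch below, the A/U differential is taken as the number of A/U residues in the three 3′-terminal SS nucleotides (the end paired with the 5′ AS terminus) minus the number in the three 5′-terminal SS nucleotides; this sign convention, the assumption that "position 6" refers to the SS, and the function name are illustrative choices, and no cutoff is applied because none is given in the text.

```python
def algorithm3_features(sense: str) -> dict:
    """Report Algorithm 3 sequence features for a sense-strand duplex region."""
    ss = sense.upper().replace("T", "U")
    au_5p_end = sum(b in "AU" for b in ss[:3])   # SS 5' end of the duplex
    au_3p_end = sum(b in "AU" for b in ss[-3:])  # end holding the 5' AS terminus
    return {
        # The SS partner of the 5' AS terminus is the last SS base; in a
        # perfectly paired duplex both are A/U together.
        "AS_5p_pair_is_AU": ss[-1] in "AU",
        "SS_5p_pair_is_GC": ss[0] in "GC",
        "au_differential": au_3p_end - au_5p_end,
        "A_at_6": ss[5] == "A",
    }


# Toy sequence (hypothetical).
print(algorithm3_features("GGCUACGUCCAGGAAUAUA"))
```

A positive differential indicates that the duplex end carrying the 5′ AS terminus is the more A/U-rich one, which is the asymmetry described above.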
To examine the relationships among the three algorithms in greater detail, the percentage of siRNAs considered functional by Algorithm 1 (class I) that can be repredicted as functional by Algorithms 2 or 3, and vice versa, was determined (see [55, Figure 1D]). Based on the three algorithms, as much as 73% of the total possible siRNA sequences (4.4 × 10^7) designed using RefSeq human sequences (version 11) were found to be nonfunctional. Class I siRNAs constituted 14% of the total theoretically predictable siRNAs, whereas Algorithms 2 and 3 predict 10% and 20%, respectively, as functional siRNAs. Nearly 90% of class I siRNAs could be repredicted as functional by Algorithm 2 or 3 or both. Eighty-four percent of siRNAs simultaneously predicted as functional by Algorithms 2 and 3 could be repredicted as functional, that is, as class I siRNAs, by Algorithm 1. More than 50% of siRNAs predicted as functional by Algorithm 2 could not be predicted to be functional by Algorithm 3. Seventy-seven percent of Algorithm 3 functional siRNAs could not be repredicted as functional by Algorithm 2. These findings may indicate that Algorithm 1 is capable of predicting the functionality of siRNAs more reliably than Algorithms 2 or 3.
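The "reprediction" percentages above are ratios of set intersections. The following sketch shows the calculation on toy sets of siRNA identifiers; the sets and the function name are hypothetical and bear no relation to the RefSeq-based figures quoted in the text.

```python
def repredicted_fraction(predicted_by_a: set, predicted_by_b: set) -> float:
    """Fraction of siRNAs predicted functional by A that B also predicts."""
    if not predicted_by_a:
        return 0.0
    return len(predicted_by_a & predicted_by_b) / len(predicted_by_a)


# Toy prediction sets keyed by arbitrary siRNA identifiers (hypothetical).
alg1 = {"s1", "s2", "s3", "s4"}
alg2 = {"s3", "s4", "s5"}
alg3 = {"s1", "s4", "s6", "s7"}

# Class I siRNAs repredicted by Algorithm 2 or 3 or both.
print(repredicted_fraction(alg1, alg2 | alg3))
# siRNAs predicted by both Algorithms 2 and 3 that are also class I.
print(repredicted_fraction(alg2 & alg3, alg1))
```

The measure is not symmetric: the fraction of A repredicted by B generally differs from the fraction of B repredicted by A, which is why both directions are reported in the text.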

ALIGNMENT ALGORITHM FOR SHORT NUCLEOTIDE SEQUENCES
Rapid homology comparison of entire mRNA sequences with siRNA AS/SS sequences is indispensable for identifying off-target genes. BLAST [62] may not be good software for such comparison, since a number of off-target candidates are overlooked and, in addition, considerable time is required for BLAST-based calculation. The Smith-Waterman local alignment algorithm [63] is accurate but time-consuming to execute. Recently, Yamada and Morishita developed a very rapid and accurate alignment algorithm for short nucleotide sequences [41]; this software can process 60 million siRNA sequences of 21 nucleotides in length in 10 hours when executed in parallel on ten inexpensive PCs. The hardware of Snøve Jr and Holen [64] provides similar performance, although the number of processing units is not clearly specified. Websites using the Yamada-Morishita software or the hardware of Snøve Jr and Holen should thus prove much more rapid and reliable than BLAST.
Base-mismatch introduction studies indicate that transfected siRNAs occasionally cause phosphodiester-bond cleavage not only of the authentic mRNA target but also of mutated targets with 1-2 base mismatches [33,39]. But mutated targets with three or more mismatches may not undergo cleavage upon transfection of the same siRNA [Y N et al, unpublished]. siRNAs with less than 84% (16/19 × 100) sequence homology to any part of total mRNAs other than the target should thus be used for RNAi, which would reduce the number of available functional siRNAs to 1/10 of the input. That is, only 10% of class I siRNAs, or less than 2% of the total siRNAs theoretically designable using human RefSeq sequences, become available for mammalian RNAi when off-target effects due to mRNA destabilization are considered. Computational analysis indicated that, even with so few available siRNAs, at least one functional class I siRNA can be assigned to more than 99% of human mRNA sequences (RefSeq sequences) [Y N et al, unpublished]. miRNAs involved in posttranscriptional gene silencing through translational regulation [65][66][67][68][69][70][71][72][73] possess less homology with the target, indicating that siRNAs with lesser homology may in some cases be involved in some off-target reactions [74]. The elimination of a large number of siRNAs with low homology to mRNAs other than the target may render genome-wide gene silencing in mammalian cells quite difficult. The simultaneous use of a few to several siRNAs targeting an identical gene (the target gene) may possibly solve this problem since, in most cases, the off-target genes of these siRNAs would not be identical to each other [31,32].
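The mismatch criterion can be illustrated with a naive ungapped scan that counts mismatches between the 19-nucleotide SS duplex region and every 19-nucleotide window of a non-target mRNA. This is a brute-force sketch for clarity only; it is not the Yamada-Morishita algorithm, BLAST, or Smith-Waterman, and it would be far too slow for genome-wide use. The threshold of two tolerated mismatches follows the statement that targets with three or more mismatches may not be cleaved; the function names and toy sequences are hypothetical.

```python
from typing import Optional


def min_mismatches(query: str, transcript: str) -> Optional[int]:
    """Smallest number of mismatches between query and any ungapped window."""
    k = len(query)
    best = None
    for i in range(len(transcript) - k + 1):
        window = transcript[i:i + k]
        mismatches = sum(a != b for a, b in zip(query, window))
        if best is None or mismatches < best:
            best = mismatches
    return best  # None if the transcript is shorter than the query


def is_off_target_risk(query: str, transcript: str,
                       max_mismatches: int = 2) -> bool:
    """True if some window differs from query by at most max_mismatches."""
    best = min_mismatches(query, transcript)
    return best is not None and best <= max_mismatches


# Toy data (hypothetical): a sense-strand 19-mer and two non-target mRNAs
# carrying a site with two and with three mismatches, respectively.
sirna_ss = "GGCUACGUCCAGGAAUAUA"
mrna_two = "AAAA" + "GGCUGCGUCCCGGAAUAUA" + "CCCC"
mrna_three = "AAAA" + "GGCUGCGUCCCGGAAGAUA" + "CCCC"

print(is_off_target_risk(sirna_ss, mrna_two))
print(is_off_target_risk(sirna_ss, mrna_three))
```

Three mismatches in 19 positions correspond to 16/19, or about 84.2%, identity, which is the boundary quoted in the text. The same scan would be repeated with the sequence corresponding to the AS to cover both strands.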

EXPERIMENTAL PARAMETERS POSSIBLY AFFECTING FUNCTIONALITY OF siRNAs
siRNA-mediated RNAi activity may vary significantly depending not only on the particular siRNA sequence but also on parameters such as siRNA concentration, duration of siRNA exposure, and possibly target mRNA concentration and secondary structure within cells [29,75]. Some functional siRNAs have actually been found to induce maximum RNAi activity 1 day after transfection, whereas others show maximum activity 2 or 3 days after transfection. Usually, functional-siRNA-dependent RNAi persists for 1-2 weeks, whereas virtually no RNAi is induced within cells even after a long incubation with nonfunctional siRNAs. Class I siRNAs, capable of inducing highly effective RNAi when transfected at 50 nM, were considerably heterogeneous in their capability of bringing about RNAi when used for 1-day transfection at a concentration of 50 pM (see [29] by Ui-Tei et al). Reduction in target gene activity varied from 20 to 60% depending on the sequences of the class I siRNAs used. Thus, additional sequence conditions may possibly be found that define a subclass of class I with greater functionality, but in such a case, nearly complete genome-wide gene silencing might no longer be possible.
Recently, Kim et al [76] showed that a 27 bp long dsRNA with blunt ends is much greater in functionality than 21 bp long siRNA and suggested that short Dicer-substrate dsRNA may generally be much more functional than authentic siRNAs 21 bp long. However, it was subsequently found that this is not a general feature of 27 bp long blunt-ended dsRNA [77]. In the absence of a 3′ overhang, Dicer digests dsRNA uncontrollably, generating many products varying in length, most of which may not be as functional as 21 bp long highly functional siRNAs [77]. RNAi-inducing activity would thus appear to depend primarily on the presence of a considerable amount of highly functional siRNAs in the digestion products; consequently, 27 bp long blunt-ended dsRNA would not necessarily be a good choice for highly efficient RNAi.

siRNA-OLIGOMER-DEPENDENT RNAi IN MAMMALIAN CELLS
Long dsRNA possessing 2-nucleotide 3′ overhangs at both ends is cleaved by Dicer from these ends to generate siRNAs having definite nucleotide sequences [28,77-80]. Thus, if nearly all siRNAs produced by Dicer digestion belong to class I and the interferon response due to dsRNAs equivalent in length to siRNA oligomers is not significant, effective multiple-gene knockout may be induced in mammalian cells by transfection of siRNA oligomers; this was recently found to be the case [28]. Through the use of class-I-siRNA oligomers, multiple-target gene knockout was clearly shown to take place.

DNA/shRNA-MEDIATED RNAi
RNAi can be induced by introducing DNA encoding both SS and AS of siRNA into mammalian cells. RNA polymerase III and II promoters are used to express short hairpin RNA (shRNA) and longer RNA including an shRNA sequence in the middle, respectively [81][82][83][84][85][86][87][88][89][90]. The primary transcript of RNA polymerase III is a mixture of shRNAs with two to several consecutive U's at the 3′ overhang [81][82][83][84][85][86][87][88]. Dicer cleavage sites of shRNAs vary depending on the length of the 3′ overhangs [89], and accordingly, several different species of siRNAs are expected to be generated from shRNAs transcribed by polymerase III [88]. Thus, the presence of highly functional siRNAs in these Dicer digestion products is required for successful RNAi with a polymerase-III-based system. In addition, four consecutive U's or A's should not be included in the nonoverhang sequences of AS and SS, respectively, since these sequences stimulate premature termination of polymerase-III-dependent transcription [88].
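The last constraint is easy to screen for before an shRNA construct is ordered. The sketch below follows the literal wording of the text (four consecutive U's in AS, four consecutive A's in SS) for the nonoverhang duplex region; in a perfectly paired duplex the two tests flag the same site, and both are kept only to mirror the statement above. The function names and the toy sequences are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def antisense(sense: str) -> str:
    """Reverse complement of the duplex region of the sense strand (RNA)."""
    return sense.translate(_COMPLEMENT)[::-1]


def has_pol3_terminator_risk(sense: str, run: int = 4) -> bool:
    """True if AS contains `run` consecutive U's or SS `run` consecutive A's."""
    ss = sense.upper().replace("T", "U")
    return "U" * run in antisense(ss) or "A" * run in ss


# Toy sequences (hypothetical).
print(has_pol3_terminator_risk("GGCUACGUCCAGGAAUAUA"))
print(has_pol3_terminator_risk("GCAAAACGUCCAGGAAUAU"))
```

Candidates flagged by this check would simply be dropped from the set considered for polymerase-III-driven expression.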
In polymerase II-driven expression systems, the primary transcript is long polyadenylated RNA (pri-miRNA-like RNA), which is recognized and cleaved by the nuclear microprocessor complex [91,92]. This complex contains Drosha, an RNase III-type RNase that cleaves the pri-miRNA-like RNA to generate shRNA with a 2-nucleotide 3′ overhang [93]. The shRNA thus produced is converted mainly to two overlapping siRNAs through Dicer digestion (see [28]), indicating that successful RNAi requires the involvement of highly functional siRNAs in these siRNA products.

POSSIBLE MOLECULAR BASES OF ASYMMETRIC SEQUENCE REQUIREMENTS IN FUNCTIONAL siRNAs
Each mammalian Argonaute protein (eIF2C) comprises a PRP motif and two domains, PAZ and PIWI [94]. Structural analysis of Argonaute protein crystals from Pyrococcus furiosus indicated that the PIWI domain has essentially the same three-dimensional structure as ribonuclease H and that Argonaute may function as a slicer of mRNA [95]. The PAZ and PIWI domains may separately recognize the two ends of siRNA. The crystal structure of the PAZ domain from human Argonaute 1 suggested that the PAZ domain is anchored to the 2-nucleotide 3′ overhang of the siRNA duplex [96]. The PIWI domain from Archaeoglobus fulgidus contains a highly conserved metal-binding site that may recognize the 5′ nucleotide of AS of siRNA in a manner not dependent on sequence [97]. Algorithms 1 and 3 predict functional siRNAs to possess A/U and G/C at the 5′ AS and SS ends, respectively [29,55,60]. The GC pair is thermodynamically much more stable than the AU pair, and thus differences in the stability of the terminal base pairs of the siRNA duplex may determine the terminal sequence preference in highly functional and nonfunctional siRNAs, most probably by stimulating asymmetric binding of the PIWI and PAZ domains to siRNA ends.
The 5′-terminal one-third of AS of functional class I siRNAs is A/U-rich, possibly reflecting preferential siRNA unwinding from its AS end [29,56]. A one-step motor function of the putative siRNA helicase may unwind several base pairs from the A/U-rich siRNA end to stimulate formation of active RISC lacking SS of siRNA. Should this be the case, the introduction of base mismatches into the 3′-terminal third of SS of siRNA may significantly increase the induced RNAi activity. Studies with Drosophila extracts showed a significant base-mismatch-dependent increase in RISC formation [56]. But to date there are no data clearly confirming this in mammalian cultured cell experiments. Recently, a part of RISC has been shown to be activated through cleavage of SS of siRNA at its center [98]. The presence of base mismatches in SS might be unfavorable to SS cleavage, and this negative effect might partially prevent siRNA from being unwound.