Recently, the clustered regularly interspaced short palindromic repeats (CRISPR) system has emerged as a powerful, customizable artificial nuclease that facilitates precise genetic correction for tissue regeneration and isogenic disease modeling. However, previous studies reported substantial off-target activity of the CRISPR system in human cells, and the large number of putative off-target sites is labor-intensive to validate experimentally, which motivates bioinformatics methods for the rational design of CRISPR systems and the prediction of their potential off-target effects. Here, we describe an integrative analytical process to identify specific CRISPR target sites in the human
Technologies to achieve precise gene correction in patient-specific induced pluripotent stem cells (iPSCs) are essential for stem cell-based tissue regeneration [
The DNA-binding specificity of the CRISPR complex depends on base-pair complementarity between the 20 nt sgRNA and the target genomic DNA sequence of interest, which lies adjacent to the 5′ end of a protospacer adjacent motif (PAM) matching the sequence NGG. The first nucleotide of the sgRNA (positions are numbered 1 to 20 in the 5′ to 3′ direction) must be G so that expression driven by the U6 promoter is not affected. The Cas9 nuclease cleaves at the 17th nucleotide of the recognition site [
Because the large number of putative off-target sites is labor-intensive to validate experimentally, bioinformatics methods are needed for the rational design of sgRNAs and the prediction of their potential off-target effects. However, previous CRISPR off-target prediction tools focus mainly on protein-coding regions and neglect noncoding regions [
For phylogenetic analysis,
Firstly, to systematically analyze the CRISPR target sites in the region of interest, all candidate sites matching the GN19NGG sequence pattern are searched using the web tool Cas9 Design (
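The candidate search can be illustrated with a minimal Python sketch. It is not the algorithm of the Cas9 Design web tool, which is not described here; it only assumes that GN19NGG means a G followed by 19 arbitrary bases and an NGG PAM (23 nt in total), scanned on both strands. The function names `reverse_complement` and `find_candidate_sites` are hypothetical.

```python
import re

# GN19NGG: G + 19 arbitrary bases (20 nt protospacer) + NGG PAM = 23 nt.
# A lookahead is used so that overlapping candidate sites are all reported.
PATTERN = re.compile(r"(?=(G[ACGT]{19}[ACGT]GG))")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_candidate_sites(region):
    """List (start, strand, site) for every GN19NGG match in the region.

    `start` is the 0-based leftmost coordinate on the plus strand,
    regardless of which strand the site was found on.
    """
    region = region.upper()
    hits = []
    for strand, seq in (("+", region), ("-", reverse_complement(region))):
        for match in PATTERN.finditer(seq):
            site = match.group(1)
            start = match.start()
            if strand == "-":
                # Convert minus-strand offset back to plus-strand coordinates.
                start = len(region) - (start + len(site))
            hits.append((start, strand, site))
    return sorted(hits)
```

Applied to a short region containing the I2-1 sequence reported in the Results, the scan returns that single site together with its position and strand.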
Secondly, for each candidate site, putative off-target sites with up to 3 mismatches to the target are searched against the human genome assembly hg19 using the web tool Optimized CRISPR Design (
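The idea behind a mismatch-tolerant off-target search can be sketched as a naive sliding-window scan. This is an illustration only, not the method used by the Optimized CRISPR Design tool: it assumes that mismatches are counted over the 20 nt protospacer, that an off-target site must carry an NGG PAM, and that the sequence fits in memory. A genome-scale search against hg19 would need an indexed aligner instead. The function names are hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def count_mismatches(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_off_targets(guide20, genome, max_mismatches=3):
    """Return (start, strand, mismatches, site) for each 23 nt window whose
    first 20 nt differ from the guide at no more than `max_mismatches`
    positions and whose last two bases are GG (assumed NGG PAM).

    A hit with 0 mismatches is the on-target site itself (or a perfect copy).
    """
    guide20 = guide20.upper()
    genome = genome.upper()
    hits = []
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for i in range(len(seq) - 22):
            window = seq[i:i + 23]
            if window[21:] != "GG":
                continue
            mismatches = count_mismatches(guide20, window[:20])
            if mismatches <= max_mismatches:
                # Report plus-strand coordinates for both strands.
                start = i if strand == "+" else len(genome) - (i + 23)
                hits.append((start, strand, mismatches, window))
    return sorted(hits)
```

Filtering the result for `mismatches > 0` leaves only the putative off-target sites, which can then be counted per candidate.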
Thirdly, because DNA variants in the target sequence might affect the CRISPR/Cas9 cleavage efficiency, the single nucleotide polymorphisms (SNPs) overlapping each candidate site are counted as reported in dbSNP135 (
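Counting the SNPs that overlap a candidate site reduces to an interval query. The sketch below assumes that SNP coordinates for the relevant chromosome have already been exported from dbSNP into a sorted list of 0-based positions; how they are downloaded is outside its scope, and the function name `count_overlapping_snps` is hypothetical.

```python
from bisect import bisect_left


def count_overlapping_snps(site_start, site_end, snp_positions):
    """Count SNPs falling inside the half-open interval [site_start, site_end).

    `snp_positions` must be a sorted list of 0-based coordinates on the same
    chromosome as the candidate site.
    """
    first = bisect_left(snp_positions, site_start)
    last = bisect_left(snp_positions, site_end)
    return last - first
```

Because each query is two binary searches, the same sorted list can be reused for every candidate site in the region of interest.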
The pCas9/I2-1 CRISPR plasmid was generated by kinasing and annealing oligonucleotides containing the I2-1 guide strand plus sticky ends and ligating them into the pX330 plasmid, which contains a CBh promoter-driven Cas9 and a U6 promoter-driven chimeric single-guide RNA expression cassette (Addgene: 42230). The cleavage efficiency was measured using the T7 endonuclease I (T7E1) mutation detection assay. In brief, 10⁶ 293T cells/dish were plated onto 60 mm dishes and cultured in fibroblast medium 24 h prior to transfection. Cells were transfected with 2, 5, or 10
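For readers quantifying a T7E1 gel, one widely used way to convert band intensities into an estimated mutation frequency is the formula indel % = 100 × (1 − √(1 − f)), where f is the fraction of signal in the cleaved bands. Whether this exact formula was applied in the present study is not stated, so the sketch below is an assumption, and the function name and the intensity values in the usage note are hypothetical.

```python
import math


def estimate_indel_percent(uncut_intensity, cut_intensities):
    """Estimate the indel frequency (%) from T7E1 gel band intensities.

    uncut_intensity: integrated intensity of the undigested PCR product.
    cut_intensities: integrated intensities of the cleavage products.
    Uses the common approximation 100 * (1 - sqrt(1 - fraction_cleaved)).
    """
    cut_total = sum(cut_intensities)
    total = uncut_intensity + cut_total
    if total <= 0:
        raise ValueError("total band intensity must be positive")
    fraction_cleaved = cut_total / total
    return 100.0 * (1.0 - math.sqrt(1.0 - fraction_cleaved))
```

For instance, with made-up intensities of 75 for the uncut band and 15 and 10 for the two cleavage products, the fraction cleaved is 0.25 and the estimate is about 13.4%.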
Ahead of the CRISPR/Cas9 target site analysis, we investigate the evolution of
Phylogenetic analysis of
The alignment result and phylogenetic analysis of the protein sequences corresponding to the
Phylogenetic analysis of
Collectively, these results are consistent with the
An integrative analytical process combining computational analyses of target uniqueness, off-target distribution, and DNA variant information was developed to identify specific CRISPR target sites for
Uniqueness analysis of CRISPR target sites in
Secondly, based on the previous finding that the CRISPR system can tolerate 1–3 target mismatches [
Furthermore, to evaluate the potential off-target effects, the numbers of off-target sites located in other exons and in transcription factor binding sites (TFBS) validated by ChIP-seq data from the ENCODE Project were investigated. On average, approximately 8% of the off-target sites of both the exon and intron sites were located in TFBS; however, the exon sites had significantly more off-target sites located in other exons, approximately 2.5 times as many as the intron sites (Figure
Off-target distribution analysis of CRISPR target sites in
Thirdly, the numbers of known single nucleotide polymorphisms (SNPs) in these candidate sites were investigated by searching for all overlapping SNPs reported in dbSNP135. SNPs are undesirable here because they would cause CRISPR cleavage activity to vary between iPSC lines derived from different patients. The statistics showed that all exon sites contained more SNPs than intron sites, while 4 intron sites contained no known SNP (Figure
Single nucleotide polymorphism (SNP) analysis of CRISPR target sites in
Finally, we found that target sites in introns had 2 advantages over those in exons: (i) fewer highly similar off-target sites in homologous genes; (ii) fewer off-target sites located in other exons. Moreover, to minimize possible off-target effects, the selected target sites should also avoid (i) AT content that is too high or too low and (ii) common regulatory elements. Based on these criteria, we recommend I2-1 (GACGAATGATTGCATCAGTGTGG), an intron candidate site without any known SNP and with the fewest putative off-target sites across the genome, as the CRISPR target for
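The selection criteria above can be combined into a simple filter-and-rank step, sketched here in Python. The AT-content bounds (0.3 to 0.7) are hypothetical placeholders, since no numeric thresholds are given in the text, and the sort order (fewest SNPs first, then fewest off-target sites) is one plausible reading of the criteria rather than the authors' exact procedure. The `Candidate` class and function names are likewise illustrative.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    protospacer: str     # 20 nt target sequence without the PAM
    n_snps: int          # known SNPs overlapping the site
    n_off_targets: int   # putative off-target sites across the genome


def at_content(seq):
    """Fraction of A and T bases in a DNA string."""
    seq = seq.upper()
    return (seq.count("A") + seq.count("T")) / len(seq)


def rank_candidates(candidates, at_min=0.3, at_max=0.7):
    """Drop sites with extreme AT content (hypothetical bounds), then sort
    by SNP count and, within ties, by off-target count."""
    kept = [c for c in candidates
            if at_min <= at_content(c.protospacer) <= at_max]
    return sorted(kept, key=lambda c: (c.n_snps, c.n_off_targets))
```

The first element of the returned list is the preferred site under these assumptions. The presence of common regulatory elements is not modeled here and would need a separate annotation lookup.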
We then examined the cleavage activity of the CRISPR system targeting the I2-1 site in 293T cells. Briefly, the I2-1 guide strands were annealed and ligated into the pX330 plasmid containing a Cas9/sgRNA dual expression cassette (pCas9/I2-1). A total of 10⁶ cells in a 60 mm dish were transfected with 2, 5, and 10
T7E1 mutation detection assays for the cleavage activity of the Cas9/I2-1 system in 293T cells. (a) On-target cleavage activities of the Cas9/I2-1 system at different concentrations. The mutation rate was quantified using ImageJ. (b) Off-target cleavage activities of the Cas9/I2-1 system (
In this study, we describe a rational design process combining computational analyses of target uniqueness, off-target distribution, and DNA variant information to identify specific CRISPR target sites for
To facilitate precise genetic correction, the targeting specificity of the programmable artificial nuclease is crucial. Our phylogenetic analysis has shown that the human
A few bioinformatics web tools have been developed to search for specific CRISPR target sites in a given gene sequence [
In summary, our study provides a standard analytical procedure for designing specific CRISPR target sites between homologous genes. Here, we have shown an example of how to apply this design method to identify an optimal CRISPR target site in the
The authors declare no conflict of interest.
Yumei Luo and Detu Zhu are equal contributors.
This work was supported by grants from the National Natural Science Foundation of China (U1132005, 31171229, and 81401206), Guangdong Province International Cooperation Program (2013B051000087), Guangdong Province Public Research and Development Program, Guangzhou City Science and Technology Key Project (2011Y1-00038 and 20140000000-4), Guangzhou City Medical Science and Technology Program (20141A011091), and Guangzhou Medical University (2012A09 and 2013Y08).