Codon Preference Optimization Increases Prokaryotic Cystatin C Expression

In many prokaryotic systems, gene expression depends closely on optimal pairing of the vector and host. Redesigning the human cystatin C (cysC) gene with the codons preferred by the prokaryotic host may significantly increase cysC expression in Escherichia coli (E. coli). Specifically, cysC expression may be increased by removing unstable sequences and optimizing GC content. The gene sequence was optimized according to E. coli codon preferences while the amino acid sequence was maintained. The codon-optimized cysC (co-cysC) and wild-type cysC (wt-cysC) genes were cloned into the pET-30a plasmid, and the recombinant plasmids were transformed into E. coli BL21. Protein expression and the biological activity of CysC in the prokaryotic expression vector and host bacteria were compared before and after optimization. The recombinant proteins in the lysate of the transformed bacteria were purified using Ni2+-NTA resin. After codon optimization, recombinant protein expression increased from 10% to 46% of total bacterial protein. The purity of the recombinant CysC was above 95%. The substantial increase in cysC expression in E. coli achieved by codon optimization may be applicable to commercial production systems.


Introduction
The use of E. coli expression systems for recombinant production of foreign proteins has been well documented. These systems offer superior characteristics, including fast growth, inexpensive fermentation media, and well-characterized genetics. The efficiency, and therefore the cost, of producing recombinant proteins in this microorganism depends on protein expression levels, which vary widely [1,2]. The expression level of a foreign gene is affected by the expression system, the specific nature of the gene, and the regulation of protein expression, and each of these factors varies with both the protein and the host. Species-specific variation in codon usage is often cited as one of the major factors affecting protein expression, which suggests that codon optimization for a particular expression system, such as E. coli, can be effective [3,4]. Additionally, the cognate tRNAs of some rare codons are present only at low levels, which reduces the translation rate while increasing the risk of translational errors. Because this can substantially reduce the production of functional protein, rare codons can strongly affect protein expression [5,6]. Over the past decades, researchers have successfully redesigned the codons of many genes to improve their expression in particular systems, generating numerous commercially viable techniques [1][2][3][7][8][9][10].
Cystatin C (CysC) is a small, basic, nonglycosylated protein, also known as γ-trace protein or post-γ-globulin. CysC is composed of 122 amino acids, has an isoelectric point of 9.3, and has a molecular weight of 13.3 × 10^3 Da (13.3 kDa) [11]. Although a CysC fusion protein has been expressed in E. coli [12], eukaryotic genes are generally expressed with low efficiency from prokaryotic expression vectors, which points to the codon preferences of the E. coli system. In many cases, bacterial and mammalian systems have been shown to prefer different codons, each with specific and unique characteristics [6]. Codon optimization is a genetic technique that has previously been used to achieve optimal expression of a foreign gene in a particular host system. It is usually based on the preferred codons previously identified for the system of interest; during optimization, existing codons are replaced by codons more suitable for the host [6,[13][14][15]. Because cysC is a mammalian gene that is poorly expressed in most bacterial systems, several options exist for improving its expression: the gene itself can be codon-optimized, or the host can be supplied with extra copies of the rare tRNA molecules by co-expressing the corresponding tRNA genes [16].
Journal of Biomedicine and Biotechnology
Two strategies are typically used for codon optimization. In the first, commonly referred to as the "one amino acid-one codon" method, the codon used most frequently by the host, either across the genome or in a selected set of highly expressed genes, is assigned to every instance of a given amino acid in the target sequence [4,8,14,17]. The second, a variation of the first that is employed in the current study, is termed the "codon randomization" method. This method applies codon usage tables built from the frequency distribution of codons across an entire genome or a subset of highly expressed genes. These tables are used to assign a weight to each codon, and a codon is then chosen at random for each amino acid according to those weights. Notably, random assignment of codons based on predetermined probability weights has proven successful for codon optimization in E. coli in previous studies [2,7,8,18,19].
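To make the contrast between the two strategies concrete, here is a minimal Python sketch of the "one amino acid-one codon" method under stated assumptions: the codon frequency table `USAGE` is a small hypothetical subset with illustrative values (it is not the Kazusa table used in the study), and `one_aa_one_codon` is a hypothetical helper name. The test peptide MAGPLR is the stretch encoded by the first six codons of the wt-cysC sense primer (ATG GCC GGG CCC CTG CGC).

```python
# Hypothetical codon frequencies (per 1000 codons) for the six amino acids in
# the example peptide. Illustrative values only; the study used the Kazusa
# E. coli table, which is not reproduced here.
USAGE: dict[str, dict[str, float]] = {
    "M": {"ATG": 27.0},
    "A": {"GCG": 33.0, "GCC": 25.0, "GCA": 20.0, "GCT": 15.0},
    "G": {"GGC": 29.0, "GGT": 24.0, "GGG": 11.0, "GGA": 8.0},
    "P": {"CCG": 23.0, "CCA": 8.0, "CCT": 7.0, "CCC": 5.0},
    "L": {"CTG": 52.0, "TTA": 13.0, "TTG": 13.0, "CTT": 11.0, "CTC": 11.0, "CTA": 4.0},
    "R": {"CGC": 22.0, "CGT": 21.0, "CGG": 5.0, "CGA": 3.0, "AGA": 2.0, "AGG": 1.0},
}


def one_aa_one_codon(protein: str, usage: dict[str, dict[str, float]]) -> str:
    """Back-translate a protein, always using each amino acid's most frequent codon."""
    # Pick the single highest-frequency codon for every amino acid once.
    best = {aa: max(codons, key=codons.__getitem__) for aa, codons in usage.items()}
    # Every occurrence of an amino acid receives the same codon.
    return "".join(best[aa] for aa in protein)


if __name__ == "__main__":
    print(one_aa_one_codon("MAGPLR", USAGE))  # ATGGCGGGCCCGCTGCGC
```

Because the design is deterministic, every leucine in a long protein becomes CTG. One commonly cited drawback is that this concentrates demand on a single tRNA species, which weighted random assignment avoids by spreading usage across synonymous codons.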
Based on the codon preferences previously observed in the E. coli expression system [6,[13][14][15][20][21][22], the current study examines a method for optimizing the codons of the cysC gene for production of CysC in E. coli. The technique seeks to optimize the gene sequence while preserving the amino acid sequence. The codon-optimized (co-cysC) and wild-type (wt-cysC) genes were cloned into the pET-30a plasmid, and the recombinant plasmids were transformed into E. coli BL21 competent cells for expression. The expression and biological activity of CysC produced by the prokaryotic expression vector and host bacteria were compared before and after codon optimization, based on the total protein produced by the cells. The current study applies codon optimization strategies developed for other protein systems to improve the expression of cysC in E. coli.

Construction of Recombinant wt-cysC Expression Vectors.
Total cellular RNA was extracted from human promyelocytic leukemia (HL-60) cells (Key Laboratory of Molecular Virology of Shandong Province, China) using RNAiso reagent (Takara, Japan) according to the manufacturer's instructions. The RNA was analyzed by 1% agarose gel electrophoresis, and the resulting bands and band intensities were observed and recorded. The A260 value was measured using an ultraviolet spectrophotometer (Lambda 45, USA). CysC cDNA was obtained by reverse transcription polymerase chain reaction (RT-PCR) using the primers 5′-GAATTCATGGCCGGGCCCCTGCGC-3′ (sense; the leading GAATTC is the EcoRI site) and 5′-GCGGCCGCCTAGGCGTCCTGACAGGTGGA-3′ (antisense; the leading GCGGCCGC is the NotI site). The PCR products were purified, double-digested with EcoRI and NotI, gel-extracted, and inserted into the EcoRI/NotI sites of the pET-30a vector. The recombinant pET-30a-C plasmid was identified by double-digestion electrophoresis and DNA sequencing (Invitrogen Ltd., Shanghai, China).
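The primer design can be sanity-checked in silico. The Python sketch below confirms that each primer carries its restriction site at the 5′ end and that, read on the coding strand, the antisense primer ends in a TAG stop codon followed by the NotI site. It also includes the usual A260 conversion for RNA, which assumes 1 A260 unit corresponds to about 40 μg/mL in a 1 cm cuvette; the A260 reading and dilution in the example are hypothetical, since the text does not report them, and the helper names are illustrative.

```python
# Restriction-site recognition sequences (5'->3').
ECORI = "GAATTC"
NOTI = "GCGGCCGC"

# Primers as given in the text (5'->3').
SENSE = "GAATTCATGGCCGGGCCCCTGCGC"
ANTISENSE = "GCGGCCGCCTAGGCGTCCTGACAGGTGGA"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def rna_ug_per_ml(a260: float, dilution_factor: float = 1.0) -> float:
    """Estimate RNA concentration, assuming 1 A260 unit = 40 ug/mL (1 cm path)."""
    return a260 * 40.0 * dilution_factor


if __name__ == "__main__":
    # Both primers carry their restriction site at the very 5' end.
    print(SENSE.startswith(ECORI), ANTISENSE.startswith(NOTI))  # True True
    # Read on the coding strand, the antisense primer ends in the TAG stop
    # codon followed by the (palindromic) NotI site.
    print(reverse_complement(ANTISENSE))  # TCCACCTGTCAGGACGCCTAGGCGGCCGC
    # Hypothetical reading: A260 = 0.25 measured on a 1:100 dilution.
    print(rna_ug_per_ml(0.25, 100))  # 1000.0
```

The start codon sits immediately after the EcoRI site (positions 7 to 9 of the sense primer), so the insert is cloned with its own ATG.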

Construction of Recombinant co-cysC Expression Vectors.
The CysC mRNA sequence provided by GenBank (Gene ID: 1471) was analyzed, and the gene sequence was optimized according to the codon preferences previously determined for the E. coli expression system [2,10,23,24]. The "codon randomization" method used in this study employs codon usage tables, in which each codon is weighted according to its frequency in the genome. Optimization was performed by randomly assigning a triplet to each amino acid according to an established codon preference table (http://www.kazusa.or.jp/codon/cgi-bin/showcodon.cgi?species=316407), with the probability of each codon given by its weight within the set of codons encoding that amino acid. Using this algorithm, one sequence was designed with the GeMS software package (KOSAN Biosciences Inc.) [18]. A complete comparison of the co-cysC and wt-cysC sequences is shown in Table 1.
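The weighted random assignment described above can be sketched in a few lines of Python. This is a minimal illustration, not the GeMS implementation the study used: the weight table `USAGE` is hypothetical (illustrative values, not the Kazusa table), and `codon_randomization` is a hypothetical helper name.

```python
import random
from typing import Optional

# Hypothetical codon weights (per 1000 codons) for the amino acids in the
# example peptide. Illustrative only; the study used the Kazusa E. coli table.
USAGE: dict[str, dict[str, float]] = {
    "M": {"ATG": 27.0},
    "A": {"GCG": 33.0, "GCC": 25.0, "GCA": 20.0, "GCT": 15.0},
    "G": {"GGC": 29.0, "GGT": 24.0, "GGG": 11.0, "GGA": 8.0},
    "P": {"CCG": 23.0, "CCA": 8.0, "CCT": 7.0, "CCC": 5.0},
    "L": {"CTG": 52.0, "TTA": 13.0, "TTG": 13.0, "CTT": 11.0, "CTC": 11.0, "CTA": 4.0},
    "R": {"CGC": 22.0, "CGT": 21.0, "CGG": 5.0, "CGA": 3.0, "AGA": 2.0, "AGG": 1.0},
}


def codon_randomization(
    protein: str,
    usage: dict[str, dict[str, float]],
    seed: Optional[int] = None,
) -> str:
    """Back-translate a protein, drawing each codon at random with probability
    proportional to its weight among the synonymous codons for that residue."""
    rng = random.Random(seed)  # a seed makes a given design reproducible
    chosen = []
    for aa in protein:
        synonyms = usage[aa]
        codons = list(synonyms)
        weights = [synonyms[c] for c in codons]
        # random.choices normalizes the weights within this synonymous set.
        chosen.append(rng.choices(codons, weights=weights, k=1)[0])
    return "".join(chosen)


if __name__ == "__main__":
    print(codon_randomization("MAGPLR", USAGE, seed=7))
```

Each seed yields a different synonymous sequence. Over a long protein, the share of each codon approaches its share of the weights; for leucine in this illustrative table, CTG would be chosen about 50% of the time (52 of 104).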
Based on analysis of these gene sequences, single-stranded oligonucleotides were designed and synthesized, with EcoRI and NotI sites added at the 5′ and 3′ ends of the sequence, respectively. The synthesized oligonucleotides were assembled into the complete gene by polymerase chain reaction (PCR). The assembled gene was inserted into the pMD-18T vector, which was then transformed into E. coli BL21 competent cells, and the sequence of the recombinant plasmid was confirmed by sequencing. A fragment of approximately 400 bp was released from the pMD-18T-cysC plasmid by double digestion with EcoRI and NotI and ligated into the pET-30a vector to obtain the recombinant pET-30a-co plasmid. After the correct open reading frame (ORF) was verified, the recombinant plasmid was transformed into E. coli BL21 competent cells. The recombinant plasmids were extracted by alkaline lysis, and the gene sequence was confirmed by EcoRI/NotI double digestion combined with gene sequencing.
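Verifying the ORF and confirming that optimization left the amino acid sequence unchanged both reduce to a translation test. The sketch below assumes the standard genetic code (NCBI translation table 1). Its sequences are hypothetical six-codon fragments rather than the Table 1 genes: the wt fragment is the coding part of the sense primer with a stop codon appended, and the co fragment is an invented synonymous redesign. `check_orf` is an illustrative helper that checks only the insert itself; confirming in-frame fusion with the pET-30a tag would also require the vector sequence.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1) in TCAG order; '*' marks stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def check_orf(cds: str) -> str:
    """Translate a coding sequence and return the protein if it is a clean ORF.

    Requires a length divisible by 3, an ATG start, a terminal stop codon and
    no internal stop codons; raises ValueError otherwise.
    """
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("length is not a multiple of 3 (frameshift)")
    protein = "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))
    if not cds.startswith("ATG"):
        raise ValueError("no ATG start codon")
    if not protein.endswith("*"):
        raise ValueError("no terminal stop codon")
    if "*" in protein[:-1]:
        raise ValueError("internal stop codon")
    return protein[:-1]  # drop the stop symbol


if __name__ == "__main__":
    wt = "ATGGCCGGGCCCCTGCGCTAG"  # hypothetical wt fragment + stop
    co = "ATGGCGGGTCCGCTGCGTTAA"  # hypothetical synonymous redesign + stop
    print(check_orf(wt), check_orf(co), check_orf(wt) == check_orf(co))  # MAGPLR MAGPLR True
```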

Inducible Expression and Purification of Fusion Protein.
The pET-30a-co and pET-30a-wt plasmids with correct sequences were transformed into the E. coli BL21 strain and grown on ALB medium, and three single colonies were selected independently. Each colony was inoculated into 5 mL LB medium and cultured at 37 °C for 10 h. The cultures were then inoculated at a 1:100 ratio into 500 mL LB medium containing 100 μg/mL ampicillin and cultured at 37 °C until the A600 reached 0.5. Protein expression was induced with 0.25 mmol/L isopropyl β-D-1-thiogalactopyranoside (IPTG), and the bacteria were harvested 1, 2, 3, 4, and 5 h after induction.
Table 1: Sequence comparison between wt- and co-cysC genes.
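The culture and induction volumes follow from simple dilution arithmetic (C1V1 = C2V2). The sketch below assumes a 1 mol/L IPTG stock, which the text does not specify, and reads "1:100" as one volume of starter per 100 volumes of culture; `stock_volume` is a hypothetical helper. Under that reading, a 1:100 inoculation of 500 mL needs exactly the 5 mL starter culture, so the whole starter is used.

```python
def stock_volume(stock_conc: float, final_conc: float, final_volume: float) -> float:
    """Solve C1*V1 = C2*V2 for V1; concentrations share one unit, volumes another."""
    return final_conc * final_volume / stock_conc


CULTURE_ML = 500.0
# 1:100 inoculation, read as 1 volume of starter per 100 volumes of culture.
inoculum_ml = CULTURE_ML / 100
# 0.25 mmol/L final IPTG from a hypothetical 1 mol/L (1000 mmol/L) stock.
iptg_ul = stock_volume(1000.0, 0.25, CULTURE_ML) * 1000  # mL -> uL
harvest_h = [1, 2, 3, 4, 5]  # sampling times after induction

print(f"inoculum: {inoculum_ml} mL, IPTG stock: {iptg_ul} uL, harvest at {harvest_h} h")
# inoculum: 5.0 mL, IPTG stock: 125.0 uL, harvest at [1, 2, 3, 4, 5] h
```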

Detection of Recombinant Proteins.
The purified recombinant proteins were diluted with 0.04 M phosphate buffer (PBS) and measured by sol particle immunoassay (SPIA) (Leadmanbio, China) on a Hitachi 7600 Chemistry Analyzer (Hitachi, Japan). A recombinant Brucella protein expressed with the same vector and host cells served as the negative control, and high-value and low-value quality control materials were also included.

Western Blot Analysis.
The proteins produced by the transformed bacteria were separated by sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE) and transferred onto a nitrocellulose membrane. The membrane was blocked with 5% skim milk in Tris-buffered saline with Tween 20 (TBST; 25 mmol/L Tris-HCl, 125 mmol/L NaCl, 0.1% Tween 20, pH 8.0) at 37 °C for 2 h, washed three times for 3 min each, and incubated with mouse anti-human CysC monoclonal antibody at 37 °C for 1.5 h. After three washes with TBST, the membrane was incubated with horseradish peroxidase (HRP)-conjugated goat anti-mouse IgG at 37 °C for 1.5 h, then washed and developed with tetramethylbenzidine (TMB) for 10 min.

Optimization of the cysC Gene.
The mRNA sequence of cysC was obtained from GenBank (Gene ID: 1471), and 63 of its 120 codons were optimized. The gene was also screened for complex secondary structures and repeated sequences. Rare codons in cysC were replaced with codons preferred by E. coli, and the GC content was decreased from 60.88% to 52.89%. Additionally, the AT-rich fragment was removed to avoid premature termination. The design avoided runs of more than five consecutive A/T or G/C bases, which could affect mRNA stability. The synthetic cysC gene differs from the wild-type gene only by synonymous codon changes, so the amino acid sequence is unchanged.
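The two sequence-level checks described above, GC content and long A/T or G/C stretches, can be sketched as follows. This rests on an assumption: "more than five A/T or G/C bases" is read as a run of more than five consecutive bases drawn only from {A, T} or only from {G, C}, because the authors' exact criterion and software are not stated. The example uses only the 18-nt coding stretch of the wt sense primer, so its GC content is not comparable to the 60.88% reported for the whole gene.

```python
import re


def gc_content(seq: str) -> float:
    """Percentage of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def long_runs(seq: str, max_run: int = 5) -> list[tuple[int, str]]:
    """Find stretches longer than max_run made only of A/T or only of G/C.

    Returns (0-based start, stretch) pairs, scanning left to right.
    """
    n = max_run + 1
    pattern = re.compile("[AT]{%d,}|[GC]{%d,}" % (n, n))
    return [(m.start(), m.group()) for m in pattern.finditer(seq.upper())]


if __name__ == "__main__":
    wt_fragment = "ATGGCCGGGCCCCTGCGC"  # coding part of the wt sense primer
    print(round(gc_content(wt_fragment), 2))  # 83.33
    print(long_runs(wt_fragment))  # [(2, 'GGCCGGGCCCC')]
```

Even this short stretch contains an 11-base G/C run, which illustrates why a GC-rich mammalian sequence can fail such a screen before any codons are changed.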

Identification of Prokaryotic Recombinant Plasmids.
Recombinant pET-30a-co and pET-30a-wt plasmids were extracted by alkaline lysis and analyzed by double digestion with EcoRI and NotI. Both plasmids were identified and sequenced and met the design requirements, confirming that viable recombinant clones had been constructed.

Expression and Purification of CysC.
Analysis of the proteins from bacteria carrying pET-30a-co and pET-30a-wt showed that the recombinant proteins were expressed as inclusion bodies. SDS-PAGE showed a 19-kilodalton (kDa) protein band present only in the transformed bacteria, consistent with the theoretical value of 19.0 kDa.
Expression of the fusion protein was highest 5 h after induction with 0.25 mmol/L IPTG. The pET-30a-wt recombinant protein accounted for 10% of the total protein in the transformed bacteria, whereas the pET-30a-co recombinant protein accounted for 47%, 49%, and 42% (average 46%) of the total protein in the three colonies. The recombinant proteins in the inclusion-body lysate were purified using Ni2+-NTA resin. The purified fusion proteins migrated at an Mr of about 19,000, with a purity above 90% (Figure 1).
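The text does not state how the share of total protein was measured. One common approach, assumed here purely for illustration, is SDS-PAGE densitometry: the target band's intensity divided by the summed intensity of its lane. The band and lane values below are hypothetical; only the three per-colony percentages (47%, 49%, 42%) and the 10% wild-type figure come from the study, and `percent_of_total` is a hypothetical helper.

```python
from statistics import mean


def percent_of_total(band_intensity: float, lane_intensity: float) -> float:
    """Share of the target band in the whole lane, as a percentage."""
    return 100.0 * band_intensity / lane_intensity


# Hypothetical densitometry readings (arbitrary units) for one lane.
print(percent_of_total(4_600.0, 10_000.0))  # 46.0

# Per-colony percentages reported for pET-30a-co, and the wild-type value.
co_percent = [47, 49, 42]
wt_percent = 10
co_mean = mean(co_percent)
print(co_mean, co_mean / wt_percent)  # 46 4.6
```

On this basis, optimization raised the recombinant share of total protein about 4.6-fold.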

SPIA Detection of the Diluted Recombinant Protein.
Five hours after induction, the bacteria carrying pET-30a-wt produced 36 mg/L CysC, whereas the three pET-30a-co colonies produced 779 mg/L, 827 mg/L, and 770 mg/L (average 792 mg/L). The standard deviation of expression among the three pET-30a-co colonies was 30.64 mg/L, and the coefficient of variation (CV) was 3.87%. No CysC (0 mg/L) was detected in the negative control. The co-cysC sequence therefore produced substantially more protein than the wt-cysC sequence.
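The summary statistics can be recomputed from the three reported yields with Python's standard `statistics` module. The sketch shows that 30.64 mg/L is the sample standard deviation (n - 1 denominator), that the standard error of the mean would instead be about 17.69 mg/L, that the resulting CV is about 3.87%, and that the mean is about 22-fold the wild-type yield.

```python
from math import sqrt
from statistics import mean, stdev

# CysC yields (mg/L) 5 h after induction, as reported in the text.
co_yields = [779, 827, 770]  # three pET-30a-co colonies
wt_yield = 36                # pET-30a-wt

m = mean(co_yields)              # arithmetic mean
sd = stdev(co_yields)            # sample standard deviation (n - 1)
se = sd / sqrt(len(co_yields))   # standard error of the mean
cv = 100 * sd / m                # coefficient of variation, %
fold = m / wt_yield              # increase over the wild-type yield

print(m, round(sd, 2), round(se, 2), round(cv, 2), round(fold, 1))
# 792 30.64 17.69 3.87 22.0
```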

Identification of Recombinant Protein Antigenicity by Western Blot.
The anti-human CysC monoclonal antibody reacted with a single recombinant protein of approximately 19.0 kDa (Figure 2), suggesting that the recombinant protein was human CysC.

Discussion
CysC, a member of the cysteine proteinase inhibitor family, has a biological function that remains unclear, which has made it the target of numerous research studies. To support further study of this protein, more efficient and affordable methods for producing it in bacteria are required. Previous studies of the molecular structure and metabolism of CysC have shown that it is produced by virtually all nucleated cells. In addition, it shows no tissue specificity, and stable production and circulating levels are observed in many cell types. Most notably, CysC expression has not been shown to depend on pathology, age, sex, or metabolism in humans. Because CysC is one of the few proteins freely filtered by the glomeruli without significant reabsorption or secretion, its serum concentration is an ideal index of glomerular filtration, which is the protein's primary potential application [11,23]. A study of 135 patients by Grubb et al. found that the reciprocal of serum CysC was significantly correlated with the glomerular filtration rate, with a correlation coefficient of 0.77 [24]. CysC has been reported to assess the glomerular filtration rate with higher accuracy and sensitivity than either serum urea nitrogen or serum creatinine [11,[25][26][27]. Because further development and testing of CysC-based techniques requires large amounts of the protein, efficient bacterial synthesis is rapidly becoming critical for both research and clinical studies involving CysC.
The development of high-quality antibodies for the detection of CysC, such as egg yolk antibody (IgY), is a prominent goal of current research. Preparing and purifying the CysC protein antigen required for this work further increases the demand for bacterially produced CysC. Although a CysC fusion protein has been expressed in E. coli [12], the low expression efficiency of the eukaryotic gene from prokaryotic expression vectors has remained a problem. Codon optimization has therefore become an increasingly important tool for commercial bacterial gene expression. When the expression vector and host system are chosen without regard to how well they match, production of certain proteins can be much less efficient, and the resulting resource use and costs can be prohibitive for many researchers [28].
Because the genetic code is degenerate, each amino acid is encoded by at least one and at most six codons. This must be considered when expressing eukaryotic proteins in bacterial hosts in order to achieve optimal expression. Notably, codon usage also differs substantially between types of organisms and between individual species. In protozoa, the high expression of certain genes may be attributed to a preference for certain codons [29][30][31]. These preferences generally reflect differences between prokaryotic and eukaryotic systems, which can cause some codons (often codons similar to stop codons) to interrupt translation prematurely. Additionally, the tRNAs for certain codons may be in limited supply, whereas abundant tRNA supplies are required for efficient translation [31]. Codon preference also correlates with the structure and function of the encoded proteins.
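The one-to-six range of codon degeneracy can be read directly off the standard genetic code. This minimal Python sketch assumes the standard nuclear code (NCBI translation table 1), which applies to both E. coli and human nuclear genes, and counts the sense codons for each amino acid.

```python
from collections import Counter

BASES = "TCAG"
# Standard genetic code (NCBI table 1) in TCAG order; '*' marks stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Number of codons per amino acid (stop codons excluded).
degeneracy = Counter(aa for aa in CODON_TABLE.values() if aa != "*")

if __name__ == "__main__":
    print(min(degeneracy.values()), max(degeneracy.values()))  # 1 6
    print(sorted(aa for aa, n in degeneracy.items() if n == 6))  # ['L', 'R', 'S']
    print(sum(degeneracy.values()), len(degeneracy))  # 61 20
```

Methionine and tryptophan are the only single-codon amino acids, while leucine, serine, and arginine each have six codons, so these three offer the most room for synonymous substitution.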
Rare codons in mRNA are often clustered in the regions encoding linkers between protein domains, where translation slows noticeably; these pauses have been associated with the formation of individual protein domains and regular secondary structures. Similarly, the translation rate varies along the mRNA, in part according to the secondary structure of the encoded protein. Codon preferences differ substantially among organisms, as reflected in the codon composition of their coding sequences. In genetic engineering, the target gene is generally a continuous coding sequence (such as a cDNA) rather than the natural gene with its introns. Hence, the codon usage of the selected target gene must be analyzed against the preferences of the host species in order to obtain the highest expression of recombinant protein. In bacterial cells, this process includes removing rare codons, using preferred codons, minimizing secondary structure, and adjusting GC content; together, these steps are generally referred to as "codon optimization." A previous study [12] showed that eukaryotic wt-cysC could be expressed in E. coli; however, expression levels were low because the eukaryotic codons did not match the preferred codons of the prokaryotic host. Codon optimization can raise expression in such systems and thereby greatly increase protein yields. In addition to the use of optimized codons, several studies [32][33][34] have reported that the expression and ELISA titer of synthesized genes increased markedly when rare codons and unstable sequences were removed. Bagherpour et al. synthesized the fimH gene of uropathogenic Escherichia coli (UPEC) using mammalian codons [35]; compared with the wild-type gene, the mammalian-codon fimH gene was more compatible with eukaryotic expression systems, suggesting that mammalian codons are appropriate for a fimH DNA vaccine construct expressed in COS7 cells. Anzor et al. reported that, when codon optimization was applied, expression of the synthesized gene in E. coli was 3.4-fold greater than that of wt-PEDF in the native host. Similarly, Menzella reported that the codon randomization method was a superior strategy for codon optimization [36]. Menzella further demonstrated significant increases in chymosin expression, showing that this strategy can reduce production costs in industrial enzyme processes that use microbial hosts.
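A first-pass rare-codon scan of the kind described above is straightforward to sketch. It rests on an assumption: the rare-codon list is the commonly cited set AGG, AGA, ATA, CTA, CCC, and GGA (the codons whose tRNAs are supplemented in Rosetta-type E. coli strains). A frequency cut-off taken from a codon usage table would be an equally reasonable criterion, and neither is stated to be the authors' method. The inputs are illustrative six-codon fragments: the wt stretch is the coding part of the sense primer, and the second is a hypothetical synonymous redesign. `rare_codons` is a hypothetical helper.

```python
# Commonly cited E. coli rare codons (DNA alphabet): those whose tRNAs are
# supplemented in Rosetta-type strains. Illustrative assumption only.
RARE_ECOLI = frozenset({"AGG", "AGA", "ATA", "CTA", "CCC", "GGA"})


def rare_codons(cds: str, rare: frozenset = RARE_ECOLI) -> list[tuple[int, str]]:
    """List (codon index, codon) for every rare codon, reading in frame 0."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    return [
        (i // 3, cds[i:i + 3])
        for i in range(0, usable, 3)
        if cds[i:i + 3] in rare
    ]


if __name__ == "__main__":
    wt_fragment = "ATGGCCGGGCCCCTGCGC"  # coding part of the wt sense primer
    co_fragment = "ATGGCGGGTCCGCTGCGT"  # hypothetical synonymous redesign
    print(rare_codons(wt_fragment))  # [(3, 'CCC')]
    print(rare_codons(co_fragment))  # []
```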
In the current study, the target gene cysC was redesigned using techniques described by previous researchers [2,10,23,24], including the removal of rare codons and unstable sequences. Techniques that had succeeded in similar expression systems were also applied, including the use of optimized codons and the reduction of AT- and GC-repeated sequences. The codons were modified, the GC content of the target gene was decreased, and the AT-rich fragment was removed to avoid premature translational termination. These data suggest that the expression and titer of the synthesized CysC were markedly increased after the IPTG induction concentration and time were optimized.

Conclusions
The results of the current study confirm that production of CysC in prokaryotic systems such as E. coli can be improved by codon optimization. Expression of the recombinant protein was substantially higher from the optimized gene (46% of total protein) than from the wild-type gene (10%), and the CysC concentration increased from 36 mg/L to 792 mg/L. Because demand for CysC is expected to rise as more researchers and clinicians use the protein, this increase in expression may represent significant gains for commercial producers. Although the original prokaryotic expression of cysC in E. coli was relatively low, selecting codons preferred by the prokaryotic host greatly improved the expression system. Expression of the fusion protein may also be increased by selecting an appropriate vector and host system, with particular attention to preferred codons. Researchers should consider that the matching of gene, vector, and host system is closely related to the resulting protein expression level; these considerations are important for optimal genetic engineering of such systems.