Serum-Free Medium Optimization Based on Trial Design and Support Vector Regression

The Plackett-Burman design and support vector machine (SVM) were reported to be used on many fields such as some feature selections, protein structure prediction, or forecasting of other situations. Here, with suspension adapted Chinese hamster ovary (CHO) cells as the object of study, a serum-free medium for the culture of CHO cells in suspension was optimized by this method. Support vector machine based on genetic algorithm was used to predict the growth rate of CHO and prove the results from the trial designs. Experimental results indicated that ZnSO4, transferrin, and bovine serum albumin (BSA) were important ones. The same conclusion was arrived at when the support vector regression model analyzed the experimental results. With the methods mentioned, the influence of 7 medium supplements on the growth of CHO cells in suspension was evaluated efficiently.


Introduction
Chinese hamster ovary (CHO) cells are the most widely used mammalian cells in the world for expression and large-scale recombinant protein production. CHO cells can provide stable and accurate glycosylation; they offer a posttranslationally modified product and thus a more accurate in vitro rendition of the natural protein. CHO cells are ideal hosts of different complex biological macromolecules. In order to improve the productivity, optimization of cell culture medium to meet the nutrient demands and to minimize toxic production in cell culture has played a critical role [1][2][3].
Serum, used to be the most important supplement in the traditional medium for the culture of mammalian cells, provides essential nutrients, hormones, peptides, growth factors, adhesion factors, and so on. However, serum has many shortages such as high cost, batch variability, interference with the downstream processing of product purification, and the desirable elimination from the culture process, for example, viruses, mycoplasma, or prions [4]. Therefore, several commercial serum-free media for CHO cells have been available. However, the compositions of them are proprietary, and the prices of them are expensive. Since, different cell lines of CHO have different nutrient requirements, and no universal serum-free media applicable to all cell lines are available. Thus, a specific serum-free medium for each individual cell line needs to be developed to maximize cell growth. It is necessary for us to develop new serum-free medium used to express a kind of anti-VEGF monoclonal antibody by CHO [5].
However, the nutrients and their interactions which account for the proliferation effects on mammalian cells are complex. It is a time-consuming work and has the risk of neglecting interactions among supplements [6]. More and more statistical designs have been introduced to reduce the time and labors. Chun et al. [7] optimized the serum-free medium by the application of factorial design. Petiot et al. [8], by using fractional factorial design to screen serumfree medium rapidly for the growth of adherent Vero cells, enhanced the concentration of Vero cells to 160%. Deshpande et al. [9] used central composite design to optimize the concentrations of glucose, glutamine, and inorganic salts for cultivation of a CHO cell line.
In this study, a Plackett-Burman statistical design, which provides an efficient way of screening a large number of medium supplements combination, was used to develop serum-free medium. For the sake of recognizing the major supplements in the medium as well as predicting the growth rate, support vector machine (SVM) was also employed. In this paper, we designed the experiments with Plackett-Burman design which can decrease the cost of time and expenses and response surface design which gives a clear response. SVM was introduced to predict the growth rate and to find out the major elements which have a significant influence on the CHO growth rate. These will be described in Section 2. In Section 3, results will be presented to prove our method worked well. In the end, conclusions are drawn in Section 4.

Cell Line and Cell
Culture. CHO-K1 cells, obtained from ATCC, were initially maintained in DMEM/F12 (a mixture of DMEM and F12) supplemented with 10% newborn bovine serum at 37 ∘ C in the presence of 5% CO 2 . Then, the cells were adapted to serum-free medium by stepwise decreasing the serum concentration by 50% every three passages until the serum concentration was reduced below 0.1%. These cells adapted to the serum-free medium were cultured for further use.

Experimental Design.
We applied the Plackett-Burman design accounting for its advantages to find out the main factors having influence on the experiment. It takes two levels of each factor for analysis and a significant factor to be determined by comparison between the two levels of the various factors and the overall difference. Screening tests cannot distinguish between the interaction and the main effects, but a significant effect on the factor can be determined, so as to achieve the purpose of screening. It can be avoided that too many factors or fractional factorial experiment was not significant, wasting resources in the optimization test afterward.
Based on the results of previous experiments and literature review, 7 medium supplements including ZnSO 4 , transferrin, putrescine, bovine serum albumin (BSA), ferric citrate, sodium pyruvate, and ethanol amine were selected as factors to evaluate their stimulatory effects on CHO cells under serum-free conditions [10][11][12][13]. Every factor has two levels of concentration: high concentration (+1) and low concentration (−1), given in Table 1. The data were presented from triplicate experiments. Results are considered significant at a value <0.05.
Experiments, based on a Plackett-Burman design, were performed in triplicate in 12-well tissue culture plates containing 1 mL of serum-free media and medium supplements. CHO cells were inoculated at a concentration of 10 5 cells/mL. The cell concentration and viability were determined by the Trypan blue exclusion method after 72 h of culture.

Support Vector Regression
Model. Support vector classification (SVC) and support vector regression (SVR) are the two main categories for SVM. We established the SVR model for prediction, since we tend to get the growth rate of CHO with 7 medium supplements and to recognize the most important components of them. With the established model, we can not only forecast the specific growth rate given factor levels but also pick out the main factors which influence the response values significantly. The SVR builds models depending on a subset of training data, because the cost function for building the model ignores any training data that is close to the model prediction which makes the SVM for regression possible and efficient [14]. As the nutrients and the interactions of them affecting the growth of CHO cells are complex, SVR nonlinear property is much more proper than the conventional linear regression for our problem, in particular the relationship among the elements in the serumfree medium. The basic idea in SVM is mapping input into a higher dimensional feature space via a nonlinear mapping and accomplishing linear regression in this space [15]. Therefore, regression approximation addresses the problem of estimating a function based on a given dataset = {( , ) } , where ∈ is the input vector, ∈ 1 is the targeted output, is the dimension of the input vector, and is the total number of samples. The standard form of support vector regression [16,17] is where is called the tube size of SVM and is the regularization constant determining the tradeoff between BioMed Research International 3 the empirical error and the regularized term. Utilizing the Lagrange multipliers, dualization becomes where = ( , ) ≡ ( ) ( ), ( , ) is the kernel function, ( ) is the high dimensional feature space, which is nonlinearly mapped from the input space , and and * are the introduced Lagrange multipliers. In above -SV regression, the goal is to find a function ( ) which has the most deviation from the actually obtained targets for all the training data and as flat as possible. The standard SVR to solve the approximation problem is described as the following form: After the models are trained by solving the above optimization problems, we evaluate the predictions by outputs mean squared error (MSE): In this study, we considered 12 samples consisting of the different level combinations of seven supplements. Each variable stands for an element, and the response is the specific growth rate of cells. The radial basis function (RBF) kernel ( , ) = exp(− ‖ − ‖ 2 ) was chosen to build the SVR regression. As a result, the parameter must be an appropriate value which significantly influences the prediction accuracy. Based on the above foundation above, it can be implemented as follows to realize the SVR: data scaling, model training, and model prediction. When training the model, parameter choosing seems to play a momentous role, so we illustrate this in the next section in detail. The model was established through LIBSVM, Matlab version 7.10 [18].

Genetic Algorithm for Parameter Selection.
SVM is simplified greatly because of kernel function rather than calculating the complicated nonlinear transformation directly. RBF kernel function was used in this study, so kernel parameter must be chosen properly to make the SVM model a wellperformed one. The penalized parameter , balancing the empirical error and the regularized term, has an influence on the regression model. Plenty of methods have been explored in the parameter selection, such as grid search, particle swarm optimization, and genetic algorithm (GA). GA was employed in our SVM parameter selection for the sake of searching for appropriate parameters in a larger range. GA is a simulation of the biological evolution process in Darwin's natural selection and genetics. It is composed of three basic operations, namely, selection, crossover, and mutation. The parameter selection process is carried out according to the following schema. First, select two original parameters, called parents, and apply the crossover to create two new solutions which are muted (with a given probability). Second, replace the parents by their offspring, new parameters. These two actions repeat for a fixed number of times, so-called generation. Finally, the selected individuals are copied backed to the population to replace the worst ones [19]. These steps repeat until the most suitable parameters are chosen [20].

Feature Selection.
Since we focus on the main supplements in the medium, with the SVR model above, this purpose was reached with mean impact value (MIV) which was proposed by Dombi reflecting the change of weighed matrix in SVM model. We implemented the idea in our study and build MIV-SVR variable selection method. MIV is used to find out the most influential variables to the SVR model. Its absolute value shows the relative importance of corresponding variable in the model. Before searching for the importance, SVR model should be trained well. Then, plus and minus 10% to each variable in the train dataset to make it into two new datasets. New datasets are applied as simulation samples to the model and consequently simulation results are obtained. The difference between the new outcomes is MIV which demonstrates the impact to the outputs.

Plackett-Burman Design.
The nutrients and the interactions of them affecting the growth of CHO cells are complex. To find the key factors, the Plackett-Burman design was used to achieve it. It is a statistical design providing an efficient way of screening a large number of medium supplements and identifying the important factors.
The two-level Plackett-Burman design was performed using seven medium supplements. For purpose of modeling, a value of "1" was assigned to those with high concentrations, and a value of "−1" was assigned to those with low concentrations. We considered the specific growth rate as the response value obtained from the experiments. The design of Plackett-Burman and the response values are shown in Table 2. From the analysis of range, given in Table 3, the ranges of 1 , 4 , and 5 are larger. 1 and 2 are the sum of each factor's results in level 1 and level −1, while 1 and 2 represent the mean value. In addition, is the range of 1 and 2. It suggests that ZnSO 4 ( 1 ), BSA ( 4 ), and Ferric citrate ( 5 ) were important ones influencing the growth of CHO cells. The results of the analysis of variance, given in Table 4, also showed that those factors were the most important ones. Figure 1 represents the growth rate value varying from 0.124 to 0.319, is the growth rate, and the color in the contour means Specific growth rate (d −1 )   3.4. Discussion. The Plackett-Burman design and support vector machine (SVM) make the multifactor experiments consume less time, materials, and labor. Experimental designs, such as uniform design, orthogonal design, and Plackett-Burman, not only reduce the times testing but also predict the optimal case. Mathematical modeling is usually taken into account when dealing with the experimental data. However, the limit of the time and other factors, there cannot be a lot of times testing different combined factors in different levels. It makes the data analysis troublesome, needing alternative modeling methods based on the small sample. The SVM is the data modeling tool which is designed specifically for small sample analysis. As for this experiment, the main goal is to find the optimal factorial composition, so the SVM make it possible to predict, optimize, and figure out the best parameters in the case. As shown in Table 2, the Plackett-Burman design samples and corresponding response were obtained. The cost in experiments has been decreased through reasonable trial design. However, taking SVR modeling into account, the sample size maybe was a little small. Because the trial design covered level combinations needed, all the data were used to train and predict can be a practical strategy. Moreover, two levels were set based on the literature, whether more levels can give different performances to be studied.

The Response Surface Design. Contour line in
As the cell line used here produces a recombinant antibody, it would be important to see the effects that these components have on production. But, the serum in medium will interfere with the antibody measuring. Although cell growth does not always correlate with production, for this CHO variant, growth was shown to be proportional to antibody production in our previous test.
We also noted that the optimized parameters (red area) were focusing on the edge of the contour line Figure 1. That's mean maybe the level range we chose before was not wide enough to get the greatest growth rate. So, another experiment was taken to see whether the growth rate could be raised after being cultured with 1.5 times supplements. Results are shown in Figure 4, and the growth rate was raised after enlarging the level range.

Conclusions
In this work, we employed Plackett-Burman design and SVM to explore the influence of 7 medium supplements on the growth of CHO cells in suspension. Results suggested that statistical design methodology offered an efficient and feasible approach for cells growth optimization. This method showed efficient and reliable property when the SVM gave  the same result. It is a powerful technique in reaching an optimized solution and accurately analyzing the interactive effects of most influential medium supplements. Furthermore, SVR was introduced to the analysis of supplements on CHO cells growth, and it can predict the growth rate for new sample based on the training data. It is easy and appropriate using GA to choose suitable parameters which play momentous role in SVR model. With SVM, trial design, and other statistical methods, experiments for CHO cells become simplified and credible.