Parameters Optimization and Application to Glutamate Fermentation Model Using SVM

Aiming at parameter optimization in support vector machines (SVMs) for glutamate fermentation modelling, a new method is developed. It optimizes the SVM parameters via an improved particle swarm optimization (IPSO) algorithm, which has better global search ability. The algorithm detects and handles local convergence and shows a strong ability to avoid being trapped in local minima. The main steps of the method are presented. Simulation experiments demonstrate the effectiveness of the proposed algorithm.


Introduction
Glutamate fermentation is a complex microbial growth process. The glutamate-producing bacterium takes up nutrients from the raw material and carries out complex biochemical reactions in vivo, catalyzed by specific enzymes [1]. The reaction process is highly nonlinear, time-variant, and uncertain. It is therefore very difficult to establish a dynamic model for fermentation control [2,3].
Modeling the glutamate fermentation process under normal or abnormal conditions is a major challenge, as an optimization strategy is applied from the initial stage to the final stage. It is difficult to measure substrate, biomass, and product concentrations online; hence pH, dissolved oxygen (DO) concentration, and CO2 production are usually used in bioprocess analysis. These process variables provide indications of the bioprocess condition. However, the information density of these data is usually low, and their multidimensional nature usually makes them difficult to interpret.
Support vector machines (SVMs), developed by Boser, Guyon, and Vapnik (1992), are a kind of machine learning method based on statistical learning theory [4][5][6][7]. They have become an active research topic in the field of machine learning. SVMs largely overcome defects such as the curse of dimensionality and overfitting, which are apt to appear in some other conventional algorithms, such as neural networks [8,9]. However, one problem remains in the practical application of SVM: how to select the SVM parameters so that the performance of the SVM is maximized. These parameters mainly include the penalty constant $C$, the relaxation factor, and the parameters in the kernel function (e.g., the width $\sigma$ in the RBF kernel function), and all of them affect the SVM performance to some degree. Most existing approaches use leave-one-out (LOO) [10], gradient descent (GD) [11], genetic algorithm (GA) [12], particle swarm optimization (PSO) [13], and related parameter estimators. No single method is universally effective for selecting these SVM parameters. Generally, cross-validation trials are used, but they involve ad hoc user intervention or require that the kernel function be continuously differentiable, and the resulting SVM classifiers are prone to falling into local minima.
To overcome the shortcomings of the existing parameter selection approaches mentioned above, an attempt is made to jointly optimize the feature selection and the SVM parameters with an evolutionary algorithm, the improved particle swarm optimization (IPSO) algorithm, in the hope of improving the SVM performance in glutamate fermentation. This paper describes an IPSO algorithm that has better global search ability and then focuses on optimizing the parameters used in SVM modeling and state prediction of glutamate fermentation.
The paper is organized as follows. In Section 2, the SVM with mixed kernel function is briefly reviewed. Section 3 describes the standard PSO algorithm and presents the new IPSO technique. The simulation results are provided in Section 4. Finally, concluding remarks are drawn in Section 5.

Mixed Kernel Function
The aim of this section is to introduce the notations and to review the concepts that are relevant to the development of the proposed model parameter selection method.
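The development of SVM starts from the simplest case of two linearly separable classes. Given a training sample set $\{(x_i, y_i)\}_{i=1}^{N}$, its basic mathematical model can be written as

$$f(x) = \sum_{i=1}^{D} w_i \varphi_i(x) + b, \tag{1}$$

where $\{\varphi_i(x)\}_{i=1}^{D}$ is the data in feature space and $\{w_i\}_{i=1}^{D}$ and $b$ are coefficients. They can be estimated by minimizing the regularized risk function

$$R(C) = C \sum_{i=1}^{N} L_\varepsilon\bigl(y_i, f(x_i)\bigr) + \frac{1}{2}\|w\|^2, \tag{2}$$

where $L_\varepsilon(y_i, f(x_i)) = \max\bigl(0, |y_i - f(x_i)| - \varepsilon\bigr)$ is the so-called loss function measuring the approximation error between the expected output $y_i$ and the calculated output $f(x_i)$, and $C$ is a regularization constant determining the trade-off between the training error and the generalization performance. The second term, $\frac{1}{2}\|w\|^2$ with $w = (w_1, \dots, w_D)$, is the regularization term. Introducing the slack variables $\xi_i$ and $\xi_i^*$, (2) can be written as the constrained problem

$$\min_{w, b, \xi, \xi^*} \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{N} (\xi_i + \xi_i^*) \quad \text{s.t.} \quad y_i - f(x_i) \le \varepsilon + \xi_i, \;\; f(x_i) - y_i \le \varepsilon + \xi_i^*, \;\; \xi_i, \xi_i^* \ge 0. \tag{3}$$

Using the duality principle, (3) could be changed to

$$\max_{\alpha, \alpha^*} \; -\frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} (\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*)\, \varphi(x_i) \cdot \varphi(x_j) - \varepsilon \sum_{i=1}^{N} (\alpha_i + \alpha_i^*) + \sum_{i=1}^{N} y_i (\alpha_i - \alpha_i^*) \quad \text{s.t.} \quad \sum_{i=1}^{N} (\alpha_i - \alpha_i^*) = 0, \;\; 0 \le \alpha_i, \alpha_i^* \le C. \tag{4}$$

Although the nonlinear function $\varphi$ is usually unknown, all computations related to $\varphi$ can be reduced to the form $\varphi(x)^T \varphi(y)$, which can be replaced with a so-called kernel function $K(x_i, x_j) = \varphi(x_i) \cdot \varphi(x_j)$ that satisfies Mercer's condition. Then, (1) becomes

$$f(x) = \sum_{i=1}^{N} (\alpha_i - \alpha_i^*) K(x_i, x) + b. \tag{5}$$

In (5), the Lagrange multipliers $\alpha_i$ and $\alpha_i^*$ satisfy the equalities

$$\alpha_i \alpha_i^* = 0, \quad \alpha_i \ge 0, \quad \alpha_i^* \ge 0, \quad i = 1, \dots, N. \tag{6}$$

Those vectors with $\alpha_i \ne 0$ are called support vectors, which contribute to the final solution.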
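As a small illustration of the loss term in the regularized risk, the following Python sketch evaluates an ε-insensitive loss and a risk of the commonly used form C·ΣL + ½‖w‖². The function names and the exact weighting of the empirical term are assumptions made for illustration, not taken from the paper.

```python
def eps_insensitive_loss(y_true, y_pred, eps=0.1):
    """Epsilon-insensitive loss: errors smaller than eps cost nothing."""
    return max(0.0, abs(y_true - y_pred) - eps)


def regularized_risk(weights, targets, predictions, C=1.0, eps=0.1):
    """Assumed risk form: C * (sum of losses) + 0.5 * ||w||^2."""
    empirical = sum(eps_insensitive_loss(y, f, eps)
                    for y, f in zip(targets, predictions))
    flatness = 0.5 * sum(w * w for w in weights)
    return C * empirical + flatness
```

A larger `C` makes training errors outside the ε-tube more expensive relative to the norm of the weights, which is the trade-off the regularization constant controls.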
Kernel functions are used to project the sample data into a high-dimensional feature space, in which the optimal separating hyperplane is then sought. Kernel functions used by SVM can be divided into global and local kernels. This paper combines a second-order polynomial kernel function and an RBF kernel function into a hybrid kernel function given by

$$K_{mix} = \rho K_{poly} + (1 - \rho) K_{RBF}, \tag{7}$$

where $K_{poly} = (x \cdot x_i + 1)^2$ is the polynomial kernel function; $K_{RBF} = \exp\bigl(-\|x - x_i\|^2 / (2\sigma^2)\bigr)$ is the RBF kernel function; and $\rho$ $(0 \le \rho \le 1)$ adjusts the relative weights of the two kernel functions.
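The hybrid kernel can be sketched in a few lines of Python. This is a minimal illustration that assumes the mixture is the convex combination ρ·K_poly + (1 − ρ)·K_RBF and that the RBF kernel has the Gaussian form exp(−‖x − z‖²/(2σ²)); the function names and default values are hypothetical.

```python
import math


def poly_kernel(x, z, degree=2):
    """Global kernel: (x . z + 1)^degree."""
    return (sum(a * b for a, b in zip(x, z)) + 1.0) ** degree


def rbf_kernel(x, z, sigma=0.5):
    """Local kernel: exp(-||x - z||^2 / (2 * sigma^2)) (assumed form)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mixed_kernel(x, z, rho=0.5, sigma=0.5):
    """Convex combination of the polynomial and RBF kernels."""
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1]")
    return rho * poly_kernel(x, z) + (1.0 - rho) * rbf_kernel(x, z, sigma)
```

With `rho=1.0` the function reduces to the polynomial kernel and with `rho=0.0` to the RBF kernel, so ρ interpolates between global and local behavior.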
The parameters of the mixed-kernel SVM, such as $C$ and the width coefficient $\sigma$ of the kernel function $K(x, x_i)$, exert a considerable influence on the performance of the SVM. A value of $C$ that is too large or too small may degrade the generalization ability of the SVM. The value of $\varepsilon$ indicates the error expectation in the classification of the sample data; it affects the number of support vectors generated by the classifier and thereby the generalization error of the classifier. If the value of $\varepsilon$ is too large, the separating error is high and the number of support vectors is small, and vice versa [14]. The parameters in the kernel function reflect the characteristics of the training data, and they also affect the generalization of the SVM. Therefore, the SVM can achieve its best possible performance only after all these parameters are chosen correctly.

Improved Particle Swarm Optimization (IPSO)
The standard PSO algorithm converges quickly at first, but the particles move more and more slowly in later iterations [15,16].
A particle updates its velocity and position according to

$$v_i^{(k+1)} = \omega v_i^{(k)} + c_1 r_1 \bigl(p_i - x_i^{(k)}\bigr) + c_2 r_2 \bigl(p_g - x_i^{(k)}\bigr), \qquad x_i^{(k+1)} = x_i^{(k)} + v_i^{(k+1)}, \tag{8}$$

where $v_i^{(k)}$ and $v_i^{(k+1)}$ represent the present and next velocity of particle $i$ and $x_i^{(k)}$ and $x_i^{(k+1)}$ are the present and next position of particle $i$.
To reduce the chance of particles leaving the search space during the evolutionary process, the particle velocity is usually limited to a certain range $v_i \in [-v_{max}, v_{max}]$; $p_i$ is the best previous position of particle $i$; $p_g$ is the best position among all particles in the population $X = [X_1, X_2, \dots, X_n]$; $r_1$ and $r_2$ are random real numbers in the range $[0, 1]$; $c_1$ and $c_2$ are acceleration constants; $\omega$ is the inertia weight coefficient (i.e., the impact of the previous velocity of the particle on its current one).
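The velocity and position update of a single particle can be sketched as follows. This is a minimal illustration of the standard PSO update with velocity clamping; the function name, the default constants, and the choice to draw the random numbers separately for each dimension are assumptions.

```python
import random


def pso_step(x, v, p_best, g_best, w=0.7, c1=2.0, c2=2.0, v_max=1.0, rng=random):
    """Return the next (position, velocity) of one particle."""
    new_x, new_v = [], []
    for xd, vd, pd, gd in zip(x, v, p_best, g_best):
        r1, r2 = rng.random(), rng.random()  # random factors in [0, 1)
        vel = w * vd + c1 * r1 * (pd - xd) + c2 * r2 * (gd - xd)
        vel = max(-v_max, min(v_max, vel))   # clamp to [-v_max, v_max]
        new_v.append(vel)
        new_x.append(xd + vel)
    return new_x, new_v
```

A particle that already sits on both its personal best and the global best with zero velocity does not move, which is exactly the stagnation that premature convergence produces.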
Because the standard PSO easily falls into local minima, a phenomenon known as premature convergence [17][18][19], an improved particle swarm optimization algorithm based on detecting premature convergence is proposed. The algorithm combines the detection and treatment of premature convergence so that prematurity is avoided throughout the run. The whole algorithm process is shown in Figure 1.

Premature Detection.
The literature [20] pointed out that when the particle swarm converges prematurely, the particles in the swarm show an "aggregation" phenomenon. Since the position of a particle determines its fitness, the state of the swarm can be tracked through the overall change in the fitness of all particles.
Let $n$ be the number of particles in the swarm, $f_i$ the fitness of particle $i$, $f_{avg}$ the mean fitness, and $\sigma^2$ the colony fitness variance of the swarm (distinct from the RBF kernel width), defined as

$$\sigma^2 = \sum_{i=1}^{n} \left( \frac{f_i - f_{avg}}{f} \right)^2, \tag{9}$$

where $f$ is a normalized scaling factor that restricts the size of $\sigma^2$. The value of $f$ is determined using the following formula:

$$f = \max\Bigl\{ 1, \; \max_{1 \le i \le n} |f_i - f_{avg}| \Bigr\}. \tag{10}$$

Equation (9) shows that the variance of the colony fitness reflects the degree of "aggregation" of all particles in the swarm. The smaller $\sigma^2$ is, the more strongly the swarm is aggregated. If the algorithm has not met the termination condition, the aggregation causes the swarm to lose diversity at an early stage. Premature convergence is detected when $\sigma^2 < H$ ($H$ is a given constant).
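The premature-convergence test can be sketched in Python as follows. The sketch assumes the widely used form of the colony fitness variance, in which deviations from the mean fitness are divided by max(1, largest absolute deviation) before squaring and summing; the function names and the `threshold` argument are hypothetical.

```python
def fitness_variance(fitness):
    """Normalized colony fitness variance of a list of particle fitness values."""
    n = len(fitness)
    f_avg = sum(fitness) / n
    # Scaling factor: never below 1, otherwise the largest deviation.
    scale = max(1.0, max(abs(f - f_avg) for f in fitness))
    return sum(((f - f_avg) / scale) ** 2 for f in fitness)


def is_premature(fitness, threshold):
    """Flag premature convergence when the variance drops below the threshold."""
    return fitness_variance(fitness) < threshold
```

Because of the scaling, the variance is bounded by the number of particles, so the threshold can be chosen independently of the magnitude of the fitness values.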

Premature Treatment.
For particles in the premature state, a method combining a chaos algorithm with the particle swarm algorithm resets the positions and velocities of the particles so that they jump out of local minima. The classical logistic equation is used to generate the chaos sequence:

$$z_{i+1} = \mu z_i (1 - z_i), \tag{11}$$

where $z_i \in (0, 1)$, with the initial value sampled uniformly from $(0, 1)$, and $\mu$ is the control parameter; when $\mu = 4$, the logistic equation is in complete chaos [13]. The sequence is then introduced into the optimization space by the following formula:

$$x_i = a_i + (b_i - a_i) z_i, \tag{12}$$

where $[a_i, b_i]$ is the range of the variables. Once the particle swarm has run into a local optimum, the current global best no longer has practical significance for the algorithm. Combining (11) and (12) gives the renewal equation for particles in the premature condition:

$$x_i = a_i + (b_i - a_i) z_{i+1}, \qquad v_i = -v_{max} + 2 v_{max} z_{i+1}, \tag{13}$$

where $[-v_{max}, v_{max}]$ is the range of particle velocity. Thus, the particle can jump out of the local optimum and return to the particle swarm optimization iteration.
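The chaotic reset can be illustrated with a short Python sketch. It assumes that the logistic map output is mapped linearly onto the variable range for the position and onto [−v_max, v_max] for the velocity; the function names and the per-dimension chaos variables are assumptions made for illustration.

```python
def logistic_map(z, mu=4.0):
    """One iteration of the logistic equation z -> mu * z * (1 - z)."""
    return mu * z * (1.0 - z)


def chaotic_reset(z, lower, upper, v_max, mu=4.0):
    """Reset one particle from chaos variables z (one value in (0, 1) per dimension).

    Returns the new position, the new velocity, and the advanced chaos variables.
    """
    z_next = [logistic_map(zd, mu) for zd in z]
    position = [a + (b - a) * zd for a, b, zd in zip(lower, upper, z_next)]
    velocity = [-v_max + 2.0 * v_max * zd for zd in z_next]
    return position, velocity, z_next
```

With `mu=4.0`, seeds such as 0.5 or 0.75 should be avoided in practice because they lead to the fixed points 0 and 0.75 of the map, and the sequence then stops exploring.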

Optimization Algorithm.
To improve the precision and generalization ability of the mixed-kernel SVM, the following variance function, which directly reflects the SVM regression performance, is used as the fitness function of the IPSO algorithm:

$$F = \frac{1}{N} \sum_{i=1}^{N} (\hat{y}_i - y_i)^2, \tag{14}$$

where $\hat{y}_i$ is the predictive value, $y_i$ is the measured value, and $N$ is the number of samples. The flow diagram of the optimization algorithm is shown in Figure 2. The detailed modeling process is as follows.
Step 1. Initialize the particle swarm parameters $(\rho, \sigma, C)$, the swarm size, and the maximum number of iterations, and determine the weight factor $\omega$ and the particle velocity range $[-v_{max}, v_{max}]$. Set the loss function parameter $\varepsilon$ of the SVM and the criteria for global convergence and premature convergence.
Step 2. Set the individual best $p_{best}$ of each particle to its current position and calculate the fitness of each particle; the individual best with the best fitness value is taken as the initial global best $g_{best}$.
Step 3. Check whether the convergence criterion is satisfied. If it is, go to Step 10; otherwise go to Step 4.
Step 4. Use (8) (the standard PSO algorithm) to carry out the iterative calculation and update the position and velocity of the particles.
Step 6. Compare the updated $F(p_{best})$ and $F(g_{best})$; if $F(p_{best}) < F(g_{best})$, then update $g_{best}$.
Step 7. Determine whether the convergence criterion is satisfied. If it is, go to Step 10; otherwise go to Step 8.
Step 8. Evaluate (9) and (10) to calculate the variance $\sigma^2$ of the population's fitness and check whether $\sigma^2 < H$ holds; if so, go to Step 9 for premature treatment; otherwise go to Step 4.
Step 9. Apply (13) for premature treatment of the particles that have fallen into a local optimum, so that the particle swarm escapes from it; then go to Step 3.
Step 10. Output the optimal value found by the particle swarm; the algorithm terminates.
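The steps above can be sketched as a compact Python loop. This is a minimal illustration under several assumptions that the source does not fix: the velocity limit is 20% of each variable range, positions are clipped to the search box, all particles are reset when prematurity is detected (personal and global bests are kept), and the names `ipso_minimize`, `tol`, and `threshold` are hypothetical.

```python
import random


def ipso_minimize(fitness, lower, upper, n_particles=20, max_iter=50,
                  w=0.7, c1=2.0, c2=2.0, tol=1e-3, threshold=1e-2, seed=0):
    rng = random.Random(seed)
    dim = len(lower)
    v_max = [0.2 * (b - a) for a, b in zip(lower, upper)]
    # Step 1: random initial positions and velocities.
    xs = [[rng.uniform(a, b) for a, b in zip(lower, upper)]
          for _ in range(n_particles)]
    vs = [[rng.uniform(-m, m) for m in v_max] for _ in range(n_particles)]
    # Step 2: personal bests start at the initial positions.
    p_best = [x[:] for x in xs]
    p_fit = [fitness(x) for x in xs]
    g = min(range(n_particles), key=p_fit.__getitem__)
    g_best, g_fit = p_best[g][:], p_fit[g]
    for _ in range(max_iter):
        if g_fit <= tol:  # Steps 3 and 7: global convergence check
            break
        fits = []
        for i in range(n_particles):
            for d in range(dim):  # Step 4: standard PSO update
                r1, r2 = rng.random(), rng.random()
                vel = (w * vs[i][d] + c1 * r1 * (p_best[i][d] - xs[i][d])
                       + c2 * r2 * (g_best[d] - xs[i][d]))
                vs[i][d] = max(-v_max[d], min(v_max[d], vel))
                xs[i][d] = max(lower[d], min(upper[d], xs[i][d] + vs[i][d]))
            f = fitness(xs[i])
            fits.append(f)
            if f < p_fit[i]:
                p_best[i], p_fit[i] = xs[i][:], f
            if f < g_fit:  # Step 6: update the global best
                g_best, g_fit = xs[i][:], f
        # Step 8: normalized fitness variance of the swarm.
        f_avg = sum(fits) / n_particles
        scale = max(1.0, max(abs(f - f_avg) for f in fits))
        variance = sum(((f - f_avg) / scale) ** 2 for f in fits)
        if variance < threshold and g_fit > tol:
            # Step 9: chaotic reset via the logistic map with mu = 4.
            for i in range(n_particles):
                for d in range(dim):
                    z = rng.uniform(0.01, 0.99)
                    z = 4.0 * z * (1.0 - z)
                    xs[i][d] = lower[d] + (upper[d] - lower[d]) * z
                    vs[i][d] = -v_max[d] + 2.0 * v_max[d] * z
    return g_best, g_fit  # Step 10
```

For SVM parameter selection, `fitness` would train the mixed-kernel SVM for a candidate parameter vector and return its prediction error on the samples; any callable that maps a list of floats to a non-negative float can be passed in.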

Modeling and Simulation
4.1. Function Fitting Simulation. To test the effectiveness of the algorithm, a one-dimensional function with additive Gaussian noise $e$ of zero mean and variance 0.1 is selected for simulation. To minimize the fitness function and optimize the parameters of the mixed-kernel SVM using the IPSO algorithm, 50 sets of data from the input variable domain are taken as the training sample of the mixed-kernel SVM. The mixed-kernel SVM uses an $\varepsilon$-insensitive loss function with $\varepsilon = 0.1$; $\rho$, $\sigma$, and $C$ are initialized in $[0, 1]$, $[0.01, 1.0]$, and $[1, 1000]$, respectively; the population size is 20, and the premature judgment constant $H$ is 1. The maximum number of iterations is 50, and the global convergence criterion is $F(g_{best}) \le 10^{-3}$. The simulation results are given in Figure 3 and Table 1.
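The experimental setup can be sketched as follows. The parameter names `rho`, `sigma`, and `C` and the helper functions are hypothetical, and the test function itself is left as an argument because it is not recoverable from the text; only the ranges, the population size of 20, the 50 samples, and the noise variance of 0.1 come from the description above.

```python
import math
import random

# Search ranges for the three parameters, in the order given in the text.
PARAM_BOUNDS = {"rho": (0.0, 1.0), "sigma": (0.01, 1.0), "C": (1.0, 1000.0)}


def init_particles(n_particles=20, seed=0):
    """Draw each particle uniformly inside the parameter ranges."""
    rng = random.Random(seed)
    return [{name: rng.uniform(lo, hi) for name, (lo, hi) in PARAM_BOUNDS.items()}
            for _ in range(n_particles)]


def noisy_samples(func, x_min, x_max, n=50, noise_var=0.1, seed=0):
    """Sample func on an even grid and add zero-mean Gaussian noise."""
    rng = random.Random(seed)
    std = math.sqrt(noise_var)  # random.gauss expects a standard deviation
    xs = [x_min + (x_max - x_min) * i / (n - 1) for i in range(n)]
    return [(x, func(x) + rng.gauss(0.0, std)) for x in xs]
```

Note that a variance of 0.1 corresponds to a standard deviation of about 0.316, which is what `random.gauss` takes as its second argument.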
The simulation results show that the mixed-kernel SVM model constructed by the IPSO algorithm has high accuracy. The results for the glutamate fermentation process are shown in Figures 4 and 5.
Figure 4 shows the change in glutamate production in 6 glutamate fermentation experiments. The production of glutamate increased nonlinearly over the fermentation period from 2 h to 38 h. Glutamate increased very slowly in the early period (2-5 h), increased quickly from 20 to 60 g/L during 9-20 h, and then increased slowly after 20 h. After 34 h, the production of glutamate remained stable at 70-75 g/L. Figure 5 shows the change in residual sugar concentration in the glutamate fermentation experiments. The early period (usually the first 5 hours) is the bacterial growth stage, during which the sugar concentration is higher. After about 9 hours, the fermentation enters the acid-producing period: bacterial growth slows down and a large amount of glutamic acid accumulates. By then most of the sugar in the culture medium has been consumed, which calls for a continuous supply of nutrients.
To maintain a stable environment for the growth of the bacteria and to prevent severe changes in sugar concentration, which can disturb bacterial physiological metabolism, a fed-batch feeding-rate strategy based on the estimation results can be adopted to keep the glucose concentration at a constant level.
The average prediction error for glutamate concentration is 2.61%. The average prediction error for residual sugar concentration is 3.82%. These results indicate that the method has a good modeling effect.
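The text does not state how the average prediction error is computed. One common choice, shown here purely as an assumption, is the mean absolute relative error expressed in percent; the function name is hypothetical.

```python
def mean_relative_error(measured, predicted):
    """Mean absolute relative error in percent (assumed error definition)."""
    if not measured or len(measured) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - m) / abs(m) for m, p in zip(measured, predicted))
    return 100.0 * total / len(measured)
```

The measure is undefined when a measured value is zero, so samples at the very start of a fermentation run may need to be excluded or handled separately.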

Conclusion and Future Work
In this paper, a new IPSO-based optimization algorithm was used to select the optimal parameters for the mixed kernel function of SVM. The simulation results show that the mixed kernel function-based SVM with optimally chosen parameters is accurate and reliable, and the optimum mixed kernel function of SVM model based on structural parameters has better learning precision.

Nomenclature
$C$: Regularization constant
$K_{mix}$: Hybrid kernel function
$K_{poly}$: Polynomial kernel function
$K_{RBF}$: RBF kernel function
$v_i^{(k)}$: Present velocity of particle $i$
$v_i^{(k+1)}$: Next velocity of particle $i$
$x_i^{(k)}$: Present position of particle $i$
$e$: Gaussian noise.
Smits and Jordaan studied representative mapping properties of global kernel function (polynomial kernel function) and local kernel function (RBF kernel) and proposed a mixed kernel function.It was shown that the mixed kernel function-based SVM has strong learning and generalization ability.

Figure 5: Predictive result of residual sugar concentration.

Table 1: Performance comparison of parameter selection for SVM by different methods.