Optimization to the Culture Conditions for Phellinus Production with Regression Analysis and Gene-Set Based Genetic Algorithm

Phellinus is a fungus known as one of the elemental components of drugs used to prevent cancer. To find optimized culture conditions for Phellinus production in the laboratory, many single-factor experiments were carried out and a large amount of experimental data was generated. In this work, we use the data collected from these experiments for regression analysis and obtain a mathematical model for predicting Phellinus production. Subsequently, a gene-set based genetic algorithm is developed to optimize the values of the parameters involved in the culture conditions, including inoculum size, pH value, initial liquid volume, temperature, seed age, fermentation time, and rotation speed. The optimized parameter values are in accordance with biological experimental results, which indicates that our method has good predictive ability for culture condition optimization.


Introduction
Phellinus is a fungus of great medicinal value, since it is known as one of the elemental components of drugs used to prevent cancer [1,2]. Phellinus flavonoids are one of the most popular parasitifers of Phellinus in nature [3], and research on Phellinus focuses on polysaccharides, the medicinal mechanism of proteoglycans, composition, and so forth; these substances are mostly extracted from the fruiting bodies of Phellinus flavonoids [4]. Phellinus rarely exists in the wild [5], so cultivating it in the laboratory has become a promising research branch. Mycelial growth by liquid fermentation yields flavonoids, polysaccharides, alkaloids, and other active substances in the fermentation broth, with a high level of physical activity, a short fermentation period, and mass production, thus providing a possible way of producing Phellinus in the laboratory [6]. In recent years, updated machine learning approaches (see, e.g., [7,8]) have been developed and applied to biological data processing.
From what is known about the growth conditions of Phellinus in the wild, pH value, temperature, and fermentation time are believed to affect production. In addition, as in general biochemical experiments, inoculum size, initial liquid volume, seed age, and rotation speed need to be considered. In the laboratory, many experiments have been designed and carried out to maximize Phellinus production. The methods fall into two major groups.
(i) With biological technologies: optimum media for the mycelial growth of Phellinus were used in [9], and liquid fermentation technology was used to cultivate Phellinus in [10]. Active ingredients in Phellinus and the regulation of polysaccharide metabolism are investigated in [11].
(ii) With mathematical models: some research focuses on building mathematical models of the Phellinus production process using differential equations [12], metabolic pathways and networks [13], and complex network models [14].
Artificial algorithms and models have been used in bioprocesses, particularly for the optimization of culture conditions. In [15], an artificial neural network (ANN) is used to optimize the extraction process of azalea flavonoids. Neural networks combined with evolutionary algorithms have also been used to optimize the experimental environment; for example, a neural network and particle swarm optimization were used in [16] to find optimized culture conditions that maximize the production of Pleuromutilin from Pleurotus mutilus. Recently, with the growth of biological data, regression analysis has become a useful tool for data analysis. In [17], a method of fitting models to biological data using linear and nonlinear regression is proposed, in which some multivariate statistical analysis strategies from [18,19] are formulated in a way that is helpful to biologists. These results suggest using regression analysis and artificial algorithms to optimize the culture conditions for Phellinus production. To the best of our knowledge, few studies focus on the optimization of culture conditions to maximize the production of Phellinus in the laboratory.
In this work, we start by carrying out 45 experiments for producing Phellinus from Phellinus flavonoids under different culture conditions, involving the parameters pH value, temperature, fermentation time, inoculum size, initial liquid volume, seed age, and rotation speed. With the data collected during the experiments, we use regression analysis to create a mathematical model, which can forecast the flavonoid yield and identify the factor most important to the production of Phellinus. After that, a gene-set based genetic algorithm (GA) is proposed to optimize the culture conditions, where the obtained mathematical model is used as the fitness function for the evolution of individuals. The results of the data experiments show that the predicted optimal values of the parameters are in accordance with biological experimental results, which indicates that our method has good predictive ability for culture condition optimization.

Data Collected from Experiments
In this section, biological experiments are performed to find the optimal value of each single factor.
Table 1 lists the first group of experiments carried out to collect data. Rows 1-14 correspond to experiments with pH values ranging from 1 to 14, where the temperature is fixed at 28 °C, the initial volume is set to 100 mL, the rotation speed is 140 r/min, and the seed age is 8 days. Rows 15 to 20 are 6 experiments with the initial volume ranging from 40 mL to 140 mL, where the pH value is set to 6, the best value obtained from the experiments with pH values ranging from 1 to 14.
Table 2 covers the experiments with inoculum size ranging from 2% to 16% and temperature ranging from 25 °C to 40 °C. Table 3 shows the experiments with fermentation time ranging from 1 to 12 hours. From the 45 experiments in total, we collect data on the culture conditions for the production of Phellinus. Different culture conditions have a fundamental influence on the production of Phellinus, but the optimized culture conditions remain unknown.

Methods
We use regression analysis and a gene-set based genetic algorithm to find the optimized culture conditions for maximizing the production of Phellinus. In general, we use the data collected in Section 2 to construct a mathematical model by regression analysis. The obtained model is then used as the fitness function for optimizing the culture conditions with the gene-set based genetic algorithm.

Regression Analysis.
In statistical modeling, regression analysis is a statistical process for estimating the relationships among variables [20]. It is an extremely versatile data analysis method, suited to establishing dependencies between variables from observational data, and it is widely used to analyze the laws inherent in data and to predict results. Regression analysis can be divided into linear and nonlinear regression analysis [21], according to the type of relationship between the independent variables and the dependent variables. In general, the independent and dependent variables are selected first, and a regression model of their relationship is built. The parameters of the model are then estimated from the measured data, and the model is evaluated to see whether it fits the observed data. If the model fits the data well, it can be used for further prediction from new values of the independent variables [22].
Regression analysis is widely used in data mining, and in recent years particularly in biological data analysis, with the purpose of finding a feasible statistical law from a large amount of experimental data. The general process consists of the following steps [23,24].
Step 1. Determine the variables.
Step 2. Establish the prediction model.
Step 4. Calculate the prediction error.
Step 5. Determine the predicted value.
The data collected in Section 2 consist of seven independent variables and one dependent variable. The seven independent variables are inoculum size, pH value, initial liquid volume, temperature, seed age, fermentation time, and rotation speed. The dependent variable is flavonoid yield. From the observation of the experiments, it is found that some culture conditions are not suitable for the production of Phellinus. The corresponding data are regarded as extreme data and are removed from the regression analysis. Extreme data refers to data measured in an extreme experimental environment. Duplicate data were also removed. Only the following data are selected for regression analysis.
After data filtering, a statistical model is built to represent the data. Since the data are known to be correlated, we applied linear regression analysis to fit them. At this stage, many candidate models were tested one by one with IBM SPSS software and response surface methodology. The statistical model is

$$y = a_1 x(1)^2 + \cdots + a_7 x(7)^2 + b_1 x(1) + \cdots + b_7 x(7) + c, \tag{1}$$

where $y$ is the flavonoid yield, $x(1), x(2), \ldots, x(7)$ are the seven independent variables associated with inoculum size, pH value, initial liquid volume, temperature, seed age, fermentation time, and rotation speed, respectively, and $a_i$, $b_i$, and $c$ are real numbers.
Although the relationship between the variables may not be linear, we can add a squared term for each variable to the model. If a squared term is useful, it is retained after the linear regression analysis; otherwise, it is removed.
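As a minimal sketch of this fitting step (not the SPSS procedure used in this work), the following Python code fits a model with one squared term and one linear term per variable plus a constant by ordinary least squares, and reports the R-squared value. The two-variable synthetic data, the coefficient values in true_w, and all function names are assumptions made for illustration; no experimental data from Tables 1-3 are used.

```python
import random

def design_row(x):
    """Map a sample (x(1), ..., x(n)) to [x(1)^2, ..., x(n)^2, x(1), ..., x(n), 1]."""
    return [v * v for v in x] + list(x) + [1.0]

def solve_linear(a, b):
    """Solve the square system a w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (m[i][n] - s) / m[i][i]
    return w

def fit_quadratic(xs, ys):
    """Ordinary least squares via the normal equations (A^T A) w = A^T y."""
    rows = [design_row(x) for x in xs]
    k = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve_linear(ata, aty)

def predict(w, x):
    """Evaluate the fitted model at one sample."""
    return sum(wi * fi for wi, fi in zip(w, design_row(x)))

def r_squared(w, xs, ys):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - predict(w, x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

# Synthetic stand-in data (assumed): two variables in pH-like and temperature-like ranges.
rng = random.Random(0)
true_w = [-0.5, -0.02, 6.0, 1.1, 3.0]  # hypothetical a1, a2, b1, b2, c
xs = [(rng.uniform(4, 8), rng.uniform(22, 34)) for _ in range(30)]
ys = [predict(true_w, x) + rng.gauss(0, 0.02) for x in xs]

w = fit_quadratic(xs, ys)
print("fitted coefficients:", [round(v, 3) for v in w])
print("R-squared:", round(r_squared(w, xs, ys), 4))
```

With real measurements, xs would hold the seven culture parameters of each retained experiment and ys the measured flavonoid yield. The stepwise removal of unhelpful terms is not shown here.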
In the regression analysis, attention must be paid to the R-squared value and the significance of the correlation coefficients when adjusting the model. We use the regression analysis tools in IBM SPSS, selecting the Estimates option for the regression coefficients and the Model fit display option. The stepping method criteria are set to use the probability of F, with entry as 0.5 and removal as 0.10. After the regression analysis, we obtain the results shown in Table 4.
The significance is 0.006 < 0.05; that is, the regression results are statistically significant. The R-squared value is 0.88, which means that the model accounts for 88% of the variation in the data. We get the statistical model:

Gene-Set Based Genetic Algorithm.
The genetic algorithm (GA) was first proposed by J. Holland in 1975 [25,26]; its general process is shown in Figure 1. If, in the mutation operation, a short segment is selected with a given mutation probability and replaced by another segment, the gene-set based GA is obtained [27].
In the gene-set based GA, a chromosome is treated as a set of gene-sets instead of a set of genes as in classical GAs. The algorithm starts with gene-sets of the largest size, equal to half the chromosome length. This representation suits the present model because each gene-set represents a set of adjacent parameters of a certain factor of the culture conditions.
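The segment-replacement idea can be sketched in Python as follows. This is an illustrative reading of the description above, not the implementation of [27]: the function name gene_set_mutation, the choice of random bits as the replacement segment, and the 28-bit chromosome are all assumptions.

```python
import random

def gene_set_mutation(chromosome, set_size, rate, rng):
    """With probability `rate`, replace one contiguous gene-set of `set_size`
    bits, chosen at a random position, with freshly drawn random bits."""
    if not 1 <= set_size <= len(chromosome):
        raise ValueError("set_size must be between 1 and the chromosome length")
    if rng.random() >= rate:
        return chromosome[:]  # no mutation this time: return an unchanged copy
    start = rng.randrange(0, len(chromosome) - set_size + 1)
    segment = [rng.randint(0, 1) for _ in range(set_size)]
    return chromosome[:start] + segment + chromosome[start + set_size:]

rng = random.Random(1)
parent = [0] * 28
# Largest gene-set size: half the chromosome length, as stated in the text.
child = gene_set_mutation(parent, set_size=len(parent) // 2, rate=1.0, rng=rng)
unchanged = gene_set_mutation(parent, set_size=4, rate=0.0, rng=rng)
print(child)
```

Only bits inside the chosen window can change, so a gene-set mutation perturbs one group of adjacent parameters at a time while leaving the rest of the chromosome intact.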
Note that, in selection, only the winning individuals from the population can be selected. The selection operator, also known as the reproduction operator, chooses the better individuals (or solutions) to pass on to the next generation. The population can be updated by the fitness ratio method, random traversal sampling, or local selection. The crossover operator recombines parts of the structure of two parent individuals to generate new individuals. Mutation gives the GA a local random search capability: when crossover has brought the population into the neighborhood of the optimal solution, the local random search of the mutation operator can accelerate convergence to the optimal solution.
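Two of these operators can be sketched in Python as follows, assuming fitness-proportionate (roulette wheel) selection and single-point crossover on bit lists. The function names and the tiny example population are hypothetical, and the single-point form of crossover is an assumption, since the text does not fix the crossover variant.

```python
import random

def roulette_select(population, fitnesses, rng):
    """Pick one individual with probability proportional to its fitness.
    Fitness values are assumed to be non-negative."""
    total = sum(fitnesses)
    pick = rng.uniform(0, total)
    running = 0.0
    for individual, fit in zip(population, fitnesses):
        running += fit
        if running > pick:
            return individual
    return population[-1]  # guard against floating-point round-off

def single_point_crossover(p1, p2, rate, rng):
    """With probability `rate`, swap the tails of two equal-length parents
    at a random cut point; otherwise return copies of the parents."""
    if len(p1) != len(p2):
        raise ValueError("parents must have the same length")
    if len(p1) < 2 or rng.random() >= rate:
        return p1[:], p2[:]
    cut = rng.randrange(1, len(p1))
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

rng = random.Random(42)
population = [[0, 0, 0, 0], [1, 1, 1, 1], [1, 0, 1, 0]]
fitnesses = [0.0, 0.0, 5.0]  # only the last individual has positive fitness
chosen = roulette_select(population, fitnesses, rng)
child_a, child_b = single_point_crossover([0] * 8, [1] * 8, rate=1.0, rng=rng)
print(chosen, child_a, child_b)
```

Because selection probability is proportional to fitness, individuals with zero fitness are never chosen here, which matches the remark that only winning individuals are selected.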
The statistical model obtained by regression analysis is used as the fitness function here, and the gene-set based GA is used to optimize the culture conditions for maximizing the production of Phellinus. The simulation is carried out with gatool in MATLAB. In the data experiments, we use a binary string composed of 7 segments to represent an individual in the GA population, where each segment is associated with the value of one of the 7 parameters of the culture conditions. The initial population size is 50, the crossover rate is set to 0.8, the mutation rate is set to 0.01, and the selection method is roulette wheel selection. Given enough time, the GA process halts when a stopping condition is met, such as the generation limit or the fitness limit.
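The 7-segment binary encoding can be illustrated with the Python sketch below, which decodes a chromosome into parameter values and scores a random initial population of 50. This is not the gatool configuration used in this work. The 8 bits per segment, the bounds for seed age and rotation speed, and the function toy_fitness are assumptions for illustration; the other bounds follow the ranges of the single-factor experiments, and toy_fitness merely stands in for the fitted regression model.

```python
import random

BITS = 8  # assumed number of bits per segment
BOUNDS = [  # (name, lower bound, upper bound)
    ("inoculum_size_pct", 2.0, 16.0),
    ("pH", 1.0, 14.0),
    ("initial_volume_mL", 40.0, 140.0),
    ("temperature_C", 25.0, 40.0),
    ("seed_age", 1.0, 12.0),             # hypothetical range
    ("fermentation_time", 1.0, 12.0),
    ("rotation_speed", 100.0, 180.0),    # hypothetical range
]

def decode(chromosome):
    """Map a 7-segment bit list to a dict of real-valued culture parameters."""
    if len(chromosome) != BITS * len(BOUNDS):
        raise ValueError("chromosome length does not match the encoding")
    values = {}
    for k, (name, low, high) in enumerate(BOUNDS):
        segment = chromosome[k * BITS:(k + 1) * BITS]
        as_int = int("".join(str(bit) for bit in segment), 2)
        # Linear scaling of the integer 0..2^BITS-1 onto [low, high].
        values[name] = low + (high - low) * as_int / (2 ** BITS - 1)
    return values

def toy_fitness(values):
    """Placeholder fitness that peaks at pH 6 and 28 C.
    It is NOT the regression model obtained in this work."""
    return 100.0 - (values["pH"] - 6.0) ** 2 - 0.1 * (values["temperature_C"] - 28.0) ** 2

rng = random.Random(0)
n_bits = BITS * len(BOUNDS)
population = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(50)]
best = max(population, key=lambda c: toy_fitness(decode(c)))
print(decode(best))
```

A finer or coarser grid for each parameter follows from changing BITS; the GA operators then act on the bit list while the fitness function only ever sees the decoded values.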
After 156 iterations, the gene-set based GA returns the best individual and terminates, as shown in Figure 2.
After the regression analysis and the GA process, an optimized culture condition is obtained, as shown in Table 5.
The results obtained by our method are in accordance with the experimental experience reported in the literature on the growth environment of Phellinus. Specifically, the suitable environment is slightly acidic, at a pH value of about 6. The appropriate temperature range is from 22 °C to 28 °C [10]. Seed age and fermentation time vary with the strain [3,28,29]. The optimized values of the parameters are in accordance with biological experimental results, which indicates that our method has good predictive ability for culture condition optimization.

Conclusion
In this work, 45 experiments are first carried out to collect data related to the production of Phellinus from Phellinus flavonoids. We use regression analysis to create a mathematical model from the collected data, and then a gene-set based GA is proposed to optimize the culture conditions, where the obtained mathematical model is used as the fitness function for the evolution of individuals. In the comparison, the predicted pH value is credible, and the predicted temperature lies within the appropriate temperature range. Taking into account environmental factors in the laboratory, the predicted temperature value is also reliable. The predicted seed age and fermentation time are 9, close to the value of 8 in the original data. The results of the data experiments show that the predicted optimal values of the parameters are in accordance with biological experimental results, which indicates that our method has good predictive ability for culture condition optimization. Neural-like computing models, such as artificial neural networks [30], spiking neural networks [31], and spiking neural P systems [32-34], have been successfully used in pattern recognition and engineering practice. It is of interest to use these neural-like computing models for optimizing the culture conditions for Phellinus production. Our work may also provide guidance for "Precision Medicine" with personal SNP data [35] and other tasks in bioinformatics [21,22].