Inference of Biochemical S-Systems via Mixed-Variable Multiobjective Evolutionary Optimization

Inference of biochemical systems (BSs) from experimental data is important for understanding how biochemical components interact with each other in vivo. However, it is not a trivial task, because BSs usually exhibit complex and nonlinear dynamics. As a popular ordinary differential equation (ODE) model, the S-System describes the dynamical properties of BSs by incorporating the power-law rule of biochemical reactions, but it is challenging to infer because it has many parameters to be determined. This work proposes a general method for inferring S-Systems from experimental data, using a biobjective optimization (BOO) model and a specially designed mixed-variable multiobjective evolutionary algorithm (mv-MOEA). Because BSs are generally sparse, we introduce binary variables indicating network connections, which eliminates the difficulty of presetting a pruning threshold, and we take the data fitting error and the L0-norm as two objectives to be minimized in the BOO model. Then, a selection procedure that automatically trades off the two objectives is employed to choose the final inference result from the nondominated solutions obtained by the mv-MOEA. Inference results for the investigated networks demonstrate that our method identifies their dynamical properties well, although the automatic selection procedure sometimes ignores weak connections in BSs.


Introduction
Biochemical systems (BSs) consist of many components that interact with each other in complex ways to act as integrated dynamic systems. Inference of biochemical systems aims to identify how these components interact with each other, which helps to investigate the dynamical properties of these complex systems. In the past decades, (probabilistic) Boolean networks [1,2], (dynamic) Bayesian networks [3,4], and other probabilistic or statistical methods [5,6] have been proposed to determine connections between components in BSs. Although probability- or statistics-based models can properly address the negative effects of data noise, they cannot precisely capture the dynamical properties of BSs. Ordinary differential equation (ODE) models, which produce directed signed graphs, are not only suited to both steady-state and time-series profiles but also able to work entirely in the classical category [7]; hence they are widely used to model various kinds of BSs [8,9].
In biochemical systems theory (BST), the S-System model, which incorporates the power-law formalism, is considered an effective and consistent mathematical model for representing and analyzing biological systems [10]. Its mathematical formalism is the nonlinear ODE system

$$\frac{dX_i}{dt} = \alpha_i \prod_{j=1}^{n} X_j^{g_{ij}} - \beta_i \prod_{j=1}^{n} X_j^{h_{ij}}, \quad i = 1, \ldots, n, \tag{1}$$

where $X_i$ represents the concentration of reaction component $i$ and $n$ is the total number of components in the investigated network. The S-System has $2n(n+1)$ parameters in total, including the positive rate constants $\alpha_i, \beta_i \in \mathbb{R}^+$ and the kinetic orders $g_{ij}, h_{ij} \in \mathbb{R}$, $i, j = 1, \ldots, n$. Although the number of parameters to be determined is relatively large, the S-System is also employed to reconstruct large-scale gene regulatory networks (GRNs) [11–13], owing to its powerful approximation of the dynamics of biochemical reactions.
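To make Eq. (1) concrete, the following minimal Python sketch evaluates the S-System right-hand side and integrates it with a fixed-step classical Runge–Kutta (RK4) scheme, the kind of numerical solver needed when candidate parameters are evaluated by simulation. The function names and the step size `dt` are illustrative assumptions, not part of the original method; concentrations are assumed to stay positive, since fractional powers of negative numbers are not real-valued.

```python
import math
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def n_parameters(n: int) -> int:
    """Total number of S-System parameters: alpha, beta (n each) plus g, h (n*n each)."""
    return 2 * n * (n + 1)


def s_system_rhs(x: Sequence[float], alpha: Sequence[float], beta: Sequence[float],
                 g: Matrix, h: Matrix) -> List[float]:
    """dX_i/dt = alpha_i * prod_j X_j**g_ij - beta_i * prod_j X_j**h_ij (Eq. (1))."""
    n = len(x)
    dx = []
    for i in range(n):
        prod_g = math.prod(x[j] ** g[i][j] for j in range(n))
        prod_h = math.prod(x[j] ** h[i][j] for j in range(n))
        dx.append(alpha[i] * prod_g - beta[i] * prod_h)
    return dx


def simulate(x0: Sequence[float], t_points: Sequence[float],
             params: Tuple[Sequence[float], Sequence[float], Matrix, Matrix],
             dt: float = 0.01) -> List[List[float]]:
    """Integrate Eq. (1) with fixed-step RK4 and return the state at each t in t_points."""
    alpha, beta, g, h = params

    def f(x: Sequence[float]) -> List[float]:
        return s_system_rhs(x, alpha, beta, g, h)

    x = list(x0)
    t = t_points[0]
    trajectory = [list(x)]
    for t_next in t_points[1:]:
        while t < t_next - 1e-12:
            step = min(dt, t_next - t)  # never step past the next sampling time
            k1 = f(x)
            k2 = f([xi + 0.5 * step * ki for xi, ki in zip(x, k1)])
            k3 = f([xi + 0.5 * step * ki for xi, ki in zip(x, k2)])
            k4 = f([xi + step * ki for xi, ki in zip(x, k3)])
            x = [xi + step / 6.0 * (a + 2 * b + 2 * c + d)
                 for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]
            t += step
        trajectory.append(list(x))
    return trajectory
```

For example, a one-component system with $\alpha = \beta = 1$, $g = 0$, $h = 1$ reduces to $dX/dt = 1 - X$, so `simulate([0.0], [0.0, 1.0], ([1.0], [1.0], [[0.0]], [[1.0]]))[-1][0]` should be close to $1 - e^{-1} \approx 0.632$.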
Inference of an S-System is possible when a time-course experimental data set $\{X_{i,\mathrm{exp}}(t_k),\ k = 0, \ldots, T,\ i = 1, \ldots, n\}$ of all components is available; it is implemented by minimizing the differences between the experimental data and the numerical results. To address the ill-posedness of this inverse problem, the differences are usually normalized and penalized as [14,15]

$$\min_{\Theta}\ \mathrm{err}(\Theta) = \sum_{i=1}^{n} \sum_{k=0}^{T} \left( \frac{X_{i,\mathrm{cal}}(t_k, \Theta) - X_{i,\mathrm{exp}}(t_k)}{X_{i,\mathrm{exp}}(t_k)} \right)^2 + c\, P(\Theta), \tag{2}$$

where $X_{i,\mathrm{cal}}(t_k, \Theta)$ is the numerical result for $X_i$ at time $t_k$ and $c$ is a problem-dependent penalization parameter. To keep the objective function continuous, the penalty $P(\Theta)$ is commonly taken as the $L_1$-norm of the kinetic orders:

$$P(\Theta) = \sum_{i=1}^{n} \sum_{j=1}^{n} \left( |g_{ij}| + |h_{ij}| \right). \tag{3}$$

The nonlinear model (2) of S-Systems has a complicated landscape, which is why it is usually solved by evolutionary algorithms (EAs) [16,17]. When evaluating the candidate network parameters in an EA population, the ODE system must be solved by a numerical method such as the Runge–Kutta method, which can make evaluation computationally heavy. Tsai and Wang [18] therefore used an allocation method to decouple the ODE system. However, this method introduces an allocation parameter into the evaluation of candidate solutions, which is hard to tune because it depends on the investigated problem and the available data set. Liu et al. [19] developed a separable parameter estimation method (SPEM) to decouple the S-System. However, in their method, the rate constants are determined numerically by the least-squares method, which can be computationally difficult because it has to compute the inverse of a $2 \times 2$ matrix that is the product of a $2 \times (T+1)$ matrix and its transpose (the difficulty concerns not only the time complexity but also the numerical stability of computing matrix inverses).
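A minimal sketch of the penalized objective follows, assuming the normalized squared-error form of (2) and the $L_1$ penalty of (3) as written above; the function name and the `[component][time point]` data layout are illustrative assumptions. In practice `x_cal` would come from numerically integrating (1), which is exactly the expensive step that motivates decoupling.

```python
from typing import Sequence

Series = Sequence[Sequence[float]]  # indexed [component i][time point k]


def penalized_error(x_cal: Series, x_exp: Series,
                    g: Sequence[Sequence[float]], h: Sequence[Sequence[float]],
                    c: float) -> float:
    """Normalized squared error between simulated and measured data plus c * P(Theta)."""
    err = 0.0
    for cal_i, exp_i in zip(x_cal, x_exp):
        for cal, obs in zip(cal_i, exp_i):
            err += ((cal - obs) / obs) ** 2  # relative error; assumes obs != 0
    # P(Theta): L1-norm of all kinetic orders g_ij and h_ij
    penalty = sum(abs(v) for row in g for v in row) + sum(abs(v) for row in h for v in row)
    return err + c * penalty
```

The sketch makes the tuning problem visible: the balance between the two terms depends entirely on `c`, which has no problem-independent default.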
Since inference of BSs addresses several issues simultaneously, multiobjective optimization models are a viable alternative for this problem. Liu and Wang [20] proposed a three-objective optimization model that simultaneously minimizes the concentration error, slope error, and interaction error, and then transformed it into a single-objective optimization problem by converting two objectives into constraints. However, this transformation depends greatly on a priori information about the connections of the investigated network. Koduru et al. [21] and Cai et al. [22] simultaneously minimized the data errors for several different data sets, but they did not try to minimize the number of network connections or the parameter norms to obtain sparse networks, which makes it more difficult to set a threshold for pruning network connections. To avoid the drawback of model (2), whose penalization has to be tuned to ensure that its global optimal solutions lie around the true network parameters, Spieth et al. [23] took the data error and the connection number as two objectives to be minimized and solved the resulting problem with a multiobjective evolutionary algorithm. However, they did not address how to choose an appropriate Pareto solution as the final inference result.
This work aims to address the aforementioned shortcomings of existing works. To eliminate the difficulty of tuning regularization parameters in regularized methods, we construct a biobjective optimization (BOO) model that simulates the dynamical properties of S-Systems by minimizing the error between computed derivatives and estimated slopes, while driving the network topology to be as sparse as possible. Fitting derivatives also makes it possible to decouple S-Systems without introducing extra parameters. To solve the proposed BOO model, we propose a mixed-variable multiobjective evolutionary algorithm (mv-MOEA), in which a candidate network configuration is represented by a combination of binary variables indicating network connections and real variables for parameter values. Then, an automatic selection procedure (ASP) is employed to select the final inference result from the nondominated solutions of the BOO model. Because the ASP trades off fitting errors against network connections by locating the knee region on the curve of normalized objective values, it can obtain a preferred sparse network configuration without a priori information on network connections.
The rest of this paper is organized as follows. Section 2 introduces the inference method proposed in this work. Section 3 then validates its effectiveness on benchmark S-Systems. Finally, Section 4 draws conclusions and outlines future work.

Method
The inference method based on multiobjective evolutionary optimization (IM-MOEO) consists of three parts: the biobjective optimization (BOO) model, the mixed-variable multiobjective evolutionary algorithm (mv-MOEA), and the automatic selection procedure (ASP), which are, respectively, presented in the following.
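For the $i$th equation of (1), the BOO model (6) minimizes two objectives over the parameter vector $\Theta_i$: the error between the estimated slope vector $\dot{\mathbf{X}}_i$ and the computed slope vector $\mathbf{S}_i(\Theta_i)$, and the $L_0$-norm of $\Theta_i$, which counts its network connections. Here, $\dot{\mathbf{X}}_i$ represents the vector of $dX_i(t)/dt$ at all time points, approximated by the five-point numerical formula, and $\mathbf{S}_i(\Theta_i)$, the slope vector corresponding to the parameter vector $\Theta_i$, is computed from the right-hand side of the $i$th equation of (1) at all time points [19].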
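The two objectives can be sketched as follows, under explicit assumptions: the time points are uniformly spaced by `dt`, the fitting error is taken as a sum of squared slope differences, and the connections of equation $i$ are encoded by one hypothetical bit per regulator $X_j$ that switches both $g_{ij}$ and $h_{ij}$ on or off. The five-point formulas (central in the interior, one-sided at the two ends) are standard and exact for polynomials up to degree four. Because the computed slopes use the measured $X_j(t_k)$ directly, no ODE has to be integrated, which is what decouples the equations.

```python
from typing import List, Sequence, Tuple


def five_point_derivative(y: Sequence[float], dt: float) -> List[float]:
    """Estimate dy/dt at every sample of a uniformly spaced series (len(y) >= 5)."""
    m = len(y)
    if m < 5:
        raise ValueError("the five-point formulas need at least 5 samples")
    d = [0.0] * m
    for k in range(2, m - 2):  # central formula
        d[k] = (-y[k + 2] + 8 * y[k + 1] - 8 * y[k - 1] + y[k - 2]) / (12 * dt)
    # one-sided five-point formulas for the first and last two samples
    d[0] = (-25 * y[0] + 48 * y[1] - 36 * y[2] + 16 * y[3] - 3 * y[4]) / (12 * dt)
    d[1] = (-3 * y[0] - 10 * y[1] + 18 * y[2] - 6 * y[3] + y[4]) / (12 * dt)
    d[m - 1] = (25 * y[m - 1] - 48 * y[m - 2] + 36 * y[m - 3]
                - 16 * y[m - 4] + 3 * y[m - 5]) / (12 * dt)
    d[m - 2] = (3 * y[m - 1] + 10 * y[m - 2] - 18 * y[m - 3]
                + 6 * y[m - 4] - y[m - 5]) / (12 * dt)
    return d


def biobjective(x_exp: Sequence[Sequence[float]], i: int,
                alpha: float, beta: float, g_row: Sequence[float], h_row: Sequence[float],
                mask: Sequence[int], dt: float) -> Tuple[float, int]:
    """Return (slope fitting error, L0-norm) for equation i of the S-System."""
    n, m = len(x_exp), len(x_exp[0])
    estimated = five_point_derivative(x_exp[i], dt)
    err = 0.0
    for k in range(m):
        prod_g = prod_h = 1.0
        for j in range(n):
            if mask[j]:  # an inactive connection contributes X_j**0 = 1
                prod_g *= x_exp[j][k] ** g_row[j]
                prod_h *= x_exp[j][k] ** h_row[j]
        computed = alpha * prod_g - beta * prod_h  # right-hand side of equation i
        err += (estimated[k] - computed) ** 2
    return err, sum(1 for b in mask if b)
```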
Computational and Mathematical Methods in Medicine 3

The Mixed-Variable Multiobjective Evolutionary Algorithm.
Because each component in (1) has at most $n$ connections, the BOO model (6) has at most $n + 1$ Pareto solutions (for an $n$th-order S-System, the $L_0$-norm of $\Theta_i$ can take at most $n + 1$ distinct integer values). Therefore, no diversity-keeping strategy is needed to obtain a set of uniformly distributed efficient vectors if the population size is set greater than $n + 1$. Meanwhile, the BOO model involves mixed variables and objectives of mixed types (a real-valued fitting error and an integer-valued $L_0$-norm), which makes it difficult to solve. We therefore propose a mixed-variable multiobjective evolutionary algorithm (mv-MOEA) for this model. The mv-MOEA employs separate evolution strategies for binary and real variables, which benefits both global exploration of the mixed-variable search region and local exploitation of the real-variable subregion.
When inferring one equation of an S-System with $n$ components, the mv-MOEA employs a population $Pop$, assisted by $OldPop$ to accelerate the convergence of the binary search [24]. Both $Pop$ and $OldPop$ are of size $PopSize$ and are separated into a binary section and a real section as $Pop = [Pop^B, Pop^R]$ and $OldPop = [OldPop^B, OldPop^R]$. Because a solution is evaluated by combining its binary and real variables, we also use an archive $A = [A^B, A^R]$ to save promising network topologies, as well as a set $R_k$ of real vectors for the $k$th solution in $A$ to promote the search in feasible regions of the real variables. The framework of the mv-MOEA is described as follows.
Step 1 (initialization). Randomly generate $Pop$, $OldPop$, and $A$, each of $PopSize$ individuals, and evaluate them. For each $k = 1, \ldots, PopSize$, initialize $R_k$ as a set of $PopSize$ randomly generated $(2n + 2)$-dimensional real vectors; combining them with the $k$th binary individual in $A$, evaluate all vectors in $R_k$ via (6). Randomly select an individual in $A$ as $x^* = (b^*, r^*)$.
Step 3 (sorting). Sort the vectors in each $R_k$, $k = 1, \ldots, PopSize$, by their fitting errors (the fitting error refers to the first objective of model (6)), and denote the worst one as $r_k^{\mathrm{worst}}$. Compute the dominance ranks of the individuals in the combined set of $Pop$ and the offspring via the Pareto dominance relation. Individuals of the same rank are sorted in ascending order of their $L_0$-norm (the second objective of (6)).
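Step 4 (updating). Set $OldPop = Pop$ and update $r_k^{\mathrm{worst}}$, $k = 1, \ldots, PopSize$, with the newly generated real vectors. Replace $Pop$ with the $PopSize$ best individuals of the combined set and update $A$ with the rest of it. Then randomly select $x^*$ from $Pop$ or $A$.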
Step 5. If the stopping criterion is not satisfied, go to Step 2; otherwise, output the nondominated solutions and stop.
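The sorting rule of Step 3 (Pareto dominance rank first, then ascending $L_0$-norm within a rank) can be sketched with a simple quadratic-time nondominated sorting. This is an illustrative implementation, not necessarily the one used in the mv-MOEA, and the function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Objectives = Tuple[float, int]  # (fitting error, L0-norm), both minimized


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a Pareto-dominates b: no worse in every objective, better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def rank_and_sort(objs: Sequence[Objectives]) -> List[int]:
    """Indices ordered by (dominance rank, L0-norm); rank 0 is the nondominated front."""
    remaining = set(range(len(objs)))
    rank: Dict[int, int] = {}
    r = 0
    while remaining:
        # current front: members not dominated by any other remaining member
        front = [p for p in remaining
                 if not any(dominates(objs[q], objs[p]) for q in remaining if q != p)]
        for p in front:
            rank[p] = r
        remaining -= set(front)
        r += 1
    return sorted(range(len(objs)), key=lambda p: (rank[p], objs[p][1]))
```

Taking the first $PopSize$ indices of this ordering then gives the survivors kept in $Pop$ in Step 4.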
The binary and real parts of each offspring are then generated as follows.
(i) The bit-string $b_j$ is generated from the bit-strings of the parents $x_1$, $x_2$, and $x_3$ according to the binary recombination strategy proposed in [24].
(ii) The real vector $r_j$ is generated by DE/rand/1 mutation and binomial crossover [25]. With probability $p_2$, the three parents are the real parts of $x_1$, $x_2$, and $x_3$; otherwise, $r_j$ is generated from the real part of $x_1$ and two real vectors randomly selected from the corresponding real-vector set $R_k$.
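Item (ii) can be illustrated with the textbook DE/rand/1 mutation, $v = r_1 + F (r_2 - r_3)$, followed by binomial crossover with a target vector. The scale factor `F`, the crossover rate `CR`, the choice of target vector, and the clipping to the search bounds are assumptions of this sketch; they are not stated in the text (the experiments only specify the search regions [−3, 3] for kinetic orders and [0, 10] for rate constants).

```python
import random
from typing import List, Optional, Sequence, Tuple


def de_rand_1_bin(target: Sequence[float], r1: Sequence[float], r2: Sequence[float],
                  r3: Sequence[float], F: float = 0.5, CR: float = 0.9,
                  bounds: Optional[Sequence[Tuple[float, float]]] = None,
                  rng: Optional[random.Random] = None) -> List[float]:
    """DE/rand/1 mutation of (r1, r2, r3) followed by binomial crossover with target."""
    rng = rng if rng is not None else random.Random()
    d = len(target)
    mutant = [a + F * (b - c) for a, b, c in zip(r1, r2, r3)]  # DE/rand/1 mutation
    j_rand = rng.randrange(d)  # at least one component is always taken from the mutant
    trial = [mutant[j] if (j == j_rand or rng.random() < CR) else target[j]
             for j in range(d)]
    if bounds is not None:  # clip each component to its (lower, upper) search bound
        trial = [min(max(v, lo), hi) for v, (lo, hi) in zip(trial, bounds)]
    return trial
```

The `j_rand` index is the standard device that guarantees the trial vector differs from the target in at least one coordinate even when `CR` is small.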
Finally, we combine $b_j$ and $r_j$ to obtain the candidate solutions $y_j = (b_j, r_j)$, $j = 1, \ldots, PopSize$. Since the archive is adopted here to guide convergence, it is updated with respect to Hamming distances. Each offspring $y_j$ is used to update $A$ as follows.
(i) If the Hamming distance between $b_j$ and $b_k$ is greater than zero for every $a_k = (b_k, r_k) \in A$, replace the archive member $a_w = (b_w, r_w)$ with $y_j = (b_j, r_j)$ if $y_j$ has a better fitting error. Here, $a_w$ is the archive member with the worst fitting error if the Hamming distance between any two archive members is greater than zero; otherwise, it is the worst of the archive members that have repeated bit-strings.
(ii) Otherwise, compare $y_j$ with the archive members that share its bit-string, and use it to replace the one with the worst fitting error.
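The archive rule above can be sketched as follows. Archive members and offspring are represented here as hypothetical `(bits, reals, error)` tuples, with `bits` stored as a tuple so that duplicates can be counted. In rule (ii), the sketch replaces the worst member sharing the offspring's bit-string only when the offspring is better, which is our reading of "compare" rather than something the text states explicitly.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Member = Tuple[Tuple[int, ...], Sequence[float], float]  # (bit-string, real vector, fitting error)


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions at which two bit-strings differ."""
    return sum(x != y for x, y in zip(a, b))


def update_archive(archive: List[Member], child: Member) -> None:
    """Update the archive in place with one offspring, following rules (i) and (ii)."""
    bits, _, err = child
    if all(hamming(bits, member[0]) > 0 for member in archive):
        # (i) new topology: choose the member a_w that may be replaced
        counts = Counter(member[0] for member in archive)
        if all(c == 1 for c in counts.values()):
            candidates = list(range(len(archive)))
        else:  # prefer replacing members whose bit-string is duplicated
            candidates = [k for k, member in enumerate(archive) if counts[member[0]] > 1]
    else:
        # (ii) known topology: compete only with members sharing this bit-string
        candidates = [k for k, member in enumerate(archive) if member[0] == bits]
    w = max(candidates, key=lambda k: archive[k][2])  # worst fitting error among candidates
    if err < archive[w][2]:
        archive[w] = child
```

Preferring duplicated bit-strings as replacement targets keeps as many distinct topologies in the archive as possible, which is the point of measuring Hamming distances.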
In this work, $p_1$, $p_2$, and $p_3$ are set to 0.8, 0.7, and 0.8, respectively.

Evaluation of the Obtained Nondominated Solutions.
Assume that $m_i$ nondominated solutions are obtained for the $i$th equation, and let $Score1_{i,k}$ and $Score2_{i,k}$ denote the normalized fitting error and the normalized $L_0$-norm of the $k$th solution. A popular aggregation method is to compute the linear aggregation sum (LAS)

$$LAS_{i,k}(w) = w \cdot Score1_{i,k} + (1 - w) \cdot Score2_{i,k}, \quad k = 1, \ldots, m_i,$$

for a given weight $w \in [0, 1]$ and to select the solution with the minimum sum as the inference result of equation $i$. However, because the LAS method compares the sums of all configurations, it is driven by the sharpest decrease of the sum as the number of connections increases. As a result, it tends to yield a network that is "too sparse".
To remedy this drawback of the LAS method, we take the aggregation product (AP) of the two normalized objective values, given by (10) and (11), as the criterion for confirming results: for a given weight $w \in [0, 1]$, the solution of the $i$th equation that minimizes the AP is selected as its inference result. Combining the inference results of all equations, we obtain the overall inference result of the investigated S-System and evaluate its quality by the numbers of true positives (TPs), false positives (FPs), true negatives (TNs), and false negatives (FNs). As $w$ varies, the receiver operating characteristic (ROC) curve is used to illustrate the quality of the nondominated solutions of (6) obtained by the mv-MOEA.
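Because the exact forms of the normalization and of (10) and (11) are not reproduced here, the following sketch only contrasts the two aggregation rules under stated assumptions: the normalized scores are given and strictly positive, LAS is taken as $w \cdot Score1 + (1 - w) \cdot Score2$, and AP is assumed to be the weighted product $Score1^{w} \cdot Score2^{1-w}$. It also counts how many distinct solutions are selected when $w$ is sampled at 101 uniform values, as in the experiments reported later. All names are hypothetical.

```python
from typing import Sequence, Set, Tuple

Scores = Sequence[Tuple[float, float]]  # (Score1, Score2) per nondominated solution, both > 0


def select(scores: Scores, w: float, mode: str = "ap") -> int:
    """Index of the solution minimizing the chosen aggregation for weight w."""
    if mode == "las":
        def key(k: int) -> float:
            return w * scores[k][0] + (1 - w) * scores[k][1]
    elif mode == "ap":  # assumed weighted-product form of the aggregation product
        def key(k: int) -> float:
            return scores[k][0] ** w * scores[k][1] ** (1 - w)
    else:
        raise ValueError(f"unknown mode: {mode}")
    return min(range(len(scores)), key=key)


def distinct_choices(scores: Scores, mode: str, num: int = 101) -> Set[int]:
    """Set of solutions selected when w is sampled uniformly at num points in [0, 1]."""
    return {select(scores, t / (num - 1), mode) for t in range(num)}
```

For the scores `[(1.0, 0.2), (0.3, 0.4), (0.1, 1.0)]` and $w = 0.5$, the two rules already disagree: LAS picks index 1, whereas the assumed AP picks index 2.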

The Automatic Procedure for Selecting the Final Inference Results from the Obtained Nondominated Solutions.
Although we can obtain an inference result for a given $w$, we do not know which value of $w$ corresponds to the "right" tradeoff between the two objectives of (6). In this work, we propose an automatic selection procedure (ASP) to confirm the final inference result. When an inference result of the investigated S-System is obtained for a given weight $w$, we simultaneously obtain the sum vector

$$\mathbf{F}(w) = \left( \sum_{i=1}^{n} F_1(i, w),\ \sum_{i=1}^{n} F_2(i, w) \right),$$

where $F_1(i, w) = Score1_{i,k_i^*}$ and $F_2(i, w) = Score2_{i,k_i^*}$ are the normalized objective values of the inference result $k_i^*$ selected for equation $i$. Sampling $w$ in $[0, 1]$ yields a collection of sum vectors that constitutes a normalized Pareto front in the objective space.
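The construction of the sum vector $\mathbf{F}(w)$ can be sketched as below. The selection rule is again an assumed weighted product, a hypothetical stand-in for (10) and (11); each equation contributes the normalized scores of the solution it selects, and identical sum vectors produced by different values of $w$ are merged, keeping the first $w$ that produced each one.

```python
from typing import Dict, Sequence, Tuple

Scores = Sequence[Tuple[float, float]]  # (Score1, Score2) per nondominated solution, both > 0


def sum_vector(scores_per_eq: Sequence[Scores], w: float) -> Tuple[float, float]:
    """F(w): sums of the normalized objectives of the solutions selected for all equations."""
    f1 = f2 = 0.0
    for scores in scores_per_eq:
        # assumed weighted-product rule, standing in for (10) and (11)
        k = min(range(len(scores)),
                key=lambda idx: scores[idx][0] ** w * scores[idx][1] ** (1 - w))
        f1 += scores[k][0]
        f2 += scores[k][1]
    return f1, f2


def sampled_front(scores_per_eq: Sequence[Scores],
                  num: int = 101) -> Dict[Tuple[float, float], float]:
    """Map each distinct sum vector to the first sampled w that produced it."""
    front: Dict[Tuple[float, float], float] = {}
    for t in range(num):
        w = t / (num - 1)
        front.setdefault(sum_vector(scores_per_eq, w), w)
    return front
```

Keeping the generating $w$ alongside each point is what later allows the knee point to be mapped back to a concrete selection for every equation.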
Since there are only two objectives, improving one objective along this front deteriorates the other. Where a small improvement in one objective causes a severe deterioration in the other, the solutions constitute the so-called knee region. Following the method proposed by Li et al. [26], which selects one result from the Pareto front by locating its knee region, we employ the angle-based method [27] to seek the knee point on the combined Pareto front. In this method, the two adjacent points of a point are used to compute its tradeoff angle, and the point with the maximum tradeoff angle is selected as the preferred sum vector. Using the value of $w$ that corresponds to this sum vector, we obtain the final inference results of all equations via (10) and (11) and combine them into the overall inference result of the investigated S-System.
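One simple way to realize the tradeoff angle is sketched below: the sum vectors are sorted by the first objective, and for each interior point the bend angle is computed as $\pi$ minus the interior angle formed with its two neighbors, so that a straight segment scores 0 and a sharper corner scores higher. This is an illustrative formulation; the exact definition in [27] may differ in detail.

```python
import math
from typing import Iterable, Tuple

Point = Tuple[float, float]


def knee_point(points: Iterable[Point]) -> Point:
    """Return the interior point of a 2-D front with the largest bend (tradeoff) angle."""
    pts = sorted(set(points))  # order along the first objective, drop duplicates
    if len(pts) < 3:
        raise ValueError("need at least three distinct points")
    best, best_angle = pts[1], -1.0
    for a, p, b in zip(pts, pts[1:], pts[2:]):
        ux, uy = a[0] - p[0], a[1] - p[1]  # vector to the left neighbor
        vx, vy = b[0] - p[0], b[1] - p[1]  # vector to the right neighbor
        cos_int = (ux * vx + uy * vy) / (math.hypot(ux, uy) * math.hypot(vx, vy))
        interior = math.acos(max(-1.0, min(1.0, cos_int)))  # clamp rounding noise
        bend = math.pi - interior
        if bend > best_angle:
            best, best_angle = p, bend
    return best
```

For the front `[(0, 1), (0.1, 0.3), (0.5, 0.2), (1, 0)]`, the sharp bend at `(0.1, 0.3)` gives a bend angle of about 1.18 rad versus about 0.13 rad at `(0.5, 0.2)`, so `(0.1, 0.3)` is returned; looking up the $w$ that produced it gives the weight for the final selection.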

Results and Discussions
In this section, the performance of IM-MOEO is validated on two S-Systems. To demonstrate the inference precision and robustness of our method, we first infer an artificial network from noise-free and noisy simulated data. Then, the Ethanol Production System by Yeast is investigated to show how our method obtains a sparse network by focusing on simulating its dynamical properties. When generating simulated data for the two problems, we sample 15 time points uniformly over the same time intervals as those used in the literature, for comparison.

Inference of an Artificial System by Noise-Free and Noisy Data.
The investigated artificial network S1 is an S-System whose parameters are given in Table 1. Several previous studies [15, 20, 28–32] have used this network to demonstrate the efficiency of S-System inference methods, so we also reconstruct it to validate the competitiveness of our method.
Table 1: Parameter values of the artificial network S1.

Experimental Settings.
To perform a fair comparison with the method proposed in [15], the noise-free data are time-course series sampled at 15 time points for 4 different initial conditions. The search regions for the kinetic orders and rate constants are set to [−3, 3] and [0, 10], respectively. Because of the difficulty of multiobjective optimization, we employ a population of 100 individuals and report the nondominated solutions obtained after 2000 iterations. For each equation of this artificial system, the algorithm is run independently 10 times to obtain a satisfactory inference result. The noise-free, 5%-noise, 15%-noise, and 25%-noise data are generated by the same method as in [15].

Inference Results.
To evaluate the nondominated solutions, we compute the true positive rate and the false positive rate

$$TPR = \frac{TP}{TP + FN}, \qquad FPR = \frac{FP}{FP + TN},$$

where TP, FN, TN, and FP represent the numbers of true positive, false negative, true negative, and false positive predictions of the parameters. The ROC curves for noise-free, 5%-noise, 15%-noise, and 25%-noise data are illustrated in Figure 1. The proposed method can generally identify the network from noise-free and 5%-noise data; however, its performance deteriorates quickly when the noise rate rises to 15%. This could be attributed to the inherent mechanism of our method, that is, the use of binary variables to identify network connections. Since the $L_0$-norm of the parameter vectors to be minimized is determined by the binary variables, the proposed MOEA is dedicated to searching for sparse network topologies, which makes it less robust to data noise.
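The ROC coordinates can be computed from true and inferred kinetic orders as in the sketch below, assuming that a parameter counts as a predicted connection when its magnitude exceeds a tolerance `tol` (zero by default). The flattened-list layout and the function names are illustrative assumptions.

```python
from typing import Sequence, Tuple


def confusion_counts(true_params: Sequence[float], inferred_params: Sequence[float],
                     tol: float = 0.0) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN), treating |value| > tol as a present connection."""
    tp = fp = tn = fn = 0
    for t, e in zip(true_params, inferred_params):
        actual, predicted = abs(t) > tol, abs(e) > tol
        if actual and predicted:
            tp += 1
        elif predicted:
            fp += 1
        elif actual:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def roc_point(true_params: Sequence[float], inferred_params: Sequence[float],
              tol: float = 0.0) -> Tuple[float, float]:
    """(FPR, TPR) of one inferred network; an empty denominator gives 0.0."""
    tp, fp, tn, fn = confusion_counts(true_params, inferred_params, tol)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return fpr, tpr
```

Evaluating `roc_point` once per sampled $w$ and plotting TPR against FPR traces the ROC curve described above.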
Meanwhile, the inference results of this method are largely insensitive to the parameter $w$, thanks to the use of the $L_0$-norm instead of the $L_1$-norm of the parameter vectors. As a result, 101 uniformly sampled values of $w$ in $[0, 1]$ produce only a few different results for the artificial system. This suggests that, if the difficulty introduced by the $L_0$-norm in the optimization model can be handled appropriately, it becomes much more likely that a preferred nondominated result of the multiobjective optimization can be selected via the aggregation method represented by (10) and (11).
By locating the knee region of the objective curve, we obtain the final reconstruction results of the artificial system S1. These results again demonstrate that our method is not sensitive to the value of $w$, because the knee points for the noise-free, 5%-noise, 15%-noise, and 25%-noise data are each obtained over a range of $w$ values. The final results for the four data sets are listed in Table 2.
Both our method and L1-DPSO obtain the correct network of S1 from noise-free data. Generally, our method obtains more accurate parameter values from noise-free data, except that the parameters of the equation for $X_3$ are less precise. This is because the concentration of the third component quickly reaches its stationary state, so less information can be extracted by fitting the time-series data of derivatives.
For noisy data, the inference results highlight the competitiveness of our method. Our method always obtains sparse network topologies of S1, even when the noise rate (NR) varies from 5% to 25%. By contrast, the method proposed in [15] cannot set a uniform threshold value for pruning network connections, because increasing the NR inevitably increases the fitting errors of L1-DPSO, which further influences the inference result of the investigated S-System. Although data noise also lowers the precision of our inference results, the biobjective model and the automatic confirmation scheme ensure that a sparse network can always be obtained, and in most cases, the sparseness (the ratio of existing connections to possible connections) of the obtained network is similar to that of the true network. This superiority is highlighted when the NR of the data is relatively low: our method correctly identifies the network topology from 5%-noise data. However, the automatic confirmation of a sparse network also has a drawback: the sparse network topology is sensitive to data noise. As a result, for 15%- and 25%-noise data, more network connections are wrongly identified by our method.
Table 2: Comparison between IM-MOEO and L1-DPSO for the artificial network S1.

Identification of the Yeast Fermentation Pathway Dynamics.
As an example of a real biochemical network, the yeast fermentation pathway proposed in [33] is also investigated in this work. Its S-System model contains five dependent variables: glucose ($X_1$), glucose-6-phosphate ($X_2$), fructose-1,6-diphosphate ($X_3$), phosphoenolpyruvate ($X_4$), and ATP ($X_5$). It is inferred from time-course data generated from 10 random initial concentrations, with 15 time points sampled for each initial condition. Although there are several weak connections in the system S2, the mv-MOEA obtains the correct network topology for every equation (that is, for each equation, the mv-MOEA obtains one nondominated solution indicating the correct network topology). For a concise illustration of the nondominated solutions obtained by the mv-MOEA, Figure 2 shows only the ROC curve obtained by sampling $w$ in $[0, 1]$. The ROC curve demonstrates a satisfactory result for the inference of S2, which shows that the biobjective optimization model (6) and the mv-MOEA work well for this network. What should be noted is that the ROC

By employing the ASP on the obtained nondominated solutions, we obtain the final inference result of system S2 listed in Table 3. For the same reason, the final result obtained by the ASP contains one false connection and misses several weak connections. To evaluate how these wrongly identified connections influence the dynamical properties of the investigated S-System, we compare the dynamical curves of all components in one plot, illustrated in Figure 3. The dynamical properties of the obtained network are almost consistent with those of the true network, even though several connections are wrongly identified. Subject to fitting the data, IM-MOEO aims to obtain a sparse network.
Because all the wrong identifications involve weak connections that do not significantly influence the dynamical properties of this network, these weak connections are not correctly identified when the ASP trades off the data fitting error against network sparseness. As a result, the ASP yields a sparse network with a low FPR and a high TPR.

Conclusion and Discussion
To address the defects of existing inference methods, we propose a biobjective optimization model for identifying S-Systems, an efficient mixed-variable multiobjective evolutionary algorithm to solve the biobjective model, and an automatic selection scheme to confirm the final inference result. Although the introduction of binary variables and the $L_0$-norm makes the biobjective optimization model harder to solve, the proposed mv-MOEA handles it with satisfactory performance. The automatic selection scheme performs well on the investigated benchmark networks; however, it sometimes misses weak connections because of its preference for sparse network topologies. In general, the biobjective optimization method combined with the automatic selection scheme is a universal method for inferring BSs, because the biobjective model can be applied to BSs of any size and no problem-dependent parameters are needed for its successful implementation. By inferring two widely investigated small-scale networks, we validate its effectiveness in comparison with previous studies. However, when applied to large-scale BSs, the mv-MOEA could be computationally expensive and could perform unsatisfactorily owing to data noise and a lack of sufficient samples. Thus, data mining methods should be incorporated to extend its applications to large-scale BSs. Further work could focus on improving its performance on noisy data and its application to large-scale BSs.

Conflicts of Interest
The authors declare that they have no conflicts of interest.