Distribution-Based Identification of Yield Coefficients in a Baker ’ s Yeast Bioprocess

A distribution-based identification procedure for estimation of yield coefficients in a baker’s yeast bioprocess is proposed. This procedure transforms a system of differential equations to a system of algebraic equations with respect to unknown parameters. The relation between the state variables is represented by functionals using techniques from distribution theory. A hierarchical structure of identification is used, which allows obtaining a linear algebraic system of equations in the unknown parameters. The coefficients of this algebraic system are functionals depending on the input and state variables evaluated through some test functions from distribution theory. First, only some state equations are evaluated throughout test functions to obtain a set of linear equations in parameters. The results of this first stage of identification are used to express other parameters by linear equations. The process is repeated until all parameters are identified. The performances of the method are analyzed by numerical simulations.


Introduction
A process carried out in a bioreactor can be defined as a set of m biochemical reactions involving n components with n ≥ m . The global dynamics can be represented by following the dynamical state-space model 1 : where ξ ∈ n×1 . This model describes the behaviour of an entire class of biotechnological processes and is referred to as the general dynamical state-space model of this class of bioprocesses 1, 2 . In 1.1 , the term K · ϕ ξ, t is the rate of consumption and/or production of the components in the reactor, that is, the reaction kinetics and the term −Dξ F − Q represents the exchange with the environment, that is, the dynamics of the component transportation through the bioreactor. The strongly nonlinear character of the model 1.1 is given by the reaction kinetics. In many situations, the yield coefficients, the structure, and the parameters of the reaction rates are partially known or unknown. Many of the evolved control methods for these kind of systems-like model predictive control 3 , robust or adaptive control 4, 5are based on good initial estimates of the yield parameters. This paper deals with the yield coefficients identification of baker's yeast bioprocess.
In recent years, a progress has been made in the area of continuous-time system identification. Even if the most physical systems are naturally continuous, a much more attention has been paid to parameter estimation of discrete-time systems, mainly because they are better suited for numerical implementations. Continuous-time identification makes possible a more direct link to the physical properties and operation of the underlying systems 6, 7 and the direct estimation of physical parameters which have a clear significance 8 . One important direction in continuous-time system identification is to transform the system of differential equations to an algebraic system that reveals the unknown parameters. By using some measures, the direct computation of the input-output data derivatives can be completely avoided. For linear system identification, several methods are reported on this direction: identification based on the Laplace transformation and then use the Laguerre filter 9 or transforming the continuous-time system to the frequency domain 10 .
A novel approach to identify the continuous-time system is the distribution-based approach, using deterministic distributions 11 or random distributions 12 . Through these approaches, derivatives are described in the sense of distribution theory and one constructs the input-output algebraic relationships using differential information produced in the distribution sense.
Identification of nonlinear continuous-time systems is far away more complicated. The traditional procedures are based on the Volterra functional series, expressed in time domain or frequency domain 7 . The parameter identification of deterministic nonlinear continuous-time systems NCTSs , modelled by polynomial-type differential equations has been considered by numerous authors 13, 14 . They use the modulating functions as a sum of sinusoids, whose fundamental period has a fixed length, to convert differential equations into an algebraic form in the frequency domain. These methods can be applied only if the NCTS is exactly integrable. This means that all the terms can be written as pure derivatives of some computable function of the measured signals. The computational burden in applying the method of modulating functions is overtaken by using Fast Fourier Transform algorithms.
In this paper we propose an identification method for a class of NCTS considering that the unknown parameters can appear in rational relations with measured variables. Using techniques utilized in distribution approach the measurable functions and their derivatives are represented by functionals on a fundamental space of testing functions. Such systems are common in biotechnology 1, 15 . It is supposed that the system is described by state equations and all state variables are measurable or they can be estimated using some state estimators 16 .
The most important approaches for the yield coefficients estimation make use of the state transformations based on the general structure 17-19 . The main idea of this paper is to use a hierarchical structure of identification to estimate the yield coefficients in the baker's yeast bioprocess. First, some state equations are utilized to obtain a set of linear equations in some parameters. The results of this first stage of identification are used for expressing other parameters by linear equations. This process is repeated until all parameters are identified.

Mathematical Problems in Engineering 3
The paper is organized in the following way. The nonlinear dynamical model of a baker's yeast process is given in Section 2. Section 3 presents the problem statement of continuous-time systems identification based on distribution approach and the hierarchical structure of yield coefficients identification. Section 4 is dedicated to the analysis of properties of the identification algorithm. Some experimental results are presented in Section 5, and conclusions in Section 6.

Nonlinear Dynamical Model of Baker's Yeast Process
The yeast growth has already been the subject of numerous studies and papers. The reaction scheme of baker's yeast bioprocess is 20 In the reaction schemes 2.1 -2.3 , S is the glucose, X the biomass, C the dissolved oxygen, E the ethanol, and G the dissolved carbon dioxide. The first reaction scheme represents the respiratory growth on glucose; the second reaction scheme represents the fermentative growth on glucose, and finally the third reaction represents the respiratory growth on ethanol. μ o s , μ r s , and μ o e are three specific growth rates that reflect the capacity of the baker's yeast to exploit three catabolic pathways for energy and material sources.
The dynamical model of the fed-batch bioprocess with the reaction schemes 2.1 -2.3 can be obtained by using the mass balance of the components, considering that the bioreactor is well mixed, the yield coefficients, are constant and the dynamic of the gas phase can be neglected 20 : The dynamical model is expressed as a set of differential equations, in terms of the concentrations of components, and with the additional 2.9 , which represents the dynamics of the solution volume V in the bioreactor. In 2.4 -2.9 , k i , i 1, 9 are the positive yield coefficients, S in is the glucose concentration on the feed, OTR is the oxygen rate, defined as OTR K L a C * − C where K L a is the mass transfer coefficient and C * the equilibrium concentration of the dissolved oxygen , CTR is the carbon dioxide transfer rate defined as CTR K V K L aG, with K V a transfer coefficient , F in is the input feed rate, and D F in /V is the so-called dilution rate.
The model 2.4 -2.8 can be expressed in the matrix form:

2.10
The next notations will be used: the state vector is ξ the vector of reaction rates the reaction kinetics is Φ ξ μ o s X μ r s X μ o e X T ; K is the matrix of yield coefficients, F 0 DS in 0 OTR T is the vector of feeding rates, and Q 0 0 0 0 CTR T is the vector of rates of removal of the components in gaseous form. The most difficult task for the construction of the dynamical model is the modelling of the reaction kinetics. The form of kinetics is complex, nonlinear, and in many cases unknown. The reaction rates can be expressed as where H ξ is a state-dependent diagonal matrix, whose elements correspond to the reactions' reactants. The specific growth rates μ ξ, t are considered as a vector of time-varying parameters. For the baker's yeast bioprocess the form of matrix H ξ is very simple: Then the model 2.10 becomes For the reaction rates of the baker's yeast bioprocess there are many possible models 21, 22 , and so forth. However, the process of yeast growth on glucose with ethanol production is generally described by the following three metabolic reactions, widely analyzed by Sonnleitner and Kappeli 20 .

Mathematical Problems in Engineering 5
First, the reaction rate of the respiratory growth on glucose and the associated specific growth rate are where the kinetic terms associated with the glucose consumption μ s and with the respiratory capacity μ c max are modelled by Monod-type laws: μ s q s max S/ S K s , μ c max q c max C/ C K c , with q s max and q c max the maximal values of the specific growth rates of glucose and oxygen, and K s and K c the saturation parameters for glucose and oxygen uptake, respectively. Second, the reaction rate of the fermentative growth on glucose and the associated specific growth rate are Finally, if the oxidation capacity is sufficiently high to oxidize both ethanol and glucose, then their consumption is possible. Then, the reaction rate of the respiratory growth on ethanol and the associated specific growth rate are where μ e is the potential ethanol oxidative rate, modelled by Monod-type law: μ e q e max E/ E K e , with q e max the maximal value of the specific growth rates of ethanol, and K e the saturation parameter for growth on ethanol. The above kinetic model is based on the well-known bottleneck hypothesis developed by Sonnleitner and Kappeli 20 . This model assumes a limited capacity of yeast, leading to the production of ethanol under conditions of oxygen limitation and/or high glucose concentration. The glucose concentration that corresponds to the oxidative capacity is reached when μ s μ c max /k 5 , and it is called critical concentration. Depending on this critical concentration, two operating regimes can be observed. At glucose concentration lower than its critical value, the system is in the respirative regime. The glucose consumption rate is smaller than the maximal respiratory capacity μ s ≤ μ c max /k 5 and the rate of the oxidative glucose metabolism is given by the glucose consumption rate. Ethanol can be oxidized in parallel with glucose and the rate of oxidative ethanol metabolism is determined by the excess of respiratory capacity and the available ethanol. At glucose concentration higher than its critical value, the system is in the so-called respirofermentative regime. Then the glucose consumption rate is larger than the maximal respiratory capacity μ s > μ c max /k 5 and the respiratory capacity establishes the rate of the oxidative glucose metabolism. The excess of glucose is processed by the fermentative metabolism. Under lack of oxygen conditions, the fermentative pathway is predominating.
Since there are three unknown kinetics, the implementation of some state observers such as asymptotic observers require the online measurements of three state variables. Also, the online estimation algorithms for kinetic parameters e.g., observer-based techniques necessitate the knowledge of minimum three states. From the technological point of view, E, C, and G are the online available state variables, so it is important to design the estimators by using these measurements. Unfortunately, according to the yield coefficient values, E, C, and G are linearly dependent, with the corresponding yield matrix being ill conditioned 23 . This would result in sensitive algorithms to numerical errors, thus having performances degraded. To solve this problem, Pomerleau and Perrier 23 suggested a reformulation of model 2.10 . This reformulation is based on the division of the complete process model 2.10 into two partial models, corresponding to the above-described mechanisms: the respirofermentative partial model for the ethanol production state of the process and the respirative partial model, corresponding to the ethanol consumption state of the process.
The fermentative partial model is stated as

2.17
The respirative partial model is stated as 2.18

Problem Statement
In this approach the set of nonlinear differential equations describing the state evolution is mapped into a set of linear algebraic equations with respect to the model parameters. Using techniques utilized in distribution approach, the measurable functions and their derivatives are represented by functionals on the fundamental space of testing functions 14 . The main advantages of this method are that a set of algebraic equations with real coefficients results and the formulations are free from boundary conditions. The idea of utilizing test functions in system identification was proposed by Pearson and Lee 13 in terms of modulating functions in order to ameliorate the noise handling for deterministic least-squares identification based on time-limited data. Let us denote by Ω n the fundamental space from the distribution theory 24 of the real functions ω : → , t → ω t with compact support T , having continuous derivatives at least up to the order n. Let q : → , t → q t be a function which admits a Riemann Mathematical Problems in Engineering 7 integral on any compact interval T from . Using this function, a unique distribution or generalized function can be built by the relation: In distribution theory, the notion of k-order derivative is introduced. If F q ∈ Ω n , then its k-order derivative is a new distribution F k q ∈ Ω n uniquely defined by the relations: is the k-order time derivative of the test function.
When q ∈ C k , then that means the k-order derivative of a distribution generated by a function q ∈ C k equals the distribution generated by the k-order time derivative of the function q. So, in place of the states and their time derivatives of a system one utilize the corresponding distributions and, in some particular cases, it is possible to obtain a system of equations linear in parameters. If the system is compatible the model parameters are structurally identifiable.
In our study, it were utilized three types of test functions characterized by a bounded support T t a t b , t a < t b , all of these accomplishing the condition: The nonzero restriction is one of the following three types.
1 Exponential:  2 Sinusoidal: where p ≥ 2 is an integer. Figures 1 and 2 present the exponential type test function and its first-order derivative for T 0.1 0.9 .
Mathematical Problems in Engineering 9

The Hierarchical Structure of the Identification Algorithm
Consider all state variables accessible for measurements. This assumption can be considered unrealistic but the unknown states can be estimated using different approaches. From the multitude of proposed solutions one can mention methods based on balance equations, observers asymptotic observers, high-gain observers 5 , Kalman filters 25 , neural networks 26 , and fuzzy reasoning 27 . The dynamical systems 2.17 and 2.18 contain rational dependences between parameters and measured variables. To obtain linear equations in unknown parameters, the identification problem is split in several simpler problems 28 . Based on the specific structure of these systems, it is possible to group the state equations in such way to determine some interconnected relations. They are organized in a hierarchical structure. In the first stage some state equations are utilized to obtain a set of linear equations in some parameters. If these parameters are identifiable then they can be used as known parameters in the following stages. This process is repeated in the other stages until all the parameters are identified.
Because the dilution rate D can be externally modified, it will be considered the first component of the input vector and the other component of u is the concentration S in : u u 1 u 2 D S in . From the third and fourth equations of the fermentative partial model 2.17 one gets the following relations for the reaction rates:

3.10
These relations are replaced in the other three equations to obtain linear equations with respect to yield coefficients. So, one can split the identification problem in the following steps.
Step 1 identification of k 3 and k 5 . Replacing 3.10 in the first equation of the system 2.17 one get where θ 3 1/k 3 and θ 5 1/k 5 . One denotes v t ξ 1 t Dξ 1 t , 3.12

Mathematical Problems in Engineering
Multiplying these functions with a test function ω i t and integrating over one gets the following distributions: 3.14 As one must identify two parameters it is enough to choose a finite number N ≥ 2 of test functions ω i , i 1 : N and to build an algebraic system of equations, where, θ θ 3 θ 5 T is the vector of unknown parameters, F w is a N × 2 matrix of real numbers: and F v denotes an N-column real vector build from distributions calculated as in 3.13 :

3.17
If rank F w 2 then a unique solution is obtained:
Step 2 identification of k 1 and k 2 . Considering known k 3 k 3 and k 5 k 5 from Step 1 and substituting 3.10 in the second equation of 2.17 one gets where θ 1 −k 1 and θ 2 −k 2 .
The procedure of identification from Step 1 is repeated and one obtains k 1 − θ 1 and k 2 − θ 2 .
Step 3 identification of k 7 and k 8 . Considering known k 3 k 3 and k 5 k 5 from Step 1 and substituting 3.10 in the last equation of 2.17 one gets where θ 7 k 7 and θ 8 k 8 .
The procedure of identification from Step 1 is repeated and one obtains k 7 θ 7 and k 8 θ 8 .
Since the respirative partial model 2.18 has a similar structure with the fermentative partial model 2.17 , in a similar manner one obtains the parameters k 4 , k 6 , and k 9 .

Analysis of the Algorithm Properties
In the following one presents some consistency and numerical aspects related to the presented algorithm and a short overview of the classical identification procedure widely used for this kind of biotechnological systems.

Identifiability
Identifiability is a necessary prerequisite for model identification; it concerns uniqueness of the model parameters determined from the input-output data, under ideal conditions of noise-free observations and error-free model structure. A remarkable feature of distributionbased identification procedure is that it provides a linear reparameterization of the inputoutput relation of the nonlinear system. This reparametrization of the system involves a very simple identifiability condition to be accomplished, that is, the existence of the matrix F T w · F w −1 or, equivalently, F w is of full rank.

Consistency
Obviously, consistency of the estimates is directly influenced by the precision of numerical integration used to compute the value of the distributions. There are available numerous integration methods often called numerical quadrature with various degrees of precision.  x 1 a and x 2 The integral of p x equals the of trapezoid with base b − a times the average height To increase the accuracy one subdivides the interval a, b and assumes that f i f x i is known on a grid of equidistant points x 0 a, x i x 0 ih, x n b, where h b − a /n is the step length. The trapezoidal approximation for the ith subinterval is Summing the contributions for each subinterval The global truncation error is This shows that by choosing h small enough we can make the truncation error arbitrarily small. In other words, we have asymptotic convergence when h → 0.
In the composite Simpson's rule one divides the interval a, b into an even number n 2 m steps of length h and uses the formula where are sums over odd and even indices, respectively. The remainder is This shows that one have gained two orders of accuracy compared to the trapezoidal rule, without using more function evaluations.
Let us study properties of Newton-Cotes formulas in frequency domain or spectral properties . Newton-Cotes rules are symmetric hence linear phase digital filters with finite impulse response. One of the most important characteristic of digital filter is magnitude/frequency response-function which shows how much filter damps or amplifies magnitude of particular frequency contained in input data.
So, for trapezoidal rule and Simpson's rule one obtains These transfer functions have the amplitude-frequency Bode characteristics from Figure 3.
As one can see classical Newton-Cotes formulas suppress high frequencies noise in the input data. In that sense they can be considered low-pass filters. From Figure 3 one observes that trapezoidal rule offers a better suppression of noise than the Simpson's rule, so in the algorithm implementation one used the trapezoidal rule for numerical integration.

"Classical" Identification Procedure
The "classical" identification approach of yield coefficients is a two-step procedure under the assumption that full state measurements are available 1 . This method is based on a state transformation that allows reformulating the dynamical model into separate submodels. The first submodel depends only on the reaction structure and is independent of the kinetics. It can be linearly reparametrized and used for the identification of the yield coefficients by means of linear regressions, provided suitable identifiability conditions are satisfied. We present briefly this method for estimating the yield coefficients and we use it for comparison in the next section. The general dynamical model given in 2.1 represents a particular class of nonlinear state-space models. The nonlinearity lies in the reaction rates ϕ i ξ that are nonlinear functions of the state variables. These functions enter the model in the form Kϕ i ξ where K is a constant matrix , which is a set of linear combinations of the same nonlinear functions ϕ i ξ . This particular feature can be exploited to separate the nonlinear part from the linear part of the model by an adequate linear state transformation. More precisely, one chooses a nonsingular partition with K a ∈ R p×m of full row rank matrix i.e., p rank K , K b ∈ R n−p ×m and T a permutation matrix. The induced partitions of the vectors ξ and u are ξ a ξ b Tξ u a u b Tu.

4.13
Model 2.1 is then partitioned into two submodels: where the n − p × p matrix C is the unique solution of that is, where K * a is a generalized or pseudoinverse of K a . This subsystem 4.16 can be augmented with an equation derived from 2.7 as follows:

4.19
It can be considered as a linear time-varying if d varies in the course of time model with state z, input ξ a , u a , u b , and output ξ b . It is nonlinearly parametrized by the yield coefficients but linearly reparametrized by the nonzero entries of C. When data of the signals ξ a , u a , u b , and ξ b are available, the auxiliary model 4.19 can be used to identify the yield coefficients independently of the knowledge of the reaction rates. The model 4.19 can be used to perform the identification of the nonzero entries of C by a linear regression technique, with the yield coefficients k i recovered afterwards from 4.17 . The identification makes sense only if the yield coefficients k i are structurally uniquely identifiable with the auxiliary model 4.19 .

Simulation Results
The efficacy of our distribution-based approach is shown by simulations, comparing it with this "classical" approach reviewed in the previous section. Also, the influence of sampling period, test functions type, initial conditions, and input type is analysed. To compare statistical performances of the different approaches and simulation conditions the empirical normalized mean square error NMSE was used, which is defined as where N is the number of estimated parameters, θ j is the jth element of the estimated parameter vector, while the " * " superscript denotes the true value of the parameter. The model given by 2.17 and 2.18 was integrated using a fourth-order Runge-Kutta routine with a sampling period of 1 minute. In order to study the sensitivity of the estimation method to the sampling period T s , to the initial conditions, to the test functions type, to the input type, and to the noise, the following parameters were used:  Figure 4 presents the noise-free system time response for the fermentative partial model 2.17 . The measured states are represented in percent with respect to their maximal values. Three test function of sinusoidal type are presented in Figure 5. In Figure 6 some noisy measurements are presented.
The simulation results presented in Tables 1-5 leads to the following remarks.  1 Sensitivity to the sampling period: results given in Table 1 show that the performance of the algorithm deteriorates with the increasing sampling period.

Conclusions
The paper presents a distribution-based identification procedure for offline estimation of yield coefficients in a baker's yeast bioprocess. This procedure is a functional type method, which transforms a differential system of equations to an algebraic system in unknown parameters. The relation between the state variables of the system is represented by functionals using techniques from distribution theory based on test functions from a finitedimensional fundamental space. The identification algorithm has a hierarchical structure, which allows obtaining a linear algebraic system of equations in the unknown parameters. The coefficients of this algebraic system are functionals depending on the input and state variables and are evaluated through some testing functions from distribution theory. According to the proposed procedure, first, only some state equations are evaluated throughout testing functions to obtain a set of linear equations in some parameters. The results of this first stage of identification are utilized for expressing other parameters by linear equations. This process is repeated until all parameters are identified.
The influence of the sampling period, initial conditions, test functions type, input type, and noise on the parameters estimates was empirically analyzed.
The proposed method presents two main advantages: there is necessary no information about the parameters of the reaction rates to identify the yield coefficients of the baker's yeast bioprocess and the algorithm provides very good results even the measurements are noise-contaminated because the evaluation of states derivatives is completely avoided.

C:
Dissolved oxygen concentration g/L C * : Equilibrium concentration of the dissolved oxygen g/L CTR: Carbon dioxide transfer rate h −1 g/L D: Dilution rate h −1 E: Ethanol concentration g/L F: Vector of feeding rates F in : Input feed rate Lh −1 G: Dissolved carbon dioxide concentration g/L K: Matrix of yield coefficients k i , i 1, 9: Yield coefficients K L a: Mass transfer coefficient h −1 K e : Saturation parameter for growth on ethanol g/L K s , K c : Saturation parameters for glucose/oxygen uptake, respectively g/L K V : Transfer coefficient OTR: Oxygen transfer rate h −1 g/L Q: Vector of rates of removal of the components in gaseous form q c max : Maximal oxygen specific growth rate g O 2 / g biomass · h q e max : Maximal ethanol specific growth rate g ethanol/ g biomass · h q s max : Maximal glucose specific growth rate g glucose/ g biomass · h S: Glucose concentration g/L S in : Glucose concentration on the feed g/L V : Culture volume in the reactor L X: Biomass concentration g/L Φ: Vector of reaction rates reaction kinetics μ o s , μ r s , μ o e : Specific growth rates of baker's yeast bioprocess h −1 μ: Vector of specific growth rates μ c max : Kinetic term associated with the respiratory capacity μ e : Potential ethanol oxidative rate μ s : Kinetic term associated with the glucose consumption ξ: State vector Ω n : The fundamental space from the distribution theory : The set of real numbers ω t : Test function F q : Distribution generalized function θ: The vector of unknown parameters θ i , i 1, 9: The unknown parameters.