Iterative Selection of Unknown Weights in Direct Weight Optimization Identification

In direct weight optimization identification of nonlinear systems, we augment the usual linear affine estimator with additional linear terms in the input sequence so as to better approximate the nonlinear behavior. This introduces two classes of unknown weights. This paper derives, from theoretical analysis and from engineering practice respectively, detailed procedures for choosing these weights and clarifies their respective roles. The theoretical analysis shows that the added weights play an auxiliary role in the overall process of approximating the nonlinear system. The practical analysis shows how to transform the resulting complex optimization problem into a standard quadratic program, which can then be solved by a basic interior point method. Finally, simulation results confirm the efficiency and feasibility of the proposed strategies.


Introduction
The theory of system identification can be divided into linear and nonlinear system identification. In the classical reference [1], the identification of linear systems is treated in the time domain: the identification task is divided into four procedures, and the accuracy of the various identification algorithms is analyzed in a probabilistic framework. Time-domain identification is extended to the frequency domain in [2]. Current research on nonlinear system identification points out that a nonlinear system can be approximately regarded as a linear term plus a distortion term [3]; all the nonlinear characteristics of the system are contained in this distortion term. In [4], many special nonlinear structures are studied, for example the Wiener system and the Hammerstein system, and various identification methods are proposed for them, such as the minimum probability method, the covariance instrumental variable method, and the blind maximum likelihood method. The most practical method for identifying a nonlinear system is the basis function method: after selecting a group of basis functions a priori, the nonlinear system is approximately expanded in these basis functions, and the required accuracy is attained by adjusting the unknown weight of each basis function until the approximation error converges to zero. In [5], a procedure is given for constructing orthonormal basis functions from prior knowledge of the denominator poles.
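As a rough illustration of the basis function idea (the target function, the monomial basis, and all numbers below are hypothetical, not from the paper), one can expand a nonlinear map in a fixed prior basis and adjust the weights by least squares:

```python
import numpy as np

# Toy illustration of basis function identification: expand an unknown
# nonlinear map in a prior monomial basis and fit the weights by least
# squares. The target f0 and the basis choice are illustrative only.
def f0(x):
    return np.sin(2.0 * x) + 0.3 * x**2  # stand-in for the unknown system

x = np.linspace(-2.0, 2.0, 200)
basis = np.vstack([x**k for k in range(10)]).T  # columns are basis functions

weights, *_ = np.linalg.lstsq(basis, f0(x), rcond=None)
max_err = float(np.max(np.abs(basis @ weights - f0(x))))
print(max_err)
```

Adding more basis functions (more adjustable weights) reduces the attainable approximation error, which is the same lever that direct weight optimization pulls by adding unknown weights to the estimator.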
Building on the idea of adjusting unknown weights to improve the approximation accuracy of a basis function expansion, a new nonlinear system identification method, direct weight optimization, was proposed in [6]. Its core idea is to select an estimator that is linear in the observed output data of the nonlinear system, with the adjustable weights contained in this linear affine function. When disturbance noise exists, an optimization problem is obtained by minimizing the approximation error, and the optimal weights are derived in theory through the classical KKT optimality conditions. In [7], the basic idea of direct weight optimization is applied to identify the weights of a piecewise affine system. In [8], the effect of parameter perturbations on direct weight optimization is analyzed; it is shown that, as one parameter's perturbation range tends to infinity, the solution can be expressed as a piecewise linear solution path.
Following the basic idea of these references, we collect not only the observed output sequence but also the input sequence. Because the input sequence can be designed freely, both sequences are known prior information, so we include the observed outputs and the inputs in the linear affine function simultaneously. There are then two kinds of unknown weights, one for each sequence. Compared with [3], unknown weights corresponding to all the input values are added. These additional weights not only alleviate the dependence on the weights of the observed output sequence alone but also mitigate the negative effect of perturbations. After adding the linear terms in the input sequence, the expected minimal mean square error is adopted as the criterion for selecting the unknown weights. The contribution of this paper is to deduce the selection strategy for these weights from theory and from engineering practice, respectively. We obtain the unknown weights using the necessary and sufficient KKT optimality conditions and find that the second kind of weights, corresponding to the observed output sequence, are easy to obtain: their expressions do not depend on the first kind of weights, which correspond to the input sequence. The selection process shows that the second kind of weights play the key role while the first kind play an auxiliary role, although this auxiliary effect should not be neglected.
This paper is organized as follows. In Section 2, we describe the problem. In Section 3, we add the input sequence to the linear affine function and derive an upper bound on the objective function. In Section 4, we derive the two kinds of unknown weights by using the KKT optimality conditions from [9]. In Section 5, an interior point algorithm is applied to solve a quadratic programming problem for the unknown weights. The convergence of the two methods is analyzed in Section 6. In Section 7, numerical simulation results are given to validate the efficiency of the approach. Finally, conclusions are drawn in Section 8.

Problem Description
Given the observed data $\{(\varphi(t), y(t))\}_{t=1}^{N}$ from the nonlinear system
$$y(t) = f_0(\varphi(t)) + e(t), \quad (1)$$
where $f_0(\cdot)$ is the unknown nonlinear mapping to be identified, $\varphi(t)$ is the regression vector, and $e(t)$ is independent zero-mean white noise with variance $\sigma^2$. When the regression vector is chosen as
$$\varphi(t) = \left(u(t-1), \ldots, u(t-n)\right)^{T}, \quad (2)$$
the nonlinear system is called an exogenous input model. Suppose a linear affine function is used to approximate the nonlinear system $f_0(\varphi(t))$ as follows:
$$f(\varphi^{*}(t)) = w_0 + \sum_{k=1}^{N} w_k\, y(k) + \sum_{k=1}^{N} v_k\, u(k). \quad (3)$$
In (3), a linear term comprised of $N$ terms of the input sequence $\{u(k)\}_{k=1}^{N}$ has been added, so $N$ additional unknown weights $\{v_k\}_{k=1}^{N}$ must be identified. The approximation performance depends tightly on the $2N+1$ unknown weights, so the main goal of this paper is to determine the parameter vector consisting of these $2N+1$ unknown weights:
$$\theta = \left(w_0, w_1, \ldots, w_N, v_1, \ldots, v_N\right)^{T}. \quad (4)$$
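A minimal sketch of the estimator structure in (3), assuming the $w$/$v$ notation above; the data and weight values are illustrative only:

```python
import numpy as np

# Hedged sketch of the augmented affine estimator (3): the prediction is
# affine in both the observed outputs y(k) and the known inputs u(k), so
# the 2N+1 unknowns are w0, w1..wN (output weights) and v1..vN (input
# weights). All numbers below are toy values, not from the paper.
def affine_estimate(w0, w, v, y, u):
    return w0 + np.dot(w, y) + np.dot(v, u)

N = 5
y = np.arange(1.0, N + 1)              # toy observed output sequence
u = np.ones(N)                         # toy input sequence
w0, w, v = 0.5, np.full(N, 1.0 / N), np.zeros(N)

print(affine_estimate(w0, w, v, y, u))  # ~ mean(y) + 0.5 = 3.5
```

With the input weights $v_k$ set to zero the estimator collapses to the classical direct weight optimization form, which is exactly the auxiliary role discussed later.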

Direct Weight Optimization Identification
As the nonlinear system $f_0(\varphi(t))$ is approximated by the linear affine function $f(\varphi^*(t))$, we want to determine this function at an arbitrarily given point $\varphi^*(t)$. The approximation accuracy depends on the weights $\{w_k\}_{k=0}^{N}$ and $\{v_k\}_{k=1}^{N}$. The most commonly used criterion function is the mean square error
$$V(\varphi^*, f_0, \theta) = E\left[\left(f_0(\varphi^*(t)) - f(\varphi^*(t))\right)^2\right]. \quad (5)$$
Substituting (3) into (5), and then (1) into the result, expands the objective function. To simplify the description, we introduce the notation $\tilde{\varphi}(k) = \varphi(k) - \varphi^*(t)$. Adding and subtracting the same two terms leaves the equality unchanged and decomposes $V(\varphi^*, f_0, \theta)$ as in (8) into a squared bias term plus a variance error term caused by the unmodeled factors. From (8), the bias term can become arbitrarily large unless we impose two constraint conditions on the unknown weights $\{w_k\}_{k=1}^{N}$:
$$\sum_{k=1}^{N} w_k = 1, \qquad \sum_{k=1}^{N} w_k\, \tilde{\varphi}(k) = 0. \quad (9)$$
Under (9), the objective function simplifies to (10). Expanding the nonlinear system $f_0(\varphi(k))$ in a Taylor series around $\varphi^*(t)$, and assuming that $f_0$ satisfies the Lipschitz condition
$$\left|f_0'(x) - f_0'(y)\right| \le L\,\left|x - y\right|, \quad (12)$$
where $L$ is a constant, we combine the three formulas above to obtain an upper bound on the mean square error (10):
$$V(\varphi^*, f_0, \theta) \le \left(\frac{L}{2}\sum_{k=1}^{N}\left|w_k\right|\tilde{\varphi}^2(k) + \sum_{k=1}^{N}\left|v_k\right|\left|u(k)\right|\right)^2 + \sigma^2\sum_{k=1}^{N} w_k^2. \quad (13)$$
Minimizing the expected mean square error $V(\varphi^*, f_0, \theta)$ can then be replaced by minimizing the upper bound on the right-hand side of (13). Hence we arrive at the optimization problem
$$\min_{\theta}\ \left(\frac{L}{2}\sum_{k=1}^{N}\left|w_k\right|\tilde{\varphi}^2(k) + \sum_{k=1}^{N}\left|v_k\right|\left|u(k)\right|\right)^2 + \sigma^2\sum_{k=1}^{N} w_k^2 \quad \text{subject to (9)}. \quad (14)$$
Because of the additional term $\sum_{k=1}^{N}\left|v_k\right|\left|u(k)\right|$ in (14), the problem treated in this paper is more complex than standard direct weight optimization.
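The upper bound minimized in (14) can be evaluated directly. The sketch below assumes the form stated above (squared bias term plus noise variance term); the weight values and data are hypothetical:

```python
import numpy as np

# Sketch of the upper bound in (13)-(14), assuming the form
#   (L/2 * sum |w_k| phi_tilde(k)^2 + sum |v_k| |u(k)|)^2
#   + sigma^2 * sum w_k^2.
# All numerical values below are illustrative stand-ins.
def upper_bound(w, v, phi_tilde, u, L, sigma2):
    bias = 0.5 * L * np.sum(np.abs(w) * phi_tilde**2) \
        + np.sum(np.abs(v) * np.abs(u))
    variance = sigma2 * np.sum(w**2)
    return bias**2 + variance

w = np.array([0.5, 0.5])            # satisfies sum w_k = 1
v = np.array([0.0, 0.0])            # zero input weights: classical DWO bound
phi_tilde = np.array([0.2, -0.2])   # phi(k) - phi*, also sum w_k*phi_tilde = 0
u = np.array([1.0, 1.0])

print(upper_bound(w, v, phi_tilde, u, L=1.0, sigma2=0.01))
```

Note that with $v = 0$ the additional term vanishes and the bound reduces to the classical direct weight optimization bound, which is why the input weights are described later as auxiliary.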

Optimality KKT Sufficient and Necessary Condition
Notice that absolute value operations appear in (14). Slack variables $s_k$ and $t_k$ are introduced to eliminate them:
$$-t_k \le w_k \le t_k, \qquad -s_k \le v_k \le s_k. \quad (15)$$
Using these slack variables in (14), the optimization problem can be reformulated as the smooth constrained problem (16). The next task is to solve (16). Applying the necessary and sufficient KKT optimality conditions to (16), the Lagrangian function (17) is written by attaching multipliers to the inequality constraints (15) and the equality constraints (9), as in (18). From the KKT conditions, we obtain the equality relations (20) satisfied by the optimal solution. Analyzing the subformulas in (20), we find the implicit optimal equalities
$$s_k = \left|v_k\right|, \quad k = 1, 2, \ldots, N, \qquad t_k = \left|w_k\right|, \quad k = 0, 1, \ldots, N.$$
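The slack-variable reformulation can be checked numerically: for each weight, the smallest slack value satisfying the pair of linear constraints in (15) is exactly the absolute value it replaces. A small sketch (the grid search is only for illustration):

```python
import numpy as np

# Sketch of the slack-variable trick in (15): each |v_k| is replaced by a
# slack variable s_k with the two linear constraints -s_k <= v_k <= s_k.
# Minimizing over s_k pushes it down onto |v_k|, so the reformulated
# problem is equivalent to the original one with absolute values.
def smallest_feasible_slack(v, grid):
    feasible = grid[(grid >= v) & (grid >= -v)]  # -s <= v <= s  <=>  s >= |v|
    return float(feasible.min())

grid = np.linspace(0.0, 5.0, 5001)  # candidate slack values, step 0.001
print(smallest_feasible_slack(-2.5, grid))
```

This is why, at the optimum of (16), the implicit equalities $s_k = |v_k|$ and $t_k = |w_k|$ hold automatically.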
From the first subformula in (20), we see that the multipliers attached to the two inequality constraints of each pair in (15) balance each other. Further, if $v_k > 0$ in the ninth subformula of (20), then $s_k + v_k = |v_k| + v_k = 2v_k > 0$. The ninth subformula holds even when $s_k - v_k = 0$, so from the first subformula we derive (22). In the second subformula of (20), the case $v_k < 0$ implies (23). If the eighth subformula of (20) holds, we set $s_k + v_k = 0$ and, from the first subformula, $s_k + v_k = s_k - v_k = 0$. When all the equalities $s_k = 0$ hold, all the unknown weights of the input sequence are equal to zero. Synthesizing the two cases $v_k > 0$ and $v_k < 0$, we obtain (24). Substituting (24) into each subformula of (20), every subformula can be simplified to (25). The equality relations represented by the fourth and fifth subformulas of (25) are already implied by the constructed Lagrangian function. Substituting the third subformula into the second, we get (26). When $w_k > 0$, the seventh subformula of (25) gives (27). If the seventh subformula holds, set $t_k - w_k = 0$; substituting this into the first subformula gives (28), and substituting that equality into (26) yields (29). Considering the case $w_k < 0$ gives (30). Collecting the above equality relations, we get (31). All of the above shows how to solve for the unknown weights $\{v_k\}_{k=1}^{N}$. Substituting (28) into the third subformula of (25), we obtain (32), from which the three equations in (33) are established. From (32) we deduce (34), and then (35). As in the linear algebra of [10], a commonly used selection method is to impose a constraint (36) on the unknown weights $\{w_k\}_{k=0}^{N}$ in order to guarantee uniqueness. To eliminate the absolute values in (36), assume that the first $N_1 + 1$ weights $\{w_k\}_{k=0}^{N_1}$ are positive and the remaining $N - N_1$ weights are negative; this gives (37). In the singular, degenerate linear system (38), a set of weights $\{w_k\}_{k=0}^{N}$ is obtained by selecting $N - 2$ free variables.

Solving the Unknown Weights Iteratively
To solve the unknown weights iteratively from the practical point of view, suppose $w_0 = v_0 = 0$ in (16); there are then three kinds of decision variables: the slack variables, the weights $\{w_k\}_{k=1}^{N}$, and the weights $\{v_k\}_{k=1}^{N}$.
Formulating the $4N$ inequality constraints of (16) in matrix product form, where $\vec{0}$ is a $4N \times 1$ zero vector, and denoting the left-hand side matrix by $A$ ($A$ is $4N \times 4N$), the inequality constraints simplify to (40). Similarly, the two equality constraints can be written in matrix product form, where $0$ is a $2 \times 1$ zero vector, and denoting the left-hand side matrix by $B$ ($B$ is $2 \times 4N$), the equality constraints simplify to (42). The second term of the objective function can obviously be rewritten as in (44), and the computation inside the bracket of the objective function can be rewritten as in (45). Setting the partial derivative with respect to the decision vector $x$ to zero, we get the equality (49). Introducing a slack variable $\lambda \ge 0$ to eliminate the inequality constraint, we rewrite (49) as (50). Suppose the matrix composed from (50) is (51), where (52) holds. The constrained minimum is found by updating the unknown vector $x$ iteratively; this minimum is the stationary point of the Lagrangian function. During the minimization, a new iterate is obtained by adding a correction term $\Delta x$ to the current estimate. When the constrained Gauss-Newton method is applied, $\Delta x$ must satisfy the linear system (53). At step $k + 1$, the new iterate is defined as in (54), where the step length along the search direction must satisfy the inequality (55). The search direction is determined by (53). A Levenberg-Marquardt parameter $\lambda_2$ may be added to $J_1^T J_1 + J_2^T J_2$ in order to avoid singularity: it changes the $(1,1)$ block of the coefficient matrix in (53) to $J_1^T J_1 + J_2^T J_2 + \lambda_2 I$, which guarantees that the inverse matrix exists and is definite and bounded.
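A sketch of one damped step, assuming the KKT system has the block structure suggested above; the matrices $J_1$, $J_2$, $B$ and the right-hand sides are hypothetical stand-ins, not the paper's actual matrices:

```python
import numpy as np

# Sketch of one damped constrained Gauss-Newton step as in (53)-(55): the
# (1,1) block of the KKT matrix is regularized by a Levenberg-Marquardt
# term so the linear system stays nonsingular. All problem data below are
# hypothetical stand-ins.
def damped_kkt_step(J1, J2, B, grad, ceq, lam):
    n = J1.shape[1]
    H = J1.T @ J1 + J2.T @ J2 + lam * np.eye(n)       # damped (1,1) block
    m = B.shape[0]
    KKT = np.block([[H, B.T], [B, np.zeros((m, m))]])  # KKT coefficient matrix
    sol = np.linalg.solve(KKT, -np.concatenate([grad, ceq]))
    return sol[:n]                                     # the correction term

rng = np.random.default_rng(1)
J1 = rng.standard_normal((6, 4))
J2 = rng.standard_normal((6, 4))
B = rng.standard_normal((2, 4))
delta = damped_kkt_step(J1, J2, B, grad=np.ones(4), ceq=np.zeros(2), lam=0.1)
print(delta.shape)  # the step satisfies the linearized equality constraints
```

In the full iteration, the new iterate $x_{k+1} = x_k + \alpha\,\Delta x$ is accepted only when the step length $\alpha$ satisfies the descent condition (55).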

Algorithm Analysis
Now we analyze the convergence of the two algorithms, (20) and (54). From Sections 4 and 5, the solution of (20) is derived from the necessary and sufficient KKT optimality conditions, while the solution of (54) is an iterative one.
According to the necessary and sufficient KKT optimality conditions, which are similar to those of [11], the convergence of the algorithm used to identify the unknown weights is given as follows.

Theorem 1. Assume that $x^*$ is a solution of the quadratic programming problem (47) that satisfies the necessary and sufficient KKT optimality conditions (20). If the matrix $J_1^T J_1 + J_2^T J_2$ is positive semidefinite for some Lagrangian multipliers $\lambda$ and $\mu$, then $x^*$ is a global solution of the quadratic programming problem (47).
Proof. Let $x$ be any other feasible point for (47), so that $x$ satisfies the inequality constraints and the equality constraints. Hence, using the necessary and sufficient KKT optimality conditions, we obtain (56). By elementary manipulation we find (57), where the first inequality follows from (56) and the second from the positive semidefiniteness of $J_1^T J_1 + J_2^T J_2$. We have shown that the objective value at any feasible $x$ is at least the objective value at $x^*$, so $x^*$ is a global solution.
Theorem 1 tells us that if a solution satisfying all the equalities in (20) can be found, then it is a global solution of the original quadratic programming problem.
When the interior point algorithm is applied to solve (47) iteratively, its convergence conclusion can be obtained similarly.

Simulation Example
As a nonlinear system can be approximated by a linear affine function using the direct weight optimization method, we apply this idea to approximate the Stribeck nonlinear friction that appears in a flight simulation turntable system. The Stribeck friction model is described as
$$F(t) = \left(F_c + (F_s - F_c)\, e^{-(\dot{\theta}(t)/\dot{\theta}_s)^2}\right)\operatorname{sgn}\!\left(\dot{\theta}(t)\right) + \sigma\,\dot{\theta}(t), \quad (56)$$
where $F_s$ is the maximum static friction force, $F_c$ is the Coulomb friction force, $\sigma$ is a viscous friction coefficient, and $\dot{\theta}_s$ is the critical Stribeck speed. Regarding $\dot{\theta}(t)$ in (56) as $u(t)$ in (1), we apply the new linear affine function to approximate the Stribeck friction model, with $\dot{\theta}(t)$ treated as the input signal. We minimize the performance function (10) to obtain the unknown parameter vector $(w_0, w_1, \ldots, w_N, v_1, \ldots, v_N)$. The interior point algorithm is applied to solve the problem, and the number $N$ is selected by trial and error: $N$ is increased until the performance index no longer changes appreciably, and that value is taken as $N$. Next, we present simulations of the Stribeck nonlinear friction.
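For reference, a sketch of the Stribeck model (56) in its standard exponential form; the parameter values are illustrative, not taken from the paper:

```python
import numpy as np

# Sketch of the Stribeck friction model (56), in the standard form
#   F(v) = (Fc + (Fs - Fc) * exp(-(v / vs)**2)) * sign(v) + sigma * v.
# Fs: maximum static friction, Fc: Coulomb friction, sigma: viscous
# coefficient, vs: critical Stribeck speed. Values are illustrative only.
def stribeck(v, Fs=1.5, Fc=1.0, sigma=0.4, vs=0.1):
    v = np.asarray(v, dtype=float)
    return (Fc + (Fs - Fc) * np.exp(-((v / vs) ** 2))) * np.sign(v) + sigma * v

print(float(stribeck(0.0)))             # -> 0.0 (sign(0) kills the dry term)
print(round(float(stribeck(1e-6)), 3))  # near breakaway: about Fs = 1.5
print(round(float(stribeck(1.0)), 3))   # high speed: about Fc + sigma*v = 1.4
```

The low-speed region, where the force drops from the breakaway level $F_s$ toward the Coulomb level $F_c$, is exactly where the friction is most strongly nonlinear, and it is where the approximation error discussed below is largest.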
In Figure 1, we plot the relation between friction force and speed under a sine position input signal, comparing three curves: the true nonlinear friction, the proposed method, and the classical method. The black curve represents the true nonlinear friction force, the green curve the linear affine approximation produced by our method, and the red curve the approximation designed by [3]. From Figure 1, when the speed is low, the differences are very obvious; as the speed increases, the black and green curves coincide while the red curve drifts away from the black curve. This means that the linear affine friction force derived by our method approaches the true nonlinear friction force, so if the speed is sufficiently high, our linear affine friction force can be used in place of the true nonlinear friction force. The classical method needs more time to approximate the true nonlinear friction force.
In Figure 2, we plot the relation between friction force and speed under a slope position input signal in the flight simulation turntable. From Figure 2, we see that from the very beginning the linear affine function derived by our method tightly approximates the nonlinear friction force with little oscillation, whereas for the classical method the error is large even at the beginning and the curve oscillates much more during the approximation process.
We plot the crawl phenomenon under the slope position input signal in Figure 3. In Figure 3, the output of the nonlinear friction model consists of many irregular curves, while the output of the linear affine function model consists of many piecewise lines. The approximation amounts to using these piecewise lines to approximate the irregular curves over different time periods, and in each period the approximation error is defined as the deviation between the line and the corresponding curve. At the beginning, this deviation is larger; as time goes on, the lines come closer to the curves and the approximation error becomes small.

Conclusion
This paper has derived, from both theoretical and engineering viewpoints, how to choose the unknown weights in the improved direct weight optimization method. Because the input sequence should be designed to sufficiently excite the nonlinear system, further research on optimal input signal design is left for future work.

Figure 1: The relations between the friction force and the speed under sine position input signal.

Figure 2: The relations between the friction force and the speed under slope position input signal.

Figure 3: The crawl phenomenon under slope position input signal.