Mixed L p Estimators Variety for Model Order Reduction in Control Oriented System Identification

A new family of MLE type Lp estimators for model order reduction in dynamical systems identification is presented in this paper. A family of Lp distributions proposed in this work combines p2 (1 < p2 < 2) and p1 (0 < p1 < 1) distributions which are quantified by four parameters.Themain purpose is to show that these parameters add degrees of freedom (DOF) in the estimation criterion and reduce the estimated model complexity. Convergence consistency properties of the estimator are analysed and the model order reduction is established. Experimental results are presented and discussed on a real vibration complex dynamical system and pseudo-linear models are considered.


Introduction
The choice of the norm is a fundamental problem in system identification.For some special cases, the least squares (LS) ( = 2), the least absolute deviation (LAD) ( = 1), and the Chebyshev estimator ( = ∞) methods have been proposed by many authors [1,2].It is well known that the  1 norm estimators are efficient when the noise follows the Laplace distribution, the  2 norm estimators are efficient when the noise follows the Gaussian distribution, and the  ∞ norm estimators are efficient when the noise follows the uniform distribution.Furthermore, it has been shown that the   norm estimator is the MLE if noise is a generalized -Gaussian [3].Thus, we can always find a suitable exponent  for the   norm method.In the reference paper [4], the authors discuss smooth and sensitive   norms for prediction error system identification when disturbances are magnitude-bounded.They show that a necessary condition for   norm to be statistically robust with respect to the family P () of the -contaminated distribution model with support [−, ] for some arbitrary  > 0 is that its second derivative does not vanish on the support and must be strictly positive with respect to P ().This latter condition is fundamental in our framework by ensuring the parameter estimate variance convergence rate (see Proposition 4.2 and Corollary 4.3 [4]).However, many drawbacks come from the use of the   norm.Firstly,  is the unique DOF of the norm.Secondly, the support is a restricted condition to deal with the outliers [5].Thirdly, the choice of  may prevent robustness to the large data.Finally, most of the time,  ∞ norm is used to the model reduction for robust control [6] with the restricted case where the model parameters number is limited.The   norms are widely used in many domains [7,8].Even though the formal framework presents some difficulties, previous works show the interest to use these estimators.In system identification, Chen et al. in [9] propose an ARMA robust system identification using a generalized   norm estimation algorithm.The authors use this norm to identify a system with a non-Gaussian noise, since the classical LSE is related to the situation when the noise distribution is white-Gaussian.Likewise, these authors in [10] propose the parameter estimation of linear systems 2 Mathematical Problems in Engineering with input-output noisy data, supposed to be corrupted by measurement noise unknown distributions.
Here, we deal with the problem of the   estimation in system identification in the presence of atypical data (outliers) in the vibration dynamical system output signal, in order to reduce the model order for the controller design.In such a vibration dynamical system, natural outliers occur in the output signal, increasing considerably the model order in the estimation procedure.
To avoid this problem, a new variety of MLE type   estimators are considered.We define a mixed   distribution with multiple DOF containing 1 <  2 < 2 and 0 <  1 < 1 [11], parametrized by a threshold named scaling factor, denoted by η.To ensure both the parameter estimate variance convergence and the model order reduction and apply conditions in [4] related to the second derivative of the   norm on the support with respect to P (), we show that there exists a tradeoff between  2 ,  1 , and η.This reduction is efficient only with η < 1 and  2 < 2, involving  1 < 1.
Model reduction has been subject to considerable interest in many domains.In [12], a method of the model reduction of the nonlinear complex Ginzburg-Landau equation is presented.In [13] the authors propose dimension reduction and dynamics of a spiking neural network model for decision making under neuromodulation.The model order reduction is a fundamental step, espacially in control design.To estimate a low order model of a system, several possibilities exist.The most obvious one is to directly estimate a lower order model.As known from, for example, Ljung [14], the prediction/output error estimate automatically gives models that are  2 approximations of the true system in frequency-weight norm, determined by the input spectrum and noise model.See [14, chapter 8] for more details.A second possibility is to estimate a high order model which is then subjected to model reduction to the desired order.Indeed, some contributions that take into account that the high order model is obtained through an identification experiment when performing model reduction are Porat and Friedlander [15], Porat [16], Söderström et al. [17], Stoica and Söderström [18], Zhu and Backx [19, chapter 7] and Tjärnström and Ljung [20], and Tjärnström [21].Porat and Friedlander study an ARMA parameter estimation via covariance estimates.Recently, in time domain, [22] presents a relevant paper on a new model order reduction algorithm based on general orthogonal polynomials.The following contributors deal with models having input signals.Söderström et al. [17] focus on model structures that can be embedded in larger structures which are easy to estimate, such as ARX structures.After estimating the high order structure, they reduce the estimate to the low order structure in a weighted nonlinear least-squares sense.The method is called indirect prediction error method.In [20], the authors focus on  2 model reduction.They study FIR and OE models.The former is estimated directly from data and the latter is computed by reducing a high order model, by  2 model reduction.For OE models, they show that the reduced model has the same variance as the directly estimated one, if the reduced model class used contains the true system.Recently, in [23], the authors illustrate procedures to identify a state-space representation of a lossless or dissipative system from a given noise-free trajectory.The idea is to perform model reduction by obtaining a balanced realization directly from data and truncating it to obtain a reduced order model.However, these different model order reduction methods are performed after estimation phases with high model orders.On the other hand, this reduction is made in favorable conditions, where no outliers are present in the dataset and the prediction errors.In practice, we all know that real conditions lead to model complexity strongly increased.The lack of robustness of these methods damages the estimation and the model order.Different works have been performed, using robust estimators.A first alternative is the use of the  1 norm [24][25][26], as a possibility to reduce the model complexity to identify a complex dynamical system, for example, an acoustic duct for active noise control (ANC).The authors propose relevant OE (output error)  1 models with complexity   1 M equal to 21.The second one is to use a symmetrix convex Huber's function [27].Recently, in [28,29], the identification of the same plant, using  2 − 1 estimators with low values of the scaling factor in Huber's function, proposes OE- 2 −  1 models with complexity   M equal to 17. Here, we will tackle the   2 −   1 distribution approach and, from this, we will propose a   parameterized robust estimation criterion (PREC) using a symmetric convex function, offering to the user more flexibility in terms of an extra DOF, in order to improve the estimation balance robustness/ performances.We will show that the number of DOF in this function facilitates the model order reduction.We will see that the reduced order models are provided by the   2 −   1 estimation itself, where classical approaches separate the phases, estimation and then reduction.Indeed, we will demonstrate that the   model order thus obtained remains lower than the corresponding  2 − 1 estimate.Finally, experimental results on the acoustic duct will confirm these results.
We start in Section 2 by introducing the mathematical background related to definition and properties of the   estimator.We develop the consistency and convergence properties of this estimator.The application to the system identification is presented in Section 3. Section 4 proposes the main results concerning the model order reduction.In Section 5, the improvement of the model order reduction with respect to the number of DOF is presented.Experimental results based on a vibration dynamical system are presented and discussed in Section 6. Conclusions and perspectives are drawn in Section 7.

Definition and Properties of the 𝐿 𝑝 Estimators
The new family of   -distributions proposed in this paper combines   2 and   1 distributions, quantified by four parameters,  2 (1 <  2 < 2),  1 (0 <  1 < 1),  ( > 0), and η (0 < η < 1).Hence, it can be viewed as a generalization of the redescending mixed distributions, which corresponds to the M-estimates  2 = 2 and  1 = 1 (see [27, where  = 1/2( 1 +  2 ) with where Γ() and Γ(, ) are, respectively, the complete and incomplete Euler's gamma functions.We can easily verify that, for all  ∈ R,   () ≥ 0 and ∫ R   () = 1, which ensure that   is a probability density.The parameter η is defined as a threshold named scaling factor.Its value depends on the degree of the contamination of the random variables .Moreover, a low value will ensure a good robustness and a high value a good efficiency of the estimation.
In robust statistics, continuity conditions are necessary to ensure good estimations [27, chapter 2, page [24][25].Therefore, these conditions must be applied for both   η and its first derivative.This latter must be a bounded continuous function.From these considerations, we define the dimensional continuity conditions (DCC) as follows: where The dimensional  1 is given from DCC and, after straightforward calculations, we obtain Equation ( 6) allows to show that, for the conditions η < 1 and  2 < 2, we get  1 < 1.We now define the gross error model (GEM) in the mixed   framework as a generalization of the GEM given in [27,  η .Accordingly, the   1 treatment of the estimated residuals is focused on the outlier occurrence, so the following errors rapidly decrease and they are advantageously treated by   2 η .Now, our goal is to provide the relation between the level of contamination  and the scaling factor η. In practice, the robustness is defined by a suitable value of η that fixes the contamination model. with where Γ(, ) = ∫ ∞   −1  − ,  > 0, is incomplete Euler's gamma function.
See Appendix A for the proof.Notice that in the limit case, where ( 2 ,  1 ) = (2, 1) (M-estimates), we obtain, for  =  ( = ), A = 1 − 2Φ(−) and B = 2()/.See [27, chapter 4, page 84]. Figure 1 shows an example of the level of contamination  as a function of η,  1 , parameterized by  2 , with  =  = 1.A large value of the scaling factor means that the probability distribution of the prediction errors is weakly disturbed, corresponding to  < 0.1 (see Figure 1).Conversely, a small value corresponds to a pdf more disturbed.For  2 ≤ 1.7, the curves have a minimum, meaning that a particular value of the scaling factor renders the corresponding pdf weakly contaminated.
In order to prove the consistency and convergence of the robust estimator θ  , we need the following lemma related to the moment of order  of the random variable .Lemma 6.For the considered pdf   , the moments of order  ( ∈ N) of a random variable  are with See Appendix B for the proof.
Theorem 7. Let Θ be a compact subset of R  and let θ  be the estimator obtained as a solution of the problem Let  * be the global minimum obtained as a solution of the problem and then θ  is a consistent estimate of  * ; that is, θ  →  * , ..1.
In order to do so, we use the (MLE) consistency result by Newey and McFadden [30] and show that all the assumptions of their theorem hold.Theorem 8. Suppose that   ( = 1, 2, . ..) are iid rv's with pdf (  |  * ) and Then θ See Appendix C for the proof.
Remark 9. Before presenting the theorem related to the asymptotic distribution of the   estimator, let us notice that the standard asymptotic normality results for MLE requires that the PREC be twice continuously differentiable, which is not the case here.There exist, however, asymptotic normality results for nonsmooth functions and we will hereafter use the one proposed by Newey and McFadden [30] and Andrews [31].The basic insight of their approaches is that the smoothness condition of the PREC,    (), can be replaced by a smoothness of its limit, which in the standard maximum likelohood case corresponds to the expectation − ln   (  ()) =   (), with the requirement that certain remainder terms are small.Hence, the standard differentiability assumption is replaced by a stochastic differentiability condition, which can then be used to show that the MLE θ  is consistent and asymptotically normal.Let us consider the Ψfunction as the derivative with respect to  of   η denoted by Ψ  (; ).If this function is differentiable in , one can establish the asymptotic normality of θ  by expanding √ ( θ  −  * ) about  * using element by element mean value expansions.This is the standard way of establishing asymptotic normality of the estimator.In a variety of applications, however, Ψ  (; ) is not differentiable in , or not even continuous, due to the appearence of a sign function.In such a case, one can still establish asymptotic normality of the estimator provided Ψ  (; ) is differentiable in .Since the expectation operator is a smoothing operator, Ψ  (; ) is often differentiable in  even though Ψ  (; ) is not.
Let us consider the following theorem establishing the asymptotic distribution of the estimator.
where P( * ) is the asymptotic covariance matrix.
In order to do so, we use Theorem 7.1 given by Newey and McFadden [30] and show that all the assumptions of their theorem hold.
and then √ ( θ The proof is given in Appendix D.

Application to the System Identification
Having defined the mixed   framework, our purpose is now to apply this approach to the system identification by introducing the   -PREC and its first and second derivatives, to express the asymptotic   covariance matrix of the   estimator.Before this, let us recall the prediction error context.

Prediction Error Approach.
Throughout the paper, we denote the input signal by   , the output signal by   , and the disturbance signal by   as a random variable sequence with zero mean and variances , and  is the total number of measured data.We assume that   is generated according to where  0 () is a linear time-invariant system, usually referred to as the true system and  0 () is the noise filter.In the case of OE models, the model set is parameterized by a dimensional real-valued parameter vector ; that is, with the regressor, and   () = (/) ŷ () the gradient of ŷ () [14].

Limit Hessian and 𝑄
-Matrix.The main purpose of this section is to express the asymptotic covariance matrix of the estimator θ  for prediction error estimates in case, essentially, a true description of the system is available in the model set.The condition then becomes S ∈ M, where M is a model structure, a mapping from a parameter space to a set of models.Let us consider the convergence domain defined by We put ourselves in the case where the global minimum is equal to the true parameter vector  0 ; in this case,   = { * } = { 0 }.To establish this asymptotic covariance matrix, we need the limit Hessian and the    -matrix.Using Theorem 10, the covariance matrix is given by with the limit Hessian and the    -matrix yielded, respectively, by where See Appendix E for the proof.
Remark 12.It is easy to show that the second derivative of Since  1 < 1, the second term on the right-hand side of ( 23) is negative.Accordingly, there exists a limit value of where    () is the limit criterion.Let  0 be the argument of the minimum of this limit criterion and let us expand    () around  0 , and we obtain Similarly, since From the Hessian uniform convergence theorem and the expectations of ( 25) and ( 26), we obtain for the   -RFPE If a single run is taken into account, then we can classically replace the expectation of    ( θ  ) with only one observation available.Therefore, Let us denote where  is the identity matrix.Now, in order to completely define the expression of the   -RFPE, we need the following result due to the Schur complement.
Lemma 13.Let  be a symmetric matrix of the form  = (      ), where , , and  are real and positive definite matrices.If  is invertible, then the following property holds: In the mixed   context, the estimation procedure is mainly a   2 estimation since the impulsional effect of the outlier is rapidly treated by the ,  = 1, 2, are positive definite matrices.Using Lemma 13 with  =   ,  = (Φ 0 ]  1 , ) −1 , and  = Φ 0 ]  2 , , we have where  is the model complexity and 0 <  < 1.Therefore, with   M being the model complexity.The   -RFPE becomes

Model Order Reduction
Let two model classes M and M be given, respectively, related to Huber's and   estimators.To avoid some drawbacks with the notation of the parameter vectors, model classes M and M are parameterized by  and θ, respectively.Recall that the PREC in Huber's approach is defined as follows: See [29, chapter 4, page 50].Let us consider the following conditions C1: (1)  < 1 and  < 0.4; (2) η < 1 and  < 0.4; ( From the curves of V 0 and using the conditions C1, then The proof is given in Appendix G.This tuning offers to the user the necessary DOF to estimate and validate the reduced order models, as described in the following section.

Effect of DOF in the Symmetric Function to the Model Order Reduction
Here, we focus on a new vision on the link between the model reduction objective and its link to the estimation chosen, in particular the number of DOF of this symmetric function.
Let   be the number of DOF.We present in Table 1 different symmetric function related to the MLE context and the corresponding   with their associated tuning parameters.
First, the reason for which the LSE and LSAD estimators do not allow the model order reduction seems to be due to the number of DOF; here,   = 0. From the comparison between Akaike's FPE and the  1 -FPE [24], we obtain M is ensured only by the value of the variance λ of the  2 estimation.Sometimes, this one is obtained with difficulty, especially in the presence of some outliers in the prediction errors.Since   = 0, the model complexity reduction remains difficult.From our point of view, the introduction of DOF in the estimation symmetric function facilitates the model complexity reduction.For example, the DOF in Huber's symmetric function may ensure both the robustness and the model order reduction.Therefore, comparing (36) and the  1 -FPE, we have M .Consequently, the model order reduction may be obtained by tuning  in Huber's function to have the condition 1/2 λ() < 1.By extention, with three DOF, the condition R  < 1 in (G.2) may be obtained by tuning  (through ), η, and  2 .Thus, the flexibility of the   symmetric function ensures the model order reduction.Table 2 summarizes these main results related to the effect of the norm's DOF on the model order reduction.

Experimental Results
6.1.Presentation.This section presents different results related to the identification of a real system, namely, an acoustic duct plant, in order to estimate and validate reduced order models, using   estimators.We compare the   estimation results with those given by Huber's [28,29], LSAD, LSE [24], and  ∞ estimators ( θ∞  = min  max  |  ()|).As shown in Figure 3, we have to find discrete-time dynamical models with reduced orders of the plant including the secondary loudspeaker, its amplifier, the acoustic secondary path, the measurement microphone, and its amplifier/antialiasing filter.For more details, the reader could refer to [24].The identification has been realized from a PRBS (pseudrandom binary sequence) as excitation input signal, with a length L = 2 10 − 1 and level ±3, approximating a white noise upon the frequency range [0; 1000 Hz].The sampling period is   = 500 s and the number of data is  = 1024.The pure delay  of the duct is given as an integer of the sampling period.Here,  = 7. Figure 2 shows the output signal given by the microphone during the recording phase of the data set   of the process.An estimations campaign has been conducted to get the value of   1 ensuring Proposition 4.2 and Corollary 4.3 in [4]. 1 was varied over the range [0.5; 1[ with an incremental step of 0.01.The convergence is ensured for  1 ≥ 0.88.We deduce that   1 = 0.88.In the sequel, for the   estimation context, 1.8 ≤  2 ≤ 2, η < 1, and 0.88 ≤  1 < 1, and we will denote all the models by (, ) − (  ,   ). 3 presents the main results from  ∞ , LSE, LSAD, , and   estimators.The model order reduction is clearly shown and we can notice that the complexity reduction between LSE and the low dimension of the new mixed   estimator is 40%.In Table 4, the parameters of the (2, 1)-(9, 8) and (1.8, 0.939)-(7, 7) models are given.Figures 4 and 5 show the frequency responses of the (2, 1)-(9, 8) and (1.8, 0.939)-(7, 7) models, respectively, versus the spectral estimation of the process.Even though the   model presents a lower complexity than Huber's model, its frequency response especially in low frequencies particularly interesting in controller design is good.Remark that, for  ∞ estimator, the fit is low and the model order increases considerably with  ∞ M = 42.

Conclusion
In this paper, a new robust estimator based on a mixed   distribution has been proposed in order to reduce the estimated model complexity.The main consistency and convergence properties of the robust estimator have been established and it has been shown that the tuning of DOF in the symmetric function improves the model order reduction.In effect, the additional DOF in the symmetric function of the estimation criterion present the advantage to tune its parameters.This flexibility offers to the user the choice to estimate and reduce the estimated model complexity in the same procedure.We presented experimental results from a vibration real process and we showed the effective reductions.Many aspects of these studies are opened to further researches.First, from a formal point of view, we should study the generalization of multiple DOF in a mixed symmetric function by investigating the link between the number of DOF and the model reduction.Then, it would be relevant to apply our   framework to a process with a level of contamination of the data more significant.
Case 2 (for B).From straightforward calculations, expression B only necessitates the result of the integral  = ∫ which proves B.

C. Proof of Theorem 7
To prove Theorem 7, we need to describe system S with output   by where    () is the regressor of the model and   is the disturbance signal with zero mean and variance .Let   () be the estimation error equal to   −    ().At  * , we have We obtain (iii) The continuity condition is trivially verified since ln   (  ()) is continuous at each  ∈  M w.p.1.

D. Proof of Theorem 11
(i) From  ln   (  ()), we can deduce that which is equivalent to arg min (ii) The interior condition of Theorem 11 is equivalent to the assumption  ∈  ∘ M , where  ∘ M is the interior of  M .(iii) From ( 11), the gradient and the Hessian are, respectively, ()    ()       () (D.5) In the sequel, we adopt Ljung's notations: =1   (see [14]), where  is the expectation with respect to the considered probability distribution.Moreover, write for short in the general case ℎ  ( * ) = ℎ *  .In order to render differentiable the limit Hessian, we use the stochastic differentiable condition; therefore, at the global parameter vector  * , since  * ]   , AS   → 0 ( = 1, 2), the limit Hessian is (iv) Using the mean value theorem, we get The asymptotic distribution of √ ( θ  −  * ) only depends on the asymptotic distribution of √ ((   ()/)) θ  .Therefore, (D.10) Since (1/) ∑  =1 Ψ  ,η (; θ  ) = 0, the third term on the righthand side of (D.10) is   (1).Its first term is   (1) provided {  (•),  ≥ 1} is stochastically equicontinuous and θ  prob   →  * (see Theorem 7).This follows because, given any  > 0 and  > 0, there exists a  > 0 such that lim where the second inequality uses θ  prob   →  * and the third uses the stochastic equicontinuity.Accordingly, this shows that, for  tends to infinity, we have in law with The main purpose is to prove that   ( * ) in law is a normal asymptotic distribution.For this, we must show that the terms of   ( * ) are independent.However, if we show that the dependence between distant terms decreases, that is, if we show that they are -dependent (i.e., independent beyond lags of length ), then we show the asymptotic normal behavior of   ( * ).For our proof, we will use the following technical assumptions.
(2) {  } is a sequence of independent rv's with zero mean values and bounded moments of order 4 + , for  > 0.

Figure 2 :
Figure 2: Output signal given by the microphone during the recording phase of the data set   .It clearly shows some outliers with distinguished levels.
{} is the unit function with the condition .Let us introduce two index subsets ]  2 and ]  1 , respectively, related to the   2 and   1 contributions, defined by ]  2 () = { : |  ()| ≤ η} and ]  1 () = { : |  ()| > η}.From this, the   -PREC to be minimized is given by Robust Final Prediction Error.Similar to the RFPE in Huber's framework [29], we now establish a quality measure for a given model.Let m  = M( θ  ) be the robust model to be evaluated.It is estimated within the model structure M. Let us denote J θ ( M; ), this quality measure that depends on the actual data record   .We classically have

Table 1 :
DOF in the symmetric function.

Table 2 :
Effect of the DOF in the model order reduction.

Table 3 :
Main results from  ∞ , LSE, LSAD, , and   estimators for model order reduction.