An Efficient Constrained Learning Algorithm for Stable 2D IIR Filter Factorization

A constrained neural network optimization algorithm is presented for factorizing simultaneously the numerator and denominator polynomials of the transfer functions of 2-D IIR filters. The method minimizes a cost function based on the frequency response of the filters, along with simultaneous satisfaction of appropriate constraints, so that factorization is facilitated and the stability of the resulting filter is respected.


Introduction
Factorization of 2D polynomials is an important problem in the design of IIR filters in two-dimensional signal processing because a factorized filter transfer function can be efficiently implemented in cascade form. However, the fundamental theorem of Algebra, concerning polynomial factorization, cannot be extended to the two-dimensional case; that is, a bivariate polynomial ( 1 , 2 ) cannot, in general, be factored as a product of lower-order polynomials. The 2D IIR cascade structure may be considered as an attractive circuit design alternative, because of its relative insensitivity to coefficient quantization, the requirement of smaller number of arithmetic operations for a given filter size, and finally, because issues related to the stability of the filters are easier to deal with for filters with a smaller number of denominator coefficients [1]. Hence, an efficient method for the factorization of 2D IIR filters would be most beneficial.
Many methods have been proposed for dealing with the two-dimensional polynomial factorization problem [2][3][4][5][6]. Most of those methods, however, adopt a conventional numerical method of roots finding (e.g., Laguerre's, Newton-Raphson's, Jenkins-Traub's methods, etc.), in which successive approximations to the roots are obtained. In addition, almost all numerical methods can only find the roots one by one, that is, by deflation method, where the next root is obtained by the deflated polynomial after the former root is found. This means that numerical root-finding methods are inherently sequential.
In an earlier work we have proposed a neural networkbased Constrained Learning Algorithm (CLA) for factorizing 2D polynomials using constrained optimization techniques [7]. Using that approach we were able to obtain exact solutions for factorable polynomials and excellent approximate solutions for nonfactorable polynomials. The technique offers the advantage of incorporating a priori information about the relations between the coefficients of the original polynomial and the coefficients of the desired factor polynomials. By incorporating additional stability constraints into the formalism, the stability (in the sense of Huang [8]) of a transfer function with the resulting factorized polynomial in the denominator (of up to second order) is also made possible. Our CLA method has been explored further in [9] by utilizing root moments, in [10] with an adaptive neural network architecture, and in [11] for handling complex root cases.
Despite its success in 2D polynomial factorization, CLA suffers from disadvantages which can limit its effectiveness for its application to IIR filter factorization. CLA's cost function is not directly related to a proximity measure between the frequency response characteristics of the filters, but is based on the Mean Square Error (MSE) criterion between the output of the original polynomial and the output of the resulting factorable polynomial for various real values of | 1 | < 1 and | 2 | < 1. Thus, CLA requires separate factorizations for the numerator and denominator polynomials of the IIR filter, while better solutions, in terms of the proximity of filter characteristics, could be achieved by simultaneous variation of all available parameters (numerator and denominator coefficients). In this paper we apply constrained optimization techniques to the more general problem of implementing a stable cascade IIR filter, given the specifications of an original 2D filter. This is achieved by suitably modifying our previous formalism as it is explained in the following section.

Derivation of the Algorithm
Consider the polynomial with even and 00 = 1. For the above polynomial, we seek to achieve an exact or approximate factorization of the form where with = /2. To this end, we can utilize a feedforward neural network architecture in order to learn the appropriate weights ( ) as we proposed in [7]. For example, Figure 1 shows the architecture of such a network for the case where = 2. The units in the hidden layer have a logarithmic activation function ln, and the output unit is linear. This architecture has been preferred to the ∑-∏ architecture employed in [12], because it produces smoother cost function landscapes avoiding deep valleys and thus facilitates learning. As we showed in [7], this neural network can be trained using a Constrained Learning Algorithm (CLA) that achieves minimization of the MSE criterion along with simultaneous satisfaction of multiple equality and inequality constraints between the polynomial coefficients. Using that method, we are able to obtain very good approximate solutions for nonfactorable polynomials. Now consider the 2D IIR filter transfer function where , even and 00 = 00 = 1. For the above polynomials, we seek to achieve an exact or approximate factorization of the form where where with = /2, and = /2 so that the transfer function of (4) can be approximated bŷ Unfortunately, if it were to apply CLA in this problem we would have to derive two separate factorizations for the numerator and denominator polynomials. Such an optimization scheme, however, is not directly related to a proximity measure between the original and the factorized filter frequency response characteristics. In terms of the proximity of filter characteristics, obviously a better approximation can be achieved by a simultaneous variation of both the numerator and denominator coefficients.
In this case, we should ask: in what sense are ( 1 , 2 ) and̂( 1 , 2 ) similar? The answer to this question is that we would at least expect the amplitude spectra of ( 1 , 2 ) and̂( 1 , 2 ) to be roughly equal so that the specifications of the original and of the factorized filter match as close as possible. Hence, the main purpose of the factorization algorithm is to minimize an MSE cost function based on the magnitude of the frequency responses of the original and the resulting filter: The double summation is carried over a regular grid of points. Logarithmic functions are used also in this case in order to suppress deep valleys and thus facilitate convergence. However, is still a highly nonlinear function of the polynomial coefficients, and therefore gradient-based techniques can run into problems in the presence of valleys or ridges in the cost function landscape. For such cases it has proved beneficial to incorporate into the adaptation rule additional a priori problem-specific information. This often facilitates the convergence of the algorithm and leads to better solutions. For the problem at hand, additional information is available in the form of constraints between the coefficients of the given numerator and denominator polynomials and the coefficients of the corresponding desired factor polynomials. More explicitly, if we assume that ( 1 , 2 ) and ( 1 , 2 ) are factorable, then these constraints can be expressed as follows: Thus, for the general case in which ( 1 , 2 ) and ( 1 , 2 ) are nonfactorable, the objective of the adaptation process is to reach a minimum of the cost function of (11) with respect to the variables ( ) and ( ) , which satisfies as best as possible The strategy which will be followed in order to solve this problem is similar to the method we proposed in [7], but with appropriate modifications in order to take care of the above constraints, as well as the new cost function that we introduced in (11).
At each iteration of our algorithm, the vector of unknown coefficients w = (u, v) is changed by w, so that where is a small constant. Thus, at each iteration, the search for an optimum new point in the w space is restricted to a small hypersphere centered at the point defined by the current vector w. If is small enough, the changes to and to Φ induced by changes in the coefficients can be approximated by the first differentials and Φ. Also, at each iteration, we seek to achieve the maximum possible change in | |, so that (14) is respected and the change Φ in Φ is equal to a predetermined vector quantity Q, designed to bring Φ closer to its target (zero).
Maximization of can be carried out analytically by introducing suitable Lagrange multipliers. Thus, a vector of Lagrange multipliers is introduced for the constraints in (12) and a multiplier is introduced for (14).
Consider a function , whose differential is defined as follows: On evaluating the differentials involved in the right-hand side, we readily obtain where (i) J is a vector with elements = / ; (ii) F is a matrix given by = [ F 0 0 F ]; (iii) F and F are matrices with elements = Φ / and = Φ / , respectively.
To minimize (maximize its magnitude) at each iteration, we demand that From (17) it immediately follows that 4

Advances in Artificial Neural Systems
Equation (18) is the update rule for the coefficients of the factor polynomials, but and must still be evaluated in terms of known quantities.
Noting that Φ = Q = F w, we can multiply both sides of (18) by F and solve for to obtain To evaluate , we substitute (18) and (19) into (14) and obtain Note that due to the block-diagonal form of matrix F, it is possible to avoid the tedious evaluation of I −1 by rewriting the quantities , I , and I as follows: since the matrices (I ) −1 and ( I ) −1 can be computed more easily than the composite matrix I −1 . We next discuss the choice of Q. It is natural to choose the elements of Q proportional to Φ with a negative constant of proportionality − , > 0, so that the constraints move towards zero at an exponential rate. Then, the bound ≤ (Φ I −1 Φ) −1/2 set on the value of by (20) forces us to choose adaptively. The simplest choice for will be used; namely, Thus, the final update rule has only two free parameters, namely, and .

Stability Constraints
For the filter ( 1 , 2 ) defined in (4), it is reasonable to assume that for most practical purposes the numerator polynomial ( 1 , 2 ) does not affect the system stability [1].
In the important special case of a second-order denominator polynomial ( 1 , 2 )( = 2), the stability conditions according to Huang [8] for the factor polynomials can be written as < 1, = 1, . . . , 6, as shown in Table 1. These conditions can be incorporated in the formalism as follows: at a certain iteration of the algorithm, let a number of the stability constraints be violated, so that ≥ 1, = 1, . . . , . We can augment the vector Φ of the previous section by adding extra components Φ [ +( +1) ]+ = , = 1, . . . , . By following the formalism of Section 2 using the augmented vector Φ , the quantities are driven towards the stability region. Note that the dimensionality of the vector Φ is adaptive, as only the violated constraints are included in each iteration. Iterative application of this scheme can lead to a final stable solution.

Experimental Results
We seek an approximate factorization of the second-order 2D IIR filter whose numerator and denominator polynomial coefficients are shown in Tables 2 and 3, respectively. Note that both polynomials are nonfactorable. Figure 2(a) shows the amplitude spectrum of the original low-pass IIR filter, and Figure 3 shows the error curve of the unconstrained gradient descent method applied to (11) using a regular grid of 32 2 points with an adaptation gain of 10 −6 . From the latest figure we can see that after an initial error reduction, an undesirable flat minimum is reached followed by unbounded oscillations caused by a transition into a region of the parameter space in which the denominator polynomial is unstable. The amplitude spectrum of the filter corresponding to the factorization result achieved at the flat minimum is shown in Figure 2(b) and is clearly unacceptable.
Using the constraints Φ 1 , Φ 3 , Φ 4 , Φ 1 , Φ 3 , and Φ 4 and all stability constraints of Section 3, we have applied both CLA, as described in [7], and the method proposed in the present paper using a regular grid of 32 2 training points. Figure 2(c) shows the amplitude spectrum of the filter obtained from the original CLA approach (with = 0.01 and = 0.99) applied sequentially to the numerator and denominator polynomials. Obviously CLA outperforms gradient descent in all aspects. However, there are still small (but not negligible) differences between the amplitude spectra shown in Figures 2(a) and 2(c). Figure 2(d) shows the amplitude spectrum of the filter obtained by the proposed method, with = 0.07 and = 0.99. Clearly, there is very good agreement between the amplitude spectra of Figures 2(a) and 2(d). In order to visualize this better, in Figure 4, we present the differences in the amplitude spectra between the original filter and each of the filters obtained by gradient descent (Figure 4(a)), CLA (Figure 4(b)), and the proposed method ( Figure 4(c)). From this figure it is clear that, of all the methods, the one proposed in the present paper achieves the maximum proximity to the original amplitude spectrum, leading to the design of a cascade filter that meets as closely as possible the specifications of the original nonfactorable filter.
The factorization results are summarized in Tables 2 and 3. Note that the separate factorization of the CLA approach produces coefficients that are closer to those of both the numerator and denominator polynomials of the original filter, while the proposed method achieves maximum similarity between the frequency responses in expense of a smaller coefficient similarity, distributed along both the numerator and denominator coefficients.

Conclusions
In this paper, an efficient algorithm was proposed for factorizing simultaneously the numerator and denominator polynomials of the transfer functions of 2D IIR filters. The method is useful for designing IIR filters in cascade form for more efficient hardware implementation. By introducing a cost function based on the difference of frequency responses    between the original and cascade filters and using a flexible framework for incorporating additional information in the form of constraints, we obtained optimal stable solutions for nonfactorable filters.