Backward Adaptive Biorthogonalization

A backward biorthogonalization approach is proposed, which modifies biorthogonal functions so as to generate orthogonal projections onto a reduced subspace. The technique is relevant to problems that can be represented by a general linear model. In particular, problems of data compression, noise reduction and sparse representation may be tackled by the proposed approach.


I. INTRODUCTION
We introduce a backward biorthogonalization technique relevant to the general linear model. Any data which may be described in terms of a linear combination of waveforms satisfies this model. For instance, the response of a physical system to a particular interaction, varying as a function of a parameter, say t, is often represented by a measurable quantity f(t), which can be expressed in the fashion

f(t) = Σ_{n=1}^N c_n^N α_n(t),   (1)

where the model waveforms α_n; n = 1,...,N are derivable by recourse to physical considerations. The determination of the coefficients c_n^N; n = 1,...,N entails solving the inverse problem when the function f(t) is measured. The superscript N indicates that, unless the waveforms α_n are orthogonal, the appropriate coefficients c_n^N depend on the order of the model, i.e., on the number N of waveforms α_n being considered in (1). If such waveforms are linearly independent, then there exists a set of reciprocal functions α̃_n^N; n = 1,...,N which is biorthogonal to the former, i.e., ⟨α̃_n^N|α_m⟩ = δ_{n,m} [1,2]. Here the superscript N indicates that the biorthogonal functions α̃_n^N allow for constructing orthogonal projections onto the subspace V_N spanned by the N waveforms α_n. Hence, the coefficients c_n^N of the linear expansion (1) approximating a function f at best (in a minimum distance sense) can be obtained by computing the inner products c_n^N = ⟨α̃_n^N|f⟩ [2,3].
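In a discrete setting the above can be sketched numerically (an illustration of ours, not taken from the paper; the sizes and random waveforms are arbitrary choices). For sampled, linearly independent waveforms stored as the columns of a matrix, the rows of the Moore-Penrose pseudoinverse form the reciprocal family, and inner products with the duals give the minimum-distance coefficients:

```python
import numpy as np

def reciprocal_basis(A):
    """Rows of the Moore-Penrose pseudoinverse of A form the reciprocal
    (dual) family: <dual_n|alpha_m> = delta_{n,m} when the columns
    alpha_m of A are linearly independent."""
    return np.linalg.pinv(A)

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 4))   # 4 independent waveforms sampled on 50 points
D = reciprocal_basis(A)            # D[n] is the n-th reciprocal vector

# Biorthogonality: D @ A is the 4x4 identity.
assert np.allclose(D @ A, np.eye(4))

# Best (minimum-distance) coefficients of an arbitrary f: c_n = <dual_n|f>.
f = rng.standard_normal(50)
c = D @ f
assert np.allclose(c, np.linalg.lstsq(A, f, rcond=None)[0])
```

The same coefficients are recovered by an explicit least-squares fit, confirming that the dual-basis inner products realize the orthogonal projection onto the span of the waveforms.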
Since the reciprocal set α̃_n^N; n = 1,...,N (and therefore the coefficients c_n^N) depends on the total number N of waveforms, it should be recalculated when this number is enlarged or reduced. This feature of non-orthogonal expansions is discussed in [2,3,4], where a recursive methodology is introduced for transforming the reciprocal set α̃_n^N; n = 1,...,N into a new one α̃_n^{N+1}; n = 1,...,N+1. The latter is guaranteed to yield orthogonal projections onto the enlarged subspace V_{N+1}. Here we wish to consider the converse situation: let us suppose that the reciprocal waveforms α̃_n^N; n = 1,...,N are known and we want to modify them so as to obtain orthogonal projections onto a subspace V_{N/α_j} arising by eliminating an element, say the j-th one, from V_N.
Let us suppose that in the summation of (1) we want to retain only some terms and approximate f(t) by a linear combination of those elements. To obtain the best approximation of f(t) by a linear combination of such a nature, we need to recalculate the coefficients corresponding to the waveforms we wish to retain. If we simply disregarded the coefficients of the unwanted terms, without recalculating the remaining ones, the approximation would not be optimal in a minimum distance sense. The approach we propose in this Communication allows for the necessary modification of the coefficients so as to achieve the optimal approximation. The method is based on an iterative technique capable of adapting biorthogonal functions in order to generate orthogonal projections onto a reduced subspace.
The paper is organised as follows: Section II introduces the notation, discusses the motivation for the proposed approach and summarises a previously introduced forward biorthogonalization method [2,3]. Section III presents the proposed backward biorthogonalization technique, which transforms biorthogonal functions so as to build orthogonal projections onto a reduced subspace. The conclusions are drawn in Section IV.

II. NOTATION, BACKGROUND AND MOTIVATION TO THE APPROACH
Adopting Dirac's vector notation [7] we represent an element f of a Hilbert space H as a vector |f⟩ and its dual as ⟨f|. Given a set of δ-normalized continuous orthogonal vectors |t⟩; t ∈ ℝ, i.e.,

⟨t|t′⟩ = δ(t − t′),   (2)

the unity operator in H is expressed as

Î_H = ∫ |t⟩⟨t| dt.   (3)

Thus, for all |f⟩ and |g⟩ ∈ H, by inserting Î_H in ⟨f|g⟩, i.e.,

⟨f|g⟩ = ⟨f|Î_H|g⟩ = ∫ ⟨f|t⟩⟨t|g⟩ dt,

one is led to a representation of H in terms of the space of square integrable functions, with ⟨t|g⟩ = g(t) and ⟨g|t⟩ = ⟨t|g⟩* = g*(t), where g*(t) indicates the complex conjugate of g(t).
Let vectors |α_n⟩ ∈ H; n = 1,...,∞ be a Riesz basis for H. Hence, every |f⟩ ∈ H can be expressed as the linear span

|f⟩ = Σ_{n=1}^∞ c_n |α_n⟩,   (4)

and there exists a reciprocal basis |α̃_n⟩; n = 1,...,∞ for H to which the former basis is biorthogonal, i.e., ⟨α̃_n|α_m⟩ = δ_{n,m} [1]. The reciprocal basis allows one to compute the coefficients c_n in (4) as the inner products

c_n = ⟨α̃_n|f⟩.   (5)

Thus,

|f⟩ = Σ_{n=1}^∞ |α_n⟩⟨α̃_n|f⟩,   (6)

so that, by denoting

Î = Σ_{n=1}^∞ |α_n⟩⟨α̃_n|,   (7)

(4) can be recast as |f⟩ = Î|f⟩, which implies that Î is a representation of the identity operator in H, and we have the following generalization of the Plancherel-Parseval identity

⟨f|f⟩ = Σ_{n=1}^∞ c̃_n* c_n,   (8)

with c_n as in (5) and c̃_n* = ⟨f|α_n⟩. If the basis |α_n⟩; n = 1,...,∞ is orthogonalized and we denote by |ψ_n⟩; n = 1,...,∞ the corresponding orthogonal vectors after normalization to unity, then the new basis is self-reciprocal, i.e., it satisfies the orthonormality condition ⟨ψ_m|ψ_n⟩ = δ_{m,n} and provides a representation of the identity operator given by

Î = Σ_{n=1}^∞ |ψ_n⟩⟨ψ_n|.   (9)

This representation of the identity operator can be seen as a particular case of (7), by considering the basis and its reciprocal identical to |ψ_n⟩; n = 1,...,∞. The equivalence between (7) and (9) holds only when both sums run to infinity. Indeed, on the one hand, if the sum in (9) is truncated up to N terms we obtain an operator P̂ given by

P̂ = Σ_{n=1}^N |ψ_n⟩⟨ψ_n|,   (10)

which is the orthogonal projector onto the subspace V_N spanned by the N vectors |α_n⟩; n = 1,...,N. On the other hand, by truncating (7) up to N terms one obtains an operator which is idempotent, and hence a projector, but since it fails to be self-adjoint it is not an orthogonal projector. As a consequence, the approximation of |f⟩ obtained by truncating the expansion (4) up to N terms is not the best approximation of |f⟩ that can be obtained by a linear superposition of N vectors |α_n⟩. If one wishes for orthogonal projections by means of biorthogonal families, then biorthogonal vectors |α̃_n^N⟩; n = 1,...,N specially devised for such a purpose must be constructed.
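The difference between the two truncations can be checked in a finite-dimensional analogue (a sketch of ours, with an arbitrary random basis): truncating the representation (7) gives an idempotent but non-self-adjoint (oblique) projector, whereas duals recomputed for the reduced set give the orthogonal projector.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))   # a finite-dimensional basis: 5 independent vectors
duals = np.linalg.inv(A)          # row n is the reciprocal vector of column n

# Truncate the identity representation  I = sum_n |a_n><dual_n|  to 3 terms.
P_obl = A[:, :3] @ duals[:3, :]
assert np.allclose(P_obl @ P_obl, P_obl)   # idempotent: a projector...
assert not np.allclose(P_obl, P_obl.T)     # ...but not self-adjoint (oblique)

# Reciprocals recomputed for the reduced set give the orthogonal projector.
P_orth = A[:, :3] @ np.linalg.pinv(A[:, :3])
assert np.allclose(P_orth, P_orth.T)       # self-adjoint
assert np.allclose(P_orth @ P_orth, P_orth)
```

Both operators leave the span of the first three vectors invariant; only the second one maps an arbitrary vector to its closest point in that span.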
The superscript N indicates that if the subspace V N is enlarged (or reduced) each function should be recalculated.
Let |α_n⟩ be a set of linearly independent vectors and let the vectors |ψ_n⟩ be obtained by orthogonalizing the former in such a way that |ψ_n⟩ = |α_n⟩ − P̂_{V_{n−1}}|α_n⟩, where P̂_{V_{n−1}} is the orthogonal projector operator onto the subspace V_{n−1} spanned by |α_l⟩; l = 1,...,n−1. Then, it is proved in [2,3] that the vectors |α̃_n^{k+1}⟩, arising from |ψ_1⟩ = |α_1⟩ (so that |α̃_1^1⟩ = |α_1⟩/||α_1||²) through the recursive equations

|α̃_{k+1}^{k+1}⟩ = |ψ_{k+1}⟩/||ψ_{k+1}||²,
|α̃_n^{k+1}⟩ = |α̃_n^k⟩ − |α̃_{k+1}^{k+1}⟩⟨α_{k+1}|α̃_n^k⟩,  n = 1,...,k,   (11)

are biorthogonal to the vectors |α_n⟩; n = 1,...,k+1 and provide a representation of the orthogonal projection operator onto V_{k+1}, i.e.,

P̂_{V_{k+1}} = Σ_{n=1}^{k+1} |α_n⟩⟨α̃_n^{k+1}|.   (12)

As discussed in [4], in order to reduce numerical errors the vectors |ψ_k⟩ are conveniently computed by the modified Gram-Schmidt procedure or modified Gram-Schmidt with pivoting [4,6].
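The forward recursion can be transcribed into numpy as follows (our own illustration; variable names are ours, and no pivoting is used for clarity):

```python
import numpy as np

def forward_biorthogonalize(A):
    """Recursive construction of the reciprocal family: row n of the
    returned D is the reciprocal vector of column n of A, so that
    D @ A = I and A @ D is the orthogonal projector onto span(A)."""
    m, N = A.shape
    D = (A[:, :1] / (A[:, 0] @ A[:, 0])).T        # |dual_1> = |a_1>/||a_1||^2
    for k in range(1, N):
        a = A[:, k]
        psi = a - A[:, :k] @ (D @ a)              # |psi> = |a> - P_{V_k}|a>
        d_new = psi / (psi @ psi)                 # new reciprocal vector
        D = np.vstack([D - np.outer(D @ a, d_new), d_new])
    return D

rng = np.random.default_rng(2)
A = rng.standard_normal((30, 6))
D = forward_biorthogonalize(A)
assert np.allclose(D @ A, np.eye(6))              # biorthogonality
P = A @ D                                         # projector as in the text
assert np.allclose(P, P.T) and np.allclose(P @ P, P)
```

At every step the updated duals stay inside the span of the vectors processed so far, which is what makes the resulting projector orthogonal rather than oblique.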
Since the unique vector in V_{k+1} minimizing the distance to an arbitrary vector |f⟩ ∈ H is obtained as

|f_{k+1}⟩ = P̂_{V_{k+1}}|f⟩,   (13)

the coefficients of the expansion

|f_{k+1}⟩ = Σ_{n=1}^{k+1} c_n^{k+1} |α_n⟩,   (14)

which approximates an arbitrary |f⟩ ∈ H at best in a minimum distance sense, can be recursively obtained as

c_{k+1}^{k+1} = ⟨α̃_{k+1}^{k+1}|f⟩,  c_n^{k+1} = c_n^k − c_{k+1}^{k+1}⟨α̃_n^k|α_{k+1}⟩,  n = 1,...,k,   (15)

with c_1^1 = ⟨α_{l_1}|f⟩/||α_{l_1}||², where l_1 denotes the index of the first selected waveform. This technique, yielding forward approximations, has been shown to be of assistance in sparse signal representation by waveform selection [3] as well as in data set selection [5]. Nevertheless, in those and other application areas there is a clear need for a technique yielding approximations in the opposite direction. Hence the motivation for the approach of the next section.
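The coefficient recursion can be sketched likewise (our own illustration; for brevity the duals of the current set are recomputed with a pseudoinverse rather than carried along, and no waveform selection is performed):

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((30, 5))
f = rng.standard_normal(30)

# Grow the model one waveform at a time, updating the coefficients
# recursively: c_n^{k+1} = c_n^k - c_{k+1}^{k+1} <dual_n^k|a_{k+1}>.
c = np.array([A[:, 0] @ f / (A[:, 0] @ A[:, 0])])   # c_1^1
for k in range(1, 5):
    D_k = np.linalg.pinv(A[:, :k])                  # duals of the current set
    a_new = A[:, k]
    psi = a_new - A[:, :k] @ (D_k @ a_new)          # orthogonal component of a_{k+1}
    c_new = psi @ f / (psi @ psi)                   # c_{k+1}^{k+1}
    c = np.append(c - c_new * (D_k @ a_new), c_new)

# At the final order the coefficients are the optimal (least-squares) ones.
assert np.allclose(c, np.linalg.pinv(A) @ f)
```

The invariant maintained by the recursion is that, at every order k, the current coefficients coincide with those of the orthogonal projection onto the span of the first k waveforms.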

III. BACKWARD ADAPTIVE BIORTHOGONALIZATION
Let V_{N/α_j} denote the subspace which is left by removing the vector |α_j⟩ from V_N, i.e.,

V_{N/α_j} = span{|α_n⟩; n = 1,...,j−1,j+1,...,N},   (16)

and let |α̃_n^{N/j}⟩; n = 1,...,j−1,j+1,...,N be the corresponding reciprocal family, which allows one to express the orthogonal projector operator onto V_{N/α_j} as

P̂_{V_{N/α_j}} = Σ_{n=1,n≠j}^N |α_n⟩⟨α̃_n^{N/j}|.   (17)

Assuming that the biorthogonal vectors |α̃_n^N⟩; n = 1,...,N yielding a representation of P̂_{V_N} as given by

P̂_{V_N} = Σ_{n=1}^N |α_n⟩⟨α̃_n^N|   (18)

are known, our goal is to modify such vectors so as to obtain the corresponding set |α̃_n^{N/j}⟩; n = 1,...,j−1,j+1,...,N yielding P̂_{V_{N/α_j}} as in (17).
We start by writing

P̂_{V_N} = P̂_{V_{N/α_j}} + P̂_{V⊥_{N/α_j}},   (19)

where P̂_{V⊥_{N/α_j}} is the orthogonal projector onto V⊥_{N/α_j}, the orthogonal complement of V_{N/α_j} in V_N. Thus, V⊥_{N/α_j} contains only one linearly independent vector, |ψ_j⟩, arising by subtracting from |α_j⟩ its component in V_{N/α_j}, i.e.,

|ψ_j⟩ = |α_j⟩ − P̂_{V_{N/α_j}}|α_j⟩,   (20)

so that

P̂_{V⊥_{N/α_j}} = |ψ_j⟩⟨ψ̃_j|,   (21)

where ⟨ψ̃_j| = ⟨ψ_j|/||ψ_j||². Introducing (17), (18) and (21) into (19) we have

Σ_{n=1}^N |α_n⟩⟨α̃_n^N| = Σ_{n=1,n≠j}^N |α_n⟩⟨α̃_n^{N/j}| + |ψ_j⟩⟨ψ̃_j|.   (22)

Taking the inner product of both sides of (22) with ⟨ψ̃_j|, and using the facts that ⟨ψ̃_j|α_n⟩ = 0 for n ≠ j and ⟨ψ̃_j|ψ_j⟩ = 1, we obtain:

⟨ψ̃_j|α_j⟩⟨α̃_j^N| = ⟨ψ̃_j|.   (23)

Hence, since ⟨ψ̃_j|α_j⟩ = 1, vector ⟨ψ̃_j| turns out to be

⟨ψ̃_j| = ⟨α̃_j^N|,   (24)

which, together with (21), gives |ψ_j⟩⟨ψ̃_j| = |α̃_j^N⟩⟨α̃_j^N|/||α̃_j^N||². Taking now the inner product of both sides of (22) with every ⟨α̃_n^N|; n = 1,...,j−1,j+1,...,N, we obtain the equation we wanted to find:

⟨α̃_n^{N/j}| = ⟨α̃_n^N| − (⟨α̃_n^N|α̃_j^N⟩/||α̃_j^N||²)⟨α̃_j^N|,  n = 1,...,j−1,j+1,...,N.   (25)

The following theorem demonstrates that the modification of the vectors |α̃_n^N⟩ prescribed in (25) provides us with biorthogonal vectors |α̃_n^{N/j}⟩; n = 1,...,j−1,j+1,...,N rendering orthogonal projections.
Theorem 1: Given a set of vectors |α̃_n^N⟩; n = 1,...,N biorthogonal to the vectors |α_n⟩; n = 1,...,N and yielding a representation of P̂_{V_N} as given in (18), a new set of biorthogonal vectors |α̃_n^{N/j}⟩; n = 1,...,j−1,j+1,...,N yielding a representation of P̂_{V_{N/α_j}}, as given in (17), can be obtained through the equation

|α̃_n^{N/j}⟩ = |α̃_n^N⟩ − |α̃_j^N⟩⟨α̃_j^N|α̃_n^N⟩/||α̃_j^N||²,  n = 1,...,j−1,j+1,...,N.   (26)

Proof: Let us first use (26) to write

P̂_{V_{N/α_j}} = Σ_{n=1,n≠j}^N |α_n⟩⟨α̃_n^{N/j}|.   (27)

To prove that (27) is the orthogonal projector onto V_{N/α_j} we show that a) P̂_{V_{N/α_j}}|g⟩ = |g⟩ for all |g⟩ ∈ V_{N/α_j} and b) P̂_{V_{N/α_j}}|g⊥⟩ = 0 for all |g⊥⟩ in the orthogonal complement of V_{N/α_j} in H.
Given |g⟩ ∈ V_{N/α_j}, write |g⟩ = Σ_{m=1,m≠j}^N b_m |α_m⟩. Since ⟨α̃_j^N|α_m⟩ = 0 for m ≠ j, (26) gives ⟨α̃_n^{N/j}|α_m⟩ = δ_{n,m} for all n, m ≠ j, whence P̂_{V_{N/α_j}}|g⟩ = |g⟩, which proves a). Moreover, by (26) each |α̃_n^{N/j}⟩ is orthogonal to |α̃_j^N⟩ and hence belongs to V_{N/α_j}, so that ⟨α̃_n^{N/j}|g⊥⟩ = 0 and P̂_{V_{N/α_j}}|g⊥⟩ = 0, which proves b). Since Σ_{n=1,n≠j}^N |α_n⟩⟨α̃_n^{N/j}| has thus been proved to be an orthogonal projector, it is self-adjoint. Hence (17) holds ✷

Corollary 1: Let |f_N⟩ be the orthogonal projection of an arbitrary |f⟩ ∈ H onto V_N, i.e.,

|f_N⟩ = P̂_{V_N}|f⟩ = Σ_{n=1}^N c_n^N |α_n⟩,   (28)

with c_n^N = ⟨α̃_n^N|f⟩; n = 1,...,N assumed to be known. Hence, the coefficients c_n^{N/j} of the orthogonal projection of |f⟩ onto V_{N/α_j},

|f_{N/j}⟩ = P̂_{V_{N/α_j}}|f⟩ = Σ_{n=1,n≠j}^N c_n^{N/j} |α_n⟩,   (29)

are obtained from the known coefficients c_n^N as follows:

c_n^{N/j} = ⟨α̃_n^{N/j}|f⟩   (30)
          = ⟨α̃_n^N|f⟩ − ⟨α̃_n^N|α̃_j^N⟩⟨α̃_j^N|f⟩/||α̃_j^N||²   (31)
          = c_n^N − c_j^N ⟨α̃_n^N|α̃_j^N⟩/||α̃_j^N||²,  n = 1,...,j−1,j+1,...,N.   (32)

The proof trivially stems from (27), since P̂_{V_{N/α_j}}|f⟩ = Σ_{n=1,n≠j}^N c_n^{N/j}|α_n⟩ implies c_n^{N/j} = ⟨α̃_n^{N/j}|f⟩, and from (26) ✷

Corollary 2: For |f⟩ ∈ H, let |f_N⟩ be as above and |f_{N/j}⟩ = P̂_{V_{N/α_j}}|f⟩. Then, the following relation between ||f_N|| and ||f_{N/j}|| holds:

||f_N||² = ||f_{N/j}||² + |c_j^N|²/||α̃_j^N||².   (33)

Proof: Using (29) and the fact that projectors are self-adjoint and idempotent, it follows that |f_N⟩ − |f_{N/j}⟩ = P̂_{V⊥_{N/α_j}}|f⟩ = |α̃_j^N⟩⟨α̃_j^N|f⟩/||α̃_j^N||², a vector orthogonal to |f_{N/j}⟩ of squared norm |c_j^N|²/||α̃_j^N||². Hence (33) follows from Pythagoras' theorem ✷

So far we have discussed how to modify the coefficients of a linear expansion when one of its components is removed. Nevertheless, we have given no specification on how to choose such an element. We are now in a position to address this point, since the last corollary suggests how the selection can be made optimal. The following proposition is in order.
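The backward adaptation of the duals, the coefficient update and the norm relation lend themselves to a direct numerical check (a sketch of ours; the dictionary and signal are arbitrary random choices, and the adapted quantities are compared against a pseudoinverse recomputed from scratch on the reduced set):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((40, 7))   # 7 linearly independent waveforms (columns)
D = np.linalg.pinv(A)              # rows are the reciprocal vectors of the full set
f = rng.standard_normal(40)
c = D @ f                          # c_n^N = <dual_n|f>
j = 3                              # index of the element to be removed
dj = D[j]

# Adapt the surviving reciprocal vectors (the theorem's update rule).
D_red = np.array([D[n] - dj * (dj @ D[n]) / (dj @ dj) for n in range(7) if n != j])
# Adapt the surviving coefficients (Corollary 1).
c_red = np.array([c[n] - c[j] * (D[n] @ dj) / (dj @ dj) for n in range(7) if n != j])

A_red = np.delete(A, j, axis=1)
assert np.allclose(D_red @ A_red, np.eye(6))                      # biorthogonality
assert np.allclose(A_red @ D_red, A_red @ np.linalg.pinv(A_red))  # orthogonal projector
assert np.allclose(c_red, np.linalg.pinv(A_red) @ f)              # optimal reduced coefficients

# Corollary 2: ||f_N||^2 = ||f_{N/j}||^2 + |c_j^N|^2 / ||dual_j||^2.
fN, fNj = A @ c, A_red @ c_red
assert np.isclose(fN @ fN, fNj @ fNj + c[j] ** 2 / (dj @ dj))
```

No matrix inversion is needed after the removal: one inner product per surviving dual suffices.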

Proposition 1: Let

|f_N⟩ = Σ_{n=1}^N c_n^N |α_n⟩   (35)

be given by the coefficients c_n^N; n = 1,...,N, and let

|f_{N/j}⟩ = Σ_{n=1,n≠j}^N c_n^{N/j} |α_n⟩   (36)

be obtained by eliminating the coefficient c_j^N from (35) and modifying the remaining coefficients as prescribed in (32). The coefficient c_j^N whose removal minimizes the norm of the residual error |∆⟩ = |f_N⟩ − |f_{N/j}⟩ is the one yielding a minimum value of

|c_j^N|²/||α̃_j^N||².   (37)

Proof: Since |f_{N/j}⟩ is the orthogonal projection of |f_N⟩ onto V_{N/α_j}, we have |||∆⟩||² = ||f_N||² − ||f_{N/j}||². Hence, making use of (33), we further have

|||f_N⟩ − |f_{N/j}⟩||² = |c_j^N|²/||α̃_j^N||²,   (38)

from which we gather that |||f_N⟩ − |f_{N/j}⟩||² is minimum if |c_j^N|²/||α̃_j^N||² is minimum ✷

Proposition 1 is relevant to the backward approximation of a signal, a common procedure in compression and noise reduction techniques, the goal being to shrink coefficients so as to obtain a more economical representation and/or reduce spurious information (noise). Successive applications of criterion (37) lead to an algorithm for recursive coarser approximations. Indeed, let us assume that at the first iteration we eliminate the j-th term, the one yielding a minimum of (37).
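Criterion (37) can be verified by brute force over all candidate indices (a numerical sketch of ours; the data are arbitrary random choices):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((40, 7))
D = np.linalg.pinv(A)              # rows are the reciprocal vectors
f = rng.standard_normal(40)
c = D @ f
fN = A @ c                         # orthogonal projection of f onto V_N

# Criterion (37): remove the index with smallest |c_j|^2 / ||dual_j||^2.
crit = c ** 2 / np.einsum('ij,ij->i', D, D)
j_best = int(np.argmin(crit))

# Brute-force check: that index indeed minimizes ||f_N - f_{N/j}||^2.
errors = []
for j in range(7):
    A_red = np.delete(A, j, axis=1)
    f_red = A_red @ (np.linalg.pinv(A_red) @ f)
    errors.append(np.sum((fN - f_red) ** 2))
assert int(np.argmin(errors)) == j_best
assert np.allclose(errors, crit)   # the residual equals the criterion, index by index
```

The last assertion shows that the criterion is not merely a heuristic: it equals the exact squared residual incurred by each candidate removal.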
We then construct the new reciprocal vectors as prescribed in (26) and the corresponding new coefficients as prescribed in (32). We are thus in a position to repeat the process and obtain an approximation coarser than the previous one. If we denote by |f^{(k)}⟩ the approximation arising at the k-th step, a common stopping criterion recommends ceasing the iteration process as soon as the distance between |f^{(k)}⟩ and the initial approximation reaches a bound δ, where δ is estimated according to the desired precision. If the aim is to denoise a signal, the value of δ may be set as the variance of the noise, when available. It is appropriate to remark, however, that in the context of some applications the selection criterion (37) may not be the adequate one. Instead, other criteria based on statistical properties may be required [8,9,10].
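The resulting backward algorithm might be sketched as follows (our own reading: the stopping rule shown, based on the accumulated error of the coarser approximation, is an assumption, since the paper's displayed criterion was not recoverable from the source):

```python
import numpy as np

def backward_shrink(A, f, delta):
    """Successive coarser approximations: repeatedly drop the term
    minimizing criterion (37), adapting duals and coefficients via the
    rank-one updates of Section III.  Stops before the accumulated
    error ||f^(0) - f^(k)||^2 would exceed delta (assumed rule)."""
    D = np.linalg.pinv(A)
    c = D @ f
    f0 = A @ c                                   # initial approximation f^(0)
    while A.shape[1] > 1:
        crit = c ** 2 / np.einsum('ij,ij->i', D, D)
        j = int(np.argmin(crit))
        dj = D[j]
        proj = (D @ dj) / (dj @ dj)
        A_r = np.delete(A, j, axis=1)
        c_r = np.delete(c - c[j] * proj, j)      # coefficient update
        if np.sum((f0 - A_r @ c_r) ** 2) > delta:
            break                                # next removal would be too coarse
        A, c = A_r, c_r
        D = np.delete(D - np.outer(proj, dj), j, axis=0)  # dual update
    return A, c

rng = np.random.default_rng(6)
A = rng.standard_normal((50, 10))
f = A[:, :6] @ rng.standard_normal(6) + 0.05 * rng.standard_normal(50)
A_k, c_k = backward_shrink(A, f, delta=0.1)
# Invariant: the retained coefficients always match a fresh least-squares fit.
assert np.allclose(c_k, np.linalg.pinv(A_k) @ f)
```

Whatever the number of removals performed, the surviving coefficients remain optimal for the surviving waveforms, which is the point of the whole construction.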
In any case, regardless of the criterion for selecting the coefficient c_j^N to be discarded, if one wishes the remaining coefficients to yield the optimal approximation in a minimum distance sense, they should be modified as indicated in (32). We illustrate next, by a simple example, the gain that results from following this prescription.
Let us consider N = 13 elements |α_n⟩; n = 1,...,13 whose functional representations are given by shifted versions of the Mexican hat wavelet,

α_n(t) = A_n (1 − (t − t_n)²) e^{−(t − t_n)²/2},   (41)

where t_n is the shift of the n-th waveform and each A_n is a constant which normalizes the corresponding function |α_n⟩ to unity in the interval [−4, 4]. We construct the biorthogonal functions α̃_n^{13}(t) (assumed to be known) by applying the forward biorthogonalization technique of Section II, but they could be constructed by any other available method.
The signal f(t) is considered to be the peak plotted as the continuous line in Figure 1. Such a signal is also expressible as a linear combination of the waveforms given in (41). A high quality fit results from using the corresponding 13 coefficients c_n^{13}, each of which is calculated as c_n^{13} = ⟨α̃_n^{13}|f⟩. We now disregard the last two coefficients, the ones of smallest magnitude, and use the remaining ones without modification. Although the neglected coefficients are quite small in comparison with some of the others, the approximation that results, represented by the dotted line in Figure 1, does not correctly fit the distribution tails. Nevertheless, if we disregard the same coefficients but modify the others by applying (32) twice, the resulting approximation happens to coincide with the continuous line of Figure 1. To magnify the effect we wish to show, let us now disregard two more coefficients, those of value 0.2957. The approximation that results from a simple truncation is shown by the darker dotted line of Figure 2. The slender dotted line plots our approximation. This very simple example clearly illustrates the significance of the proposed modification of coefficients.
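A variant of this experiment can be reproduced numerically (a sketch of ours: the shifts t_n and the target signal are our own choices, not the paper's actual values, so the numbers differ from those quoted above):

```python
import numpy as np

# Mexican hat dictionary on [-4, 4]; shifts are an assumption of this sketch.
t = np.linspace(-4, 4, 400)
shifts = np.linspace(-3, 3, 13)
A = np.stack([(1 - (t - s) ** 2) * np.exp(-(t - s) ** 2 / 2) for s in shifts], axis=1)
A /= np.linalg.norm(A, axis=0)     # normalize each waveform on the grid

f = np.exp(-t ** 2)                # a peak-shaped target signal (our choice)
D = np.linalg.pinv(A)              # reciprocal family (rows)
c = D @ f
fN = A @ c
assert np.allclose(D @ A, np.eye(13), atol=1e-6)

# Drop the term selected by criterion (37).
j = int(np.argmin(c ** 2 / np.einsum('ij,ij->i', D, D)))

plain = np.delete(A, j, axis=1) @ np.delete(c, j)   # simple truncation
adapt = np.delete(A, j, axis=1) @ np.delete(
    c - c[j] * (D @ D[j]) / (D[j] @ D[j]), j)       # coefficients adapted by (32)

# Adapted coefficients never do worse than plain truncation.
assert np.sum((fN - adapt) ** 2) <= np.sum((fN - plain) ** 2) + 1e-12
```

Plotting `plain` and `adapt` against `fN` reproduces the qualitative behaviour described in the text: the truncated expansion misses the tails, while the adapted one stays close to the full approximation.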

IV. CONCLUSIONS
A recursive approach for adapting biorthogonal functions so as to obtain orthogonal projections onto a reduced subspace has been proposed. The required modifications are simple and easy to implement. The modified functions are used to adapt coefficients of a lower order linear model, in order to obtain an optimal approximation in a minimum distance sense.
A criterion for disregarding coefficients has been discussed. Such a criterion leads to an iterative procedure for successive backward approximations which yields, at each iteration, a minimal residual norm. It should be stressed that, regardless of the criterion used for neglecting coefficients, the proposed approach may be applied to guarantee optimality (in a minimum distance sense) of the remaining approximation. We believe, thereby, that this technique is potentially applicable to a broad range of problems, including data compression, noise reduction and sparse representation.