The least squares problem arises, among other places, in linear models,
and it concerns an inconsistent system of linear equations. A crucial question is
how to reduce the least squares solution of such a system to the usual solution
of a consistent one. Traditionally, this is achieved by differential calculus. We
present a purely algebraic approach to this problem based on some identities
for nonhomogeneous quadratic forms.
1. Introduction and Notation
The least squares problem arises, among other places, in linear models, and it concerns an inconsistent system $Ax = b$ of linear equations. Formally, it reduces to minimizing the nonhomogeneous quadratic form $f(x) = (Ax-b)^T(Ax-b)$.
The classical approach to the problem, presented in such well-known books as Scheffé [1, Chapter 1], Rao [2, pages 222 and 223], and Rao and Toutenburg [3, pages 20–23], uses differential calculus and leads to the so-called normal equation $A^TAx = A^Tb$, which is consistent. The aim of this note is to present some useful algebraic identities for nonhomogeneous quadratic forms that lead directly to the normal equation.
Traditional vector-matrix notation will be used. In particular, if $M$ is a matrix, then $M^T$, $\mathcal{R}(M)$, and $r(M)$ stand for its transpose, range (column space), and rank, respectively. Moreover, $\mathbb{R}^n$ denotes the $n$-dimensional Euclidean space represented by column vectors.
2. Background
Any system of linear equations may be presented in the vector-matrix form as
$$Ax = b, \tag{2.1}$$
where $A$ is a given $n \times p$ matrix, $b$ is a given vector in $\mathbb{R}^n$, and $x \in \mathbb{R}^p$ is an unknown vector. It is well known that (2.1) is consistent if and only if $b$ belongs to the range $\mathcal{R}(A)$.
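The consistency criterion can be checked mechanically: b lies in the range of A exactly when appending b to A as an extra column does not raise the rank. The following is a minimal sketch in plain Python with exact rational arithmetic. The helper names `rank` and `is_consistent` and the 3×2 example are illustrative assumptions, not part of the original note.

```python
from fractions import Fraction as F

def rank(M):
    """Rank of a matrix (list of rows) via Gaussian elimination in exact arithmetic."""
    rows = [[F(v) for v in row] for row in M]
    r = 0  # number of pivots found so far
    ncols = len(rows[0]) if rows else 0
    for c in range(ncols):
        # Look for a nonzero pivot in column c at or below row r.
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        # Eliminate column c from all rows below the pivot row.
        for i in range(r + 1, len(rows)):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [a - factor * p for a, p in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r

def is_consistent(A, b):
    """Ax = b is consistent iff r(A) equals the rank of the augmented matrix [A | b]."""
    augmented = [list(row) + [bi] for row, bi in zip(A, b)]
    return rank(A) == rank(augmented)

# Hypothetical example: three equations, two unknowns.
A = [[1, 0], [1, 1], [1, 2]]
print(is_consistent(A, [1, 2, 3]))  # True: b is the sum of the two columns of A
print(is_consistent(A, [1, 2, 2]))  # False: b lies outside the column space
```

The second right-hand side gives an inconsistent system of the kind the rest of the note is concerned with.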
If (2.1) is inconsistent, one can seek a vector $x$ minimizing the norm $\|Ax-b\|$ or, equivalently, its square $(Ax-b)^T(Ax-b)$. The Least Squares Solution (LSS) of (2.1) is defined as a vector $x_0 \in \mathbb{R}^p$ such that
$$(Ax_0-b)^T(Ax_0-b) \le (Ax-b)^T(Ax-b) \quad \forall x \in \mathbb{R}^p. \tag{2.2}$$
A crucial problem is how to reduce the LSS of the inconsistent equation (2.1) to the usual solution of a consistent one. Formally, the least squares problem deals with minimizing the nonhomogeneous quadratic form $f(x) = (Ax-b)^T(Ax-b)$. Traditionally, this problem is solved by differential calculus, which leads to the normal equation $A^TAx = A^Tb$.
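As a concrete illustration of the reduction described above, the sketch below solves the normal equation for a small inconsistent system and compares the value of the quadratic form at the solution with its value at a few other points, in the spirit of inequality (2.2). The 3×2 system, the trial points, and all function names are assumptions chosen for illustration. The 2×2 solver uses Cramer's rule and is not meant as a general method.

```python
from fractions import Fraction as F

def mat_t_mat(A):
    """Return A^T A for a matrix given as a list of rows."""
    cols = list(zip(*A))
    return [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]

def mat_t_vec(A, b):
    """Return A^T b."""
    return [sum(u * v for u, v in zip(col, b)) for col in zip(*A)]

def f(A, x, b):
    """Nonhomogeneous quadratic form f(x) = (Ax - b)^T (Ax - b)."""
    residual = [sum(a * xi for a, xi in zip(row, x)) - bi for row, bi in zip(A, b)]
    return sum(r * r for r in residual)

def solve_2x2(M, v):
    """Solve M x = v for a nonsingular 2x2 matrix M by Cramer's rule."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [(v[0] * M[1][1] - M[0][1] * v[1]) / det,
            (M[0][0] * v[1] - M[1][0] * v[0]) / det]

# Hypothetical inconsistent system (b is not in the column space of A).
A = [[F(1), F(0)], [F(1), F(1)], [F(1), F(2)]]
b = [F(1), F(2), F(2)]

# Solve the normal equation A^T A x = A^T b.
x0 = solve_2x2(mat_t_mat(A), mat_t_vec(A, b))
print(x0, f(A, x0, b))  # solution (7/6, 1/2) with f(x0) = 1/6

# Spot check of (2.2) at a few arbitrary trial points (not a proof).
trial_points = [[F(0), F(0)], [F(1), F(1)], [F(2), F(-1)], [F(7, 6), F(1)]]
print(all(f(A, x0, b) <= f(A, x, b) for x in trial_points))
```

A finite set of trial points can only illustrate (2.2). The algebraic identities of the next section are what establish it for all x.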
In the next section, we will present some useful algebraic identities for nonhomogeneous quadratic forms. They yield directly the inequality (2.2).
3. Identities and Inequalities for Nonhomogeneous Quadratic Forms
The usual, that is, homogeneous, quadratic form is a real function $f_M(x) = x^TMx$ defined on $\mathbb{R}^p$. In this note, we shall also consider nonhomogeneous quadratic forms of the type
$$f_{M,a}(x) = (x-a)^TM(x-a), \tag{3.1}$$
where $M$ is a symmetric $p \times p$ matrix and $a$ is a vector in $\mathbb{R}^p$.
Some inequalities for nonhomogeneous quadratic forms may be found in Stępniak [4]. Let us recall one of these results, which is very useful in nonhomogeneous linear estimation.
Lemma 3.1.
For any symmetric nonnegative definite matrices $M_1$ and $M_2$ of order $p$, the condition
$$(x-x_1)^TM_1(x-x_1) + c_1 \ge (x-x_2)^TM_2(x-x_2) + c_2 \tag{3.2}$$
for some $c_1, c_2 \in \mathbb{R}$, $x_1, x_2 \in \mathbb{R}^p$ and all $x \in \mathbb{R}^p$ implies that $M_1 - M_2$ is nonnegative definite and $c_1 - c_2 \ge 0$.
Now we will present an identity that may serve as a convenient tool in finding the LSS of (2.1). For convenience, we start with the case $r(A) = p$, leaving the singular case $r(A) < p$ to Section 5.
Proposition 3.2.
For an arbitrary $n \times p$ matrix $A$ of rank $p$ and an arbitrary vector $b \in \mathbb{R}^n$,
$$(Ax-b)^T(Ax-b) = (A^TAx - A^Tb)^T(A^TA)^{-1}(A^TAx - A^Tb) + b^T[I_n - A(A^TA)^{-1}A^T]b. \tag{3.3}$$
Proof.
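Let us start from the trivial identity
$$I_n = A(A^TA)^{-1}A^T + [I_n - A(A^TA)^{-1}A^T]. \tag{3.4}$$
We only need to premultiply this identity by $(Ax-b)^T$, postmultiply it by $(Ax-b)$, and then collect the terms to get (3.3). Here the first term follows from $A^T(Ax-b) = A^TAx - A^Tb$, and the second reduces to $b^T[I_n - A(A^TA)^{-1}A^T]b$ because $[I_n - A(A^TA)^{-1}A^T]A = 0$.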
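The identity (3.3) can also be confirmed numerically on a small full-rank example. The sketch below evaluates both sides in exact rational arithmetic at several points x. The matrix, the vector b, the test points, and the helper names are illustrative assumptions. The second term is computed in the expanded form b^T b - (A^T b)^T (A^T A)^{-1} (A^T b), which equals b^T [I_n - A(A^T A)^{-1} A^T] b.

```python
from fractions import Fraction as F

def matvec(M, v):
    """Matrix-vector product for a matrix given as a list of rows."""
    return [sum(m * vi for m, vi in zip(row, v)) for row in M]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def transpose(M):
    return [list(col) for col in zip(*M)]

def inverse_2x2(M):
    """Inverse of a nonsingular 2x2 matrix via the adjugate formula."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det],
            [-M[1][0] / det, M[0][0] / det]]

# Hypothetical full-rank example with n = 3 and p = 2.
A = [[F(1), F(0)], [F(1), F(1)], [F(1), F(2)]]
b = [F(1), F(2), F(2)]

At = transpose(A)
AtA = [[dot(ci, cj) for cj in At] for ci in At]  # rows of At are columns of A
Atb = matvec(At, b)
AtA_inv = inverse_2x2(AtA)

def lhs(x):
    """Left side of (3.3): (Ax - b)^T (Ax - b)."""
    r = [ax - bi for ax, bi in zip(matvec(A, x), b)]
    return dot(r, r)

# Second term of (3.3); it does not involve x.
constant_term = dot(b, b) - dot(Atb, matvec(AtA_inv, Atb))

def rhs(x):
    """Right side of (3.3)."""
    g = [u - v for u, v in zip(matvec(AtA, x), Atb)]  # A^T A x - A^T b
    return dot(g, matvec(AtA_inv, g)) + constant_term

points = [[F(0), F(0)], [F(1), F(-2)], [F(3, 2), F(5, 7)], [F(-4), F(9)]]
print(all(lhs(x) == rhs(x) for x in points))
print(constant_term)  # 1/6 for this example
```

Because the arithmetic is exact, the equality test holds exactly at each point and not merely up to rounding.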
4. Least Squares and Usual Solutions: Nonsingular Case
As above, we consider an inconsistent equation Ax=b, where A is an n×p matrix of rank p. We are interested in the LSS of this equation.
Theorem 4.1.
A vector $x \in \mathbb{R}^p$ is a Least Squares Solution of the inconsistent equation $Ax = b$ if and only if it is a usual solution of the consistent equation
$$A^TAx = A^Tb. \tag{4.1}$$
Proof.
Consistency of (4.1) follows from the fact that $\mathcal{R}(A^TA) = \mathcal{R}(A^T)$.
We note that the second component on the right side of the identity (3.3) does not depend on $x$. Thus, we only need to minimize the first one. Since $A^TA$, and consequently $(A^TA)^{-1}$, is positive definite, this component is minimal if and only if $A^TAx - A^Tb = 0$. This completes the proof.
5. General Case
If $r(A) < p$, then the matrix $A^TA$ is singular and, therefore, the identity (3.3) is not applicable. However, as we will show, it remains true if we replace $(A^TA)^{-1}$ by an arbitrary generalized inverse $(A^TA)^-$.
There are many papers on generalized inverses and several books, among others Bapat [5], Ben-Israel and Greville [6], Campbell and Meyer [7], Pringle and Rayner [8], and Rao and Mitra [9]. A recent paper by Stępniak [10] may serve as a brief and self-contained introduction to this field.
Let us recall that, for a given $n \times p$ matrix $A$, its generalized inverse $A^-$ is defined as an arbitrary $p \times n$ matrix $G$ satisfying the condition $AGA = A$.
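The defining condition AGA = A is easy to test directly, and it shows that a generalized inverse need not be unique. The sketch below checks the condition for a rank-one 3×2 matrix and several candidate 2×3 matrices. The matrices and the helper names `matmul` and `is_generalized_inverse` are illustrative assumptions.

```python
from fractions import Fraction as F

def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def is_generalized_inverse(A, G):
    """True when G satisfies the defining condition A G A = A."""
    return matmul(matmul(A, G), A) == A

# Hypothetical rank-one matrix with n = 3 and p = 2.
A = [[F(1), F(1)], [F(1), F(1)], [F(1), F(1)]]

# Three different p x n candidates; each one satisfies A G A = A.
G1 = [[F(1), F(0), F(0)], [F(0), F(0), F(0)]]
G2 = [[F(0), F(0), F(0)], [F(0), F(1), F(0)]]
G3 = [[F(1, 6)] * 3, [F(1, 6)] * 3]  # all entries 1/6

# The zero matrix is not a generalized inverse of a nonzero A.
Z = [[F(0)] * 3, [F(0)] * 3]

for name, G in [("G1", G1), ("G2", G2), ("G3", G3), ("Z", Z)]:
    print(name, is_generalized_inverse(A, G))
```

For this particular A, the candidate G3 also coincides with the Moore-Penrose inverse, which is one specific choice among the many matrices satisfying AGA = A.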
A key result in this section is stated as follows.
Proposition 5.1.
For an arbitrary $n \times p$ matrix $A$ and an arbitrary vector $b \in \mathbb{R}^n$,
$$(Ax-b)^T(Ax-b) = (A^TAx - A^Tb)^T(A^TA)^-(A^TAx - A^Tb) + b^T[I_n - A(A^TA)^-A^T]b, \tag{5.1}$$
where $^-$ denotes an arbitrary generalized inverse.
Proof.
The idea of the proof is the same as in Proposition 3.2. Since $\mathcal{R}(A^TA) = \mathcal{R}(A^T)$, one can replace the vector $A^Tb$ by $A^TAc$ for some $c \in \mathbb{R}^p$.
Now we will apply the identity (5.1) to the least squares problem.
Theorem 5.2.
For an arbitrary $n \times p$ matrix $A$ and an arbitrary vector $b \in \mathbb{R}^n$,
(a) a vector $x \in \mathbb{R}^p$ is a Least Squares Solution of the equation $Ax = b$ if and only if it is a usual solution of the (consistent) equation $A^TAx = A^Tb$;
(b) the lower bound of $(Ax-b)^T(Ax-b)$ is equal to $b^T[I_n - A(A^TA)^-A^T]b$, and it does not depend on the choice of the generalized inverse $(A^TA)^-$.
Proof.
By setting $A^Tb = A^TAc$ and using $A^TA(A^TA)^-A^TA = A^TA$, the first component on the right side of (5.1) reduces to $(x-c)^T(A^TA)(x-c)$, which is nonnegative and equals zero if and only if $(A^TA)(x-c) = 0$ or, equivalently, $A^TAx = A^Tb$. Since the second component does not depend on $x$, this is the global minimum. The same substitution shows that the lower bound does not depend on the choice of the generalized inverse.
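The rank-deficient case can be illustrated on a small example in which the normal equation has infinitely many solutions. The sketch below checks that several generalized inverses of A^T A give the same lower bound, that different solutions of the normal equation attain it, and that identity (5.1) holds at a point that is not a solution. The matrix A, the vector b, the three generalized inverses, and the helper names are illustrative assumptions.

```python
from fractions import Fraction as F

def matvec(M, v):
    return [sum(m * vi for m, vi in zip(row, v)) for row in M]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Hypothetical rank-deficient example: r(A) = 1 < p = 2.
A = [[F(1), F(1)], [F(1), F(1)], [F(1), F(1)]]
b = [F(1), F(2), F(3)]

cols = list(zip(*A))
M = [[dot(ci, cj) for cj in cols] for ci in cols]  # A^T A = [[3, 3], [3, 3]]
v = [dot(c, b) for c in cols]                      # A^T b = [6, 6]

# Three generalized inverses of the singular matrix M (each satisfies M G M = M).
g_inverses = [
    [[F(1, 3), F(0)], [F(0), F(0)]],
    [[F(0), F(0)], [F(0), F(1, 3)]],
    [[F(1, 12), F(1, 12)], [F(1, 12), F(1, 12)]],
]

def f(x):
    """(Ax - b)^T (Ax - b)."""
    r = [ax - bi for ax, bi in zip(matvec(A, x), b)]
    return dot(r, r)

def lower_bound(G):
    """b^T [I_n - A G A^T] b, written as b^T b - (A^T b)^T G (A^T b)."""
    return dot(b, b) - dot(v, matvec(G, v))

def rhs(x, G):
    """Right side of (5.1) for a given generalized inverse G."""
    g = [u - w for u, w in zip(matvec(M, x), v)]
    return dot(g, matvec(G, g)) + lower_bound(G)

bounds = [lower_bound(G) for G in g_inverses]
print(bounds)  # the same value for every choice of G

# Several solutions of the normal equation (here: x1 + x2 = 2).
solutions = [[F(2), F(0)], [F(0), F(2)], [F(1), F(1)]]
print([f(x) for x in solutions])  # each attains the lower bound

x_other = [F(5), F(-1)]  # not a solution of the normal equation
print(all(rhs(x_other, G) == f(x_other) for G in g_inverses))
```

In this example the least squares solution is not unique, yet the minimal value of the quadratic form is, in line with part (b) of Theorem 5.2.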
Acknowledgment
The author thanks a referee for his (or her) useful suggestions concerning presentation of this paper.
References
[1] H. Scheffé, John Wiley & Sons, New York, NY, USA, 1959, xvi+477 pp. MR 0116429, Zbl 0086.34603.
[2] C. R. Rao, 2nd ed., Wiley Series in Probability and Mathematical Statistics, John Wiley & Sons, New York, NY, USA, 1973, xx+625 pp. MR 0346957, Zbl 0386.93041.
[3] C. R. Rao and H. Toutenburg, Springer Series in Statistics, Springer, New York, NY, USA, 1995, xii+352 pp. MR 1354840, Zbl 1213.76224.
[4] C. Stępniak, "Admissible linear estimators in mixed linear models," vol. 31, no. 1, pp. 90–106, 1989. MR 1022355, doi:10.1016/0047-259X(89)90052-3.
[5] R. B. Bapat, 2nd ed., Universitext, Springer, New York, NY, USA, 2000, x+138 pp. MR 1748349.
[6] A. Ben-Israel and T. N. E. Greville, 2nd ed., CMS Books in Mathematics/Ouvrages de Mathématiques de la SMC, 15, Springer, New York, NY, USA, 2003, xvi+420 pp. MR 1987382.
[7] S. L. Campbell and C. D. Meyer, Jr., Dover, New York, NY, USA, 1991, xii+272 pp. MR 1105324.
[8] R. M. Pringle and A. A. Rayner, Griffin, London, UK, 1971.
[9] C. R. Rao and S. K. Mitra, John Wiley & Sons, New York, NY, USA, 1971, xiv+240 pp. MR 0338013.
[10] C. Stępniak, "Through a generalized inverse," vol. 41, no. 2, pp. 291–296, 2008. MR 2419906.