A FULLY PARALLEL METHOD FOR TRIDIAGONAL EIGENVALUE PROBLEM

. In this paper, a fully parallel method for finding all eigenvalues of a real matrix pencil (A, B) is given, where A and B are real symmetric tridiagonal and B is positive definite. The method is based on the homotopy continuation coupled with the strategy Divide-Conquer and Laguerre iterations. The numerical results obtained from implementation of this method on both single and multiprocessor computers are presented. It appears that our method is strongly competitive with other methods. The natural parallelism of our algorithm makes it an excellent candidate for a variety of advanced architectures.

If fli= 7 i= 0 for some i, then (I) can clearly be decomposed into two subproblems and we can solve them independently.Hence, we will assume all fli and Consider the homotopy H:Rx[0 1] R, defined by () H(A,t) det((1 t)(C-AD) + t(A det(A(t)-AB(t)), where A(t) (l-t)C+tA and B(t) (l-t)D+tB.The pencil (C,D) is called an initial pencil.
In section 2, we shall show that the solution set of ll(A,t)=O in (6) consists of continuously differentiable curves A(t), each joins an eigenvalue of (C,D) to one of (A,B).lie call each of these curves a h0m0t0py cure or an eigencurve, lle shall also show that each eigenvalue curve is monotonic in t.
And if m, the multiplicity of an eigencurve A(t) is greater than one in any subinterval of [0,1], then it must be a constant curve.In the consequence, it is an eigenvalue of (A,B) of multiplicity of m or m+l.
lie shall give the details of our algorithm in section 3 and some numerical results will be presented in section 4.
PROOF.First of all we show that A(t), a solution of H(A(t), t) 0 is real for any in [0,1].Since U(A(t),t) det(A(t)-A(t)B(t)), we only need to show that B(t) is positive definite for all in 0,1].Let A(t) be the smallest eigenvalue of B(t), then by Cauchy's interlace theorem [8], AI(O > Al(t) for all in (0,1].By Proposition 2.1 in [6], Al(t is strictly monotonic in t.Therefore, Al(t) is strictly monotone decreasing in t.
ilence A(t) > A(1) > 0 for all in [0,1) since B(1) B is positive definite, llence B(t) is positive definite for all in [0,3.Now we show that A(t) is continuously differentiable.Clearly, H(A,t) can be written as: H(l(t),t)=Cn(t)An(t)+Cn_(t)An-(t) + + cl(t)A(t)+Co(t), where ci(t)'s are polynomials in t.Let A(t) be a solution of H(A(t),t) 0 and o any point in (0,1).By Puiseux's theorem [10], A(to+ A(to)+bth + b2h + converges for sufficiently small .Let b m denote the first nonzero coefficient, then bin= lira (to + )-(to) is real since A(t0+e), A(t0) and are all real.On the other hand, is also real.Hence, (-1) X is a real number.Therefore, m must be a multiple of h.
We can continue the same argument to show that only integral powers of can have nonzero coefficients.
Therefore, A(t) must be strictly monotonic in t.
Since rank (A(t)-A(t)B(t))=n-m<n-1 for any in a,b] and A(t)-A(t)B(t) is symmetric tridiagonal, at least one of the off-diagonal elements of A(t) -(Z)B(t), say s-A(t)Vs, is equal to zero at two points, say and tj.
Hence, A(t) is constant for all in [0,1] and A(O) is an eigenvalue of (A,B) of multiplicity of m or m+l.

2,
Q.E.D. From Proposition 2.3, we will not have any those bifurcation curves in Figure If A(t) is not constant, then its multiplicity must be 1.
Contradiction. t=O t2 I Similarly, if Ai(t) and Ai+(t) are both strictly monotone decreasing, then the conclusion holds.
Q.E.D. From these propositions, we know that all eigenvalues of the initial pencil not only close to the eigenvalues of pencil (A,B), but also separate them.These results provide us very important information in designing our code.
For a given matrix pencils (A,B), let k=[n/2], then form an initial matrix pencil as in ( 4) and ( 5) in Section 1.If k is greater than 2, repeat above procedure on those submatrices, until the dimension of each submatrices is less than or equal to 2. Now, .(C,D) is the initial matrix pencil, where C drag(As, l, As, z, As, p) and D diag(Bs, l, Bs, Bs, p), p for some positive integer s.We have & tree (See Figure 6).
Since Asi and Bsi are at most 2x2 matrices, the eigenvalues of pencil (Asi, Bsi can be easily obtained. (iii) Conquer.
After all the eigenvalues of each submatrix pencil (Asi, Bsi) 1,2, p are available, we then compute all the eigenvalues of each submatrix pencil (Asi Bsi), i= 1,2 ,p/2 by simple step homotopy method with Laguerre iterations.It is sufficient to assume that all the eigenvalues of matrix pencils (All Bll and (AI, BI are available and we want to compute all eigenvalues of matrix pencil (A,B).
Let Al(t), A2(t An(t be n eigencurves of H(A,t), where Al(0 )_A2(0 )_ _ An(0 are the eigenvalues of matrix pencil (C,D).We conquer (C,D) to (A,B) with Laguerre iteration 12], i.e., use the cigenvales of (C,D) as starting points of Laguerre iterations to compute all eigenvalues of (A,B).
(a) Some knowledge of the Laguerre iteration.Let f(A) be a function having n real zeros A < A < < A n and z 0 lie between A and Ai+t, then Laguerre iteration gives the two sequences zm + Zm () One of which converges to Ai, another to Ai+ monotonicly.In the neighborhood of a simple zero at Ai, (7) converges cubicly to A i. On the other hand, in the neighborhood of a zero A of multiplicity r, Zm+ z m nf(Zm) I'(m) + "i-(l'(m))/(" ")((l,(m))2 "I(m)I"(m)) also converges to A cubicly.

(b) Computing /(A) if(A) and
Let .f(A)det{A-AB}, then all the zeros of /(A) are real.Since

A-AB
we have the following three term recursion.
(c) The overflow and the underflow control.Since ](A) det(A-AB), f(A) could be very small or large.Since f(A) can be written as () ni oi(-), l'(,x E j n # j(-i) and n n n ai(A-i) I"(X) Et= Ej= hi#j, t Ve can see that we can find # such that I(A) o100 I'(A) 7100 and f'(A) 6100 where , 7 and 6 have magnitude between the machine minimum and maximum numbers.Hence, when we compute pm(A), p(A) and p(A), if pm(A) is too large (or too small), then we multiply pro(A), p(A) and p(A) by 100 for some integer so that pm(A)100 is not too large (or too small).In this way, we can avoid overflow as well as underflow.Finally, we will get n, n(A) and n(A) and all of them have magnitude between machine minimum and maximum numbers and /(A) =lO*p" n ,I'(A)= 10" n and if(A) I0 s n, for some integer s.Since Zrn + Zrn n.f(zm) f'(Zm) /(n--1)S(f'(Zm)) s-n(n-l).f(Zm)f"(Zm)-.() -' P n(Zm) + 1)2( n(Zm)) -n(n-1)' (Zm)''/(::m) overflow and underflow can be avoided.Now we give details of computing eigenvalues of (A,B).
Step 1.If i 1, go to step 2. Use 1(0) as a starting point to do Lguerre iteration to locate p, the smallest eigenvalue of (A,B).By computing f(A;(O)), we know exactly if {Zrn}, the sequence generated by Laguerre iterations, will converge to Pl" If it does not, then use At(O)-a as a starting point, where a 2ik+x/Tk+x if 7k+lO and a 2iD&+I if 7k+t=O.If AI(O < /l {Zm} will converge to Step 2. If Ai(O) is not a simple eigenvalue of (C,D), then go to step 3.If   IAi(O)-pi_ >, where tolerance* max{IAn(O) l,lAl(O)l }, use Ai(O) as a starting point.Then, {Zrn} will either converge to Pi or Pi+I' since the starting point according to the propositions 2.2 and is either in the neighborhood of Pi or Pi+l 2.4 in section 2.
If it converges to Pi' then set i=i+1, go to step 5.If it converges to Pi+l' then use (i+l +i-l)/2 as a starting point to do Laguerre iteration and the new sequence {zm} will converge to i" Set i=i + 2, go to step 5.
Step 3. If the multiplicity r of i(0) is equal to 2, go to step 4. by the propositions in last section, Ai(O) is an eigenvalue of (A,B).The generalized Sturm sequences at i(O)+/are computed on (A,B) to check the multiplicity of Ai(O).t/e may have following cases according to the proposition 2.3.
Step 4. The generalized Sturm sequence at i(0) e are computed on (A,B) to check if i(0) is an eigenvalue of (A,B).If it is not a eigenvalue of (A,B), then Two sequences {:rn} and {z} will be generated by Laguerre iterations with Ai(0) as a starting point.One of them converges to #i and another to Pi+I" Then set i=i+2, go to step 5.
If i(0) is an eigenvalue of (A,B), then we amy have following cases according to the proposition 2.2.
If the eigenvectors are required, compute them by inverse iteration. 4. NUIIERICAL RESULTS In this section, we present our numerical results.Our homotopy algorithm is in its preliminary stage, and much development and testing are necessary.But the numerical results on the examples we have looked at seem remarkable.I/e implemented our algorithm on 50 pencils (A,B) on each different dimension, where A is symmetric tridiagonal random matrix with both diagonal and off-diagonal elements being uniformly distributed random numbers between 0 and 1. B is a symmetric tridiagonal matrix with off-diagonal elements, 7i, being uniformly distributed random numbers between 0 and 1, and its diagonal elements 6i 2rnaz(Ti, Ti + 1)" The computations were done on a Sun SPARC station 1.

Table
shows the results of our algorithm HILST and the algorithm ILS6 in EISPACK [4].The algorithm RSG first reduces Az=ABz to y=Ay, then solves it.
Since is a full matrix, the RSG is unattractive for this problem as we mentioned in section 1.
But the RSG is the only algorithm available for this problem in EISPACK.EXPERIMENT 2. (a) A is the Toeplitz matrix [1, 2, 1], i.e., all diagonal elements, a(i), are 2 and off-diagonal elements are 1.B I. (b) A is the Wilkinson matrix, i,e., the mtrix [1, a(i), 1], where a(i) abs((n+l)/2 -i) i=1,2 n with n odd.B=I.
Table 2 shows the computational results comparing our algorithm HRST with the bisection algorithm DSTEBZ in LAPACK [1] on these two problems.It appears that our algorithm leads in speed by a considerable margin in comparison with the DSTEBZ.EXPERIMENT 3. Matrices A and B are obtained from piece wise linear finite element [11] discretization of the Sturm-Liouvi11 problem -d (p(x))+q(x)u-- where u=u(z), 0 < x<r and u(O) u'(x)=0 and p(x) > O. Here, both A and B are symmetric tridiagonal and positive definite.We use p(z) and q(x)=6.
The computations were executed on BUTTERFLY GP 1000, a shared memory multiprocessor machine.
The speed-ap is defined as execution time using one processor Sp= execution time using p processors and the effscaencl is the ratio of the speed-up cover p. Table 3: Execution time(second), speed-up and efficiency of the algorithm HRST.

Order
Table 3 shows the execution time and the speed-up Sp as well as the efficiency Sp/p of our algorithm.
It appears that our algorithm is very efficient.
The natural parallelism of our algorithm makes it an excellent candidate for multiprocessor machines.
Figure 5 3. ALGORITHM Our algorithm is based on following steps.
From Proposition 2.4, if Ai(t) and Ai+(t) are both constants or both are not constants, then they must be disjoint.However, if one of them is constant, they

Table l :
Average execution time (second) of computed eigenvalues and eigenvectors.