FAST INTERSECTION METHODS FOR THE SOLUTION OF SOME NONLINEAR SYSTEMS OF EQUATIONS

We give a fast method to solve numerically some systems of nonlinear equations. This method applies, basically, to all systems which can be put in the form U ◦ V(X) = Y, where U and V are two possibly nonlinear operators. It uses a modification of Newton's algorithm, in the sense that one projects alternately onto two subsets. But, here, these subsets are not subspaces any more, but manifolds in a Euclidean space.


1. Introduction
As we will see below (Section 3), a water distribution problem can be put in the form

Λ f(AX + B) = D, (1.1)

where Λ : n × N, A : N × n, X : n × 1, B : N × 1, and D : n × 1 are matrices with real entries. The unknowns X represent the pressures (to be determined) at all points of the water distribution system. The matrices Λ, A, B, D are known.
The function f is defined by the formula

f(x) = x |x|,

and this is a differentiable increasing function which is invertible. The inverse is the function

g(y) = sign(y) √|y|.

In problem (1.1), the meaning of f(X), if X = (x_1,...,x_N)^t, is that f is applied to each coordinate:

f(X) = (f(x_1),..., f(x_N))^t. (1.5)

So, we see that problem (1.1) is well posed: there are as many unknowns as equations, but the problem is nonlinear, due to the presence of the function f. If f were a linear function, the problem would easily be solved by the usual techniques of linear algebra.
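Under the assumption, consistent with the physical model of Section 3, that f acts componentwise as f(x) = x|x|, the pair (f, g) can be sketched numerically as follows (a minimal illustration; the function names are ours):

```python
import numpy as np

# f(x) = x|x| applied componentwise (assumed form, matching the
# head-loss law g_{i,j}(x) = lambda_{i,j} x|x| of Section 3).
def f(x):
    return x * np.abs(x)

# Its inverse: g(y) = sign(y) * sqrt(|y|).
def g(y):
    return np.sign(y) * np.sqrt(np.abs(y))

x = np.array([-2.0, 0.0, 1.5])
assert np.allclose(g(f(x)), x)   # g inverts f componentwise
assert np.allclose(f(g(x)), x)   # and conversely
```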
The classical approaches to problem (1.1), as it stands here, are by means of optimization techniques. Indeed, if one wants to solve G = 0, one looks for the minima of F, where F = ‖G‖². See [5, 6, 10] for precise descriptions of these algorithms.
Techniques for nonlinear optimization work in a satisfactory manner for problems of limited size, but here the number of unknowns, for large systems, may be of order 10,000, 100,000, or more. So fast techniques are needed if one wants the computation to be performed in a reasonable time. The technique we present here is much faster than optimization tools, for it uses only intersections of subspaces; see [8].

2. The intersection procedure
We introduce two subsets of R^N:

F = {Y; ΛY = D},   G = {Y; Y = f(AX + B) for some X ∈ R^n}. (2.1)

We look for the intersection of F and G. Indeed, if Y ∈ F, ΛY = D, and if Y ∈ G, there exists X such that Y = f(AX + B); therefore we get (1.1). The set F is an affine subspace of R^N (a translate of a linear subspace). The set G is not an affine subspace, due to the function f, which is not linear; it is, precisely, a manifold contained in R^N. But, as we will see later, we will still work on an affine subspace. We still speak of a "subspace," though it is formally a manifold.
Because N is usually much larger than n, neither F nor G is reduced to single points. They are usually large-dimensional subspaces. Therefore, it would be unwise to try to get a complete formal description of both of them and then take the intersection. Such an approach would be very slow, costly, and quite useless: we want to find the intersection, not to describe both sets.
In order to find the intersection of F and G, we will use a Newton-type algorithm. We start at any given point and we project alternately upon F and G. We now describe this procedure.

2.1. Projecting upon the affine subspace F. The orthogonal projection onto a linear or affine subspace is quite simple, conceptually speaking. But an efficient computer implementation requires some precautions.

Bernard Beauzamy 129
Let, as before,

F = {Y; ΛY = D},

where Λ : n × N, Y : N × 1, and D : n × 1 are matrices with real entries. We write Λ in the form

Λ = RQ, (2.3)

where Q is a square matrix, N × N and orthogonal, and R is of the form (R_1, O), where R_1 is a lower-triangular, invertible, square n × n matrix and O is an n × (N − n) matrix entirely made of zeros. This decomposition follows from the classical QR decomposition, taking transposes. So we get

RQY = D,

and, left-multiplying by R_1^{-1} and setting Z = QY, (d_1,...,d_n)^t = R_1^{-1} D, the system becomes

z_i = d_i, i = 1,...,n;   z_j free, j = n + 1,...,N. (2.5)

There are n equations of the first type and N − n of the second type. Let (e_1,...,e_N) be the canonical basis of R^N. We construct a basis for F: set f_j = Q^{-1}(d_1 e_1 + ··· + d_n e_n + e_j), j = n + 1,...,N. These vectors are clearly independent. Moreover, Λ f_j = D for each j, which shows that they are a basis for F.
Let Y be any point of R^N and let Y' be its projection onto F; it is characterized by the conditions "Y' − Y orthogonal to F" and "Y' ∈ F." Since the matrix Q is orthogonal, it preserves the Euclidean distance, so it is equivalent to project Z = QY onto QF; but QF is simply the set of points whose first n coordinates are d_1,...,d_n, the remaining ones being free. So the projection Z' of Z is given explicitly by

z'_i = d_i, i = 1,...,n;   z'_j = z_j, j = n + 1,...,N. (2.13)

This gives explicitly the projection. Then we find Y' taking Y' = ^tQ Z': the matrix Q is orthogonal and so its inverse is equal to its transpose.
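The whole procedure (decomposition of Λ, then projection) can be sketched with NumPy; `project_onto_F` and the test data are illustrative names of ours, and we check the result against the direct normal-equation formula for the projection onto an affine subspace:

```python
import numpy as np

def project_onto_F(Y, Lam, D):
    """Orthogonal projection of Y onto F = {Y : Lam @ Y = D},
    via the decomposition Lam = R Q obtained from the QR
    factorization of Lam^T."""
    n, N = Lam.shape
    Qt, Rt = np.linalg.qr(Lam.T, mode='complete')   # Lam^T = Qt @ Rt
    Q = Qt.T                                        # N x N, orthogonal
    R1 = Rt[:n, :].T                                # n x n, lower triangular
    d = np.linalg.solve(R1, D)                      # fixed first n coordinates
    Z = Q @ Y
    Z[:n] = d                                       # coordinates j > n are kept
    return Q.T @ Z                                  # Q^{-1} = Q^t

rng = np.random.default_rng(0)
n, N = 3, 8
Lam = rng.standard_normal((n, N))
D = rng.standard_normal(n)
Y = rng.standard_normal(N)
Yp = project_onto_F(Y, Lam, D)
assert np.allclose(Lam @ Yp, D)                     # Yp lies in F
# same point as the direct normal-equation projection:
Yp_direct = Y - Lam.T @ np.linalg.solve(Lam @ Lam.T, Lam @ Y - D)
assert np.allclose(Yp, Yp_direct)
```

The point of the QR route is that, once the decomposition of Λ is stored, each projection costs only one multiplication by an orthogonal matrix, instead of a new linear solve.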
Remark 2.1. When F is a single hyperplane, F = {Y; a_1 y_1 + ··· + a_N y_N = c}, the projection can be written directly: Y' ∈ F, and Y' − Y is collinear with the vector (a_1,...,a_N). So we have

Y' = Y − ((a_1 y_1 + ··· + a_N y_N − c)/(a_1² + ··· + a_N²)) (a_1,...,a_N). (2.15)
We may use this approach when F is no longer a hyperplane, but any affine subspace. Indeed, any affine subspace is an intersection of hyperplanes (if the subspace has codimension p, it is the intersection of p hyperplanes). So we can project onto the hyperplanes, one after the other. But, in this way, the point we obtain is not exactly inside the intersection; it only gets closer to it when we iterate the algorithm.

Remark 2.2. It will be convenient to choose a basis for which Q = I if we want to work on F. But one cannot do this for both F and G at the same time.
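The explicit hyperplane projection above translates into a one-line routine (a sketch with illustrative data):

```python
import numpy as np

def project_onto_hyperplane(y, a, c):
    """Project y onto the hyperplane {x : a . x = c}."""
    return y - (np.dot(a, y) - c) / np.dot(a, a) * a

a = np.array([1.0, 2.0, -1.0])
y = np.array([3.0, 0.0, 1.0])
yp = project_onto_hyperplane(y, a, c=4.0)
assert np.isclose(np.dot(a, yp), 4.0)          # yp lies on the hyperplane
assert np.allclose(np.cross(y - yp, a), 0.0)   # y - yp is collinear with a
```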

2.2. Projection onto the manifold G. First, we define an affine subspace

H = {Z ∈ R^N; Z = AX + B for some X ∈ R^n}. (2.16)

It will be convenient to put H in the previous form, so we can argue as we did for F:

H = {Z; UZ = V}, (2.17)

where U is a given n × N matrix and V is a given n × 1 matrix (i.e., a column vector).

This is done in the following way: Z belongs to H if and only if Z − B belongs to the range of A, which is the orthogonal complement of Ker ^tA (see, e.g., Beauzamy [2]), and so this happens if and only if P_o(Z − B) = 0, where P_o is the orthogonal projection onto Ker ^tA. This gives the expression (2.17).
In order to project onto G, we set Y = f(Z), Z ∈ R^N, and we look for Y' in the form Y' = f(Z'), Z' ∈ H. We take as a distance

dist(Y_1, Y_2) = ‖g(Y_1) − g(Y_2)‖.

This is not the usual Euclidean distance, but the set intersections will be the same.
In order to project the point Y onto the manifold G, we project the point Z = g(Y) onto H for the usual metric (recall that g is the inverse function of f). So we find the projection Z' and we take Y' = f(Z').
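A sketch of this projection onto G, assuming f(x) = x|x| componentwise as in Section 3, and computing the orthogonal projection onto H = {AX + B} by least squares (the helper names are ours):

```python
import numpy as np

def f(x):                                  # assumed componentwise form
    return x * np.abs(x)

def g(y):                                  # inverse of f
    return np.sign(y) * np.sqrt(np.abs(y))

def project_onto_G(Y, A, B):
    """'Projection' of Y onto G = f(H), H = {A X + B}:
    move to Z = g(Y), project orthogonally onto H, come back with f."""
    Z = g(Y)
    X, *_ = np.linalg.lstsq(A, Z - B, rcond=None)  # argmin ||A X - (Z - B)||
    Zp = A @ X + B                                 # orthogonal projection onto H
    return f(Zp)

rng = np.random.default_rng(1)
N, n = 8, 3
A = rng.standard_normal((N, n))
B = rng.standard_normal(N)
Y = rng.standard_normal(N)
Yp = project_onto_G(Y, A, B)
# Yp is on the manifold G: g(Yp) - B lies in the range of A
X_res, *_ = np.linalg.lstsq(A, g(Yp) - B, rcond=None)
assert np.allclose(A @ X_res + B, g(Yp))
```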
So the algorithm is finally described as the iteration of the procedure

Π = P_F ◦ f ◦ P_H ◦ g,

where Π is a nonlinear operator. Quite clearly, any fixed point of Π is a solution to our problem, since it is in the intersection of F and G. Indeed, let x be a fixed point of Π.
Then clearly x ∈ F and f ◦ P_H ◦ g(x) = x, which implies P_H ◦ g(x) = g(x), which shows that g(x) ∈ H. But then x belongs to g^{-1}(H) = f(H) = G, which proves our claim.

We observe that the nonlinear operator Π is not a contraction in general; that is, it does not satisfy the property

‖Π(x) − Π(y)‖ ≤ ‖x − y‖

for all x and y. Indeed, this property would imply, for all points x and y in F,

‖f ◦ P_H ◦ g(x) − f ◦ P_H ◦ g(y)‖ ≤ ‖f ◦ g(x) − f ◦ g(y)‖,

that is, P_H would be a contraction for the distance dist(ξ, η) = ‖f(ξ) − f(η)‖, and this property is not true in general: the orthogonal projection onto a subspace is not a contraction for a distance which is not the Euclidean distance (see Beauzamy [1] and Beauzamy and Maurey [3]). Since the orthogonal projection is not a contraction, a fortiori it is not a strict contraction; that is, it does not satisfy

dist(P_H(ξ), P_H(η)) ≤ c dist(ξ, η) (2.24)

for some c < 1, so the usual theorems about fixed points for strict contractions do not apply here (see, e.g., Goebel and Kirk [7]).
So we leave as an open question the convergence of the algorithm as it is defined here. In practice, other choices can be made for the projection onto G (for instance, moving in some direction if necessary), and one can easily construct algorithms for which the sequence of iterates x_n converges.
Indeed, let P_G be the projection of closest approximation onto the manifold G, that is,

‖x − P_G(x)‖ = min_{y ∈ G} ‖x − y‖.

We consider the iteration of the projection onto G and then onto F, that is, P_F ◦ P_G. Then, starting from any point x and setting x_1 = P_G(x), x_2 = P_F(x_1), x_3 = P_G(x_2), and so on, the distances decrease:

‖x_1 − x_2‖ ≥ ‖x_2 − x_3‖ ≥ ‖x_3 − x_4‖ ≥ ···.

Since the intersection of F and G is nonempty, the limit of the decreasing sequence will be zero. The problem with this algorithm is that the projection P_G onto a manifold is quite hard to implement in practice and the computation will be lengthy. This is why we chose to project onto a subspace.
As it stands, the algorithm we describe here is fast. The matrix Λ being given, the decompositions (2.3) and (2.5) are computed only once. Then, for any point Z, the projection Z' onto F, given by formulas (2.13), is obtained as a sum involving only one matrix multiplication. Such an operation is extremely fast, and problems with N ≈ 10^5 can be handled in milliseconds. Moreover, standard routines exist for such tasks.
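The complete iteration can be sketched as follows; for brevity we compute P_F by the normal equations rather than by the QR route (the two are mathematically equivalent), and we assume f(x) = x|x|. The sketch only verifies the fixed-point property established above: a solution of (1.1) is left fixed by Π.

```python
import numpy as np

def f(x):                                 # assumed componentwise nonlinearity
    return x * np.abs(x)

def g(y):                                 # inverse of f
    return np.sign(y) * np.sqrt(np.abs(y))

def P_F(Y, Lam, D):
    """Orthogonal projection onto F = {Y : Lam @ Y = D} (normal equations)."""
    return Y - Lam.T @ np.linalg.solve(Lam @ Lam.T, Lam @ Y - D)

def P_H(Z, A, B):
    """Orthogonal projection onto H = {A @ X + B}, via least squares."""
    X, *_ = np.linalg.lstsq(A, Z - B, rcond=None)
    return A @ X + B

def Pi(Y, Lam, D, A, B):
    """One step of the iteration: Pi = P_F o f o P_H o g."""
    return P_F(f(P_H(g(Y), A, B)), Lam, D)

rng = np.random.default_rng(2)
n, N = 3, 8
Lam = rng.standard_normal((n, N))
A = rng.standard_normal((N, n))
B = rng.standard_normal(N)
X_true = rng.standard_normal(n)
Y_star = f(A @ X_true + B)                # a point of G
D = Lam @ Y_star                          # consistent data: Y_star solves (1.1)

# A solution of (1.1) lies in both F and G, hence is a fixed point of Pi.
assert np.allclose(Pi(Y_star, Lam, D, A, B), Y_star)
```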

2.3. Taking constraints into account.
In practice, constraints are added to problem (1.1): the flow is limited, the water is compelled by a valve to run in one direction only, the pressure is limited, and so on. These constraints appear as follows:
(i) constraints upon the elements of X, such as x_i ≥ 0 or x_i ≤ C;
(ii) constraints upon the elements of Q = AX + B, such as q_j ≥ 0 or q_j ≤ C;
(iii) the elements of the matrix Λ may contain variable coefficients: in the formula ΛY = D, some of the λ_{i,j} depend on Y (in practice, they may take only two values, depending only on one variable).
The first two types of constraints are easily handled by our method: instead of an affine subspace, one only has a "piece" of a subspace, that is, the intersection with half-spaces. We now see how one handles the third type, which is more difficult. Some lines of the matrix Λ are still constant; we treat them as before (each line gives a hyperplane).
For a line containing a variable coefficient, say λ_1, which takes one value when y_1 > 0 and another when y_1 < 0, we have two possible hyperplanes.

We project the point Y upon the first one, obtaining Y', and upon the second one, obtaining Y''.
Several cases may occur:
(i) y'_1 > 0 and y''_1 > 0: only the first one is acceptable;
(ii) y'_1 > 0 and y''_1 < 0: both are acceptable and we choose the closest;
(iii) y'_1 < 0 and y''_1 > 0: none is acceptable and we project upon the hyperplane y_1 = 0;
(iv) y'_1 < 0 and y''_1 < 0: only the second one is acceptable.
So, in fact, we are working on half-hyperplanes. The principle is the same if two coefficients are variable.
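The four cases can be sketched as follows; `project_variable_coeff` is a hypothetical helper of ours, written for a single row with one variable coefficient:

```python
import numpy as np

def project_onto_hyperplane(y, a, c):
    return y - (np.dot(a, y) - c) / np.dot(a, a) * a

def project_variable_coeff(y, a_plus, a_minus, c, k=0):
    """Hypothetical helper: coefficient k of the row is variable, so the
    hyperplane a_plus . x = c is valid where y_k > 0 and the hyperplane
    a_minus . x = c is valid where y_k < 0. Implements the four cases."""
    yp = project_onto_hyperplane(y, a_plus, c)    # candidate for y_k > 0
    ym = project_onto_hyperplane(y, a_minus, c)   # candidate for y_k < 0
    ok_p, ok_m = yp[k] > 0, ym[k] < 0
    if ok_p and ok_m:          # case (ii): both acceptable, take the closest
        return yp if np.linalg.norm(y - yp) <= np.linalg.norm(y - ym) else ym
    if ok_p:                   # case (i): only the first is acceptable
        return yp
    if ok_m:                   # case (iv): only the second is acceptable
        return ym
    out = y.copy()             # case (iii): none acceptable, project on y_k = 0
    out[k] = 0.0
    return out

y = np.array([5.0, 0.0])
res = project_variable_coeff(y, np.array([1.0, 1.0]), np.array([2.0, 1.0]), c=1.0)
assert res[0] > 0 and np.isclose(np.dot([1.0, 1.0], res), 1.0)  # case (i)
```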
One might make a continuous (or even differentiable) junction between the two pieces and project onto this differentiable manifold, but this would be much more expensive to compute. In our case, we may even predict which piece of the hyperplane will be used.

3. Modeling the physical problem
In this section, we explain how (1.1) is obtained from the physical problem; further reading may be found in [4, 9, 11]. We have a water distribution problem, with sources, consumers, and pipes in between, with many interconnections. A node is a place where a consumer or a source stands. There are S sources and N consumers (nodes which are not sources).
We define the following:
(i) t_{i,j} is the physical pipe from node i to node j;
(ii) q_{i,j} is the flow from node i to node j; some orientation is chosen, and this flow is positive or negative;
(iii) λ_{i,j} is a physical coefficient on the pipe t_{i,j};
(iv) c_i is the water consumption at node i;
(v) C is the total water consumption of the water distribution system, C = Σ_{i=1}^N c_i;
(vi) H_i is a physical coefficient associated with the node i;
(vii) T is the number of pipes in the system; we count the pipes which go from a source to a node.
We have the following equations.
(i) At each node i,

Σ_j q_{j,i} = c_i. (3.1)

This equation says that the (algebraic) sum of all quantities of water arriving at a node is equal to the consumption at this node.
(ii) For each pipe t_{i,j},

g_{i,j}(q_{i,j}) = H_i − H_j, (3.2)

with g_{i,j}(x) = λ_{i,j} x|x|.
(iii) For the sources, the coefficients H_s are known.
We want to compute all the flows q_{i,j} in all pipes, the flow coming out of each source, and the coefficients H_i at each node. So, altogether, we have T + S + N unknowns. From (3.2), we deduce that for any closed circuit m (a closed circuit is called a mesh in the technical vocabulary),

Σ_{(i,j)∈m} λ_{i,j} q_{i,j} |q_{i,j}| = 0. (3.4)

The sum is computed after a sense is defined on the mesh. For instance, a mesh made of four nodes 1, 2, 3, 4 leads to the equation

λ_{1,2} q_{1,2}|q_{1,2}| + λ_{2,3} q_{2,3}|q_{2,3}| + λ_{3,4} q_{3,4}|q_{3,4}| + λ_{4,1} q_{4,1}|q_{4,1}| = 0 (3.6)

or rather, taking into account the orientations actually chosen for the pipes,

λ_{1,2} q_{1,2}|q_{1,2}| + λ_{2,3} q_{2,3}|q_{2,3}| + λ_{4,3} q_{4,3}|q_{4,3}| − λ_{1,4} q_{1,4}|q_{1,4}| = 0. (3.7)

Let M be the number of meshes in the system. In order to write a general equation describing the net, we define a T × M matrix W as follows:
(i) if the pipe t does not belong to the mesh m, then W_{t,m} = 0;
(ii) if the pipe t belongs to the mesh m, then W_{t,m} = ±1, depending on whether they have the same orientation or not.
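The mesh equation (3.4) is a pure telescoping consequence of (3.2), which a few lines of NumPy can confirm on the four-node mesh above (the numerical data are arbitrary and illustrative):

```python
import numpy as np

# Verify (3.4) on the four-node mesh 1 -> 2 -> 3 -> 4 -> 1:
# by (3.2), lambda_{i,j} * q_{i,j}|q_{i,j}| = H_i - H_j, and these
# differences telescope around the closed circuit.
H = np.array([7.0, 3.0, -2.0, 5.0])            # arbitrary node coefficients
lam = np.array([1.0, 2.0, 0.5, 4.0])           # arbitrary pipe coefficients
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]       # mesh orientation

total = 0.0
for (i, j), l in zip(edges, lam):
    q_abs_q = (H[i] - H[j]) / l                # q|q| recovered from (3.2)
    total += l * q_abs_q                       # one term of (3.4)
assert np.isclose(total, 0.0)                  # the sum around the mesh vanishes
```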
We also introduce a T × S matrix U, whose term U_{t,s} is obtained as follows:
(i) each mesh of the water system is "broken," withdrawing one pipe in each mesh;
(ii) all sources are turned off, except the source s;
(iii) the flows in all pipes are computed starting from the extremities of the tree and moving upwards, adding the consumptions to be satisfied.
Then the coefficient U_{t,s} is the flow in the pipe t, with only the source s active, after normalization, that is, after division by the total consumption C.

Let

q = (q_1,...,q_S)^t (3.8)

be the column vector of the contributions of the sources (which are unknown) and let

V = (V_1,...,V_M)^t (3.9)

be the column vector of the additional flows inside the meshes, when one puts back the pipes which were withdrawn (see above). These flows are of course unknown.
Then the basic equation is

Q = Uq + WV, (3.10)

where Q is the column vector of the flows in all pipes. We can rephrase our problem as follows. Find the q_s and the V_m (a total of S + M unknowns) such that

Σ_s q_s = C,
Σ_{path S_1 → S_i} λ_{i,j} q_{i,j} |q_{i,j}| = H(1) − H(i),
Σ_{mesh m} λ_{i,j} q_{i,j} |q_{i,j}| = 0. (3.12)

The previous equations can be written as

Σ_{t=1}^T μ_{k,t} q_t |q_t| = d_k, k = 1,...,n,

where the index t replaces the (i, j) of the pipes t_{i,j} after sequential numbering, μ_{k,t} = 0 or ±λ_t, and d_k is the corresponding right-hand side.
We set

X = (q_1,...,q_S, V_1,...,V_M)^t, n = S + M.

So X is an n-dimensional vector.
Using equation (3.10), we find that Q can be written as

Q = AX + B,

where the matrix A and the vector B are built from U, W, and the known data; this is exactly the form required in problem (1.1).