The Ritz method with Lagrange multipliers

We develop a general form of the Ritz method for trial functions that do not satisfy the essential boundary conditions. The idea is to treat the latter as variational constraints and remove them using Lagrange multipliers. In multidimensional problems, boundary weight functions also have to be selected, in addition to the trial functions, to approximate the boundary conditions. We prove convergence of the method and discuss its limitations and implementation issues. In particular, we discuss the required regularity of the variational functional, the completeness of systems of trial functions, and conditions for consistency of the equations for the trial solutions. The discussion is accompanied by a detailed examination of examples, both analytic and numerical, to illustrate the method.


Introduction
In variational problems linear boundary conditions are often divided into essential (geometric) and natural (dynamic) [1,II.12], [2, 4.4.7]. More generally, one calls the boundary conditions essential if they involve derivatives of order less than half of the order of the differential equation, and natural otherwise [3, I.1.2]. The common lore on the Ritz method is that the trial functions may violate the natural conditions, but must satisfy all the essential ones [2, 4.4.7], [4]. The reason is that the variational equations force the natural conditions on the trial solutions anyway, even if the trial functions themselves do not satisfy them.
But what if we wish to use trial functions that violate the essential conditions as well? For instance, in problems involving parametric asymptotics the trial functions are pre-imposed with no regard for boundary conditions [5,6], and in initial-boundary problems with time-dependent boundary conditions the (time independent) trial functions can not satisfy them in principle. One may also wish to use such violating trial functions because they are simpler. Sure, it is easy enough to adjust them in one-dimensional examples, but it is not so easy at all in higher dimensions, especially in problems with fancy boundaries. Thus, there is abundant motivation to generalize the Ritz method to trial functions that do not satisfy the essential conditions. However, surprisingly few authors consider such generalizations, many works on the subject are rather old, and some give prescriptions that produce inconsistent systems and/or erroneous approximations [7,8].
This is not to say that nothing has been done at all. One idea is to treat the essential boundary conditions as variational constraints and to remove them as any other constraints using the Lagrange multipliers. This idea is so natural that it appears occasionally in some applied works at least since 1946, see [9], where the authors explicitly cite the simplicity of the trial functions that violate the essential conditions as a reason for using them. Violating trial functions were even used in the boundary eigenvalue problems for vibrating plates, see [10] and references therein.
However, in all of these works Lagrange multipliers are essentially applied by analogy to finite dimensional optimization, numerical success serving as the only validation.
A systematic discussion, or even a general description of the method, accounting for its requirements and limitations seems to be missing in the literature. This is all the more surprising since, as we shall see, the standard theory of the Ritz method [11,12,13] can be readily accommodated to handle such a generalization.
A very interesting paper [14] discusses one particular issue that arises when using the Lagrange multipliers in one-dimensional boundary value problems, completeness of the trial functions. We will discuss this and other issues that come up, extend the method to higher dimensions and give a general proof of its convergence. Our analysis indicates that more care is needed compared to the usual Ritz method, to wit, the variational functional has to be more regular on a larger space, the trial functions have to be complete in this larger space as well, and the multipliers can not be eliminated from the approximating systems using the usual variational formulas because of convergence issues. In the higher dimensional problems one needs to select boundary weight functions in addition to the trial functions, and balance the numbers of each to obtain well-posed approximating problems.
We try to keep our discussion more suggestive than technical, so the paper is structured somewhat unusually. We start in Section 2 by presenting some simple one-dimensional examples, and apply to them some natural looking approximating procedures recommended in the literature when the trial functions do not satisfy the essential boundary conditions. The examples are so simple that not only the exact solution, but even all trial solutions can be computed analytically. Distressingly, in many cases the approximations do not exist, do not converge, or converge to a wrong answer. The reasons turn out to be subtle, but they all can be traced to the vagueness in the concept of "approximation". In Sections 3, 4 we flush out the false assumptions behind the failed approximations and develop the Ritz-Lagrange method that avoids them, proving along the way that it works. Unfortunately, in its original form the method only applies to one-dimensional problems, so further development and proofs follow in Section 5 for higher dimensions. Numerical applications to multidimensional problems follow in Section 6. The paper ends with Conclusions, where we summarize our findings and discuss Galerkin type generalizations. Technical proofs are collected in the Appendix.

One-dimensional examples
In this section we test several reasonable looking methods on simple one-dimensional problems. Not all of them work and, as we will find out later, some of them work for the wrong reasons. These examples will highlight some subtleties and serve as a motivation for developing a general method later on. A typical procedure for numerical solution of boundary value problems is to represent the trial solution as a linear combination of finitely many trial functions and solve the finite dimensional problem that results. Most frequently the trial functions are required to satisfy the essential boundary conditions, otherwise some additional effort is needed to take care of them.

Problem 1. Consider a boundary value problem for the second-order equation u_xx = f on [0, L] with essential boundary conditions on both ends of the interval, u(0) = u(L) = 0. We set for convenience L = π, and f = 1 to make everything explicitly computable. The exact solution is easily found to be u = x(x − π)/2.

Lanczos tau method. This is perhaps the most popular method that uses trial functions not satisfying essential conditions. The reader is warned that we use the broad understanding of the method, as in [5,6] (review [7] also describes it without using the name). Many authors, including Lanczos himself, understood it much more narrowly, only allowing Chebyshev polynomials as trial functions, see [15]. The basic idea is to make the residual, the difference between the left and the right hand sides of the equation, orthogonal to only some of the trial functions used in the trial solution. This creates a shortfall in the number of equations compared to the number of unknown coefficients, and the shortfall is used to impose the essential boundary conditions directly.
In our case the system to be solved can be written succinctly as

∫_0^π (u^(N)_xx − f) δu dx = 0,   u^(N)(0) = u^(N)(π) = 0,   (1)

where selected trial functions are substituted for the weight δu. We select cos nx, n ≥ 0 as our trial functions; they obviously do not satisfy the boundary conditions. Taking N of them, the trial solution is of the form u^(N) = a_0/2 + Σ_{n=1}^N a_n cos nx with unknown coefficients a_i (the 1/2 in front of a_0 is for agreement with the convention for the cosine series). Since there are two boundary equations we need to omit two trial functions when choosing weights, e.g. take δu = cos mx, m = 0, …, N − 2 in Eq. (1).

This gives

−(π/2) m² a_m − ∫_0^π cos mx dx = 0,   m = 0, …, N − 2.   (2)

However, these equations are already inconsistent for any N: since ∫_0^π cos nx dx = 0 for n ≥ 1, the m = 0 equation becomes ∫_0^π (−1) dx = 0. If we choose a different subset of trial functions as weights, say m = 2, …, N, then the equations are consistent, but we find −(π/2) m² a_m = 0, i.e. a_m = 0 for m ≥ 2. The boundary conditions then imply that also a_0 = a_1 = 0, yielding u^(N) = 0 for all N.
One can see that any choice of weights here leads to inconsistency or to the trivial solution. The Lanczos tau method fails completely for our choice of trial functions.
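The failure is easy to check directly. The sketch below (assuming NumPy) assembles the tau system analytically, using ∫_0^π cos nx cos mx dx = (π/2) δ_nm for n, m ≥ 1: the m = 0 row of the matrix is identically zero while its right hand side is π, so the system contains the unsolvable equation 0 = π for every N.

```python
import numpy as np

# Tau system for u_xx = 1 on [0, pi] with trial solution
# u = a_0/2 + sum_{n=1}^N a_n cos(n x) and weights cos(m x), m = 0..N-2.
# Row m:  sum_n a_n * A[m, n] = b[m],  with
#   A[m, n] = int_0^pi (cos n x)'' cos m x dx = -(pi/2) n^2 delta_{mn}  (n >= 1)
#   b[m]    = int_0^pi 1 * cos m x dx      = pi * delta_{m0}
N = 8
A = np.zeros((N - 1, N + 1))   # rows: m = 0..N-2; columns: n = 0..N
b = np.zeros(N - 1)
for m in range(N - 1):
    for n in range(1, N + 1):
        if n == m:
            A[m, n] = -(np.pi / 2) * n**2
    b[m] = np.pi if m == 0 else 0.0

# The m = 0 row vanishes (every (cos n x)'' integrates to 0 over [0, pi]),
# but its right hand side is pi: the equation reads 0 = pi.
inconsistent = np.allclose(A[0], 0.0) and b[0] > 0
```

Shifting the weights to m = 2, …, N instead removes the bad row, but then every remaining equation reads −(π/2) m² a_m = 0, reproducing the trivial solution found above.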
Ritz method with boundary terms. Let us turn to the Ritz method now. The corresponding variational functional is

J(u) = ∫_0^L (½(u_x)² + f u) dx,

and the boundary value problem is equivalent to minimizing it on functions satisfying the boundary conditions. Since our boundary conditions are essential, and our trial functions do not satisfy them, the standard approach has to be modified in some way. It turns out that not every plausible modification works. One obvious idea is to treat the essential conditions as variational constraints and remove them using Lagrange multipliers. The Lagrange functional is L = J + λ_0 u(0) + λ_π u(π), where λ_0, λ_π are the Lagrange multipliers, and the variational equation is

∫_0^π (u_x δu_x + f δu) dx + λ_0 δu(0) + λ_π δu(π) + u(0) δλ_0 + u(π) δλ_π = 0.   (3)

Since the boundary variations are independent of the internal ones, we see after integrating by parts that λ_0 = u_x(0) and λ_π = −u_x(π). This allows us to eliminate the multipliers from Eq. (3), leading to

∫_0^π (u_x δu_x + f δu) dx + u_x(0) δu(0) + u(0) δu_x(0) − u_x(π) δu(π) − u(π) δu_x(π) = 0.   (4)

This equation is derived as a generalization of the usual Ritz system e.g. in [8]. Let us test it on our example. As before, one substitutes the trial solution u^(N) for u and the trial functions for the weight δu. With our trial functions cos mx, however, u^(N)_x and δu_x vanish at both 0 and π for all m. But then Eq. (4) reduces to the first equation in Eq. (1) and produces the same system as in Eq. (2), only with m = 0, …, N. We already saw that the m = 0 equation can not be satisfied, making the entire system inconsistent for any N. This method does not work either.
Ritz-Lagrange method. Perhaps, instead of trying to eliminate the Lagrange multipliers from Eq. (3), we should keep them as unknowns and supplement the obtained equations with the boundary conditions, just as one would do for any variational constraints. One positive trait is that the number of unknowns will always match the number of equations, since each multiplier corresponds to a boundary condition. Instead of relying on Eq. (3) let us find L(u^(N)) explicitly. Recall that u^(N) = a_0/2 + Σ_{n=1}^N a_n cos nx, so

L(u^(N)) = Σ_{n=1}^N (π/4) n² a_n² + (π/2) a_0 + λ_0 u^(N)(0) + λ_π u^(N)(π),   (5)

with u^(N)(0) = a_0/2 + Σ_{n=1}^N a_n and u^(N)(π) = a_0/2 + Σ_{n=1}^N (−1)^n a_n. The variational equations are ∂L/∂a_i = 0, and adding the boundary conditions we get the following system:

π + λ_0 + λ_π = 0
(π/2) n² a_n + λ_0 + (−1)^n λ_π = 0,   n = 1, …, N
a_0/2 + Σ_{n=1}^N a_n = 0
a_0/2 + Σ_{n=1}^N (−1)^n a_n = 0.   (6)
There are N + 3 equations for N + 3 unknowns a_0, …, a_N, λ_0 and λ_π. The first equation gives λ_0 + λ_π = −π, so for even n ≠ 0 one immediately finds a_n = 2/n². Analogously, for odd n we have a_n = −(2/(πn²))(λ_0 − λ_π). Then subtracting the second boundary equation from the first we get

(λ_0 − λ_π) Σ_{n ≤ N, n odd} 1/n² = 0.

This implies that λ_0 = λ_π = −π/2 and all the odd coefficients vanish. Thus, we have a_n = (1 + (−1)^n)/n² for n ≥ 1 and a_0^(N) = −Σ_{k=1}^{⌊N/2⌋} 1/k², where ⌊·⌋ is the floor function returning the largest integer not exceeding its argument. Summarizing, we conclude that the trial solutions u^(N) converge to the sum of the series

u^(∞) = −π²/12 + Σ_{n=1}^∞ ((1 + (−1)^n)/n²) cos nx.

Recall that by extending a square integrable function w(x) on [0, π] to an even function on [−π, π] one can expand it into a cosine series [16, 12.1] with coefficients

a_n = (2/π) ∫_0^π w(x) cos nx dx.   (7)

Considering that (2/π) ∫_0^π ½ x(x − π) cos nx dx = (1 + (−1)^n)/n² for n ≥ 1, we see that u^(∞) is exactly the cosine series for the exact solution u = x(x − π)/2. Finally, we have a method that works, for this example at least. To make sure it was not an accident we will now apply it to some other boundary value problems.
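This solution process is easy to verify numerically. The sketch below (assuming NumPy; the truncation order N is arbitrary) assembles the (N + 3) × (N + 3) Ritz-Lagrange system for Problem 1 and confirms that λ_0 = λ_π = −π/2 and that the trial solutions approach the exact solution u = x(x − π)/2.

```python
import numpy as np

# Ritz-Lagrange system for u_xx = 1 on [0, pi], u(0) = u(pi) = 0, with the
# trial solution u = a_0/2 + sum_{n=1}^N a_n cos(n x).
# Unknowns, in order: a_0, ..., a_N, lam0, lamPi.
N = 100
M = np.zeros((N + 3, N + 3))
rhs = np.zeros(N + 3)

M[0, N + 1] = M[0, N + 2] = 0.5   # dL/da_0:  pi/2 + (lam0 + lamPi)/2 = 0
rhs[0] = -np.pi / 2
for n in range(1, N + 1):         # dL/da_n:  (pi/2) n^2 a_n + lam0 + (-1)^n lamPi = 0
    M[n, n] = (np.pi / 2) * n**2
    M[n, N + 1] = 1.0
    M[n, N + 2] = (-1.0) ** n
M[N + 1, 0] = 0.5                 # boundary condition u(0) = 0
M[N + 1, 1:N + 1] = 1.0
M[N + 2, 0] = 0.5                 # boundary condition u(pi) = 0
M[N + 2, 1:N + 1] = [(-1.0) ** n for n in range(1, N + 1)]

sol = np.linalg.solve(M, rhs)
a, lam0, lamPi = sol[:N + 1], sol[N + 1], sol[N + 2]

u_trial = lambda x: a[0] / 2 + sum(a[n] * np.cos(n * x) for n in range(1, N + 1))
u_exact = lambda x: 0.5 * x * (x - np.pi)
err_mid = abs(u_trial(np.pi / 2) - u_exact(np.pi / 2))
```

The multipliers come out equal to −π/2 for every N, while the pointwise error decays only slowly (like 1/N), reflecting the slow convergence of the cosine series of a function whose even extension has corners.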
Problem 2. Next consider the fourth-order equation u_xxxx = f on [0, π], again with f = 1, subject to the boundary conditions u(0) = u(π) = 0 and u_xx(0) = u_xx(π) = 0. We apply the Ritz-Lagrange method again. The last two boundary conditions are natural and we need not worry about them. Thus, the variational problem is to minimize J = ∫_0^π (½(u_xx)² − f u) dx on the space of functions satisfying the essential boundary conditions only. Now let us find the trial solutions. Taking the same trial functions as before,

L(u^(N)) = Σ_{n=1}^N (π/4) n⁴ a_n² − (π/2) a_0 + λ_0 u^(N)(0) + λ_π u^(N)(π),   (8)

since (cos nx)_xx = −n² cos nx and cos nπ = (−1)^n. The Ritz-Lagrange system is

−π + λ_0 + λ_π = 0
(π/2) n⁴ a_n + λ_0 + (−1)^n λ_π = 0,   n = 1, …, N
a_0/2 + Σ_{n=1}^N a_n = 0
a_0/2 + Σ_{n=1}^N (−1)^n a_n = 0.   (9)
It can be solved along the same lines as system Eq. (6), and we get λ_0 = λ_π = π/2, a_n = −(1 + (−1)^n)/n⁴ for n ≥ 1, and a_0 = a_0^(N) = (1/4) Σ_{k=1}^{⌊N/2⌋} 1/k⁴, where ⌊·⌋ is the floor function. The trial solutions therefore converge to the sum of the series

u^(∞) = π⁴/720 − Σ_{n=1}^∞ ((1 + (−1)^n)/n⁴) cos nx.   (11)

The exact solution, on the other hand, is u = x⁴/24 − πx³/12 + π³x/24. Using Eq. (7) one can check that u^(∞) = x⁴/24 − πx³/12 + π²x²/24, and therefore u − u^(∞) = (π³x − π²x²)/24. This time the trial solutions do not converge to the exact one, and the limit difference is a linear combination of x and x².
Problem 3. Finally, consider the same equation u_xxxx = f, f = 1, with the clamped boundary conditions u(0) = u(π) = 0 and u_x(0) = u_x(π) = 0, all of which are essential. The Lagrange functional will now be L = J + λ_0 u(0) + λ_π u(π) + λ′_0 u_x(0) + λ′_π u_x(π), but if we use our trial functions cos nx, n ≥ 0, the last two terms in L will be 0 for any u^(N). But then L is the same as in Eq. (8), and therefore the Ritz-Lagrange system is the same as in Eq. (9). Then the trial solutions are also the same and they converge to u^(∞) from Eq. (11). However here, unlike in Problem 2, this is the right solution.
The reader may wish to entertain herself by thinking over the last two problems. To help, we will remove one red herring: as we shall see, the presence of natural conditions is not an issue here.

Continuity and convergence
In this section we will analyze what went wrong (and right) in our examples and come up with a general scheme that provably works. The Lanczos tau method is the easiest to figure out. The idea behind the method is fairly intuitive. Since the exact solution has residual 0 it is certainly orthogonal to all trial functions, and it is the only such function that also satisfies the boundary conditions. So as we construct approximations with residuals orthogonal to more and more trial functions, while also forcing the boundary conditions on them, it stands to reason that they should approach the exact solution in some sense. There are several assumptions that this argument relies on however. First, the system of trial functions should be complete, i.e. one should be able to approximate any function by their linear combinations.
Second, we have to make sure that in the limit the residuals do become orthogonal to all trial functions. This was not the case in our second application of the Lanczos tau method, when cos mx, m = 2, . . . , N were used as weights. Indeed, no residual ever had to be orthogonal to 1 or cos x. This is how the spurious solution u = 0 slipped through the cracks with the residual −1.
It is harder to explain why we got inconsistent systems when using cos mx with m = 0, …, N − 2. Let us use the benefit of hindsight and look at the cosine series for the exact solution,

u = −π²/12 + Σ_{n=1}^∞ ((1 + (−1)^n)/n²) cos nx.

Cutting it off at N terms and computing the residual we get

u_xx − 1 = −1 − Σ_{n=1}^N (1 + (−1)^n) cos nx.

Clearly, these partial sums do not converge to 0 in any apparent sense. But our argument above relied on exactly this kind of continuity assumption: if trial solutions approach the exact solution so do the residuals, and this is not the case here. Without the exact solution to approach, 'trial solutions' have nothing to approximate, hence there is no reason for the Lanczos tau system to be consistent. And indeed it was not.
Looking closely at the above reasoning the reader will notice the vagueness in the meaning of "approximation". Approximation in what sense? To talk about approximating we need a way to measure distance between functions. Such a measure is usually provided by Banach space norms. The norm relevant to the discussion above is the L² norm ‖u‖_{L²} := (∫_0^π |u|² dx)^{1/2} with the corresponding inner product ⟨u, v⟩ := ∫_0^π uv dx. The space of square integrable functions with the L² norm is a Hilbert space denoted L²([0, π]), and all references above to convergence, completeness, continuity and orthogonality referred to this space. One additional issue is that one needs J to be differentiable in order to minimize it. Summarizing, we get the theorem below. The proof is fairly standard, but we outline it in the Appendix for the convenience of the reader.
Theorem 1. Suppose J is a continuous functional on a Hilbert space U, Γ : U → R^s is a bounded linear operator, and {φ_i} is a complete system in U. Set Ů := ker Γ and Ů^(N) := span{φ_1, …, φ_N} ∩ Ů. If J is convex and grows at infinity on Ů then it has a minimizer u on it, as well as minimizers u^(N) on all Ů^(N), and there exists a subsequence N_k such that u^(N_k) converges to a minimizer of J on Ů, weakly in general and by norm when J is uniformly convex. In both cases the values of J converge to its minimum on Ů.
One can say more for quadratic functionals of the form J(u) = ½B(u, u) + l(u), where B is a symmetric bilinear form and l is a linear form. These types of functionals produce linear boundary value problems [19, 22.1]. Call B strictly positive definite if B(u, u) ≥ ε‖u‖² for some ε > 0. One can check that our functionals are strictly convex on Ů in both cases [17, I.8]. Simple algebra shows that for any u, v ∈ Ů

J(u) − J(v) = ⟨J′(v), u − v⟩ + ½ B(u − v, u − v),   (12)

where J′(v) = B(v, ·) + l is the derivative of J at v. The first term on the right is the first variation of J and it vanishes for a minimizer v = u. Taking u = u^(N) we see that for strictly positive definite B

(ε/2) ‖u^(N) − u‖² ≤ J(u^(N)) − J(u) → 0.

Thus, the Ritz-Lagrange solutions converge to the minimizer even by norm. More general conditions for convergence by norm are given in [11, 6.2A].
Theorem 1 mostly justifies the Ritz-Lagrange method used to solve Problem 1.
The required properties of J(u) = ∫_0^π (½(u_x)² + u) dx are straightforward to verify. Let us now take a second look at the Ritz method with boundary terms: it was not so innocent after all. It helps to use the hindsight again. We have λ_0 = u_x(0) and λ_π = −u_x(π) for the exact solution, but for any finite N the trial solutions have u^(N)_x(0) = u^(N)_x(π) = 0, while the Lagrange multipliers are λ_0 = λ_π = −π/2 ≠ 0. We see that the values of derivatives of the trial solutions at the ends of the interval do not converge to the values of the Lagrange multipliers, even though the two are equal for the exact solution. Substitution of the limit values in Eq. (4) relies on exactly the same kind of hidden continuity assumption that was made about the residual in the Lanczos tau method.
How does this reconcile with convergence of the Ritz-Lagrange trial solutions to the exact solution? We do have convergence, but even for strictly positive definite quadratic functionals it is not strong enough to provide such continuity. Indeed, convergence by norm in W¹₂ means that the derivatives u^(N)_x converge to u_x only in L², which does not control their values at individual points. In fact, we will always have u^(N)_x(0) = u^(N)_x(π) = 0 regardless of the variational problem, as long as we use cosines for trial functions. Unless it so happens that the first derivatives of the exact solution vanish at the endpoints, there will never be convergence of the endpoint derivatives to the Lagrange multipliers. Lagrange multipliers have to be kept as variables and can not be assumed to satisfy relations that hold for the exact solution.

Completeness
Nothing we discussed so far explains why the Ritz-Lagrange method did not work for Problem 2. The relevant space there is U = W²₂([0, π]), the Hilbert space of functions square integrable with their first and second derivatives, with the norm ‖u‖²_{W²₂} := ‖u‖²_{L²} + ‖u_x‖²_{L²} + ‖u_xx‖²_{L²}. The functional is J(u) = ∫_0^π (½(u_xx)² − u) dx, and in Problem 3 the constraint space Ů is the space of W²₂ functions that vanish at the ends along with their derivatives. One can check that in both cases J and Γ satisfy all the conditions of Theorem 1. And yet for Problem 2 we did not obtain the exact solution in the limit.
After inspecting the theorem closely the reader will notice one more condition that we did not address yet, because (perhaps) it seemed to be obviously satisfied. But "when you have eliminated the impossible, whatever remains, however improbable, must be the truth". That would be the completeness of trial functions, and [14] deserves credit for highlighting how far from obvious it can get. Theorem 1 requires W²₂ completeness in this case. What is obvious, or at least well-known from the standard theorems on the Fourier series, is that any function in L²([0, π]) can be approximated by cosines (sines) in the L² norm. This is because any L² function on [0, π] can be extended by evenness (oddness) to [−π, π] while remaining in L². But that is not even enough for Problem 1, where W¹₂ completeness was required (this is the one issue we skipped). Fortunately, W¹₂ completeness of cosines reduces to the L² completeness of sines.
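The reduction can be demonstrated numerically. In the sketch below (assuming NumPy; w(x) = eˣ is an arbitrary test function whose derivative does not vanish at the endpoints), w′ is expanded in sines, the partial sums are antidifferentiated into cosines plus a constant fixed by matching the mean of w, and the resulting W¹₂ error decreases as more terms are used.

```python
import numpy as np

# Approximating w in W^1_2([0, pi]) by cosines via the sine expansion of w'.
x = np.linspace(0.0, np.pi, 20001)
w, dw = np.exp(x), np.exp(x)

def trap(f):
    # composite trapezoid rule on the grid x
    return (x[1] - x[0]) * (f[0] / 2 + f[1:-1].sum() + f[-1] / 2)

def w12_error_sq(N):
    # sine coefficients of w' (sines are complete in L^2([0, pi]))
    b = [2.0 / np.pi * trap(dw * np.sin(n * x)) for n in range(1, N + 1)]
    dw_N = sum(bn * np.sin(n * x) for n, bn in enumerate(b, start=1))
    # antidifferentiate term by term: cosines plus a constant matching the mean
    w_N = sum(-bn / n * np.cos(n * x) for n, bn in enumerate(b, start=1))
    w_N = w_N + (trap(w) - trap(w_N)) / np.pi
    return trap((w - w_N) ** 2) + trap((dw - dw_N) ** 2)

errs = [w12_error_sq(N) for N in (2, 10, 40)]   # decreasing W^1_2 errors
```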
The minimality above means that the system becomes incomplete after deleting any function; the proof is in the Appendix.
Not so for W²₂: cosines are incomplete in W²₂([0, π]). As observed in [14], the second derivatives 0, −cos x, −4 cos 2x, …, −n² cos nx, … do not include a constant, and therefore can not approximate the second derivative of x² in L². But then cosines can not approximate x² in W²₂, since its norm incorporates the L² norm of the second derivatives. In [14] the authors add x² to the cosine system, but... we shall see that adding x² is not enough. Here is a quick way to see it: the codimension of the closed linear span of the cosines in W²₂([0, π]) is at least two, while x² spans only one extra dimension. It follows from the next Lemma that the codimension is exactly two, and x also needs to be added.
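The observation from [14] is easy to check: in L²([0, π]) the constant 2 = (x²)″ is orthogonal to every cos nx with n ≥ 1, so its best approximation from the span of the cosine second derivatives is 0, and the error never drops below ‖2‖ = 2√π. A quick numerical confirmation (assuming NumPy):

```python
import numpy as np

# Best L^2([0, pi]) approximation of f(x) = 2 = (x^2)'' by the second
# derivatives of cosines, i.e. by span{cos(n x), n = 1..N}.
N = 200
x = np.linspace(0.0, np.pi, 20001)
h = x[1] - x[0]
f = np.full_like(x, 2.0)

# Fourier coefficients <f, cos(n x)> / ||cos(n x)||^2 -- all (numerically) zero
coeffs = np.array([np.sum(f * np.cos(n * x)) * h / (np.pi / 2)
                   for n in range(1, N + 1)])
best = sum(c * np.cos(n * x) for n, c in enumerate(coeffs, start=1))
err = np.sqrt(np.sum((f - best) ** 2) * h)   # stays at ||f|| = 2 sqrt(pi)
```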
It may seem odd that we have to add x on top of x², after all there is no second derivative issue with it. However, while L² approximation of the second derivatives is necessary for W²₂ approximation of the functions themselves, it is not sufficient. We now redo Problem 2 with the augmented trial solution u^(N) = a_0/2 + b_1 x + b_2 x² + Σ_{n=1}^N a_n cos nx. The Lagrange functional becomes

L(u^(N)) = Σ_{n=1}^N (π/4) n⁴ a_n² + 2π b_2² − (π/2) a_0 − (π²/2) b_1 − (π³/3) b_2 + λ_0 u^(N)(0) + λ_π u^(N)(π),

and the Ritz-Lagrange system is obtained as before by differentiating in the coefficients and appending the boundary conditions. The equations with λ-s immediately yield λ_0 = λ_π = π/2, b_2 = −π²/24 and a_n = −(1 + (−1)^n)/n⁴, while the boundary conditions give b_1 → π³/24 and a_0/2 → π⁴/720 in the limit, so the sum of the resulting series matches the exact solution, as expected.
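A numerical version of this computation (a sketch assuming NumPy) confirms that the completed system recovers the exact solution of Problem 2: the assembled system below has unknowns a_0, …, a_N, b_1, b_2, λ_0, λ_π, and the trial solution matches u = x⁴/24 − πx³/12 + π³x/24 to within the truncation error.

```python
import numpy as np

# Ritz-Lagrange system for Problem 2 with the completed trial system
# u = a_0/2 + b_1 x + b_2 x^2 + sum a_n cos(n x).
N = 30
size = N + 5                        # a_0..a_N, b1, b2, lam0, lamPi
M = np.zeros((size, size))
rhs = np.zeros(size)
iB1, iB2, iL0, iLP = N + 1, N + 2, N + 3, N + 4

M[0, iL0] = M[0, iLP] = 0.5         # dL/da_0: -pi/2 + (lam0 + lamPi)/2 = 0
rhs[0] = np.pi / 2
for n in range(1, N + 1):           # dL/da_n: (pi/2) n^4 a_n + lam0 + (-1)^n lamPi = 0
    M[n, n] = (np.pi / 2) * n**4
    M[n, iL0], M[n, iLP] = 1.0, (-1.0) ** n
M[iB1, iLP] = np.pi                 # dL/db_1: -pi^2/2 + pi lamPi = 0
rhs[iB1] = np.pi**2 / 2
M[iB2, iB2] = 4 * np.pi             # dL/db_2: 4 pi b_2 - pi^3/3 + pi^2 lamPi = 0
M[iB2, iLP] = np.pi**2
rhs[iB2] = np.pi**3 / 3
M[iL0, 0] = 0.5                     # u(0) = 0
M[iL0, 1:N + 1] = 1.0
M[iLP, 0] = 0.5                     # u(pi) = 0
M[iLP, 1:N + 1] = [(-1.0) ** n for n in range(1, N + 1)]
M[iLP, iB1], M[iLP, iB2] = np.pi, np.pi**2

sol = np.linalg.solve(M, rhs)
a, b1, b2 = sol[:N + 1], sol[iB1], sol[iB2]
lam0, lamPi = sol[iL0], sol[iLP]

def u_trial(x):
    return (a[0] / 2 + b1 * x + b2 * x**2
            + sum(a[n] * np.cos(n * x) for n in range(1, N + 1)))

u_exact = lambda x: x**4 / 24 - np.pi * x**3 / 12 + np.pi**3 * x / 24
xs = np.linspace(0.0, np.pi, 7)
max_err = max(abs(u_trial(t) - u_exact(t)) for t in xs)
```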
This does it for Problem 2, but now we get an unexpected puzzle in Problem 3. How come we got the correct answer there while using the incomplete system of cosines? The point is that the exact solution of Problem 3 happens to lie in the W²₂ closure of the cosines: its cosine series, with coefficients −(1 + (−1)^n)/n⁴, converges to it not only in L² but even in W²₂.

The Ritz-Lagrange method in higher dimensions
The Ritz-Lagrange method described in Section 2 can not be applied directly to multidimensional boundary value problems. In this section we will develop a suitable generalization and prove that it works. The main distinction is that the boundary operator now takes values in an infinite dimensional space of functions on the boundary, so the boundary conditions themselves have to be approximated. For higher order equations it is more natural to use several boundary operators Γ_k instead of a single one, e.g. one for function values, another for normal derivatives, etc.
One can always formally wrap them into a single operator with combined codomain, so no additional theoretical discussion is necessary.
A justification of our method is based on Theorem 1 and Theorem 2 below. The reader not interested in justification may skip the rest of this section and look at applications in the next one. One piece of bad news is that we can not expect u_s to converge to u in the same generality as in Theorem 1. The root cause is that the minimizer in Ů is approximated by elements outside of Ů, which is why we need J to be well-behaved on the entire U, not just Ů, something avoidable if the φ_i do satisfy the boundary conditions. In particular, the values J(u_s) are potentially smaller than J(u) because they are obtained by minimizing J on larger subspaces. We will need a form of uniform monotonicity, cf. [12, V.18.6], namely

⟨J′(u) − J′(v), u − v⟩ ≥ c(‖u − v‖) ‖u − v‖,   (16)

where c(t) is a continuous monotone increasing function with c(0) = 0. The point is that if c(‖u_s − u‖) ‖u_s − u‖ → 0 then ‖u_s − u‖ → 0, i.e. u_s → u by norm. We are now ready to state the main theorem of this section. In examples it is typical that J does not satisfy Eq. (16) on the entire space U, but does satisfy it on subspaces much larger than Ů, such as Ů_s. Let us discuss the case of quadratic functionals J(u) = ½B(u, u) + l(u) in more detail. From the calculation in Eq. (12) we know that J′(v) = B(v, ·) + l. Therefore,

⟨J′(u) − J′(v), u − v⟩ = B(u − v, u − v),

and we need B(u, u) ≥ ε‖u‖² with ε > 0 to satisfy Eq. (16).

Lemma 4. Suppose {φ_i} and {ψ_j} are complete systems in L²(Ω) and L²(Ω′), where Ω and Ω′ are some bounded domains. Then the system {φ_i ψ_j} is complete in L²(Ω × Ω′).

If one starts from one-dimensional systems the lemma will only produce complete systems in box-like domains [a_1, b_1] × ⋯ × [a_m, b_m]. However, any system of functions complete on a domain will be complete on any of its subdomains, so for an arbitrary domain one can always use a system complete on the smallest box that contains it. Unfortunately, such product systems can not be expected to be minimal even if the original systems were. A more targeted choice is to take eigenfunctions of an operator on the same domain that is simpler than the one involved, but is somewhat similar to it.
Various spectral theorems often ensure completeness of eigenfunctions in suitable Sobolev spaces [19, 22.11a].
Numerical examples

In this section we illustrate the multidimensional Ritz-Lagrange method developed in Section 5 by applying it to some typical problems. Since calculations by hand quickly become intractable we performed them using a computer algebra system.

Problem 4. The first example is the Poisson equation for a loaded membrane. The profile of f was chosen so that the problem has an analytic solution which is not a polynomial. Specifically, one can represent the exact solution as a rapidly convergent series in r = √(x² + y²) involving the Euler constant γ := lim_{n→∞} (Σ_{k=1}^n 1/k − ln n). The variational functional is J(u) = ∫_Ω (½(∇u)² + f u) dx dy, which gives the total potential energy of the membrane, subject to the boundary condition.
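A stripped-down numerical sketch of this kind of setup (assuming NumPy; it is not the paper's computation, which uses a circular membrane and piecewise-linear boundary weights): the Poisson problem −Δu = 1 on the unit square with u = 0 on the boundary, monomial trial functions that ignore the boundary condition, and piecewise-constant boundary weights on 8 segments. The boundary condition is imposed only weakly, through Lagrange multipliers, by requiring the boundary averages of the trial solution to vanish.

```python
import numpy as np

# Trial functions: monomials x^a y^b, a, b = 0..4 (25 functions).
deg = 4
idx = [(a, b) for a in range(deg + 1) for b in range(deg + 1)]
n = len(idx)

def stiff(a, b, c, d):
    # int over [0,1]^2 of grad(x^a y^b) . grad(x^c y^d), computed analytically
    s = 0.0
    if a >= 1 and c >= 1:
        s += a * c / ((a + c - 1) * (b + d + 1))
    if b >= 1 and d >= 1:
        s += b * d / ((a + c + 1) * (b + d - 1))
    return s

B = np.array([[stiff(p[0], p[1], q[0], q[1]) for q in idx] for p in idx])
F = np.array([1.0 / ((a + 1) * (b + 1)) for a, b in idx])   # load f = 1

def seg_integral(a, b, side, t0, t1):
    # int of x^a y^b over the parameter range [t0, t1] of the given side
    if side == 'y=0':
        return (t1**(a + 1) - t0**(a + 1)) / (a + 1) if b == 0 else 0.0
    if side == 'y=1':
        return (t1**(a + 1) - t0**(a + 1)) / (a + 1)
    if side == 'x=0':
        return (t1**(b + 1) - t0**(b + 1)) / (b + 1) if a == 0 else 0.0
    return (t1**(b + 1) - t0**(b + 1)) / (b + 1)            # side 'x=1'

segs = [(s, t0, t0 + 0.5) for s in ('y=0', 'y=1', 'x=0', 'x=1')
        for t0 in (0.0, 0.5)]
G = np.array([[seg_integral(a, b, *seg) for a, b in idx] for seg in segs])

# Saddle-point (KKT) system:  B c + G^T lam = F,  G c = 0  (weak BC)
K = np.block([[B, G.T], [G, np.zeros((len(segs), len(segs)))]])
rhs = np.concatenate([F, np.zeros(len(segs))])
sol = np.linalg.lstsq(K, rhs, rcond=None)[0]
c, lam = sol[:n], sol[n:]

u_center = sum(ci * 0.5 ** (a + b) for ci, (a, b) in zip(c, idx))
bc_residual = np.max(np.abs(G @ c))   # boundary averages vanish
```

Note the structure emphasized in the text: the number of trial functions (25) must comfortably exceed the number of boundary weights (8), and the multipliers λ play the role of an approximate normal derivative on the boundary segments.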
In the notation of Section 5 we take U = W¹₂(Ω), with Γ being the restriction of u to the boundary ∂Ω. Moreover, Γ is continuous if we take V = L²(∂Ω). Our internal trial functions are the monomials, which obviously do not satisfy the boundary condition, and the trial solution is u^(N) = Σ c_ij x^{i−1} y^{j−1}. Note that N of Section 5 will be N² here because of double indexing. As the boundary weight functions we choose the piecewise linear ones on uniform partitions of ∂Ω. Unlike the usual choices for a circle, e.g. the trigonometric functions, these can be used on a wide variety of boundaries. Instead of using a single indexed system ψ_k it is convenient to split it into the constant parts g_k and the linear parts h_k. If the boundary is partitioned into s segments, the number of boundary weight functions, denoted s in Section 5, will be 2s here, and the Lagrange multiplier has the form λ^(s) = Σ_{k=1}^s (μ_k g_k + ν_k h_k).

Problem 5. The next example is the bending of a clamped plate, with the variational functional

J(u) = ∫_Ω (½(u_xx + u_yy)² − (1 − υ)(u_xx u_yy − u_xy²) − f u) dx dy,   (19)

where u is the displacement of the plate, υ is the Poisson ratio, and f is a distributed load.
The Euler-Lagrange equation induced by Eq. (19) is the biharmonic equation Δ²u = f. For convenience we split the Lagrange multiplier λ^(s) into its restrictions to the sides of the boundary. The exact solution to this problem can be expressed as a rapidly convergent series, and we use its first ten terms to calculate the errors. We do not tabulate them here, because they are very large (up to 70%), and increasing the number of terms does not improve the approximation. At this point the reader should not be surprised: we are dealing with the same mistake as in the one-dimensional Problem 2.
As we pointed out in Section 3, the system of cosines is incomplete on the interval, so the system of their products is naturally incomplete on the product of intervals that represents the plate. A more down to Earth explanation is that the products of cosines are missing the polynomial directions, the analogues of x and x², that are needed near the boundary.

Table 2: Central (x = 0.5, y = 0.5) and boundary (x = 0, y = 0) error relative to maximum bending deflection of a C-C-C-C square plate of unit size.
Section 3 also gives us a way to solve the original problem: we just need to complete the system of cosine products. By Lemma 4 it suffices to complete the cosines on the interval and take the products from the completed system. Namely, we take the products of x, x², cos((i − 1)πx) and y, y², cos((i − 1)πy) as the new trial functions, and keep the rest of the above setup intact. The relative errors of the Ritz-Lagrange solutions with the completed system against the known series solution [2, 8.2.4] are shown in Table 3.

Problem 6. The last example is an eigenvalue problem for the natural vibrations of a simply supported square plate. For this eigenvalue problem to be solvable one needs L to have the maximal rank 4s, which is ensured by the non-degeneracy condition N² ≫ 4s as in Problem 5. The eigenvalues μ_i = ω_i² approximate the squares of the natural frequencies of the plate's vibrations.
With N = 10 and s = 5 we obtain a set of approximate non-dimensional natural frequencies ω_i, the first nine of which we compare with the exact values. Since the eigenmodes are known to be of the form sin(πmx) sin(πny), we change the single index notation to ω_mn and arrange the frequencies in a square pattern. One can see that the estimated frequencies are slightly lower than the exact ones.
This is in contrast with the application of the usual Ritz method, where the estimated frequencies are always higher. From a physical viewpoint, the latter happens because replacing an infinite system with a finite one is equivalent to imposing additional constraints, which tend to raise the stiffness of the system, and hence the frequencies.
This assumes however that all the boundary constraints are enforced in both systems, i.e. that the trial functions satisfy the essential boundary conditions.
In the Ritz-Lagrange method the trial functions do not satisfy the essential conditions, and even the trial solutions are forced to satisfy them only approximately.
In other words, we are effectively relaxing the boundary constraints in addition to imposing additional ones through discretization. This relaxation lowers the frequencies (because a plate with fewer constraints is less stiff) and counteracts the effects of discretization. If we were able to impose the boundary conditions everywhere along the boundary, the estimated frequencies would have been higher than the exact ones, just as in the usual Ritz method.

Conclusions
We developed a general extension of the Ritz method to systems of trial functions that do not satisfy the essential boundary conditions, and proved its convergence. The method is based on treating the essential conditions as variational constraints and removing them using the Lagrange multipliers. Here are some general observations of the workings of the method.
• The variational functional has to be well-behaved not only on the energy space of the problem, but on its extension that contains the trial functions. Sufficiently good behavior is a strong form of convexity, which in the case of quadratic functionals means that the boundary value problem is strongly elliptic.
• The systems of trial functions must be complete in the norms consistent with the functional, which usually restrict to the energy norms on the energy space of the problem. Although a similar requirement applies to the usual Ritz method, here it is much easier to encounter systems that appear complete but are not, due to effects at the boundary.
• The Lagrange multipliers have to be treated as additional variables in the approximating systems. They can not be eliminated by substituting the trial solutions into the variational formulas for them in terms of the exact solution.
These formulas are discontinuous in the relevant norms.
• In multidimensional problems the boundary conditions incorporate infinitely many constraints, and to obtain a finite dimensional approximating system one has to select boundary weight functions in addition to the trial functions. The number of trial functions has to be significantly larger than the number of the boundary weights, otherwise the approximating system may be inconsistent or only have the trivial solution.
• In multidimensional problems the approximating values of the functional may approach the exact value from below rather than above, as in the usual Ritz method, because the minimization takes place on a larger space of functions not satisfying the boundary conditions exactly. For the same reason the estimated vibration frequencies may come out lower than the exact ones, in contrast to the Ritz method where they are always higher, due to the relaxation of the boundary constraints.
As is well-known [4,2], the Ritz method leads to the same approximating systems as the Galerkin method, but the latter can also be applied to non-optimization problems. It is an intriguing question whether the Ritz-Lagrange method developed here can be extended to a 'Galerkin-Lagrange method' for non-variational problems. There is no Lagrange functional to be had in such problems, but one can formally add boundary terms multiplied by extra variables ('Lagrange multipliers') to a weighted residual of the problem. An approximation of the boundary conditions should also be added to the usual Galerkin system. However, as we saw in the Lanczos tau example, such a straightforward approach is not likely to work. Indeed, in the Ritz-Lagrange method we add the Lagrange boundary terms not to the bare weighted residual, but to an integrated by parts expression with some boundary terms of its own. This suggests that in a correct generalization the problem has to be rewritten in a weak form [2, 7.5.1] before the Lagrange-like boundary terms are added.
Another complication is the role of the natural boundary conditions. In variational problems they are enforced automatically, so there is no point in adding them as constraints and introducing additional Lagrange multipliers. An example in [14] even shows that attempting to do so worsens the convergence of the trial solutions. However, it is unclear how the natural conditions should be enforced in a 'Galerkin-Lagrange method'. Still, the distinction between the natural and the essential boundary conditions in [3, I.1.2] (by the order of the derivatives entering them) makes sense even for non-optimization problems, and it has been shown in [4] that in some cases the weak form of a problem provides an enforcement mechanism for the natural conditions.

Appendix: Proofs
Proof of Lemma 1. Let $\{\varphi_i\}$ be a complete system in $U$. Since $\Gamma$ has an $s$-dimensional image we can represent it as $\Gamma u = \bigl(\langle \Gamma_1, u\rangle, \dots, \langle \Gamma_s, u\rangle\bigr)^T$, where the $\Gamma_i$ are bounded linear functionals. Assume without loss of generality that they are linearly independent; otherwise some of them can be dropped without changing $\mathring U$. Set $\mathring U_k := \{u \in U : \langle \Gamma_1, u\rangle = \dots = \langle \Gamma_k, u\rangle = 0\}$; we will construct a complete system in each $\mathring U_k$ by induction on $k$. Since $\mathring U = \mathring U_s$ the process concludes in $s$ steps.
For $k = 1$ we must produce a complete system of linear combinations in $\mathring U_1 := \{u \in U : \langle \Gamma_1, u\rangle = 0\}$. Without loss of generality $\langle \Gamma_1, \varphi_1\rangle \neq 0$, since $\{\varphi_i\}$ is complete and $\Gamma_1$ cannot vanish on all $\varphi_i$. We claim that the combinations
$$\psi_i := \varphi_i - \frac{\langle \Gamma_1, \varphi_i\rangle}{\langle \Gamma_1, \varphi_1\rangle}\,\varphi_1, \qquad i \geq 2,$$
are complete in $\mathring U_1$. Indeed, given $u \in \mathring U_1$ and $\varepsilon > 0$, choose a combination $\sum_i a_i \varphi_i$ with $\bigl\|u - \sum_i a_i \varphi_i\bigr\| \leq \varepsilon$, and note that $\sum_i a_i \psi_i$ (where $\psi_1 = 0$) differs from $\sum_i a_i \varphi_i$ by a multiple of $\varphi_1$. To estimate the second term we find, using $\langle \Gamma_1, u\rangle = 0$,
$$\Bigl|\bigl\langle \Gamma_1, \textstyle\sum_i a_i \varphi_i\bigr\rangle\Bigr| = \Bigl|\bigl\langle \Gamma_1, \textstyle\sum_i a_i \varphi_i - u\bigr\rangle\Bigr| \leq \|\Gamma_1\|\,\varepsilon,$$
so that $\bigl\|u - \sum_i a_i \psi_i\bigr\| \leq \bigl(1 + \|\Gamma_1\|\,\|\varphi_1\| / |\langle \Gamma_1, \varphi_1\rangle|\bigr)\,\varepsilon$, and since $u$, $\varepsilon$ are arbitrary completeness of the $\psi_i$ follows.
Let $\{\tilde\varphi_i\}$ be a complete system in $\mathring U_k$ from the preceding step. Linear independence of the $\Gamma_j$ guarantees that $\Gamma_{k+1}$ does not vanish on some $\tilde\varphi_i$, which we may as well take to be $\tilde\varphi_1$. Applying the process above with $\Gamma_1$ replaced by $\Gamma_{k+1}$ and $\mathring U_1$ replaced by $\mathring U_{k+1}$ produces a complete system in $\mathring U_{k+1}$, completing the induction.
… it is a minimizer on $\mathring U^{(N_k)}$, so $J_{\min} \leq J\bigl(u^{(N_k)}\bigr) \leq J(u) = J_{\min} + \varepsilon$. After passing to the limit we have $J\bigl(u^{(\infty)}\bigr) = J_{\min}$, i.e. $u^{(\infty)}$ is a minimizer of $J$ on $\mathring U$.
If we assume additionally that $J$ is strictly convex on $\mathring U$ then the minimizer $u$ is unique and the entire sequence $u^{(N)}$ (which is now also uniquely defined) converges to it at least weakly [11, 6.2A].
Proof of Lemma 2. The following inner product is equivalent to the usual one on $W_2^2(0, \pi)$: $\langle u, v\rangle := u(0)\,v(0) + \int_0^\pi u_{xx} v_{xx}\,dx$. Suppose $w$ is orthogonal to all cosines in it. Orthogonality to $\cos 0x = 1$ gives $w(0) = 0$, and then orthogonality to the remaining cosines gives $\int_0^\pi w_{xx} \cdot (-n^2 \cos nx)\,dx = 0$ for $n \geq 1$. In particular, $w_{xx}$ is $L_2$-orthogonal to $\cos nx$ for all $n \geq 1$. But the orthogonal complement of the latter in $L_2$ consists of constants, so $w_{xx} = \mathrm{const}$ and $w(x) = ax^2 + bx + c$. Since $w(0) = 0$ the free term vanishes and $w$ is a linear combination of $x$ and $x^2$. Thus, the orthogonal complement to the cosines is spanned by $x$ and $x^2$, proving completeness.
The functions $x$ and $x^2$ are orthogonal to all cosines and to each other. This means that neither of them can be deleted without losing completeness. It also means that if a cosine could be approximated in $W_2^2$ by other cosines combined with $x$ and $x^2$, then it could already be approximated by the other cosines alone. But the latter cannot be done with arbitrary precision even in $L_2$, let alone in $W_2^2$.
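The orthogonality claims are easy to check numerically. The sketch below assumes the interval $(0, \pi)$ and the equivalent inner product $\langle u, v\rangle = u(0)v(0) + \int_0^\pi u_{xx} v_{xx}\,dx$ used in the proof of Lemma 2, and verifies that $x$ and $x^2$ are orthogonal to $\cos nx$ and to each other in it.

```python
import numpy as np

x = np.linspace(0.0, np.pi, 20001)   # assumed interval (0, pi)
dx = x[1] - x[0]

def integrate(f):
    # composite trapezoid rule on the uniform grid
    return float(np.sum(f[:-1] + f[1:]) * 0.5 * dx)

def inner(u0, upp, v0, vpp):
    # the equivalent inner product <u, v> = u(0) v(0) + int u'' v'' dx
    return u0 * v0 + integrate(upp * vpp)

# u = x has u(0) = 0, u'' = 0;  u = x^2 has u(0) = 0, u'' = 2
zero = np.zeros_like(x)
two = 2.0 * np.ones_like(x)
vals = [inner(0.0, zero, 0.0, two)]            # <x, x^2>
for n in range(1, 6):
    cpp = -n**2 * np.cos(n * x)                # (cos nx)'', and cos(0) = 1
    vals += [inner(0.0, zero, 1.0, cpp),       # <x, cos nx>
             inner(0.0, two, 1.0, cpp)]        # <x^2, cos nx>
print(max(abs(v) for v in vals))               # all inner products vanish
```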
Proof of Lemma 4. In the multi-index notation an equivalent norm on $W_p^k(D)$ is given by $\|f\|_{W_p^k} := \sum_{|\alpha| \leq k} \|\partial^\alpha f\|_{L_p}$, where the $L_p$ norm is just $\|f\|_{L_p} := \bigl(\int_D |f|^p\,d\xi\bigr)^{1/p}$. Let $x$ and $\tilde x$ denote the variables on $\Omega$ and $\tilde\Omega$ respectively, so that $\xi = (x, \tilde x)$ is the variable on $\Omega \times \tilde\Omega$. If $f \in W_p^k(\Omega)$ and $\tilde f \in W_p^k(\tilde\Omega)$ then it follows from the Fubini theorem that $\|f \tilde f\|_{L_p} = \|f\|_{L_p}\,\|\tilde f\|_{L_p}$ since $f$ and $\tilde f$ depend on different variables. Splitting a multi-index as $\alpha = (\beta, \tilde\beta)$, so that $\partial^\alpha (f \tilde f) = \partial^\beta f\,\partial^{\tilde\beta} \tilde f$, we estimate
$$\|f \tilde f\|_{W_p^k} = \sum_{|\alpha| \leq k} \|\partial^\beta f\|_{L_p}\,\|\partial^{\tilde\beta} \tilde f\|_{L_p} \leq \|f\|_{W_p^k}\,\|\tilde f\|_{W_p^k}.$$
Since the $\varphi_i$ are complete, any monomial $x^\beta$ can be approximated to any precision $\varepsilon > 0$ in $W_p^k$ by a linear combination $\varphi = \sum_i a_i \varphi_i$, and analogously $\tilde x^\gamma$ can be approximated by a linear combination $\tilde\varphi = \sum_j \tilde a_j \tilde\varphi_j$. But $\varphi \tilde\varphi = \sum_{i,j} a_i \tilde a_j \varphi_i \tilde\varphi_j$ is a linear combination of the $\varphi_i \tilde\varphi_j$, while the difference between the products can be made arbitrarily small:
$$\|x^\beta \tilde x^\gamma - \varphi \tilde\varphi\|_{W_p^k} \leq \|x^\beta - \varphi\|_{W_p^k}\,\|\tilde x^\gamma\|_{W_p^k} + \|\varphi\|_{W_p^k}\,\|\tilde x^\gamma - \tilde\varphi\|_{W_p^k} \leq \varepsilon \bigl(\|\tilde x^\gamma\|_{W_p^k} + \|x^\beta\|_{W_p^k} + \varepsilon\bigr),$$
where the first inequality follows from the above estimate and the triangle inequality. Hence any product of monomials, and therefore any polynomial, can be approximated in $W_p^k$ by linear combinations of the $\varphi_i \tilde\varphi_j$. By the generalized Weierstrass theorem [20, II.4.3], polynomials are complete in $W_p^k(\Omega \times \tilde\Omega)$, and hence so is the system $\{\varphi_i \tilde\varphi_j\}$.
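The Fubini factorization at the heart of the proof can be illustrated with a discrete check. The factors below ($f(x) = x^2$, $\tilde f(\tilde x) = 1 + \tilde x$ on $[0,1]$, with $p = 3$) are hypothetical choices; since they depend on different variables, the $L_p$ norm of their product over the square factors into the product of the one-dimensional norms.

```python
import numpy as np

# Discrete Fubini check: for functions of different variables,
# ||f * ftil||_{L_p(square)} = ||f||_{L_p} * ||ftil||_{L_p}.
p = 3
t = np.linspace(0.0, 1.0, 2001)
w = np.full_like(t, t[1] - t[0])     # trapezoid quadrature weights
w[0] *= 0.5
w[-1] *= 0.5

f, ftil = t**2, 1.0 + t
prod = np.abs(np.outer(f, ftil))**p                    # |f(x) ftil(y)|^p on the grid
lhs = (w @ prod @ w)**(1.0 / p)                        # norm of the product over the square
rhs = (w @ np.abs(f)**p)**(1.0 / p) * (w @ np.abs(ftil)**p)**(1.0 / p)
print(abs(lhs - rhs))                                  # the two norms agree
```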