On the Application of Iterative Methods of Nondifferentiable Optimization to Some Problems of Approximation Theory

We consider the data fitting problem, that is, the problem of approximating a function of several variables given by tabulated data, and the corresponding problem for inconsistent (overdetermined) systems of linear algebraic equations. Such problems, connected with measurement of physical quantities, arise, for example, in physics and engineering. A traditional approach to these two problems is the discrete least squares data fitting method, which is based on the discrete $\ell_2$-norm. In this paper, an alternative approach is proposed: with each of these problems we associate a nondifferentiable (nonsmooth) unconstrained minimization problem whose objective function is based on the discrete $\ell_1$- and/or $\ell_\infty$-norm, respectively; that is, these two norms are used as proximity criteria. In other words, the problems under consideration are solved by minimizing the residual with respect to these two norms. The respective subgradients are calculated, and a subgradient method is used for solving these two problems. The emphasis is on the implementation of the proposed approach. Some computational results, obtained by an appropriate iterative method, are given at the end of the paper. These results are compared with the results obtained by the iterative gradient method for the corresponding “differentiable” discrete least squares problems, that is, approximation problems based on the discrete $\ell_2$-norm.


1. Introduction: Statement of Problems under Consideration
1.1. Problem Number 1. Let $f : \mathbb{R}^n \to \mathbb{R}$ be a real-valued function of $n$ real variables and let the following tabulated data be given:
$$(\mathbf{x}_j, f(\mathbf{x}_j)), \quad \mathbf{x}_j \in \mathbb{R}^n, \; j = 1, \ldots, N.$$
Find a generalized polynomial $P_m(\mathbf{x}) = \sum_{k=0}^{m} a_k \varphi_k(\mathbf{x})$, based on the system of linearly independent functions $\{\varphi_k(\mathbf{x})\}_{k=0}^{m}$, that is, a polynomial of generalized degree $m$, which approximates the function $f(\mathbf{x})$ with respect to some distance (norm). Depending on the distance (norm) used, $P_m(\mathbf{x})$ is an optimal solution to various problems. In this paper, we discuss the approximation with respect to the weighted discrete $\ell_1$-norm
$$\|f\|_{\ell_1} = \sum_{j=1}^{N} w_j \, |f(\mathbf{x}_j)|, \quad w_j > 0.$$
Thus, the polynomial $P_m(\mathbf{x})$ of best approximation to the function $f(\mathbf{x})$ with respect to the $\ell_1$-norm is an optimal solution to the minimization problem
$$\min_{\mathbf{a}} \; \sum_{j=1}^{N} w_j \left| f(\mathbf{x}_j) - \sum_{k=0}^{m} a_k \varphi_k(\mathbf{x}_j) \right|. \tag{1}$$
The corresponding discrete least squares data fitting problem, which can be associated with (1), is
$$\min_{\mathbf{a}} \; \sum_{j=1}^{N} w_j \left( f(\mathbf{x}_j) - \sum_{k=0}^{m} a_k \varphi_k(\mathbf{x}_j) \right)^2. \tag{2}$$
Here, $\mathbf{a} = (a_0, a_1, \ldots, a_m) \in \mathbb{R}^{m+1}$. When the tabulated data are given with the same confidence (reliability) for all $j = 1, \ldots, N$, the weights $w_j$ are chosen equal to $1$ for all $j = 1, \ldots, N$. Recall that the system of functions $\{\varphi_0, \ldots, \varphi_m\}$ is said to be linearly independent if whenever
$$\sum_{k=0}^{m} c_k \varphi_k(\mathbf{x}) = 0 \quad \text{for all } \mathbf{x},$$
then $c_0 = c_1 = \cdots = c_m = 0$. Otherwise, the set of functions is said to be linearly dependent.
For the problems under consideration, the system $\{\varphi_k(x)\}_{k=0}^{m}$ of linearly independent functions can be chosen as follows:
$$\varphi_k(x) = x^k, \quad k = 0, 1, \ldots, m, \tag{10}$$
that is, $P_m(x) = a_0 + a_1 x + \cdots + a_m x^m$. It is proved that the functions $\{\varphi_k(x)\}_{k=0}^{m}$, defined by (10), are linearly independent (Theorem 1, Section 2.1).
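To make this concrete, the following minimal Python sketch evaluates the weighted discrete $\ell_1$ objective of (1) for the monomial basis (10); the function name `l1_objective` and the NumPy-based representation are illustrative assumptions, not notation from the paper.

```python
import numpy as np

def l1_objective(a, x, f_vals, w):
    """Weighted discrete l1 objective of (1):
    Phi(a) = sum_j w_j * |f(x_j) - P_m(x_j)|, with P_m(x) = sum_k a_k * x**k."""
    V = np.vander(x, N=len(a), increasing=True)  # column k holds x**k
    residuals = f_vals - V @ a                   # f(x_j) - P_m(x_j), j = 1..N
    return np.sum(w * np.abs(residuals))
```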

1.2. Problem Number 2.
Given an inconsistent (overdetermined) system of linear algebraic equations,
$$\sum_{j=1}^{n} a_{ij} x_j = b_i, \quad i = 1, \ldots, m; \; m > n. \tag{11}$$
This system does not have a solution in the general case when $m > n$ and all the equations are linearly independent. We can associate the following minimization problems with (11):
$$\min_{\mathbf{x}} \; \max_{1 \le i \le m} w_i \left| \sum_{j=1}^{n} a_{ij} x_j - b_i \right| \tag{12}$$
or
$$\min_{\mathbf{x}} \; \sum_{i=1}^{m} w_i \left| \sum_{j=1}^{n} a_{ij} x_j - b_i \right|, \tag{13}$$
where $\mathbf{x} = (x_1, \ldots, x_n) \in \mathbb{R}^n$. The corresponding discrete least squares data fitting problem, which can be associated with (11), is
$$\min_{\mathbf{x}} \; \sum_{i=1}^{m} w_i \left( \sum_{j=1}^{n} a_{ij} x_j - b_i \right)^2. \tag{14}$$
Problem (12) is a special case of the problem of best Chebyshev approximation, based on the $\ell_\infty$-norm (3), and problem (13) is based on the $\ell_1$-norm.
Approximations with respect to the $\ell_1$-norm are known as $\ell_1$- (or absolute deviation) approximations, and approximations with respect to the $\ell_\infty$-norm as Chebyshev, minimax, or uniform approximations.
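Under the same caveat as above, here is a sketch of the two residual objectives (12) and (13) for an overdetermined system; the function names and the dense-matrix representation are our own illustrative choices.

```python
import numpy as np

def chebyshev_objective(x, A, b, w):
    """Weighted l_inf (Chebyshev) objective of (12): max_i w_i * |(A x - b)_i|."""
    return np.max(w * np.abs(A @ x - b))

def l1_residual_objective(x, A, b, w):
    """Weighted l1 objective of (13): sum_i w_i * |(A x - b)_i|."""
    return np.sum(w * np.abs(A @ x - b))
```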

1.3. Bibliographical Notes and Organization of the Paper.
Problems like (1) and (11), connected with measurement of physical quantities, arise, for example, in physics and engineering. The weights $w_j$, $j = 1, \ldots, N$, express the reliability with which each value (measurement, empirical datum) $f(\mathbf{x}_j)$ at $\mathbf{x}_j$ for Problem Number 1, or each equation for Problem Number 2, can be accepted.
Problems discussed in this paper, and problems related to them, are considered in  and so forth.
The books of Clarke [11] and of Demyanov and Vasiliev [14] are devoted to nondifferentiable optimization, and the book of Korneichuk [17] is devoted to optimization problems of approximation theory.
Numerical methods for best Chebyshev approximation are suggested, for example, in the book of Remez [23].
A subgradient algorithm for certain minimax and minisum problems is suggested in the paper of Chatelon et al. [10].
A quasi-Newton approach to nonsmooth convex optimization problems in machine learning is considered in Yu et al. [31]. Nonsmooth optimization methods for the problems of constructing a linear classifier are proposed in Zhuravlev et al. [32].
Polynomial algorithms for projecting a point onto a region defined by a linear constraint and box constraints in $\mathbb{R}^n$ are proposed in Stefanov [25], and well-posedness and primal-dual analysis of some convex separable optimization problems are considered in Stefanov [26].
The rest of the paper is organized as follows. In Section 2, some results for the calculation of subgradients of particular types of functions are formulated and proved, and the solvability of the problems under consideration is analyzed. In Section 3, the iterative subgradient method for solving nondifferentiable unconstrained optimization problems is formulated and its convergence is proved. In Section 4, results of computational experiments are presented. In Section 5, Conclusions, the proposed approach and the obtained computational results are discussed. In the appendix, some known propositions used in the paper are formulated without proofs, and, only for comparison purposes, the iterative gradient method for solving differentiable unconstrained optimization problems is presented and a convergence theorem is formulated.

2. Preliminaries
2.1. Theoretical Matters. Some known results, called propositions, which are used in subsequent sections, are recalled without proofs in Appendix A.1 at the end of the paper.
We prove below some results which guarantee the solvability of the considered problems (Theorem 1, combined with Proposition A.11 of the appendix) and which are used for calculating subgradients in Section 3.2 (Theorems 2 and 3).
Theorem 1. The functions $\{\varphi_k(x)\}_{k=0}^{m}$, defined by (10), are linearly independent.

Proof. Suppose that $c_0 \varphi_0(x) + c_1 \varphi_1(x) + \cdots + c_m \varphi_m(x) \equiv 0$. Since $c_m \varphi_m(x) = c_m x^m$ is the only term containing $x^m$, $c_m$ must be equal to zero. Therefore
$$c_0 + c_1 x + \cdots + c_{m-1} x^{m-1} \equiv 0.$$
In this representation, the only term that contains $x^{m-1}$ is $c_{m-1} x^{m-1}$. Hence, we must have $c_{m-1} = 0$, and
$$c_0 + c_1 x + \cdots + c_{m-2} x^{m-2} \equiv 0.$$
Continuing in this way, we obtain that the remaining coefficients $c_{m-2}, \ldots, c_1, c_0$ are also equal to zero. Therefore, the functions $\{\varphi_0, \ldots, \varphi_m\}$ are linearly independent by definition.
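A quick numerical counterpart of Theorem 1 (an illustrative check, not a proof): at distinct sample points, the matrix whose columns are the monomials $x^k$ has full rank, so only the zero coefficient vector annihilates all of them.

```python
import numpy as np

x = np.linspace(0.0, 1.0, 6)          # 6 distinct points, i.e., degree m = 5
V = np.vander(x, increasing=True)     # column k holds the values of x**k
print(np.linalg.matrix_rank(V))       # prints 6: full rank, so c = 0 is the
                                      # only solution of V @ c = 0
```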
The following two theorems give the rules for calculating subgradients for some types of functions.
Proof. Since convex functions have derivatives on the right and on the left at each interior feasible point, we may assume that $f'_{+}(x_k)$ and $f'_{-}(x_k)$ exist. According to Proposition A.8 of the appendix, about the vectors $\mathbf{f}'_{+}(\mathbf{x})$ and $\mathbf{f}'_{-}(\mathbf{x})$ of the right and the left derivatives of $f$ at $\mathbf{x}$, respectively, we have
$$f(\mathbf{y}) \ge f(\mathbf{x}) + \langle \mathbf{f}'_{\pm}(\mathbf{x}), \, \mathbf{y} - \mathbf{x} \rangle \quad \text{for all } \mathbf{y};$$
that is, $\mathbf{f}'_{\pm}(\mathbf{x}) \in \partial f(\mathbf{x})$ by the definition of subgradient. Since the subdifferential $\partial f(\mathbf{x})$ of a convex function $f(\mathbf{x})$ is a nonempty, convex, and compact set, and since $f'_{+}(x_k), f'_{-}(x_k) \in \partial f(x_k)$ according to the above discussion, then
$$\bar{f}_k(x_k) \in \left[ f'_{-}(x_k), \, f'_{+}(x_k) \right] \subseteq \partial f(x_k), \quad k = 1, \ldots, n.$$
Therefore
$$f(\mathbf{y}) \ge f(\mathbf{x}) + \langle \bar{\mathbf{f}}(\mathbf{x}), \, \mathbf{y} - \mathbf{x} \rangle \quad \text{for all } \mathbf{y};$$
that is, $\bar{\mathbf{f}}(\mathbf{x}) = (\bar{f}_1(x_1), \ldots, \bar{f}_n(x_n))$, with $\bar{f}_k(x_k)$ defined above, is a subgradient of $f(x_1, \ldots, x_n)$ at $\mathbf{x}$ by definition.
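As an illustration of the componentwise construction in this proof, here is a hedged Python sketch for the separable example $f(\mathbf{x}) = \sum_k |x_k|$ (our choice of $f_k$, not taken from the paper); at the kink $t = 0$ any value in $[-1, 1]$ is admissible, and we pick $0$.

```python
import numpy as np

def abs_subgradient(t):
    """A subgradient of |t|: sign(t) where differentiable; at t = 0 any value
    in [f'_-(0), f'_+(0)] = [-1, 1] works, and sign(0) = 0 is one such choice."""
    return np.sign(t)

def separable_subgradient(x):
    """Componentwise rule from the proof: for separable convex f(x) = sum_k f_k(x_k),
    the vector of per-coordinate subgradients is a subgradient of f."""
    return np.array([abs_subgradient(t) for t in x])
```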
Using the same reasoning, we obtain that $\Phi_5(\mathbf{x})$ is a convex function of $\mathbf{x}$.

2.2. On Problems Associated with Problem Number 1 (1).
Since (5) is a minimization problem and $\Phi_1(\mathbf{a})$ is a continuous (and, therefore, both lower and upper semicontinuous) function, bounded from below by $0$ as a sum of nonnegative terms, with $\Phi_1(\mathbf{a}) \to +\infty$ as $\|\mathbf{a}\| \to \infty$, problem (5) has an optimal solution according to Corollary A.2 of the appendix with $X = \mathbb{R}^{m+1}$.
Using the same reasoning, we can conclude that problems (6) and (7) are also solvable.
Existence of solutions to these problems can also be proved by using some general results.
Furthermore, since the spaces $\ell_p$, $1 < p < \infty$, are strictly convex, problem (7) (and problem (25)) has a unique solution (Proposition A.12 of the appendix); since $\ell_1$ and $\ell_\infty$ are not strictly convex spaces, in the general case we cannot conclude uniqueness of the optimal solution to problems (5) and (6).

2.3. On Problems Associated with Problem Number 2 (11)
In addition, using the same reasoning, the following problem
$$\min_{\mathbf{x}} \; \Phi_6^2(\mathbf{x}) \tag{26}$$
also has an optimal solution, and it is unique (Proposition A.9 of the appendix) because $\Phi_6^2(\mathbf{x})$ is a strictly convex function. Existence and uniqueness of the optimal solution to problem (2) can also be proved by using an approach similar to the alternative approach for problem (25).
Propositions A.7 and A.10 of the appendix imply that $\mathbf{a}^*$ is an optimal solution to problem (6) if and only if
$$\mathbf{0} \in \operatorname{co} \left\{ \partial \left( w_j \left| f(\mathbf{x}_j) - \sum_{k=0}^{m} a_k^* \varphi_k(\mathbf{x}_j) \right| \right) : j \in J(\mathbf{a}^*) \right\},$$
where
$$J(\mathbf{a}^*) = \left\{ j : w_j \left| f(\mathbf{x}_j) - \sum_{k=0}^{m} a_k^* \varphi_k(\mathbf{x}_j) \right| \text{ attains the maximum over } j = 1, \ldots, N \right\}.$$
Similarly, $\mathbf{x}^*$ is an optimal solution to problem (12) if and only if
$$\mathbf{0} \in \operatorname{co} \left\{ \partial \left( w_i \left| \sum_{j=1}^{n} a_{ij} x_j^* - b_i \right| \right) : i \in I(\mathbf{x}^*) \right\},$$
where
$$I(\mathbf{x}^*) = \left\{ i : w_i \left| \sum_{j=1}^{n} a_{ij} x_j^* - b_i \right| \text{ attains the maximum over } i = 1, \ldots, m \right\},$$
and "$\operatorname{co} S$" denotes the convex hull (convex envelope) of the set $S$.
3. The Subgradient Method

Let $f(\mathbf{x})$ be a convex proper function defined on $\mathbb{R}^n$. The subgradient method for solving the problem $\min f(\mathbf{x})$ can be defined as
$$\mathbf{x}^{(k+1)} = \mathbf{x}^{(k)} - \rho_k \mathbf{g}^{(k)}, \quad \mathbf{g}^{(k)} \in \partial f(\mathbf{x}^{(k)}), \; k = 0, 1, 2, \ldots, \tag{32}$$
where $\mathbf{x}^{(0)} \in \mathbb{R}^n$ is an arbitrary initial guess (initial approximation) and $\rho_k$ is a step size such that
$$\rho_k > 0, \quad \rho_k \to 0 \text{ as } k \to \infty, \quad \sum_{k=0}^{\infty} \rho_k = \infty.$$
The following theorem guarantees convergence of the subgradient method (32).
The subgradient method (32) can be modified for the case of nondifferentiable constrained optimization as follows:
$$\mathbf{x}^{(k+1)} = \Pi_X \left( \mathbf{x}^{(k)} - \rho_k \mathbf{g}^{(k)} \right),$$
where $\Pi_X(\mathbf{y})$ denotes the projection of $\mathbf{y}$ onto the feasible region $X$. This modification is not considered here because the optimization problems considered in this paper are unconstrained.
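A minimal Python sketch of iteration (32) with the divergent-series step size $\rho_k = \rho_0/(k+1)$ follows. Since the method is not a descent method, the best iterate found is tracked and returned; the interface (`f`, `subgrad`, `n_iter`, `rho0`) is an assumption for illustration, not the implementation used in the experiments below.

```python
import numpy as np

def subgradient_method(f, subgrad, x0, n_iter=5000, rho0=1.0):
    """Subgradient method (32): x^{k+1} = x^k - rho_k * g^k with g^k a
    subgradient of f at x^k and rho_k = rho0/(k+1), which satisfies
    rho_k -> 0 and sum_k rho_k = infinity."""
    x = np.asarray(x0, dtype=float).copy()
    best_x, best_f = x.copy(), f(x)
    for k in range(n_iter):
        g = subgrad(x)                     # any subgradient at the current point
        x = x - (rho0 / (k + 1)) * g       # divergent-series step size rule
        fx = f(x)
        if fx < best_f:                    # iterates need not decrease f,
            best_x, best_f = x.copy(), fx  # so keep the best point found
    return best_x, best_f
```

Any step size satisfying the conditions above, for example $\rho_k = \rho_0/\sqrt{k+1}$, can be substituted for the harmonic rule used here.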

3.2. Calculation of Subgradients.
In order to apply the subgradient method for solving the problems under consideration, we have to calculate the corresponding subgradients.
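For instance, here are hedged sketches of subgradients for the objectives of problems (13) and (12): for the $\ell_1$ objective, the chain rule with $\operatorname{sign}(\cdot)$ as a subgradient of $|\cdot|$; for the max-type objective, the (sub)gradient of any one term attaining the maximum. The function names are our own.

```python
import numpy as np

def l1_subgradient(x, A, b, w):
    """A subgradient of sum_i w_i |(A x - b)_i| (problem (13)):
    A^T (w * sign(A x - b)), taking 0 at kinks via sign(0) = 0."""
    r = A @ x - b
    return A.T @ (w * np.sign(r))

def chebyshev_subgradient(x, A, b, w):
    """A subgradient of max_i w_i |(A x - b)_i| (problem (12)):
    the gradient of one term where the maximum is attained."""
    r = A @ x - b
    i = np.argmax(w * np.abs(r))   # an index attaining the maximum
    return w[i] * np.sign(r[i]) * A[i]
```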

4. Computational Experiments
In this section, we present results of some computational experiments obtained by the subgradient method for problems (5), (6), (12), and (13). As was pointed out, only for comparison, we also give results obtained by the gradient method for solving the least squares problems (7) and (14). Each type of problem was run 30 times. Parameters and data were randomly generated. The computations were performed on an Intel Pentium Dual-Core CPU E5800 3.20 GHz / 2.00 GB RAM using the RZTools interactive system.

Table 5: Results obtained by method (32) for problems (12) and (13) and by method (A.15) for problem (14).
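For illustration only, here is a small end-to-end run on randomly generated data using the sketches above; this mirrors the spirit of the experiments but is not the paper's RZTools setup, and the problem sizes, seed, and unit weights are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 50, 5                                  # 50 equations, 5 unknowns (m > n)
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
w = np.ones(m)                                # equal reliability of all equations
x0 = np.zeros(n)

f13 = lambda x: np.sum(w * np.abs(A @ x - b))           # l1 objective (13)
x13, v13 = subgradient_method(f13, lambda x: l1_subgradient(x, A, b, w), x0)

f12 = lambda x: np.max(w * np.abs(A @ x - b))           # Chebyshev objective (12)
x12, v12 = subgradient_method(f12, lambda x: chebyshev_subgradient(x, A, b, w), x0)

x_ls = np.linalg.lstsq(A, b, rcond=None)[0]   # least squares baseline, cf. (14)
print(v13, v12, f13(x_ls), f12(x_ls))
```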