Mathematical Problems in Engineering, Hindawi Publishing Corporation, 2014. doi:10.1155/2014/165701

Research Article: On the Application of Iterative Methods of Nondifferentiable Optimization to Some Problems of Approximation Theory

Stefan M. Stefanov, Department of Informatics, South-West University "Neofit Rilski", 2700 Blagoevgrad, Bulgaria

Academic Editor: Peng-Yeng Yin

Copyright © 2014 Stefan M. Stefanov. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

We consider the data fitting problem, that is, the problem of approximating a function of several variables given by tabulated data, and the corresponding problem for inconsistent (overdetermined) systems of linear algebraic equations. Such problems, connected with measurement of physical quantities, arise, for example, in physics and engineering. A traditional approach for solving these two problems is the discrete least squares data fitting method, which is based on the discrete $l_2$-norm. In this paper, an alternative approach is proposed: with each of these problems we associate a nondifferentiable (nonsmooth) unconstrained minimization problem whose objective function is based on the discrete $l_1$- and/or $l_\infty$-norm, respectively; that is, these two norms are used as proximity criteria. In other words, the problems under consideration are solved by minimizing the residual with respect to these two norms. Respective subgradients are calculated, and a subgradient method is used for solving these two problems. The emphasis is on the implementation of the proposed approach. Some computational results, obtained by an appropriate iterative method, are given at the end of the paper. These results are compared with results obtained by the iterative gradient method for the corresponding "differentiable" discrete least squares problems, that is, approximation problems based on the discrete $l_2$-norm.

1. Introduction: Statement of Problems under Consideration

1.1. Problem Number 1

Let $f: \mathbb{R}^p \to \mathbb{R}$ be a real-valued function of $p$ real variables and let the following tabulated data be given:
$$\begin{array}{c|cccc} x \in \mathbb{R}^p & x_1 & x_2 & \cdots & x_m \\ \hline f(x) & f(x_1) & f(x_2) & \cdots & f(x_m) \end{array} \tag{1}$$
Find a generalized polynomial $P_n(x) = \sum_{j=0}^{n} a_j \varphi_j(x)$, based on a system of linearly independent functions $\{\varphi_j(x)\}_{j=0}^{n}$, that is, a polynomial of generalized degree $n$, which approximates the function $f(x)$ with respect to some distance (norm). Depending on the distance (norm) used, $P_n(x)$ is an optimal solution to various problems. In this paper, we discuss approximation with respect to the weighted discrete $l_1$-norm
$$\|f\|_{l_1} = \sum_{i=1}^{m} w_i |f(x_i)| \tag{2}$$
and the weighted discrete $l_\infty$-norm
$$\|f\|_{l_\infty} = \sup_{1 \le i \le m} w_i |f(x_i)|, \tag{3}$$
and, only for comparison, the weighted discrete $l_2$-norm ("discrete least squares norm")
$$\|f\|_{l_2} = \left( \sum_{i=1}^{m} w_i f^2(x_i) \right)^{1/2}, \tag{4}$$
where $w_i > 0$, $i = 1, \dots, m$, are weights.

In order to ensure uniqueness of the solution to the problems under consideration, it is known that the condition $m \ge n + 1$ must be satisfied. This requirement means that we must have at least $n + 1$ values of $f$, where $n + 1$ is the number of unknown coefficients $a_0, a_1, \dots, a_n$ of the generalized polynomial $P_n(x)$.

Thus, the polynomial $P_n(x)$ of best approximation to the function $f(x)$ with respect to the $l_1$-norm is an optimal solution to the minimization problem
$$\min_a \Phi_1(a) \stackrel{\text{def}}{=} \|P_n - f\|_{l_1} = \sum_{i=1}^{m} w_i \left| \sum_{j=0}^{n} a_j \varphi_j(x_i) - f(x_i) \right|, \tag{5}$$
and the best approximation to $f(x)$ with respect to the $l_\infty$-norm is an optimal solution to the minimization problem
$$\min_a \Phi_2(a) \stackrel{\text{def}}{=} \|P_n - f\|_{l_\infty} = \max_{1 \le i \le m} w_i \left| \sum_{j=0}^{n} a_j \varphi_j(x_i) - f(x_i) \right|. \tag{6}$$
The corresponding discrete least squares data fitting problem, which can be associated with (1), is
$$\min_a \Phi_3(a) \stackrel{\text{def}}{=} \|P_n - f\|_{l_2} = \left( \sum_{i=1}^{m} w_i \left( \sum_{j=0}^{n} a_j \varphi_j(x_i) - f(x_i) \right)^2 \right)^{1/2}. \tag{7}$$
Here, $a = (a_0, a_1, \dots, a_n) \in \mathbb{R}^{n+1}$. When the tabulated data are given with the same confidence (reliability) for all $i = 1, \dots, m$, the weights $w_i$ are chosen equal to $1$ for all $i = 1, \dots, m$.
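The three proximity criteria (5)-(7) are straightforward to evaluate numerically. The sketch below is our own illustration, not the paper's code: the names `fit_objectives` and `Phi` are ours, and the matrix of basis values $\varphi_j(x_i)$ is assumed to be precomputed.

```python
import numpy as np

def fit_objectives(Phi, f, a, w=None):
    """Weighted l1, l_inf and l2 residual norms of P_n - f.

    Phi : (m, n+1) matrix with Phi[i, j] = phi_j(x_i)  (assumed precomputed)
    f   : (m,) tabulated values f(x_i)
    a   : (n+1,) candidate coefficients a_0, ..., a_n
    w   : (m,) positive weights (defaults to all ones)
    """
    w = np.ones(len(f)) if w is None else np.asarray(w, dtype=float)
    r = Phi @ a - f                      # residuals P_n(x_i) - f(x_i)
    l1 = np.sum(w * np.abs(r))           # objective Phi_1(a), problem (5)
    linf = np.max(w * np.abs(r))         # objective Phi_2(a), problem (6)
    l2 = np.sqrt(np.sum(w * r**2))       # objective Phi_3(a), problem (7)
    return l1, linf, l2
```

All three criteria share the same residual vector; only the norm applied to it differs.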

Recall that the system of functions $\{\varphi_0, \dots, \varphi_n\}$ is said to be linearly independent if whenever
$$c_0\varphi_0(x) + c_1\varphi_1(x) + \cdots + c_n\varphi_n(x) = 0 \quad \text{for all } x, \tag{8}$$
then $c_0 = c_1 = \cdots = c_n = 0$. Otherwise, the set of functions is said to be linearly dependent.

For the problems under consideration, the system $\{\varphi_j(x)\}_{j=0}^{n}$ of linearly independent functions can be chosen as follows:
$$\varphi_j(x) = \sum_{k=1}^{p} x_k^j, \quad j = 0, 1, \dots, n; \tag{9}$$
that is,
$$\varphi_0(x) = 1 + 1 + \cdots + 1 = p, \quad \varphi_1(x) = x_1 + x_2 + \cdots + x_p, \quad \varphi_2(x) = x_1^2 + x_2^2 + \cdots + x_p^2, \quad \dots, \quad \varphi_n(x) = x_1^n + x_2^n + \cdots + x_p^n. \tag{10}$$

It is proved that functions { φ j ( x ) } j = 0 n , defined by (10), are linearly independent (Theorem 1, Section 2.1).
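As a small illustration of the power-sum basis (9)-(10), the following sketch builds the matrix of values $\varphi_j(x_i)$ used throughout. The helper names `phi` and `design_matrix` are our own, not the paper's.

```python
import numpy as np

def phi(x, j):
    """phi_j(x) = x_1^j + ... + x_p^j, the power-sum basis from (9)-(10)."""
    x = np.asarray(x, dtype=float)
    return np.sum(x**j)            # for j = 0 this gives p, as in (10)

def design_matrix(X, n):
    """Matrix with entries phi_j(x_i) for the m tabulated points X[i]."""
    return np.array([[phi(xi, j) for j in range(n + 1)] for xi in X])
```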

1.2. Problem Number 2

Consider an inconsistent (overdetermined) system of linear algebraic equations
$$\sum_{j=1}^{n} a_{ij} x_j = b_i, \quad i = 1, \dots, m; \; m > n. \tag{11}$$
In the general case, when $m > n$ and the equations are linearly independent, this system has no solution.

We can associate the following minimization problems with (11):
$$\min_x \Phi_4(x) \stackrel{\text{def}}{=} \max_{1 \le i \le m} w_i \left| \sum_{j=1}^{n} a_{ij} x_j - b_i \right| \tag{12}$$
or
$$\min_x \Phi_5(x) \stackrel{\text{def}}{=} \sum_{i=1}^{m} w_i \left| \sum_{j=1}^{n} a_{ij} x_j - b_i \right|, \tag{13}$$
where $x = (x_1, \dots, x_n) \in \mathbb{R}^n$. The corresponding discrete least squares problem, which can be associated with (11), is
$$\min_x \Phi_6(x) \stackrel{\text{def}}{=} \left( \sum_{i=1}^{m} w_i \left( \sum_{j=1}^{n} a_{ij} x_j - b_i \right)^2 \right)^{1/2}. \tag{14}$$
Problem (12) is a special case of the problem of best Chebyshev approximation, based on the $l_\infty$-norm (3), and problem (13) is based on the $l_1$-norm.

Approximations with respect to the $l_1$-norm are known as $l_1$- (or absolute deviation) approximations, and approximations with respect to the $l_\infty$-norm as Chebyshev, minimax, or uniform approximations.

1.3. Bibliographical Notes and Organization of the Paper

Problems like (1) and (11), connected with measurement of physical quantities, arise, for example, in physics and engineering. The weights $w_i$, $i = 1, \dots, m$, express the reliability with which each value (measurement, empirical datum) $f(x_i)$ at $x_i$ for Problem Number 1, or each equation for Problem Number 2, can be accepted.

Problems discussed in this paper, and problems related to them, are considered in the works cited below.

The $l_1$-approximation is considered, for example, in the papers of Barrodale and Roberts [2, 3] and Coleman and Li, and the $l_1$-solution of overdetermined linear systems is discussed by Bartels et al. $l_p$-approximations are considered, for example, in the papers of Calamai and Conn, Fischer, Li, Merle and Späth, Watson, and Wolfe. A globally quadratically convergent method for linear $l_\infty$ problems is suggested in the paper of Coleman and Li.

Papers of Andersen, Calamai and Conn, Overton, and Xue and Ye consider minimization of a sum of Euclidean norms.

The books of Clarke and of Demyanov and Vasiliev are devoted to nondifferentiable optimization, and the book of Korneichuk is devoted to optimization problems of approximation theory.

Numerical methods for best Chebyshev approximation are suggested, for example, in the book of Remez .

A subgradient algorithm for certain minimax and minisum problems is suggested in the paper of Chatelon et al. .

The least squares approach is discussed by Bertsekas, Björck, Lawson and Hanson, and others.

A quasi-Newton approach to nonsmooth convex optimization problems in machine learning is considered in Yu et al. . Nonsmooth optimization methods in the problems of constructing a linear classifier are proposed in Zhuravlev et al. .

Polynomial algorithms for projecting a point onto a region defined by a linear constraint and box constraints in $\mathbb{R}^n$ are proposed in Stefanov, and well-posedness and primal-dual analysis of some convex separable optimization problems are considered in Stefanov.

The rest of the paper is organized as follows. In Section 2, some results for the calculation of subgradients of particular types of functions are formulated and proved, and solvability of the problems under consideration is analyzed. In Section 3, the iterative subgradient method for solving nondifferentiable unconstrained optimization problems is formulated and its convergence is proved. In Section 4, results of computational experiments are presented. In Section 5, Conclusions, the proposed approach and the obtained computational results are discussed. In the appendix, some known propositions used in the paper are formulated without proofs and, only for comparison purposes, the iterative gradient method for solving differentiable unconstrained optimization problems is presented and a convergence theorem is formulated.

2. Preliminaries

2.1. Theoretical Matters

Some known results, called propositions, which are used in subsequent sections, are recalled without proofs in Appendix A.1 at the end of the paper.

We prove below some results, which guarantee solvability of considered problems (Theorem 1, combined with Proposition A.11 of the appendix) and which are used for calculating subgradients in Section 3.2 (Theorems 2 and 3).

Theorem 1 (linear independence of a system of multivariate functions).

If φ j ( x ) is a polynomial of degree j for each j = 0,1 , , n , then the set of functions { φ 0 , , φ n } is linearly independent.

Proof.

Let $c_0, \dots, c_n$ be real numbers such that
$$P(x) \stackrel{\text{def}}{=} \sum_{j=0}^{n} c_j \varphi_j(x) = 0. \tag{15}$$

Since the generalized polynomial $P(x)$ of degree $n$ vanishes identically, the coefficients of $x_k^n$, $k = 1, \dots, p$, are equal to zero. Since $c_n\varphi_n(x)$ is the only term in $P(x)$ containing $x_k^n$, $c_n$ must be equal to zero. Therefore
$$P(x) = \sum_{j=0}^{n-1} c_j \varphi_j(x). \tag{16}$$
In this representation of $P(x)$, the only term that contains $x_k^{n-1}$ is $c_{n-1}\varphi_{n-1}(x)$. Hence we must have $c_{n-1} = 0$, and
$$P(x) = \sum_{j=0}^{n-2} c_j \varphi_j(x). \tag{17}$$
Continuing in this way, we obtain that the remaining coefficients $c_{n-2}, \dots, c_1, c_0$ are also equal to zero. Therefore, the functions $\{\varphi_0, \dots, \varphi_n\}$ are linearly independent by definition.

The following two theorems give the rules for calculating subgradients for some types of functions.

Theorem 2 (subgradient of a sum of univariate convex functions).

Let $f(x_1, \dots, x_n) = \sum_{j=1}^{n} f_j(x_j)$, where $f_j(x_j)$ is a convex function of $x_j$ for each $j = 1, \dots, n$. Then $\hat f(\bar x) = (\hat f_{x_1}(\bar x_1), \dots, \hat f_{x_n}(\bar x_n))$, where
$$\hat f_{x_j}(\bar x_j) = \lambda_j f'_{j+}(\bar x_j) + (1 - \lambda_j) f'_{j-}(\bar x_j), \quad 0 \le \lambda_j \le 1, \; j = 1, \dots, n, \tag{18}$$
$f'_{j+}(\bar x_j) = \lim_{\varepsilon \downarrow 0} \frac{f_j(\bar x_j + \varepsilon) - f_j(\bar x_j)}{\varepsilon}$ and $f'_{j-}(\bar x_j) = \lim_{\varepsilon \downarrow 0} \frac{f_j(\bar x_j) - f_j(\bar x_j - \varepsilon)}{\varepsilon}$ are the derivatives of $f_j$ on the right and on the left at $\bar x_j$, respectively, and $\hat f(\bar x)$ denotes a subgradient of $f$ at the point $\bar x$.

Proof.

Since convex functions have derivatives on the right and on the left at each interior point of their domain, we can assume that $f'_{j+}(\bar x_j)$ and $f'_{j-}(\bar x_j)$ exist.

According to Proposition A.8 of the appendix, applied with the vectors $f'_+(x)$ and $f'_-(x)$ of the right and left derivatives of $f$ at $x$, respectively, we have
$$f(y) - f(x) \ge \langle f'_\pm(x), y - x \rangle; \tag{19}$$
that is, $f'_\pm(x) \in \partial f(x)$ by the definition of subgradient.

Since the subdifferential $\partial f(x)$ of a convex function $f(x)$ is a nonempty, convex, and compact set, and since $f'_{j+}(\bar x_j), f'_{j-}(\bar x_j) \in \partial f_j(\bar x_j)$ according to the above discussion, then
$$\lambda_j f'_{j+}(\bar x_j) + (1 - \lambda_j) f'_{j-}(\bar x_j) \in \partial f_j(\bar x_j), \quad j = 1, \dots, n. \tag{20}$$
Therefore
$$f(x) - f(\bar x) = \sum_{j=1}^{n} f_j(x_j) - \sum_{j=1}^{n} f_j(\bar x_j) = \sum_{j=1}^{n} [f_j(x_j) - f_j(\bar x_j)] \ge \sum_{j=1}^{n} [\lambda_j f'_{j+}(\bar x_j) + (1 - \lambda_j) f'_{j-}(\bar x_j)](x_j - \bar x_j) = \langle \hat f(\bar x), x - \bar x \rangle; \tag{21}$$
that is, $\hat f(\bar x) = (\hat f_{x_1}(\bar x_1), \dots, \hat f_{x_n}(\bar x_n))$, with $\hat f_{x_j}(\bar x_j)$ defined above, is a subgradient of $f(x_1, \dots, x_n)$ at $\bar x$ by definition.
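For instance, for $f(x) = \sum_{j=1}^{n} |x_j|$, formula (18) yields the familiar sign-based subgradient, with a free convex combination at the kink $x_j = 0$. The sketch below is our own illustration (the name `subgrad_abs_sum` and the default $\lambda_j = 1/2$ are ours).

```python
import numpy as np

def subgrad_abs_sum(x, lam=0.5):
    """A subgradient of f(x) = sum_j |x_j| built via formula (18).

    Away from a kink, f'_+ = f'_- = sign(x_j). At a kink x_j = 0 the theorem
    allows any lam * f'_+(0) + (1 - lam) * f'_-(0) = lam * 1 + (1 - lam) * (-1).
    """
    x = np.asarray(x, dtype=float)
    g = np.sign(x)                              # +1 / -1 away from the kink
    g[x == 0] = lam * 1.0 + (1.0 - lam) * (-1.0)  # convex combination at the kink
    return g
```

Any such vector satisfies the subgradient inequality (A.6) for this separable $f$.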

Theorem 3 (subgradient of a function in two variables).

Let $f(x, y)$ be a convex function of $x$ for each $y$, let there exist $y(x)$ such that
$$f(x) \stackrel{\text{def}}{=} \max_{y \in Y} f(x, y) = f(x, y(x)), \tag{22}$$
and let a subgradient $\hat f_x(x, y)$ of $f(x, y)$ with respect to $x$ be known for each $y$. Then
$$\hat f_x(x) = \hat f_x(x, y)\big|_{y = y(x)}. \tag{23}$$

Proof.

Since $f(x, y)$ is a convex function of $x$ for each $y$ and since $y(x)$ is an optimal solution to the problem $\max_{y \in Y} f(x, y)$, then
$$f(z) - f(x) = f(z, y(z)) - f(x, y(x)) \ge f(z, y(x)) - f(x, y(x)) \ge \langle \hat f_x(x, y(x)), z - x \rangle. \tag{24}$$
Therefore $\hat f_x(x) = \hat f_x(x, y)|_{y = y(x)}$ according to the definition of subgradient.

2.2. Some Properties of Objective Functions and Solvability of the Problems under Consideration

The functions $\sum_{j=0}^{n} a_j \varphi_j(x_i) - f(x_i)$, $i = 1, \dots, m$, are linear functions of $a_j$, $j = 0, 1, \dots, n$; the absolute value function $|f| = \max\{f, -f\}$ is convex (Proposition A.4 of the appendix) when $f$ is a linear function (and therefore both $f$ and $-f$ are convex). Hence $\Phi_1(a)$ is a convex function of $a$, as a linear combination with nonnegative coefficients $w_i$, $i = 1, \dots, m$, of convex functions (Proposition A.3 of the appendix).

Using the same reasoning, we obtain that Φ 5 ( x ) is a convex function of x .

According to Proposition A.4 of the appendix, $\Phi_2(a)$ is a convex function of $a$ as a maximum of the convex functions $w_i \left| \sum_{j=0}^{n} a_j \varphi_j(x_i) - f(x_i) \right|$, where $w_i > 0$, because the functions $\sum_{j=0}^{n} a_j \varphi_j(x_i) - f(x_i)$ are both convex and concave, being linear in $a_j$, $i = 1, \dots, m$, $j = 0, \dots, n$. For similar reasons, $\Phi_4(x)$ is a convex function of $x_j$, $j = 1, \dots, n$.

The function $\Phi_3^2(a)$ is a convex function of $a$ as a linear combination with nonnegative coefficients $w_i$, $i = 1, \dots, m$, of the convex quadratic functions $\left( \sum_{j=0}^{n} a_j \varphi_j(x_i) - f(x_i) \right)^2$; it is strictly convex provided the basis functions are linearly independent on the nodes and $m \ge n + 1$.

Similarly, $\Phi_6^2(x)$ is a strictly convex function of $x$ when the coefficient matrix of (11) has full column rank.

The functions $\Phi_1(a)$, $\Phi_2(a)$, $\Phi_4(x)$, and $\Phi_5(x)$ are nondifferentiable (nonsmooth), whereas $\Phi_3(a)$ and $\Phi_6(x)$ are differentiable.

The functions $\Phi_1(a)$, $\Phi_2(a)$, $\Phi_3(a)$, $\Phi_4(x)$, $\Phi_5(x)$, and $\Phi_6(x)$ have a separable structure: each is built, as a sum or a maximum, from terms depending on a single residual, which follows from the definitions of these six functions.

2.2.1. On Problems Associated with Problem Number 1, (1)

Since (5) is a minimization problem, $\Phi_1(a)$ is a continuous (and therefore both lower and upper semicontinuous) function, bounded from below by $0$ as a sum of nonnegative terms, and $\Phi_1(a) \to +\infty$ as $\|a\| \to \infty$, problem (5) has an optimal solution according to Corollary A.2 of the appendix with $X = \mathbb{R}^{n+1}$.

Using the same reasoning, we can conclude that problems (6) and (7) are also solvable.

Since $\min \Phi_3(a)$ and $\min \Phi_3^2(a)$ are attained at the same point (vector) $a = (a_0, a_1, \dots, a_n)$, we can consider the problem
$$\min_a F_3(a) \stackrel{\text{def}}{=} \Phi_3^2(a) = \sum_{i=1}^{m} w_i \left( \sum_{j=0}^{n} a_j \varphi_j(x_i) - f(x_i) \right)^2 \tag{25}$$
instead of problem (7). Since $\Phi_3^2(a)$ is a strictly convex function, problem (25) has a unique solution (Proposition A.9 of the appendix).

Existence of solutions to these problems can also be proved by using some general results.

As is known, $l_1$, $l_2$, and $l_\infty$ are normed linear spaces; they are Banach spaces with the norms (2), (4), and (3), respectively; $l_1$ and $l_2$ are separable spaces, whereas $l_\infty$ is not (see, e.g., [16, 30]).

Linear independence of { φ j ( x ) } j = 0 n , proved in Theorem 1, guarantees the existence of an element of best approximation for problems (5), (6), and (7) (Proposition A.11 of the appendix).

Furthermore, since the spaces $l_p$, $1 < p < \infty$, are strictly convex, problem (7) (and problem (25)) has a unique solution (Proposition A.12 of the appendix); since $l_1$ and $l_\infty$ are not strictly convex spaces, in the general case we cannot conclude uniqueness of the optimal solution to problems (5) and (6).

The $(n+1)$-tuple $a = (a_0, a_1, \dots, a_n) \in \mathbb{R}^{n+1}$, obtained as an optimal solution to problem (5) (problem (6), problem (25), resp.), gives the coefficients of the generalized polynomial $P_n(x)$ of best approximation to $f(x)$ from the data (1), $x \in \mathbb{R}^p$, with respect to the $l_1$-norm ($l_\infty$-norm, $l_2$-norm, resp.).

When $p = 1$, that is, when $f(x)$ is a single-variable (univariate) function, the generalized polynomial $P_n(x) = \sum_{j=0}^{n} a_j \varphi_j(x)$ becomes an algebraic polynomial of degree $n$, $P_n(x) = \sum_{j=0}^{n} a_j x^j$, and problem (7) (or, equivalently, (25)) with $p = 1$ is the well-known discrete least squares data fitting problem.
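As a sanity check of this reduction, the univariate least squares fit can be computed with standard tools. The sketch below (our own illustration, not the paper's code) applies NumPy's `polyfit` to the data of Example 7 from Section 4; for this data the exact minimizer of (25) with $w_i = 1$, obtained from the normal equations, is $a = (8, 2, -1)$, which the paper's iterative gradient method approximates as $(7.904, 1.986, -0.975)$.

```python
import numpy as np

# Univariate case p = 1: problem (25) is ordinary polynomial least squares.
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
f = np.array([-4.0, 15.0, 1.0, 10.0, 7.0, 6.0])   # data of Example 7

# np.polyfit minimizes the unweighted l2 residual, i.e. Phi_3 with w_i = 1;
# it returns coefficients with the highest power first: [a_2, a_1, a_0].
coeffs = np.polyfit(x, f, deg=2)
```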

2.2.2. On Problems Associated with Problem Number 2, (11)

Solvability of problems (12), (13), and (14) follows from Corollary A.2 of the appendix: $\Phi_4(x) \ge 0$ and $\Phi_5(x) \ge 0$ are continuous functions and, with the coefficients $a_{ij}$, $i = 1, \dots, m$, $j = 1, \dots, n$, and $b_i$, $i = 1, \dots, m$, given by (11), $\Phi_4(x) \to +\infty$ and $\Phi_5(x) \to +\infty$ as $\|x\| \to \infty$.

In addition, by the same reasoning, the problem
$$\min_x \Phi_6^2(x) \tag{26}$$
also has an optimal solution, and it is unique (Proposition A.9 of the appendix) because $\Phi_6^2(x)$ is a strictly convex function. Existence and uniqueness of the optimal solution to problem (26) can also be proved by an approach similar to the alternative approach for problem (25).

Propositions A.7 and A.10 of the appendix imply that $a^*$ is an optimal solution to problem (6) if and only if
$$0_{n+1} \in \partial\Phi_2(a^*) = \operatorname{co}\{\partial F_i^{(2)}(a^*) : i \in I(a^*)\}, \tag{27}$$
where
$$F_i^{(2)}(a) = w_i \left| \sum_{j=0}^{n} a_j \varphi_j(x_i) - f(x_i) \right|, \quad I(a) = \{i \in \{1, \dots, m\} : F_i^{(2)}(a) = \Phi_2(a)\}. \tag{28}$$

Similarly, $x^*$ is an optimal solution to problem (12) if and only if
$$0_n \in \partial\Phi_4(x^*) = \operatorname{co}\{\partial F_i^{(4)}(x^*) : i \in I(x^*)\}, \tag{29}$$
where
$$F_i^{(4)}(x) = w_i \left| \sum_{j=1}^{n} a_{ij} x_j - b_i \right|, \quad I(x) = \{i \in \{1, \dots, m\} : F_i^{(4)}(x) = \Phi_4(x)\}, \tag{30}$$
and $\operatorname{co} X$ denotes the convex hull (convex envelope) of $X$.

3. Iterative Methods for Solving Problems under Consideration

3.1. The Subgradient Method

Since Φ 1 ( a ) , Φ 2 ( a ) and Φ 4 ( x ) , Φ 5 ( x ) are nondifferentiable convex functions, we use the so-called subgradient (generalized gradient) method for solving problems (5), (6), (12), and (13).

Let f ( x ) be a convex proper function defined on R n .

The subgradient method for solving the problem
$$\min f(x) \tag{31}$$
can be defined as
$$x^{k+1} = x^k - \rho_k \gamma_k \hat f(x^k), \quad k = 0, 1, \dots, \tag{32}$$
where $x^0 \in \mathbb{R}^n$ is an arbitrary initial guess (initial approximation); $\rho_k$ is a step size such that $\rho_k \downarrow 0$ as $k \to \infty$, $\sum_{k=0}^{\infty} \rho_k = +\infty$, $\sum_{k=0}^{\infty} \rho_k^2 < +\infty$; $\gamma_k$ is a norming multiplier, usually $\gamma_k = 1/\|\hat f(x^k)\|$ or $\gamma_k = 1$; and $\hat f(x^k)$ is a subgradient of $f$ at $x^k$.
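A minimal sketch of iteration (32) with $\rho_k = c/k$, $\gamma_k = 1/\|\hat f(x^k)\|$, and stopping tests of the kind used in Section 4 might look as follows. The function name and parameter defaults are our own choices, not the paper's implementation.

```python
import numpy as np

def subgradient_method(subgrad, x0, c=1.0, eps=1e-6, max_iter=10000):
    """Subgradient iteration x^{k+1} = x^k - rho_k * gamma_k * g(x^k), scheme (32).

    subgrad : callable returning one subgradient of f at x.
    rho_k = c/k satisfies rho_k -> 0, sum rho_k = +inf, sum rho_k^2 < +inf;
    gamma_k = 1/||g(x^k)|| is the norming multiplier, so each step has length rho_k.
    """
    x = np.asarray(x0, dtype=float)
    for k in range(1, max_iter + 1):
        g = np.asarray(subgrad(x), dtype=float)
        norm = np.linalg.norm(g)
        if norm == 0.0:                 # 0 is a subgradient: x is optimal (Prop. A.10)
            break
        rho = c / k
        step = rho * g / norm
        if np.linalg.norm(step) < eps:  # "accuracy" stopping test of the form (48)
            break
        x = x - step
    return x
```

For a nonsmooth $f$ the iterates are not monotone in $f$; only a subsequence approaches the optimal value, as Theorem 4 below states.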

The following theorem guarantees convergence of the subgradient method (32).

Theorem 4 (convergence of the subgradient method).

If $\rho_k \to 0$ as $k \to \infty$, $\rho_k \ge 0$, $\sum_{k=0}^{\infty} \rho_k = +\infty$, $\gamma_k = 1$ for all $k$, and $\|\hat f(x^k)\| < C = \mathrm{const}$ for all $x^k$, then there exists a subsequence $\{f(x^{k_s})\}$ of the sequence $\{f(x^k)\}$ such that $\lim_{s \to \infty} f(x^{k_s}) = f(x^*)$, where $x^* \in M^*$, $M^* \stackrel{\text{def}}{=} \{x^* \in \mathbb{R}^n : f(x^*) = \inf_{x \in \mathbb{R}^n} f(x)\}$.

Proof.

By the assumptions of Theorem 4 we have
$$\|x^* - x^{k+1}\|^2 = \|x^* - x^k + \rho_k \hat f(x^k)\|^2 = \|x^* - x^k\|^2 + 2\rho_k \langle \hat f(x^k), x^* - x^k \rangle + \rho_k^2 \|\hat f(x^k)\|^2 \le \|x^* - x^k\|^2 + 2\rho_k \langle \hat f(x^k), x^* - x^k \rangle + C^2 \rho_k^2. \tag{33}$$

Choose some $\delta > 0$. For every $k = 0, 1, \dots$, there are two possible cases:
$$2\langle \hat f(x^k), x^* - x^k \rangle + C^2 \rho_k \le -\delta, \tag{34}$$
$$2\langle \hat f(x^k), x^* - x^k \rangle + C^2 \rho_k > -\delta. \tag{35}$$
It turns out that (35) is satisfied for infinitely many $k$. Assume, on the contrary, that there exists a positive integer $N$ such that (34) is satisfied for all $k \ge N$. Then from (33) it follows that
$$\|x^* - x^{k+1}\|^2 \le \|x^* - x^k\|^2 - \delta\rho_k \le \|x^* - x^{k-1}\|^2 - \delta\rho_{k-1} - \delta\rho_k \le \cdots \le \|x^* - x^N\|^2 - \delta \sum_{s=N}^{k} \rho_s. \tag{36}$$
The right-hand side of (36) tends to $-\infty$ as $k \to \infty$ because $\sum_{k=0}^{\infty} \rho_k = \infty$ by assumption, which contradicts $\|x^* - x^{k+1}\|^2 \ge 0$. Therefore, there exist arbitrarily large indices $k_s$, $s = 1, 2, \dots$, such that
$$2\langle \hat f(x^{k_s}), x^* - x^{k_s} \rangle + C^2 \rho_{k_s} > -\delta; \tag{37}$$
that is, (35) is satisfied at $k_s$. Since $\rho_{k_s} \to 0$ and $\delta$ is arbitrary, for any $\varepsilon > 0$ a sequence $\{k_s\}$ and a number $S_\varepsilon$ can be found such that
$$\langle \hat f(x^{k_s}), x^* - x^{k_s} \rangle > -\varepsilon \tag{38}$$
for $s \ge S_\varepsilon$. Moreover, using the subgradient property of convex functions (Proposition A.8 of the appendix), we have $f(x^*) - f(x^{k_s}) \ge \langle \hat f(x^{k_s}), x^* - x^{k_s} \rangle > -\varepsilon$; that is, $f(x^{k_s}) - f(x^*) < \varepsilon$. On the other hand, $x^* = \arg\min f(x)$, so $f(x^{k_s}) - f(x^*) \ge 0$.

Since $\varepsilon > 0$ was arbitrary, the two inequalities imply $\lim_{s \to \infty} f(x^{k_s}) = f(x^*)$.

The subgradient method (32) can be modified for the case of nondifferentiable constrained optimization as follows:
$$x^{k+1} = \Pi_X(x^k - \rho_k \gamma_k \hat f(x^k)), \quad k = 0, 1, \dots, \tag{39}$$
where $\Pi_X(y)$ denotes the projection of $y$ onto the feasible region $X$. This modification is not considered here because the optimization problems considered in this paper are unconstrained.

3.2. Calculation of Subgradients

In order to apply the subgradient method for solving the problems under consideration, we have to calculate the corresponding subgradients.

Using the convexity and separable structure of $\Phi_1(a)$, $\Phi_2(a)$, $\Phi_4(x)$, $\Phi_5(x)$, the statements of Propositions A.5, A.6, and A.7 of the appendix, and Theorems 2 and 3, we can calculate the corresponding subdifferentials (subgradient sets) at iteration $k$ as follows:
$$\partial\Phi_1^{(k)}(a^k) = \sum_{i=1}^{m} \partial f_i^{(k)}(a^k) \subseteq \mathbb{R}^{n+1}, \tag{40}$$
where
$$\partial f_i^{(k)}(a^k) = \begin{cases} w_i \, (p, \varphi_1(x_i), \dots, \varphi_n(x_i)), & \sum_{j=0}^{n} a_j^{(k)} \varphi_j(x_i) - f(x_i) > 0, \\ w_i \, (p, \varphi_1(x_i), \dots, \varphi_n(x_i)) \cdot [-1, 1], & \sum_{j=0}^{n} a_j^{(k)} \varphi_j(x_i) - f(x_i) = 0, \\ -w_i \, (p, \varphi_1(x_i), \dots, \varphi_n(x_i)), & \sum_{j=0}^{n} a_j^{(k)} \varphi_j(x_i) - f(x_i) < 0, \end{cases} \quad i = 1, \dots, m. \tag{41}$$

Let $\max_{1 \le i \le m} w_i \left| \sum_{j=0}^{n} a_j \varphi_j(x_i) - f(x_i) \right|$ be attained for $i = i(x) \in I(x)$. Then
$$\partial\Phi_2^{(k)}(a^k) = \operatorname{co}\{(\hat f_{0,i(x)}^{(k)}(a^k), \hat f_{1,i(x)}^{(k)}(a^k), \dots, \hat f_{n,i(x)}^{(k)}(a^k)) : i(x) \in I(x)\}, \tag{42}$$
where
$$\hat f_{j,i(x)}^{(k)}(a^k) = \begin{cases} w_{i(x)} \varphi_j(x_{i(x)}), & \sum_{j=0}^{n} a_j^{(k)} \varphi_j(x_{i(x)}) - f(x_{i(x)}) > 0, \\ w_{i(x)} \varphi_j(x_{i(x)}) \cdot [-1, 1], & \sum_{j=0}^{n} a_j^{(k)} \varphi_j(x_{i(x)}) - f(x_{i(x)}) = 0, \\ -w_{i(x)} \varphi_j(x_{i(x)}), & \sum_{j=0}^{n} a_j^{(k)} \varphi_j(x_{i(x)}) - f(x_{i(x)}) < 0, \end{cases} \quad j = 0, 1, \dots, n. \tag{43}$$

Let $\max_{1 \le i \le m} w_i \left| \sum_{j=1}^{n} a_{ij} x_j - b_i \right|$ be attained for $i = i(x) \in I(x)$. Then
$$\partial\Phi_4^{(k)}(x^k) = \operatorname{co}\{(\hat f_{1,i(x)}^{(k)}(x^k), \dots, \hat f_{n,i(x)}^{(k)}(x^k)) : i(x) \in I(x)\}, \tag{44}$$
where
$$\hat f_{j,i(x)}^{(k)}(x^k) = \begin{cases} w_{i(x)} a_{i(x)j}, & \sum_{j=1}^{n} a_{i(x)j} x_j^{(k)} - b_{i(x)} > 0, \\ w_{i(x)} a_{i(x)j} \cdot [-1, 1], & \sum_{j=1}^{n} a_{i(x)j} x_j^{(k)} - b_{i(x)} = 0, \\ -w_{i(x)} a_{i(x)j}, & \sum_{j=1}^{n} a_{i(x)j} x_j^{(k)} - b_{i(x)} < 0, \end{cases} \quad j = 1, \dots, n, \tag{45}$$
and
$$\partial\Phi_5^{(k)}(x^k) = \sum_{i=1}^{m} \partial f_i^{(k)}(x^k) \subseteq \mathbb{R}^n, \tag{46}$$
where
$$\partial f_i^{(k)}(x^k) = \begin{cases} w_i \, (a_{i1}, a_{i2}, \dots, a_{in}), & \sum_{j=1}^{n} a_{ij} x_j^{(k)} - b_i > 0, \\ w_i \, (a_{i1}, a_{i2}, \dots, a_{in}) \cdot [-1, 1], & \sum_{j=1}^{n} a_{ij} x_j^{(k)} - b_i = 0, \\ -w_i \, (a_{i1}, a_{i2}, \dots, a_{in}), & \sum_{j=1}^{n} a_{ij} x_j^{(k)} - b_i < 0, \end{cases} \quad i = 1, \dots, m. \tag{47}$$
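As an illustration of (46)-(47), one concrete subgradient of $\Phi_5$ can be assembled from the residual signs. The sketch below is our own (the name `subgrad_Phi5` is ours); at a zero residual we simply pick the element $0$ from the admissible interval $w_i (a_{i1}, \dots, a_{in}) \cdot [-1, 1]$.

```python
import numpy as np

def subgrad_Phi5(A, b, x, w=None):
    """One subgradient of Phi_5(x) = sum_i w_i |sum_j a_ij x_j - b_i|, cf. (46)-(47).

    Where a residual vanishes, np.sign returns 0, i.e. we pick the element 0
    from the interval w_i * (a_i1, ..., a_in) * [-1, 1].
    """
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    x = np.asarray(x, dtype=float)
    w = np.ones(len(b)) if w is None else np.asarray(w, dtype=float)
    r = A @ x - b                        # residuals of system (11)
    return A.T @ (w * np.sign(r))        # sum over i of +/- w_i * (a_i1, ..., a_in)
```

Feeding such a routine into an iteration of type (32) gives an $l_1$ solver of the kind used in the experiments of Section 4.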

Obviously, the elements of $\partial\Phi_1^{(k)}(a^k)$, $\partial\Phi_2^{(k)}(a^k)$, $\partial\Phi_4^{(k)}(x^k)$, and $\partial\Phi_5^{(k)}(x^k)$ depend on the sign of the corresponding expression in (41), (43), (45), and (47), respectively, and therefore on the current values $a_j^{(k)}$, $j = 0, \dots, n$, or $x_j^{(k)}$, $j = 1, \dots, n$, respectively.

We can choose, for example, $\rho_k = c/k$, $k = 1, 2, \dots$; $\rho_0 = 1$; $c = \mathrm{const} > 0$. This choice of $\rho_k$ satisfies the step size requirements: $\rho_k \downarrow 0$, $\sum_{k=0}^{\infty} \rho_k = +\infty$, $\sum_{k=0}^{\infty} \rho_k^2 < +\infty$.

4. Computational Experiments

In this section, we present results of some computational experiments obtained by the subgradient method for problems (5), (6), (12), and (13). As pointed out above, only for comparison, we also give results obtained by the gradient method for the least squares problems (7) and (14). Each type of problem was run 30 times. Parameters and data were randomly generated. The computations were performed on an Intel Pentium Dual-Core CPU E5800 3.20 GHz/2.00 GB using the RZTools interactive system.

For both methods, (32) and (A.15), two termination tests are used: an "accuracy" stopping criterion
$$\|x^{k+1} - x^k\| = \rho_k \gamma_k \|\hat f(x^k)\| < \varepsilon \quad (\text{respectively } \rho_k \|\nabla f(x^k)\| < \varepsilon), \tag{48}$$
where $\varepsilon > 0$ is a given (or chosen) tolerance, and an upper limit on the number of iterations.

Example 1 (for problem (1)).

Consider a problem with $m = 300$, $n = 5$, $p = 10$, and $\varepsilon = 10^{-6}$.

Results: see Table 1.

Table 1

             Method (32), problem (5)   Method (32), problem (6)   Method (A.15), problem (7)
Iterations   101                        98                         97
Run time     0.00045 s                  0.00055 s                  0.00035 s
Example 2 (for problem (1)).

Consider a problem with $m = 300$, $n = 5$, $p = 12$, and $\varepsilon = 10^{-6}$.

Results: see Table 2.

Table 2

             Method (32), problem (5)   Method (32), problem (6)   Method (A.15), problem (7)
Iterations   103                        103                        96
Run time     0.00037 s                  0.000375 s                 0.00038 s
Example 3 (for problem (1)).

Consider a problem with $m = 300$, $n = 4$, $p = 15$, and $\varepsilon = 10^{-6}$.

Results: see Table 3.

Table 3

             Method (32), problem (5)   Method (32), problem (6)   Method (A.15), problem (7)
Iterations   100                        105                        82
Run time     0.00015 s                  0.00017 s                  0.00006 s
Example 4 (for problem (11)).

Consider a problem with $m = 500$, $n = 20$, and $\varepsilon = 10^{-6}$.

Results: see Table 4.

Table 4

             Method (32), problem (12)   Method (32), problem (13)   Method (A.15), problem (14)
Iterations   100                         104                         108
Run time     0.00065 s                   0.0018 s                    0.0019 s
Example 5 (for problem (11)).

Consider a problem with $m = 500$, $n = 22$, and $\varepsilon = 10^{-6}$.

Results: see Table 5.

Table 5

             Method (32), problem (12)   Method (32), problem (13)   Method (A.15), problem (14)
Iterations   108                         118                         111
Run time     0.0048 s                    0.0051 s                    0.0049 s
Example 6 (for problem (11)).

Consider a problem with $m = 500$, $n = 28$, and $\varepsilon = 10^{-6}$.

Results: see Table 6.

Table 6

             Method (32), problem (12)   Method (32), problem (13)   Method (A.15), problem (14)
Iterations   102                         119                         101
Run time     0.00375 s                   0.0039 s                    0.0037 s

Examples 7 and 8 below present results for simple particular problems of the forms (1) and (11), respectively.

Example 7 (problem (1)).

Consider the data
$$\begin{array}{c|cccccc} x \in \mathbb{R} & -2 & -1 & 0 & 1 & 2 & 3 \\ \hline f(x) & -4 & 15 & 1 & 10 & 7 & 6 \end{array} \tag{49}$$
with $w_i = 1$, $i = 1, \dots, 6$, and $n = 2$.

Results: see Table 7.

Therefore, the algebraic polynomials obtained by the two methods are
$$P_2^{(1)}(x) = -0.22x^2 + 2.21x + 1.27, \tag{50}$$
$$P_2^{(2)}(x) = -0.975x^2 + 1.986x + 7.904, \tag{51}$$
respectively.

Table 7

             Method (32), problem (5)   Method (A.15), problem (7)
a0*          1.27                       7.904
a1*          2.21                       1.986
a2*          −0.22                      −0.97
Objective    Φ1(a*) = 25.4635           Φ3(a*) = 12.9625
Iterations   101                        106
Run time     0.00135 s                  0.0019 s
Example 8 (problem (11)).

Consider the system of linear equations
$$x_1 + x_2 = 1, \qquad x_2 + x_3 = 1, \qquad x_1 + x_3 = 1, \qquad x_1 + x_2 + x_3 = 1.$$

Results: see Table 8.

Table 8

             Method (32), problem (12)   Method (32), problem (13)   Method (A.15), problem (14)
x1*          0.3945                      0.4999                      0.4261
x2*          0.4016                      0.4999                      0.4261
x3*          0.3946                      0.4999                      0.4261
Objective    Φ4(x*) = 0.2107             Φ5(x*) = 0.500              Φ6(x*) = 0.3780
Iterations   101                         84                          18
Run time     0.0011 s                    0.0008 s                    0.00165 s
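The $l_2$ column of Table 8 can be cross-checked with a direct least squares solve. The sketch below is an independent check of ours, not the paper's code; by symmetry the exact minimizer of (14) for this system has all components equal to $3/7 \approx 0.4286$, close to the iterative value $0.4261$ reported above.

```python
import numpy as np

# System of Example 8; it is inconsistent, so we compute the exact
# least squares (l2) solution, which np.linalg.lstsq returns directly.
A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0],
              [1.0, 1.0, 1.0]])
b = np.ones(4)

x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)
# The normal equations A^T A x = A^T b give x_j = 3/7 for all j, and the
# residual norm equals sqrt(7)/7, matching Phi_6(x*) = 0.3780 in Table 8.
```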
5. Conclusions

The computational experiments presented above, as well as many other experiments, allow us to conclude that the subgradient method (32), applied to minimize the nondifferentiable functions $\Phi_1(a)$, $\Phi_2(a)$ and $\Phi_4(x)$, $\Phi_5(x)$, is computationally comparable with the gradient method (A.15), applied to the corresponding "differentiable" problems (25) and (26), based on the $l_2$-norm. For some problems, the gradient method gives better results with respect to the number of iterations and therefore with respect to run time. However, in many cases it is preferable to approximate with respect to either the $l_1$-norm (2) or the $l_\infty$-norm (3) instead of using the $l_2$-approximation.

Appendix A. Review of Some Results

A.1. Review of Some Known Results

In this section, some known results, called propositions, used in this paper, are recalled without proofs.

The following Weierstrass theorem and the corollary turn out to be useful concerning solvability of the problems under consideration.

Proposition A.1 (Weierstrass; see, e.g., [20, Theorem C.4.1]).

A lower (upper) semicontinuous function $f$, defined on a compact set $X \subset \mathbb{R}^n$, is bounded from below (above) and attains in $X$ the value
$$\bar\alpha = \inf_{x \in X} f(x) \quad \left[\bar\beta = \sup_{x \in X} f(x)\right]. \tag{A.1}$$

Corollary A.2.

Let $X$ be a closed set in $\mathbb{R}^n$ and let the function $f: X \to \mathbb{R}$ be lower (upper) semicontinuous on $X$ with $\lim_{k \to \infty} f(x^k) = +\infty$ ($\lim_{k \to \infty} f(x^k) = -\infty$) for each sequence $\{x^k\}_{k=1}^{\infty} \subset X$ such that $\lim_{k \to \infty} \|x^k\| = +\infty$. Then $f$ attains on $X$ the value
$$\bar\alpha = \inf_{x \in X} f(x) \quad \left[\bar\beta = \sup_{x \in X} f(x)\right]. \tag{A.2}$$

Proposition A.1 and Corollary A.2 mean that, under their assumptions, the problem
$$\min_{x \in X} f(x) \quad \left[\max_{x \in X} f(x)\right] \tag{A.3}$$
has a minimum [maximum] solution.

Since a continuous function is both lower and upper semicontinuous, then Proposition A.1 and Corollary A.2 are also valid for continuous functions.

Proposition A.3 (nonnegative linear combinations of convex and concave functions, [20, Theorem 4.1.6]).

Let $f_i$, $i = 1, \dots, m$, be numerical functions defined on $X \subseteq \mathbb{R}^n$. If the $f_i$ are convex (concave), then each linear combination with nonnegative coefficients of these functions,
$$f(x) \stackrel{\text{def}}{=} \sum_{i=1}^{m} \alpha_i f_i(x), \quad \alpha_i \ge 0, \; i = 1, \dots, m, \tag{A.4}$$
is convex (concave).

If f i are convex (concave) and at least one of them is strictly convex (strictly concave) and corresponding α i is positive, then the function f ( x ) defined above is strictly convex (strictly concave).

Proposition A.4 (convexity of the supremum of a family of convex functions, [20, Theorem 4.1.13]).

Let $f_i: X \to \mathbb{R}$, $i \in I$, be convex functions which are bounded from above on the convex set $X \subseteq \mathbb{R}^n$. Then the function
$$f(x) \stackrel{\text{def}}{=} \sup_{i \in I} f_i(x) \tag{A.5}$$
is convex on $X$.

f is strictly convex if each f i is strictly convex and I is finite.

Recall that a vector $\hat f(\bar x)$ is said to be a subgradient, or a generalized gradient, of $f$ at $\bar x$ if
$$f(x) - f(\bar x) \ge \langle \hat f(\bar x), x - \bar x \rangle \tag{A.6}$$
for any $x \in \mathbb{R}^n$, where $\langle x, y \rangle$ denotes the inner (scalar) product of $x, y \in \mathbb{R}^n$.

The set containing all subgradients of $f(x)$ at $\bar x$ is said to be the subdifferential $\partial f(\bar x)$ of $f$ at $\bar x$.

If $f(x)$ is differentiable at $\bar x$, then $\hat f(\bar x) = \nabla f(\bar x)$ and $\partial f(\bar x) = \{\nabla f(\bar x)\}$ is a singleton, where $\nabla f(\bar x)$ is the gradient of $f(x)$ at $\bar x$.

Proposition A.5 (subdifferential of a product of convex function with a positive real number).

Let $f: \mathbb{R}^n \to \mathbb{R}$ be a convex function. Then $\partial(\alpha f)(x^0) = \alpha \, \partial f(x^0)$ for each scalar $\alpha > 0$.

Proposition A.6 (subdifferential of a sum of convex functions, [24, Theorem 23.8]).

Let $f(x) = \sum_{i=1}^{m} f_i(x)$, where the $f_i$ are proper convex functions on $\mathbb{R}^n$, and suppose the convex sets $\operatorname{ri}(\operatorname{dom} f_i)$, $i = 1, \dots, m$, have a point in common, where "ri" stands for the relative interior of a set and "dom" for the effective domain of a function. Then $\partial f(x) = \sum_{i=1}^{m} \partial f_i(x)$ for each $x \in \mathbb{R}^n$.

Proposition A.7 (subdifferential of a maximum of convex functions, [<xref ref-type="bibr" rid="B14">14</xref>, Lemma 5.4]).

Let $f_i(x)$, $i = 1, \dots, m$, be convex functions on $\mathbb{R}^n$ and (A.7) $f(x) = \max_{i=1,\dots,m} f_i(x)$. Then (A.8) $\partial f(x) = \operatorname{co}\{\partial f_i(x) : i \in I(x)\}$, where (A.9) $I(x) = \{i \in \{1, \dots, m\} : f_i(x) = f(x)\}$ and $\operatorname{co} X$ denotes the convex hull of $X$.
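Proposition A.7 suggests a simple recipe for computing a subgradient of a pointwise maximum: evaluate all pieces, collect the active index set $I(x)$, and return any convex combination of the gradients of the active pieces. A minimal sketch for a maximum of affine functions follows; the data `C`, `d` and the helper name are illustrative assumptions.

```python
import numpy as np

# For f(x) = max_i f_i(x) with differentiable convex pieces, Proposition A.7
# says every convex combination of the gradients of the *active* pieces
# (those attaining the maximum) is a subgradient.  With affine pieces
# f_i(x) = <c_i, x> + d_i, the gradient of piece i is simply c_i.
def max_of_affine_subgradient(C, d, x, tol=1e-9):
    vals = C @ x + d
    f_val = vals.max()
    active = np.flatnonzero(vals >= f_val - tol)   # index set I(x)
    # the simplest convex combination: the gradient of one active piece
    return f_val, C[active[0]]

C = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
d = np.array([0.0, 0.0, 1.0])
f_val, g = max_of_affine_subgradient(C, d, np.array([2.0, 1.0]))
print(f_val, g)   # piece 0 attains the max: f = 2.0, subgradient (1, 0)
```

At a "kink", where several pieces are active, any convex combination of the corresponding rows of `C` would be an equally valid subgradient.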

Proposition A.8 (convexity, strict convexity, concavity, and strict concavity of differentiable multivariate functions, [<xref ref-type="bibr" rid="B20">20</xref>, Theorems 6.1.2 and 6.2.2]).

Let $f$ be a numerical differentiable function on an open convex set $X$ in $\mathbb{R}^n$. Then $f$ is convex on $X$ if and only if (A.10) $f(x_1) - f(x_2) \ge \langle \nabla f(x_2), x_1 - x_2 \rangle$ for each $x_1, x_2 \in X$. Similarly, $f$ is concave on $X$ if and only if, for each $x_1, x_2 \in X$, (A.11) $f(x_1) - f(x_2) \le \langle \nabla f(x_2), x_1 - x_2 \rangle$.

$f$ is strictly convex (strictly concave) on $X$ if and only if these inequalities are strict, respectively, for each $x_1, x_2 \in X$, $x_1 \ne x_2$.

Proposition A.9 (uniqueness of the optimal solution to a strictly convex program, [<xref ref-type="bibr" rid="B20">20</xref>, Theorem 5.2.2]).

Let $X$ be a convex set in $\mathbb{R}^n$, let $f$ be a strictly convex numerical function on $X$, and let $x^*$ be a solution to the minimization problem $\min_{x \in X} f(x)$. Then $x^*$ is the unique solution to this problem.

Proposition A.10 (Fermat’s generalized rule).

Let $f : \mathbb{R}^n \to \mathbb{R}$ be convex and let $X$ be a nonempty convex set in $\mathbb{R}^n$. The point $x^*$ is an optimal solution to the minimization problem (A.12) $\min_{x \in X} f(x)$ if and only if there exists a subgradient $\hat{x} \in \partial f(x^*)$ such that for each $x \in X$ the following inequality holds true: (A.13) $\langle \hat{x}, x - x^* \rangle \ge 0$. In particular, if $X = \mathbb{R}^n$, then $x^*$ is an optimal solution to the minimization problem $\min_{x \in \mathbb{R}^n} f(x)$ if and only if $0 \in \partial f(x^*)$.
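To see Fermat's generalized rule at work on a nonsmooth example, consider $f(x) = |x - 1| + |x + 1|$: the subdifferentials of the two pieces sum (Proposition A.6), and $0 \in \partial f(x^*)$ exactly for $x^* \in [-1, 1]$, so every point of that interval is a minimizer. The brute-force check below (the grid and the tolerance are illustrative) recovers this minimizing interval.

```python
import numpy as np

# f(x) = |x - 1| + |x + 1| is convex and nonsmooth; it equals 2 on the
# whole interval [-1, 1] and grows linearly outside it, so the set of
# minimizers is [-1, 1] -- precisely where 0 lies in the subdifferential.
f = lambda x: abs(x - 1.0) + abs(x + 1.0)
xs = np.linspace(-3.0, 3.0, 6001)               # illustrative grid
fmin = min(f(x) for x in xs)
minimizers = [x for x in xs if f(x) <= fmin + 1e-12]
print(round(float(min(minimizers)), 6), round(float(max(minimizers)), 6))
```

A gradient-based optimality test would fail here at the kinks $x = \pm 1$; the subdifferential condition $0 \in \partial f(x^*)$ handles them uniformly.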

Proposition A.11 (existence of element of best approximation, e.g., [<xref ref-type="bibr" rid="B17">17</xref>, Propositions 1.3.1 and 1.3.2]).

Let $L$ be a linear subspace of the normed linear space $X$, and let $L$ be generated by the linearly independent elements $\{\varphi_j\}_{j=0}^n$ of $X$. Then for each $f \in X$ there exists an element of best approximation in $L$.

Proposition A.12 (uniqueness of the element of best approximation, e.g., [<xref ref-type="bibr" rid="B17">17</xref>, Proposition 1.3.3]).

If X is a strictly convex space, then the element of best approximation is unique.

A.2. The Gradient Method for Differentiable Functions

In order to compare the results obtained by the subgradient method for nonsmooth optimization for problems (5) [(6)] and (12) [(13)] with the corresponding results obtained by methods of "differentiable" optimization for problems (25) and (26), respectively, consider the iterative gradient method for solving the "differentiable" unconstrained minimization problem (A.14) $\min f(x)$, where $f : \mathbb{R}^n \to \mathbb{R}$.

The gradient method for solving problem (A.14) is defined through (A.15) $x^{k+1} = x^k - \rho_k \nabla f(x^k)$, $k = 0, 1, \dots$, where $x^0 \in \mathbb{R}^n$ is an arbitrary initial guess (initial approximation), $\rho_k\,(> 0)$ is a step size, and $\nabla f(x^k)$ is the unique gradient of the differentiable function $f(x)$ at $x^k$.

We use, for example, a line search method for choosing the step size $\rho_k$. The gradient method with such a choice of step size is known as the steepest descent method. The value of $\rho_k$ is an optimal solution to the following single-variable problem in $\rho$: (A.16) $\min f(x^k - \rho \nabla f(x^k))$ subject to $\rho \ge 0$; that is, (A.17) $f(x^k - \rho_k \nabla f(x^k)) = \min_{\rho \ge 0} f(x^k - \rho \nabla f(x^k))$.
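The steepest descent iteration (A.15) with the exact line search (A.16)-(A.17) can be sketched as follows. The quadratic test function and the bracket for $\rho$ are assumptions made for the example, and the inner one-dimensional minimization is delegated to a bounded scalar minimizer rather than solved in closed form.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Steepest descent (A.15) with the exact line search (A.16)-(A.17):
# at each step minimize phi(rho) = f(x_k - rho * grad f(x_k)) over rho >= 0.
def steepest_descent(f, grad, x0, max_iter=200, tol=1e-6):
    x = x0.astype(float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:        # stationary point reached
            break
        phi = lambda rho: f(x - rho * g)   # single-variable problem (A.16)
        rho = minimize_scalar(phi, bounds=(0.0, 10.0), method="bounded").x
        x = x - rho * g
    return x

# illustrative strictly convex quadratic f(x) = 0.5 x'Ax - b'x, grad = Ax - b
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
f = lambda x: 0.5 * x @ A @ x - b @ x
grad = lambda x: A @ x - b
x_star = steepest_descent(f, grad, np.zeros(2))
print(np.round(x_star, 6))   # close to the solution of Ax = b, i.e. (1/11, 7/11)
```

For this well-conditioned quadratic the iterates contract geometrically, consistent with the rate estimate in Theorem A.13 below.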

An alternative way of choosing the step length $\rho_k$ is the so-called doubling method. Set, for example, $\rho_0 = 1$, and choose $\rho_k = \rho_{k-1}$. If $f(x^{k+1}) < f(x^k)$, then set $\rho_k := 2\rho_{k-1}$; if $f(x^{k+1}) < f(x^k)$ again, this doubling continues until $f(x)$ stops decreasing. If $f(x^{k+1}) \ge f(x^k)$, then set $\rho_k := (1/2)\rho_{k-1}$. If $f(x^k - (1/2)\rho_{k-1} \nabla f(x^k)) < f(x^k)$, then $x^{k+1} := x^k - (1/2)\rho_{k-1} \nabla f(x^k)$, and we proceed to the next iteration. If $f(x^k - (1/2)\rho_{k-1} \nabla f(x^k)) \ge f(x^k)$, then $\rho_k := (1/4)\rho_{k-1}$, and so on.
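The doubling rule above can be sketched as follows; where the description is terse, the stopping details (the lower bound on $\rho$ and the quadratic test function) are assumptions of this sketch, not prescriptions from the text.

```python
import numpy as np

# A sketch of the "doubling" step-size rule: start from the previous step
# length, double it while the trial point keeps decreasing f, otherwise
# halve it until a decrease is found (or rho becomes negligibly small).
def doubling_step(f, grad, x, rho_prev):
    g = grad(x)
    rho = rho_prev
    if f(x - rho * g) < f(x):
        while f(x - 2.0 * rho * g) < f(x - rho * g):
            rho *= 2.0                     # doubling while f keeps decreasing
    else:
        while f(x - rho * g) >= f(x) and rho > 1e-12:
            rho *= 0.5                     # halving until a decrease is found
    return x - rho * g, rho

# illustrative test problem: f(x) = ||x||^2, grad f(x) = 2x
f = lambda x: (x ** 2).sum()
grad = lambda x: 2.0 * x
x, rho = np.array([4.0, -2.0]), 1.0
for _ in range(30):
    x, rho = doubling_step(f, grad, x, rho)
print(np.round(x, 8))
```

Unlike the exact line search, this rule needs only function comparisons, which makes each iteration cheap at the cost of a less accurate step.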

The gradient method (A.15) can be considered a special case of the subgradient method (32) (with $\gamma_k = 1$) when the function $f$ to be minimized is differentiable.

Gradients of $\Phi_3^2(a)$ and $\Phi_6^2(x)$, respectively, at iteration $k$ are (A.18) $\nabla \Phi_3^{2\,(k)}(a^k) = (f_0^{(k)}(a^k), \dots, f_n^{(k)}(a^k))$, where (A.19) $f_l^{(k)}(a^k) = 2 \sum_{i=1}^m w_i \bigl( \sum_{j=0}^n a_j^{(k)} \varphi_j(x_i) - f(x_i) \bigr) \varphi_l(x_i)$, $l = 0, 1, \dots, n$, and (A.20) $\nabla \Phi_6^{2\,(k)}(x^k) = (f_1^{(k)}(x^k), \dots, f_n^{(k)}(x^k))$, where (A.21) $f_l^{(k)}(x^k) = 2 \sum_{i=1}^m w_i a_{il} \bigl( \sum_{j=1}^n a_{ij} x_j^{(k)} - b_i \bigr)$, $l = 1, \dots, n$.
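The gradient (A.20)-(A.21) of the weighted residual $\Phi_6^2(x) = \sum_{i=1}^m w_i \bigl(\sum_{j=1}^n a_{ij} x_j - b_i\bigr)^2$ is straightforward to vectorize. The sketch below, with randomly generated illustrative data, also validates it against central finite differences, which for a quadratic are exact up to roundoff.

```python
import numpy as np

# Weighted least squares residual for an overdetermined system Ax = b,
# Phi(x) = sum_i w_i (sum_j a_ij x_j - b_i)^2, and its gradient (A.21):
# component l equals 2 sum_i w_i a_il (sum_j a_ij x_j - b_i).
def phi(A, b, w, x):
    r = A @ x - b
    return w @ r ** 2

def grad_phi(A, b, w, x):
    r = A @ x - b                      # residuals of the system
    return 2.0 * A.T @ (w * r)         # vectorized form of (A.21)

rng = np.random.default_rng(0)         # illustrative random data
A = rng.standard_normal((6, 3))
b = rng.standard_normal(6)
w = np.ones(6)
x = rng.standard_normal(3)

# sanity check against central finite differences
h, g_fd = 1e-6, np.zeros(3)
for l in range(3):
    e = np.zeros(3); e[l] = h
    g_fd[l] = (phi(A, b, w, x + e) - phi(A, b, w, x - e)) / (2.0 * h)
print(np.allclose(grad_phi(A, b, w, x), g_fd, atol=1e-5))   # True
```

The gradient (A.19) of the data fitting objective $\Phi_3^2(a)$ has the same shape with the design matrix $(\varphi_j(x_i))_{ij}$ in place of $A$.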

Theorem A.13 (rate of convergence of the steepest descent method, e.g., [<xref ref-type="bibr" rid="B5">5</xref>, Theorem 8.6.3]).

Let $f : \mathbb{R}^n \to \mathbb{R}$, $f \in C^2(\mathbb{R}^n)$, let there exist positive constants $m$ and $M$ such that (A.22) $m\|y\|^2 \le \langle f''(x)y, y \rangle \le M\|y\|^2$ for any $x \in \mathbb{R}^n$ and $y \in \mathbb{R}^n$, and let the sequence $\{x^k\}$ be generated by the steepest descent method (method (A.15) with $\rho_k$ determined by a line search method).

Then $f$ has a unique minimum point $x^* \in \mathbb{R}^n$, and for each $x \in \mathbb{R}^n$ the following inequality holds true: (A.23) $\|\nabla f(x)\|^2 \ge \frac{2m}{1 + m/M}\,(f(x) - f(x^*))$. Further, there exist constants $q$ and $C$, $0 \le q < 1$, $C > 0$, such that (A.24) $f(x^{k+1}) - f(x^*) \le q^k \bigl(f(x^0) - f(x^*)\bigr)$, $\|x^{k+1} - x^*\| \le C q^{k/2}$, $k = 0, 1, \dots$.

Conflict of Interests

The author declares that there is no conflict of interests regarding the publication of this paper.

References

1. Andersen, K. D., An efficient Newton barrier method for minimizing a sum of Euclidean norms, SIAM Journal on Optimization, 6 (1996), no. 1, 74–95. doi:10.1137/0806006
2. Barrodale, I., Roberts, F., An improved algorithm for discrete $l_1$ linear approximation, SIAM Journal on Numerical Analysis, 10 (1973), no. 5, 839–848. doi:10.1137/0710069
3. Barrodale, I., Roberts, F., An efficient algorithm for discrete $l_1$ linear approximation with constraints, SIAM Journal on Numerical Analysis, 15 (1978), no. 3, 603–611. doi:10.1137/0715040
4. Bartels, R. H., Conn, A. R., Sinclair, J. W., Minimization techniques for piecewise differentiable functions: the $l_1$ solution to an overdetermined linear system, SIAM Journal on Numerical Analysis, 15 (1978), no. 2, 224–241. doi:10.1137/0715015
5. Bazaraa, M. S., Sherali, H. D., Shetty, C. M., Nonlinear Programming: Theory and Algorithms, 2nd edition, John Wiley & Sons, New York, NY, USA, 1993.
6. Bertsekas, D. P., A new class of incremental gradient methods for least squares problems, SIAM Journal on Optimization, 7 (1997), no. 4, 913–926. doi:10.1137/S1052623495287022
7. Björck, Å., Numerical Methods for Least Squares Problems, SIAM, Philadelphia, Pa, USA, 1996. doi:10.1137/1.9781611971484
8. Calamai, P. H., Conn, A. R., A stable algorithm for solving the multifacility location problem involving Euclidean distances, SIAM Journal on Scientific and Statistical Computing, 1 (1980), no. 4, 512–526. doi:10.1137/0901037
9. Calamai, P. H., Conn, A. R., A projected Newton method for $l_p$ norm location problems, Mathematical Programming, 38 (1987), no. 1, 75–109. doi:10.1007/BF02591853
10. Chatelon, J. A., Hearn, D. W., Lowe, T. J., A subgradient algorithm for certain minimax and minisum problems, Mathematical Programming, 15 (1978), no. 2, 130–145. doi:10.1007/BF01609012
11. Clarke, F., Optimization and Nonsmooth Analysis, Classics in Applied Mathematics 5, SIAM, Philadelphia, Pa, USA, 1990. doi:10.1137/1.9781611971309
12. Coleman, T. F., Li, Y., A global and quadratically convergent method for linear $l_\infty$ problems, SIAM Journal on Numerical Analysis, 29 (1992), no. 4, 1166–1186. doi:10.1137/0729071
13. Coleman, T. F., Li, Y., A globally and quadratically convergent affine scaling method for linear $l_1$ problems, Mathematical Programming, 56 (1992), no. 1–3, 189–222. doi:10.1007/BF01580899
14. Demyanov, V. F., Vasiliev, L. V., Nondifferentiable Optimization, Springer, Berlin, Germany, 1985.
15. Fischer, J., An algorithm for discrete linear $l_p$ approximation, Numerische Mathematik, 38 (1981), no. 1, 129–139. doi:10.1007/BF01395812
16. Kantorovich, L., Akilov, G., Functional Analysis, Nauka, Moscow, Russia, 1983 (Russian).
17. Korneichuk, N. P., Extremum Problems of Approximation Theory, Nauka, Moscow, Russia, 1976 (Russian).
18. Lawson, C. L., Hanson, R. J., Solving Least Squares Problems, SIAM, Philadelphia, Pa, USA, 1995. doi:10.1137/1.9781611971217
19. Li, Y., A globally convergent method for $l_p$ problems, SIAM Journal on Optimization, 3 (1993), no. 3, 609–629. doi:10.1137/0803030
20. Mangasarian, O. L., Nonlinear Programming, Classics in Applied Mathematics 10, SIAM, Philadelphia, Pa, USA, 1994.
21. Merle, G., Späth, H., Computational experiences with discrete $l_p$-approximation, Computing, 12 (1974), no. 4, 315–321. doi:10.1007/BF02253335
22. Overton, M. L., A quadratically convergent method for minimizing a sum of Euclidean norms, Mathematical Programming, 27 (1983), no. 1, 34–63. doi:10.1007/BF02591963
23. Remez, E., Fundamentals of Numerical Methods for Chebyshev Approximation, Naukova Dumka, Kiev, Ukraine, 1969 (Russian).
24. Rockafellar, R. T., Convex Analysis, Princeton University Press, Princeton, NJ, USA, 1997.
25. Stefanov, S. M., Polynomial algorithms for projecting a point onto a region defined by a linear constraint and box constraints in $\mathbb{R}^n$, Journal of Applied Mathematics, 2004 (2004), no. 5, 409–431. doi:10.1155/S1110757X04309071
26. Stefanov, S. M., Well-posedness and primal-dual analysis of some convex separable optimization problems, Advances in Operations Research, 2013 (2013), Article ID 279030. doi:10.1155/2013/279030
27. Watson, G. A., On two methods for discrete $l_p$ approximation, Computing, 18 (1977), no. 3, 263–266. doi:10.1007/BF02253212
28. Wolfe, J. M., On the convergence of an algorithm for discrete $l_p$ approximation, Numerische Mathematik, 32 (1979), no. 4, 439–459. doi:10.1007/BF01401047
29. Xue, G., Ye, Y., An efficient algorithm for minimizing a sum of Euclidean norms with applications, SIAM Journal on Optimization, 7 (1997), no. 4, 1017–1036. doi:10.1137/S1052623495288362
30. Yosida, K., Functional Analysis, Classics in Mathematics, Springer, Berlin, Germany, 1995.
31. Yu, J., Vishwanathan, S. V. N., Günter, S., Schraudolph, N. N., A quasi-Newton approach to nonsmooth convex optimization problems in machine learning, The Journal of Machine Learning Research, 11 (2010), 1145–1200.
32. Zhuravlev, Y. I., Laptin, Y., Vinogradov, A., Zhurbenko, N., Likhovid, A., Nonsmooth optimization methods in the problems of constructing a linear classifier, International Journal Information Models and Analyses, 1 (2012), 103–111.