An Active Set Smoothing Method for Solving Unconstrained Minimax Problems

In this paper, an active set smoothing function based on the plus function is constructed for the maximum function. The active set strategy used in the smoothing function reduces the number of gradient and Hessian evaluations of the component functions during the optimization. Combining the active set smoothing function, a simple adjustment rule for the smoothing parameters, and an unconstrained minimization method, an active set smoothing method is proposed for solving unconstrained minimax problems. The active set smoothing function is continuously differentiable, and its gradient is locally Lipschitz continuous and strongly semismooth. Under a boundedness assumption on the level set of the objective function, the convergence of the proposed method is established. Numerical experiments show that the proposed method is feasible and efficient, particularly for minimax problems with very many component functions.

In [27], the following aggregate function (also called the exponential penalty function), induced from Jaynes' maximum entropy principle, was introduced: F_t(x) = t ln(Σ_{j∈Q} exp(f_j(x)/t)), where t > 0 is the smoothing parameter. It approaches F(x) uniformly with respect to x ∈ R^n as the smoothing parameter goes to 0 and has been widely used in smoothing methods for solving minimax problems. Its gradient can be written as ∇F_t(x) = Σ_{j∈Q} λ_j(x) ∇f_j(x) with λ_j(x) = exp(f_j(x)/t)/Σ_{i∈Q} exp(f_i(x)/t), which is a convex combination of the gradients of all the component functions, and its Hessian is a complicated combination of the gradients and Hessians of all the component functions. Therefore, for a maximum function with very many nonlinear component functions, evaluating the gradient and Hessian of the aggregate function consumes a large amount of computation.
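As a point of reference, the aggregate function and its gradient weights λ_j(x) can be evaluated in a numerically stable way by shifting out the maximum before exponentiating. The following sketch is illustrative, not the authors' implementation; function names such as `aggregate` are ours:

```python
import numpy as np

def aggregate(fvals, t):
    """Exponential (log-sum-exp) smoothing of max_j f_j(x):
    F_t = t * log(sum_j exp(f_j / t)).  Shifting by max(fvals)
    keeps the exponentials from overflowing for small t."""
    m = np.max(fvals)
    return m + t * np.log(np.sum(np.exp((fvals - m) / t)))

def aggregate_weights(fvals, t):
    """Convex-combination weights lambda_j appearing in the gradient
    grad F_t(x) = sum_j lambda_j(x) * grad f_j(x)."""
    m = np.max(fvals)
    w = np.exp((fvals - m) / t)
    return w / np.sum(w)

fvals = np.array([1.0, 3.0, 2.5])
approx = aggregate(fvals, 1e-3)           # close to max(fvals) = 3.0
weights = aggregate_weights(fvals, 1e-3)  # concentrated on the maximal component
```

As t decreases, the weights concentrate on the maximal component, which is exactly why the Hessian of the aggregate function mixes contributions from every component whose weight is not negligible.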
For minimax problems with very many component functions, several active set strategies have been developed for smoothing methods to reduce the number of gradient or Hessian evaluations of the component functions at each iteration. In [18], the following active set smoothing function for F(x, α) = α + Σ_{j∈Q} max(f_j(x) − α, 0) was presented: where t > 0 is the smoothing parameter. The active set used in F_t(x, α) at (x, α) ∈ R^n × R can be written as follows: In [28], a cubic spline smoothing function for F(x) was presented. For any smoothing parameter t > 0, the active set used in the cubic spline smoothing function at x ∈ R^n can be represented as follows: In [25], an active set strategy for the aggregate function was introduced. For a given ϵ > 0, the active set Q_k used for the aggregate function at x_k ∈ R^n is updated as follows: In [26], another active set strategy for the aggregate function was presented. For any smoothing parameter t > 0, the active set used for the aggregate function at x ∈ R^n is defined as follows: where ϵ(t, q) > 0 is a complicated combination of several parameters.
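The common thread of these strategies is that only components whose values are close to the current maximum contribute gradients and Hessians. A minimal sketch of this idea follows; the tolerance rule is illustrative and does not reproduce any specific rule from [18, 25, 26, 28]:

```python
import numpy as np

def active_components(fvals, eps):
    """Indices j with f_j(x) >= F(x) - eps: only near-maximal
    components are kept, so gradients/Hessians of the remaining
    components never need to be evaluated at this point."""
    F = np.max(fvals)
    return np.flatnonzero(fvals >= F - eps)

fvals = np.array([0.1, 0.9, 1.0, 0.2, 0.95])
idx = active_components(fvals, eps=0.1)  # components within 0.1 of the max
```

With 10,000 components of which only a handful are near-maximal, such a rule reduces per-iteration derivative work roughly in proportion to the active fraction.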
In this paper, based on the plus function, an active set smoothing function for the maximum function is proposed; the smoothing function only involves the component functions whose values are close to F(x). It is continuously differentiable, and its gradient is locally Lipschitz continuous and strongly semismooth. Combining the active set smoothing function, a geometric reduction rule for the smoothing parameters, the Armijo line search strategy, the steepest descent direction, and the Newton direction, an active set smoothing method is proposed for solving unconstrained minimax problems. Under the boundedness assumption on the level set of F(x), the convergence of the active set smoothing method is established. Numerical experiments show that the resulting method is stable and efficient, especially for minimax problems with very many component functions. The following assumptions and results will be used in this paper:

Assumption 1: the component functions f_j : R^n → R, j ∈ Q, are twice continuously differentiable, and ∇f_j : R^n → R^n, j ∈ Q, are strongly semismooth.

Assumption 2: for any M > 0, the level set Ω_M = {x ∈ R^n : F(x) ≤ M} is bounded.

Definition 1 (see [29]). Suppose that Φ: R^n → R^m is locally Lipschitz continuous. If for any V ∈ ∂Φ(x + h), h → 0, Vh − Φ′(x; h) = O(‖h‖²), where ∂Φ(x) is the generalized Jacobian of Φ at x and Φ_j′(x; h) is the directional derivative of Φ_j at x in the direction h for j = 1, . . . , m, then Φ is said to be strongly semismooth at x.
Lemma 1 (see [29]). Suppose that ϕ: R → R and ψ: R → R are strongly semismooth; then, among other combinations, (iv) the composition ϕ ∘ ψ: R → R is strongly semismooth.

Theorem 1 (see [24]). Suppose that the component functions f_j : R^n → R, j ∈ Q, are continuously differentiable. If x* is a local minimizer of problem (1), then 0 ∈ conv{∇f_j(x*) : j ∈ I(x*)}, where I(x*) = {j ∈ Q : f_j(x*) = F(x*)} and conv A denotes the convex hull of A.

An Active Set Smoothing Function for the Maximum Function
In this section, based on the plus function z_+ : R → R_+, z_+ = max{z, 0}, we construct the following function F_t^c : R^n × R → R: where t > 0 is the smoothing parameter and c > 0 is the scaling parameter. By the definition of the plus function, Assumption 1, and t > 0, we have the following result.

Lemma 4. For any
For any t > 0, c > 0, and (x, α) ∈ R^n × R, let then we know that F_t^c(x, α) only relates to the component functions f_j for j ∈ Q_t^c(x, α), whose function values are close to F(x). Therefore, F_t^c(x, α) is called an active set smoothing function for the maximum function in this paper. By direct calculation, we can obtain the gradient of F_t^c(x, α): which can also be written as follows:

Lemma 5. For any t > 0, c > 0, and (x, α) ∈ R^n × R,

Proof. If cα > F(x), by t > 0, we have By (23) and (24), the conclusion holds.
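The role of the plus function in the active set can be illustrated as follows: a component drops out of Q_t^c(x, α) exactly when its plus-function term vanishes, i.e., when f_j(x) − cα ≤ −t. The sketch below encodes this characterization (inferred from the term (1 + (f_j(x) − cα)/t)_+ that appears later in the analysis; treat it as an assumption rather than the paper's exact definition):

```python
import numpy as np

def plus(z):
    """Plus function z_+ = max(z, 0)."""
    return np.maximum(z, 0.0)

def active_set(fvals, alpha, t, c):
    """Q_t^c(x, alpha): indices whose plus-function term
    (1 + (f_j(x) - c*alpha)/t)_+ is nonzero, i.e.
    f_j(x) - c*alpha > -t."""
    return np.flatnonzero(fvals - c * alpha + t > 0)

fvals = np.array([1.0, 0.2, 0.95])
idx = active_set(fvals, alpha=1.0, t=0.1, c=1.0)  # components near the maximum
```

As t shrinks, the window (cα − t, ∞) tightens around the maximal components, so the gradient of F_t^c involves fewer and fewer component gradients.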

An Active Set Smoothing Method and Its Convergence
In this section, based on the active set smoothing function F_t^c(x, α) for F(x) and the smoothing methods introduced in [24], an active set smoothing method is proposed to solve problem (1). For a starting point (x_0, α_0) ∈ R^n × R and an initial smoothing parameter t_0 > 0, the initial scaling parameter c_0 > 0 is chosen from a bounded region in Subroutine 1, which reduces the ill-conditioning of V_{t_0}^{c_0}(x_0, α_0/c_0) caused by the scaling of the variable α ∈ R; then, α_{0,0} is set to α_0/c_0. The Armijo line search strategy, the steepest descent direction, and the Newton direction, in which the selection of the search direction depends on the condition number of V_{t_0}^{c_0}(x_{k,0}, α_{k,0}) and two convergence conditions for k ≥ 0, are used to compute an approximate solution (x_{k_0,0}, α_{k_0,0}) of the smoothing problem P_{t_0}^{c_0}. Then, the smoothing parameter t_0 is geometrically reduced to t_1, the scaling parameter c_1 is chosen from a bounded region in Subroutine 1, α_{k_0,0} is updated to α_{k_0+1,1} in two ways to balance the efficiency and convergence of the resulting algorithm in Subroutine 1, and the smoothing problem P_{t_1}^{c_1} is solved with the starting point (x_{k_0,0}, α_{k_0+1,1}). By repeating this process, a sequence of smooth unconstrained optimization problems is solved. As the smoothing parameters t_i go to 0, a solution of problem (1) can be obtained from the solutions of the smoothing problems P_{t_i}^{c_i}.
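The outer structure described above, solving a sequence of smooth problems while t is reduced geometrically and each solve is warm-started from the previous solution, can be sketched generically. The inner solver is a placeholder, and the toy example shrinking a scalar iterate is purely illustrative:

```python
def smoothing_outer_loop(solve_smooth, t0, omega_t, t_min, x0):
    """Generic outer loop of a smoothing method: minimize a sequence of
    smoothed problems while the smoothing parameter is reduced
    geometrically, t_{i+1} = omega_t * t_i, warm-starting each solve.

    `solve_smooth(x, t)` stands in for an inner unconstrained solver
    (e.g. Armijo line search with Newton/steepest-descent steps)."""
    x, t = x0, t0
    while t > t_min:
        x = solve_smooth(x, t)  # approximate minimizer at this smoothing level
        t = omega_t * t         # geometric reduction of the smoothing parameter
    return x

# toy inner "solver" that contracts the iterate more strongly as t shrinks
xs = smoothing_outer_loop(lambda x, t: x / (1.0 + 1.0 / t),
                          t0=1.0, omega_t=0.5, t_min=1e-6, x0=5.0)
```

The warm start is what makes the scheme efficient: each smoothed problem is a small perturbation of the previous one, so few inner iterations are needed per outer step.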
Step 1 (compute the search direction): compute the condition number, then go to Step 2. Else, compute the steepest descent direction and go to Step 2.
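The switch between the Newton direction and the steepest descent direction based on a condition number can be sketched as follows. The threshold `kappa_max` is illustrative, and the paper's actual test also involves convergence conditions not shown here:

```python
import numpy as np

def search_direction(grad, hess, kappa_max=1e8):
    """Use the Newton direction when the Hessian model is well
    conditioned; otherwise fall back to steepest descent."""
    kappa = np.linalg.cond(hess)
    if np.isfinite(kappa) and kappa < kappa_max:
        return np.linalg.solve(hess, -grad)  # Newton direction
    return -grad                             # steepest descent direction

g = np.array([2.0, -4.0])
H = np.diag([2.0, 4.0])
d = search_direction(g, H)  # well-conditioned, so the Newton step [-1.0, 1.0]
```

The fallback keeps each iteration well defined even when the smoothed Hessian becomes nearly singular for small t.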
Step 5 (adjustment of the smoothing parameter): set t_{i+1} = ω_t t_i and c_{i+1} = c_i, replace i by i + 1 and k by k + 1, and go to Subroutine 1.

Subroutine 1: adjustment of the scaling parameter.
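The Armijo stepsize rule used in Step 2, whose failure is what "fail 1" refers to in the experiments, can be sketched as a standard backtracking loop. Parameter names and values here are illustrative, not the paper's:

```python
def armijo_stepsize(phi, slope, beta=0.5, sigma=1e-4, lam_min=1e-10):
    """Backtracking (Armijo) line search sketch: shrink lambda by beta
    until the sufficient-decrease condition
        phi(lam) <= phi(0) + sigma * lam * slope
    holds, where phi(lam) = f(x + lam*d) and slope = <grad f(x), d> < 0.
    Returns None if no acceptable stepsize is found above lam_min."""
    lam, phi0 = 1.0, phi(0.0)
    while lam >= lam_min:
        if phi(lam) <= phi0 + sigma * lam * slope:
            return lam
        lam *= beta
    return None

# f(x) = x^2 at x = 1 with steepest descent direction d = -2
lam = armijo_stepsize(lambda l: (1.0 - 2.0 * l) ** 2, slope=-4.0)
```

The lower bound lam_min mirrors the region [10^{-10}, 1] used in the numerical experiments.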
Compute the condition numbers. Remark 1. In Subroutine 1 for adjusting the scaling parameter, the update keeps the monotonicity of F_{t_i}^{c_i}(x_{k,i}, α_{k,i}) with respect to k in Lemma 13.

Lemma 12. Suppose that Assumption 1 holds, then for any bounded set
where λ_t^c(x, α) is the stepsize computed in Step 2 of Algorithm 1.
By Lemma 9, ∇F_t^c(x, α): R^n × R → R^{n+1} is locally Lipschitz continuous, and then there exists a Lipschitz constant L_S > 0 such that for any (x, α) ∈ S. For any (x, α) ∈ S and λ ∈ (0, 1], by the mean value theorem, there exists a ξ ∈ (0, 1) such that where the second inequality comes from (58) and the last inequality comes from (56), (57), and ξ ∈ (0, 1). Let then it follows from (59) that for any λ ∈ (0, λ*], and hence λ_t^c(x, α) ≥ βλ*. Therefore, by (57), we have Then, by c, κ_2* ∈ (0, 1], the conclusion holds. □

Lemma 13. Suppose that Assumptions 1 and 2 hold and τ(t_i) ≤ ω_t c_i; then for any (x_0, α_0) ∈ R^n × R, t_0 > 0, and c_0 > 0, the sequence (x_{k,i}, α_{k,i}) generated by Algorithm 1 satisfies the following: (i) By Lemma 12, there exists an l_{k,i} ≥ 0 such that then we know which implies that the conclusion holds. (ii) By t_{i+1} = ω_t t_i and ω_t ∈ (0, 1), for any t_0 > 0 and t ∈ (0, 1), there exists an i such that for any i ≥ i. Then, by τ(t_i) ≤ ω_t c_i < c_i and t_i > 0, we have Let t be the variable of the function F_t^c(x, α) for given (x, α) ∈ R^n × R and c > 0, redefined as F_{x,α}^c(t); then by the mean value theorem, there exists a t ∈ (ω_t t_i, t_i) such that with (67) and (68), which implies By t > ω_t t_i > 0, we have Then, we know that (71) also holds, and hence which means By x_{k+1,i+1} = x_{k,i} and c_{i+1} α_{k+1,i+1} = c_i α_{k,i} according to (53) and (66), we have Then, by (76) and t_{i+1} = ω_t t_i, we know that the sequence is monotone decreasing with respect to k by (65) and (78).
(iii) By (i), for any 0 ≤ k ≤ k_0, we have and for any 1 By (ii), for any i ≥ i and By the finiteness of i satisfying i < i, (79), (80), and (81), there exists a constant F_0 ∈ R such that for any (x_{k,i}, α_{k,i}), Suppose that for any M > 0, there exists a point; by the definition of the plus function and 0 < t_i ≤ t_0, we have If c_i α_{k,i} < M − t_i, by F(x_{k,i}) > M and t_i > 0, we have However, the arbitrariness of M, (83), and (85) contradicts x_{k,i} ∈ Ω_{F_0}; then, there must exist an M* > 0 such that F(x_{k,i}) ≤ M* for any k ≥ 0 and i ≥ 0. Therefore, we have {x_{k,i}} ⊆ Ω_{M*}, and hence the sequence {x_{k,i}} is bounded by Assumption 2. (iv) For any (x_{k,i}, α_{k,i}) with k ≥ 0 and i ≥ 0, if Q_{t_i}^{c_i}(x_{k,i}, α_{k,i}) = ∅, by the definition of Q_t^c(x, α), we know (1 + (f_j(x_{k,i}) − c_i α_{k,i})/t_i)_+ = 0 for any j ∈ Q by t_i > 0. Then, by (x_{k,i}, α_{k,i}) ∈ Ω_{F_0}, we have Hence, by c_i ≥ C_l > 0, we know Then, by 0 < t_i ≤ t_0 and c_i ≥ C_l > 0, we know Therefore, the sequence {α_{k,i}} is bounded by (87) and (89). □

Lemma 14. Suppose that Assumptions 1 and 2 hold; then for any (x_0, α_0) ∈ R^n × R and t_0 > 0, the sequences (x_{k,i}, α_{k,i}) and {t_i} generated by Algorithm 1 satisfy the following: (i) If there exists an i ≥ 0 such that the sequence and k → ∞. By Lemma 12 and the boundedness of (x_{k,i}, α_{k,i}) from Lemma 13, there exists a constant C_i > 0 such that which contradicts that (x_{k,i}, α_{k,i}) is bounded; then, t_i is updated to t_{i+1} = ω_t t_i. Therefore, by ω_t ∈ (0, 1), we know that i → ∞ as k → ∞, and hence the sequence {t_i} is infinite and strictly monotone decreasing, with t_i → 0 as k → ∞. □

Theorem 3. Suppose that Assumptions 1 and 2 hold and (x_{k,i}, α_{k,i}) is the sequence generated by Algorithm 1; then for any accumulation point (x*, α*) ∈ R^n × R of (x_{k_i,i}, α_{k_i,i}), 0 ∈ ∂F(x*), i.e., x* is a stationary point of problem (1).
Proof. By Lemmas 13 and 14, the sequence {(x_{k_i,i}, α_{k_i,i})} ⊆ {(x_{k,i}, α_{k,i})} is infinite and bounded; then, there exists at least one accumulation point of {(x_{k_i,i}, α_{k_i,i})}. For any accumulation point (x*, α*) of {(x_{k_i,i}, α_{k_i,i})}, there exists a subsequence of {(x_{k_i,i}, α_{k_i,i})} (denoted also by {(x_{k_i,i}, α_{k_i,i})} for convenience) converging to (x*, α*). By Lemma 14, It follows from (32) and (92) that (96) holds. Hence, by the finiteness of the indexes in Q∖I(x*), there exists an i_0 > 0 such that for any i > i_0, which implies, and by the finiteness of the indexes in I(x*), there exists a subsequence. Therefore, by x_{k_i,i} → x*, (93), (98), (99), and Assumption 1, we have which implies that x* is a stationary point of problem (1).

Numerical Experiments
In this section, we present the numerical results of Algorithm 1 and several related algorithms for solving unconstrained minimax problems. Algorithm 1 is denoted by ASSF. Fminimax is the MATLAB algorithm "fminimax". Fmincon is the MATLAB algorithm "fmincon" applied to a problem equivalent to problem (1). To show the efficiency of the proposed active set smoothing function, we replace it by some other smoothing techniques in Algorithm 1 to obtain several smoothing methods. AF, SSF, and SPF are constructed by Algorithm 1 with the aggregate function (2), the cubic spline smoothing function introduced in [28], and the exact penalty function technique introduced in [18], respectively. TAF and ASAF are constructed by Algorithm 1 and the aggregate function with the active set strategies introduced in [25, 26], respectively. The parameters in Algorithm 1 are set as follows:

For the moderately sized test problems, t_0 and α_0 are set as follows: and then we have which implies that ∇_x F_{t_0}^{c_0}(x_0, α_0) is a convex combination of the gradients of all the component functions. For the test problems with very many component functions, t_0 is set as is computed by the bisection method according to For the algorithm ASAF, the parameter ϵ in (10) is set as for the moderately sized test problems and for the test examples with very many component functions. For the algorithm TAF, the parameter ϵ(t, q) in (12) is set as with ϵ_1 = 0.1, ϵ_2 = 0.01, ϵ_3 = 0.01, and ϵ_4 = 0.1. For the algorithms AF, ASAF, TAF, SSF, and SPF, the initial smoothing parameters are set to t_0 = 1. The termination criterion for the algorithm AF is set as The termination criteria for the algorithms ASSF, SSF, SPF, ASAF, and TAF are set as or where ∇F_t(x) represents the gradient of the smoothing function with respect to the variable x and x*_AF is the approximate solution computed by the algorithm AF. The numerical results were obtained by running MATLAB R2014a on a laptop with an Intel(R) Core(TM) i5-7300HQ CPU at 2.50 GHz and 4.00 GB of memory.
We carry out a comparison on three categories of test problems described in the Appendix. The first category, Examples 1-10, emanates from discretized semi-infinite minimax problems, and the number of component functions is at least 1000. The second category, Example 11, possesses many variables and many component functions. The third category, Examples 12-45, consists of various moderately sized test problems. Tables 1-3 list the CPU time; Tables 4-6 list the number of function evaluations and iterations; Tables 7 and 8 list the average proportion of the component functions used in the active set strategy. The word fail 1 means that the stepsize cannot be computed in the region [10^{-10}, 1]; the word fail 2 means that the number of iterations in Fminimax or Fmincon reaches the upper limit; the word fail 3 means that the CPU time exceeds 3600 seconds. In order to make the advantages of Algorithm 1 clearer and more explicit, the corresponding Dolan-Moré performance profiles proposed in [32] are shown in Figures 1-3 for the three categories of examples above. For all the test problems with very many component functions, we see from Tables 1 and 2 and Figures 1 and 2 that Algorithm 1 is predominantly faster than the other algorithms, from Tables 7 and 8 that the proposed active set strategy yields a more significant reduction of gradient evaluations than the active set strategies in [18, 25, 26, 28], and that Fminimax and Fmincon have poor stability and low efficiency. For most moderately sized test problems, we see from Tables 3 and 6 and Figure 3 that Algorithm 1 requires fewer iterations and function evaluations and takes less CPU time than the other algorithms considered.

Conclusion
We have proposed an active set smoothing function for the maximum function by using the plus function, together with an active set smoothing method with convergence analysis for solving unconstrained minimax problems. The active set smoothing function can be simply implemented in smoothing methods. Compared with similar smoothing algorithms based on other smoothing techniques and with the algorithms in the MATLAB environment, the proposed algorithm is competitive for a wide range of moderately sized problems and dramatically efficient for problems with very many component functions.