A Derivative-Free Affine Scaling LP Algorithm Based on Probabilistic Models for Linear Inequality Constrained Optimization

In this paper, a derivative-free affine scaling linear programming algorithm based on probabilistic models is considered for solving linear inequality constrained optimization problems. The proposed algorithm builds probabilistic linear polynomial interpolation models using only n + 1 interpolation points for the objective function, which reduces the computational cost of building the interpolation function. We build affine scaling linear programming methods that use probabilistic or random models together with an affine scaling matrix within a classical linear programming framework. A backtracking line search technique guarantees monotone descent of the objective function and keeps the new iterates within the feasible region. Under some reasonable conditions, the global convergence and fast local convergence of the algorithm are established, and the results of numerical experiments are reported to show the effectiveness of the proposed algorithm.


Problem Description and Motivation.
In this paper, we consider the following nonlinear optimization problem with linear inequality constraints:

(P)   min f(x)   s.t.   Ax ≥ b,

where f(x) is a general nonlinear, twice continuously differentiable function, but none of its first-order or second-order derivatives is explicitly available, and A = [a_1^T, a_2^T, . . . , a_l^T]^T ∈ R^{l×n}, b ∈ R^l. Moreover, it is assumed that the interior of the feasible region, int Ω = {x | Ax > b}, is not empty. The focus of this paper is the analysis of a numerical scheme that utilizes affine scaling randomized models to minimize nonlinear functions with linear inequality constraints. The motivation comes from algorithms for the minimization of so-called black-box functions, where only function values are computed. For such problems, function evaluations are costly, and derivatives are typically unavailable and cannot be approximated; hence, derivative-free optimization has been developed. Moreover, the list of applications, including molecular geometry optimization, circuit design, groundwater community problems, medical image registration, dynamic pricing, and aircraft design [1], is diverse and growing. In recent years, derivative-free algorithms have been developed for solving unconstrained and constrained minimization problems. Kolda et al. [2] presented a new generating set search approach for minimizing functions subject to linear constraints. One of their main contributions is a new condition to define the set of conforming search directions that admits several computational advantages. Zhang and Conn [3] developed a framework for a class of derivative-free trust region algorithms for least-squares minimization.
These algorithms are designed to take advantage of the problem structure by building polynomial interpolation models for each function in the least-squares minimization. Gratton et al. [4] considered an implementation of a recursive model-based active-set trust-region method for solving bound-constrained nonlinear nonconvex optimization problems without derivatives, using the technique of self-correcting geometry proposed in [5]. Kolda et al. [2] analyzed the oracle complexity of first-order and derivative-free algorithms for smooth nonconvex minimization. Billups et al. [6] proposed a derivative-free algorithm for optimizing computationally expensive functions with computational error. The algorithm used a trust region regression method, which applies weighted regression to obtain more accurate model functions at each trust region iteration. Bueno et al. [7] introduced a new method for solving constrained optimization problems in which the derivatives of the constraints are available but the derivatives of the objective function are not. The method is based on the inexact restoration framework, and each iteration is divided into two phases. The first phase improves the feasibility of the constraints, and the second phase minimizes a suitable objective function subject to a linear approximation of the constraints, which must be solved using derivative-free methods. Wild and Shoemaker [8] established the global convergence of radial basis function trust region algorithms for derivative-free optimization. Their results extend the recent work of Conn et al. [9] to fully linear models that have a nonlinear term. Xue and Sun [10] proposed a derivative-free trust region algorithm for constrained minimization problems with separable structure, where derivatives of the objective function are not available and cannot be directly approximated. They constructed a quadratic interpolation model of the objective function around the current iterate and used the filter technique to ensure the feasibility and optimality of the iterative sequence.
There is a variety of evidence supporting the claim that randomized models can yield both practical and theoretical benefits for deterministic optimization. As an example, the randomized coordinate descent method for large-scale convex deterministic optimization proposed in [11] yields better complexity results than cyclical coordinate descent. Most contemporary randomized methods generate random directions along which all that may be required is some minor level of descent in the objective f. Within direct search, the use of random positive spanning sets has also recently been investigated [12, 13], with gains in performance and convergence theory for nonsmooth problems. Bandeira et al. [14] considered the use of a probabilistic or random model within a classical trust region framework for the optimization of deterministic smooth general nonlinear functions.

Contribution and Structure of the Paper.
In order to solve the linear inequality constrained optimization problem, the derivative-free affine scaling LP algorithm based on probabilistic models is introduced in this paper. Compared with traditional derivative-free algorithms, the new derivative-free algorithm proposed in this paper makes the following contributions:

(1) A randomized numerical algorithm is designed that provides sufficiently accurate approximations such that, in each iteration, a sufficiently improving step is produced with higher probability than in traditional random schemes.

(2) The polynomial interpolation models are built using only n + 1 interpolation points, far fewer than the (n + 1)(n + 2)/2 interpolation points needed to build a quadratic interpolation function. The computational cost of building the interpolation function from n + 1 points is accordingly much lower than from (n + 1)(n + 2)/2 points.

(3) The backtracking line search technique is used so that the linear programming subproblem is solved only once to obtain the search direction, and the total computational effort for completing one iteration is less than that of traditional derivative-free algorithms. Meanwhile, this technique also ensures that each iterate is a feasible interior point.
This paper is organized as follows. In Section 2, we introduce the first-order affine scaling linear programming method based on probabilistic models. The algorithm for solving linear inequality constrained optimization problems is given in Section 3. In Section 4, we establish the global convergence of the algorithm. The fast local convergence of the algorithm is given in Section 5. Some numerical results are presented in Section 6. Finally, we give some conclusions in Section 7.

First-Order Affine Scaling LP Method Based on Probabilistic Models
In this section, we first introduce the construction of a stochastic fully linear model and then give the affine scaling linear programming method based on probabilistic models. In the classical trust-region method, (n + 1)(n + 2)/2 interpolation points are needed to build a quadratic interpolation model m_k of the function f within the ball B(x_k, Δ_k), where x_k is the center point and Δ_k is the radius of the ball. The number of interpolation points is obviously much larger than the dimension of problem (P) as the dimension increases, which creates great difficulty in both storage and computation. In this section, we introduce an affine scaling linear programming model based on probabilistic models that needs only n + 1 interpolation points; that is, we build the polynomial interpolation models in a random fashion. Firstly, we give the models and some assumptions on them.
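To make this construction concrete, the following minimal Python sketch (our own illustration, not code from the paper; all names are ours) builds a linear interpolation model from n + 1 points, namely, the current iterate x_k plus n randomly sampled points in the ball B(x_k, Δ_k). The poisedness of the random sample set is not checked here.

```python
import numpy as np

def build_linear_model(f, xk, delta, rng=None):
    """Linear model m(xk + d) = f(xk) + g @ d interpolating f at n + 1 points:
    the center xk plus n random points in the ball B(xk, delta)."""
    rng = rng or np.random.default_rng()
    n = xk.size
    U = rng.standard_normal((n, n))
    U /= np.linalg.norm(U, axis=1, keepdims=True)       # unit directions
    R = delta * rng.uniform(size=(n, 1)) ** (1.0 / n)   # radii, uniform in the ball
    Y = xk + R * U                                      # n random sample points
    fk = f(xk)
    fvals = np.array([f(y) for y in Y])
    # The interpolation conditions m(y_i) = f(y_i) reduce to the n-by-n
    # linear system (Y - xk) @ g = fvals - fk, instead of a system of
    # dimension (n + 1)(n + 2)/2 as in quadratic interpolation.
    g = np.linalg.solve(Y - xk, fvals - fk)
    return fk, g
```

For instance, for f(x) = ‖x‖², the returned gradient estimate g approaches 2x_k as Δ_k shrinks.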

Probabilistically Fully Linear Models.
We adopt a polynomial m(x) ∈ P_n^d (with d = 1 for the linear models considered here), as proposed in [15], which is (κ_ef, κ_eg)-fully linear, and we consider linear models written in the form

m_k(x_k + d) = f(x_k) + g_k^T d,

where g_k = ∇m(x_k). The following definitions characterize the accuracy of the model m(x_k).
Definition 1 (see [14]). We say that a function m is a (κ_ef, κ_eg)-fully linear model of f on B(x_k, δ_k) if, for every y ∈ B(x_k, δ_k),

‖∇f(y) − ∇m(y)‖ ≤ κ_eg δ_k and |f(y) − m(y)| ≤ κ_ef δ_k².

Definition 2 (see [14]). We say that a sequence of random models {M_k} is probabilistically (κ_ef, κ_eg)-fully linear for a corresponding sequence {B(x_k, δ_k)} if the events

S_k = {M_k is a (κ_ef, κ_eg)-fully linear model of f on B(x_k, δ_k)}

satisfy the following submartingale-like condition:

P(S_k | F_{k−1}^M) ≥ p,

where F_{k−1}^M is the σ-algebra generated by M_0, M_1, . . . , M_{k−1} and p ∈ (1/2, 1] is fixed; then, we say that the random models are probabilistically (κ_ef, κ_eg)-fully linear.

In Definition 2, X_k and the interpolation radii δ_k are random variables defined over the σ-algebra generated by M_0, M_1, . . . , M_{k−1}. M_k depends on X_k and δ_k, and hence, M_k depends on M_0, M_1, . . . , M_{k−1}.
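As an informal numerical illustration of Definition 1 (our own experiment on an arbitrary smooth test function, reusing build_linear_model from the sketch above), the function error of the linear model scales like δ_k²: the ratio max |f(y) − m(y)|/δ_k² stays roughly bounded as δ_k shrinks, mirroring the κ_ef bound.

```python
import numpy as np

f = lambda x: np.sum(x ** 2) + np.sin(x[0])   # smooth test function
xk = np.array([1.0, -0.5, 2.0])
for delta in (1.0, 0.5, 0.25, 0.125):
    fk, g = build_linear_model(f, xk, delta, rng=np.random.default_rng(0))
    # Probe the model error at 200 random points of B(xk, delta).
    Z = np.random.default_rng(1).uniform(-1, 1, (200, 3)) / np.sqrt(3)
    errs = [abs(f(xk + delta * z) - (fk + g @ (delta * z))) for z in Z]
    print(delta, max(errs) / delta ** 2)      # roughly constant in delta
```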

The Affine Scaling Linear Programming Method Based on Probabilistic Models.
Using the scaling matrix D_k defined in [15] and following the same convergence theory, for any x_k ∈ int(Ω), the augmented affine scaling linear programming subproblem (L_k) at the k-th iteration is defined as in [15], where g_k = ∇m(x_k). As in [15], we have that a "good" decrease of the objective function of (L_k) along the gradient g_k can lead to dual feasibility.

The Algorithm
In Section 2, we introduced the affine scaling LP method based on probabilistic models. In this section, we define the specific steps of the derivative-free affine scaling linear programming algorithm based on probabilistic models for solving problem (P).

Initialization Step. Fix the positive parameters of the algorithm.

Main Step.
(2) Solve the affine scaling linear programming subproblem (L_k) and obtain the step d_k (a schematic sketch of one full iteration is given below).
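The surviving step descriptions are fragmentary, so the following Python sketch is our own schematic reconstruction of one iteration, not the authors' exact method: it combines the ingredients described in this paper (a linear model from n + 1 points via build_linear_model above, a step from the subproblem, the ratio test of Remark 2, the backtracking line search of Step 3, and the radius updates of Steps 5 and 6). The subproblem solver is a hypothetical placeholder, since the precise form of (L_k) follows [15], and all parameter names are assumptions.

```python
import numpy as np

def solve_lp_subproblem(gk, delta_k):
    # Placeholder for the affine scaling LP subproblem (L_k) of [15];
    # here we simply return a steepest-descent step of length delta_k.
    return -delta_k * gk / (np.linalg.norm(gk) + 1e-16)

def one_iteration(f, xk, delta_k, A, b,
                  eta1=0.1, beta=1e-4, gamma=2.0, delta_max=10.0):
    """Schematic single iteration; eta1, beta, gamma are hypothetical names."""
    fk, gk = build_linear_model(f, xk, delta_k)     # model from n + 1 points
    dk = solve_lp_subproblem(gk, delta_k)           # step from (L_k)
    # Ratio test (Remark 2): largest alpha keeping x_k + alpha d_k strictly feasible.
    Ad, slack = A @ dk, A @ xk - b
    neg = Ad < 0
    alpha = min(1.0, 0.99 * float(np.min(slack[neg] / -Ad[neg]))) if neg.any() else 1.0
    # Backtracking line search (Step 3).  The paper's actual sufficient decrease
    # test involves |g_k^T Gamma_k|^(1/2) Delta_k; we use an Armijo-type surrogate.
    while f(xk + alpha * dk) > fk - beta * alpha * abs(gk @ dk) and alpha > 1e-12:
        alpha *= 0.5
    # Radius update (Steps 5 and 6, schematic): expand on success, shrink otherwise.
    delta_k = min(gamma * delta_k, delta_max) if alpha >= eta1 else delta_k / gamma
    return xk + alpha * dk, delta_k
```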

Remark 1.
Each realization of the algorithm defines a sequence of realizations for the corresponding random variables, in particular, of the models M_k, the iterates X_k, and the interpolation radii δ_k.

Remark 2.
Similarly to the definition of α_k in [15], for the search direction d_k, we define the step size α_k. A key property of the scalar α_k is that the step α_k d_k to the point x_k + α_k d_k does not violate any linear inequality constraint. To see this, we first observe that if −a_i^T d_k ≤ 0, then, for any α_k > 0,

a_i^T (x_k + α_k d_k) = a_i^T x_k + α_k a_i^T d_k ≥ a_i^T x_k > b_i, (12)

which means that the i-th linear strict inequality constraint holds. If instead −a_i^T d_k > 0, the definition of α_k guarantees α_k < (a_i^T x_k − b_i)/(−a_i^T d_k), so that

a_i^T (x_k + α_k d_k) = a_i^T x_k + α_k a_i^T d_k > b_i. (13)

Hence, (12) and (13) imply that, in every case, the inequality a_i^T (x_k + α_k d_k) > b_i holds for all i = 1, 2, . . . , l.
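In compact form (our notation; the exact definition of α_k in [15] may include an additional safeguard factor), the ratio test behind (12) and (13) reads:

```latex
\bar{\alpha}_k = \min_{i:\, a_i^{T} d_k < 0} \frac{a_i^{T} x_k - b_i}{-\,a_i^{T} d_k},
\qquad
0 < \alpha_k < \bar{\alpha}_k \;\Longrightarrow\; A\,(x_k + \alpha_k d_k) > b .
```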

Global Convergence
In this section, we establish the global convergence of the algorithm, i.e., that the algorithm proposed in this paper converges to first-order critical points. For the purpose of proving global convergence, we assume that the initial iterate x_0 is given. Then, all the subsequent iterates belong to the level set L(x_0) = {x ∈ Ω | f(x) ≤ f(x_0)}. However, the failed iterates may lie outside this set. In the setting considered in this paper, all potential iterates are restricted to the region L_enl(x_0) = ∪_{x ∈ L(x_0)} B(x, Δ_max), where Δ_max is the upper bound on the size of the interpolation radius imposed by the algorithm, and we define conv(L_enl(x_0)) to be the convex hull of L_enl(x_0). Next, we give the following assumption for the global convergence of the algorithm.

Assumption 1. Suppose x_0 and Δ_max are given. We assume that f is continuously differentiable in an open set containing the set L_enl(x_0) and that ∇f is Lipschitz continuous on conv(L_enl(x_0)) with constant κ_Lf, i.e., for all x, y ∈ L_enl(x_0), we have

‖∇f(x) − ∇f(y)‖ ≤ κ_Lf ‖x − y‖.

We also assume that f is bounded from below on L(x_0).
Based on solving the augmented linear programming subproblem (L_k), the following lemma establishes a decrease property in terms of g_k^T Γ_k.

Lemma 1.
For every realization of the algorithm, let d_k be the solution of subproblem (L_k); then, the following decrease property holds.

Proof. According to the definition of Γ_k, we have Γ_k = g_k + A^T μ_k, where μ_k = −(AA^T + D_k)^{−1} A g_k. By (21)-(23), we obtain the conclusion. The following lemma shows that every iterate of the algorithm is a feasible point of problem (P), i.e., x_k + α_k d_k ∈ Ω. □

Lemma 2. Let {x_k} be a sequence generated by the algorithm; then, there exists α_k > 0 such that x_k + α_k d_k ∈ Ω.
Proof. According to the definition of D_k and the step size α_k, by (11) and (27), there exists α_k > 0 such that A(x_k + α_k d_k) ≥ b holds at the k-th iteration, i.e., there exists α_k > 0 such that x_k + α_k d_k ∈ Ω. □
The following lemma states that the interpolation radius converges to zero regardless of the realization of the model sequence M_k made by the algorithm.

Lemma 3. For every realization of the algorithm,

lim_{k→∞} Δ_k = 0. (28)

Proof. First, we can see that if α_k < η_1 holds as k → ∞, then (28) holds according to the definition of Δ_k and Step 6 of the algorithm. If α_k ≥ η_1 and |g_k^T Γ_k|^{1/2} < η_2 Δ_k hold as k → ∞, then (28) holds according to the definition of Δ_k and Step 5 of the algorithm. Otherwise, according to Step 3 of the algorithm, we have the sufficient decrease condition, and by (17) we obtain (30). Assumption 2 shows that f(x) is a bounded function; hence, f(x_k) − f(x_k + α_k d_k) → 0 as k → ∞. By (30), we know that α_k β |g_k^T Γ_k|^{1/2} Δ_k → 0 as k → ∞. If α_k → 0, then there exists a positive index k_0 such that α_k < η_1 for all k > k_0, and (28) holds. Otherwise, α_k > 0 for all k; if lim_{k→∞} Δ_k = 0, then the conclusion holds.

Otherwise, we assume that Δ_k is bounded away from zero as k → ∞; then, (30) implies that lim_{k→∞} |g_k^T Γ_k|^{1/2} = 0 holds, and according to the update rule for Δ_k in Step 5, Δ_k decreases and tends to zero, which is a contradiction. Hence, (28) holds. □
The following lemma (Lemma 4) shows that, in the presence of sufficient model accuracy, a successful step will be achieved, provided that the interpolation radius is sufficiently small relative to the size of the model gradient.

Proof. Since AD_k^{1/2} has full row rank in the compact set L(x_0), μ_k is bounded; i.e., there exists a constant κ_μ > 0 such that ‖μ_k‖ ≤ κ_μ. By the definition of Γ_k and Lemma 2, we have ‖Γ_k‖ ≤ ‖g_k‖ + ‖A‖ ‖μ_k‖. Letting κ_Γ = κ_g + κ_μ κ_A, we have ‖Γ_k‖ ≤ κ_Γ. By the acceptance rule for the step α_k d_k and the mean value theorem, we have the equality

f(x_k + α_k d_k) = f(x_k) + α_k ∇f(x_k + ξ_k α_k d_k)^T d_k,

with ξ_k ∈ (0, 1). Since ∇f(x) is Lipschitz continuous with constant κ_Lf, the desired bound follows.
Next, we assume that the model used in the algorithm is probabilistically fully linear, and we will state an auxiliary result from the martingale literature that will be useful in our analysis.
□ Theorem 1 (see [14]). Let G_k be a submartingale, i.e., a sequence of random variables which, for every k, are integrable (E[|G_k|] < ∞) and satisfy E[G_k | F_{k−1}^G] ≥ G_{k−1}, where F_{k−1}^G = σ(G_0, . . . , G_{k−1}) is the σ-algebra generated by G_0, . . . , G_{k−1} and E[G_k | F_{k−1}^G] denotes the conditional expectation of G_k given the past history of events F_{k−1}^G. We assume further that |G_k − G_{k−1}| ≤ M < ∞ for every k. We consider the random events C = {lim_{k→∞} G_k exists and is finite} and D = {limsup_{k→∞} G_k = ∞}. Then, P(C ∪ D) = 1.

Proof. The theorem is a simple extension of Theorem 5.3.1 in [16]. Now, we give the global convergence of the algorithm. □

Theorem 2. Suppose that the model sequence {M_k} is probabilistically (κ_ef, κ_eg)-fully linear for some positive constants κ_ef and κ_eg. Let {x_k} be a sequence of random iterates generated by the algorithm. Then, almost surely,

liminf_{k→∞} |∇f(x_k)^T Γ_k| = 0.

Proof. According to the definition of S_k in Definition 2, we use the following random walk from [14]:

W_k = Σ_{i=0}^{k} (2 · 1_{S_i} − 1),

where 1_{S_i} denotes the indicator function of the event S_i. It is easily seen that W_k is a submartingale; the proof can be found in Theorem 2 of [14]. By Theorem 1, we can also see that the event D = {limsup_{k→∞} W_k = ∞} holds almost surely. Suppose that there exist a positive constant ε and an index K_1 such that |∇f(x_k)^T Γ_k| ≥ ε for all k ≥ K_1. Let x_k and Δ_k be any realization of X_k and δ_k, respectively. By Lemma 3, there exists K_2 such that the interpolation radius is sufficiently small for all k ≥ K_2. From the definition of fully linear models, we have (43). According to Assumption 2, (AA^T + D_k) is an invertible matrix, and by (AA^T + D_k)μ_k = −A g_k and the analogous equation for μ̄_k, we obtain (44), where κ_1 is an upper bound of ‖(AA^T + D_k)^{−1} A‖. From (43) and (44), we know that ‖μ_k − μ̄_k‖ → 0. Hence, we have (45). Since AD_k^{1/2} has full row rank in the compact set L(x_0), μ̄_k is bounded, and there exists a constant κ_μ > 0 such that ‖μ̄_k‖ ≤ κ_μ. Similarly to the proof of (32), there exists a positive constant κ_Γ such that ‖Γ_k‖ ≤ κ_Γ. Combining (28), (43), (44), and (45), we obtain (46) as k → ∞. By (41) and (46), and using Lemma 4 together with (28), we obtain that α_k ≥ η_1 and the iteration is successful. By the construction of the algorithm, and the fact that Δ_k ≤ Δ_max/c, we have that Δ_{k+1} = cΔ_k. Similarly to the proof of Theorem 2 in [14], this contradicts Lemma 3; hence, the conclusion holds almost surely.
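The role of the submartingale property in this proof can be illustrated numerically (our own illustration; the value of p is an assumption): when the models are fully linear with probability p > 1/2, the walk W_k has positive drift, so limsup_{k→∞} W_k = ∞.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 0.7                                            # assumed P(S_k) > 1/2
indicators = rng.uniform(size=100_000) < p         # realizations of 1_{S_k}
W = np.cumsum(2 * indicators - 1)                  # W_k = sum_i (2 * 1_{S_i} - 1)
print(W[-1], W.max())   # positive drift: W_k grows without bound when p > 1/2
```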

Next, we show that the stronger result lim_{k→∞} |∇f(x_k)^T Γ_k| = 0 holds almost surely. Before stating and proving the main theorem, we give the following lemmas.
□ Lemma 5 (see [14]). Let {Z_k} be a sequence of nonnegative uniformly bounded random variables and let {B_k} be a sequence of Bernoulli random variables satisfying the condition of Lemma 1 in [14]. Let P be the set of natural numbers k such that B_k = 1 and N = ℕ\P, where P and N are random sequences. Then, the conclusion of Lemma 1 in [14] holds.

Proof. The proof is similar to that of Lemma 1 in [14]. □
□ Lemma 6 (see [14]). Let {x_k} and {Δ_k} be the sequences of random iterates and random trust region radii generated by the algorithm. We fix ε > 0 and define the sequence {K_i} consisting of the natural numbers k for which |∇f(x_k)^T Γ_k| > ε, where {K_i} is a sequence of random variables. Then,

Σ_i Δ_{K_i} < ∞

almost surely.
Proof. The proof is similar to that of Lemma 2 in [14]. Next, we prove that lim_{k→∞} |∇f(x_k)^T Γ_k| = 0 holds. □

Theorem 3. Suppose that Assumptions 1 and 2 hold and that the model sequence {M_k} is probabilistically (κ_ef, κ_eg)-fully linear for some positive constants κ_ef and κ_eg. Let {X_k} be a sequence of random iterates generated by the algorithm; then,

lim_{k→∞} |∇f(X_k)^T Γ_k| = 0

almost surely.

Proof. We suppose that lim_{k→∞} |∇f(x_k)^T Γ_k| = 0 does not hold almost surely. Then, there exists ε > 0 such that P(|∇f(x_k)^T Γ_k| > 2ε) > ε holds for infinitely many k. Let {K_i} be the subsequence of iterations for which |∇f(x_k)^T Γ_k| > 2ε. According to Theorem 2, there exists a pair of integers defining an interval of iterations during which |∇f(x_k)^T Γ_k| decreases from above 2ε to below ε, where x_k and Δ_k are the realizations of X_k and δ_k, respectively. Then, for any l, we have the corresponding bound, and by (65) we obtain the local rate estimate, i.e., the algorithm is superlinearly convergent.

Numerical Examples
In this section, we present numerical experiments with our algorithm. In order to show its effectiveness, we choose the same test problems and compare the calculation results with the algorithms COBYLA [17] and SDPEN [18]. First, the parameters used in the algorithm are as follows: Δ_0 = 5 and Δ_max = 10.

We test the algorithm on the "HS" problems from [19]; the test problems are reported in Table 1. The program code is written in MATLAB and run in the MATLAB 7.0 environment. In Table 1, n denotes the dimension of the problem, and Ineq. denotes the number of inequality constraints. In this section, we choose 46 test problems and compare the number of function evaluations used by our method with COBYLA [17] and SDPEN [18]. We report our results using performance profiles. Such profiles compare the number of function evaluations needed by each solver to achieve the desired accuracy in the objective function value. We use three different levels of accuracy: 2, 3, and 5 significant figures in f(x*).
The interpolation sample set is important for the algorithm. In this paper, we generate the sample set by randomly selecting the sample points; i.e., given the current iterate x_k, we choose n random points in a ball of radius Δ_k around x_k. The traditional derivative-free trust region method constructs a quadratic interpolation function using (n + 1)(n + 2)/2 sample points. According to Section 2.1, the dimension of the interpolation basis matrix M(Φ, Y) is then (n + 1)(n + 2)/2, and to obtain the quadratic interpolation function we need to compute the inverse of M(Φ, Y); hence, the computational cost is very expensive, especially when the dimension of the problem is large. With the linear interpolation function, the dimension of the interpolation basis matrix M(Φ, Y) is only n + 1, and the computational cost is far less than that of constructing the quadratic interpolation function. The comparison results for CPU time are recorded in Figure 1; from Figure 1, we can see that the CPU time for building the quadratic interpolation function is far greater than that for the linear interpolation function.
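The cost gap can be illustrated with a rough timing sketch (our own; the random matrix is a generic stand-in for the basis matrix M(Φ, Y), not the paper's actual matrix): solving the (n + 1)-dimensional linear system is far cheaper than the (n + 1)(n + 2)/2-dimensional one.

```python
import time
import numpy as np

def model_build_cost(n, trials=100, rng=np.random.default_rng(0)):
    """Time solving stand-in interpolation systems of the two sizes."""
    times = {}
    for label, m in (("linear", n + 1), ("quadratic", (n + 1) * (n + 2) // 2)):
        M = rng.standard_normal((m, m)) + m * np.eye(m)   # well-conditioned stand-in
        rhs = rng.standard_normal(m)
        t0 = time.perf_counter()
        for _ in range(trials):
            np.linalg.solve(M, rhs)
        times[label] = time.perf_counter() - t0
    return times

print(model_build_cost(50))   # e.g., n = 50: system sizes 51 versus 1326
```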
Using the performance profile of Dolan and Moré [20], we can demonstrate the overall behavior of the present algorithm and gain more insight into its performance. Let S be the set of all algorithms and P be the set of test problems, with n_s = |S| and n_p = |P|. For each problem p and solver s, let τ_{p,s} denote the quantity we want to compare for problem p and solver s. The performance ratio is defined as

r_{p,s} = τ_{p,s} / min{τ_{p,s'} : s' ∈ S},

which compares the performance of solver s on problem p with the best performance by any solver on this problem. For any factor t ≥ 1, the overall performance of algorithm s is given by

ρ_s(t) = (1/n_p) |{p ∈ P : r_{p,s} ≤ t}|.

The values on the left side of the figures represent the efficiency of each solver, and the values on the right side give information about the robustness of the solvers; this means that the best solver is the highest on the figures. Figure 2 shows the comparison results of using our algorithm, COBYLA, and SDPEN to solve the test problems in Table 1 under termination accuracy level 2 (ε = 10^{−2}). It is obvious from the figure that the red curve (representing our algorithm) tends to 1 faster than the other two curves, and when t = 1, the value of the red curve is clearly higher than that of the other two curves.
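For reference, a compact sketch (our own) of how the Dolan and Moré profile ρ_s(t) is computed from a table of costs; here T could hold, for instance, the function evaluation counts of our algorithm, COBYLA, and SDPEN on the 46 problems of Table 1, with np.inf marking failures.

```python
import numpy as np

def performance_profile(T, tmax=10.0, num=200):
    """T[p, s]: cost of solver s on problem p.  Returns t and rho with
    rho[:, s] = fraction of problems for which r_{p,s} <= t."""
    ratios = T / T.min(axis=1, keepdims=True)                    # r_{p,s}
    t = np.linspace(1.0, tmax, num)
    rho = (ratios[None, :, :] <= t[:, None, None]).mean(axis=1)  # rho_s(t)
    return t, rho
```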
This shows that, when the termination level is 2, the number of function evaluations spent by our algorithm on the test problems in Table 1 is less than that of the other two algorithms, and our algorithm has high computational efficiency at the low termination level. Figure 3 records the calculation results for the test problems in Table 1 when the termination level is 3 (ε = 10^{−3}). It can be clearly seen from Figure 3 that when the termination level is increased to 3, the performance ratio of the three algorithms is lower than at accuracy level 2. However, when t = 1, the value of the red curve is still significantly higher than that of the other two curves, and the red curve tends to 1 significantly faster than the other two curves. This shows that, after the accuracy level is raised, the number of function evaluations spent by our algorithm in solving the test problems is still less than that of the other two algorithms, and its computational efficiency remains high. Figure 4 records the calculation results of the three algorithms when the termination level is 5 (ε = 10^{−5}). It can be seen from Figure 4 that, although the computational efficiency of the three algorithms decreases significantly at the high accuracy level, the red curve still performs well: whether in terms of its value at t = 1 or its speed of tending to 1, the red curve shows good characteristics; that is, at the high accuracy level, the number of function evaluations spent by our algorithm on the test problems in Table 1 is still less than that of the other two algorithms. Figure 1 reports the CPU time of constructing the interpolation functions over 100 iterations using (n + 1) and (n + 1)(n + 2)/2 interpolation points, respectively. From the results, we can see that the interpolation function can be built faster by our algorithm. Lemma 1 shows that the linear interpolation model yields the same decrease as the quadratic interpolation trust region model. Hence, our algorithm is more effective than the algorithm using the quadratic interpolation function.
In order to test the effectiveness of our algorithm in solving problems with larger dimensions, we selected the 6 problems in Table 2, solved them with our algorithm, COBYLA, and SDPEN for different dimensions, and recorded the calculation results at level 5.
When the accuracy level is 5 (the termination accuracy is ε = 10^{−5}), we use our algorithm to solve the test problems in Table 2. At the same time, we use COBYLA and SDPEN to solve the same problems, compare the number of function evaluations spent by the three algorithms, and record the comparison results in Figure 5.
From Figure 5, we can see that the red curve representing our algorithm tends to 1 the fastest, that is, faster than the curves representing the other two algorithms. Moreover, we can also see that when t = 1, the efficiency of our algorithm reaches 0.38, while COBYLA reaches only 0.25 and SDPEN only 0.11. This shows that our algorithm not only spends fewer function evaluations than COBYLA and SDPEN in solving low-dimensional problems but also, for the higher-dimensional problems in Table 2, still requires fewer function evaluations than COBYLA and SDPEN. In short, our algorithm has high computational efficiency on the practical test problems.

Conclusion
This paper has described a derivative-free linear programming method based on probabilistic models for linear inequality constrained optimization. The algorithm is based on methods built from linear polynomial interpolation, which uses only n + 1 interpolation points, far fewer than the (n + 1)(n + 2)/2 points of quadratic polynomial interpolation, and thus reduces the computational cost of building the interpolation function. The affine scaling linear programming methods which use probabilistic or random models are built to obtain the descent direction. In this paper, we use the backtracking line search technique to ensure that the step along the descent direction stays in the feasible region and that the objective function decreases. Under some reasonable conditions, the global convergence and fast local convergence of the algorithm are established.