Distributed Optimization Methods for Nonconvex Problems with Inequality Constraints over Time-Varying Networks

Network-structured optimization problems arise widely in engineering applications. In this paper, we investigate a nonconvex distributed optimization problem with inequality constraints over a time-varying multiagent network, in which each agent may access only its own cost function while all agents collaboratively minimize the sum of the agents' nonconvex cost functions. Based on successive convex approximation techniques, we first approximate the nonconvex problem locally by a sequence of strongly convex constrained subproblems. To enable distributed computation, we then exploit the exact penalty function method to transform the sequence of convex constrained subproblems into unconstrained ones. Finally, a fully distributed method is designed to solve the unconstrained subproblems. The convergence of the proposed algorithm is rigorously established: the algorithm converges asymptotically to a stationary solution of the problem under consideration. Several simulation results illustrate the performance of the proposed method.


Introduction
Network-structured problems have recently drawn considerable attention in various applications, such as mobile ad hoc networks, wireless sensor networks, and the Internet [1][2][3]. The absence of centralized access to information and time-varying network connectivity are common features of such problems. For this reason, distributed optimization methods for multiagent networks should be designed on the basis of local communication and computation, while accounting for changes in the network topology. Distributed computing allows each agent to use only its own cost function and to communicate only with its direct neighbors, which has the potential advantage of protecting agents' privacy. In recent years, there has been an increasing trend toward developing distributed optimization that integrates cooperative communication and computation (see, e.g., [4, 5] and references therein).
Methods for distributed convex problems have been widely studied in the literature. Based on the consensus averaging mechanism, several useful approaches exist, including primal, dual, and primal-dual consensus distributed methods. Nedic and Ozdaglar [5] originally developed a distributed subgradient-based algorithm in which every agent optimizes its own objective and locally exchanges information with neighboring agents in a network; the convergence rate of their algorithm was obtained, but only unconstrained distributed problems were investigated. In [6], Nedic et al. generalized the distributed method of [5] to constrained convex optimization. Later, many researchers proposed various extensions based on primal (sub)gradient methods; see, for example, [7][8][9][10][11][12]. In [13], Duchi et al. extended the centralized dual averaging algorithm to the distributed setting and proposed a distributed dual averaging algorithm; their convergence result shows that the number of iterations depends on the network size and spectral gap. The primal-dual consensus distributed method was designed for solving distributed convex problems with global inequality constraints under the framework of Lagrangian duality. By finding the corresponding saddle points of the Lagrangian function, Zhu and Martinez [14] first proposed a primal-dual consensus distributed algorithm and established its convergence; the authors of [15] obtained the explicit convergence rate of the algorithm under strong connectivity of the networks.
However, the methods mentioned above are not suitable for general nonconvex distributed optimization problems. Only recently have algorithms been designed for distributed nonconvex problems; see, for example, [16, 17]. In [16], Lorenzo and Scutari developed a novel distributed algorithm for unconstrained nonconvex optimization over a multiagent network with time-varying connectivity; they combined a successive convex approximation technique with a dynamic consensus mechanism to realize distributed computation as well as local information exchange in the network. For nonconvex optimization problems with constraints, Scutari et al. [17] proposed a successive convex approximation method that solves a sequence of strongly convex subproblems while maintaining feasibility, and showed that it converges to a stationary solution of the original constrained nonconvex problem. However, the method proposed in [17] is not suitable for the distributed setting. Additionally, it uses the Lagrange dual method to solve the sequence of strongly convex subproblems, which may significantly enlarge the dimension of the problem by introducing dual variables, thereby increasing the computational difficulty and cost.
In this paper, we investigate a distributed nonconvex problem with inequality constraints. The main contributions of this paper are twofold: (i) based on the penalty function method, a distributed algorithm for solving the nonconvex problem with global inequality constraints is proposed; (ii) the convergence of the proposed algorithm is rigorously proved. More specifically, we first transform the nonconvex problem into a sequence of strongly convex subproblems via successive convex approximation techniques. To enable distributed computation, we then exploit the exact penalty function method to transform the sequence of strongly convex constrained subproblems into unconstrained ones. Finally, we propose a fully distributed method to solve these unconstrained subproblems. We obtain convergence results for the proposed algorithm and demonstrate several numerical simulations.
The work in this paper is closely related to the previous works [16][17][18][19]. Our method builds on the algorithm proposed in [16], but the problem considered here differs from that in [16], since we investigate a nonconvex problem with global inequality constraints. The proposed algorithm also differs from [17]: our algorithm exploits the exact penalty function method to solve the related subproblems, possibly reducing the dimensionality of the problem, whereas the algorithm in [17] uses the Lagrange dual method, whose computation is not implemented in a distributed manner. Our algorithm extends the algorithm in [18] to handle the nonconvex case. Finally, we solve a constrained distributed nonconvex optimization problem in this paper, while the authors of [19] solved the unconstrained one.
The remainder of this paper is organized as follows. Section 2 provides the problem statement and related preliminaries. Section 3 gives the algorithm development. Section 4 proposes the distributed algorithm and establishes the convergence results. Numerical simulations are given in Section 5. Finally, conclusions are drawn in Section 6.

Problem Statement and Preparations
In this section, we state the optimization problem under consideration and give the assumptions and definitions that will be used in the sequel.
where $f_i(\mathbf{x}) : \mathbb{R}^n \to \mathbb{R}$ is a nonconvex smooth cost function known only by agent $i$, for $i = 1, \ldots, N$; $\mathbf{g}(\mathbf{x}) \triangleq (g_1(\mathbf{x}), \ldots, g_m(\mathbf{x}))^T$, where each $g_j(\mathbf{x}) : \mathbb{R}^n \to \mathbb{R}$ is a convex smooth function, for $j = 1, \ldots, m$; and $K \subseteq \mathbb{R}^n$ is a closed, convex set. The constraint function $\mathbf{g}$ and the set $K$ are known by all the agents. Let $\mathcal{K} \triangleq \{\mathbf{x} \in K : \mathbf{g}(\mathbf{x}) \leq \mathbf{0}\}$ be the feasible set of problem (1). We assume that Slater's condition [20] is satisfied for problem (1). Problem (1) is ubiquitous, arising in many applications such as networking, wireless communications, and machine learning; it is therefore meaningful to solve it.

Assumption 1 (see [5]). (A1) The graph sequence $\{\mathcal{G}[k]\}$ is $B$-strongly connected; that is, there is an integer $B > 0$ such that the union of the edge sets over every $B$ consecutive time slots, $\bigcup_{t=kB}^{(k+1)B-1} \mathcal{E}[t]$, yields a strongly connected digraph for all $k \geq 0$.
Assumption 1 guarantees that, at any time $k$, any agent $i$ will receive information from every other agent within the next $B$ time slots. Moreover, the weight matrix $W[k]$ is doubly stochastic.
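Assumption (A1) is straightforward to check numerically for a given graph sequence. The following sketch (plain Python; the helper names are ours, not from the paper) forms the union graph over each window of $B$ time slots and tests strong connectivity by reachability:

```python
def is_strongly_connected(adj):
    """Strong connectivity of a directed graph given as {node: set_of_out_neighbors}."""
    nodes = set(adj)

    def reachable(src):
        seen, stack = {src}, [src]
        while stack:
            u = stack.pop()
            for v in adj[u] - seen:   # set difference builds a fresh set, safe to mutate seen
                seen.add(v)
                stack.append(v)
        return seen

    return all(reachable(s) == nodes for s in nodes)

def is_B_strongly_connected(edge_seq, num_nodes, B):
    """(A1): the union of every block of B consecutive edge sets E[kB], ..., E[(k+1)B-1]
    must form a strongly connected digraph."""
    for k in range(0, len(edge_seq) - B + 1, B):
        union = {i: set() for i in range(num_nodes)}
        for E in edge_seq[k:k + B]:
            for (i, j) in E:
                union[i].add(j)
        if not is_strongly_connected(union):
            return False
    return True
```

For example, a 3-node directed ring whose edges appear one per time slot is 3-strongly connected but not 1-strongly connected.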
(B3) $\nabla F$ is bounded on $K$; that is, there is a finite constant $L_F > 0$ such that $\|\nabla F(\mathbf{x})\| \leq L_F$ for all $\mathbf{x} \in K$.
(B4) For each $j \in \{1, \ldots, m\}$, the gradient $\nabla g_j$ is bounded on $K$; that is, there is a finite constant $L_{g_j} > 0$ such that $\|\nabla g_j(\mathbf{x})\| \leq L_{g_j}$ for all $\mathbf{x} \in K$.

The above assumptions are quite general and are satisfied by a large class of problems in practical applications. Assumption (B5) ensures that problem (1) has a solution.
The goal in this paper is to design a method that can find a stationary solution of problem (1).Moreover, the method is implemented in the distributed scenario satisfying Assumptions 1 and 2.
Next, we introduce several definitions that will be used in the convergence analysis of our method.

Definition 3 (regularity [21]). A point $\mathbf{x} \in \mathcal{K}$ is called regular for problem (1) if the Mangasarian-Fromovitz Constraint Qualification (MFCQ) holds at $\mathbf{x}$; that is, if the following implication is satisfied:
$$\mathbf{0} \in \sum_{j \in J(\mathbf{x})} \mu_j \nabla g_j(\mathbf{x}) + N_K(\mathbf{x}), \quad \mu_j \geq 0, \ \forall j \in J(\mathbf{x}) \ \Longrightarrow \ \mu_j = 0, \ \forall j \in J(\mathbf{x}),$$
where $N_K(\mathbf{x})$ is the normal cone to $K$ at $\mathbf{x}$ and $J(\mathbf{x}) = \{j \in \{1, 2, \ldots, m\} : g_j(\mathbf{x}) = 0\}$ is the index set of convex constraints that are active at $\mathbf{x}$.
Definition 4 (stationary point [17]). A point $\mathbf{x} \in \mathcal{K}$ is a stationary point of problem (1) if it satisfies the following KKT system:
$$\mathbf{0} \in \nabla F(\mathbf{x}) + \sum_{j=1}^{m} \mu_j \nabla g_j(\mathbf{x}) + N_K(\mathbf{x}), \qquad 0 \leq \mu_j \perp g_j(\mathbf{x}) \leq 0, \quad j = 1, \ldots, m,$$
where $\mu_j \geq 0$ are suitably chosen Lagrange multipliers.
As pointed out in [17], a regular (local) minimum point of problem (1) is also a stationary point. It is well known that the traditional goal in solving nonconvex problems is to find stationary points. To simplify the discussion, we assume throughout the rest of this paper that all feasible points of problem (1) are regular.
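For intuition, the KKT system in Definition 4 can be checked numerically on a small instance. The sketch below (Python; `kkt_residual` is a hypothetical helper, and we take $K = \mathbb{R}$ so the normal-cone term vanishes) verifies stationarity, feasibility, and complementarity for the nonconvex toy problem $\min_x x^4 - x^2$ subject to $x - 1/2 \leq 0$:

```python
def kkt_residual(x, mu, dF, dgs, gs, tol=1e-8):
    """Residuals of the KKT system for min F(x) s.t. g_j(x) <= 0, with K = R
    (so the normal cone N_K(x) is {0} and stationarity becomes an equation)."""
    stat = dF(x) + sum(m * dg(x) for m, dg in zip(mu, dgs))
    feas = all(g(x) <= tol for g in gs) and all(m >= -tol for m in mu)
    comp = sum(abs(m * g(x)) for m, g in zip(mu, gs))
    return abs(stat), feas, comp

# toy data: F(x) = x^4 - x^2 (nonconvex), one constraint g(x) = x - 0.5 <= 0
dF = lambda x: 4 * x**3 - 2 * x
g  = lambda x: x - 0.5
dg = lambda x: 1.0

# x = 0.5 with multiplier mu = 0.5 satisfies the system:
# F'(0.5) = -0.5, so -0.5 + 0.5 * 1 = 0, the constraint is active, and mu >= 0.
stat, feas, comp = kkt_residual(0.5, [0.5], dF, [dg], [g])
```

The unconstrained stationary point $x = 1/\sqrt{2} \approx 0.707$ of $F$ is infeasible here, which is why the constrained stationary point sits on the boundary with a positive multiplier.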

Development of Algorithm
To design an effective distributed method for problem (1), we must face three main challenges: (i) the nonconvexity of the objective function $F$; (ii) the unavailability of global knowledge of $F$ at any single agent; and (iii) the presence of the inequality constraints $\mathbf{g}$. To deal with these difficulties, we combine successive convex approximation (SCA) techniques, exact penalty function methods, and dynamic consensus mechanisms to develop our algorithm.

Local SCA Approximation.
In a distributed setting, the computational cost of solving problem (1) directly is prohibitively high, and doing so may even be infeasible. We therefore prefer to approximate problem (1) suitably, in the sense of local convex approximation.
By copying the global variable $\mathbf{x}$, each agent $i$ maintains a local estimate $\mathbf{x}_i$ that is updated at each iteration. We rewrite $F(\mathbf{x}_i) = f_i(\mathbf{x}_i) + \sum_{j \neq i} f_j(\mathbf{x}_i)$ and consider the following convexification of $F$: at each iteration $k$, we replace the nonconvex function $f_i$ by a strongly convex function $\tilde{f}_i(\cdot\,; \mathbf{x}_i[k])$ and linearize $\sum_{j \neq i} f_j(\mathbf{x}_i)$ at $\mathbf{x}_i[k]$; that is,
$$\tilde{F}_i(\mathbf{x}_i; \mathbf{x}_i[k]) \triangleq \tilde{f}_i(\mathbf{x}_i; \mathbf{x}_i[k]) + \boldsymbol{\pi}_i(\mathbf{x}_i[k])^T (\mathbf{x}_i - \mathbf{x}_i[k]), \quad (8)$$
where $\tilde{f}_i(\cdot\,; \mathbf{x}_i[k]) : K \to \mathbb{R}$ is a strongly convex surrogate of the nonconvex $f_i$ and $\boldsymbol{\pi}_i(\mathbf{x}_i[k]) \triangleq \sum_{j \neq i} \nabla f_j(\mathbf{x}_i[k])$ is the gradient of the linearized term. At each iteration $k$, the following strongly convex problem is solved by agent $i$:
$$\hat{\mathbf{x}}_i(\mathbf{x}_i[k]) \triangleq \underset{\mathbf{x}_i \in K,\ \mathbf{g}(\mathbf{x}_i) \leq \mathbf{0}}{\arg\min}\ \tilde{F}_i(\mathbf{x}_i; \mathbf{x}_i[k]). \quad (9)$$
Note that $\hat{\mathbf{x}}_i(\mathbf{x}_i[k])$ in (9) is well defined, since subproblem (9) has a unique solution.
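As an illustration of the best-response map, consider the fully linearized surrogate (cf. Remark 8) and, for simplicity, ignore the inequality constraints $\mathbf{g}$, which are handled by the penalty method in the next subsection. When $K$ is a box, the minimizer of the resulting strongly convex quadratic has a closed form: a gradient step of length $1/\tau_i$ followed by projection onto the box. A sketch (the function name is ours):

```python
import numpy as np

def sca_best_response(x_k, grad_fi, pi_i, tau, lo, hi):
    """Minimizer over the box K = [lo, hi]^n of the fully linearized surrogate
        (grad_fi + pi_i)^T (x - x_k) + (tau/2) * ||x - x_k||^2,
    i.e. a gradient step followed by Euclidean projection onto the box."""
    x_unc = x_k - (grad_fi + pi_i) / tau   # unconstrained minimizer of the quadratic
    return np.clip(x_unc, lo, hi)          # projection is exact for separable quadratics
```

A larger $\tau$ makes the surrogate more conservative (shorter steps), a standard trade-off in SCA schemes.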
We impose the following assumptions on the approximation of $f_i$.

Assumption 5. Each $\tilde{f}_i$ satisfies the following: (C1) $\tilde{f}_i(\cdot\,; \mathbf{x})$ is uniformly strongly convex on $K$, for every $\mathbf{x}$, with strong convexity parameter $\tau_i > 0$.

Assumption 5 is quite natural. The function $\tilde{f}_i$ is viewed as a strongly convex local approximation of $f_i$ around $\mathbf{x}$ that inherits the first-order properties of $f_i$. Assumption 5(C2) is a Lipschitz-continuity requirement that is certainly satisfied if, for example, the set $K$ is bounded. For a given $f_i$, several feasible choices of $\tilde{f}_i$ are provided in [16, 22, 23].

Exact Penalty Function Method.

Subproblem (9) is not easy to solve, due to the presence of the global constraints $\mathbf{g}$ and the local accessibility of $f_i$; thus, the exact penalty function method is utilized to transform subproblem (9) into an unconstrained problem, given by
$$\min_{\mathbf{x}_i \in K}\ \tilde{F}_i(\mathbf{x}_i; \mathbf{x}_i[k]) + \lambda P(\mathbf{x}_i), \quad (10)$$
where $P(\mathbf{x}_i) \triangleq \max\{0, g_1(\mathbf{x}_i), \ldots, g_m(\mathbf{x}_i)\}$ and $\lambda > 0$ is a penalty parameter. One can verify that $P(\mathbf{x}_i)$ is convex on $K$ and, by (B4), every subgradient $\boldsymbol{\xi}(\mathbf{x}_i) \in \partial P(\mathbf{x}_i)$ is bounded on $K$. Under suitable conditions [24], the solution set of the penalized problem (10) coincides with the solution set of the constrained subproblem (9). To explain this fact in detail, we introduce the Lagrangian function of problem (9):
$$L(\mathbf{x}_i, \boldsymbol{\mu}) \triangleq \tilde{F}_i(\mathbf{x}_i; \mathbf{x}_i[k]) + \boldsymbol{\mu}^T \mathbf{g}(\mathbf{x}_i),$$
where $\boldsymbol{\mu} = (\mu_1, \mu_2, \ldots, \mu_m)^T \in \mathbb{R}^m_+$ is the vector of dual variables corresponding to the constraints $\mathbf{g}(\mathbf{x}_i) \leq \mathbf{0}$. The dual problem of problem (9) is
$$\max_{\boldsymbol{\mu} \geq \mathbf{0}}\ \min_{\mathbf{x}_i \in K}\ L(\mathbf{x}_i, \boldsymbol{\mu}). \quad (12)$$
It can be proved that no duality gap exists between subproblem (9) and its dual problem (12) if Slater's condition is satisfied (see Proposition 5.3.1 in [20]); in addition, the set of dual optimal solutions is nonempty and bounded. Thus, based on Proposition 1 in [24], there is a penalty parameter $\bar{\lambda} > \sum_{j=1}^{m} \mu_j^*$, where $\boldsymbol{\mu}^*$ is a dual optimal solution, such that the solutions of the penalized problem (10) are the same as those of subproblem (9). Throughout the rest of this paper, we therefore always select a finite penalty parameter $\lambda$ such that $\lambda \geq \bar{\lambda}$.
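The penalty term $P$ and one of its subgradients are cheap to evaluate. A minimal sketch (our naming; `gs` and `dgs` hold the constraint functions and their gradients): when $P(\mathbf{x}) = 0$, the zero vector is a valid subgradient; otherwise the gradient of any constraint attaining the positive maximum is one.

```python
import numpy as np

def penalty(x, gs):
    """Exact (nonsmooth) penalty P(x) = max{0, g_1(x), ..., g_m(x)}."""
    return max(0.0, max(g(x) for g in gs))

def penalty_subgrad(x, gs, dgs):
    """One subgradient of P at x: zero when P(x) = 0, otherwise the gradient
    of a constraint attaining the positive maximum."""
    vals = [g(x) for g in gs]
    j = int(np.argmax(vals))
    if vals[j] <= 0.0:
        return np.zeros_like(x)
    return dgs[j](x)

# demo constraints: g_1(x) = x_1 - 1 <= 0 and g_2(x) = -x_1 <= 0
gs  = [lambda x: x[0] - 1.0, lambda x: -x[0]]
dgs = [lambda x: np.array([1.0, 0.0]), lambda x: np.array([-1.0, 0.0])]
```

On a feasible point both helpers return zero, so the penalized objective locally coincides with the surrogate objective, which is the mechanism behind exactness.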

Consensus Update.
We now introduce a consensus mechanism to ensure that the local estimates $\mathbf{x}_i$ gradually agree across all agents. A consensus-based step is applied to the best responses $\hat{\mathbf{x}}_j(\mathbf{x}_j[k])$, and each agent $i$ updates its state as follows:
$$\mathbf{x}_i[k+1] = \sum_{j=1}^{N} w_{ij}[k]\, \hat{\mathbf{x}}_j(\mathbf{x}_j[k]),$$
where the $w_{ij}[k]$ are the weights satisfying Assumption 1.
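The consensus step alone already forces agreement: with a doubly stochastic $W$, repeated averaging preserves the network-wide mean and drives every local copy toward it. A minimal sketch with a fixed (rather than time-varying) weight matrix:

```python
import numpy as np

# doubly stochastic weights for 3 agents (rows and columns each sum to 1)
W = np.array([[0.50, 0.25, 0.25],
              [0.25, 0.50, 0.25],
              [0.25, 0.25, 0.50]])

x = np.array([0.0, 3.0, 6.0])   # one scalar estimate per agent
target = x.mean()               # the mean is invariant under a doubly stochastic W

for _ in range(200):
    x = W @ x                   # x_i <- sum_j w_ij x_j
```

In the algorithm the averaged quantities are the best responses $\hat{\mathbf{x}}_j$, not the raw estimates, so consensus and optimization are interleaved.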
The evaluation of $\boldsymbol{\pi}_i(\mathbf{x}_i[k])$ in (8) requires the gradients $\nabla f_j(\mathbf{x}_i[k])$ for all $j \neq i$, which are not available at agent $i$. To deal with this issue, we need a local estimate of $\boldsymbol{\pi}_i(\mathbf{x}_i[k])$ that eventually converges to $\sum_{j \neq i} \nabla f_j(\mathbf{x}_i[k])$. We rewrite $\boldsymbol{\pi}_i(\mathbf{x}_i[k])$ in (8) as
$$\boldsymbol{\pi}_i(\mathbf{x}_i[k]) = N \cdot \overline{\nabla f}(\mathbf{x}_i[k]) - \nabla f_i(\mathbf{x}_i[k]), \qquad \overline{\nabla f}(\mathbf{x}_i[k]) \triangleq \frac{1}{N} \sum_{j=1}^{N} \nabla f_j(\mathbf{x}_i[k]),$$
and approximate it locally by
$$\tilde{\boldsymbol{\pi}}_i(\mathbf{x}_i[k]) \triangleq N \mathbf{y}_i[k] - \nabla f_i(\mathbf{x}_i[k]), \quad (15)$$
where $\mathbf{y}_i[k]$ is a local auxiliary variable updated by agent $i$ that asymptotically tracks $\overline{\nabla f}(\mathbf{x}_i[k])$. Using the dynamic average consensus strategy [25], $\mathbf{y}_i[k]$ in (15) is updated via
$$\mathbf{y}_i[k+1] = \sum_{j=1}^{N} w_{ij}[k]\, \mathbf{y}_j[k] + \left( \nabla f_i(\mathbf{x}_i[k+1]) - \nabla f_i(\mathbf{x}_i[k]) \right), \qquad \mathbf{y}_i[0] \triangleq \nabla f_i(\mathbf{x}_i[0]). \quad (16)$$
Note that the update of $\mathbf{y}_i[k]$ in (16), and thus of $\tilde{\boldsymbol{\pi}}_i(\mathbf{x}_i[k])$ in (15), can now be performed locally through message exchanges with the agents in the neighborhood $N_i[k]$. With the above developments, problem (10) can be converted into the following problem:
$$\min_{\mathbf{x}_i \in K}\ \tilde{f}_i(\mathbf{x}_i; \mathbf{x}_i[k]) + \tilde{\boldsymbol{\pi}}_i(\mathbf{x}_i[k])^T (\mathbf{x}_i - \mathbf{x}_i[k]) + \lambda P(\mathbf{x}_i). \quad (17)$$
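Putting the pieces together on a toy problem clarifies how tracking and consensus interact. The sketch below is our own construction, not the paper's Algorithm 1: scalar variables, no constraints (hence no penalty term), a fixed complete-graph weight matrix, and the fully linearized surrogate. It minimizes $\sum_i (x - b_i)^2$, whose global minimizer is the mean of the $b_i$:

```python
import numpy as np

N, tau, alpha = 3, 2.0, 0.1
b = np.array([1.0, 2.0, 6.0])                    # f_i(x) = (x - b_i)^2
grad = lambda x, i: 2.0 * (x - b[i])

W = np.full((N, N), 1.0 / N)                     # doubly stochastic (complete graph)

x = np.zeros(N)                                  # local copies x_i
y = np.array([grad(x[i], i) for i in range(N)])  # y_i[0] = grad f_i(x_i[0])

for _ in range(2000):
    g_old = np.array([grad(x[i], i) for i in range(N)])
    pi    = N * y - g_old                        # local estimate of sum_{j != i} grad f_j
    x_hat = x - (g_old + pi) / tau               # best response of the linearized surrogate
    z     = x + alpha * (x_hat - x)              # damped step (fixed here; the paper uses
                                                 # a diminishing step size)
    x     = W @ z                                # consensus on the iterates
    g_new = np.array([grad(x[i], i) for i in range(N)])
    y     = W @ y + (g_new - g_old)              # dynamic average consensus (tracking)
```

All local copies converge to the global minimizer $\bar{b} = 3$. With a doubly stochastic $W$, the mean of the $\mathbf{y}_i$ equals the current average gradient at every iteration, which is exactly the invariant the tracking update (16) is designed to preserve.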

Algorithm and Convergence Results
Based on the foregoing development, we propose an exact-penalty-based distributed algorithm (EPDA for short) to solve problem (1), presented in Algorithm 1.
In order to obtain the convergence results of the proposed algorithm, we first give Lemma 6, which shows the relationship between the solutions of problem (1) and the solutions of subproblem (9). Note that conditions (18) and (19) below can indeed be satisfied; see the proof of relation (A.32).

Lemma 6. Suppose that Assumptions 1, 2, and 5 hold, and let $\{\mathbf{x}_i[k]\}$ be the sequence generated by Algorithm 1. Then the following results hold: (i) under condition (18), at least one regular limit point of $\{\mathbf{x}_i[k]\}$ is a stationary solution of problem (1); (ii) under condition (19), every regular limit point of $\{\mathbf{x}_i[k]\}$ is a stationary solution of problem (1).
We are now in a position to give the convergence properties of Algorithm 1.

Proof. See the Appendix.
Remark 8. (1) On the choice of surrogate functions, we present only several instances to show how to choose the surrogate function $\tilde{f}_i$; see also [16, 19, 22, 23].
(i) If no convex structure of $f_i$ is available, the simplest choice is the linearization of $f_i$ at $\mathbf{x}_i[k]$ plus a proximal term; that is,
$$\tilde{f}_i(\mathbf{x}_i; \mathbf{x}_i[k]) = f_i(\mathbf{x}_i[k]) + \nabla f_i(\mathbf{x}_i[k])^T (\mathbf{x}_i - \mathbf{x}_i[k]) + \frac{\tau_i}{2} \|\mathbf{x}_i - \mathbf{x}_i[k]\|^2.$$
(iii) Consider the case where $f_i$ can be decomposed as $f_i(\mathbf{x}_i) = f_i^{(1)}(\mathbf{x}_i) + f_i^{(2)}(\mathbf{x}_i)$, where $f_i^{(1)}$ is convex and $f_i^{(2)}$ is nonconvex. We can linearize only $f_i^{(2)}$ and preserve the convex $f_i^{(1)}$ as follows:
$$\tilde{f}_i(\mathbf{x}_i; \mathbf{x}_i[k]) = f_i^{(1)}(\mathbf{x}_i) + \nabla f_i^{(2)}(\mathbf{x}_i[k])^T (\mathbf{x}_i - \mathbf{x}_i[k]) + \frac{\tau_i}{2} \|\mathbf{x}_i - \mathbf{x}_i[k]\|^2.$$
(2) On the choice of step sizes, condition (20) in Theorem 7 requires that the step-size sequence decrease to zero, but not too fast. The choice of a step size $\alpha[k]$ meeting (20) is quite flexible [1, 23], and two such choices proved very effective in our simulations. (3) On the choice of weight matrices, Assumption 1 requires that each communication weight matrix $W[k]$ be doubly stochastic. References [4, 5] provide several choices of weight matrices, such as the maximum-degree weight matrix, the Metropolis-Hastings weight matrix, and the least-mean-square consensus weight matrix.

Numerical Simulations
We consider a distributed nonconvex quadratic problem with quadratic inequality constraints (see also Example C in [22]), in which the penalty parameter $\lambda > 0$ is chosen suitably. In general, there are two ways to choose $\lambda$ such that it satisfies the requirement: one is by solving the dual problem defined in (12), and the other is heuristic. To imitate time-varying weight matrices, a pool of 50 weight matrices from connected random graphs is generated, each satisfying Assumption 1; the time-varying weight matrices required in Steps 6(a) and 6(b) of Algorithm 1 are randomly drawn from this pool. For comparison, we use the distributed Lagrangian primal-dual subgradient (DLPDS, for short) algorithm proposed in [14] to solve the corresponding subproblem (9). For simplicity, we assume that each $A_i$ is a diagonal matrix with entries drawn randomly from $(0, 1]$, each $\mathbf{a}_i$ is a vector with entries drawn randomly from $(0, 1]$, the per-agent scalars are drawn randomly from $[1, 2]$, $B$ is a diagonal matrix with entries drawn randomly from $(0, 1]$, the constraint level is drawn randomly from $(0, 10]$, and the initial points $\mathbf{x}_i[0]$ are drawn randomly from $K$, where the box constraint is set as $K = [-2, 2]^n$. In the numerical experiments, we heuristically select the penalty parameter $\lambda = 10$.
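For reproducibility, the random data described above can be generated as in the following sketch. The variable names are ours and only the sampling ranges come from the text; NumPy's `uniform` draws from a half-open interval, which we treat as a harmless approximation of the stated ranges.

```python
import numpy as np

rng = np.random.default_rng(0)
N, n = 50, 5                                     # agents and problem dimension

# per-agent quadratic data: diagonal A_i and vector a_i with entries in (0, 1]
A = [np.diag(rng.uniform(0.0, 1.0, n)) for _ in range(N)]
a = [rng.uniform(0.0, 1.0, n) for _ in range(N)]
c = rng.uniform(1.0, 2.0, N)                     # per-agent scalars in [1, 2]

# constraint data: diagonal B with entries in (0, 1] and a level in (0, 10]
B_mat = np.diag(rng.uniform(0.0, 1.0, n))
level = rng.uniform(0.0, 10.0)

lo, hi = -2.0, 2.0                               # box constraint K = [-2, 2]^n
x0 = [rng.uniform(lo, hi, n) for _ in range(N)]  # random initial points in K
lam = 10.0                                       # heuristically chosen penalty parameter
```

Fixing the generator seed makes the random problem instance repeatable across runs, which is useful when averaging over independent realizations.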
Experimental results are presented to illustrate the convergence behavior of the proposed Algorithm EPDA; all curves are averaged over 20 independent realizations. Comparisons with the existing Algorithm DLPDS are also given.
Figures 1(a) and 2(a) depict the maximum relative error $\max_i \{\|\mathbf{x}_i[k] - \bar{\mathbf{x}}[k]\| / \|\bar{\mathbf{x}}[k]\|\}$ versus the number of iterations for different numbers of nodes and problem dimensions. It can be observed that both algorithms have the potential to converge to the same stationary solution; however, our Algorithm EPDA is much faster than Algorithm DLPDS [14]. As can be seen from Figure 1(a), Algorithm EPDA reaches higher precision than Algorithm DLPDS after 400 iterations.

2.1. Problem Statement.

Consider the following distributed nonconvex optimization problem:
$$\min_{\mathbf{x} \in K}\ F(\mathbf{x}) \triangleq \sum_{i=1}^{N} f_i(\mathbf{x}) \quad \text{s.t.}\ \mathbf{g}(\mathbf{x}) \leq \mathbf{0}. \quad (1)$$

Network Definition. We first describe the network topology. Time is assumed to be discrete. At each time slot $k$, the network is modeled as a directed graph $\mathcal{G}[k] = (\mathcal{V}, \mathcal{E}[k])$, where $\mathcal{V} = \{1, \ldots, N\}$ is the set of nodes, one per agent, and $\mathcal{E}[k]$ represents the set of time-varying directed edges. The neighborhood of agent $i$ at time $k$ is defined as $N_i[k] = \{j \mid (j, i) \in \mathcal{E}[k]\} \cup \{i\}$; every agent $j \neq i$ in $N_i[k]$ can communicate with node $i$ at time $k$. We assign time-varying weights $w_{ij}[k]$ to match the digraph $\mathcal{G}[k]$ and define the weight matrix $W[k] = (w_{ij}[k])_{i,j=1}^{N}$.
Figure 2(a) shows similar results when $N = 100$ and $n = 10$. Figures 1(b) and 2(b) depict the objective value $F(\mathbf{x}[k])$ versus the number of iterations for different numbers of nodes and dimensions. In both tested cases, the objective value gradually decreases as the number of iterations increases, but it decreases faster for Algorithm EPDA than for Algorithm DLPDS.

Figure 1: Numerical results for a network of $N = 50$ nodes and problem dimension $n = 5$.

Figure 2: Numerical results for a network of $N = 100$ nodes and problem dimension $n = 10$.