Distributed Constrained Stochastic Subgradient Algorithms Based on Random Projection and Asynchronous Broadcast over Networks

We consider a distributed constrained optimization problem over a time-varying network, where each agent knows only its own cost function and its own constraint set. In some applications, however, the local constraint set may not be known in advance, or it may consist of a huge number of components. To handle such cases, we propose a distributed stochastic subgradient algorithm over time-varying networks, in which each agent projects its estimate onto its constraint set by a random projection technique, and information exchange between agents is implemented by an asynchronous broadcast communication protocol. We show that the proposed algorithm converges with probability 1 under a suitable choice of learning rate. For a constant learning rate, we obtain an error bound, defined as the expected distance between the estimates of the agents and the optimal solution. We also establish an asymptotic upper bound on the difference between the global objective function value at the average of the estimates and the optimal value.


Introduction
Distributed optimization problems have received considerable interest from industry and academia since they arise in many applications, including distributed parameter estimation, detection and source localization [1][2][3][4], distributed learning and regression [5][6][7], resource allocation [8,9], and distributed power control [10]. The goal of such problems is to minimize a global objective function that is the sum of local cost functions over a network. To achieve this goal, we need to design a distributed optimization algorithm over time-varying networks, where each local cost function and constraint set are private information. Moreover, each agent can exchange information with its neighbors across the time-varying network. Hence, many distributed optimization algorithms have been proposed to address such problems [11][12][13][14][15].
However, in some applications each agent i may not know its constraint set X_i beforehand. Thus, the estimate of agent i cannot be projected onto X_i by a deterministic projection operation, and deterministic projection-based distributed optimization algorithms [16][17][18] cannot be directly applied to such problems. To deal with this case, random projection-based distributed optimization algorithms were studied in [19,20]. Moreover, a deterministic projection-based algorithm can be regarded as a special case of a random projection-based one. Therefore, we consider the distributed random projection algorithm in this paper. Furthermore, the local constraint set X_i is assumed to have the form X_i = ⋂_{j∈Θ_i} X_i^j, where Θ_i is an index set and each X_i^j is a simple set. In addition, each agent needs to exchange information with its neighbors over time-varying networks. Hence, the design of the communication protocol plays a crucial role in the design of a distributed optimization algorithm. In practice, the gossip communication protocol [21] and the broadcast communication protocol [22] are two frequently used communication protocols. An asynchronous distributed random projection algorithm based on the gossip communication protocol has been proposed in [23]. However, broadcast is a natural communication mode in wireless networks, which motivates the asynchronous broadcast-based algorithm studied in this paper.

The remainder of this paper is organized as follows. We describe the optimization problem of interest, present the algorithm, and give some assumptions in Section 2. We state the main results of the paper in Section 3. In Section 4, the convergence analysis of the algorithm and the corresponding proofs are provided. The analysis of error bounds is presented in Section 5. The conclusion of the paper is given in Section 6.
Notation. In this paper, all vectors are column vectors. We use boldface to denote vectors in R^n and normal font to denote scalars or vectors of different dimensions. x^T and A^T denote the transposes of a vector x and a matrix A, respectively. We use ‖x‖ to denote the standard Euclidean norm of a vector x. The notations 1 and I denote the vector whose entries are all 1 and the identity matrix of size N × N, respectively. E[ξ] denotes the expectation of a random variable ξ.

Algorithm Description and Assumptions
We consider a network consisting of N agents (or nodes), indexed by 1, . . ., N. At each time t, the network topology is denoted by an undirected graph G(t) = (V, E(t)), where V = {1, . . ., N} denotes the set of agents and E(t) ⊂ V × V denotes the set of edges. We assume that the undirected graph is simple. If there exists an edge between agents i and j at time t, then (i, j) ∈ E(t). Two agents are said to be neighbors if they are connected directly by an edge; that is, they can share information with each other. At time t, we denote the set of neighbors of agent i by N_i(t); that is, N_i(t) = {j ∈ V | (i, j) ∈ E(t)}. We also assume that communication links may be interrupted at random times.
We consider the following optimization problem:

min_{x ∈ X} f(x) = Σ_{i=1}^{N} f_i(x),  with X = ⋂_{i=1}^{N} X_i,  (1)

where f_i : R^n → R denotes the convex objective function of agent i and X_i denotes the constraint set of agent i.
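To make the setup concrete, the following Python sketch builds a hypothetical instance of problem (1): quadratic local costs f_i and constraint sets X_i given as intersections of simple halfspace components. All data here (the targets b_i, the halfspaces, and every function name) are illustrative assumptions of ours, not part of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
N, n = 4, 3                        # number of agents, decision dimension
b = rng.normal(size=(N, n))        # private targets of the agents (toy data)

def f(i, x):
    """Local convex cost of agent i: f_i(x) = 0.5 * ||x - b_i||^2."""
    return 0.5 * np.sum((x - b[i]) ** 2)

def global_objective(x):
    """Global objective of problem (1): the sum of the local costs."""
    return sum(f(i, x) for i in range(N))

# Each component X_i^j is a halfspace {x : a^T x <= c}; X_i is their
# intersection, and X is the intersection of the X_i.
halfspaces = [[(rng.normal(size=n), 1.0) for _ in range(3)] for _ in range(N)]

def in_X_i(i, x, tol=1e-9):
    """Membership test for the local constraint set X_i."""
    return all(a @ x <= c + tol for a, c in halfspaces[i])

def in_X(x):
    """Membership test for the global feasible set X."""
    return all(in_X_i(i, x) for i in range(N))
```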
To solve problem (1), we propose an asynchronous distributed subgradient random projection algorithm based on a randomized broadcast communication protocol. In this paper, we employ the asynchronous time model of [21] and the randomized broadcast model of [22].

Algorithm Description.
In the asynchronous model, each agent has a virtual clock whose ticks follow a Poisson process with rate λ. We assume that only one agent wakes up at a time. If an agent wakes up at time t, we use I_t to denote the index of that agent. Since links can be randomly interrupted, we use J_t to denote the subset of neighbors of agent I_t that actually receive its broadcast. Agent j ∈ N_{I_t}(t) receives the broadcast information with probability p_j; that is, if (I_t, j) ∈ E(t), then p_j > 0. Each agent j ∈ J_t hears the broadcast information from agent I_t at time t. Hence, if j ∉ J_t, then

x_j(t+1) = x_j(t),  (2)

where x_j(t) denotes the estimate of agent j at time t. If j ∈ J_t, then the estimate of agent j is updated as follows:

y_j(t) = (1 − γ) x_j(t) + γ x_{I_t}(t),  (3)
x_j(t+1) = Π_{X_j^{Ω_j(t)}}[ y_j(t) − α_j(t)( ∇f_j(y_j(t)) + ε_j(t) ) ],  (4)

where γ ∈ (0, 1), α_j(t) is the stepsize, Ω_j(t) ∈ Θ_j is a randomly selected constraint index, Π_X[·] denotes the Euclidean projection onto a set X, and ε_j(t) is the stochastic subgradient error. To simplify the analysis, we first define the matrix

W(t) = I − γ Σ_{j∈J_t} e_j (e_j − e_{I_t})^T,

where I is the identity matrix of size N × N and e_j ∈ R^N denotes the vector whose jth entry is 1 and whose other entries are 0. The updates (2)-(4) can then be written compactly as

x_j(t+1) = 1_{{j∈J_t}} Π_{X_j^{Ω_j(t)}}[ y_j(t) − α_j(t)( ∇f_j(y_j(t)) + ε_j(t) ) ] + (1 − 1_{{j∈J_t}}) x_j(t),  (5)

where 1_{{j∈J_t}} = 1 if the event {j ∈ J_t} occurs and 1_{{j∈J_t}} = 0 otherwise.
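The broadcast update described above can be sketched in Python. This is a simplified simulation of a single clock tick, assuming halfspace constraint components (so the random projection has a closed form) and a caller-supplied noisy subgradient oracle; the function signatures and parameter names are our own, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

def project_halfspace(x, a, c):
    """Euclidean projection of x onto the halfspace {y : a^T y <= c}."""
    viol = a @ x - c
    if viol <= 0:
        return x
    return x - (viol / (a @ a)) * a

def broadcast_tick(x, neighbors, p, gamma, alpha, subgrad, components):
    """One tick of the asynchronous broadcast model (a sketch).

    x          : (N, n) array; row j is the estimate x_j(t)
    neighbors  : dict j -> list of current neighbors of agent j
    p          : probability that a neighbor hears the broadcast
    gamma      : mixing weight in (0, 1)
    alpha      : dict j -> current stepsize alpha_j(t)
    subgrad    : subgrad(j, y) -> (possibly noisy) subgradient of f_j at y
    components : components[j] = list of (a, c) halfspaces whose
                 intersection is X_j; one is drawn at random to project onto
    """
    N = x.shape[0]
    I = int(rng.integers(N))                 # index I_t of the woken agent
    for j in neighbors[I]:
        if rng.random() < p:                 # agent j belongs to J_t
            y = (1 - gamma) * x[j] + gamma * x[I]       # mixing step
            z = y - alpha[j] * subgrad(j, y)            # subgradient step
            a, c = components[j][rng.integers(len(components[j]))]
            x[j] = project_halfspace(z, a, c)           # random projection
    return x
```

Agents outside J_t keep their estimates unchanged, which matches the "no update" branch of the algorithm.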

Assumptions.
In order to analyze the convergence properties of algorithm (5), we need to give some standard assumptions as follows.
Assumption 2. We assume that the constraint sets X_i^j are nonempty, closed, and convex for all i ∈ V and j ∈ Θ_i. We also assume that each cost function f_i : R^n → R is convex. Moreover, we assume that the subgradient ∇f_i(x) of f_i(x) is uniformly bounded over X for every i ∈ V; namely, ‖∇f_i(x)‖ ≤ G_max for all x ∈ X.

Assumption 3. For any random variable Ω_i(t) ∈ Θ_i, i ∈ V, and for all x ∈ R^n, we assume that the following relation holds:

dist²(x, X) ≤ c_i E[ dist²(x, X_i^{Ω_i(t)}) ],

where c_i is a positive constant and dist(x, X) denotes the distance between a point x and a set X.
In Assumption 3, the random variable Ω_i(t) obeys a probability distribution that assigns positive probability to each component. Moreover, if the set X has a nonempty interior, then Assumption 3 can be shown to hold [27].
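As a minimal illustration of Assumption 3 (a toy example of ours, not from the paper), consider the set X = {x : x_1 ≤ 0, x_2 ≤ 0} with components X^1 = {x : x_1 ≤ 0} and X^2 = {x : x_2 ≤ 0} selected uniformly at random. The computation below checks that, at the point (1, 1), the constant c = 2 relates the squared distance to X to the average squared distance to a random component.

```python
import numpy as np

def dist2_to_X(x):
    """Squared distance to the nonpositive orthant {x : x_1 <= 0, x_2 <= 0};
    the projection simply clips the positive coordinates to zero."""
    return float(np.sum(np.maximum(x, 0.0) ** 2))

def dist2_to_component(x, j):
    """Squared distance to the single halfspace {x : x_j <= 0}."""
    return max(x[j], 0.0) ** 2

x = np.array([1.0, 1.0])
lhs = dist2_to_X(x)                                       # dist^2(x, X) = 2
mean_rhs = 0.5 * (dist2_to_component(x, 0)
                  + dist2_to_component(x, 1))             # E[dist^2] = 1
c = lhs / mean_rhs                                        # c = 2 works here
```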
Let F_t denote the σ-algebra generated by the entire history of algorithm (5) up to time t. Next, we give the assumption on the stochastic subgradient error ε_i(t).
Assumption 4. For each agent i at time t, we assume that the error ε_i(t) satisfies

E[ε_i(t) | F_t] = 0,  E[‖ε_i(t)‖² | F_t] ≤ ν²,

with probability 1, where ν is a positive constant.
In this section, we have proposed an asynchronous distributed subgradient projection algorithm based on a random projection operation and a randomized broadcast communication protocol. We have also provided some standard assumptions for analyzing the convergence properties of the algorithm. The main results of the paper are presented in Section 3.

Main Results
In this section, we provide the convergence properties of algorithm (5). The detailed proofs of the main results are given in the following sections.
We first define the optimal value and the set of optimal solutions of problem (1) as follows:

f* = min_{x ∈ X} f(x),  X* = {x ∈ X | f(x) = f*}.

The first result states that our proposed algorithm converges with probability 1.

Theorem 5. Under Assumptions 1-4, let the set of optimal points X* be nonempty. Let the estimate sequences {x_i(t)}, i = 1, . . ., N, be generated by algorithm (5) with positive stepsizes α_i(t) = 1/Φ_i(t), where Φ_i(t) is the number of updates performed by agent i up to time t. Then the estimates of all agents converge to a common optimal point x* ∈ X* with probability 1.
Theorem 5 shows that the iterates of all agents asymptotically converge to a common optimal point over time-varying networks; that is, for all i ∈ V, lim_{t→∞} x_i(t) = x* with probability 1.
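Note that the stepsize rule of Theorem 5 is local: each agent divides by its own update count Φ_i(t), not by the global clock. A minimal sketch of this bookkeeping (class and method names are hypothetical):

```python
class LocalStepsize:
    """Stepsize rule of Theorem 5: alpha_i(t) = 1 / Phi_i(t), where
    Phi_i(t) counts how many times agent i has updated so far."""

    def __init__(self):
        self.updates = 0          # Phi_i(t), the local update counter

    def next_alpha(self):
        """Called only when the agent actually updates (i.e. hears a
        broadcast); returns the stepsize for that update."""
        self.updates += 1
        return 1.0 / self.updates
```

Since agents wake up and hear broadcasts asynchronously, their counters Φ_i(t) generally differ, so the agents use different stepsizes at the same global time t.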
We also establish an asymptotic error bound between optimal points in the optimal set and the estimates of algorithm (5).

Theorem 6. Under Assumptions 1-4, let the estimate sequences {x_i(t)}, i = 1, . . ., N, be generated by algorithm (5) with constant stepsizes α_i(t) = α_i for t ≥ 1. Moreover, assume that each function f_i(x) is σ_i-strongly convex, where the constant σ_i satisfies 0 < 2σ_iα_i < 1. Furthermore, let the set X be compact. Then, with probability 1, the bounds (a) and (b) hold.

Theorem 6 establishes asymptotic error bounds: bound (a) concerns the average of the expected distances between the optimal point x* ∈ X* and the estimates of algorithm (5), while bound (b) concerns the difference between the global cost function at the average estimate x̄(t) and the global cost function at the optimal point x*.
In this section, we have provided the main results of this paper. The detailed proofs are given in Sections 4 and 5.

Analysis of Convergence Results
In this section, we provide the proof of Theorem 5. For this purpose, we first establish a basic iterate relation for the estimates of algorithm (5).
Proof. Let u ∈ X. From the nonexpansive property of the projection [16], we have (12). Further, from the definition of the inner product and the Cauchy-Schwarz inequality, we obtain (13), where the second inequality follows from the inequality 2ab ≤ a² + b². Hence, combining relations (12) and (13), we obtain (14). In addition, we also have (15), where the last inequality follows from the inequality 2|ab| ≤ ηa² + b²/η for some η > 0, by letting a = G_max α_j(t), b = ‖z_j(t) − y_j(t)‖, and η = 4. Hence, combining relations (14) and (15), we obtain (16). Next, we consider the term 2α_j(t)(f_j(y_j(t)) − f_j(u)) in (16), which can be rewritten as (17). Thus, according to Lemma 3 of [26], there exists a sufficiently large positive constant t̄ such that (18) holds for any t ≥ t̄, where we use the Cauchy-Schwarz inequality. Then, from (16) and (18), we obtain (19). Noting that ‖z_j(t) − y_j(t)‖ = dist(z_j(t), X), we have (20). Hence, taking the conditional expectation on both sides of (19) and using the fact that the stochastic errors ε_j(t) have zero mean, we obtain, with probability 1, relation (21) for all t ≥ t̄ and all j. By Assumption 3, we have (22), where c = max_i c_i. For all t ≥ t̄, we then obtain (23) with probability 1. Note that x_j(t) = z_j(t) if j ∉ J_t. Besides, if j ∈ J_t, then agent j updates its estimate with probability p_j and does not update with probability 1 − p_j. Hence, from (23), the desired result follows.
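For reference, the two elementary facts invoked repeatedly in the proof above are the nonexpansive (projection) inequality and the weighted form of the arithmetic-geometric mean inequality, which can be written as:

```latex
\|\Pi_{\mathcal{X}}[z] - u\|^2 \le \|z - u\|^2 - \|\Pi_{\mathcal{X}}[z] - z\|^2
\quad \text{for all } u \in \mathcal{X},
\qquad
2|ab| \le \eta a^2 + \frac{b^2}{\eta} \quad (\eta > 0).
```

The first holds for the Euclidean projection onto any nonempty closed convex set; the second reduces to 2ab ≤ a² + b² when η = 1.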
Further, for all j ∈ J_t, from the definition of the error e_j(t) and the nonexpansive property of the projection, we have (28). Furthermore, from the inequality (a + b)² ≤ 2(a² + b²), we obtain (29), where the last inequality follows from Lemma 3 in [26].
Taking the conditional expectation on both sides of (29), we have (30). It follows that Σ_t E[‖e_j(t)‖² | F_t] < ∞ for all j = 1, . . ., N. Further, we also have lim_{t→∞} ‖e_j(t)‖ = 0 with probability 1. Therefore, the conclusions of this lemma are proved.
We also establish the following lemma to prove the main results of this paper.
According to the properties of the stochastic matrix W(t), we have (35). By using the triangle inequality, we obtain (36). In addition, we have ‖W_ℓ(t)‖² ≤ ‖W_ℓ(t)‖, as stated in (37). Furthermore, from the nonexpansive property and the bound on α_j²(t) that holds for all t ≥ t̄ with t̄ large enough, we find (38). Thus, from (38), we have (39) with probability 1. Hence, from inequalities (36) and (39), we obtain (40). Further, for t ≥ t̄, we obtain (41). According to Lemma 8, the first term in (41) converges with probability 1. Moreover, since the geometric contraction factor is strictly less than 1, it follows from the Supermartingale Convergence Theorem [29] that (42) holds. From the definition of V_ℓ(t), we obtain (43) for j ∈ V. From the convexity of the norm and the definition of z̄(t), we have (44). Thus, from (43) and (44), we see that the claimed relation holds for all j ∈ V with probability 1. Therefore, the lemma is proved.
In this section, we have given the proof of Theorem 5. From this theorem, we can see that the proposed algorithm converges almost surely. In the next section, we will analyze the error bounds of the proposed algorithm.

Error Bounds Analysis
In this section, we give the proof of Theorem 6, where we assume that each cost function f_i is strongly convex with constant σ_i and that the stepsize α_i(t) is constant for all i ∈ V; that is, α_i(t) = α_i. To prove Theorem 6, we first establish a basic iterate relation for a constant stepsize.
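For reference, the σ_i-strong convexity assumed in Theorem 6 can be written, for a differentiable f_i, as:

```latex
f_i(y) \ge f_i(x) + \nabla f_i(x)^{\top}(y - x) + \frac{\sigma_i}{2}\,\|y - x\|^2
\quad \text{for all } x, y \in \mathbb{R}^n .
```

It is this quadratic lower bound that makes the constant-stepsize error bounds of Theorem 6 possible.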
Further, we also establish the following lemma.
Proof. Letting u = y_j(t) = Π_X[z_j(t)] in Lemma 10 and using the fact that dist(x_j(t), X) ≤ ‖x_j(t) − y_j(t)‖ for t ≥ 1 and all j ∈ J_t, we obtain a per-step bound. Since the communication mode employs the asynchronous broadcast protocol, we obtain (66) with probability 1. Summing (66) over t from 1 to T and then applying (25), the desired result follows from Lemma 6 in [26].

Conclusion
In this paper, we have considered a constrained distributed optimization problem in which the global cost function is a sum of local convex functions, and each agent knows only its own cost function, has access only to noisy subgradients of that function, and does not know its constraint set in advance. To solve this problem, we have proposed a distributed optimization algorithm over a time-varying network that employs a random projection method and uses a broadcast communication protocol in an asynchronous way. We have shown that the proposed algorithm converges with probability 1 under suitably chosen stepsizes. We have also established two asymptotic error bounds with appropriately chosen stepsizes.