DOBRUSHIN ’ S APPROACH TO QUEUEING NETWORK THEORY

R.L. Dobrushin (1929-1995) made substantial contributions to Queueing Network Theory (QNT). A review of results from QNT which arose from his ideas or were connected to him in other ways is given. We also comment on various related open problems.


Introduction and Prehminaries
Dobrushin began active research in Queueing Network theory (QNT) in 1971.At this time he was the Head of a laboratory at the Institute for Problems of Information Transmission (IPIT) of the USSR (now Russian) Academy of Sciences.From a purely mathematical point of view, QNT for him was a natural domain of application of numerous ideas on which he successfully worked during the 60's and 70's (and later), in connection with problems of equilibrium and non- equilibrium Statistical Mechanics and the theory of Markov processes with local interaction.In particular, an important role in shaping his approach to QNT was played by papers [39][40][41][42][43][44][45][46], and later on by [57] and [59].
1This work has been supported in part by the Russian Foundation of Fundamental Research (Grant 96-01-00150), the EC Grant "Training Mobility and Research" under Projects No. 16296  (Contract CHRX-CT 93-0411) and ERBFMRX (Contract CT 96-0075) and the INTAS Grant under Project "Mathematical Methods for Stochastic Discrete Event Systems" (INTAS 93-820).
On the other hand, Dobrushin understood from the very beginning the importance of various applications of the new theory, in particular for designing and exploiting systems of transmission of information and parallel computing.These domains of application just began emerging and there was a flow of questions and demands from engineers and applied mathematicians.In most cases it was about various parameters of a communication of data-processing network such as the expected value or the distribution of the end-to-end delay, queue size, loss probabilities, etc.
It should be said that although the "customers" generating the above questions and demands tried to use general and vague terms, it was clear that these networks were primarily scheduled for military use.This factor was never dominating in the theoretical research conducted by Dobrushin and his associates, but its presence increased with time, especially through the 80's when Dobrushin's laboratory, among others in IPIT, was partially supported by Soviet defense sources.
As Dobrushin was denied, by the authorities, the access to any sensitive material for political rea- sons, the whole organization of the work was rather intricate: the customers contacted exclusively those members of staff who had an official security clearance.However, the work itself was usual- ly done in the form of a free discussion between the members of Dobrushin's group in which every- body could take part, regardless of whether or not they had the security pass.The related topics were regularly discussed at various seminars in IPIT and elsewhere, and the whole subject, as long as the theoretical part was concerned, was treated completely declassified.
However, the reports about the work were passed to the customers under strictly observed rules similar to the above ones.Consequently, the results which took the form of theorems and formulas frequently became a kind of a state secret, without any knowledge on the part of the authors.Sometimes it led to bizarre situations (not uncommon in fields related to a sensitive material in the former Soviet Union) where an author (or authors) wished to present a paper at an international symposium which contained a classified information, of which fact they were formally not aware.According to general regulations, any such paper had to be checked by a special committee created by the Institute.However, the members of such committee usually pre- ferred not to interfere because preventing the publication would mean acknowledging that some- body who had no authorization of being in touch with the sate secrets actually produced one!Dobrushin was not a specialist in any of technical aspects of the network design and exploita- tion (nor did he any serious work in Queueing Theory before), but had a profound intuition and strong imagination which helped him to grasp the gist of many practical questions and develop means of approaching them.In general, he liked to talk about what can be loosely translated as "Mathematical Engineering", meaning a specific branch of Mathematics that addresses problems in the form adapted to demands from applications, in particular, in the communication and data- processing practice.He was obviously inspired by a rapid "mathematization" of Theoretical Phy- sics which he witnessed (and took an active part in).One of the problems that cast a shadow on the whole field of QNT at the time was the apparently tremendous complexity of a queueing net- work caused by the presence of a large number of devices and elements that function in an interac- tive regime.It is worth noting that in that period the majority of mathematicians working in Queueing Theory conducted the research in traditional directions that studied the queue to an iso- lated server (with a single or several service channels).There was a wide skepticism (quite often survived till now) about whether a realistic queueing network model may be approached by rigorous methods.
Much of the analysis was based upon the properties of the famous Lindley equation [102] for the FCFS (first-come-first-served) single-server queue: Wn -t- max[0, w, s, -,].
(1.1) Here, wn, w n + and sn, s n + 1 are the waiting times (WTs) and the service times (STs) of the nth and (n + 1)st customer, respectively, and r n the interarrival time (IAT) between them.Funda- mental properties of equation (1.1) were established in [103].n particular, if (rn, Sn) form a sta- tionary ergodic sequence (which we always assume below) and obey the non-overload condition Es Es<Er, or<l   (1.2)   then there exists a unique "extension" of sequence (7"n, Sn) to a "proper" stationary sequence ('.n, sn;wn) where random variables (RVs) w n are finite and obey (1.1) almost surely.This exten- sion is given by Wn max{ [0' sup j>_o n-j<_k<_n E (sk--rk)}" (1.3)On the other hand, if one reverses the inequality sign in (1.2), there will be no such extension.It can be said that condition (1.2) describes the domain of stability of the queue, and the sequence (r,,sn; wn) (or briefly wn) describes a unique stationary working regime of the server that is fed with input process (Vn, Sn).Pictorially speaking, under condition (1.2) such a regime is attained, as the discrete "time" n grows to ec, regardless of any "initial condition" assigned to w0, the WT of the 0th customer.But if one changes in (1.2) from < to >, the queue inevitably will tend to infinity with time.The end-to-end delay (EED) of a customer is given by the sum T wn + s (the WT plus the ST).
After Kendall [92], a general stationary ergodic process (rn, s,;w, (or sometimes even the "input" process (rn, Sn) is called (and denoted by) G/G/l; the /1 refers here to a single service "channel".If s n is a sequence of independent identically distributed (IID) RVs one uses the notation G/GI/1, and if 7 n and s n are two independent sequences of IID RVs the notation GI/GI/1.Furthermore, if, in the GI/GI/1 case, the rn'S (respectively, s's) have an exponential distribution, one uses the notation M/GI/1 (respectively, GI/M/1), and if both the Vn'S and Sn'S are exponential, the notation M/M/1.In particular, for the M/M/1 case condition (1.2) is fre- quently written as < # where , and # are the rates of the exponential distributions of v and s, respectively.
In the GI/GI/1 case the stationary regime may be described in terms of a stationary Lindley equation w _ max[0, w + s -].
(1.4) Symbol _ means here equality in distribution (that is why (1.4) is called a stochastic equation), as opposed to the point-(or sample-) wise equation (1.1).The RVs s and 7 in the right-hand side of (1.4) have the same distributions as s and n, respectively; these distributions are the known in equation (1.4).The RVs w in the left-and right-hand side have the same distribution which is the unknown in (1.4).Finally, all RVs in the right-hand side are taken to be independent (which corresponds with the GI/GI/1 queue).The results of [103] in terms of equation (1.4) may be translated, in the GI/GI/1 case, into the following statement: under condition (1.2) equation (1.4) has a unique "proper" solution (i.e., there is a unique probability measure on [0, cx) such that (1.4)is satisfied).On the other hand, if in inequality (1.2) one replaces < with > there will be no proper solution.
A similar notation GIG/m, G/GI/m, etc., was developed for the m-channel server.(In this device there are rn single servers working in parallel so that the customer is processed when (i) all previously arrived customers have completed their service or are in service and (ii) there is an idle server available (i.e., not serving a previously arrived customer)).See [93].']:he non-overoad condition here reads Es Es < mEv, or -< m. (1.5)As before, it describes the stability region of the queue: under condition (1.5) there exists a sta- tionary working regime whereas reversing the inequality sign in (1.5) will force the queue to blow up.Under (1.5) the stationary regime is unique in the G/GI/m case [23] but non-unique in a general G/G/m case [103].
The above results may be stated in a continuous-time setting where the customers arrival process (AP) is treated as a marked point process [3, 24, 67, 121].For example, the input of the M/GI queue is described by a Poisson AP with IID marks (service times), and the non-overload condition, as in the discrete-time setting, reads .< liEs.In the continuous-time M/M/1 queue, where the input is described by a Poisson AP of rate , with IID exponential STs of rate #, the queue is completely determined by the random process n(t) giving the size of the queue (i.e., the number of customers in the system including the one in service) at time t.This is a Markov birth-death process, with values 0, 1,... and the jump ij rate ai, j where ai, + 1 ' ai, 1 #' and ai, j-0 otherwise.Under the non-overload condition .< # this process is positive recurrent and its invariant distribution geometric: l\l\rn All practically interesting parameters of the M/M/1 queue (the probability distributions of the WT, the duration of the idle and busy periods, etc.) are "calculable" in terms of this process.For example, the stationary probability that the WT is less than y is given by T(y) 1  exp[(1 )y], y k 0, (1.7) =0, y<0.
Another remarkable fact about the M/M/1 queue is that the exit, or departure process (DP) formed by the times when the customers complete their service and leave the server is again Poisson, of the same rate .This fact is known as Burke's Theorem, see [26,27].
In a general G/G/1 case, a convenient way to describe a continuous-time queue is to use the virtual waiting time process.See, e.g., [32,69].Sometimes it is necessary to consider a "tagged" (or a "reference") customers, who is "put" in (or "extracted" from) the queue, and follow his fate; a convenient way to do so is to use Palm distributions.Having specified a "tagged" customer (usually the one who arrived at a specified time) one considers his WT, EED and other para- meters of interest.The theory of Palm distributions, in the form convenient for the purposes of this article, may be found in [3, 24, 67] and [121].
2. Jackson's Networks" The Origins of QNT The first significant rigorous results in QNT are attributed to Jackson [71,72] who proposed an elegant model of a "j obshop" -like network and discovered a number of its remarkable properties.
Jackson's network has a number of "nodes" 1,...,K with servers S1,...,S K (sometimes we will not distinguish between nodes and servers and label servers by 1,..

.,K).
There is given a family of exogenous independent Poisson APs I"'"K describing the supply of customers (or "tasks") 2 to the network nodes from outside, at rates I,...,K, respectively ('i >-0, 1,...,K).The network functions as follows: when a task arrives in node j (either from outside or from another node of the network), it joins the queue to server S j.We assume that all queues in the network are processed on basis of the single-server FCFS discipline, with respect to the times when the tasks join the queue (unless specified otherwise, such an assumption will hold throughout the paper).All STs are independent of each other, and the ST by server S j is distributed exponentially, of rate #j.After being processed by S j, the task instantaneously K arrives at node k with probability rj, k or leaves the network with probability r-1-rj, k; k=l 2Depending on possible applications, one uses also the terms messages programs "calls" clients etc.In this paper all such terms are in principle exchangeable, but we try to keep wit an underlying "real life" situation.K the values of the routing probabilities rj, k form a sub-stochastic matrix ( rj, k <_ 1, j-1,..., k=l K).See A special version of the model arises when ,j rj 0, 1 <_ j <_ K (i.e., there is no arrival or departure of customers).Such a network is called closed: it has a constant number of tasks circulating along.When max,j > 0 and max rj > 0, the network is called open.
It is not hard to check that the random (vector-valued) variables q(t)-(q(1)(t),...,q(K)(t)) giving the size of the queues at servers Ol,...,S K at time t _> 0 form a Markov process which describes the evolution of the state of the network.For the case of an open network, Jackson found the non-overload condition that guarantees the positive recurrence of process q(t)" pj< #j, I <_j <_K, (2.1) where the vector p--(,Ol,...,pK) is related to --(A1,...,AK) and matrix II-(rj, k) by the balance equation -+n.
(.) Hence p A(I-II)-1 E oe s 0 IIs" Vector p describes the rate at which the tasks will join the queues for servers S1,...,S K in the stationary regime, and condition (2.1) means that all servers are non-overloaded.If the inequality sign in (2.1) is changed from < to > for at least one j, the corresponding process q(J)(t) grows to infinity as tc.In the case of the closed network, process q(t) is always positive recurrent.
The remarkable fact discovered by Jackson is that under condition (2.1) the invariant distribution P of process q(t) is decomposed into the product of geometric distributions" p -1-I N --1 ml,..., rnK --> 0.
In other words, in the stationary regime the queue sizes measured at the same time are indepen- dent of each other and distributed exactly as in the case of the isolated servers with the Poisson APs of the rates pj and IID exponential STs of rates #j, 1 _< j _< K.
By using this fact, it is possible to show that the DPs (i.e., the processes of tasks leaving the * (the total DP has rate network) in the nodes 1,...,K are independent Poisson of rates pjpj jK= 1,j equal to the rate of the total exogenous AP).This is amazing because in general, the process of the tasks joining the given queue is not Poisson, and these processes are not indepen- dent for different servers.These processes are superpositions of exogenous APs and exit processes from the output ports of various servers.The exact distribution of the process forming the queue to a given server in a general Jackson network is not known.
Another open question is related to the stationary distributions of the overall WT W and EED T of a task in a Jackson network.If a task entered the network at node Jl and followed the route Jl, J2,'" Jd before leaving the network then W--w (il)+w(j2)+...+w (jd) T-Wzr-s (jl)-t-s(j2)+ -t-s (jd)   Here w (jk) and s (jk) are the WT and ST of the task at node jk, k 1,...,d.(That is, W (jk) is the duration between the time the task joins the queue for Sx and the time it starts being served.)Such a task may be specified by using the Palm dstnbuton.The main difficulty here is that the w(Jk)'s are measured at the times when the task joins the (generally speaking, dependent) queues along its routs: this creates an intricate correlation between the w(k)'s as well as between the w(Jk)'s and s(Jk)'s.
Formulas similar to (2.1)-( 2.3) hold also for closed Jackson networks.Here, the balance equation (2.2) takes the form p pH.That is, the load vector p is proportional to the invariant distribution of the (stochastic) matrix H. Formula (2.3) is replaced by ( ) Here, 1/Z N is the normalizing factor, with K K >_ 0: j- 11 +...+l K N Dobrushin found it very illuminating that formulas (2.3), (2.4) and (2.5) are in the similar relation as the grand canonical and canonical ensembles in statistical mechanics; see [68].
3. Dobrushin's Prograxnme Dobrushin was perhaps one of the few who immediately realized the importance of Jackson's papers.However, it was clear to him that Jackson's model is a kind of exception where, due to special assumptions one was lucky to find an exact solution.On the one hand, Dobrushin thought that it is important to understand the width of the class of networks possessing properties similar to Jackson's.On the other hand, he considered even more important studying a wider class of networks that display such properties not exactly, but asymptotically.He proposed several deep problems that had strong impact on the works in QNT done at IPIT and around during the last two decades (and continue influencing the research in the field up to present).Namely, assume that the conditions describing the class of Jackson's networks are weakened or modified in some direction.E.g., the APs are not Poisson and/or independent, the STs are not exponential and/or independent, the customers' routes from one node to another are not Markovian, etc.
(QI) Under what conditions do the inequalities similar to (2.1) and based on the correspond- ing balance equations still guarantee the existence (and possible uniqueness) of a stationary regime in a network?(QII) Under what conditions is the stationary distribution described (exactly or approximate- ly) by a product, similarly to formula (2.3)?
(QIII) Under what conditions do the departure processes from a network have the same form as the arrival ones (or can be described in any "traditional" form)?
Of course, stating such questions was absolutely natural after reading Jackson's papers, and surely many who read these papers had them in mind.However, Dobrushin proposed and, with his characteristic enthusiasm, actively propagandized several ideas of approaching these problems.Astonishingly, his ideas, even when they did not lead to an immediate success, turned out to be very useful for other purposes, sometimes related, and sometimes not, to QNT.In this paper we discuss, with various degrees of detail, results obtained in a number of papers that bear his influence or are related to the questions he posed.His own list of the published papers in QNT is not excessively long and consists of [47, 49,50,54,58] and [131].It has to be said that some of the answers actually disprove his previous conjectures: corresponding examples may be found in his own papers as well as papers by other authors (on which we comment in some detail below).However, in many cases the results confirmed his astonishing intuition, and in our view the whole related area develops along with what can be called Dobrushin's approach to QNT.We refer the reader specifically to the reviews [47, 49] and [86] written on the basis of Dobrushin's approach.Dobrushin's activity in QNT was not at all limited to the three above-mentioned directions.In particular, during the last years, he was occupied with the problems of large deviations.IN QNT, he conjectured a specific "bottleneck" phenomenon which we illustrate on the example of the overall WT W. Considers an example of the so-called tandem network pictured in Figure 3.1.
Here, the customers arrive first at server $1; after being processed by S1, they immediately proceed to the input port of $2, etc., and part from the network after service is completed by the server S K.In the previous notation, a Jackson tandem network is specified by setting "1 --"' "2 K 0, 7r j, j + 1 1, 1 _< j < K, and PK 1.
As before, W-2 = lw(J) where w(j)is the WT for server Sj.The bottleneck phenomen- on means that the logarithmic asymptotic, as xoc, of the probability that W > x coincides with that of the probability that w* > x, where w* is the WT for the "slowest" server S i. taken in iso- lation.In other words, the asymptotical behavior of the probability of a large delay in the whole network is determined by the slowest server.(For the Jackson network, where the STs are indep- endent and the ST at node j is exponential of rate #j, the slowest sever is simply the one with the slowest service rate: #i* min[#j: 1 _< j _< K].In [50] such a result is proved for the case where K-2 (two-server tandem network).How- ever, the assumption of the STs is weaker than Jackson's.The input of such a network is describ- ed by a Poisson AP of rate'l"--t with IID marks that are two-dimensional vectors with nonnegative (s.)'s(2))gives the STs of the hUh customer at nodes 1 and 2, respective- components; vector s n n ly.It is assumed that (I) "sand ) are independent and their marginal distributions have ex- ponential moments (but are not necessary exponential)" (J)(0)-Eexp[Os(nj)] < cx, 0 <_ 0 < 0(o j) <_ o, j 1,2 (3.1) (for definiteness we refer in Theorem 3.1 to the maximal values of 0(o j) for which (3.1) still holds).
The non-overload condition reads ,<min [ 1 1 1 (3.2) Es()' Es( As before, to define the WTs (and the EED) of a tagged customer, one can use the Palm dis- tribution where a customer arrives with probability one at a fixed time, say t-0.We assume that the tagged customer has zero STs; this means that after completing service at $1 he immed- iately loins the queue to S 2. Under such a Palm distribution the WTs wO),w (2) and W w 11) + w (2) of the tagged customer become correctly defined RVs; it is this distribution that we refer to in the Theorem 3.1.
Theorem 3.1" [50] For any > 0 lim In P(W > gy) min [/1,/2]" Here /j is the positive solution of the equation 0 I[(J)(0)-1], if such a solution exists, and j-0(o j) if it does not.See Figure 3.2. (1)and s (2) are less restrictive than in Jackson's As noticed, the assumptions about STs s n n networks (the case of tandem Jackson's networks is discussed, e.g., in [119]).For alternative approaches to large deviation problems in QNT, see, e.g., [61,62] and the literature therein.Dobrushin's last published paper in QNT, [131], was another striking example of his approach to the field.Dobrushin's participation was marked, as usual, with an irrepeatable freshness of views and the determination to establish the result in an ultimate "non-improvable" form.
Consider a network with single-servers S1,...,Sg and a common Poisson AP of rate tK, > 0 being a fixed constant.At the time of arrival, a customer chooses randomly a pair of servers (,i, ocj), 1 <_ i, j <_ K, and then joins the queue for the one having the smaller queue size.This rule introduces elements of a local control in the network; see  The STs are supposed to be IID exponential, of rate Iz.As in the case of Jackson's model, the vector-valued process q(t)-(q(1)(t),...,q(K)(t)) giving the queue sizes at time t>_ 0 is positive recurrent Markov, and we denote by E the expectation under its invariant distribution.
-\t] m>_l. (3.4) For comparison, consider the case where the customer simply chooses a server among $1,..., S K at random.It is easy to check that such a network is equivalent to the collection of K independent M/M/1 queues.In this case, denoting again by E the expectation under the invariant distribution of the corresponding Markov process q(t), t _> 0, we have that for any K>I 1EN(>m) (,k) rn rn>l, (3.5) (1.6).We see that even a limited control of the queues in the model under consideration essential- ly "improves" the typical queue size in the network.

The Capacity Region of a Message-Switched Network
In Sections 4-6, we focus on the above question (QI)-(QIII), correspondingly, and on some related results and open problems.In this section we will mainly deal with the so-called message- switched (MS) networks; this class includes Jackson's networks as a particular example.An open (MS) network is described as the one where each task (message or program) is to be subsequently processed by the servers in the nodes of its route; after completing service at the end of the route it leaves the network.The route is understood to be a finite sequence of nodes (or servers), in general, with repetitions.Given a task ith route S (S.(1),...,S/()), one assigns to it a ran- dora vector s of the STs s (j(1)v s (j()) with a joint probability distribution Ps(ds(J(1)) x...x ds(J(d))).Here, d is the length of route S. The exogenous AP of tasks with route S is supposed to be Poisson of rate As; these processes for different routes are supposed to be independent.
However, as noticed in Section 2, the dependence emerges through the fact that different streams of tasks are "mixed" on the input ports of the servers and processed by them, after which they are again separated and mixed in new combinations.Observe that Jackson's model emerges when (A) the rate "s is decomposed into the product /j(1)Trj(1),j(2)" "Trj(r--1),j(r)7r*j(r)' and (B) the joint distribution Ps(ds(J(1))x...xds(J(r))) is decomposed into the product p(Sj(k)) j(k) (Sj) rk 1 (ds )), where P is exponential, of rate #j, 1 _< j _< K.Here vectors , and it and matrix II are as in Section 2. Observe that distribution p(Sj) does not depend on the posi- tion of the node along the route (i.e., each time the task is serviced in S j, its ST is distributed according to p(Sj)).
A natural form of the non-overload condition in a general MS network, based on the balance equations, could be written as E E ES s(J(k)) < 1, 1 <_i<_ K; (4.1) for Jackson's network this coincides with (2.1).Here, E s denote the expectation under Ps" The question of whether the inequalities (4.1) describe a "natural" sufficient condition for the existence of a stationary regime in (or, as Dobrushin used to say, the capacity region of) an MS network turned out to be non-trivial.In one form or another, it gave a strong impetus for the develop- ment of the whole QNT.Dobrushin immediately noted the important progress achieved in papers [9], and especially [89] (see also the book [90]; for the recent exposition of the relevant material, see [135]).The network classes proposed in these papers extends Jackson's networks in the sense that condition (A) is no longer assumed to hold, whereas (B) stays or is slightly modi- fied.In other words, these classes of networks are specified by assumptions about the decom- position of the joint ST distributions Ps(ds(J(1) x... x ds (j(r))) into the product of marginal distri- butions that are exponential or "connected to exponential" and determined by the nodes, but not by the positions of the nodes on the route.We shall use the term Kelly's networks for these net- works.
In particular, it was shown that for Kelly's networks condition (4.1) guarantees the positive re- currence of a Markov process describing the evolution of the (suitably defined) state of the net- work.Furthermore, when one reverses the sign in at least one inequality in (4.1), the Markov process blows up to infinity.Finally, under condition (4.1), the invariant distribution of this process may again be written in a product-form of.(2.3).Following Dobrushin's program, papers [114,116] dealt with several models of MS networks of Kelly's type where the STs are represented as sums of independent exponentially distributed RVs.It was proved that for these models condition (4.1) still describes the network capacity domain.Later on, papers [15,64] and [65] preserved the above Jackson-type condition (a) about the routes but considerably weakened condition (B).They assumed that the joint distribution Ps is still decomposed into the product of marginal distributions p(Sj) but the p(sj)'s are not supposed to be exponential.Yet as before, an important condition was that the marginal p(Sj) depends on the node only and not on the position of this node on the route S. Again, inequalities (4.1) remained sufficient for the existence of the stationary regime.See also [5, 6, 7] and [66].
Dobrushin's conjecture from the very beginning was that (4.1) indeed gives a general sufficient condition for the existence of a stationary regime in an MS network.(He liked to call it the folks or freshmen's conjecture.)Such an impression was apparently confirmed by the above results.However, the paper [117] gave simple examples of MS networks where the service is provided under a discipline with priorities and the queue sizes grow with time to infinity, although (4.1) was fulfilled.The method used in [117] was based on the so-called fluid (or liquid) approximation that was extensively used by Dobrushin (in a slightly different form) in [12, 51,  52, 53] and [55] in connection with statistical mechanics and the theory of Markov processes with local interactions (see books and reviews [47,56,101,122]).
Closed examples were independently considered at the same time in [104] (see also [95] and the references therein).The public, however, was eager to see an example of a priority-free MS network, with a true FCFS discipline.Such examples were constructed by Bramson [17,18].
Due to the importance of Bramson's results, we discuss them here in some detail.The first example [17] is a network of two servers S 1 and S 2 (briefly, 1 and 2).Server 1 is fed with a Poisson AP of rate one; all tasks move along the route 1-+2-+2-+...-+2-+ 1 (4.2) and leave the network afterwards.See  Bramson's

example One
The multiplicity of node 2 in (4.2) equals or; all STs are independent and exponential.However, the mean ST depends on the position of the task on the route.More precisely, denote the task position by (i; j) where i-1,2 stands for the server and j-1,2 for i-1 and j-1,...,d for 2).In other words, j shows which successful time the task is currently visiting node i.
Theorem 4.1: [17] In the network under consideration, with probability one the total number of tasks q(1)(t)+ q(2)(t) in nodes 1 and 2 grows to infinity as t--+oo.
The second example [18] is a network of It' servers (or nodes) 1,...,It" and with two routes (4.4) which are referred to as S 1 and S 2 (or, briefly, 1 and 2 ), respectively.i...i consists of subsequent visits to node i. See Figure 4.2.
The above results stimulated a rapidly growing number of papers where, on the one hand, various conditions are discussed that strengthen the original condition (4.1), and on the other hand, various MS network classes are considered where (4.1) is still sufficient for the existence of a stationary regime.A large part of this activity is concentrated on the fluid approximation method of which Dobrushin was very enthusiastic and the possibilities of which seem far from being exhausted.See [2, 19-22, 29, 30, 31, 33-37, 60, 96-100, 109, 124, 133, 134].
Concluding the discussion of this subject, we mention here results from [15] where, under condition (4.1), the question of equivalence of open and closed networks was discussed (within the class of networks introduced in [15]).
Another direction related to question (QI) where Dobrushin was active is the question of uni- queness and non-uniqueness of the stationary distribution for various network classes.The unique- ness problem had strong connections with Dobrushin's ideas and results in statistical mechanics, in particular, in the theory of phase transitions.See [68].He was also motivated by the reports that some networks manifested specific instability phenomena when the statistical properties of a state may change depending on the initial condition.In his view, the uniqueness problem should have been considered for large or even infinite networks.He was an active propagandist of such approach (see [47, 49]).Furthermore, understanding the difficulty of the problem, he supported the papers adopting the view, even when they were not rigorous (see e.g., [107,108]).In this con- nection we quote the papers [54, 77, 80-87, 116, 126, 127] dealing with infinite networks which were written with his participation or under his influence.Close ideas were used in [10,70] and [105].These papers were mainly addressing the situation where an infinite network possesses a unique stationary distribution.An interesting example of non-uniqueness was discovered in [79].
Here, a Jackson network was considered, with countably many servers S1,$2, As in the case of a finite network, one introduces the (infinite-dimensional) Markov process q(t)= (q(1)(t),q(2)(t),...) describing the queue sizes for the servers.It turns out that the product- formula similar to (2.3) still determines an invariant distribution of process q(t), where p is a solution to (2.2), and the condition (2.1) holds (the vectors , , # and p and the matrix II are now of course infinite-dimensional).However, the solution of balance equation (2.2) may now be not unique; correspondingly, the invariant distribution of form (2.3) is not unique.An interesting open question is whether there are invariant distributions that do not possess property (2.3) (any reversible invariant distribution has form (2.3) as follows from Dobrushin's theorem; see [45]).

Dobrushin's Mean-Field Conjecture
Attempts to find models of networks with the product-form (2.3), or alike, of the invariant distribution, continued through the 60's and 70's; Dobrushin was particularly impressed by [89,90].On the other hand, as said in Section 3, Dobrushin kept a certain skepticism about availa- bility of exact formulas for wide network classes: it was partly due to his general reservations about "solvable models".However, he realized from the beginning that the product-formula and similar representations have deep mathematical consequences, let alone their great practical use.Dobrushin's idea was to consider (2.3) (and its possible extensions) as an asymptotical property that emerges in the course of a specific limit.Speaking of extensions of formula (2.3), we have in mind in this section a) the independence of the processes that generate the queues in the network, and b) the Poissonian form of these processes.Dobrushin's approach may be illustrated on the example of the so-called star-shaped MS network (it was the first rigorous example of the aforementioned limiting procedure; see [57]).In this example the network consists of a large number K of peripheral nodes j, 1 <_ j _< K, and a center O connected with them by the pairs of lines j--+O and O+j.See At the input port of each line there is a single-server that processes (transmits) messages in the corresponding direction.There are K 2 possible routes Sj/:j+O--+k, 1 <_ j, k <_ K; these routes are fed with IID exogenous APs {j/ each of which is Poisson of rate ,/K where , > 0 is a fixed number.So, each message is to be transmitted twice, once along line j--+O and then along O--+k.
It is assumed that the ST (or the length) of a message has the exponential distribution of rate # and is not changed in the course of transmission.In terms of the ST vectors s-(s (1), s (2)) used for a general MS network description (see Section 4), it means that these vectors have equal com- ponents: s (1) s (2) s.Such an assumption contrasts with Jackson's (where the STs vary inde- pendently from server to server) and fits many examples of communication networks where the content of the messages is not to be changed in the course of its transmission.
So, each server j-+O, 1 <_ j <_ K, has to deal with the AP j that is the superposition of the j/'s, 1.<_/_<K.Each server Ok, l<_k<_K, has to deal with a process rk that is a superposition of the portions of the exiting streams from the servers j-+O, 1 <_ j <_ K, which consist of the messages whose destination point is k.The analog of non-overload condition (4.1) takes now a simple form < #, and it guarantees the existence (and uniqueness) of the stationary distribution.Furthermore, the queue for any server j-+O is simply M/M/l, and therefore the stationary distribution of the WT for server j-+O may be found from (1.7).
However, the exact form of the stationary distribution for the whole network is not available: the processes r]k generating the queues for the servers O--+k (considered as the processes of the times the messages join these queues, together with their lengths) are again quite complicated (in particular, they are neither Poisson nor with IID marks, although have the same intensity A) and depend on each other (and on the exogenous APs) in an intricate way.For example, the EED of a message generated at node j with address at node k in the network under consideration is given by the sum T---w (1) -w (2) --28, (5.1)where w (1) and w (2) are message's WTs for server j---+O and O---,k, respectively, and s is its length.The random variable w (1) has distribution T given by (1.7), but as w (2) is correlated with w (1) and s, the distribution of the whole sum in (5.1) cannot be explicitly calculated.
However, if we let K---cx, the limiting picture is simple: the queue for any given server approaches M/M/I, and becomes independent of other queues in the network.The exact state- ment is as follows.
Theorem 5.1: [58] Under the condition A < #, as N---,c, 1) For any k 1,2,..., the process rlk converges to the Poisson process of rate , with IID exponential marks (lengths) of rate #.For any finite sets {Jl,...,Jp} and {kl,...,kq} the processes Jl' 1-1,...,p, become asymptotically independent of each other and of processes ]km, rn-1,...,q. 2) The (Palm) distribution of the EED T of a tagged message converges to the convolution T,T,E 2 where T is determined by (1.7) and E 2 is the exponential distribution of rate The assumption that the messages lengths are exponential is not essential: in a general case independent M/GI/1 queues will appear instead of M/M/1 ones.Furthermore, one can consider a general joint distribution of the components s (1) and s (2) of vector s of the STs: it will lead to a straightforward modification of the above result.From the very beginning Dobrushin realized that this result is a part of a general approach based on the assumption that the network has a "rich" branched structure.Pictorially speaking, in such a network any two messages generated in different sources have little chance to influence upon each other.A natural conjecture then arises that the processes forming the queues to the servers in a "nice" branched network will be close to Poisson, and their intensities could be found from the corresponding balance equations.[A reservation about the nicety is necessary here in view of the Bramson examples.]Furthermore, for different servers these processes will be independent.Thus, the total WT W of a message given by the sum K= lW(J(k)) of the WTs w (j(k)) to the servers Sj(k) along its route S- (Sj(1),.. Sj()) will be distributed approximately as the convolution of the distributions of the WTs in Poisson M/GI/1-queues.On the other hand, the distribution P of the total ST g i s/(k) of a message with route S may be found directly from the joint distribution Ps: Pg(')-fPs(ds(J(1))x'"ds(J(r)))l (s(J(k)) ") Therefore, the distribution of the EEd of a message which, as before, is represented as W + 7, in a nice branched MS network is close to the convolution of two directly calculated distributions.
Dobrushin called this conjecture Poissonian (later on, some authors started using the term the Poisson-independence conjecture).We think that there are all reasons to call it Dobrushin's con- jecture.In a heuristic form, a similar statement had been known among applied mathematicians and engineers for quite a long time [94]; it is in fact a popular tool for calculating various network parameters.Dobrushin's contribution was that he outlined exact limits of its applicability.
Under certain conditions on an MS network, this conjecture has been verified in [25,78] and [88].
Dobrushin also suggested [49] that in a modified form a similar conjecture should hold for other classes of queueing networks.This was confirmed by the results of papers [8, 11, 73, 123,  125].(For loss networks, similar results are discussed in [91]; see also the literature therein.)We discuss here in some detail the result of the paper [73] which also deals with the above problem of uniqueness of a stationary distribution.
Consider the network with K 2 servers, S1,...,S K and S(1),..., S(K), and with K 2 exogenous APs i,j' 1 <_ i, j <_ K, which are again assumed to be independent Poisson, of rate )/K.We assume that the points of i,j represent the arrival of "programs" that have to be executed in parallel by the pair of "computers" (Si;,(J)).See Figure 5.2.$1 0 0 0 0 0 0 0 0 0 0 S O) S (<)   Figure 5.2.The network with parallel execution and synchronization on arrival However, the execution of a given program cannot begin until both computers S and S (j) com- plete the execution of all programs that had arrived earlier (in any of processes i,j' or i',j' res- pectively) and have to be executed by S (in parallel with any of the S(J')'s) or by S (j) (in parallel with any of the S i,'s ).In other words, each program from i,j has two "predecessors", one in computer Si, another in S(j), and its execution begins only when they are both completed.
The execution times of the programs (by the corresponding pairs of computers) are liD exponential, with rate #.
The model described is an example of a network with a synchronization constraint; see [4].
The execution of the program scheduled for pair (,i,S (j)) requires the results of computation of all previously generated programs that were executed by S or S(J); this is called in applications, the synchronization on the times of arrival.
Pictorially, Dobrushin's conjecture for the model under consideration takes the following form: As the "size" of the network Kee, (i) for each program the distribution of IATs between its two predecessors' and its o.wn arrival becomes exponential, with rate , and (ii) all these IATs become independent of each other (as for the same program as for different programs).In addi- tion, as agreed, the STs of the programs are IID exponential with rate # and independent of the IATs.
In other words, the Dobrushin conjecture leads to a natural limiting tree-like structure: a given program has a pair of predecessors, each of which has its own pair of predecessors, etc. See  Such a limiting scheme reminds one of the classical M/M/1 queue; the only proviso that the number of predecessors in the classical queue equals one: this is the customer who came imme- diately before you.Recall that the non-overload condition in the M/M/1 queue is ,/# < 1, and under this condition the stationary regime is unique and determined by the geometric distribution (1.6) of the queue size.In the case of two predecessors the picture is different.Not surprisingly, the non-overload condition is more restrictive: _A < 3-V < 1 (5.2) because the queueing process in the system with two predecessors has more chance to "blow up".However, the less evident fact is that the stationary regime in the such a system is not unique.
More precisely, there is a continuum of distinct stationary distributions of the WT in the limiting system, and these distributions may be labeled by the non-negative numbers g > 0 so that if g >_ ' then the corresponding stationary WT distributions P() and P(n') are stochastically ordered" / P()(d )l( > y) > J P(')(dff )l( > y), >o.
Formally, the result of [73] is contained in the assertion of Theorem 5.2 and 5.3 below.As in previous theorems, consider the Palm distribution given that at time to>_ 0 a "tagged" program has been generated in the process 1,1 (the choice of 1,1 is purely arbitrary) Then the distribution of the WT of the tagged program (i.e., the duration between the time t o of its arrival and the beginning of its execution by the pair (Si, s(J))) becomes a correctly defined RV; in a stationary regime the distribution of does not depend on t.Theorem 5.2: [73] Under condition (5.2), (i) for any k there exists a unique stationary distribution in the above network, (ii) the distribution of the WT of the tagged program converges, as K---cx, to the limit.The limiting distribution, p(O), gives a solution to a "second- order" stochastic Lindley equation _ max[' + s' r', " + s" -"].
(5.4)Here (iii) R Vs v on the left-hand side and fly' and /' on the right-hand side have the same proba- bility distribution (which is the unknown of equation (5.4)), (iv) s,s'," and " are exponentially distributed, s and s' with rate # and r and 7" with rate , and (v) all R Vs in the right-hand side of (5.4) are independent.
Like the standard "first-order" Lindley equation (1.1), equation (5.4) is understood in terms of the probability measures on [0, Theorem 5.3: [73] Under condition (5.2), the second-order Lindley equation (5.4) has a con- tinuum of solutions.The distribution p(O) gives the unique minimal solution to (5.4) in the sense of the stochastic ordering.Furthermore, the set of the solutions to equation (5.4) may be para- metrized by >_ 0 in such a way that they are stochastically ordered as indicated in (5.3).

The Invariance Problem
The third direction in QNT where Dobrushin was conspicuously active was connected with the following form of question (QIII) from Section 2. Describe the distributions of random pro- cesses which are invariant under the "input/output" or "queue transformation" and find condi- tions guaranteeing convergence to these distributions under the iterations of such a transforma- tion.The initial result in this direction is Burke's Theorem, quoted in Section 2. In other words, where (i) the RVs Sn, j Z, are lid exponential of rate #, # > , independent of the tn'S and (ii) w is the WT of the nth customer determined by (1.3).As was noted in Section 3, similar pro- perties hold for Jackson's networks.
In principle, one can replace the exponential distribution of the Sn'S by any other one and drop the condition of independence.It is also possible to state the problem in a more general form by enlisting, in both the AP and DP, customers' STs.This means that one considers the transformation of the marked sample {tn, Sn}-+{t'n, S}.Here, t is again given by t' n t n + w n + s n and w n by (1.3), whereas s n has to be specified.For example, the condition s n s n spec- ifies the case where the STs at the output port of the server are equal to STs at the input port.
In the QNT context, we have here a tandem network (see Figure 3.1).More precisely, server S 1 is fed with a stationary ergodic AP (7n, n E Z) (r, is as before the IAT between the nth and (n + 1)st customer); the customers, after being served by S1, proceed to $2, then to $3, etc.
The question is: what happens to the DP r/K from server S K as K-+oo?According to Burke's Theorem, if is Poisson of rate , and the STs are IID exponential of rate #, A < #, then r/K is Poisson.
Later on [112], a similar result was established in the situation where servers Sj are replaced by a "local" Jackson network so that each customer, after spending a time in one of such networks, proceeds to the next one which is identical to preceding, then to the next one, etc.
The proofs given in [110,112] depend essentially on the fact that the STs are IID and expo- 3The density of g is defined as 1/Er n.
nential.If one relaxes the assumption of exponentiality, but preserves independence, the picture becomes much more complicated: here, even the form of an invariant AP (that is an analog of Burke's Theorem) is unknown.Under some conditions on the marginal ST distribution, it is poss- ible to prove (i) the existence of a one-parameter family of invariant AP's, the parameter being identified with the density, and (ii) the Cesaro convergence to distributions from this family under iterations of the queueing transformation.See [111,113].
A different situation was considered in paper [132] (the result of [132] was also predicted by Dobrushin long ago).Here, one assume that the STs are preserved from server to sever (the tele- graph rule).The exogenous AP corresponds to the G/G/1 queue (i.e., is a stationary ergodic marked point process).The standard non-overload condition ,Es < 1 is assumed where , is the intensity of the AP and Es is the expected ST.The DP j from server S j again includes the STs and is treated as a marked point process.
Essentially, there are three cases emerging, depending on the marginal distribution of the ST.

1)
The case where the marginal ST distribution has a compact support (i.e., the ST takes bounded values), and the longest ST g is taken with a positive probability.

2)
The marginal ST distribution has unbounded support.

3)
The marginal ST distribution has a compact support, but in a neighborhood of its right- most point there is no values taken with positive probability.(This is, e.g., the case where the marginal ST distribution is absolutely continuous near the right-most point of its support.) The most interesting are the two first cases.In case 1) the typical sample of the limiting DP is decomposed into isolated finite "groups".Each group begins with the messages of the longest ST (called the leader of the group).Inside the group, the messages are "densely packed": the time interval between the subsequent messages equals the ST of the second message.Pictorially, such a group is processed by a server without interruption, as a single long "message".Further- more, the distance in time between the groups and the order of the messages within a group are preserved from server to server: this guarantees the invariance of process under the queue trans- formation.The distribution of the sizes of the groups and of the STs of the messages in them, as well as the distribution of the gaps between the subsequent groups, is uniquely determined by the exogenous AP .
On the other hand, in case 3) one conjectures that with probability equal to Es in the limiting DP there will be a single infinite densely packed group whose statistics is determined by the distribution of the exogenous AP (pictorially, to obtain a sample of one has the to "squeeze"-and-"stretch" the sample of in time so that the interval between the appearance of the subsequent messages equals of the ST of the second of them).However, with the complemen- tary probability 1-IEs one will observe an "empty" sample in process ; in particular, this means that in case 3) the limiting DP is non-ergodic.In paper [132] such a picture was in part justified.However, the full proof of this conjecture has not yet been accomplished.

Figure
Figure 2.1.A Jackson network

Figure
Figure 3.2(1) and s (2) are less restrictive than in Jackson's

Figure
Figure 5.1 A star-shaped network

Figure 13
Figure 13.3.The limiting tree of predecessors the stationary Poisson process of rate , is invariant under the transformation of the sample {tn}-{t'n}.Here t n t n + w n + s n,