A Simulation Perspective: Error Analysis in the Distributed Simulation of Continuous Systems

To construct a corresponding distributed system from a continuous system, the most convenient way is to partition the system into parts according to its topology and deploy the parts directly on separate nodes. However, a system error is introduced during this process because the computing pattern changes from sequential to parallel. In this paper, the mathematical expression of the introduced error is studied. A theorem is proposed to prove that a distributed system preserving the stability property of the continuous system can be found if the system error is kept small enough. Then, the components of the system error are analyzed one by one and the complete expression is deduced, where the advancing step T in the distributed environment is one of the key associated factors. Finally, the general steps to determine the step T are given. The significance of this study lies in the fact that the maximum T can be calculated without exceeding the expected error threshold, and a larger T can reduce the simulation cost effectively without causing too much performance degradation compared to the original continuous system.


Introduction
Engineering problems involve dealing with physical systems. Since most physical laws are described by differential equations, simulation in engineering is in fact related to the numerical solution of differential equations. This is called continuous system simulation, and even its components (i.e., the subsystems or submodels) are generally described in a time-discretized manner [1].
Large engineering systems have a sustained requirement for distributed simulation for two reasons. First, the continuous growth in system scale and complexity results in intense demand for computing capacity, which can be mitigated in a distributed environment by scattering the computing load across networked nodes. Second, the system itself is sometimes geographically fragmented; thus, a distributed structure is needed to be consistent with its topology. Apart from those constructed in a distributed manner from scratch, there are still many classic continuous systems that were built in a nondistributed way, in that the system was designed and tested without considering the possible scenario of distributed simulation. The demand to transform such systems into distributed ones therefore emerges. The classic continuous system is characterized by computing its components in a pipeline: a computing sequence is set up and the components are computed one by one within a step; the sequence is determined by each component's input/output characteristics and the data dependence. By contrast, the computation of each component in the distributed environment starts simultaneously and advances in a parallel manner; the updated data is exchanged periodically through the network for synchronization.
The difference between the two computing patterns can be formalized as follows. Suppose a system S contains n components: S = {C_1, C_2, ..., C_n}. A single component can be represented as C_i = {u_i, y_i, f_i}, where u_i is the input, y_i is the output, and f_i is the model function; we have y_i = f_i(u_i). Assuming a computing sequence Q = {q_1, ..., q_i, ..., q_n} has been determined in the nondistributed environment, the pipeline computing within one step can be described as: for i = 1, ..., n in the order of Q, the input u_{q_i}(kh) is updated first, and then y_{q_i}(kh) = f_{q_i}(u_{q_i}(kh)) is computed, where k = 0, 1, 2, ..., h is the step size, and the subscript q_i refers to the index in Q. In the pipeline computing, the first component is assumed to get its input from a signal source, and the others get inputs following the data dependence among themselves.
By contrast, the one-step computing in the distributed environment can be described as y_i(kT) = f_i(u_i((k − 1)T)) for all components simultaneously, where T is the advancing step of the distributed system. It should be noted that the computing sequence is no longer held in this case.
There are two differences between these two computing processes. First, the computing is synchronous in the nondistributed environment. For each component, the input is first updated to time step k, and then the output at k is computed (the data dependence among components is assumed to be consistent with the computing sequence). On the other hand, the computation is parallel in the distributed environment and each component starts simultaneously. The input of a component will still hold the value of step (k − 1) when the output at k needs to be computed. Second, the advancing steps of the two environments, h and T, may be different. In the nondistributed environment, h is determined by the equation solver; either a fixed or a variable step is employed. However, the determination of T needs to consider the bandwidth of the underlying network, the signal frequency, human perception limitations, and so forth. It is often the case that the whole system advances with a fixed T and T ≥ h.
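As a hedged illustration of the two patterns, the following sketch contrasts one pipeline step with one parallel step; the component functions and the chain-shaped data dependence are hypothetical stand-ins, not the paper's models:

```python
# Illustrative sketch only: the component functions and the chain-shaped
# data dependence are hypothetical assumptions for demonstration.

def pipeline_step(components, source):
    """Nondistributed pattern: inputs are updated to the current step
    before each output is computed, following the sequence Q."""
    outputs = []
    u = source                      # first component reads the signal source
    for f in components:
        y = f(u)                    # input already holds the current value
        outputs.append(y)
        u = y                       # downstream component sees fresh data
    return outputs

def parallel_step(components, source, prev_outputs):
    """Distributed pattern: all components start simultaneously, so each
    input still holds the predecessor's output from step k - 1."""
    inputs = [source] + prev_outputs[:-1]
    return [f(u) for f, u in zip(components, inputs)]
```

With `components = [lambda u: u + 1, lambda u: 2 * u]` and source 1, the pipeline step yields [2, 4], while the parallel step starting from held outputs [0, 0] yields [2, 0]: the second component still sees stale data, which is exactly the one-step delay discussed above.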
A formal approach capable of partitioning a system into parallel parts and properly handling the above two issues is the key to building a distributed simulation from a nondistributed one. This problem, denoted as the Partitioning Problem, was studied early in the simulation of complex mechanical systems [2,3] and then became a research focus in the field of discrete event simulation [2,4,5] and in decentralized, large-scale control systems [6,7]. However, the proposed approaches all involved extra effort to reformulate or reengineer the simulated system and did not consider the possible difference between h and T either. These drawbacks set up barriers to applying these approaches broadly.
In [4], a novel partitioning approach was proposed using the system's topology. This approach is easy to perform since most classic simulations have block-diagram structures, but it is "lossy" because the disorder of each component's input/output data cannot be avoided, leading to an error in the state trajectory of the resulting distributed system. To reduce the error, a partitioning rule was proposed to eliminate the possible cumulative delays caused by improper partitioning, which contribute heavily to the overall error.
This paper studies the mathematical expression of this error. The correlation between the error and the advancing step T in the distributed environment is revealed. Then, a maximum T can be calculated according to the error threshold specified by the system engineer. The significance of finding the maximum T lies in the fact that a larger T improves the parallelism of the partitioned system by reducing the synchronization frequency between nodes, without causing too much performance degradation at the same time.

Literature Review
The Partitioning Problem was first studied in the coupled problems [2,5] of complex mechanical system simulation, where the fluid, thermal, control, and structural subsystems interact with each other and form a multi-physical-field system [8]. Partitioning is applied to decompose such a system into partitions based on physical or computational considerations. The resulting system can advance in time separately over each partition, thus gaining high simulation efficiency. However, these studies focused on finite element analysis, not on general-purpose simulation.
The Partitioning Problem is also critical to building decentralized, large-scale control systems [6,7]. A graph-based approach was employed in which the system states were connected as vertices and the coupling strength (measured by a weight factor) between any two of them was evaluated. Small-weight connections were more likely to form the partitioning edges by which the system was split into relatively independent parts. Another method was to use delay differential equations [9,10] to describe the dynamics of the distributed system. A networked control scheme [11] was constructed to obtain convergence and stability in the control of the system. However, this approach focuses on the influence of the signal delay associated with network latency, rather than the errors caused by the change of computing pattern. In our opinion, such errors will still exist even if the underlying network is perfect and has no delay.
In the Modeling and Simulation (M&S) domain, DEVS (Discrete Event System Specification) theory casts light on the solving of the Partitioning Problem. The basic idea is to transform a continuous system into the DEVS form by quantizing the system states. A DEVS system is comprised of a set of connected components (called "atomic models" or "coupled models"). The output of each component holds unchanged until the values of the states it maintains exceed some predefined quantization level. As a result, the time-driven continuous system is transformed into an event-driven system; thus, it can be decoupled and partitioned easily [12][13][14]. The transformed system is suitable for asynchronous, distributed environments by nature; however, the "illegitimacy" phenomenon, where the states may transit an unlimited number of times within a limited period, often causes the simulation to fail to converge correctly. As an extension of DEVS, the QSS approach [15][16][17] was proposed to resolve the "illegitimacy" problem by using a special "hysteretic quantization" method, where the output of each component (or model, as it is called in a DEVS system) will not transit to the next quantization level during its descending period until the change has exceeded a certain threshold. QSS provides a general approach to the Partitioning Problem since each QSS model interacts with discrete state-updating events. However, the QSS model needs to be specially designed to allow states to transit between quantization levels. The changes of states are triggered by unpredictable threshold-crossing events in QSS. In other words, the advancing "step" of a QSS system is unpredictable and time-varying. This characteristic makes QSS unsuitable for applications such as Hardware-In-the-Loop (HIL) or Man-In-the-Loop (MIL) simulation, where the system has to advance with some fixed step T to align with the hardware's working frequency or to accommodate the needs of human perception.
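The hysteretic quantization idea can be sketched as follows; the quantum size, class name, and interface are illustrative assumptions, not the QSS formalism itself:

```python
# Minimal sketch of hysteretic quantization (illustrative, not the QSS
# formalism): the quantum of 1.0 and the interface are assumptions.

class HystereticQuantizer:
    def __init__(self, quantum=1.0):
        self.quantum = quantum
        self.level = 0.0            # last emitted quantized output

    def update(self, x):
        # The output only transits once x has moved a full quantum away
        # from the held level; small oscillations around a level boundary
        # are ignored, which avoids bursts of unlimited transitions.
        if abs(x - self.level) >= self.quantum:
            self.level = self.quantum * round(x / self.quantum)
        return self.level
```

For instance, after `update(1.2)` the held level is 1.0, and a subsequent `update(0.7)` keeps the output at 1.0 because the change has not yet exceeded one quantum, mirroring the hold-during-descent behavior described above.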
The approaches mentioned above all involve extra effort to reformulate or reengineer the simulated system. For example, in the graph-based approach, the detailed analysis of the coupling strength between system states requires the engineer to have full knowledge of the system dynamics. The QSS approach also needs the system components to be modified to follow the DEVS formulation. Additionally, the possible difference between h and T was not considered either. These drawbacks set up barriers to applying these approaches broadly.
By contrast, the partitioning approach proposed in [4] takes advantage of the system's structural characteristics; no extra work is needed except for decomposing the system according to the data-transfer routes. However, the incurred error needs to be further studied considering its possible correlation with T.
The content of this paper is organized as follows: in Section 3, the previous work is briefly reviewed. In Section 4, the mathematical expression of the errors is given with theorems. In Section 5, a series of steps is presented to determine the maximum T.

Problem Description
Normally, a continuous system can be represented by a block diagram, as Figure 1 shows [4]. This assembly approach based on basic components is commonly seen in the construction of complex systems.
The components within this system can be classified into two categories: the Direct-Feed-Through (DFT) component and the Non-Direct-Feed-Through (NDFT) component. The output of a DFT component is directly associated with its input; that is, the output at time t is determined by the input at the same time. On the other hand, the output of an NDFT component is determined only by its current state and has nothing to do with the current input. This implies that, when computing a DFT component, the components it depends on should be computed first to maintain the correct data dependence. The NDFT component has no such requirement since its current output does not rely on the current input.
With parallel computing in the distributed environment, further deductions can be drawn from the foregoing. First, if a part of the system deployed on a separate node forms an NDFT component (both DFT and NDFT components can have recursive structure), its output is delayed by T when updating to other nodes. Second, if two or more DFT components are cascaded, they should not be divided onto different nodes; otherwise, the output delay accumulates from the first DFT component. For example, if components (5), (7), and (9) in Figure 1 were deployed to different nodes, the output of component (9) would be delayed by 3T compared to the original system, component (7) by 2T, and component (5) by T.
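The delay accumulation can be sketched numerically; the identity components and the ramp signal below are hypothetical stand-ins for a cascaded DFT chain split across nodes:

```python
# Illustrative sketch: identity components and a ramp signal are
# hypothetical stand-ins for a cascaded DFT chain split across nodes.

def simulate_chain(signal, num_nodes):
    """Propagate a signal through num_nodes identity DFT components, each
    on its own node; every network hop delivers data one step (T) late,
    so the final output lags the source by num_nodes steps."""
    buffers = [0.0] * num_nodes      # one-step-old value held at each hop
    trace = []
    for s in signal:
        x = s
        new_buffers = []
        for held in buffers:
            new_buffers.append(x)    # value sent now arrives next step
            x = held                 # this node computes with stale input
        buffers = new_buffers
        trace.append(x)              # output of the last component
    return trace
```

For the signal [1, 2, 3, 4, 5] and three nodes, the trace is [0.0, 0.0, 0.0, 1, 2]: a three-step (3T) lag, matching the components (5), (7), (9) example above.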
These outcomes were observed in the experiments of [4], and a partitioning rule was proposed to reduce the accumulated delays. The rule is simple: distributing a DFT component should be avoided, and each partitioned part should be ensured to form an NDFT component. The system performance was improved by applying this rule, as shown in Figure 2.
Although the partitioning rule makes the distributed system closer to the original continuous system, there is still a one-step (T) delay between nodes. This delay cannot be eliminated completely because of the parallel nature of the distributed environment, leading to the system error (denoted as Δ). To understand the influence of T on Δ, the mathematical expression of Δ needs to be formalized.
To simplify the analysis of Δ, an abstract control system containing two components is employed here: one component is the Controller and the other is the Plant. This system is assumed to be asymptotically stable. Denoting the original continuous system as CS and the distributed system as DS, the controller and plant can be described as differential equations in CS [15]:

Controller (CS): ẋ_c(t) = f_c(x_c(t), u_c(t)), y_c(t) = g_c(x_c(t)),
Plant (CS): ẋ_p(t) = f_p(x_p(t), u_p(t)), y_p(t) = g_p(x_p(t)),

where x_c(t) and x_p(t) are the system states, y_c(t) and y_p(t) are the outputs, u_c(t) and u_p(t) are the inputs, f_c(·) and f_p(·) are the state functions, and g_c(·) and g_p(·) are the output functions. In DS, the controller and the plant are described as

Controller (DS): ẋ_c(t) = f_c(x_c(t), (y_p)_T(t)), y_c(t) = g_c(x_c(t)),
Plant (DS): ẋ_p(t) = f_p(x_p(t), (y_c)_T(t)), y_p(t) = g_p(x_p(t)),

where (·)_T is the discretization operator. This operator indicates that the operand updates its value to the external world following the advancing step T, just as if it were discretized.
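The discretization operator (·)_T can be read as a zero-order hold; the sketch below, with an assumed continuous signal, is one way to realize it:

```python
import math

# Sketch of the discretization operator (.)_T as a zero-order hold; the
# function name and the sampled signal are illustrative assumptions.

def discretize(y, t, T):
    """Value of (y)_T at time t: y is sampled at the latest multiple of T,
    so the externally visible value only updates every T seconds."""
    return y(math.floor(t / T) * T)
```

For y(t) = t, `discretize(y, 1.2, 0.5)` returns y(1.0) = 1.0: the value sampled at t = 1.0 is held until the next update at t = 1.5, which is the source of the discretization errors Δy_c and Δy_p.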
Perturbation analysis is used here to analyze the composition of Δ. Perturbation analysis treats Δ as the outcome of disturbances to CS, such as the output delay and the discretization. The difference between CS and DS can then be determined without formulating the differential equations of both systems and comparing their eigenvalues. First of all, the DS system is expressed in the form of perturbation equations, whose terms include: (iv) Δy_c = (y_c)_T − y_c, the error of the controller output caused by discretization; (v) Δy_p = (y_p)_T − y_p, the error of the plant output caused by discretization.
The system error Δ is represented as the combination of these perturbation terms. To find the detailed expression of Δ, a preparatory theorem is proposed as follows.

As a result, a region B_4 can always be found in the vicinity of the origin of V(·), where a positive real number ε defines the radius of B_4.
Another positive real number ε_1 (0 < ε_1 < ε) can be found to satisfy (14) over B_4. Considering inequality (11a) together with (9), and integrating both sides of (16) over [0, t], where x_0 ∈ B_3 is the start point of the state trajectory, inequality (17b) means the state trajectory in DS is strictly constrained by a diminishing function in B_3. Considering the fact that V̇(t) < 0, we have (18a) and (18b), where V_B1 is the value of V(x) where the state trajectory crosses the level surface that defines region B_1. Inequalities (18a) and (18b) mean that the state trajectory of DS will stay inside B_1. Considering V̇(t) < 0, we also have (19), where V_B2 is the value of V(x) where the state trajectory crosses the level surface that defines region B_2. Then, the solution trajectory of DS goes from the start point to B_2 within finite time. Inequality (20) means a DS can be found whose trajectory will eventually converge to the equilibrium point of CS within finite time, given that CS is asymptotically stable and the system error is constrained as (13) shows.

The Mathematical Expression of the System Error

The mathematical expression of Δ is studied by analyzing each of its components.
4.2.1. Δx_c. Δx_c is the variation of the controller's state value within the time interval T. According to the definition in (5a) and (5b), integrating both sides of (21b) bounds Δx_c for any T < T_max, and the overall error will then be less than Δ*. Equation (31) also indicates that the way of partitioning CS will influence T_max: the more parts are partitioned (supposing each part is deployed on a node), the greater the denominator of the right side of (31), and thus the smaller the allowable T_max.

Experiment
A distributed simulation is constructed in this section to show the determination of T. The original continuous system is a point-mass system with friction, as follows:

ẋ_1 = x_2, ẋ_2 = −(k/m) x_1 − (c/m) x_2,

where k is the spring constant, m is the mass, and c is the friction coefficient. This system is asymptotically stable; the trajectories of x_1 and x_2 are shown in Figure 4.
A distributed system can be partitioned from the original system, whose structure has the block-diagram representation shown in Figure 5.
The differential equations of the two parts are

Component I: ẋ_1 = y_2, y_1 = x_1,
Component II: ẋ_2 = −(k/m) y_1 − (c/m) x_2, y_2 = x_2,

where y_1 and y_2 are the outputs of components I and II, respectively. These two parts are deployed on two nodes, forming a simple distributed system. Equation (31) indicates that the parameters reflecting the system dynamics need to be determined first. From the engineering perspective, these parameters can be directly observed from the simulation results of the original system. In this case, the bound associated with component I is determined by the lower bound of ẋ_1, as Figure 6 shows.
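As a hedged sketch of this setup (the forward-Euler solver, the parameter values k = m = c = 1, and the synchronization scheme are illustrative assumptions, not the paper's exact configuration), the two components can be co-simulated with an internal step h and an exchange step T as follows:

```python
# Hedged sketch of the two-node experiment: Euler integration and the
# parameter values are assumptions, not the paper's exact configuration.

def simulate_ds(k, m, c, x1_0, x2_0, h, T, t_end):
    """Integrate components I and II with internal Euler step h while
    exchanging their outputs y_1, y_2 only every advancing step T."""
    x1, x2 = x1_0, x2_0
    y1, y2 = x1, x2                  # held outputs from the last exchange
    steps_per_T = max(1, round(T / h))
    traj = [(x1, x2)]
    n = 0
    while n * h < t_end:
        x1 += h * y2                              # component I: dx1/dt = y2
        x2 += h * (-(k / m) * y1 - (c / m) * x2)  # component II
        n += 1
        if n % steps_per_T == 0:                  # synchronize every T
            y1, y2 = x1, x2
        traj.append((x1, x2))
    return traj
```

Comparing the trajectory at a large T against a near-continuous reference (T = h) shows the error growing with T, while the partitioned system stays stable for small enough T, which is the behavior Figure 7 reports.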
The distributed system described in (33a) and (33b) is simulated. The trajectories of the system states are compared to those of the original continuous system, as Figure 7 shows. The threshold value of Δ is 0.5, and T_max is then estimated to be 0.1055 s according to (34). Three configurations, where T = 0.1 s, T = 0.15 s, and T = 0.2 s, are tested in the form of distributed simulation. As expected, only the case T = 0.1 s satisfies the condition that neither x_1 nor x_2 exceeds the threshold value of Δ.
However, the actual error does not exceed the threshold value immediately when T is configured to be greater than 0.1 s, as shown in Figures 7(c) and 7(d). The reason is that T_max is not a tight estimate, since many of the associated parameters take their boundary values.

Conclusion
The mathematical expression of Δ helps us gain insight into the system error produced in the construction of a distributed system using the partitioning approach. Given an expected threshold of Δ, a proper advancing step T of the distributed system can be determined. A larger T, compared to the integration step in the nondistributed system, reduces the data-exchange frequency between simulation nodes, leading to reduced demands on system timing performance and network bandwidth. The simulation cost is thus eventually saved. In fact, this approach can also play a role in multicore or multi-CPU parallel computing environments.
However, for systems that may become unstable after partitioning, this approach is less convenient, since a Lyapunov function of the continuous system needs to be found first. The parameters defining the specific regions satisfying (13)-(15) also need to be determined. More work needs to be done to improve this approach's applicability.

Figure 1:
Figure 1: An inverted pendulum system. The states are "cart position" and "pendulum angle." A computing sequence is maintained (indicated by red numbers) in the nondistributed environment.

Figure 2:
Figure 2: The comparison of the trajectories ("cart position"). (a) The distributed system became unstable when T = 2h without applying the partitioning rule. (b) By applying this rule, the distributed system was stable even when T = 100h.

Figure 6:
Figure 6: The plot of ẋ_1. The bound associated with component I is determined by the bound of ẋ_1 in this example.