Determining Bounds on Assumption Errors in Operational Analysis

The technique of operational analysis (OA) is used in the study of systems performance, mainly for estimating mean values of measures of interest such as the number of jobs at a device and response times. The basic principles of operational analysis allow errors in assumptions to be quantified over a time period. The assumptions used to derive the operational analysis relationships are studied. Using the Karush-Kuhn-Tucker (KKT) conditions, bounds on the error measures of these OA relationships are found. Examples of these bounds are given for representative performance measures to show limits on the difference between true performance values and those estimated by operational analysis relationships. A technique for finding tolerance limits on the bounds is demonstrated with a simulation example.


Introduction
The analysis of the performance of a network of devices is important in many areas. Computer systems and industrial manufacturing systems are two examples. The types of networks considered in this paper are operationally connected, queue and server devices. That is, each device is connected in some way with every other device in the network and each device may have a queue assigned to it. Certain information about these types of networks may be obtained using a technique known as operational analysis (OA). Relationships used to estimate performance measures (PMs) of networks may be derived in operational analysis under a few restrictive assumptions. OA is a technique which was originally defined as an aid in computer system performance analysis [1][2][3][4][5][6]. It can be an aid in the understanding of system performance in general [7] and is a complementary approach to the stochastic analysis used in many performance analyses of networks of servers and in computer programs [8][9][10][11][12][13][14]. Other used or suggested applications for the OA approach include telecommunications [15], E-commerce [16,17], flexible manufacturing systems [18], and Petri nets [19][20][21]. The performance measures derived include the average number of units at a device, average response time, and throughput. The behavior of a single, arbitrary device in a network will be considered.
Two basic principles define the OA approach [2].
(1) All assumptions that are made in analyzing the performance of a real system should be subject to direct verification.
(2) All variables that appear in any equation which characterizes the performance of a real system should be verifiable by direct measurement.
The validity of PM equations developed using these principles can be shown for a particular set of data because they are based on assumptions which can be directly tested by the observation of data produced by the system of interest over a finite period of time.
The most widely used assumption about the data is that of job flow balance, that is, the number of arrivals to a network (for global flow balance) or to a device (for local flow balance) must be equal to the number of departures from that network or device. Also assumed is one step behavior: only one unit may arrive or depart the network or device at a time. Arrivals and completions do not occur simultaneously. The OA approach assumes that devices must have homogeneous service. That is, the service time of a device in a network is independent of the queue length at any device. Homogeneous arrivals are the corresponding condition for the arrival times. Homogeneous routing holds when the routing frequencies of jobs leaving a device are independent of the queue lengths at other devices in the network. Device homogeneity exists when the rate of output from a device is determined only by its queue length. Other assumptions can be invoked as the need arises to derive OA relations [22]. The only requirement is that these assumptions meet the two basic principles of testability given above.
OA assumptions allow for the development of relationships which enable us to determine PMs by collecting only a few types of data, namely, the number of arrivals and departures for each device state and the total time spent in each state [23]. These PMs will be accurate only if the OA assumptions are met and only for the finite time period observed. The accuracy of the assumptions can be measured. A device state is the number of items (customers, jobs, entities, etc.) both waiting and in service at a device.
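The data collection described above can be sketched in a few lines of Python. This is only an illustrative sketch: the event format, the function name `tally_states`, and the sample behavior sequence are assumptions introduced here, not taken from the paper. It tallies arrivals, completions, and time per state, and checks job flow balance for the observed period.

```python
from collections import defaultdict

def tally_states(events, end_time, start_state=0):
    """Tally A(n), C(n), and T(n) from a time-ordered list of (time, kind) events.

    kind is "arrive" or "complete" (hypothetical labels). Events are handled one
    at a time, which mirrors the one step behavior assumption.
    """
    arrivals = defaultdict(int)     # A(n): arrivals that find n jobs at the device
    completions = defaultdict(int)  # C(n): completions that occur in state n
    time_in = defaultdict(float)    # T(n): total time spent in state n
    n, clock = start_state, 0.0
    for t, kind in events:
        if t < clock:
            raise ValueError("events must be in time order")
        time_in[n] += t - clock
        clock = t
        if kind == "arrive":
            arrivals[n] += 1
            n += 1
        elif kind == "complete":
            if n == 0:
                raise ValueError("completion at an empty device")
            completions[n] += 1
            n -= 1
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
    time_in[n] += end_time - clock
    return dict(arrivals), dict(completions), dict(time_in), n

# Hypothetical behavior sequence: two arrivals followed by two completions.
events = [(0.0, "arrive"), (1.0, "arrive"), (2.0, "complete"), (4.0, "complete")]
A_n, C_n, T_n, final_state = tally_states(events, end_time=5.0)

# Job flow balance over the observation period: arrivals equal completions.
flow_balanced = sum(A_n.values()) == sum(C_n.values())
```

For this sample sequence the device starts and ends empty, so flow balance holds, and the three tallies are all that the OA relationships need.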
While OA research was originally proposed as an aid in computer performance analysis, it is more general in that developments can be applicable to any system that generates time series data. This would include computer simulation.
The Abbreviations section gives definitions of the variables used in this paper. Error measures of various OA assumptions have been defined and are summarized in Table 1 for job flow balance, homogeneous service, and homogeneous arrival [24]. The limit over time of the expected value and variance of the job flow balance error is zero [25], so this error is not significant for data runs of reasonable length. The expected value over time of other error measures, such as those for the assumptions of homogeneous service and arrival, may not, in general, tend to zero [25].
Different error measures, in the form of relative errors, have been defined by Brumfield [22]. Under a set of new assumptions, formulas are developed for the calculation of response time and average queue length in terms of the average and coefficient of variation of service times. Two examples of these new assumptions are homogeneity of queueing and service and homogeneity of residuals. For a given arrival, the "forward residual is either the time remaining in the service period during which [the arrival] arrives or zero if [the arrival] begins a service period . . . similarly backward residual is either the time since the beginning of the service period during which [the arrival] arrives or zero if [the arrival] begins a service period" [22]. Relative error formulas for response time are determined with these new assumptions in addition to the old assumptions of homogeneous service and homogeneous arrivals. Unfortunately, the error terms are quite complex since there are more assumptions with which to deal.
If we use a relationship derived under OA assumptions to determine a PM, then the resulting value of the PM is in error if the founding assumptions do not hold. This can be checked from the data because of the way OA assumptions are defined. The degree of error in the calculated PM is a function of the assumption error measures. Correction terms have been developed using the assumption error measures [24]. When added to the PMs, these correction terms produce exact results. It is these correction terms which are studied in this paper.
As an example, assume we are interested in obtaining a value for the average number of jobs in a computer system that a new job sees upon arrival. If we make the homogeneous service assumption, then this average may be estimated by [4]
$$\bar{n}_{A_S} = \frac{\bar{n}}{U} - 1,$$
where $\bar{n}_{A_S}$ is the average number of jobs at a device seen by an arriver, assuming homogeneous service, $\bar{n}$ is the average number of jobs at the device, and $U$ is device utilization.
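As a small numerical illustration, the sketch below compares the homogeneous-service estimate, taken here to be the average number at the device divided by utilization, minus one, with the average seen directly by arrivers. The per-state times and arrival counts are hypothetical, and the function names are illustrative only.

```python
def arriver_average_estimate(time_in_state):
    """Estimate the arriver's average as n_bar / U - 1 using only the T(n) values."""
    total = sum(time_in_state.values())
    n_bar = sum(n * t for n, t in time_in_state.items()) / total
    utilization = 1.0 - time_in_state.get(0, 0.0) / total
    return n_bar / utilization - 1.0

def arriver_average_direct(arrivals_in_state):
    """Average state seen by arrivers, computed directly from the A(n) counts."""
    total = sum(arrivals_in_state.values())
    return sum(n * a for n, a in arrivals_in_state.items()) / total

# Hypothetical data: T(n) is time spent in state n, A(n) is arrivals finding n jobs.
T_n = {0: 1.0, 1: 3.0, 2: 1.0}
A_n = {0: 1, 1: 1}

estimate = arriver_average_estimate(T_n)
direct = arriver_average_direct(A_n)
gap = direct - estimate  # what a correction term would have to supply
```

With these numbers the estimate is about 0.25 while arrivers actually see 0.5 on average, so the gap of about 0.25 is the amount a correction term must account for in this particular sequence.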
We are interested in finding a correction term, such that The correction term is equal to where Check the Abbreviations section for all symbol definitions.  is the error measure for the job flow balance assumption and  *  is a weak form of the homogeneous service error measure because it may be equal to 0 even when some or all the individual   () values are not.
There are a couple of problems with using the error measures to derive correction terms. One problem is that the amount of data needed to calculate an exact value for an assumption error measure is the same as to find the performance measure of interest directly. The process of determining the error measure for each assumption used to derive a relation, the correction term, and the PM estimate to which the correction term is added is a way of getting something which may be observed more directly. Finding exact values for performance measures in this indirect way over a finite time period may be worthwhile only if a number of PMs are desired. In this case, a single assumption error measure is determined for each assumption and applied to all the PM correction terms of interest.
Another problem with the error measure technique is that these measures apply only to the data observed. For another run of data, new error measure values need to be found. This limitation may be acceptable if PMs cannot be measured directly without changing the nature of the system, for example, in a complex computer system. We would like a way to extend assumption error measures over longer sets of data and, thereby, say something about the system that generated the data. As Sevcik and Klawe [26] stated shortly after OA was introduced, "Because operational analysis is based on assumptions that can be tested but that are very unlikely to be satisfied exactly in any finite time period, it is very important to develop a means of dealing with 'fuzzy homogeneity' or situations in which the various independence assumptions are satisfied within some tolerance." This paper addresses the need to define these assumption bounds.
The next section will illustrate how OA relations may be used to reduce data collection while estimating performance measures. This will be followed by a discussion of the determination of bounds on the OA assumption measurement errors for homogeneous service and homogeneous arrival. Sample calculations of these bounds will be presented afterward. An illustration of the use of bounds in a simulation will then be given.

Simplifying Data Collection
Calculating performance measures with OA relationships that are derived under one or more of the system behavior assumptions is usually simpler than using more direct relationships. This is because, by making the assumptions, a model has been created which reduces, perhaps artificially, the complexity of the behavior of the system. The result is that less information is needed to make an estimate of the PM than would be needed for a direct measurement.
There are situations where it is impractical, if not impossible, to collect sufficient data to determine exact values for PMs over a finite period of time. In some cases, only an estimate of a PM is needed and it is not worthwhile to go to the trouble of determining the precise PM values. Any PM value obtained for a behavior sequence is only an estimate of the underlying system PM. With this realization in mind, it may seem unwise to spend a great deal of effort to obtain an exact value for a sequence which is, in turn, only an estimate of some other value. A good approximation of the sequence estimate may be sufficient.
If we want the average response time, $R$, of a behavior sequence, we could accumulate the response times of all the jobs that go through the system and get the exact $R$ by dividing by the number of jobs. A simpler procedure would be to say that response time is
$$R = \frac{S}{1 - U},$$
where $S$ is the mean time between completions during busy periods and $U$ is utilization. This equation will give the exact $R$ value if we have a behavior sequence for a single server queue which is in flow balance and has homogeneous arrivals and services. If these conditions do not hold, the equation will not give $R$ exactly, but an estimate of $R$, call it $R_0$. If we collect only the idle time, $T(0)$, and the number of completions, $C$, we can use the same equation to find $R_0$. If the behavior sequence lasts for time $T$, then
$$R_0 = \frac{(T - T(0))/C}{T(0)/T} = \frac{T\,(T - T(0))}{C\,T(0)}.$$
Another example calculates the average number of jobs at a device. With the same assumptions as for estimating response time, the average number of jobs in the queue/server system is
$$\bar{n}_0 = \frac{U}{1 - U} = \frac{T - T(0)}{T(0)}.$$
This value takes even less data to calculate than does $R_0$. The direct calculation of $\bar{n}$ requires accumulating data every time there is an arrival or completion or requires keeping track of the total amount of time spent at each of the states. Using these equations for predicting future values for $R$ and $\bar{n}$ of a system presents certain problems. For example, over future time, will the assumptions of the system behave in the same way? Since we can determine and use error measures of the assumptions in order to correct assumption-derived PM estimates, it is not necessary that assumptions hold in the future if they have not in the past. With the determination of correction terms, all that is really necessary is for the correction terms to remain relatively constant, that is, for the system's violations of assumptions to remain the same over future time periods.
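The reduced data requirement can be made concrete with a short sketch. It assumes the single-server forms in which the response-time estimate is S/(1 − U) and the average-number estimate is U/(1 − U); the function name and the sample measurements are hypothetical.

```python
def oa_estimates(total_time, idle_time, completions):
    """OA-style estimates from three measurements: T, T(0), and C.

    Assumes a single-server device with some idle time (idle_time > 0) and at
    least one completion, with flow balance and homogeneous arrivals and services.
    """
    busy_time = total_time - idle_time
    utilization = busy_time / total_time            # U = 1 - T(0)/T
    service_time = busy_time / completions          # S: busy time per completion
    response_est = service_time / (1.0 - utilization)
    number_est = utilization / (1.0 - utilization)
    return utilization, service_time, response_est, number_est

# Hypothetical run: 100 time units observed, 20 of them idle, 40 completions.
U, S, R0, n0 = oa_estimates(total_time=100.0, idle_time=20.0, completions=40)
throughput = 40 / 100.0
```

For this run the estimates come out near U = 0.8, S = 2, a response time of about 10, and about 4 jobs at the device. The two estimates are consistent with Little's law, since the average number equals throughput times response time.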
Without knowledge of the assumption error measures and, through them, the correction terms, the performance measure estimates may be quite bad for any particular behavior sequence [23]. As stated before, in a stable system the job flow balance assumption error measure will go to zero as time increases, but, as shown in [24], this is not necessarily true for other assumption error measures. For any behavior sequence it is important to make some assessment, if possible, of the behavior of the PM correction terms.

Performance Measure Bounds
One approach to using the simplified OA formulas for PMs is to determine bounds on the maximum PM error. That is, we are interested in defining bounds on the difference between true values of various PMs at a device for particular state sequences and those PMs estimated by using relationships derived under operational analysis assumptions. We will assume the network is in steady state.
In the following, bounds are found for the assumptions of homogeneous services and homogeneous arrivals. In the case of the job-flow balance assumption, we know that the expected value of the error measure and its variance go to zero over time [25]. Therefore, as the length of the sequence increases, the job flow balance error can be expected to become insignificant. We will need to assume that a maximum value of the error for services and arrivals for any state is known or can be set. We refer to these values as the maximum service error and the maximum arrival error, respectively.

Bounds on Homogeneous Service Assumption Error. If
is the maximum error for the homogeneous service assumption, then A more useful bound would be on the weak overall homogeneous service error: But, this limit may be harder to know beforehand.Equation ( 9) may be used to find an upper bound on  *  by using the definition Substituting (9) yields The term is the average number at the device seen by a completer.Therefore, The bound given by ( 14) does not take into consideration the fact that the   () values are not independent.In fact, they are related by the expression We can get a tighter bound of the  *  values by taking this dependence into consideration.Equation ( 15) can be shown by substituting the definition as follows: Since what is desired is an upper bound on  *  a solution to the optimization problem below will give the desired result: In order to show the optimal solution, first put this problem in primal and dual forms: The optimal solution to the problem will have to satisfy the Karush-Kuhn-Tucker (KKT) conditions, that is, feasibility of the Primal and Dual, as well as complementary slackness [27].The KKT conditions give the necessary conditions for optimality of the general constrained problem.
Consider the solution where ñ is the median state at completions.Assume for simplicity that there is an even number of states so that ñ ̸ =  for any n.This solution is dual feasible since we showed above that any n is a solution to the dual.
The solution value is Set this value equal to Δ  , which is the overall completer's average minus the average of the set truncated at the median.This can be shown by first taking times the completer's average,   , is Subtracting Δ  yields We know that since ñ is a median.Therefore, is the average of the set of states truncated at the median.So the bound on  *  is As an example, take the behavior sequence in Figure 1.If we want to use the OA equation [24] to calculate   , which is the average number assuming flow balance, homogeneous arrival, and services, we would be interested in the bound of the difference between  and   .We can calculate p(n)=1/3, and   = 7/4.Assume the maximum error is   = 3/5.If the other assumption errors are zero, then the difference,  −   is equal to the correction term: The upper bound on this correction term, using (14), is Using the tighter bound, Δ  , we get This is a reduction of 57.14%, for this example of the difference bound.
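The comparison between the two service bounds can be checked numerically. The sketch below is written under stated assumptions rather than taken from the paper: it reads the optimization as maximizing the completion-weighted sum of state times error, subject to the weighted errors summing to zero and each error being at most the maximum error in magnitude. The completion proportions are hypothetical values chosen to give a completer's average of 7/4; they are not the Figure 1 data. All function names are illustrative.

```python
from fractions import Fraction as Fr
from itertools import product

def service_error_bounds(q_c, max_err):
    """Return (loose, tight) bounds on the weak overall service error.

    q_c maps each state n to the proportion of completions at that state.
    Assumed problem (a reading of the text, not a quotation):
        maximize   sum_n n * q_c[n] * e[n]
        subject to sum_n q_c[n] * e[n] = 0  and  |e[n]| <= max_err.
    """
    states = sorted(q_c)
    mean_c = sum(n * q_c[n] for n in states)  # completer's average
    half, cum, below = Fr(1, 2), Fr(0), []
    for n in states:
        if cum >= half:
            break
        cum += q_c[n]
        below.append(n)
    if cum != half:
        raise ValueError("median must split the completions exactly in half")
    truncated_mean = sum(n * q_c[n] for n in below) / half
    return max_err * mean_c, max_err * (mean_c - truncated_mean)

def brute_force_bound(q_c, max_err):
    """Best objective over feasible error vectors whose entries are all +/- max_err."""
    states = sorted(q_c)
    best = None
    for signs in product((-1, 1), repeat=len(states)):
        if sum(s * q_c[n] for s, n in zip(signs, states)) != 0:
            continue  # violates the dependence constraint
        value = max_err * sum(s * n * q_c[n] for s, n in zip(signs, states))
        best = value if best is None else max(best, value)
    return best

# Hypothetical completion proportions with completer's average 7/4.
q_c = {1: Fr(1, 2), 2: Fr(1, 4), 3: Fr(1, 4)}
loose, tight = service_error_bounds(q_c, Fr(3, 5))
reduction = 1 - tight / loose
```

With a maximum error of 3/5, these hypothetical proportions give a loose bound of 21/20 and a tight bound of 9/20, a reduction of 4/7, or about 57.14%, which agrees with the percentage quoted in the example. The brute-force search over sign patterns reaches the same tight value, which is what the median-based solution predicts when the median splits the completions exactly in half.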

Bounds on Homogeneous Arrival Assumption Error.
As in the previous section, we can assume that a maximum error,   , for any state is known beforehand.That is, we assume Then, the weak overall homogeneous arrival error is bounded by where   is the average number at the device excluding the maximum state.
As with the service errors,   (), the   () are not independent.The dependency is This can be shown by substituting the definition of   () into the equation.As with the homogeneous service assumptions, we set up the following optimization problem: Using the Karush-Kuhn-Tucker conditions as before, we can show that with ñ as the median state is the solution to the optimization problem.The value of the solution is found by substituting into the primal objective function to get times the average excluding the maximum,   , is Subtracting The expression is the average of the values truncated at the median.Call this value   .Then, substitution yields an upper bound on the error measure due to violations in homogeneous arrivals of This bound may not be as useful as the bound on the homogeneous service assumption error because the value of Δ  is based on knowing the () values, whereas, for Δ  only completion counts are necessary.

Example Performance Measure Error Bounds
Some examples are given next of how Δ  and Δ  may be used to determine bounds on the difference between exact values of various PMs and the OA estimated values for particular behavior sequences.
4.1.Arriver's Average Queue Length.Using the example from the introduction, for a behavior sequence the average queue length seen by an arriving job may be calculated by where     =   − *  and    = (/)−1 is the average arriver's queue length assuming homogeneous services.
Rearranging and using the Δ  bound give Since   → 0 for any sequence of data in steady state, we assume flow balance holds.Then, This expression shows us that the difference between estimating the average length seen by arrivers with the relationship that assumes homogeneous servers and the true value of   is less than or equal to the bound on  *  .

Response Time.
The exact response time for a behavior sequence can be found by where    = −    = −(  −  *  ) and   = (  + 1).Since, as before, we are interested in the difference between a possible observed value () and a calculated value (  ) we should do some rearranging and get  −   = ( *  −   ).Again, assuming   is small and substituting  *  ≤ Δ  yield  −   ≤ Δ  ,  −   ≤  (  (  −   )) .
The exact average number if these assumptions do not hold is where Rearranging again, and substituting  *  ≤ Δ  ,  *  ≤ Δ  , and   = 0 we get Because the calculation of Δ  requires knowledge of T(n) values, there may be little benefit in using the right hand side of the above relationship to find this error bound.If, however, the behavior sequence can be assumed to have homogeneous arrivals, then we can get a bound on the average number at a device error without the knowledge of the individual T(n) values.In that case, the only time statistics we need are utilization, U, and the fraction of time spent at the maximum state, p(N).

Throughput.
Using the OA version of Little's Law [1] we can say that the difference between a behavior sequence's actual throughput and that calculated assuming both homogeneous arrivals and services will be For   , we can substitute   which is the more general expression since it does not need the homogeneous arrival assumption.From previous developments we know the following relations hold: where If both (60) and (61) are substituted, then the value of the expression must fall within the limits defined by ( 63) and (64).Therefore, the bounds on our throughput error are

Using the PM Bounds
The bounds derived in the previous sections are actually limits on PM correction terms. Assuming we know the maximum service and arrival error values a priori and that we can say something about the homogeneous arrival assumption, these bounds can be found without knowledge of all the T(n) values for each n. This is the same simplification of data collection that we have in using the OA formulas instead of direct calculations.
As an example of using the bounds in a simulation study, assume we are interested in finding an estimate for the average number at a device. A series of 10 runs is made and the bound for the correction term is calculated for each run. If it is positive, we can call this an upper bound. If it is negative, let this be a lower bound. In both cases the other bound is 0. Assume in the simulation runs these bounds always fall between −6 and 3.3. The correction term for each run is approximated by taking the average value between the upper and lower bounds. We would like to be able to say something about the probability that future runs will fall within the [−6, 3.3] limits that have already appeared. We can do this using tolerance limit calculations.
Assume the 10 runs produced the results given in Table 2. The average and standard deviation for these observations are −1.34 and 1.55, respectively. Since the tolerance limits are going to be set at the observed limits, we can say
$$\bar{x} \pm ks = -1.34 \pm k\,(1.55) = [-6,\ 3.3], \quad \text{so that } k \approx 3. \qquad (66)$$
From tolerance limit tables [28] we can say with 95% confidence that at least 91% of future observations of the estimated correction term will fall within the interval [−6, 3.3]. That is, if we use the OA estimates, we have 95% confidence that the correction needed to reach the correct value for each run will fall within these limits 91% of the time.
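The table lookup can be cross-checked with a short calculation. The sketch below uses Howe's approximation for two-sided normal tolerance factors, which is a substitute for the tables cited in the text and is introduced here as an assumption. The chi-square value is a hard-coded tabulated constant, and all variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n_runs = 10
mean, sd = -1.34, 1.55       # summary statistics reported for the 10 runs
lower, upper = -6.0, 3.3     # observed limits used as tolerance limits

# Tolerance factor k implied by setting mean +/- k*sd at the observed limits.
k = (upper - lower) / (2 * sd)

# Howe's approximation: k^2 = nu * (1 + 1/n) * z^2 / chi2, solved here for z.
# chi2 is the lower 5% point of chi-square with nu = n - 1 degrees of freedom,
# which corresponds to 95% confidence (tabulated value for 9 degrees of freedom).
CHI2_LOWER_05_DF9 = 3.325
nu = n_runs - 1
z = k * sqrt(CHI2_LOWER_05_DF9 / (nu * (1 + 1 / n_runs)))

# Proportion of a normal population covered by the interval at that confidence.
coverage = 2 * NormalDist().cdf(z) - 1
```

Under these assumptions the implied factor is k = 3 and the approximate coverage is a little under 92%, which is consistent with the statement that at least 91% of future observations fall within the limits with 95% confidence. The calculation treats the correction term estimates as normally distributed, as tolerance limit tables of this kind do.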

Conclusion
In this paper, bounds were developed for the operational analysis error measures of homogeneous service and arrival assumptions. These bounds allow us to take advantage of the simplified data collection made possible by the use of operational analysis relationships, even when the assumptions used to derive those relationships are violated. A tolerance-limit-based method was given in order to be able to say something about the confidence that future correction term values in a time series would be within certain limits.

𝑛: State of a device, number of jobs both in queue and in service ñ: The median state seen by a device ñ : The median state at a device at job completions : Average number of jobs at a device   : The average number of jobs at a device assuming flow balance, homogeneous arrivals, and homogeneous services   : The average number at the device seen by a completing job   : The average of the set of states seen by completing jobs truncated at the median state value   : A bound on the homogeneous service assumption error measure, ≤   .

Figure 1: Example behavior sequence for estimating the homogeneous service error bound.

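4.3. Average Number at Device. If homogeneous arrivals, homogeneous services, and job flow balance hold, then the average number at a device (i.e., those both in queue and in service) can be calculated by
$$\bar{n}_0 = \frac{U\,\bigl(1 - (N + 1)\,p(N)\bigr)}{1 - U - p(N)},$$
where $U$ is utilization, $N$ is the maximum state, and $p(N)$ is the fraction of time spent at the maximum state.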
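The finite-state formula can be sanity-checked against a direct calculation. The sketch below assumes that, when all three assumptions hold, the state probabilities form a truncated geometric distribution over states 0 to N. That is a standard consequence of a constant ratio between successive state probabilities, but it is introduced here as an assumption, and the parameter values are hypothetical.

```python
def truncated_geometric(rho, max_state):
    """State probabilities p(0..N) with p(n+1)/p(n) = rho (assumed model)."""
    weights = [rho ** n for n in range(max_state + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def average_number_formula(utilization, p_max, max_state):
    """Average number at the device from U, p(N), and N only.

    Undefined when 1 - U - p(N) is zero, which happens for rho = 1.
    """
    return utilization * (1 - (max_state + 1) * p_max) / (1 - utilization - p_max)

# Hypothetical device: ratio 0.5 between successive states, maximum state 4.
N = 4
p = truncated_geometric(rho=0.5, max_state=N)
U = 1 - p[0]

direct = sum(n * p_n for n, p_n in enumerate(p))
formula = average_number_formula(U, p[N], N)
```

The direct sum over all states and the formula agree to rounding error, even though the formula uses only utilization and the time fraction at the maximum state.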

Table 1: Operational analysis error measures.

Table 2: Example estimated correction terms for average number at a device.
average state at a device, excluding the maximum state   : Time-average of the set of states truncated at the median    : Average number of jobs at a device seen by an arriver, assuming homogeneous service (): Proportion of time spent in state    (): The proportion of arrivals when  jobs are at a device, ()/   (): The proportion of completions that leave  jobs at a device, ( + 1)/ :  : A bound on the homogeneous arrival assumption error measure, ≤   Δ  :