Modeling Message Queueing Services with Reliability Guarantee in Cloud Computing Environment Using Colored Petri Nets

1State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing 100876, China 2School of Software Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China 3School of Electronic Information Engineering, Qiongzhou University, Sanya 572022, China 4Institute of Network Technology, Beijing University of Posts and Telecommunications, Beijing 100876, China


Introduction
Originally designed to help communication between processes in operating systems, message queues are more widely used in various application domains nowadays.In particular in distributed environments, message queues connect different system components and ensure that they can exchange information asynchronously and reliably.Message queues typically support point-to-point communication fashion [1].Traditional message queues are already widely applied in industry applications, for example, Active MQ [2], Rabbit MQ [3], Kafka [4], Microsoft MSMQ [5], and IBM Websphere MQ, many of which have gained good market repercussion from the users.
With the advent of virtualization technology, cloud computing becomes a popular topic for blogging and white papers and has been featured in the title of workshops, conferences, and even magazines [6].Clouds can be defined as a large pool of easily usable and accessible virtualized resources (such as hardware, development platforms, and/or services).These resources can be dynamically reconfigured to adjust to a variable load (scale), allowing also for an optimum resource utilization [7].Cloud computing provides lots of advantages, in which the most famous one is the ondemand provisioning of software, hardware, and data as a service, preventing the developers from being concerned with overprovisioning or underprovisioning for a service.Also, within the cloud, the laws of probability give service providers great leverage through statistical multiplexing of varying workloads and easier management [8].Due to the popularity and merits of cloud computing, message queueing services in cloud computing environment (CMQSs) are broadly adopted recently in commercial markets besides traditional message queues, such as Amazon Simple Queue Service (SQS) [9] and Microsoft's Windows Azure Queue [10], both of which have better support for advanced availability and scalability.
These novel cloud-based message queueing services, however, pose some serious performance and reliability challenges.For example, online payment services rely on messaging services which can provide high reliability and high performance so as to support the acquisition and processing of real-time data and also help to provide better application services and strive for greater market share for enterprise customers.For commercial cloud queueing services, an effective method is urgently needed to measure and predict the performance of services as well as their reliability and scalability.Specially, when facing system updating, it is essential that cloud services are subjected to a rigorous performance analysis before the launch of new version in order to make sure that the update will not bring up performance decline in some aspects.Moreover, because of the flexibility and loose coupling of CMQSs, performance monitoring and analysis should better be conducted continuously when providing services, so as to acquire customer feedback timely and make adjustments adaptively to achieve optimal performance accordingly.Common performance metrics of interest are the expected event notification latency as well as the utilization and message throughput of the various system components [11].In addition, the issue of message consistency, one of the most essential features of message queueing services, is also discussed in this paper when modeling CMQSs, and two types of message consistency options are provided in the model.
Considering the advantages and disadvantages of the existing cloud-based message queues, we find that there is no modeling approach for estimating and analyzing the performance of CMQSs.Besides, the proposed prototypes of message queues mentioned in recent literature are mainly concerned with the system scalability while lacking emphasis on reliability.In this paper, we present an analytical method to model the performance of CMQSs, while considering the system performance with reliability guarantee.We also demonstrate a novel simulation modeling method using colored Petri nets, which is named as visibility-based modeling approach (VMA) for the reason that we model a visibilitytimeout mechanism in the performance model, which is used to ensure the system reliability as well as availability.Normally, the distributed application driving system will potentially be hierarchically structured into layers [12]; thus, we present our CPN model in a hierarchical fashion accordingly.We compare the outcomes reached by both approaches to validate our model and give reasonable performance analysis according to the results.
The rest of this paper is organized as follows.In Section 2, we present a short review of the available literature.In Section 3, we provide a brief introduction to CMQSs as well as visibility-timeout mechanism.Section 4 describes the background on CPN.In Section 5 some necessary assumptions and the analytical modeling method are provided.Then we depict the simulation model using VMA approach in Section 6. Section 7 is focused on the numerical results of both models as well as the discussions about performance analysis.Finally in Section 8, some conclusions and directions of future work are given.

State of the Art
Numerous approaches to evaluate system performance are available in literature.Software performance engineering (SPE) [13] was firstly proposed by Smith for integrating performance prediction techniques into the software engineering process over two decades ago.Since then a large number of performance models are developed and can be classified into various categories, for example, queueing networks, layered queues, types of Petri nets, and stochastic process algebras, which were surveyed in [14].Particularly, the authors of paper [15] have a special focus on componentbased performance evaluation methods and survey their applicability in industrial use.In [16], some problems of existing modeling methods are summarized, for example, a lack of theoretical justification for the performance model and a difficulty in establishing causality across the subsystems for modeling distributed systems, in which events from different subsystems provided by different vendors need to be correlated.Also, the authors present expectations for the future techniques and tools in the paper.
For modeling the CMQSs described in our paper, we aim at addressing both of these two problems mentioned above.We employ both mathematical and simulation modeling methods to verify and evaluate the performance metrics of interest together, and the results achieved by the mathematical model can fit well with the simulation model under the out-of-order option.Although the analytical model described in this paper can only capture a subset of the whole system features (by reflecting the out-of-order model), it can still be seen as a heuristic method for solving the analytical model which can fully justify the simulation performance model theoretically.Moreover, for simulation model, we present a detailed timed colored Petri nets [17,18] model in hierarchical fashion, for the purpose of giving prominence to the different subsystems that comprise the whole distributed message queueing system in cloud environment.
In academic world, some studies have been conducted to apply modeling approaches to messaging systems.The Palladio Component Model (PCM) [19] is one of the approaches used for software performance prediction.It is an architecture description language supporting design time performance evaluation of component-based software systems, and it can be extended to model specific kinds of systems; for example, the authors in [20] combine PCM with a performance completion [21] for message-oriented middleware (MOM) so that the software architects can specify and configure messagebased communication using a language based on messaging patterns (e.g., publish-subscribe or competing consumers).They adopt a model-to-model transformation fashion which integrates the low-level details of a MOM into the highlevel software architecture model and use a SPECjms2007 benchmark case to evaluate and verify their model.Also, [22] exploits the possibility of extending the PCM so as to support the modeling of event-driven service-oriented systems.The authors define a set of mapping strategies between system elements and PCM model elements in order to eliminate the semantic gap between the system implementation and the architecture model caused by the approach proposed in [23] to modeling the asynchronous events using synchronous service calls.
In [11] the authors present a modeling methodology for message-oriented event-driven systems using a case study in the supply chain management domain.The methodology is based on queueing Petri nets [24] and the authors use a standard benchmark to test and verify their model.Also, the authors present a set of generic modeling patterns under different system scenarios, which can be applied in other performance models.
Time analysis is an important performance metric for message transmission, which is concerned by researchers.For example, in [25], the message-scheduling problems are discussed and an asynchronous communication model where messages are delivered in the form of packets is proposed.The authors analyze the packets completion time in a nonlinear way and propose an optimization algorithm in the case of priority messages.However, the message queues in the model and algorithm are only considered on the side of the sender and thus cannot predict the whole message delivery time through senders, brokers, and receivers.
As the features of messaging infrastructure (e.g., space, time, and synchronization decoupling between system components) are essential in distributed systems like clouds, products like Amazon SQS and Microsoft's Windows Azure Queue are becoming more and more popular and are widely used in industry.However, in literature, the discussion about analysis and improvements of cloud message queueing services still needs more attention and concerns.Some researchers have made preliminary exploration of combining message queues with cloud computing environment.
The EQS introduced in [26] presents an architecture of message queue which can scale elastically in the cloud.The authors implement a prototype of EQS based on an existing messaging library (i.e., ZeroMQ) and deploy and evaluate it on a cloud infrastructure (i.e., Amazon EC2) using cloud platform management tool Scalr.However, the results should not be taken as pure performance results but more as a validation of the elastic and dynamic nature of their algorithm in consideration of the dependence on best-effort nature of Amazon EC2 network I/O.Besides, the storage used in EQS is centralized rather than distributed, which is not realistic in cloud environment and could dramatically limit the system availability.
In [27], the authors present the BDQS, that is, a scalable cloud-based queueing service which is built on a distributed storage system Cassandra [28].BDQS can provide at-leastonce and best-effort in-order message delivery.
Another cloud-based message queueing service named SDQS is introduced in [29]; the authors implement the prototype of the message queue on top of a distributed in-memory cache, IBM WebSphere eXtreme Scale (WXS) [30], which is a special type of storage system.Based on the elasticity and fast access of the in-memory cache, SDQS can provide high scalability and availability, as well as multiple message delivery options.However, the authors focus their evaluation of the prototype on scalability rather than message delivery time and conduct the experiments of different process stages separately, that is, firstly the sending process and then the forwarding process, which is unlikely to happen in practical systems.
Since Hadoop has gained extensive usage in applications, a distributed message queueing system HBaseMQ, which is implemented as a light-weight client library to HBase [31], is presented in [32].The authors use HBase system tables to represent queues, messages, and clients (i.e., senders/receivers), in which the "timestamp" columns are used to rapidly pinpoint a message by its timestamp.HBaseMQ inherits many well-tested cloud-ready properties from Hadoop/HBase such as scalability and fault tolerance.Moreover, HBaseMQ can provide two delivery guarantees, that is, "at most once" and "at least once." However, the performance evaluations of HBaseMQ are conducted coarsely and message delivery time analysis is not provided in the paper.
Because the popular commercial queueing service like Amazon SQS does not guarantee the order of messages, nor does it guarantee the exactly once delivery, the authors of [33] propose a hierarchical distributed message queue (HDMQ), which eliminates these defects and outperforms SQS in both throughput and delivery latency.HDMQ is also proved to support in-order and exactly once message delivery.
Realizing that cloud message queueing services are becoming popular in practical use encourages researchers to consider designing systems that can better cooperate with the CMQSs so as to achieve good system performance as well as high system utilization.In [34], the authors provide a highly scalable and distributed job management system CloudKon, which is built upon cloud computing building blocks (Amazon EC2, SQS, and DynamoDB).In [35], the author introduces a novel conceptual model Iris, a decentralized messaging framework, to provide a much simpler way to specify, design, and implement distributed, cloud-based services.
Petri net [36] is another widely used modeling technique for system performance analysis.Pioneered by Carl Adam Petri in his doctoral dissertation, Petri net was recognized as a powerful mathematical theory to describe concurrent and asynchronous systems.Due to the state space explosion problem caused by analyzing large complex systems, Jensen introduced an expansion form of Petri net, that is, colored Petri nets (CPN) [17,18], which can reduce the size of system model dramatically through attaching colors to tokens in order to distinguish various tokens that represent different system types or components.At present, colored Petri nets are gathered with various features which include multiple colorsets, hierarchy, time, and high-level programming language (e.g., CPN ML programming language [17,37]).As CPN can be used to model extensive systems or services which contain concurrent behaviors, a large number of researchers have exploited CPN in sophisticated ways to conduct studies in their own research fields.For example, in [38], CPN is used to model network protocols, in [39], workflow executions are modeled, and in [40] manufacturing systems are analyzed.Moreover, specialized applications like train classification, greenhouse management, ERP projects, traffic control [41], or embedded systems are also able to be modeled using CPN.

System Description
In this paper, we model broker-based message queueing service into the cloud computing infrastructure, and the most famous CMQSs of this kind are the commercial application Amazon SQS [9].
Implementing message queueing service into the cloud computing environment has the advantage of delivering messages with low cost and high availability and scalability.Distributed message queueing service deployed in the cloud can queue messages quickly and reliably while one component in the application environment can consume the messages that are generated by another component.Also, the queue used in this process can be deemed as a temporary repository for messages that are waiting for processing; for example, Amazon SQS has a message retention period that expands from 60 seconds to 14 days which can be determined by the function SetQueueAttributes() provided in the application programming interface.To ensure continuous available services, a persistent distributed storage system is also needed in cloud-based message queueing services, for example, Cassandra in BDQS [27], IBM WXS in SDQS [29], and HBase in HBaseMQ [32].
In order to prevent loss of messages and guarantee the reliability of system, CMQSs adopt a control mechanism named visibility timeout.This mechanism assigns a visibility state to every message, indicating whether a message is available to be processed.In other words, this visibility state can be assumed as a lock; a message with visible state is seen as unlocked while a message with invisible state is locked.A message newly received by broker is initially in visible state (i.e., unlocked); it becomes in invisible state (i.e., locked) while being processed; meanwhile, a time window known as visibility timeout is set especially on this message to prevent simultaneous retrieving of the message from other receivers during this period.Under the circumstance of successful message transmission, an acknowledgment notification needs to be issued by the receiver so that the broker can delete the corresponding message without hesitation.Otherwise, if the message timer expires (i.e., timeout happens), broker can reset the visibility state of the message as visible, which means the message is available to be processed again.An example of message delivery process in broker-based CMQSs is illustrated in Figure 1.
Message consistency has been the subject of intense scholarly discussion since the emergence of CMQSs.Although Amazon SQS provides extremely high availability, it cannot guarantee message ordering as well as single delivery.This is because of the replication storage of messages and the server sampling scheme, which are adopted in the system to guarantee high availability.As demonstrated in Figure 1, broker replicates messages in multiple server nodes to prevent single point failure.When retrieving messages, one of the nodes is randomly sampled and returns a message in its queue.Thus if different nodes which are retaining different replications of messages are selected during successive processing requests, messages could be delivered without ordering.In both [29,33] the authors provide an improvement of message consistency in their systems, which are claimed to guarantee in-order and no duplication delivery.Our modeling method takes message consistency into account and presents two different models accordingly, that is, the inorder and out-of-order models.We address the modeling of in-order delivery type based on the solution provided in [29], that is, using another queue, queue-index, to maintain the order of messages; every time before processing a message, the index needs to be accessed first so as to indicate the oldest message stored in system and then begins the processing of the corresponding message.In general, although the two types of models are distinct with respect to different message consistency options, the modeling methods are similar in other parts.We introduce our modeling method in the case of out-of-order delivery option at first and then depict the differential between the two models.

Background on CPN
Colored Petri nets (CPN) are a language for the modeling and validation of systems in which concurrency, communication, and asynchronization play a major role.Modeling complex processes in terms of CPNs is a nontrivial task [42].This paper uses CPN to model CMQSs, so some basic concepts are presented here.The following definition for CPN is given by Jensen [43].
(i) Σ is a finite set of nonempty types, also called colour sets.
(ii)  is a finite set of places.
(iii)  is a finite set of transitions.
(iv)  is a finite set of arcs such that where the type of a variable V is denoted as Type[V] and the set of free variables in an expression  is denoted as Var where  is the place of () and () MS denotes the set of all multisets over ().
(ix)  is an initialization function.It is defined from  into closed expressions such that Currently, CPN Tools is by far the most widely used Petri net tool [42].We adopt CPN Tools version 4.0 to conduct our modeling, availing our CPN model of the new features supported by this version, for example, real timestamp, time colorsets, and advanced communication.
As an industrial-strength computer tool for constructing and analyzing CPN models, CPN Tools realizes the graphical construction, editing, simulation, and analysis of CPN model through eleven different pallets.In the aspects of graphical construction and editing, CPN Tools provides a hierarchical method to make sure that the constructed model is clear to understand but without affecting system functions.Two ways are available to achieve hierarchical design: one is using subtransitions and the other is using fusionsets.Both of them can connect different places or transitions in different pages; more specifically, subtransition exhibits complex submodels using the concept of subnet whereas fusionset builds bridges between several submodels through combining distinct places.To create hierarchical descriptions, one should first divide large systems into several small nets, identify relationships between these subnets, and then define interfaces both among subnets and between subnets and upper layer nets.Usually, a system can be divided into multilayers and the complexity of the original system leads to deciding how deep the model hierarchy would be the most suitable.Abstractly speaking, in upper levels, model contains system outline without detail whereas in lower levels every important behavior is depicted in detail.
Additionally, time plays a significant role in a wide range of concurrent systems, and different time control strategies may have huge impact on the performance of systems.It is possible to capture system event execution time and analyze the control relationship between events through the concept of time included in CPN model.CPN Tools adds time stamp on tokens so as to keep tracking of the execution time of system events, thus further utilizing the time model to calculate performance metrics such as events delays, system throughput, system components utilization rate, and queue length.

Analytical Queueing Model
In this paper, a message queueing system with reliability guarantee is considered, in which the visibility-timeout mechanism is used to ensure the system reliability; that is, no message is lost during the delivery.We can consider the message delivery process as a Markov process.A message that is produced by an application in a client-end server (i.e., sender) which is waiting to be sent to server nodes in the cloud (i.e., broker) for transmission process will be called customer.Assume the arriving process of customers is Poisson stream with the intensity value equal to  [44].Thus, the interarrival times of customers are in accordance with the exponential distribution and on average a new message will be produced every 1/ time.
The service process of customers in the whole system can be considered containing two phases.The first phase is the process of transmission for customers from senders to brokers and the second phase is the process of message forwarding from brokers to other applications in client-end servers (i.e., receivers).The two processes are considered to be cascaded in system model.However, as these two phases are asynchronous processes rather than synchronized with each other, there are independent waiting buffers (i.e., queues) inside themselves, respectively; we assume that the capacities of both buffers are infinite.The service time in both processes is considered exponentially distributed with parameters  1 and  2 , respectively.Thus the average service time of the two phases is 1/ 1 and 1/ 2 , respectively.Actually, before processing, it is time consuming to search in the message queue; thus, system side scalable acceleration is needed [45].Nevertheless, in cloud computing environment, the scalability can be realized by adding CPUs or using state-of-the-art CPUs because multicore can execute linear searching for queue with shorter depth (i.e., message number) in parallel; as a result, the performance of queue searching can be improved easily.Moreover, according to [45], searching speed of message queue when the order of message reception is different from the receiver's expectation is accelerated; this means the extra searching time caused by message ordering requirement is very small.In the light of these reasons and the fact that the message searching time is in microseconds, we assume that the time is so small which is negligible in our model, regardless of message consistency options.Normally, when a customer is served by these two processes, the applications in receiver will retrieve the message successfully.However, because of the unpredictable situations in network, for example, network congestion or link failure, messages may be lost or delayed for a long time during transmission, thus degrading the system performance.These failure events have been considered and modeled in literatures.For example, [46] has discussed the performance of distributed, content-based publish-subscribe systems while considering the uncertainty in underlying overlay networks in the proposed model.In cloud queues, by performing the visibility-timeout mechanism to guarantee system reliability, those messages which are believed to be lost will be sent again from brokers to receivers, which means some customers in the model will be served again through the second process.Let  be the probability of failure (including network failure and client failure), which means the proportion of messages which needs retransmission is , and thus the number of messages that have been retrieved successfully by the receiver (i.e., the arrival rate in receiver) is reduced by a factor  = 1 − , which is the reliability probability.
According to the above descriptions, we can model the cloud message queueing system as an open Markov queueing model, which is depicted in Figure 2. The model consists of two service processes, that is, Sender to Broker (denoted as Proc1) and Broker to Receiver (denoted as Proc2).Suppose the arriving rates of customers in Proc1 and Proc2 are Λ 1 and Λ 2 , respectively.In open Markov queueing model with 2 server nodes (i.e., the sender queue and the broker queue), because of the continuity of customers and the flow conservation feature, under the circumstance of infinite waiting capacity, the arriving rate of customers is equal to the departure rate of queue [44], and we can have the balance equation for a server node : where  denotes the customers arriving rate of the source node , Λ  denotes the arriving rate of node , 1 ≤  ≤ 2, and  , denotes the probability of a customer who has finished the service from node  and goes to node  for the next service.Thus from the model shown in Figure 2, the arriving rate equation for each node can be expressed by To solve the equations we have Assume the number of customers in node  (including waiting in queue and being served) is   and the probability that   =   is denoted by   (  ).The system matches the hypotheses of Jackson theorem on queueing networks, and therefore solutions follow the product form.
According to [44], in open Markov queueing networks, the joint probability () = ( 1 ,  2 ), which means the number of customers in sender queue and broker queue is  1 and  2 , respectively, can be expressed as follows: indicating that the mean number of customers in each node is independent of each other.In our model, the processes Proc1 and Proc2 are both modeled as an M/M/N queueing, so we can derive the mean number of messages in our model from the queue length distribution in M/M/N queueing model, which is such that Note that because we use multithreads to handle each message delivery process,   means the number of threads in sender and broker and   means the average message processing time of each thread in sender and broker.So the mean number of messages in the system (including sender and broker) is calculated as Applying Little's law, we can have the sojourn time of messages in Proc1 and Proc2 as Finally, assume the one-way network link delay as  link ; considering that before a message has been successfully received by the receiver, two times of network transmission processes are needed, and an extra message retransmission process is also needed in case failure occurs, then the total message delivery latency can be gained as

Hierarchical Timed CPN Model
In this section, we introduce our visibility-based modeling approach (VMA) for message queueing services in the cloud concerning reliability using colored Petri nets (CPN).Our approach includes two features: firstly the inherent hierarchical characteristic of timed CPN to afford model complexity

Senders Receivers
Sender RTxMsg Send_and_Recv and secondly the steps that we employed to construct the detailed submodels from the beginning [47].

Cloud queue
6.1.System Model Architecture.From a programming point of view, we construct timed CPN model in different modules and organize them in different hierarchy levels.We present a three-level hierarchical timed CPN model for CMQSs, as the model structure is shown in Figure 3.
At the top layer of the hierarchical structure is the system model, which represents an overview of system behavior; it is the layer which depicts the semanticity of message queue most clearly.However, we should notice that the double-headed arrows between sender and receiver do not indicate synchronous communication; on the contrary, they are decoupled and exchange events asynchronously due to the distributed environment.The corresponding model is shown in Figure 4.The scenario manifested here is as follows: sender computes the unique identity number of the corresponding queue, locates the queue, and puts message into the server nodes redundantly where the queue exists; similarly, a receiver needs to compute and locate the corresponding queue first when trying to retrieve messages; then a sampled server node which holds the queue returns a message to the receiver according to consistency requirement specified by the receiver.These detailed behaviors are described in submodels, and according to the consistency options discussed in the above section, the submodels can be classified into two types: out-of-order and in-order.

Detailed Submodels Description.
In Figure 5, the square with double border is called subtransition, representing an abstraction of a CPN subnet, that is, model in the lower layer of the hierarchical structure.The middle layer is a detailed expression of subtransition cloud queue in layer 1, as well as an abstraction of detailed message delivery process models in layer 3. Figure 5 exhibits the model of layer 2, which shows the major system components of CMQSs unambiguously as well as strengthening the impression of asynchronous communication of messages through the information flow.Nevertheless, this layer still provides the procedures of message flow via queues macroscopically for the fact that three subtransitions Send Proc, Broker, and Recv Proc hide the detailed behaviors.
In this paper, we describe the three kernel submodels in layer 3 step by step in the following, and here we employ the methodology introduced in [47].
Step 1 (determine model elements: places).Firstly, we have to map the system components and resources into respective elements of CPN model.Generally, hardware or software resources, such as thread, CPU, disk (IO) processes, and network, are usually modeled using tokens inside places.The number of tokens is used to show the size of available resources in the system.Logical entities such as messages, queues, and timers are also modeled using places and tokens in them.Secondly, because of the color feature provided in CPN, we need to assign a specific colorset to each place in which the tokens will be put on the same color.All the tokens in a place can be expressed as a multiset whose elements are predefined colors in colorsets.From the perspective of structure, there are two forms of colorsets: simple colorsets and compound colorsets; usually the latter are defined using the previously declared former.The colorsets used in our models are presented in Tables 1 and 2. Typical resources such as sender, receiver, thread, and visibility are modeled as simple colorsets while message, message queue, and visibility indication are belonging to compound colorsets.Take message queue as an example; the List colorset is commonly used to represent queueing structure in modeling, so we use colset Msgs = list Msg to represent the color of token in the queue that stores all messages sent by related senders.Queue operations such as enqueue, dequeue, and search can be implemented through the list operation interfaces supported by SML language.Particularly, time stamp is a way that CPN provides to indicate the resource usage time; it can only be used in timed colorsets of which the declarations are being appended by the keyword timed at the end.Tokens in timed places are timed tokens, which is identified by the charterer @ + and a number followed, indicating the specific model time at which the token is available for output transitions to use.This time stamp can also help to calculate the resources occupation time of system behavior, with the time length equal to the difference of current time stamp and the produce time stamp of the token.For instance, a token with color MsgTimeout is a timed token, indicating the specific timeout point of a message.
Step 2 (determine model elements: transitions).System behaviors that lead to state change are generally modeled as transitions, and the action of transition fire will cause tokens to disappear or emerge in the places connected with the transition.An arc from place to transition is called output arc and, conversely, input arc.Through output arc, tokens disappear in places while, through input arc, tokens emerge in places.Thus, using tokens, we can model the resources flow of system behaviors explicitly.Function is one of the features provided by CPN to model complex system workflows, objects, operations, or particular restraints.In transitions and arcs, there are special spaces to describe those functions needed in model, called code segments and expressions, respectively.Figure 6 shows the interaction model of message flows, that is, the Send Proc submodel.In this submodel, there are 13 places and 7 transitions in all.
Step 3 (analyze workloads in model).After determining which resources and processes should be included in model, we need to analyze the workloads that system would involve.In distributed computing and storage systems, CMQSs are usually used as a middleware to communicate with independent components; it receives information (i.e., messages)  from senders and distributes them to receivers.Thus, the cloud message queues workload is mainly composed of one kind of item: message.
For the workload intensity, assume the interarrival times of messages are exponentially distributed with the mean interarrival time equal to 1/, and this means the input process of messages is Poisson process with parameter .CPN Tools provides function for generating values from exponential distribution: exponential(r: real), where the mean value is equal to 1/.
In our model, each token that represents message is made up of message ID and content.Message ID is an increasing sequence to ensure both the production and the processing of messages are ordered; we simply assume it as integer number with initial number equal to 1.This is set in the model by putting the initial marking of place Msg ID in Figure 6 as 1 and increasing the marking by 1 each time of message production.Noticing that the message content and length are varied in practical systems, which leads to the unequal process time of messages, we assume in our model that the processing time of messages through system components (e.g., CPU and disk) accords with exponential distribution with parameter ; that is, the mean service time is 1/.Additionally, in order to facilitate the data collection process and performance analysis of our model, we join three time stamps into message token (see Table 1): AT (message generation time), IT (message enqueue time), and OT (message dequeue time).The example usage of these time stamps will be explained in Section 7.1.
Step 4 (construct detailed CPN model).After identifying the basic elements which should be included in the model, we can establish a full CPN model that represents various system components and includes all the principal system behaviors, so as to correctly take advantage of the performance analysis results of the model and identify the possible bottleneck that exists in the system.Certainly, just like the iterations and code review of programming used in software engineering, CPN model also needs several times of adjustments and refinements to obtain a model that addresses the initial modeling goals.message forwarding process in cloud.Notice in Figure 7, the place Queue in Broker submodel represents the logical queue of messages in distributed storage cluster in the cloud.Actually, the real contents of messages are stored in replication among multiple data server nodes to support system robustness, as discussed before, and the allowance of simultaneous nodes failure to maintain system liveness is  − 1 given the total number of server nodes equal to  (see Figure 1).

Modeling Threads Contention.
In distributed systems, application usually uses multiple threads to work simultaneously in its host to communicate or cooperate with others.
The adoption of multithreads mechanism not only enables the processes of tasks in cloud to execute at the same time, but also improves the system resource utilization by sharing.We include this multithreads communication mechanism in our model (see place Thread Pool in Figure 6 and Figure 8) and consider it as server windows with number equal to  in queueing analytical model; as a result, the message handling process can be modeled as an M/M/N queueing model as discussed in Section 5. What is worth mentioning here is that, due to the scalability mechanism supported by the cloud, we assume there are  CPUs in a server node given the number of threads equal to , so as to ensure that these threads can  Consider  1 and  2 are the number of threads in sender and broker, respectively.If  1 or  2 is equal to 1 or a very small number, there will be a long time of response delay for each concurrent message processing because of waiting for idle thread.So not only the message waiting time and the delivery time will be relatively large but also the actual interval processing time does not meet the Poisson process, which is obeyed by the interval message production time.On the contrary, if  1 or  2 is relatively very large compared with system workload, we can assure that every time when a message needs to be sent or received, an idle thread is always available for processing immediately.Consequently, the waiting and delivery time of message processing is shorter.However, if  2 ≪  1 , the queue length in broker is possible to increase sharply.Moreover, more idle threads would occupy more system resources without improving system throughput, thus decreasing system utilization.So the simulation and performance results of our model can help to analyze as well as provide suggestions for the optimal system configurations, for example, optimal threads number, in order to achieve relatively minimum mean message waiting time, minimum mean delivery delay, and the maximum system resource utilization rate.

Modeling Visibility-Timeout Mechanism.
Visibilitytimeout mechanism is adopted in commercial CMQSs such as Amazon SQS and Windows Azure Queue to manipulate the message lifecycle and it is included in our model as a unique characteristic which guarantees the reliability of system (i.e., ensuring every message is delivered despite client failure or network failure) and distinguishes our model from others.
The model of visibility-timeout mechanism is illustrated in Figure 9; it is a lower layer submodel that details the subtransition Visibility Timeout which is contained in the Broker submodel.Every time when the broker is ready to process a message and send it to a receiver, the real content of this message is still stored in queue rather than be deleted immediately; instead, the visibility of message acts as an indication of whether a message in queue is being processed or not.
The default message visibility is visible, which means this message is waiting in queue and is available to be processed and delivered.As shown in Figure 9, the place Msg Vis indicates these visible messages.The color of place Msg Vis is LIdxVis, which means the tokens in the place are in the form of (msgid, visible), and notice that this place plays as an interface to connect with the Send Proc submodel shown in Figure 6.Every time when a newly produced message has been inserted into the queue, a new token emerges in the place Msg Vis, in which the "msgid" field of the token accords with the unique sequence number of this message.Then every time when the broker delivers a message, it retrieves a message whose visibility is visible from the queue.Once the message is being processed, the visibility of this message turns into invisible, which being reflected in our model is the fire of the transition Get Msg in Broker submodel (see Figure 7), and the token (msgid, visible) is deleted and a new token (msgid, invisible) is added in place Invisible.
the broker to delete the original message  in queue after successfully delivering A) in time, the system will cancel the timer and delete all replications of  stored in cloud.Otherwise, the timer will trigger and the broker will send  again assuming that the last sending failed.In Figure 9, the place Timer models the timers set for every message and the place Delete Req models the acknowledgment messages.The transitions Cancel and Trigger model the actions of system under two opposite message delivery situations, respectively, while the detailed action codes are shown in Algorithms 1 and 2, respectively.Specially, when the timer is triggered, the token (msgid, invisible) is removed from place Invisible and the token (msgid, visible) is added again in place Msg Vis; that is, the message in queue is available to be sent again.

Modeling Queue Operations.
The queue operations are contained in Broker submodel.As mentioned above, the place Queue models the logical queue of messages in distributed storage cluster in the cloud and the type of colorset of the place is List, so queue operations like enqueue (i.e., inserting a message into queue) and dequeue (i.e., retrieving a message from queue) are implemented using list-related functions.The enqueue operation is simply realized using list connection function while the dequeue operation is more complex.This is because the real content of message is still stored in queue while the message is being processed, so we have to retrieve an available message according to the visibility indication, which is referred to as the place Msg Vis in our model.The implementation code of dequeue operation uses a recursive procedure, as shown in Algorithm 3. As discussed in previous section, we model two message consistency situations in CMQSs in this paper, that is, the inorder option and the out-of-order option.Notice that all the figures displayed above are suitable for both options except the Broker submodel (see Figure 7, which shows the graph under the out-of-order option), for the reason that there is a difference in the dequeue operation between them.
In [29], the authors suggest using another queue named queue-index to store the sequence order of messages.This queue-index queue exists in the distributed storage clusters just like the message queue except that the former is not stored in replication but rather in only one partition.So every time when delivering a message, the system visits the queueindex queue first to get the message sequence number of the oldest one before retrieving the corresponding message from the queue.Because the authors use IBM WebSphere eXtreme Scale [30] as an in-memory cache storage as well as the FIFO ObjectMap API provided by it to realize the oldest message access process, we can assume that the searching time of the oldest message through multiple server nodes is negligible.The Broker submodel under the in-order option is shown in Figure 10, in which we use the place Queue Index instead of Msg Vis to distinguish the two different models (the differences are emphasized in bold frame), and the tokens in place Queue Index are used to indicate the oldest message ID implicitly.
Under the in-order option, the dequeue operation is modeled by the transition Get Oldest Msg.Additionally, we set an occurring condition for dequeue operation, that is, a guard function for the transition, as shown in Algorithm 4. This function is set to handle the case that message with larger sequence number (newer message) arrives earlier in queue than the message with smaller sequence number (older message).In such situation, the operation has to wait for a while until the corresponding message indicated by queue-index is sent to the queue.As a result, we can infer that the message delivery time under the in-order option is certainly longer than the opposite option.We can quantitatively measure this tradeoff between message consistency and system performance (i.e., extra time delay) according to the simulation results illustrated in the next section.

Performance Analysis
Performance is often a central issue in the design, development, and configuration of systems.It is not always enough to know that systems work properly, and they must also work effectively.Performance analysis can help to evaluate existing or planned systems, to compare alternative configurations, or to find an optimal configuration of a system.Notice that all parameters used in simulation are chosen for the purpose of illustration.

Parameters Setting and Simulation Design.
In order to validate the performance of our model for CMQSs, we use MATLAB R2014a to test the experiments of the mathematical queueing model, and we compare the analytical results with the simulation outcomes of the CPN model, which is simulated using CPN Tools 4.0.We assume that the interval message production time is exponentially distributed with intensity parameter  = 30.3,which means in average every 33 milliseconds (short as ms) a message is produced.Let the service time follow exponential distribution and let the mean service rate be { 1 = 10,  2 = 10}, where  1 and  2 are for | obsBindElem = ∼1.0(5) in (6) obsBindElem bindelem (7) end Algorithm 5: Observer function of monitor Msg Delivery Latency.
senders and brokers, respectively.In other words, the mean service time for each message is 100 ms in both senders and brokers.
Speaking of CPN model, it is possible to investigate the behaviour of the modelled system using simulation, and to conduct simulation-based performance analysis [18].Because of the uncertainty of the simulation output, it is very necessary to make detailed simulation solution before the experiments are carried out, and appropriate statistical techniques are also needed to guarantee the correctness and rationality of the simulation output data.For more flexibility, in our experimental scheme, a function is designed to control the variables used in simulation: simulateConfigs (n: int), where the parameter  represents the repeat times of simulation.In this function, we can set system parameters such as mean service time and the number of threads, which facilitates the simulation process.In our simulation scheme, we design 26 monitors, which are the facilities used to collect simulation data or control the simulation execution process.For example, we use a monitor of breakpoint type to control the stop criteria of simulation, for example, setting the simulation time at 100 seconds.
Besides, to gain the performance metrics, some calculations have to be made using monitors.The monitors of data collection type are the most frequently used type.For instance, in order to calculate the whole message delivery time which is the most concerning metric for decision makers, we design the observer function for the monitor Msg Delivery Latency, as shown in Algorithm 5.This function will be executed every time when the transition Deliver (see the Recv Proc submodel) fires; that is, a message is accepted by a receiver successfully, and then the delivery time of this message will be calculated using the current system time (variable recvtime in the function) minus the message generation time (variable #AT msg in the function).Each of the data for message delivery time will be collected as output of this monitor; using these raw data, we can get the final statistics that we are interested in.

Analysis of Numerical Results.
Using the parameters set above, we present the reached numerical results in this section.Notice that we run 20 independent replications of simulation for each experiment and take their average value to make sure that the data collected during the simulation is independent and identically distributed (IID).
Figure 11 illustrates the average message delivery time with a different number of threads (i.e., server windows)  1 =  2 = 4, 5, . . ., 10, where  1 is the number in sender and  2 is the number in broker.The reason   ≥ 4,  = 1, 2, is to satisfy the constraint condition of system steady state: The results of analytical model and CPN model are shown in the graph as symbols and lines, respectively.Two different values of reliability probabilities,  = 99%,  = 88% (i.e., the probability of message loss is  = 1%,  = 12%, resp.), are used.As can be seen in Figures 11(a) and 11(b), the results obtained by solving the analytical model agree very well with those obtained by simulation under out-of-order option.
For the same input variables and the same values of reliability probabilities, Figure 12 shows the mean number of messages waiting in queue in case  1 is not equal to  2 .More specifically, Figure 12(a) shows the average waiting number when  1 is fixed with the value of 6 and  2 is increasing by 1 starting with 4; Figure 12(b) shows the results with different  1 while  2 is unchanged with the value of 6.Similar to above, the solid lines relate to different message consistency options for the simulation results while the symbols relate to the analytical results.As the number of threads increases, the mean number of messages waiting in queue decreases sharply at first and then smoothly up to the threads number of around  1 = 7 or  2 = 7, when it begins to stabilize.As could be expected, the agreement between the calculation results of mathematical model and the simulation results of CPN model is very well under out-of-order option, which validates our modeling method, that is, the VMA approach for message queueing services in the cloud with reliability guarantee using CPN.
In addition, we compare the results of performance metrics obtained between two different message consistency options, as shown in Figures 11 and 12.In both figures, the red line corresponds to the results under in-order option and the blue line corresponds to the results under out-oforder option.As can be confirmed from the graphs, either the mean message delivery time or the mean number of messages waiting in queue is larger in the case of in-order option than in the case of out-of-order option.This is because, under in-order option, the broker has to send messages according to their sequence numbers; thus, if some messages with larger sequence numbers arrive earlier than those with smaller sequence numbers, they would have to wait for a long time before they can be handled.Moreover, once some messages happen to be lost due to failure, the delivery latency of the following messages tends to be even longer.Also, it can be significantly noticed that the curves for in-order line and out-of-order line are nearly equidistant in each graph, which reveals that there is an approximate linear dependence among them.The results help us to do quantitative analysis of the tradeoff between system performance and message consistency, in which the performance of out-of-order option is about 50% higher than that of in-order option.In addition, by comparing the two graphs in Figure 12, we can notice that, in the case of in-order option (i.e., the red line), the mean number of waiting messages is stabilized at around 3000 per second in the upper graph while this number is stabilized at around 4000 per second in the bottom graph, which is relatively larger than the previous one.Thus, we can infer that, under in-order option, increasing the number of threads  in broker (i.e., the parameter  2 ) is more helpful than that in sender in decreasing the waiting number of messages as well as the message delivery time.This finding can ensure the service providers to put emphasis on the improvement of broker capability rather than sender.
We also analyze and compare the system component utilization in both senders and brokers through simulation, as shown in Figure 13.The graphs show, at a fixed number of threads, the sender utilization tends to remain the same regardless of the message consistency options or the values of reliability probabilities (i.e., the lines related to sender utilization are overlapped).However, the broker utilization rate varies a lot according to different simulation configurations.With  = 99%, increasing the number of threads in senders and brokers reduces the broker utilization, and the decreased amplitude is nearly the same as that of the sender utilization under out-of-order option (i.e., the thinner lines), while the amplitude is much larger under in-order option (i.e., the thicker lines).This is because when the broker sends messages according to their sequence numbers, the broker has to wait for an extra period of time in the case that newer messages (i.e., messages with larger sequence numbers) arrive earlier in the queue than older messages, which leads to an increased idle time.Once the older message has arrived, several messages will begin to be processed in the same time, which leads to an increase in the parallel degree of processing.As the utilization is an estimation of the percentage of system busy time, with the increased idle time and reduced busy time, the broker utilization is much lower under the in-order options.With the reliability probability  = 88%, more messages are triggered and sent again by the broker; thus the utilization of broker is higher than sender.However, under in-order option, for the same reason, the broker utilization turns to be lower than sender when the number of threads increases to a certain point, that is,   > 5,  = 1, 2. These observations can help the cloud message queueing service providers to regulate the limited resources more effectively to minimize their costs as well as maximize their commercial interests.
Moreover, we also calculate the three performance metrics under different service times while fixing the average arrival rate at 30 messages per second.The reliability probability is set as  = 99% under out-of-order consistency option and the results are shown in Figure 14.According to Figure 14(a), it can be observed that longer service time leads to longer delivery time, and for each   (i.e., the number of threads), the delivery time is linearly increasing when mean service time is less than a particular value; for example, when the number of threads is 5, the value is 100 ms while, for a number of 7, the value is 140 ms.Also, we can see that larger   can result in an improvement in the performance of delivery time when mean service time is larger than 100 ms. Figure 14(b) shows, before certain points (e.g., 80 ms for service time when   = 5, or 120 ms for service time when   = 7), increasing service time almost has no effect on the number of waiting messages, while afterwards, the number increases sharply.These turning points can help to warn the cloud service providers to pay attention to the system resource consumption and replan the resource allocation when necessary.Figure 14(c) illustrates that the sender utilization and the broker utilization both grow sharply at first and then rise smoothly as service time gets longer.Also, the relatively steady period (i.e., the utilization rate remains at around 47%) stays longer as the number of threads   increases.
We also examine the effects of message arrival rate on total delivery time, mean number of waiting messages, and components utilization, given that the mean service time is unchanged at 100 ms; that is, service rate per second is  1 =  2 = 10.Unlike the above experiment, we set the reliability probability at  = 88% for a change.Figure 15 presents the performance metrics.From Figure 15(a), it can be seen that the message delivery time increases slowly before a steep rise for any   ; this is because when the arrival rate increases, the system workload traffic (i.e., the parameter   ) increases as well.The closer   is to 1, the greater the traffic intensity will be, which leads to a surprising congestion in queue, thus further resulting in the abrupt rise of delivery time.The congestion also means a larger number of messages waiting to be processed in system with respect to increased arrival rate, as shown in Figure 15(b).Similar to the above experiment, the utilization of sender and broker which are depicted in Figure 15(c) increases as the arrival rate gets larger, except that the utilization of broker is always a little higher than the sender because of low reliability probability.

Conclusions
Cloud has been identified to have many advantages and realizing the full benefits of this new paradigm will require rethinking the way we build applications.As the progressive improvement of cloud-based message queueing services, Mathematical Problems in Engineering performance analysis has become a crucial interest for both cloud providers and cloud consumers before its universal usage in the industrial area.In this paper, we present both analytical and simulation modeling methods to analyze the performance of CMQSs with reliability guarantee.The mathematical model is presented using queueing theory and Markov process, while the simulation model is created employing colored Petri nets.Considering different message consistency requests, we develop two simulation models dealing with in-order message delivery and out-of-order delivery, respectively.We name our novel modeling approach as VMA in virtue of the visibility-timeout mechanism which is adopted in system to enhance reliability.We examine the effects of various parameters such as message arrival rate, broker service rate, message consistency options, and number of resources on the system performance metrics of interest, for example, message delivery time, waiting number of messages, and system components utilization.We also quantitatively analyze the tradeoff between system performance and message consistency.Numerical and simulation results show that the presented models are able to model the performance of CMQSs with reliability guarantee, which is not emphasized in other researches.Future work will consider other service time distributions such as general distribution or Erlang distribution, which makes the model more flexible and realistic in cloud environment, as well as increasing the diversity of service time.We would also like to model differentiated quality of service requirements for classified messages, aiming at improving the performance of CMQSs in multiple levels.

Figure 2 :
Figure 2: Queueing performance model for cloud message queueing services with reliability guarantee.

Figure 3 :
Figure 3: A three-level hierarchical structure of timed CPN model.

Figures 7
Figures 7 and 8 depict the detailed Broker submodel and Recv Proc submodel, respectively, that is, the detailed

Algorithm 4 :
Figure 10: Detailed Broker submodel with consideration of message ordering.

Figure 11 :Figure 12 :
Figure 11: Mean message delivery time with a different number of threads.

Figure 13 :
Figure 13: System components utilization with a different number of threads.