Analysing the Impact of Scaling Out SaaS Software on Response Time

When SaaS software suffers from response time degradation, scaling the deployment resources that support the operation of the software can improve the response time, but it also increases costs because of the additional resources. To save deployment-resource costs while improving response time, scaling out the SaaS software itself is an alternative approach. However, understanding how scaling out the software affects response time when deployment resources are fixed is essential for improving response time effectively. Therefore, in this paper, we propose a method for analysing the impact of scaling out software on response time. Specifically, we define the scaling-out operation of SaaS software and then leverage queueing theory to analyse the impact of this operation on response time. Based on the conclusions of the impact analysis, we further derive an algorithm that improves response time by scaling out the software without using additional deployment resources. Finally, the effectiveness of the analysis conclusions and the proposed algorithm is validated on a practical case, which indicates that the conclusions obtained in this paper can guide the scaling out of software and effectively improve response time while saving deployment resources.


The Motivation and Research Challenges.
Response time degradation is one of the software performance issues that SaaS (Software-as-a-Service) providers may encounter due to changes in the runtime environment of the software. As a new software delivery model, SaaS software is composed of a series of atomic services, which are usually deployed on distributed deployment resources and together perform the function of the software [1,2]. When response time degradation occurs, for the sake of ensuring the efficient operation of the software, SaaS providers usually solve this problem by adopting scaling techniques, for instance, scaling up the deployment resources (i.e., upgrading the configurations of current deployment resources where the software is deployed) or scaling out the deployment resources (i.e., providing more deployment resources for the software). However, that will also result in more costs due to renting additional deployment resources.
For the purpose of saving costs of deployment resources, scaling out the SaaS software (i.e., adding more instances of the services in the software) without using additional deployment resources is another approach to improving response time. Nevertheless, this may not be an easy task for SaaS providers. In such a situation, the new service instances generated by scaling out the software may have to be deployed on existing deployment resources, and resource contention will then occur between these service instances, which may have a negative impact on response time. If SaaS providers do not know appropriate strategies for scaling out software on existing deployment resources, the response time of the software may not be effectively improved, which may also increase the burden of manual operation. Therefore, how to scale out SaaS software to effectively improve response time without the cost of additional deployment resources becomes a challenging issue for SaaS providers.

The Proposed Solution.
To address this issue, our solution is to study the impact of scaling out SaaS software on response time without using additional deployment resources, so as to guide SaaS providers in scaling out software and improving response time effectively. For this purpose, a performance-analysis method is necessary. Queueing theory has been widely applied in performance analysis [3,4]. Since queueing theory is a mathematical analysis method, performance metrics of the software (for instance, response time) can be analysed in combination with deployment resources and solved quantitatively and efficiently with mathematical formulas, which supports the subsequent analysis of the impact of scaling out software on response time and the decisions for scaling-out operations [5]. Therefore, in this paper, we use queueing theory for our study. In brief, the key contributions of the paper are summarized as follows:
(i) We define the scaling-out operation of SaaS software and analyse the positive and negative influences of scaling-out operations on the response time of the software, which gives the direction for scaling out SaaS software.
(ii) According to the analysis conclusions, we put forward an algorithm for scaling out software without using additional deployment resources, which helps to effectively improve the response time of the software while saving deployment resources.
(iii) To verify the effectiveness of the analysis conclusions and the proposed algorithm, we set up experiments on a practical case.
The rest of this paper is organized as follows. Section 2 describes the background and problem statement. Section 3 introduces the process of analysing the impact of scaling out SaaS software on response time. Section 4 presents a practical case verifying the effectiveness of the analysis conclusions obtained from our study. The discussion of our study is presented in Section 5. Section 6 summarizes the related work, and our work is concluded in Section 7.

Background and Problem Statement
To facilitate the description of scaling out SaaS software and the analysis of software response time in combination with deployment resources, this section first introduces the concepts of SaaS software, deployment scheme, and deployment resources used in this paper. Based on these concepts, we then describe the problem to be addressed.

SaaS Software.
SaaS software consists of different types of atomic services connected by specific logical interaction relations. Figure 1 shows an example of SaaS software composed of five types of atomic services (S_1–S_5). For ease of explanation, we use a tree structure to describe the interaction relationship between atomic services (shown in Figure 2) [6]. In Figure 2, the nodes labelled "seq.," "branch," and "loop" represent sequential execution, branch execution, and loop execution, respectively. If a node is labelled "branch," the value on the subsequent arc is the invocation probability of the corresponding branch; if a node is labelled "loop," the value on the subsequent arc is the number of times the subsequent node is executed.
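To make the tree model concrete, the following Python sketch encodes a small composition tree and derives each atomic service's effective invocation probability q_i (for a loop node, the expected execution count). The dictionary encoding and the example tree are illustrative assumptions, not the software of Figure 1.

```python
def invocation_probabilities(node, prob=1.0, acc=None):
    """Walk the composition tree and accumulate the effective invocation
    probability (or expected execution count) of each atomic service."""
    if acc is None:
        acc = {}
    kind = node["kind"]
    if kind == "service":
        acc[node["name"]] = acc.get(node["name"], 0.0) + prob
    elif kind == "seq":                      # children run one after another
        for child in node["children"]:
            invocation_probabilities(child, prob, acc)
    elif kind == "branch":                   # one child, chosen by probability
        for p, child in node["children"]:
            invocation_probabilities(child, prob * p, acc)
    elif kind == "loop":                     # child repeated `times` times
        invocation_probabilities(node["child"], prob * node["times"], acc)
    return acc

tree = {"kind": "seq", "children": [
    {"kind": "service", "name": "S1"},
    {"kind": "branch", "children": [
        (0.3, {"kind": "service", "name": "S2"}),
        (0.7, {"kind": "service", "name": "S3"}),
    ]},
    {"kind": "loop", "times": 2, "child": {"kind": "service", "name": "S4"}},
]}

print(invocation_probabilities(tree))
# {'S1': 1.0, 'S2': 0.3, 'S3': 0.7, 'S4': 2.0}
```

The same traversal works for nested compositions, since each recursive call simply multiplies the probability (or count) accumulated so far.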

Deployment Scheme and Deployment Resources.
A deployment scheme describes the deployment resources that support the operation of the SaaS software and the allocation of its service instances to those resources. Figure 3 describes a possible deployment scheme for the SaaS software in Figure 1. The deployment resources in this scheme are four deployment nodes, marked n_1, n_2, n_3, and n_4 from top to bottom. Each node is a virtual machine (VM), and the value in parentheses is the VM type of the node. Each service in the SaaS software has one or more instances deployed on different deployment nodes. In this deployment scheme, services S_1 and S_2 have one instance each, and services S_3, S_4, and S_5 have two instances each. For instance, the instance of S_3 on n_2 is denoted as s_3,2, and the instance of S_3 on n_4 is denoted as s_3,4.

Problem Statement.
To facilitate the problem statement, first, we give the definition of the scaling-out operation of SaaS software as follows.
Definition 1 (Scaling-Out Operation of SaaS Software). The operation of scaling out SaaS software is defined as a five-tuple SO = ⟨D_init, SS, S_i, Act(s_i,t, n_t), D_new⟩, where we have the following:
D_init is the initial deployment scheme before the scaling-out operation
SS is the SaaS software
S_i is one of the services that constitute SS
s_i,t is the new instance of S_i generated by the scaling-out operation
n_t is a deployment node in D_init
Act(s_i,t, n_t) is the function of the scaling-out operation, which generates the new instance s_i,t and deploys it on n_t
D_new is the new deployment scheme generated by the scaling-out operation
According to Definition 1, the process of scaling out software can be regarded as a series of scaling-out operations. Let the response time of SS before the scaling-out operations be R_init(SS), and let the response time after them be R_new(SS). The problem and goal of this study are to analyse the impact of scaling-out operations on the response time of SaaS software and to find an effective approach that obtains a response time R_new(SS) better than R_init(SS) through scaling-out operations without additional deployment nodes.
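Definition 1 can be sketched directly in code. In the following Python sketch, a deployment scheme is modelled as a mapping from node name to the list of service-instance labels deployed on it; this encoding and all names are illustrative assumptions.

```python
from dataclasses import dataclass

# Assumed representation: node name -> list of service-instance labels.
Deployment = dict

@dataclass
class ScalingOutOperation:
    d_init: Deployment   # D_init: deployment scheme before the operation
    service: str         # S_i: the service being scaled out
    target_node: str     # n_t: the node that receives the new instance

    def act(self) -> Deployment:
        """Act(s_i,t, n_t): generate the new instance s_i,t, deploy it on
        n_t, and return the new deployment scheme D_new."""
        d_new = {node: list(insts) for node, insts in self.d_init.items()}
        d_new[self.target_node].append(f"{self.service}@{self.target_node}")
        return d_new

d_init = {"n1": ["S1@n1"], "n2": ["S3@n2", "S5@n2"]}
op = ScalingOutOperation(d_init, service="S5", target_node="n1")
d_new = op.act()
print(d_new["n1"])   # the new instance of S5 now sits on n1
```

A series of scaling-out operations is then simply a chain of such `act` calls, each taking the previous D_new as its D_init.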

Impact Analysis of Scaling Out SaaS Software on Response Time
For the purpose of analysing the impact of scaling out SaaS software on response time, an approach to evaluate SaaS software response time is needed. In this section, we first briefly introduce the evaluation method of response time based on a queueing performance model. With this evaluation method, we then focus on analysing the impact of scaling-out operations on the response time of the software. Finally, based on the analysis conclusions, we derive the algorithm of scaling out software for response time improvement. The flow of our work is shown in Figure 4.

Evaluation of SaaS Software Response Time.
Since the SaaS software consists of a series of services deployed on multiple deployment nodes, the response time of the SaaS software can be calculated by aggregating the response time of each service on each deployment node [7]. Let n_k be a deployment node whose VM type is v_nk with computing power CP(v_nk). Let S_i be any service composing the SaaS software SS, with at least one instance deployed on n_k (denoted as s_i,k). According to the Utilization Law [8], the utilization that service instance s_i,k imposes on its deployment node n_k is the product of the throughput of s_i,k and the actual time s_i,k needs to process a request on n_k:

U(n_k, s_i,k) = T(n_k, s_i,k) · TR(S_i)/CP(v_nk), with T(n_k, s_i,k) = T · q_i · p_i,k,   (1)

where T(n_k, s_i,k) is the throughput of instance s_i,k, T is the throughput of SS, q_i is the invocation probability of service S_i, p_i,k is the workload proportion of instance s_i,k (i.e., the proportion of the workload of S_i assigned to s_i,k), and TR(S_i) is the time requirement of S_i on a node with unit computing resource. Then, the utilization of n_k (denoted as U(n_k)) is the sum of the utilizations of the service instances deployed on n_k:

U(n_k) = Σ_{S_u ∈ DS(n_k)} U(n_k, s_u,k),   (2)

where DS(n_k) is the set of service types whose instances are deployed on n_k. According to existing research on queueing-based performance analysis, each deployment node n_k can be modelled as an M/M/1 queue whose service centre has CP(v_nk) computing power and serves requests under the processor-sharing (PS) discipline, with exponentially distributed request interarrival times and service time requirements [9,10].
Then, the response time of each service instance s_i,k on n_k (denoted as R(n_k, s_i,k)) can be calculated from the request arrival rate of s_i,k on n_k (denoted as λ(n_k, s_i,k)) and the average request processing rate of s_i,k on n_k (denoted as μ(n_k, s_i,k)) as follows [11]:

R(n_k, s_i,k) = 1/(μ(n_k, s_i,k) − λ(n_k, s_i,k)),   (3)

where λ(n_k, s_i,k) is approximately equal to T(n_k, s_i,k) in the balanced status, and μ(n_k, s_i,k) is obtained by dividing the amount of computing resources of n_k available to the service instance s_i,k by the time requirement of service S_i for processing a request with unit computing resource:

μ(n_k, s_i,k) = CP(v_nk) · (1 − U(n_k) + U(n_k, s_i,k))/TR(S_i).   (4)

Therefore, according to equations (1), (2), and (4), equation (3) can be expressed as follows:

R(n_k, s_i,k) = TR(S_i)/(CP(v_nk) · (1 − U(n_k) + U(n_k, s_i,k)) − T · q_i · p_i,k · TR(S_i)).   (5)

Then, the response time of service S_i (denoted as R(S_i)) can be calculated from the response times of the corresponding service instances:

R(S_i) = Σ_{n_k ∈ DN(S_i)} p_i,k · R(n_k, s_i,k),   (6)

where DN(S_i) is the set of deployment nodes on which the instances of S_i are deployed. Finally, the response time of the entire software can be calculated by aggregating the response time of each service with the corresponding invocation probability:

R(SS) = Σ_i q_i · R(S_i) = Σ_{k=1}^{N} Σ_{S_u ∈ DS(n_k)} q_u · p_u,k · R(n_k, s_u,k),   (7)

where N is the total number of deployment nodes. After scaling out the software, the response time of the software in the newly generated deployment scheme can also be calculated by equation (7). Therefore, based on equation (7), we can further analyse the relationship between scaling-out operations and the variation of response time in the next subsection.
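The evaluation above can be turned into a small runnable sketch. The formulas follow the description in this subsection (Utilization Law; M/M/1 with processor sharing, where the computing power not consumed by the other instances is assumed available to s_i,k); the data layout and the numbers are illustrative assumptions.

```python
def instance_utilization(T, q_i, p_ik, TR_i, CP_k):
    """Eq. (1): utilization of instance s_i,k on node n_k (Utilization Law):
    throughput of the instance times its per-request demand on n_k."""
    return (T * q_i * p_ik) * (TR_i / CP_k)

def node_utilization(T, instances, CP_k):
    """Eq. (2): node utilization as the sum over its instances;
    each entry of `instances` is a (q_i, p_ik, TR_i) triple."""
    return sum(instance_utilization(T, q, p, tr, CP_k) for q, p, tr in instances)

def instance_response_time(T, q_i, p_ik, TR_i, CP_k, U_k):
    """Eqs. (3)-(5): M/M/1 response time of one instance under PS."""
    U_ik = instance_utilization(T, q_i, p_ik, TR_i, CP_k)
    mu = CP_k * (1.0 - U_k + U_ik) / TR_i   # eq. (4): power left to s_i,k
    lam = T * q_i * p_ik                    # arrival rate ~ instance throughput
    assert lam < mu, "node would be saturated"
    return 1.0 / (mu - lam)                 # eq. (3)

# Example: one node (computing power 2.0) hosting two instances.
T, CP = 10.0, 2.0                             # software throughput (req/s)
insts = [(1.0, 1.0, 0.05), (0.5, 1.0, 0.08)]  # (q_i, p_ik, TR_i) per instance
U = node_utilization(T, insts, CP)
print(round(U, 2))                            # node utilization
for q, p, tr in insts:
    print(round(instance_response_time(T, q, p, tr, CP, U), 4))
```

Aggregating these per-instance values with equations (6) and (7) then yields the software-level response time.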

Analysing the Variation of Response Time with the Scaling-Out Operation.
For ease of description, the following definitions are given first.
Definition 2 (Response-Time Contribution of Service Instance). Let s i,k (deployed on deployment node n k ) be an instance of service S i in SaaS software SS, and then, the response time of s i,k is R(n k , s i,k ), which is called the response time contribution of service instance s i,k to response time of SS.
Definition 3 (Response-Time Contribution of the Deployment Node). Let DS(n_k) be the set of service types whose instances are deployed on deployment node n_k. The response time contribution of deployment node n_k (denoted as R(n_k)) is the sum of the response time contributions of the service instances on n_k:

R(n_k) = Σ_{S_u ∈ DS(n_k)} R(n_k, s_u,k).   (8)

Based on Definition 3, the response time of the SaaS software, both before and after scaling out, can be seen as an aggregation of the response time contributions of the deployment nodes in a deployment scheme. Assume that, in deployment scheme D_init, service S_i has n instances deployed on nodes n_k1, n_k2, ..., n_km, ..., n_kn, respectively (the instances are denoted as s_i,k1, s_i,k2, ..., s_i,km, ..., s_i,kn, correspondingly, and we call them the existing instances of S_i). Now, we perform a scaling-out operation according to Definition 1, so that a new instance of S_i is deployed on node n_t (denoted as s_i,t) and a new deployment scheme D_new is generated. Comparing the response time contribution of the nodes in D_new with that in D_init, only the contributions of the nodes hosting instances of S_i vary; i.e., the impact of the scaling-out operation on response time comes solely from the variation of the response time contributions of n_k1, n_k2, ..., n_km, ..., n_kn and n_t. Hence, in the following, we focus on analysing the impact of the scaling-out operation on the response time contributions of these nodes.

The Impact of the Scaling-Out Operation on the Response Time Contribution of Nodes n_k1, ..., n_km, ..., n_kn.
In deployment scheme D_init, based on equation (8), the response time contribution of any of these nodes n_km (denoted as R_init(n_km)) can be calculated as follows:

R_init(n_km) = Σ_{S_u ∈ DS(n_km)} R(n_km, s_u,km),   (9)

where S_u ranges over the service types of instances deployed on node n_km, including S_i. Separating the term of S_i, equation (9) can be expressed as follows:

R_init(n_km) = R(n_km, s_i,km) + Σ_{S_o ∈ DS(n_km)\{S_i}} R(n_km, s_o,km),   (10)

where S_o represents any service type (other than S_i) whose instance is deployed on node n_km; by equation (5), each term depends on the invocation probability q_o, the workload proportion p_o,km of the instance of S_o on n_km, and the time requirement TR(S_o). After deploying s_i,t on n_t, the number of instances of service S_i in deployment scheme D_new changes from n to n + 1, so the workload proportion of instance s_i,km on n_km decreases (we denote it as p′_i,km in D_new):

p′_i,km < p_i,km.   (11)

Therefore, similar to equation (10), in D_new the response time contribution of node n_km (denoted as R_new(n_km)) can be expressed as follows:

R_new(n_km) = R_new(n_km, s_i,km) + Σ_{S_o ∈ DS(n_km)\{S_i}} R_new(n_km, s_o,km),   (12)

where each term is given by equation (5) with p′_i,km in place of p_i,km. Comparing equation (12) with equation (10) and using equation (11), since decreasing p_i,km lowers both the arrival rate of s_i,km and the utilization U(n_km) in equation (5), every term of equation (12) is smaller than the corresponding term of equation (10), so R_new(n_km) < R_init(n_km); that is, the response time contribution of node n_km decreases after the scaling-out operation.
Similarly, we can obtain the conclusion that the response time contribution of nodes n k1 , n k2 , . . . , n km , . . . , n kn will all decrease after the scaling-out operation.
We use the example of SaaS software in Figure 1 and the initial deployment scheme in Figure 3 to further explain the abovementioned conclusion. Consider a scaling-out operation SO = ⟨D_init, SS, S_5, Act(s_5,1, n_1), D_new⟩, i.e., adding an instance s_5,1 of service S_5 and deploying it on node n_1. Since two instances of S_5 exist before the operation, namely s_5,2 on node n_2 and s_5,4 on node n_4, by the abovementioned analysis the response time contributions of nodes n_2 and n_4 decrease after the scaling-out operation.
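A small numeric check of this conclusion, assuming the workload of the scaled-out service is split evenly among its instances and each node behaves as the M/M/1-PS model of this section (all numbers are invented for the example):

```python
def contribution(load_per_instance, other_load, cp, tr):
    """Response time of one instance on a node, given the utilization the
    node's other instances impose (`other_load`), per eqs. (4)-(5)."""
    u_self = load_per_instance * tr / cp
    u_node = u_self + other_load
    mu = cp * (1.0 - u_node + u_self) / tr
    return 1.0 / (mu - load_per_instance)

cp, tr = 2.0, 0.05     # node computing power, time requirement of the service
demand = 20.0          # total request rate arriving at the service

# Before: two instances of the service, each taking half the workload.
before = contribution(demand / 2, other_load=0.3, cp=cp, tr=tr)
# After: a third instance elsewhere, so each existing instance takes a third.
after = contribution(demand / 3, other_load=0.3, cp=cp, tr=tr)

print(f"existing-node contribution: {before:.4f} -> {after:.4f}")
assert after < before   # positive influence on the nodes with existing instances
```

The drop comes purely from the lower arrival rate at each existing instance; no deployment resource was added.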

The Impact of the Scaling-Out Operation on the Response Time Contribution of Node n_t.
Based on equation (8), in deployment scheme D_init, the response time contribution of node n_t (denoted as R_init(n_t)) can be calculated as follows:

R_init(n_t) = Σ_{S_u ∈ DS(n_t)} R(n_t, s_u,t).   (13)

After deploying s_i,t on n_t, the response time contribution of node n_t in D_new (denoted as R_new(n_t)) gains the term of the new instance, and the existing terms are evaluated with the increased node utilization:

R_new(n_t) = R(n_t, s_i,t) + Σ_{S_u ∈ DS(n_t)} R(n_t, s_u,t).   (14)

Comparing equation (14) with equation (13), a positive term is added and, by equation (5), every existing term grows with the higher utilization U(n_t), so R_new(n_t) > R_init(n_t); that is, the response time contribution of node n_t increases after the scaling-out operation.
We again use the example in Figure 1 to explain this conclusion, still considering the scaling-out operation SO = ⟨D_init, SS, S_5, Act(s_5,1, n_1), D_new⟩. Since the operation adds a new instance s_5,1 to node n_1, by the abovementioned analysis the response time contribution of node n_1 increases after the operation.
To conclude, a scaling-out operation decreases the response time contribution of the nodes where instances of S_i already exist before the operation, which has a positive influence on the response time of the software, while it increases the response time contribution of the node where the new instance of S_i is deployed, which has a negative influence. If a SaaS provider can perform a series of scaling-out operations whose positive influence outweighs the negative influence, the response time of the software can be effectively improved.
In the next subsection, we will further give an approach of scaling out software for response time improvement based on the abovementioned analysis conclusions.

Response Time Improvement Based on Scaling Out Software.
According to the analysis in Section 3.2, scaling out SaaS software without additional deployment resources has both positive and negative influences on the response time of the software. Therefore, to effectively improve the response time by scaling out software, we should determine a series of appropriate scaling-out operations whose positive influence on the response time is larger than the negative influence. For ease of description, we first give the following theorem.
Theorem 1. Given a scaling-out operation SO = ⟨D_init, SS, S_i, Act(s_i,t, n_t), D_new⟩, the response time of SS is improved by SO if the judgment metric ΔJ(SO) defined in equation (15) satisfies ΔJ(SO) > 0, in which CP(v_n_km) and CP(v_n_t) are the computing powers of nodes n_km and n_t, respectively; U_D_init(n_km), U_D_init(n_t), U_D_new(n_km), and U_D_new(n_t) are the utilizations of nodes n_km and n_t in D_init and D_new, respectively; p_i,km and p′_i,km are the workload proportions of s_i,km in D_init and D_new, respectively; and p_i,t is the workload proportion of s_i,t in D_new.
Proof. Let R_init(SS) and R_new(SS) be the response times of SS in D_init and D_new, respectively. When the response time is improved after the scaling-out operation, the following inequality is satisfied:

R_new(SS) < R_init(SS).   (16)

Let R_init(n_km) and R_new(n_km) be the response time contributions of node n_km in D_init and D_new, respectively, and let R_init(n_t) and R_new(n_t) be those of node n_t. According to the analysis conclusions in Section 3.2, the response time contributions of all nodes other than n_k1, ..., n_km, ..., n_kn and n_t are constant, so inequality (16) can be transformed into

Σ_{m=1}^{n} R_new(n_km) + R_new(n_t) + R_c < Σ_{m=1}^{n} R_init(n_km) + R_init(n_t) + R_c,   (17)

where R_c represents the sum of the response time contributions of all nodes except n_k1, ..., n_km, ..., n_kn and n_t. Based on equations (2), (9), and (12)–(14), inequality (17) can be expanded into the explicit form of inequality (18) in terms of the computing powers, utilizations, and workload proportions listed above. Letting ΔJ(SO) be the difference between the right-hand side and the left-hand side of inequality (18), i.e.,

ΔJ(SO) = [Σ_{m=1}^{n} R_init(n_km) + R_init(n_t)] − [Σ_{m=1}^{n} R_new(n_km) + R_new(n_t)],

we can conclude that the response time is improved after the scaling-out operation if ΔJ(SO) > 0 is satisfied. Therefore, the theorem holds.
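Following the proof, ΔJ(SO) can be read as the net change in response-time contribution over the affected nodes; a minimal Python sketch of the feasibility check (the contribution values are invented):

```python
def delta_j(affected_before, affected_after):
    """Judgment metric for one scaling-out operation: `affected_before` and
    `affected_after` map node name -> response-time contribution in D_init
    and D_new; unaffected nodes cancel out and are omitted."""
    return sum(affected_before.values()) - sum(affected_after.values())

# Nodes n2 and n4 host existing instances; n1 receives the new instance.
before = {"n2": 0.056, "n4": 0.061, "n1": 0.030}
after  = {"n2": 0.047, "n4": 0.050, "n1": 0.045}

d = delta_j(before, after)
print(f"dJ = {d:.3f} -> {'feasible' if d > 0 else 'infeasible'}")
```

Here the decrease on n2 and n4 (the positive influence) outweighs the increase on n1 (the negative influence), so the operation would be accepted.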
We further illustrate Theorem 1 with the example in Figure 1 and the scaling-out operation SO = ⟨D_init, SS, S_5, Act(s_5,1, n_1), D_new⟩. According to Theorem 1, we can calculate the judgment metric ΔJ(SO) by equations (2) and (15). If ΔJ(SO) > 0, the scaling-out operation SO makes a positive influence on the response time improvement that is larger than the negative influence; i.e., SO is a feasible operation that can effectively improve the response time of the software. If ΔJ(SO) ≤ 0, SO cannot improve the response time of the software.
Hence, for the given SaaS software, we perform a series of scaling-out operations that satisfy Theorem 1, and the response time of the software can then be effectively improved. Based on the abovementioned analysis conclusions, Algorithm 1 presents the main process of scaling out SaaS software for improving response time.
The algorithm searches for multiple feasible scaling-out operations to constitute an operation series and finds a relatively optimal operation series through multiple iterations.
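A condensed Python sketch of this greedy search: `evaluate` stands in for the queueing-model evaluation of equation (7), and the toy model of candidate operations is an invented illustration, not the experimental case.

```python
def scale_out(deployment, candidates, evaluate, apply_op):
    """Repeatedly apply the candidate operation that most reduces the
    evaluated response time (i.e., has the largest positive influence),
    until no operation improves it; return the operation series."""
    series, current = [], deployment
    while True:
        best_op, best_rt = None, evaluate(current)
        for op in candidates:
            rt = evaluate(apply_op(current, op))
            if rt < best_rt:            # feasible operation (dJ > 0)
                best_op, best_rt = op, rt
        if best_op is None:
            return series, current
        series.append(best_op)
        current = apply_op(current, best_op)

# Toy model: response time falls with the instance count of "S5", but a
# contention penalty appears once more than three instances share the nodes.
def evaluate(dep):
    n = dep["S5"]
    return 1.0 / n + 0.1 * max(0, n - 3)

apply_op = lambda dep, op: {"S5": dep["S5"] + 1}
series, final = scale_out({"S5": 1}, ["add S5"], evaluate, apply_op)
print(len(series), final)   # stops once another instance no longer helps
```

The loop mirrors Algorithm 1's structure: construct a candidate operation, check its judgment metric, keep it if feasible, and terminate when no feasible operation remains.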

Experiments and Analysis
In this section, we conduct experiments with a practical case to verify the theoretical analysis conclusions in the previous section.

Experimental Setup.
The experimental case is a travel planning system composed of several types of services (the structure is shown in Figure 5). Service S_1 counts the most popular places. Services S_2, S_3, and S_4 recommend a set of travel places to customers according to different conditions: S_2 selects travel places within the most popular places based on the price acceptable to the customer, S_3 selects travel places based on place popularity, and S_4 considers both price and popularity. The invocation probabilities of S_2, S_3, and S_4 are 0.3, 0.3, and 0.4, respectively. Service S_5 plans the best travel route for the trip, and service S_6 arranges the luggage for customers. The deployment nodes for the experimental case are provided by an experimental environment on a server equipped with an Intel Core i5-6500 CPU @ 3.60 GHz and 16 GB of DDR4 RAM. Each node is a VM running Ubuntu Server 14.04 LTS. The initial deployment scheme (denoted as D_init) of the experimental case is shown in Figure 6, which presents the distribution of the service instances on these nodes.

Analysis and Verification of the Impact of the Scaling-Out Operation on Response Time.
In this section, based on the initial deployment scheme D_init of the experimental case, we select services in the experimental case and conduct three trials to analyse and verify the positive and negative influences of scaling-out operations on response time. In each trial, we execute a scaling-out operation by adding a new instance of S_4 or S_5 and deploying it on one of the nodes.
The specific scaling-out operations are shown in Table 1. We then obtain the positive and negative influences of each scaling-out operation with our analysis method. In addition, during the experiments, the workload of a service is distributed evenly among its instances both before and after the scaling-out operations.
The experimental results are shown in Table 2. In the table, R_init and R_new are the response times of the experimental case before and after the scaling-out operations, respectively. The positive influence is the decrease in the response time contribution of the nodes where existing instances of S_4 and S_5 are deployed, while the negative influence is the increase in the response time contribution of the nodes where the new instances of S_4 and S_5 are deployed.
As the results show, the scaling-out operation in trial 1 produces a negative influence larger than the positive influence, meaning the response time after the operation becomes longer than before, so it is not a feasible scaling-out operation. The scaling-out operations in trials 2 and 3 produce a negative influence smaller than the positive influence, so they are feasible operations for improving the response time of the experimental case. Besides, the positive influences in trials 2 and 3 are the same because they both come from node n_3, and in both trials the change in the workload proportion of instance s_5,3 deployed on n_3 is the same due to the workload-balancing strategy.
Moreover, Table 2 also presents the value of the judgment metric ΔJ of Theorem 1 for each scaling-out operation. When ΔJ > 0 (trials 2 and 3), the overall response time of the experimental case is reduced after executing the corresponding scaling-out operation, and vice versa (trial 1). Therefore, the judgment metric ΔJ correctly reflects whether the corresponding scaling-out operation can improve the response time of the experimental case, which also verifies the effectiveness of Theorem 1. Furthermore, comparing trials 2 and 3, the larger the value of ΔJ, the better the improvement in response time, which illustrates that ΔJ also reflects the degree of improvement.

Analysis and Verification of the Algorithm of Scaling Out Software.
In this section, we conduct experiments to further verify the effectiveness of the proposed algorithm in improving response time by scaling out the experimental case without additional deployment nodes. To demonstrate the effectiveness of the algorithm on the experimental case under different configurations of deployment resources, we conduct three groups of experiments. In group 1, we use the configuration of deployment resources shown in Figure 6, and based on group 1, we generate the configurations of groups 2 and 3, as shown in Table 3.
During the experiments, each time we select a type of service and perform a feasible scaling-out operation according to Theorem 1. We iterate this step until no possible scaling-out operation can improve the response time and then terminate. The first and final deployment schemes are denoted as D_init and D_final. To compare the effectiveness of our algorithm across the groups, for each group we record the response times of the experimental case under D_init and D_final, including the response times obtained from our analysis method (denoted as R_a(D_init) and R_a(D_final), respectively) and the response times measured in the experiments (denoted as R_m(D_init) and R_m(D_final), respectively).

Algorithm 1: scaling out SaaS software for response time improvement.
Input: SaaS software SS, initial deployment scheme D_init
Output: a series of scaling-out operations SO_Series = [SO_1, ..., SO_l, ..., SO_x], which makes the response time of SS better than in D_init
(1) Select a type of service S_i in SS
...
(7) Construct a scaling-out operation SO_l = ⟨D_old, SS, S_i, Act(s_i,t, n_t), D_new⟩ /* the operation adds an instance s_i,t for S_i and deploys it on node n_t */
(8) Calculate the judgment metric ΔJ(SO_l) for SO_l /* using equations (2) and (15) */
(9) if ΔJ(SO_l) > 0 do
(10) SO_Series_tmp[l] ← SO_l /* add the operation SO_l to the operation series SO_Series_tmp */
(11) l ← l + 1
(12) D_old ← D_new
(13) end if
(14) if there are other possible scaling-out operations do
(15) operationSearchFlag ← true
(16) else
(17) operationSearchFlag ← false
(18) end if
(19) end while
(20) Calculate R(SS) after executing the operation series SO_Series_tmp /* using equation (7) */
...
SO_Series ← SO_Series_tmp /* update the optimal operation series SO_Series with the current series SO_Series_tmp */

The improvement degree of response time in the three experimental groups is 45.5%, 20.3%, and 27.4%, respectively.
The average relative error of R_a compared with R_m (calculated by (|R_a − R_m|)/R_m · 100%) is generally less than 8%, and according to the experimental results this accuracy is acceptable for finding feasible scaling-out operations to improve the response time of the experimental case. Specifically, when the initial response time of the experimental case is relatively long (groups 1 and 3), the algorithm brings a relatively large improvement in response time. Even though the initial response time in group 2 is not as long as in the other two groups, the response time is still further improved by scaling out the experimental case. Therefore, the experiments illustrate that the conclusions of the impact analysis and the proposed algorithm can help effectively improve response time by scaling out software under different configurations of deployment resources, even without additional deployment nodes.
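The two measures used in this section, written out as small helpers (sample values invented):

```python
def rel_error(r_a, r_m):
    """Relative error of the analysed response time against the measured
    one: |R_a - R_m| / R_m, in percent."""
    return abs(r_a - r_m) / r_m * 100.0

def improvement(r_init, r_final):
    """Improvement degree of response time from D_init to D_final,
    in percent."""
    return (r_init - r_final) / r_init * 100.0

print(round(rel_error(0.92, 1.00), 1))    # model vs measurement, e.g. 8.0 (%)
print(round(improvement(1.10, 0.60), 1))  # improvement after scaling out (%)
```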

Analysis of the Statistical Significance of Our Method in Improving Response Time.
To verify whether the effectiveness of our method in improving the response time of the experimental case is statistically significant, in this section we conduct additional simulation experiments and perform statistical hypothesis testing for a more extensive evaluation. Specifically, based on the experimental case in Figure 5, we conduct three groups of experiments with three different numbers of initial service instances. In each group, we also vary the invocation probabilities of services (three settings), the request arrival rates (two settings), and the deployment configurations (two settings); i.e., we conduct a total of 36 different experiments.
During the experiments, according to the response times before and after executing the scaling-out operations found by our method, we conduct a paired t-test to verify whether the response times after executing the scaling-out operations differ significantly from the initial response times. We consider the following two hypotheses:
(i) H0: there is no difference between the response times after executing the scaling-out operations and the initial response times
(ii) H1: there is a difference between the response times after executing the scaling-out operations and the initial response times
The results of the paired t-test are shown in Table 4, which presents the significance level of the improved response times compared with the initial response times. Concretely, the table shows the number of tests in each group, the degrees of freedom (df), the standard deviation (SD) and the mean of the differences between the improved and initial response times, the t value, and the p value.
Generally, for a paired t-test, the significance level is defined as p < 0.05; i.e., if the p value is less than 0.05, we reject the null hypothesis (H0) and accept the alternative hypothesis (H1). As shown in Table 4, the p value in each group is less than 0.05, so H0 is rejected and H1 is accepted, which means that the differences between the improved and initial response times are significant. We can therefore conclude that the effectiveness of our method in improving the response time of the experimental case is statistically significant.
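The paired t-test itself can be computed without a statistics package; the following sketch derives the t statistic and degrees of freedom from invented before/after samples (`scipy.stats.ttest_rel` yields the same t value):

```python
import math

def paired_t(before, after):
    """Paired t-test: t statistic and degrees of freedom for matched
    samples, using the sample variance of the pairwise differences."""
    d = [b - a for b, a in zip(before, after)]
    n = len(d)
    mean = sum(d) / n
    var = sum((x - mean) ** 2 for x in d) / (n - 1)   # sample variance
    t = mean / math.sqrt(var / n)
    return t, n - 1

# Invented response times (seconds) before and after scaling out.
before = [1.20, 0.95, 1.40, 1.10, 1.32, 0.88]
after  = [0.80, 0.70, 0.95, 0.85, 0.90, 0.72]
t, df = paired_t(before, after)
print(f"t = {t:.2f}, df = {df}")   # a large t (p < 0.05) rejects H0
```

The p value is then read from the t distribution with `df` degrees of freedom; with a t value this large, it falls well below 0.05.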

Discussion
In the previous section, experiments were conducted to evaluate our study, indicating that the analysis conclusions and the proposed algorithm can effectively guide the scaling out of SaaS software and the improvement of response time. Our study is based on a queueing performance model, whose advantage is that the positive and negative influences of scaling-out operations on the response time of the software can be analysed quantitatively in combination with the deployment resources, assisting subsequent decisions on scaling out software with existing deployment resources instead of additional ones. However, we have also identified some limitations. Our current study mainly models the performance of the SaaS software itself, but the deployment nodes are also subject to performance interference factors outside the SaaS software, which may not be captured by the model and may limit the effectiveness of analysing and improving the response time of the software. In the future, we can extend our model to take these interference factors into account and further enhance the effectiveness of performance improvement.

Related Work
In the field of software performance, some works aim to analyse and improve the performance of software while taking scaling techniques into account. Vondra et al. [12] developed a new simulation tool based on the queueing model to autoscale VMs in the private cloud. Jiang et al. [13] proposed a novel scheme to autoscale cloud resources for web applications. Bouterse et al. [14] proposed a method to dynamically allocate VMs to provide reserved resources for scalable SaaS applications. These works mainly studied scaling approaches in combination with resource provisioning. Besides, El Kafhali et al. [15] presented a queueing mathematical model to estimate the number of VMs required for scaling software in order to satisfy the performance requirement. This work mainly studied the approach of adjusting the number of resources for scaling software. Some studies also considered the elasticity of resources to guarantee the performance of scalable software. Salah et al. [16] presented an analytical model that can be used to guarantee proper elasticity for cloud-hosted applications and services, in order to satisfy particular performance requirements. Ghobaei-Arani et al. [17] presented a framework called ControCity for controlling resource elasticity through buffer management and elasticity management, leveraging the learning automata technique. The aforementioned works mainly considered how to adjust the resources to meet the performance improvement requirement of scalable software. However, how to specifically adjust and scale the software itself to improve performance still needs further study.
Zhen et al. [18] proposed a method to automatically scale cloud application instances. Wada et al. [7] presented a method to estimate and optimize the performance of services in a cloud environment by leveraging queueing theory, which considered strategies for scaling cloud services. These works mainly require adjusting deployment resources while scaling software. Different from the abovementioned works, we aim to improve the response time of SaaS software by scaling out the software without additional adjustment of deployment resources. Therefore, in our work, we analysed both the positive and negative influences of scaling out SaaS software on response time and, on that foundation, proposed an algorithm for scaling out SaaS software that improves the response time effectively while saving deployment resources.

Conclusions
In this paper, we defined the scaling-out operation of SaaS software and analysed the impact of the scaling-out operation on response time by leveraging queueing theory. The experiments showed that the analysis conclusions can be used to determine whether a scaling-out operation can effectively improve the response time of the software, which can serve as the basis for making decisions about scaling out software. Based on the analysis conclusions, we further proposed an algorithm for scaling out SaaS software to improve response time. The experiments demonstrated that the response time can be reduced effectively after scaling out the software with the proposed algorithm, which further illustrated that the analysis conclusions obtained in this paper can play a guiding role in scaling out software and improving response time while saving deployment resources. In the future, we will further explore how to apply the analysis conclusions to optimize the performance of SaaS software.

Data Availability
No data were used to support this study.

Conflicts of Interest
The authors declare no conflicts of interest regarding the publication of this paper.