ACEA: A Queueing Model-Based Elastic Scaling Algorithm for Container Cluster

School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, China; The Second People’s Hospital of Nantong, Nantong 226002, China; Jiangsu High Technology Research Key Laboratory for Wireless Sensor Networks, Nanjing 210023, China; Institute of High Performance Computing and Bigdata, Nanjing University of Posts and Telecommunications, Nanjing 210023, China; Jiangsu HPC and Intelligent Processing Engineer Research Center, Nanjing 210003, China


Introduction
With the advent of the information age, information technology has been widely applied in many fields of human life, such as big data analysis in the medical field [1], and the resulting application services have grown explosively. To provide an effective running environment for these services, cloud computing platforms based on virtualization technology have emerged. Cloud computing refers to applications and services running on a distributed network using virtualized resources [2][3][4]. Virtualization is used to build virtual hosts running different operating systems on the same physical machine, while applications and services run on these virtual machines as needed [5,6]. In the early days of cloud computing [7], it was common to build cloud computing clusters from traditional virtual machine clusters. As Docker container technology [8,9] has matured, cloud platforms have increasingly been built as Docker clusters that integrate multiple Docker physical nodes [10]. Compared with the traditional virtualization architecture [11][12][13][14], containers offer low resource consumption, fast startup, high deployment efficiency, and good scalability, which helps ensure the reliability and timeliness of elastic resource scaling in the cluster. However, the following problems still exist in elastic scaling:
(1) A container cluster has a large number of resource indicators with complex relationships among them, so it is difficult to analyze these relationships quantitatively. If the performance model and evaluation function [15][16][17] cannot be built reasonably from user demand, system resource consumption, and other indicators, resources are easily allocated unreasonably.
(2) The number of tasks in the Internet environment is subject to mutation; that is, task arrivals are irregular and sudden. When the elastic scaling algorithm cannot allocate resources in time according to the number of tasks, tasks may be lost because the average waiting time becomes too long, or resources may be wasted because the cluster idles when there are too few tasks.
Therefore, it is urgent to describe quantitatively the relationship among the number of tasks, the average waiting time of tasks, and the comprehensive resource utilization rate of the cluster, so that the comprehensive resource utilization rate stays at a high level while the average waiting time of tasks remains controllable. To solve this problem, this paper proposes the ACEA algorithm. Its main contributions are as follows:
(1) According to the state information of the container cluster and the characteristics of the applications it carries, a self-defined QoS constraint relationship and feasible solution space are built to provide computing constraints for elastic resource scaling.
(2) The paper studies the problem of elastic resource scaling in container clusters, introduces a method for calculating the effective arrival rate of tasks, and proposes an adaptive elastic capacity expansion framework and performance model based on a queuing model. It uses M/M/s/K to describe the relationship among the number of tasks, the average waiting time of tasks, and the comprehensive utilization of cluster resources, which solves the problem of building the performance model and evaluation function of a container cluster.
(3) Taking the cluster evaluation function as the fitness function, the particle swarm optimization algorithm searches the feasible solution space. This improves the dynamic optimization and convergence timeliness of the ACEA algorithm and improves the accuracy of elastic resource scaling while keeping convergence timely.
The structure of this paper is as follows. Section 2 summarizes the research work on the elastic scaling of container clusters. Section 3 discusses the overall design of the ACEA algorithm. Section 4 discusses the M/M/s/K performance model design and adaptive scaling strategy of the ACEA algorithm. Section 5 validates the overall design of Section 3 and the performance model and cluster scheduling strategy of Section 4 through experiments. Section 6 presents the conclusions.

Related Work
At present, research mainly focuses on providing appropriate resources for tasks under the premise of ensuring the shortest task execution time [18][19][20]. For example, [18] proposes a cloud environment task scheduling algorithm based on the multipriority queue and memory algorithm (MPQMA); its basic idea is to improve the convergence speed by exploiting the advantages of MA. [19] provides a comprehensive multiobjective optimization task scheduling model to minimize execution time, delivery time, and execution cost. However, the objective function parameters of this model conflict with one another, and multiobjective optimization may suffer from timeliness issues. [20] proposes a method for automatically testing the entire cloud environment using containers, which serves as the foundation of distributed cloud monitoring. In [21], cluster elastic scaling techniques are divided into reaction scaling and prediction scaling. Reaction scaling refers to scaling a cluster dynamically when a burst of task requests occurs. Prediction scaling refers to predicting the task size from historical data with prediction algorithms and scaling dynamically before the change takes place. [22] proposes a layer-by-layer elastic scaling technique for applications. [23] discusses the application of optimization algorithms in load balancing and elastic scheduling. [24] proposes the application of the integrated MOPSO algorithm in task scheduling and optimizes the total task time and average task time. [25][26][27][28] introduce three resource scheduling strategies provided by Docker Swarm: the spread strategy, the binpack strategy, and the random strategy. The spread strategy is the default: Docker Swarm prefers the nodes with the fewest resources in use (such as CPU and memory) to ensure uniform use of all node resources in the cluster.
The binpack strategy is the opposite of the spread strategy: its purpose is to fill one node as much as possible so that enough nodes remain idle. The random strategy assigns each task to an existing node completely at random. Based on theoretical analysis, the corresponding advantages and disadvantages of these scheduling algorithms are discussed. Among them, [25,26] introduce Docker Swarm, the most widely used Docker cluster management tool, which provides the spread strategy as the default scheduling strategy. Docker Swarm selects the node with the least resource consumption according to the node's number of CPU cores and unallocated memory. [28] proposes a task scheduling technique based on the genetic algorithm, which effectively allocates cloud computing resources and minimizes the overall response time. [29][30][31] detail the container orchestration tool Kubernetes and its elastic scaling function. The elastic scaling of Kubernetes dynamically adjusts the number of Pod copies in order to scale the containers according to the number of tasks. The status of the Pods is queried periodically to obtain their monitoring data, and the average usage rate of the existing Pods is then compared with the target usage rate to determine how many Pods to add or remove. The fuzzy system is known for its good balance between approximation accuracy and interpretability. [32] uses the fuzzy system for data preprocessing to improve the accuracy of the algorithm. This paper considers whether the above ideas can be borrowed to classify and handle tasks.
In summary, the existing dynamic scaling algorithms for container clusters mainly provide resources for tasks under the premise of ensuring the minimum task execution time [18][19][20]. They do not quantitatively consider the relationship among sudden changes in the number of tasks, the average waiting time of tasks, and the comprehensive resource utilization rate of the cluster in the Internet environment.

Design of ACEA Algorithm
The overall design is shown in Figure 1. The algorithm ACEA consists of three modules: the information collector, M/M/s/K performance model, and cluster scheduler. The information collector is used to obtain the status of the current cluster and provide input data for M/M/s/K; the M/M/s/K performance model is the core of ACEA, which can be divided into three functional components: the QoS constraint verifier, M/M/s/K modeler, and PSO. The module mainly completes the construction of the cluster performance model, evaluation function and QoS constraints, verification of QoS constraints, and dynamic optimization and provides input data for the cluster scheduler. The cluster scheduler completes the specific cluster scheduling function according to the optimal number of containers, and the cluster state provided by the M/M/s/K performance model. The three modules of the ACEA algorithm are executed in sequence and form a closed loop.

The Process of ACEA
(1) The container cluster is responsible for receiving and processing tasks and is the object on which the algorithm performs elastic scaling.
(2) The information collector is responsible for obtaining the current status of the cluster, such as the CPU resources $R_{\mathrm{usedCPU}}$, memory resources $R_{\mathrm{usedMEM}}$, IO resources $R_{\mathrm{usedIO}}$, and network resources $R_{\mathrm{usedNET}}$ used by each container, as well as the number of tasks $n$.
(4) First, the QoS constraint verifier constructs the QoS constraints. Then, under these constraints, the cluster evaluation function from step (3) is used as the fitness function to solve the dynamic optimization problem. This yields the optimal number of containers for the current cluster and thereby dynamically optimizes the performance of the cluster (see Sections 3.2.2 and 3.2.3 for details).
(5) After the cluster scheduler obtains the optimal number of containers output by the M/M/s/K performance model module, it determines the cluster scheduling strategy according to the optimal number of containers and the state of the cluster.
As shown in Figure 2, the ACEA algorithm likens tasks and the containers providing services to customers and servers, respectively. When the processing capability of the container cluster cannot meet the QoS constraints, the number of containers in the cluster can be adjusted dynamically according to the number of tasks to increase the service processing capability and thus supply resources flexibly.
The M/M/s/K modeler uses the principle of Figure 2 to quantitatively describe the relationships among the number of tasks, the average waiting time of tasks, and the comprehensive resource utilization rate of the cluster using the hybrid multiserver queuing model M/M/s/K, thereby solving the problem of constructing the cluster performance model and evaluation function. The evaluation function is used as the fitness function of dynamic optimization. The reasons for choosing the queuing model M/M/s/K are as follows [33]:
(1) Task arrivals and processing times have relatively stable frequencies, and tasks are discrete and independent, which satisfies the conditions of the exponential distribution.
(2) There is an upper limit on the number of tasks that the servers can handle, which is consistent with the concept of "system space" in the queuing model. The upper limit is defined as $K$.
(3) When the number of tasks to be processed in the servers reaches $K$, a newly arrived task cannot be processed effectively in accordance with the quality standard, which results in the loss of the task. This is consistent with the queuing-model principle that "when K locations have been occupied by customers, the newly arrived customers leave automatically."
(4) When the number of tasks to be processed in the system is lower than $K$, a newly arrived task enters the queue and waits for service, which is consistent with the queuing-model principle that "newly arrived customers enter the system to wait in line when the system has a free position."
3.2.2. QoS Constraint Verifier. The QoS constraint verifier has two main functions: one is to construct the QoS constraints, and the other is to determine the solution space of the algorithm according to the QoS constraints.
QoS constraints define the relationships among the number of tasks, the average waiting time, the comprehensive resource utilization rate of the cluster, and the cluster running indicators, and they are the basis for constructing the QoS constraint verifier. They specifically include the following:
(1) Maximum number of running containers: the maximum number of containers that the cluster hardware resources can support is denoted by $K$. In a production environment, the number of containers running in the cluster, $s$, should be less than this maximum; otherwise, containers cannot be started because the cluster hardware resources are insufficient. Hence $s < K$.
(2) Average waiting time of tasks: the mathematical expectation of the maximum waiting time that the user can tolerate, from the time the task is issued to the time the cluster starts responding. If the waiting time exceeds this value, the task will be lost. This paper assumes that the average waiting time of tasks does not exceed 50 ms, that is, $W_q \le 50$ ms.

Wireless Communications and Mobile Computing
(3) If the number of containers is greater than the number of tasks, there are idle containers, which wastes cluster resources. If $n > K$, the cluster is overloaded with tasks and cannot process the additional tasks effectively, so these tasks are lost. Therefore, $s < n \le K$ must hold.
(4) Threshold constraint: define $f^{i}_{\mathrm{used}}$ as the weighted sum of the resources consumed by the $i$th container in the presence of a task, and the following relationship exists.
$$f^{i}_{\mathrm{used}} = a R^{i}_{\mathrm{usedCPU}} + b R^{i}_{\mathrm{usedMEM}} + c R^{i}_{\mathrm{usedIO}} + d R^{i}_{\mathrm{usedNET}} \tag{1}$$
Define $f^{i}_{\mathrm{total}}$ as the weighted sum of the resources assigned to the $i$th container by the system, and the following relationship exists.
$$f^{i}_{\mathrm{total}} = a R^{i}_{\mathrm{totalCPU}} + b R^{i}_{\mathrm{totalMEM}} + c R^{i}_{\mathrm{totalIO}} + d R^{i}_{\mathrm{totalNET}} \tag{2}$$
$R^{i}_{\mathrm{usedCPU}}$ represents the CPU resources used by the $i$th container, and $R^{i}_{\mathrm{totalCPU}}$ indicates the total CPU resources allocated by the system to the $i$th container. The other resources are defined similarly. $a$, $b$, $c$, and $d$ are the weights of each resource in the total resources; they are determined by the attributes of the tasks processed by the container and satisfy $a + b + c + d = 1$. This paper assumes $a = b = c = d = 0.25$.
Define $U_i$ as the comprehensive resource utilization rate of a single container, derived from Equations (1) and (2).
$$U_i = \frac{f^{i}_{\mathrm{used}}}{f^{i}_{\mathrm{total}}} \tag{3}$$
$U_{\mathrm{down}}$ is defined as the lower limit of the comprehensive resource utilization rate of the cluster. If the utilization rate is lower than this value, the cluster needs to shrink. $U_{\mathrm{up}}$ is defined as the upper limit of the comprehensive resource utilization rate of the cluster. If the utilization rate is higher than this value, the cluster needs to expand. According to the requirements for the comprehensive utilization of the cluster, the following constraint is obtained from Equation (3): the comprehensive resource utilization rate of the cluster must lie between $U_{\mathrm{down}}$ and $U_{\mathrm{up}}$ (Equation (4)).
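The threshold constraint can be illustrated with a short sketch. It assumes that the weighted sums take the natural forms $f^{i}_{\mathrm{used}} = a R^{i}_{\mathrm{usedCPU}} + b R^{i}_{\mathrm{usedMEM}} + c R^{i}_{\mathrm{usedIO}} + d R^{i}_{\mathrm{usedNET}}$ (and likewise for $f^{i}_{\mathrm{total}}$) and that $U_i = f^{i}_{\mathrm{used}} / f^{i}_{\mathrm{total}}$. The cluster-level aggregate (ratio of summed usage to summed allocation), the class and function names, and the use of normalized resource values are assumptions made for illustration only.

```python
from dataclasses import dataclass

# Weights (a, b, c, d); the paper assumes a = b = c = d = 0.25.
WEIGHTS = (0.25, 0.25, 0.25, 0.25)


@dataclass
class ContainerSample:
    """Hypothetical record: resources ordered as (CPU, memory, IO, network)."""
    used: tuple
    total: tuple


def weighted_sum(values, weights=WEIGHTS):
    # Assumed form of the weighted sums: a*CPU + b*MEM + c*IO + d*NET.
    return sum(w * v for w, v in zip(weights, values))


def container_utilization(sample):
    # Assumed form of U_i: weighted usage divided by weighted allocation.
    return weighted_sum(sample.used) / weighted_sum(sample.total)


def cluster_utilization(samples):
    # Assumption: the cluster-level rate is the ratio of summed weighted usage
    # to summed weighted allocation; the text does not spell this out.
    used = sum(weighted_sum(s.used) for s in samples)
    total = sum(weighted_sum(s.total) for s in samples)
    return used / total


def within_thresholds(u, u_down=0.55, u_up=0.80):
    # Threshold constraint: U_down <= U <= U_up (55% and 80% in the experiments).
    return u_down <= u <= u_up
```

A collector that samples each container periodically could pass its readings to `cluster_utilization` and use `within_thresholds` to decide whether any scaling check is needed.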
3.2.3. Particle Swarm Optimization. The particle swarm optimization (PSO) algorithm relies on individuals sharing information within the swarm, so that the swarm evolves from disorder to order in the solution space and obtains the optimal solution. Due to its simple operation and fast convergence, PSO has been widely used in many fields such as function optimization, image processing, and geodetic survey [24]. The reasons for choosing PSO are as follows:
(1) PSO has many mature applications in function optimization.
(2) PSO has a fast convergence rate and meets the timeliness requirements of the algorithm.
(3) PSO is easy to operate, which improves computational efficiency.
In the dynamic optimization, PSO uses the cluster evaluation function built by the M/M/s/K modeler as the fitness function and searches the solution space determined by the QoS constraint verifier for the optimal solution. The optimal solution is provided as input data to the cluster scheduler.
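A minimal sketch of such a search is given below. It is a basic one-dimensional PSO over an integer container count, not the authors' implementation: the function name, the inertia and acceleration coefficients, the rounding of positions to integers, and the convention of minimizing the fitness are all assumptions. In ACEA the fitness would be the cluster evaluation function and the bounds would come from the QoS constraint verifier.

```python
import random


def pso_minimize(fitness, lower, upper, particles=10, iterations=30,
                 inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimise fitness(s) over the integers lower..upper with a basic PSO."""
    rng = random.Random(seed)
    pos = [rng.uniform(lower, upper) for _ in range(particles)]
    vel = [0.0] * particles
    best_pos = pos[:]                                  # personal best positions
    best_val = [fitness(round(p)) for p in pos]        # personal best values
    g = min(range(particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g], best_val[g]            # global best so far

    for _ in range(iterations):
        for i in range(particles):
            r1, r2 = rng.random(), rng.random()
            # Velocity update: inertia + pull to personal best + pull to global best.
            vel[i] = (inertia * vel[i]
                      + c1 * r1 * (best_pos[i] - pos[i])
                      + c2 * r2 * (g_pos - pos[i]))
            # Clamp the position so the particle stays inside the search interval.
            pos[i] = min(max(pos[i] + vel[i], lower), upper)
            val = fitness(round(pos[i]))
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i], val
                if val < g_val:
                    g_pos, g_val = pos[i], val
    return round(g_pos), g_val
```

For example, `pso_minimize(lambda s: (s - 12) ** 2, 1, 50)` searches container counts from 1 to 50 for a toy cost function. A fixed seed keeps the run reproducible, which is convenient when comparing scaling decisions across experiments.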

ACEA Performance Model and Scheduling Strategy Design
This section focuses on the design of the M/M/s/K performance model based on the overall design of ACEA and implements the processing flow and scaling conditions of each module through pseudocode.

M/M/s/K Performance Model.
According to the existing cluster state and task attributes, ACEA first obtains the distribution parameters of task arrival and processing time by statistical methods, then passes these parameters to the M/M/s/K performance model, and finally solves for the optimal expansion strategy under the QoS constraints. The task processing flow is shown in Figure 3. $L_q$ is defined as the average queue length of tasks, that is, the mathematical expectation of the number of tasks waiting to be processed in the queuing model. According to the constraints of Equations (1) and (3) in Section 3.2.2, the following relationships exist, where $\rho = \lambda/\mu$ and $\rho_s = \lambda/(s\mu)$ denotes the service intensity, which reflects how busy the system is.
$\lambda_e = \lambda(1 - p_K)$ denotes the effective arrival rate of tasks. An effective arrival rate is needed because some tasks fail to be processed properly during cluster service: tasks with probability $P_o$ cannot be handled properly, and tasks with probability $P_i$ can be handled properly. Rearranging therefore yields Equation (8).
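For orientation, the standard textbook steady-state results for an M/M/s/K queue are sketched below using the symbols defined above. This is a general reference form and is not claimed to match the paper's numbered equations term by term. Here $p_n$ denotes the steady-state probability of $n$ tasks in the system (a symbol introduced for this sketch), the closed form for $L_q$ assumes $\rho_s \neq 1$, and the last relation is Little's law applied to the queue.

```latex
\begin{align*}
p_0 &= \left[\sum_{n=0}^{s-1}\frac{\rho^{n}}{n!}
      + \frac{\rho^{s}}{s!}\sum_{n=s}^{K}\rho_s^{\,n-s}\right]^{-1}, \\
p_n &= \begin{cases}
        \dfrac{\rho^{n}}{n!}\,p_0, & 0 \le n < s, \\[1ex]
        \dfrac{\rho^{n}}{s!\,s^{\,n-s}}\,p_0, & s \le n \le K,
      \end{cases} \\
L_q &= \sum_{n=s}^{K}(n-s)\,p_n
     = \frac{p_0\,\rho^{s}\rho_s}{s!\,(1-\rho_s)^{2}}
       \left[1-\rho_s^{K-s+1}-(1-\rho_s)(K-s+1)\rho_s^{K-s}\right], \\
\lambda_e &= \lambda\,(1-p_K), \qquad W_q = \frac{L_q}{\lambda_e}.
\end{align*}
```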
Because of the QoS constraints, Equation (9) must satisfy $W_q < W_{q\max}$; that is, the following relationships hold. Equations (10)-(13) show that $W_q$ depends only on $s$, $K$, $\lambda$, and $\mu$. Since $\lambda$, $\mu$, and $K$ are relatively independent, every $s$ that conforms to the QoS constraints of Section 3.2.2 can be regarded as a feasible solution, and the set of all feasible solutions is the solution space. The problem of obtaining the optimal cluster index therefore becomes the problem of finding the best feasible solution. For this reason, PSO is introduced to search the solution space for the optimal solution and thus achieve dynamic optimization.
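For small $K$, the feasible solution set described above can be enumerated directly, which is a useful cross-check on any heuristic search. The sketch below assumes the standard M/M/s/K steady-state probabilities and Little's law, rates expressed in tasks per second, and hypothetical function names. It is an illustration rather than the authors' implementation.

```python
from math import factorial


def mmsk_metrics(lam, mu, s, K):
    """Return (p_K, L_q, W_q) for an M/M/s/K queue with s servers and capacity K."""
    rho = lam / mu            # offered load
    rho_s = lam / (s * mu)    # service intensity per server
    # Unnormalised steady-state weights for n = 0..K tasks in the system.
    weights = [rho ** n / factorial(n) if n < s
               else rho ** s / factorial(s) * rho_s ** (n - s)
               for n in range(K + 1)]
    norm = sum(weights)
    p = [w / norm for w in weights]
    l_q = sum((n - s) * p[n] for n in range(s, K + 1))   # mean queue length
    lam_e = lam * (1 - p[K])                             # effective arrival rate
    return p[K], l_q, l_q / lam_e                        # Little's law: W_q = L_q / lam_e


def feasible_container_counts(lam, mu, K, w_q_max):
    """All container counts s < K whose mean waiting time stays below w_q_max."""
    return [s for s in range(1, K)
            if mmsk_metrics(lam, mu, s, K)[2] < w_q_max]
```

With `w_q_max=0.050` (50 ms expressed in seconds), `feasible_container_counts` returns the candidate values of $s$ among which an optimizer such as PSO would then choose according to the evaluation function. Summing the weights explicitly avoids a special case for $\rho_s = 1$.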

Adaptive Scaling Strategy Design
This section implements the overall design and cluster scheduling strategy of ACEA in pseudocode, which is shown in Algorithm 1. Line 1 defines the model parameters of the algorithm as global variables, including the distribution function parameters of task arrival and processing in the queuing model M/M/s/K, the number of particles, the maximum number of iterations, and the fitness. Line 2 defines the thresholds of the system's comprehensive resource utilization rate and the average waiting time of tasks. Line 3 defines the configuration file function of the information collection service, which is used to obtain the cluster's state parameters, including CPU usage, memory usage, IO resources, and network resources. Lines 4-5 construct the particle fitness function (the performance model evaluation function) and initialize the particles and the particle swarm algorithm according to the parameters of Line 1. In Lines 6-12, the particles search the feasible solution space for dynamic optimization. Lines 14-16 define the condition for contraction: if the average waiting time of tasks satisfies the requirement and the fitness value is lower than the lower threshold, the contraction() function is performed. Lines 17-19 define the condition for expansion: if the average waiting time of tasks satisfies the requirement and the fitness value is higher than the upper threshold, the expand() function is performed. Lines 20-22 define the condition for stability: if the average waiting time of tasks satisfies the requirement and the fitness value is between the upper and lower thresholds, no scheduling is performed, and the current results are only visualized for management. There are three main scheduling strategies for Algorithm 1.
(1) Container Contraction. The main reason for container contraction is that the number of tasks decreases or task processing completes, which lowers the monitoring indicators to different extents. The decrease of $\sum f^{i}_{\mathrm{used}}$ makes the resource utilization rate fall below the lower threshold. In this case, the container cluster needs to be reduced.
(2) Container Expansion. The main reason for container expansion is that the number of tasks increases, which raises the monitoring indicators to different extents. The increase of $\sum f^{i}_{\mathrm{used}}$ makes the resource utilization rate exceed the upper threshold or makes the average waiting time of tasks fail to meet the QoS constraints. In this case, the container cluster needs to be expanded.
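The scaling conditions described for Algorithm 1 can be condensed into a small decision function. This is a sketch under stated assumptions, not the paper's pseudocode: the function name, the string return values, and the default thresholds (50 ms, 55%, and 80%, taken from the experimental settings) are illustrative, and the waiting time is expressed in seconds.

```python
def scaling_decision(w_q, utilization, w_q_max=0.050, u_down=0.55, u_up=0.80):
    """Return 'expand', 'contract', or 'stable' for the current cluster state.

    w_q          mean waiting time of tasks in seconds
    utilization  comprehensive resource utilization rate of the cluster (0..1)
    """
    # Expansion: the waiting time violates the QoS constraint or the
    # utilization exceeds the upper threshold.
    if w_q > w_q_max or utilization > u_up:
        return "expand"
    # Contraction: the waiting time is acceptable but resources are under-used.
    if utilization < u_down:
        return "contract"
    # Stability: both indicators are within their thresholds.
    return "stable"
```

Checking the expansion condition first reflects the priority given to the waiting-time constraint: an overloaded cluster is expanded even if its measured utilization happens to be low.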

Results and Discussion
The algorithm ACEA has been prototyped in the Docker virtualized cluster. This section will verify the effectiveness of the algorithm in the case of a sudden change in the number of tasks and compare with the existing elastic scaling algorithm to verify the accuracy of the system's model. The objects for comparison are two general algorithms in the field of elastic scaling. The incremental scheduling algorithm (ISA) is an algorithm that periodically checks the state of the cluster through the polling service. When the task average waiting time or the comprehensive resource utilization rate of cluster does not meet the QoS constraints, the cluster scheduler or operation and maintenance personnel, based on historical experience, will quantitatively determine the number of containers to be adjusted in a certain interval to cope with the current task. The quantitative determination of the number of containers required to be adjusted within a certain interval is an increment.
Kubernetes HPA [29] (Kubernetes horizontal Pod autoscaling) obtains information about resource usage by periodically polling the Pod state during the operation of the container cluster, compares the average usage rate of the existing Pods with the target usage rate to determine the number of Pod copies, and finally scales by horizontally adjusting the number of Pod copies. The formula for calculating the elastic scaling of Pods is as follows.
Among them, ExpansionPods indicates the number of containers required; TargetUtilization indicates the user-defined resource usage threshold; CurrentUtilization indicates the average resource utilization of the current Pod, calculated as shown in Equation (15); Sum() is a summation function over the current utilization values; and Ceil() is the ceiling function, which returns the smallest integer greater than or equal to the specified expression.
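The replica calculation described above can be sketched as follows. Because the displayed formula is not reproduced in this text, the sketch assumes the form ExpansionPods = Ceil(Sum(CurrentUtilization) / TargetUtilization), which is consistent with the variable descriptions. The function name and the use of integer percentages are illustrative choices.

```python
from math import ceil


def hpa_desired_pods(current_utilizations, target_utilization):
    """Number of Pods needed so that the summed load fits the target utilization.

    current_utilizations  per-Pod utilization values, e.g. integer percentages
    target_utilization    user-defined threshold in the same unit
    """
    # Assumed form: ExpansionPods = Ceil(Sum(CurrentUtilization) / TargetUtilization).
    return ceil(sum(current_utilizations) / target_utilization)
```

For three Pods running at 90%, 80%, and 70% with a 60% target, `hpa_desired_pods([90, 80, 70], 60)` evaluates `ceil(240 / 60)` and returns 4. Because the rule only averages current usage, it reacts to load after the fact.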
$$\mathrm{CurrentUtilization} = \frac{\text{Average value of used resources}}{\text{Resources allocated to Pod by the system}} \tag{15}$$
5.1. Experimental Threshold. The experimental threshold settings in this section are as follows:
(1) The upper threshold of the average waiting time of tasks is set to 50 ms, that is, $W_q \le 50$ ms.
(2) The upper threshold of the comprehensive resource utilization rate of the cluster is 80%, and the lower threshold is 55%. With these thresholds, Equation (4) constrains the comprehensive resource utilization rate of the cluster to lie between 55% and 80%.
As shown in Figure 4, the number of containers in the cluster changes simultaneously when the number of tasks changes, and the trend of change is consistent with the number of tasks, which verifies the feasibility of the container cluster scheduling strategy. The following validation experiments use the experimental data to verify the performance in terms of the average waiting time of tasks and the comprehensive resource utilization rate of the cluster.
As shown in Figure 5, when the number of tasks changes, the number of containers in the cluster and the comprehensive resource utilization rate of cluster change accordingly. Corresponding to the left axis of the figure above, the number of tasks and the number of containers have the same trend. Corresponding to the right axis of the figure above, although the number of tasks that the cluster can handle per unit time, i.e., the average service rate μ, is different, the comprehensive utilization rate of cluster resources calculated by Equations (3) and (4) is always between 55% and 80%, which meets the threshold requirement of the comprehensive resource utilization rate of cluster.
As shown in Figure 6, when the number of tasks changes, the number of containers in the cluster and the average waiting time of tasks change accordingly. Corresponding to the left axis of the figure above, the number of tasks and the number of containers have the same trend. Corresponding to the right axis of the figure above, although the number of tasks that the cluster can handle per unit time, i.e., the average service rate μ, is different, the average waiting time of tasks increases synchronously and is always lower than 35 ms, which satisfies the threshold requirement of average waiting time of tasks and does not affect the effective processing of tasks.
It can be seen from the above experimental results that the average waiting time of tasks is always within 35 ms and the comprehensive resource utilization rate of the cluster is always maintained between 55% and 80%, which satisfies the thresholds set by the user. This indicates that the algorithm guarantees the comprehensive resource utilization rate of the cluster while meeting the requirement on the average waiting time of tasks.

5.3. Comparison with ISA Algorithm. The main features of the ISA algorithm are that it is easy to implement, requires no plugins, and is easy for operation and maintenance personnel to operate. Its performance compared with the ACEA algorithm is shown in Figures 7 and 8.
As shown in Figure 7, as the system runs, the number of containers in the cluster and the average waiting time of tasks change with the number of tasks. On the left axis, the number of containers in the cluster changes when the number of tasks changes, in order to meet the threshold requirements. On the right axis, the average waiting time of the ACEA algorithm is higher than that of the ISA algorithm, with a difference of less than 12 ms. Although the average waiting time of the ACEA algorithm fluctuates, it is always lower than 45 ms, which satisfies the threshold requirement on the average waiting time of tasks and does not affect the effective processing of tasks.
As shown in Figure 8, as the system runs, the comprehensive resource utilization rate and the number of containers in the cluster change when the number of tasks changes randomly. To compare the algorithms effectively, the pattern of changes in the number of tasks here is the same as in Figure 7. On the left axis, the number of containers in the cluster changes when the number of tasks changes, in order to meet the threshold requirements. On the right axis, when the running time is 250, the comprehensive resource utilization rate of the ISA algorithm is higher than that of the ACEA algorithm. The reason is that the amount of capacity expansion determined by experience, i.e., the increment, happens to match the current number of tasks. Overall, however, the comprehensive resource utilization rate of the ACEA algorithm is higher than that of the ISA algorithm and always stays between 55% and 80%, which achieves the goal of the ACEA algorithm: improving the comprehensive resource utilization rate while ensuring the average waiting time of tasks.
As shown in Figure 9, as the system runs, the number of containers in the cluster and the average waiting time of tasks change with the number of tasks. On the left axis, the number of containers in the cluster changes when the number of tasks changes, in order to meet the threshold requirements. On the right axis, the average waiting time of the ACEA algorithm is higher than that of Kubernetes HPA, with a difference of less than 15 ms. Although the average waiting time of the ACEA algorithm fluctuates, it is always lower than 45 ms, which satisfies the threshold requirement on the average waiting time of tasks and does not affect the effective processing of tasks.
As shown in Figure 10, as the system runs, the comprehensive resource utilization rate and the number of containers in the cluster change when the number of tasks changes randomly. To compare the algorithms effectively, the pattern of changes in the number of tasks here is the same as in Figure 9. On the left axis, the number of containers in the cluster changes when the number of tasks changes, in order to meet the threshold requirements. On the right axis, when the running time is 200 and 425, the comprehensive resource utilization rate of Kubernetes HPA is higher than that of the ACEA algorithm. The reason is the influence of the transmission delay of Kubernetes HPA resource monitoring and the averaging strategy of Equation (10). Overall, however, the comprehensive resource utilization rate of the ACEA algorithm is higher than that of Kubernetes HPA and always stays between 55% and 80%, which achieves the goal of the ACEA algorithm: improving the comprehensive resource utilization rate while ensuring the average waiting time of tasks.

Conclusions
This paper proposes the ACEA algorithm for balancing the average waiting time of tasks and the comprehensive resource utilization rate of the cluster in elastic scaling. The algorithm uses the hybrid multiserver queuing model M/M/s/K to build the performance model, evaluation function, and QoS constraints of a container cluster, uses a QoS constraint verifier to determine the feasible solution space, and searches this space with PSO to achieve dynamic optimization. The experimental results show that the proposed algorithm maintains the comprehensive resource utilization rate of the cluster while keeping the average waiting time of tasks within the required threshold. However, some follow-up work is still worth pursuing, including the following:
(1) To handle sudden changes in the number of tasks, a multiple QoS verification strategy or a smoothing algorithm may be introduced to avoid invalid scaling and reduce the jitter of cluster scaling.
(2) ACEA may be applied to container clusters with a microservice architecture.
(3) Some improved clustering algorithms are now available. Tasks could be clustered with such an algorithm, and the task category attribute could be used to improve the efficiency of task forwarding in load balancing.

Data Availability
The data used to support the findings of this study are included in the article.