An Efficient Load Prediction-Driven Scheduling Strategy Model in Container Cloud



Introduction
A container is a form of operating system-level virtualization that allows people to run everything from small-scale microservices to large-scale applications, affording agility and high compatibility in systematic construction [1]. Large numbers of containers must be deployed and managed in a centralized manner, and a container cloud is a cloud computing technology that provides container deployment and management services. Currently, the container cloud has entered a stage of rapid market development. In the public cloud market, containers already cover 20%-35% of virtualization applications. According to iResearch, this number will grow to 50%-75% in 2025. Furthermore, by 2025, the market size will exceed 6 billion yuan, and the container cloud market will maintain high growth [2].
In the container cloud's scheduling process, we first conduct preselection, traversing all nodes and filtering out the ones that do not meet the conditions. All nodes that meet the requirements at this stage are recorded and used as the input for the second stage. If no node meets the conditions, the Pod remains pending until some node does, and the scheduler then retries. After filtering, if multiple nodes meet the conditions, the system ranks them according to their priorities and finally selects the node with the highest priority to deploy the Pod application.
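To make the two-phase flow concrete, the following is a minimal Python sketch of filter-then-score node selection; the Node fields and the leftover-resource scoring rule are illustrative assumptions, not the actual scheduler implementation.

```python
# A minimal sketch of the filter-then-score scheduling flow described above.
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    free_cpu: float   # cores available (illustrative field)
    free_mem: float   # GiB available (illustrative field)

def schedule(pod_cpu: float, pod_mem: float, nodes: list[Node]) -> Node | None:
    # Phase 1: preselection -- filter out nodes that cannot host the Pod.
    feasible = [n for n in nodes
                if n.free_cpu >= pod_cpu and n.free_mem >= pod_mem]
    if not feasible:
        return None  # Pod stays pending; the scheduler retries later.

    # Phase 2: prioritization -- score the remaining nodes, pick the best.
    def score(n: Node) -> float:
        # Assumed scoring rule: prefer the node with the most leftover resources.
        return (n.free_cpu - pod_cpu) + (n.free_mem - pod_mem)

    return max(feasible, key=score)
```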
The container cloud platform currently faces several challenges. Firstly, predicting container load conditions is a complex and variable process that becomes increasingly challenging as time progresses. Traditional prediction methods only consider one factor, which oversimplifies the issue. Secondly, a linear prediction model requires stable and continuous time series data, which poses significant difficulties for accurately predicting load conditions. The container scheduling strategy also presents several challenges, such as larger container scales and complex dependency relationships, resulting in increased server resource consumption and reduced scheduling efficiency. Fine-grained resource scheduling is not considered, and there is a lack of distinction between coarse-grained and fine-grained resources. Additionally, maintaining a high scheduling frequency may lead to low resource utilization rates. These issues must be addressed to improve the performance and reliability of the container cloud platform.
Spurred by these deficiencies, this paper conducts reasonable modeling and accurate prediction of many heterogeneous resources to ensure that the container cloud environment operates normally, safely, and reliably and responds stably to emergencies. The contributions of this work are as follows: (1) The proposed solution utilizes the CNN-BiGRU-Attention model to develop a load prediction method that addresses the issues of traditional models and copes with complex load situations in container cloud resources. Compared with other load prediction models such as ARIMA, DBN, and CNN-LSTM, CNN-BiGRU-Attention predicts better because (a) for time series data, convolutional neural networks train better models than ARIMA and DBN; (b) BiGRU adopts a bidirectional mechanism to build a network structure with bidirectional gated recurrent units, which is more concise and trains faster than LSTM. Whereas the GRU model extracts features from the input sequence in a single direction only, the BiGRU model processes the load sequence with a pair of opposite-direction GRU models and merges the outputs of the two directions, while a two-dimensional vector transformed by the Gramian angular field preserves the temporal dependence, thus capturing features that a GRU may ignore; and (c) the attention mechanism can capture long time series dependencies that BiGRU may miss and improves the accuracy of the model. (2) The container scheduling strategy based on the load prediction model, CSSLPM, reduces the frequency of cross-server calls by combining a coarse-grained node-based and a fine-grained container-based scheduling strategy on the basis of load prediction, solves the fragmentation problem of server resources, and provides load prediction and resource optimization for the cloud container environment.
We experimentally evaluate the proposed load prediction method and container scheduling strategy utilizing cluster-trace-v2018 (a public dataset), CloudSim (open-source cloud computing simulation platform software) [3], and TrainTicket (an open-source train booking benchmark system). The corresponding results demonstrate that, compared with the traditional method, the container load prediction accuracy of the proposed method is improved by 37.4%, and the container scheduling efficiency is increased by 21.7%.
The rest of the paper is organized as follows. Section 2 explains related works, while Section 3 describes the CNN-BiGRU-Attention load prediction model and the CSSLPM container scheduling strategy in detail. Section 4 presents experimental verification of the model's accuracy and the strategy's effectiveness. Finally, Section 5 summarizes the paper and suggests future research directions.

Related Work
With the development of container technology, container scheduling has been widely researched, and appealing results have been achieved.
In industry, the widely used container orchestration systems are Mesos [4], Kubernetes [5], and Swarmkit [6], developed by different companies. Mesos provides an interface for developers to build services, where developers must override the original scheduling method and customize their own plan, giving it a higher threshold for developers. Kubernetes is an open-source project from Google. Compared with other orchestration tools, it has the most features, including automated software deployment and cluster-level scaling. However, it also has disadvantages, such as a complex architecture, a large system footprint, and difficulty in modification and operation [5]. Swarmkit is an open-source project developed by Docker in 2016. Because it belongs to the same company as Docker, it is compatible with Docker container technology. Its architecture is easy to read, and the tools are simple to learn. Nevertheless, its scheduling method is too simple to satisfy container scheduling needs, and when the number of containers increases, the existing scheduling strategies have limited applicability [6].
Various techniques are widely used for scheduling problems, including mathematical modeling, heuristics, meta-heuristics, and machine learning. Mathematical modeling involves optimizing a linear function under a set of linear constraints, with integer linear programming (ILP) being a common technique. Zhang et al. [7] proposed a linear model for the deployment of containers to servers that considered two optimization objectives, energy consumption and network cost. However, due to the complexity of the ILP formulation, it is not suitable for solving large-scale problems. Heuristics are often used to obtain solutions quickly, with DRAPS [8] being a proposed algorithm for container deployment in Docker Swarm. This algorithm selects the node for container deployment based on available resources and service demand, resulting in more efficient and balanced usage of resources compared with Swarmkit. However, this approach may lead to high network workloads. Inspired by intelligent processes and behaviors in nature, meta-heuristic algorithms have two important characteristics: selecting the fittest and adapting to the environment. Ant colony optimization [9] is an example of a meta-heuristic used for container scheduling, which can enhance resource utilization through proper load balancing. However, this algorithm considers only a few optimization objectives. For example, Li from Southwest Jiaotong University proposed a load balancing algorithm for the Kubernetes cluster [10] that considers cluster node load information, including CPU, RAM, and disk usage, but does not consider the fine granularity of resources. Machine learning-based solutions offer more intelligent scheduling decisions to improve solution accuracy and effectiveness than other heuristic algorithms. Nanda and Hacker [11] proposed a deep reinforcement learning technique for container scheduling, which outperformed the shortest-job-first and random placement algorithms. However, none of these methods considers container dependencies, so dependent containers are distributed to different servers, causing cross-server calls and increasing server load. Besides, these methods ignore the granularity of server resources and waste fragmented server resources. In this paper, we present CSSLPM, a solution that offers load balancing and resource optimization capabilities for container cloud environments. Our approach is based on training models to anticipate the state of container cloud loads and leveraging a combination of coarse-grained node-based and fine-grained container-based scheduling. Through this method, we are able to decrease the frequency of cross-server service invocations and effectively address the issue of server resource fragmentation.
For container scheduling technology, accuracy and timeliness depend not only on whether the scheduling algorithms are superior but also, to some extent, on whether the analysis of historical load data is comprehensive and whether the prediction methods are highly efficient. At present, academics have widely researched load prediction and proposed many effective methods, such as prediction algorithms based on probability and statistics, big data, and neural networks.
Among the prediction algorithms based on probability and statistics, the classical methods include the ARMA-based time series model, the AR model, and the ES model. However, these methods suffer from limited expressiveness, so some scholars have proposed improvements. For instance, Xue et al. [12] fixed the regression model and added particle filtering, but this method requires a large number of samples. Wei et al. proposed a dynamic feedback load balancing algorithm over load weights by comprehensively considering the nodes' real-time load information and their performance [13]. These two aspects ensure that nodes with better performance bear relatively more load, so the ratio of computing resources to load across cluster nodes becomes more balanced; however, the problem that complex dependencies occupy too many server resources is not considered. In addition, Monfared et al. [14] and Calheiros et al. [15] improved the prediction algorithms based on the ES and ARMA models, respectively. Specifically, these methods automatically adjust their parameters according to the actual situation, but they show shortcomings, such as limited prediction accuracy, when facing the complex and variable loads of real situations. Chen and Fang [16] proposed a load forecasting algorithm based on an analytic integrated model to achieve load forecasting based on big data analysis for medium- and long-term load time series data. Wang et al. [17] introduced a load forecasting method based on K-mode clustering to achieve load curve forecasting. However, such big data-based load prediction methods rely on massive data, and their technical difficulty is greater than that of general methods, so the span of the historical load time series cannot be too large. Neural network-based load forecasting methods can overcome the drawbacks of big data-based methods, and the forecasting process can be achieved with a smaller amount of data. For example, Islam et al. [18] implemented load prediction by combining NNs and AR models, and Qiu et al. [19] proposed a load prediction method based on RBM and DBN to achieve workload prediction for virtual machines in cloud environments. Ashraf [20] used an LSTM-RNN network in automatic scaling for load prediction of virtual resources, and Guo et al. [21] developed a type-aware prediction method that determines the current load type based on the dynamic change of the load and switches the prediction method accordingly.
We analyzed the computational complexity of the abovementioned prediction models and finally selected BiGRU to extract the load data dependencies. The ARMA model is a linear model with relatively low computational complexity due to its small number of parameters. The computational complexity of the DBN model mainly depends on the number and size of its hidden layers and the number of neurons in each layer. The computational complexity of the LSTM model depends mainly on the length of the input sequence and the size of the hidden layers. Under the same parameter scale, the ARMA model has the lowest computational complexity but poor prediction for non-stationary sequences. BiGRU has lower computational complexity than DBN and LSTM due to its simple structure and better prediction performance than DBN and LSTM for shorter input sequences, while LSTM may outperform BiGRU in some long-term dependency tasks. The differences among different scheduling strategies and load prediction methods are shown in Tables 1 and 2, respectively. In summary, existing neural network-based load prediction methods using simple NN models fail to capture the complex features in load timing data, lack consideration of the data context, and underperform in training efficiency. When service invocation, load balancing, and resource optimization become complex in the cloud environment, numerous defects are revealed, and the desired prediction accuracy cannot be achieved. Most currently used scheduling strategies suffer from additional resource overhead, resulting in wasted resources, or are too simple to meet the scheduling objectives. Therefore, this paper proposes a resource scheduling technique based on load prediction and describes and models the container resources in cloud environments in a more reasonable and diversified way, with real-time, accuracy, and scalability goals.

Load Prediction Algorithm and Container Scheduling Strategy
This section describes the proposed load prediction algorithm and container scheduling strategy in a container cloud environment, with the corresponding workflow illustrated in Figure 1.

Notes for Table 1. Large-scale problem: ability to work with large-scale scheduling problems. Low network load: ability to work under low network load. Sufficient optimization objectives: ability to optimize the scheduling process in a multi-objective manner. Dependencies: ability to consider container dependencies. Performance: the fewer the resources and time consumed by scheduling, the better the performance; more "+" means better performance. CPU utilization: ratio of the CPU consumed by the algorithm for scheduling alone to the overall CPU consumed by the algorithm; more "+" means better CPU utilization.

Load Characteristics and Load Model.
This subsection builds a load model to quantitatively describe the resource load of container clouds by extracting characteristic quantities such as CPU, memory utilization, and network latency from the container and node perspectives and by temporalizing the container cloud load data with the corresponding load calculation formulas. To optimize load prediction, it is necessary to quantify the resource load conditions, i.e., based on a comprehensive and in-depth analysis of load characteristics, quantifiable load factors are extracted from resource monitoring information and combined into a load model consisting of multi-dimensional factors. Regarding the load characteristics, Dinda from Carnegie Mellon University, USA, conducted experiments and studies on load variation and obtained a large amount of load information during long-term tracking sampling. Moreover, the author conducted statistical analysis and summarized the load characteristics, with the results reported in [22], some of which are listed in Table 3 (e.g., continuously occupied/continuously idle periods; high self-similarity, i.e., the load curve is self-similar when the Hurst exponent is high, so similar load curves may occur in different time windows; and periodic event triggers).
According to the load characteristics in Table 3, and considering that a container is the smallest service unit in a container cloud environment and a node is the smallest service unit of the container carrier, the resource load situation of the container cloud comprises container loads and node loads, whose values are modeled from a fine-grained and a coarse-grained perspective, respectively. Because of the time series correlation and high self-similarity of the load characteristics, not only the instantaneous load values but also the load values within a time sequence should be observed. Indeed, a time series data model should be established to predict the future load value sequence based on the historical load value sequence. Hence, this paper models three aspects: the container load, the node load, and the timing data.

Container Load Model.
The load of a container depends on the management kernel services of the node host operating system, as noted in [23]. Although the load average measures each service's load, there is no direct measurement method for the container load. However, given that CPU and memory are vital computing resources for containers when providing their services and that these parameters are easily measurable, their average utilization in unit time t can approximate the container's load value. It is critical to consider real-time load as an essential factor in prediction accuracy: a smaller time unit t yields a more instantaneous load value, calculated as

load_i = ω_c · avg(used_c) + ω_m · avg(used_m),    (1)

where load_i denotes the container load value in the i-th unit time, avg(*) is the mean value operator, used_c and used_m are the CPU and memory utilization sampled values of the node host in a given segment t, respectively, and ω_c and ω_m denote the weights of the CPU and memory utilization in the load model.
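As an illustration, here is a minimal Python sketch of equation (1), assuming utilization samples normalized to [0, 1] and illustrative weight values:

```python
import statistics

def container_load(cpu_samples, mem_samples, w_c=0.5, w_m=0.5):
    """Approximate container load in one unit time t (equation (1)).

    cpu_samples / mem_samples: utilization values sampled within t,
    each in [0, 1]; w_c / w_m: model weights (illustrative defaults).
    """
    return w_c * statistics.mean(cpu_samples) + w_m * statistics.mean(mem_samples)

# e.g., container_load([0.62, 0.70, 0.66], [0.41, 0.44, 0.43]) ~= 0.54
```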

Node Load Model.
Shang et al. [24] studied the node load by proposing an improved dynamic load model that selected several characteristic static physical influences. However, the non-significance and uncertainty of these metrics prevent them from reflecting the actual load of the resources in a container cloud environment. Therefore, we combine characteristic dynamic and static factors to model the node hosts' state and actual load availability. The dynamic factors comprise resource availability time h, resource request r, and resource service intensity q [25, 26]. Combining the two kinds of factors ensures that node load values can be obtained quickly and enhances the node's modeling accuracy. Assuming that the container cloud environment is a cluster of n nodes, expressed as node_1, node_2, ..., node_n, and each node has m resources, then for node_i (1 ≤ i ≤ n):

(1) Resource request r is the sum of the average number of service requests received by all nodes. Assuming node_i receives r_i requests per unit time, the resource request of the cluster R_t is expressed as

R_t = Σ_{i=1}^{n} r_i.    (2)

(2) Resource service intensity q is the ratio of the average time time_a that node_i takes to complete a service request to the average time interval time_i of the service requests, accounting for p_i, the parallel service capability of node_i:

q_i = time_a / (p_i · time_i),    (3), (4)

and the resource service intensity Q of the node cluster in unit time t is expressed as

Q = Σ_{i=1}^{n} q_i.    (5)

Based on the above, we also consider the static factors, mainly the average CPU utilization, memory, disk I/O, and network bandwidth. Hence, a dynamic weighting algorithm [27] is used to describe the nodes' load state, with the load of node_i per unit time t expressed as

load_i(t) = ω_1·c + ω_2·m + ω_3·d + ω_4·n + ω_5·R_t + ω_6·Q,    (6)

where c, m, d, and n denote the CPU, memory, disk I/O, and network bandwidth utilization of node_i per unit time t, respectively, and ω_1, ω_2, ω_3, ω_4, ω_5, and ω_6 represent the proportion of each influencing factor in the load model, with the parameter values determined by the service type provided by the container.
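A corresponding sketch of the dynamically weighted node load of equation (6); the attachment of ω_5 and ω_6 to R_t and Q, and the default weight values, are assumptions for illustration (the paper sets weights per service type):

```python
def node_load(c, m, d, n, R_t, Q, w=(0.3, 0.25, 0.1, 0.1, 0.15, 0.1)):
    """Dynamically weighted node load per unit time t (equation (6)).

    c, m, d, n: CPU, memory, disk I/O, and network bandwidth utilization;
    R_t: cluster resource request; Q: resource service intensity.
    The weight tuple here is a placeholder, not the paper's tuned values.
    """
    w1, w2, w3, w4, w5, w6 = w
    return w1 * c + w2 * m + w3 * d + w4 * n + w5 * R_t + w6 * Q
```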

Temporal Data Model.
Since the loads are correlated and highly self-similar time series, we combine the container and node load models presented in the previous subsections, with the load timing data over time expressed as

L_n = {load_1, load_2, ..., load_n},    (7)

where L_n denotes n sequence periods of container cloud environment load data, n is the sequence length, and load_i is the load value at the i-th unit time t.
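For illustration, a small helper that slices the sequence L_n into supervised (history, target) pairs for training; the window and horizon values are placeholders:

```python
def to_supervised(loads, window=16, horizon=1):
    """Turn the load sequence L_n into (history, target) training pairs.

    window: length of each input load sequence; horizon: steps ahead
    to predict. Both are illustrative hyperparameters.
    """
    X, y = [], []
    for i in range(len(loads) - window - horizon + 1):
        X.append(loads[i:i + window])          # history window
        y.append(loads[i + window + horizon - 1])  # future load value
    return X, y
```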

Load Prediction Model and Algorithm.
In this section, we detail the load prediction model. Specifically, we leverage the temporal load data obtained in the previous section as input to extract sequence features through a convolutional neural network (CNN) prediction model. We further design a bidirectional gated recurrent network layer, BiGRU, to continuously capture long-range features. After the BiGRU layer, an attention mechanism layer captures interdependencies between long-range features. Finally, a prediction processing algorithm generates the load prediction data. We rely on a generic CNN model as the basis of our proposed model framework. However, since the CNN cannot capture long-range features, it is supplemented with a BiGRU recurrent neural network to capture long-range feature sequences. Additionally, we introduce an attention weight layer to enhance the influence of important information. The proposed deep learning network preserves long-range feature sequences during model training and enables accurate feature capture from the input sequence. The proposed load prediction model comprises five parts: input layer, CNN layer, BiGRU layer, attention layer, and output layer. Regarding the model's workflow, first, the load timing data of the container cloud environment are used as the input. Then the CNN's convolution and pooling operations extract local features. After that, the BiGRU layer and the attention layer learn the changing pattern of the load timing data from the local features extracted by the CNN layer and predict the changing trend of future load conditions. Finally, the output layer provides the prediction results.
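The following PyTorch sketch shows one plausible realization of the five-part architecture; channel counts, kernel widths, and hidden sizes are assumptions rather than the paper's tuned values:

```python
import torch
import torch.nn as nn

class CNNBiGRUAttention(nn.Module):
    """Minimal sketch of the input/CNN/BiGRU/attention/output pipeline."""

    def __init__(self, window=16, hidden=40):
        super().__init__()
        self.cnn = nn.Sequential(                        # CNN layer: 2x (conv + max pool)
            nn.Conv1d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool1d(2),
        )
        self.bigru = nn.GRU(32, hidden, batch_first=True, bidirectional=True)
        self.att = nn.Linear(2 * hidden, 1)              # per-step attention score
        self.out = nn.Linear(2 * hidden, 1)              # output layer: next load value

    def forward(self, x):                                # x: (batch, window)
        h = self.cnn(x.unsqueeze(1))                     # (batch, 32, window/4)
        h = h.transpose(1, 2)                            # (batch, steps, 32)
        h, _ = self.bigru(h)                             # (batch, steps, 2*hidden)
        a = torch.softmax(self.att(h), dim=1)            # attention weights over steps
        ctx = (a * h).sum(dim=1)                         # weighted sum over time
        return self.out(ctx).squeeze(-1)                 # (batch,) predicted load
```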

CNN Layer.
The CNN layer comprises two convolutional layers, two pooling layers, and one fully connected layer. Both convolutional layers are one-dimensional, and the non-linear ReLU function is selected as the activation function. Owing to the large volatility of container cloud load time series data, and to reduce the model parameters and the risk of overfitting, maximum pooling is chosen for both pooling layers. After the two convolution and pooling operations, the original load timing data are mapped to the feature space of the hidden layer, which is then transformed by a fully connected layer to extract the feature vectors. The Sigmoid function is selected as the activation function in the fully connected layer. The output feature vector H_c of the CNN layer is therefore expressed as equations (8)-(12):

C_1 = ReLU(X ⊗ W_1 + b_1),    (8)
P_1 = max(C_1) + b_2,    (9)
C_2 = ReLU(P_1 ⊗ W_3 + b_3),    (10)
P_2 = max(C_2) + b_4,    (11)
H_c = Sigmoid(W_5 · P_2 + b_5),    (12)

where C_1 and C_2 are the outputs of convolutional layers 1 and 2, respectively, expressed as C_i = [c_1, c_2, ..., c_n], i = 1, 2; P_1 and P_2 are the outputs of pooling layers 1 and 2; W_1, W_3, and W_5 are weight matrices; b_1, ..., b_5 are the biases; ⊗ denotes the convolution operation; max(*) is the maximum pooling function; and n is the size of the CNN output layer. Finally, H_c is passed to the BiGRU layer.

BiGRU Layer.
The prediction model must capture data features over a time sequence, and the CNN has a known limitation: it cannot capture long-range features well. Recurrent neural networks combine the input of the current moment with the hidden state of the previous moment by introducing a recurrent connection between the input and the hidden layer, allowing the hidden state to be transmitted continuously in time, which can effectively capture long-sequence dependencies [28]. Therefore, we introduce the BiGRU layer (an RNN variant) in the prediction model, which extracts long-sequence dependencies from the local feature vectors output by the CNN layer. Figure 3 illustrates the BiGRU layer comprising the forward and inverse GRU networks. The GRU network mainly comprises update and reset gates: the former controls the degree to which the hidden state is remembered, and the latter controls the degree to which the previous hidden state acts on the candidate set. Together, these gates determine how the hidden state of the previous layer acts on the current hidden state; when both are zero, the output is related only to the input of the current layer.
For the GRU network, assuming the current moment is t, the model is expressed as equations (13)-(16):

r_t = σ(ω_r · [h_{t-1}, W_t] + b_r),    (13)
z_t = σ(ω_z · [h_{t-1}, W_t] + b_z),    (14)
h̃_t = tanh(ω_h · [r_t ⊙ h_{t-1}, W_t] + b_h),    (15)
h_t = (1 - z_t) ⊙ h_{t-1} + z_t ⊙ h̃_t,    (16)

where r_t is the reset gate at time t, σ is the Sigmoid activation function, ω_r, ω_z, and ω_h denote the weight matrices, · is the dot product, b_r, b_z, and b_h are the bias values, W_t is the local feature vector input at time t, z_t is the update gate at time t, h̃_t is the candidate state, and h_t is the hidden state at time t, which is also the final output vector. The BiGRU model built from the GRU network is expressed as equations (17)-(19):

→h_t = GRU(W_t, →h_{t-1}),    (17)
←h_t = GRU(W_t, ←h_{t-1}),    (18)
H_t = [→h_t, ←h_t],    (19)

where →h_t and ←h_t are the hidden states of the forward and backward GRUs, respectively, and H_t, their combination, is the output of the BiGRU layer at time t.
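As a worked example of equations (13)-(16) as reconstructed above, here is a single NumPy GRU step, assuming the weight matrices act on the concatenation of the previous hidden state and the input:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(w_r, w_z, w_h, b_r, b_z, b_h, h_prev, x_t):
    """One GRU update following equations (13)-(16)."""
    v = np.concatenate([h_prev, x_t])
    r_t = sigmoid(w_r @ v + b_r)                  # reset gate, eq. (13)
    z_t = sigmoid(w_z @ v + b_z)                  # update gate, eq. (14)
    v_r = np.concatenate([r_t * h_prev, x_t])
    h_cand = np.tanh(w_h @ v_r + b_h)             # candidate state, eq. (15)
    return (1 - z_t) * h_prev + z_t * h_cand      # hidden state, eq. (16)
```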

Attention Layer.
The attention mechanism assigns different weights to model input features, enhancing the impact of important information while avoiding information loss in long sequences. It also allows the model to capture long-range interdependent features in the sequence. In this work, the attention mechanism layer learns features and patterns of the BiGRU output data by combining multiple structures and iteratively estimating the optimum weight parameter matrix, using the weight assignment principle to calculate the probabilities of different feature vectors. Figure 4 illustrates that the input of the attention mechanism layer is the activated output vector H_t of the BiGRU layer. The probabilities corresponding to different feature vectors are computed according to the weight assignment principle, guiding the iteration that optimizes the weight parameter matrix. Assuming the current moment is t, the weight coefficients of the attention mechanism layer are expressed as equations (20)-(22):

e_t = ν · tanh(ω · h_t + b),    (20)
α_t = exp(e_t) / Σ_j exp(e_j),    (21)
s_t = Σ_t α_t · h_t,    (22)

where e_t denotes the value of the attention probability distribution determined by the output vector h_t of the BiGRU layer at moment t, ν and ω are the weight coefficients, b is the bias value, α_t is the normalized attention weight, and s_t denotes the output of the attention layer at moment t.
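Equations (20)-(22) reduce to a score-softmax-sum computation; a NumPy sketch under the reconstructed notation:

```python
import numpy as np

def attention(h, v, w, b):
    """Attention over BiGRU outputs h (shape: steps x dim), eqs. (20)-(22)."""
    e = np.array([v @ np.tanh(w @ h_t + b) for h_t in h])  # scores, eq. (20)
    alpha = np.exp(e) / np.exp(e).sum()                    # weights, eq. (21)
    return (alpha[:, None] * h).sum(axis=0)                # weighted sum, eq. (22)
```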

Load Prediction Processing Algorithm.
The CNN-BiGRU-Attention model's load prediction algorithm predicts future load data in a container cloud environment through multi-step prediction. The algorithm is divided into four parts: initializing the flag bits for load overflow and rise, calculating the load variation sequence, determining the degree of load rise, and giving the final prediction based on the load rise and fall values. Algorithm 1 presents the proposed algorithm's pseudocode.

ALGORITHM 1: Pseudocode of the predictive processing algorithm.
Input: prediction list list_pre and load thresholds ⟨threshold_low, threshold_high⟩
Output: prediction processing result ⟨list_pre[0], isExceeded, isRose, riseDegree⟩
(1) riseArea ← 0; downArea ← 0
(2) for each consecutive pair (data_a, data_b) in list_pre
(3)     if data_a greater than data_b
(4)         toAdd(downArea, data_a − data_b)
(5)     else
(6)         toAdd(riseArea, data_b − data_a)
(7)     end if
(8) end for
(9) riseDegree ← (riseArea − downArea)/(list_pre[0] × l)
(10) checkIsExceeded(⟨threshold_low, threshold_high⟩, riseDegree)
(11) isRose ← riseArea > downArea ? true : false
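A Python rendering of the recovered steps of Algorithm 1; the semantics of checkIsExceeded and the interval term l (here `unit`) are assumptions:

```python
def process_predictions(preds, thr_low, thr_high, unit=1.0):
    """Accumulate rise/fall areas over the predicted sequence, then derive
    the rise degree and the overflow/rise flags (sketch of Algorithm 1)."""
    rise_area = down_area = 0.0
    for a, b in zip(preds, preds[1:]):        # consecutive prediction pairs
        if a > b:
            down_area += a - b
        else:
            rise_area += b - a
    rise_degree = (rise_area - down_area) / (preds[0] * unit)
    # Assumed semantics: "exceeded" means the rise degree leaves the band.
    is_exceeded = not (thr_low <= rise_degree <= thr_high)
    is_rose = rise_area > down_area
    return preds[0], is_exceeded, is_rose, rise_degree
```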

Container Scheduling Strategy.
In this section, we describe the scheduling strategy and design the container scheduling strategy CSSLPM (container scheduling strategy based on load prediction model), involving coarse-grained (node) and fine-grained (container) scheduling. CSSLPM is based on the predicted load values of containers and nodes, where container-level scheduling includes the container expansion or shrinkage calculation, and node-level scheduling includes migration triggering, container selection, node selection, and container migration.
Based on the CNN-BiGRU-Attention prediction model, we design the CSSLPM architecture illustrated in Figure 5, which builds on our proposed load prediction method and uses the output scheduling demand as the input of the container scheduling strategy. Depending on this output, CSSLPM first determines whether to select a fine-grained or coarse-grained scheduling strategy, then generates scheduling triggers in the strategy group, and finally the triggers carry out the actual scheduling work in the container cloud environment. The more accurate the load prediction, the better CSSLPM can solve the problems the container cloud environment faces in load balancing and resource optimization.

Scalable Flexible Scheduling Method.
The scalable flexible scheduling method is a powerful tool for load balancing in container cloud environments. During heavy traffic periods, such as holidays, it can effectively balance the load at the node level based on predicted load values. Elastic scaling ensures that the container cloud remains responsive under sudden increases in load while also maintaining overall load balance. The scale-out elastic scheduling method consists of three steps: triggering expansion or downsizing based on current and predicted load values; calculating the appropriate replica count, using a slightly aggressive scheme for expansion and a dynamically weighted scheme for reduction, to prevent overloading; and scheduling with a cooling period to avoid constant scaling operations triggered by predicted values. Using this method, containers can avoid overload and repeated scaling problems while maintaining rapid response times during peak load periods.
Hence, we consider the load forecast value, the current resource load value, and the resource threshold, and the dynamic weighting is expressed as

f_t = ω_1 · load_p + ω_2 · load_c,    (23)

where load_p is the load prediction value, load_c is the current load value, ω_1 and ω_2 are weights, and f_t is the marker value obtained by dynamically weighting the predicted and current load values. The trigger strategy of resilient scheduling includes two cases:

(i) Expansion Trigger. If the dynamically weighted marker value exceeds the pre-defined expansion threshold, the container replica set is likely to face excessive load in the near future, and expansion should be considered to allocate additional resources.
(ii) Reduction Trigger. If the dynamically weighted marker value is lower than the pre-defined shrinkage threshold, the container replica set is likely to have excess resources in the near future, and a shrinkage operation should be considered to optimize resource allocation.
To avoid triggering multiple expansion or contraction operations in quick succession, a resilient scheduling cooling phase is designed: after each trigger, the system judges whether it is in the cooling phase, reducing the impact of load volatility. The corresponding workflow is illustrated in Figure 6.
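A sketch of the trigger decision combining equation (23) with the cooling phase; the weights, thresholds, and cooling length are deployment-specific assumptions:

```python
import time

COOL_DOWN_S = 300  # cooling period length in seconds; illustrative value

def scaling_action(load_p, load_c, w1, w2, thr_expand, thr_shrink, last_trigger):
    """Elastic scaling trigger: dynamic weighting of predicted and current
    load (equation (23)), thresholds, and a cooling phase."""
    if time.time() - last_trigger < COOL_DOWN_S:
        return "cooling"                        # suppress repeated triggers
    f_t = w1 * load_p + w2 * load_c             # dynamically weighted marker value
    if f_t > thr_expand:
        return "expand"
    if f_t < thr_shrink:
        return "shrink"
    return "hold"
```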
The key step in the entire scaling scheduling process is calculating the replica count of the containers based on the current and predicted resource values. Next, we introduce the expansion and reduction replica count calculation schemes, which provide accurate data support for flexible scheduling.
(1) Expansion Replica Count Calculation. To avoid under-expansion, we adopt a slightly aggressive strategy in the expansion process, i.e., we select the larger of the predicted and current load values for the expansion replica count calculation to reserve sufficient resources for the service. Specifically, we use the mathematical expectation method to calculate the number of replicas, expressed as

R_exp = ⌈R_cur · max(load_pre, load_cur) / load_exp⌉,    (24)

where R_cur and R_exp denote the current and desired numbers of replicas, load_pre is the load forecast value, load_cur is the current load value, and load_exp denotes the desired load value.
(2) Reduction Replica Count Calculation. The scaling operation is affected by load volatility; to reduce frequent scaling, we adopt a dynamic weighting method to determine the number of replicas [29], i.e., a distance weighting formula dynamically adjusts the weights of the predicted and current load values when calculating the desired number of replicas, minimizing the impact of load volatility. The distance weighting is expressed as equations (25) and (26):

W_pre = (load_exp − load_cur) / ((load_exp − load_pre) + (load_exp − load_cur)),    (25)
W_cur = (load_exp − load_pre) / ((load_exp − load_pre) + (load_exp − load_cur)),    (26)

where W_pre and W_cur are the prediction weight and load weight, and load_pre, load_cur, and load_exp satisfy load_exp > load_cur and load_exp > load_pre, so the value closer to the expected load receives the larger weight. The weighted resource value is then expressed as

load_w = W_pre · load_pre + W_cur · load_cur,    (27)

and the expected replica number R_exp is expressed as

R_exp = ⌈R_cur · load_w / load_exp⌉.    (28)
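The two replica count calculations of equations (24)-(28), as reconstructed above, in Python; the integer ceiling is an assumption, since replica counts are integral:

```python
import math

def expansion_replicas(r_cur, load_pre, load_cur, load_exp):
    """Equation (24): slightly aggressive expansion, taking the larger of
    the predicted and current load when sizing the replica set."""
    return math.ceil(r_cur * max(load_pre, load_cur) / load_exp)

def reduction_replicas(r_cur, load_pre, load_cur, load_exp):
    """Equations (25)-(28): distance-weight the predicted and current loads
    (closer to the expectation -> larger weight), then size the replica set
    from the weighted load value."""
    d_pre, d_cur = load_exp - load_pre, load_exp - load_cur
    w_pre = d_cur / (d_pre + d_cur)
    w_cur = d_pre / (d_pre + d_cur)
    load_w = w_pre * load_pre + w_cur * load_cur
    return math.ceil(r_cur * load_w / load_exp)
```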

Integrated Migration Scheduling Method.
We propose an integrated migration scheduling method for maintaining balanced load in container cloud environments with high-intensity services. Coarse-grained migration scheduling ensures load balance among nodes under continuous high-load pressure and continuously adjusts the load of each node.
The suggested method has four phases: container migration triggering, container selection, target node selection, and container migration. In the first phase, we trigger container migration based on the nodes' current and predicted load values. In the second phase, containers are selected for migration so that underloaded nodes can become dormant and overloaded nodes are relieved. The third phase uses a load correlation algorithm to select an optimal set of target nodes to prevent redundant work. Finally, in the container migration phase, an online pre-merge-based algorithm improves migration efficiency and guarantees the success rate.
This integrated method maintains load balance during long periods of high load while ensuring efficient container migration.
(1) Container Migration Triggering. Various resource usages, such as CPU and memory, affect a node's load and fluctuate due to the containers' diversity and dynamics. Therefore, the deployed nodes and containers must meet the following requirements when determining the load balancing conditions. Container uniqueness: each container can and will only be deployed on a particular node in the container cloud environment, expressed as

Σ_m D_n^m = 1, ∀n ∈ N, D_n^m ∈ {0, 1},    (29)

where D_n^m = 1 means container n is deployed to node m; otherwise, D_n^m = 0. Resource finiteness: when multiple containers are deployed on the same node, the total amount of resources required by all containers must not exceed the total amount of resources the node has, expressed as

Σ_n D_n^m · r_n^R ≤ R_m^R, R ∈ {CPU, MEM, Net, Disk},    (30)

where r_n^CPU, r_n^MEM, r_n^Net, and r_n^Disk denote the amounts of CPU, memory, network bandwidth, and disk I/O resources required by container n, and R_m^CPU, R_m^MEM, R_m^Net, and R_m^Disk denote the total amounts of CPU, memory, network bandwidth, and disk I/O resources owned by node m. Meanwhile, we add the predicted load data to the load balancing model proposed in [30] to evaluate all nodes in the container cloud environment. From the evaluation results of all nodes, the load and resource state of the container cloud can be expressed as equations (31) and (32):

U_avg^R = (1/l) · Σ_{j=1}^{l} U_j^R,    (31)
F = sqrt((1/l) · Σ_{j=1}^{l} (U_j^R − U_avg^R)²),    (32)

where U_j^R denotes the utilization of resource R in node j, l is the total number of nodes in the container cloud environment, U_avg^R is the average resource utilization, and F is the load balance degree of all nodes in the container cloud environment, ranging over (0, 1). The larger the value of F, the more unbalanced the resource load in the container cloud environment, and vice versa.
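A sketch of the balance degree F of equations (31)-(32) as reconstructed above, i.e., the standard deviation of per-node utilization around the cluster mean:

```python
import statistics

def balance_degree(utilizations):
    """Load balance degree F over node resource utilizations (each in
    [0, 1]); larger F means a more unbalanced cluster."""
    u_avg = statistics.fmean(utilizations)
    variance = sum((u - u_avg) ** 2 for u in utilizations) / len(utilizations)
    return variance ** 0.5
```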
Among the trigger conditions for container migration, a node load problem arises from the joint action of various resource conditions, with the migration triggering factors described as follows. First, the load prediction result determines whether the node exceeds the set threshold; if so, the node is directly added to the overloaded node set. Otherwise, based on the currently calculated load of the container cloud environment, if the load balance degree is less than the set threshold, the nodes are sorted by underload, and the corresponding nodes are added to the underloaded node set; if the load balance degree exceeds the threshold, the nodes are sorted by overload, and the corresponding nodes are added to the overloaded node set. Finally, container migration scheduling is triggered when both the underloaded and overloaded node sets are non-empty; otherwise, it is not triggered. The corresponding process is illustrated in Figure 7.
(2) Container Selection. We describe a method for selecting the containers to be migrated. A major concern in this process is selecting the object to be migrated [31]. After obtaining the set of nodes to be migrated, the corresponding containers must be selected from each node for migration. The service container state and the dependencies between containers are also important factors in the container scheduling process, and considering them effectively reduces the time containers spend calling each other. Therefore, we adopt a container selection strategy based on the container dependency model [17]. The container dependency model uses a weighted directed graph to represent the dependency invocation relationships between the containers on a node belonging to the set to be migrated, expressed as equations (33) and (34):

G = (V, E),    (33)
ω_ij = 0 if there is no call from container i to container j; ω_ij = k if there are k calls from container i to container j,    (34)

where V denotes the set of all containers on the node, E denotes the dependency relationships between all containers on the node, and ω_ij is the number of invocations from container i to container j. After the containers' dependency partitioning, we obtain k disjoint dependency neighborhoods, represented by V_i (0 < i ≤ k). The synthesis of all container invocations within neighborhood V_i is expressed as

gain(V_i) = Σ_{i,j ∈ V_i} ω_ij,    (35)

and the sum of the number of calls between all containers in neighborhood V_i and all containers in neighborhood V_k is expressed as

gain(V_i, V_k) = Σ_{i ∈ V_i, j ∈ V_k} ω_ij    (36)

(a short sketch of this gain computation is given after the selection strategy below). The container selection strategy is as follows:

(i) Node Underload. For underloaded nodes with few containers, resource utilization is low. To optimize resource allocation, these containers should be migrated separately to other nodes that are neither overloaded nor underloaded. This facilitates subsequent migration and reallocation of resources, ultimately allowing underloaded hosts to hibernate, resulting in container consolidation, improved resource utilization, and reduced energy consumption.

(ii) Node Overload. To reduce the load on overloaded nodes, containers must be filtered and selected for scheduling. This involves dividing each overloaded node's containers into dependency neighborhoods using the container dependency model. Next, several dependency neighborhoods with the smallest sum gain(V_i, V_k) are chosen from each node's dependency neighborhoods. Finally, the containers from these chosen neighborhoods are added to the set of containers to be migrated in the next step of container migration scheduling.
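As referenced above, a minimal sketch of the neighborhood call gain of equation (36); the dependency graph is represented as a dict of directed call counts, and counting both call directions is an assumption:

```python
def neighborhood_gain(calls, v_i, v_k):
    """Sum of call counts between containers of neighborhoods V_i and V_k.

    calls[(a, b)] holds the number of invocations from container a to
    container b in the weighted directed dependency graph.
    """
    return sum(calls.get((a, b), 0) + calls.get((b, a), 0)
               for a in v_i for b in v_k)
```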
(3) Target Node Selection. The traditional target node selection algorithm considers only the relationships between containers in the migration process, risking that a migration overloads the target node and triggers migration work repeatedly. Therefore, we propose a target node selection algorithm based on load correlation (NSALC), in which the Pearson correlation coefficient is introduced as the main method to calculate the load correlation between containers and nodes. In target node selection, a load correlation constraint is added, which effectively prevents migrations from overloading nodes and thus repeatedly triggering further migration. The resource histories are expressed as

{res_m^{C_a} | m = 1, 2, ..., n} and {res_m^{N_k} | m = 1, 2, ..., n},    (37)

where res_m^{C_a} denotes the load history timing data of container C_a in n time windows, r̄es^{C_a} is the mean of the load history timing data of container C_a, res_m^{N_k} denotes the load history timing data of node N_k in n time windows, and r̄es^{N_k} denotes the mean of the load history timing data of node N_k.
The load correlation between container C_a and node N_k is expressed as

Cor(C_a, N_k) = ω · |Σ_{m=1}^{n} (res_m^{C_a} − r̄es^{C_a})(res_m^{N_k} − r̄es^{N_k})| / (sqrt(Σ_{m=1}^{n} (res_m^{C_a} − r̄es^{C_a})²) · sqrt(Σ_{m=1}^{n} (res_m^{N_k} − r̄es^{N_k})²)),    (38)

where ω is the weight used to set the node selection threshold, and Cor(C_a, N_k) ∈ (0, 1). The load correlation between the set of containers to be migrated, Col_r, and node N_k for a set of n containers is expressed as

Cor(Col_r, N_k) = (1/n) · Σ_{a=1}^{n} Cor(C_a, N_k).    (39)

A node is selected when the load correlation between the set of containers to be migrated and the node exceeds the selection threshold. In summary, the pseudocode of the load correlation-based node selection algorithm (NSALC) is presented in Algorithm 2.

ALGORITHM 2: Pseudocode of NSALC.
Input: container collection to be relocated Col_r, node collection Col_N
Output: target node collection Col_N^r
(1) for each Col_r^i in Col_r
(2)     for each N_k in Col_N
(3)         for each C_a in Col_r^i
(4)             Cor(Col_r, N_k) ← Cor(Col_r, N_k) + Cor(C_a, N_k)
(5)         end for
(6)         Cor(Col_r, N_k) ← Cor(Col_r, N_k) / n
(7)         if Cor(Col_r, N_k) > 0.6
(8)             add N_k to Col_N^r
(9)         end if
(10)        sort Col_N^r by Cor(Col_r, N_k)
(11)        Col_N^r ← max(Col_N^r)
(12)    end for
(13) end for
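A compact Python sketch of the NSALC core loop: per-container Pearson correlation against each candidate node, averaged and thresholded as in Algorithm 2; the 0.6 threshold follows the pseudocode, while the weighting factor is illustrative:

```python
import statistics

def load_correlation(hist_c, hist_n, weight=1.0):
    """Weighted Pearson correlation between a container's and a node's load
    history, kept in (0, 1) via the absolute value, as the Cor term above."""
    return weight * abs(statistics.correlation(hist_c, hist_n))

def select_node(container_histories, node_histories, threshold=0.6):
    """Average each node's correlation against all containers to be migrated
    and return the best-scoring node above the threshold (or None)."""
    best, best_cor = None, threshold
    for node, hist_n in node_histories.items():
        cor = statistics.fmean(load_correlation(h, hist_n)
                               for h in container_histories)
        if cor > best_cor:
            best, best_cor = node, cor
    return best
```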
(4) Container Migration Phase. Next, we describe the working process of container migration, which comprises three parts. First, a full checkpoint of the containers to be migrated is performed on the source node before migration, yielding the containers' pre-migration memory image files. Then, incremental checkpoints are continuously performed on the containers during the migration process, yielding the memory image files at migration time. Finally, the two sets of memory image files are pre-merged and transferred to the target node, the containers to be migrated are stopped on the source node, and the containers' pre-migration state is restored on the target node according to the memory image files.
The main factors affecting a container's online migration downtime are the checkpointing, transfer, and recovery phases; reducing the time consumed by these three phases is the key to reducing downtime. Hence, this paper adds a fast memory synchronization optimization to the traditional migration method. Before restoring the container to normal operation in the recovery phase, the memory image files received on the target node are pre-merged, generating the final memory image files. Thus, in the subsequent recovery phase, the container's recovery can be completed simply by reading the final merged image file. Additionally, the merging of memory image files can be overlapped with the transmission process, further reducing downtime.
For the containers, the generated memory image files can be roughly divided into two categories: pages.img and pagemap.img files. The former mainly stores the specific content of memory pages, and the latter mainly stores the memory mapping relationships. For example, the pagemap1.img file in Figure 8 indicates that the first four memory pages are read from the pages1.img file and placed at address 0x1000000. When an incremental checkpoint is executed, a flag bit in_parent is added to the pagemap.img file, indicating that the identified memory page should be read from the previous checkpoint.
The role of the state check is to dump the process state information of the container to be migrated into an image file, either fully or incrementally. This operation uses the ptrace interface to inject parasite code into the container's processes, enabling the collection of memory data for those processes. It also relies on the /proc file system, with the image files mainly including file description information, process parameter information, and memory mapping information. The specific flow of the container state checkpoint is as follows:

(i) Recursively traverse /proc/$pid/task/ and /proc/$pid/task/$tid/children based on the container process pid to collect information about the process tree constructed from the container process and its child processes.
(ii) Freeze the process tree from step (i) by calling the PTRACE_SEIZE command of the ptrace interface.
(iii) Collect file descriptions, process parameters, memory maps, and other information about the container process and its child processes, and write them to the corresponding memory image files.
(iv) Dynamically inject the parasite code, in PIE format, into the container process and its child processes. When the mmap operation is invoked by these processes, the parasite code is mapped into the corresponding memory address space and records the memory changes.
(v) Execute the rt_sigreturn() system call through the ptrace interface to clean up the parasite code injected in step (iv).
The pseudocode of the pre-copy-based container online migration algorithm is presented in Algorithm 3.

ALGORITHM 3: Pseudocode of the pre-copy-based container online migration algorithm.
Input: node to be migrated Pod_x, target node Pod_y, global checkpoint Ω_c, incremental checkpoint Δ_c, iteration number N, threshold L
Output: target node Pod_y
(1) initialize check Ω_c ← Image(Pod_x and containers)
(2) Transfer(Pod_y, Ω_c)
(3) for each i in N
(4)     Δ_c ← Image(Pod_x and running containers)
(5)     if (Δ_c > L)
(6)         break
(7)     Transfer(Pod_y, Δ_c)
(8) end for
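A Python sketch of the pre-copy loop of Algorithm 3; the checkpoint, transfer, and restore operations on `source` and `target` are hypothetical interfaces standing in for the checkpoint mechanism described above, and stopping iteration once the delta exceeds the threshold follows the recovered pseudocode:

```python
def precopy_migrate(source, target, max_rounds, threshold):
    """Pre-copy migration: ship a full checkpoint, iteratively transfer
    incremental checkpoints, then pre-merge images and restore on target."""
    full = source.checkpoint_full()              # global checkpoint (Omega_c)
    target.receive(full)
    for _ in range(max_rounds):
        delta = source.checkpoint_incremental()  # incremental checkpoint (Delta_c)
        if delta.size > threshold:               # deltas not shrinking: stop
            break                                # iterating, go to stop-and-copy
        target.receive(delta)
    target.premerge_images()                     # merge pages.img / pagemap.img
    source.stop()                                # stop the source containers
    target.restore()                             # restore from the merged image
```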

Experimental Analysis
This section presents the experimental results evaluating the container scheduling strategy based on the CNN-BiGRU-Attention model, i.e., the CSSLPM policy. First, we describe the experimental setup; we then present the actual results and analytically discuss the findings.
We utilize CloudSim, a popular simulation software program for evaluating virtual technology-based scheduling algorithms, to validate the effectiveness and efficiency of the container scheduling strategy (CSSLPM) based on the load prediction model. In addition, we compare CSSLPM with classic container scheduling strategies such as FFHS [32], MOPSO [33], and Spread [34], which are used as a control group.

Research Questions.
The following research questions are answered to verify the accuracy of the CNN-BiGRU-Attention model and the effectiveness of CSSLPM.

RQ1. Is the CNN-BiGRU-Attention model significantly superior to other load prediction models, and is there hyperparameterization in the CNN-BiGRU-Attention model?

RQ2. Is CSSLPM efficient in scheduling containers based on load prediction? Does it ensure better container cloud resource utilization than other scheduling strategies?

RQ3. Does the container scheduling strategy based on the CNN-BiGRU-Attention prediction model still achieve good prediction results on a real running software system? Can it maintain container cloud load balancing?

Experimental Datasets.
We employ the Alibaba public dataset cluster-trace-v2018, with further information listed in Table 4. Specifically, the cluster-trace-v2018 dataset was produced by tracking operational data from approximately 4000 machines over eight days. It has a larger scale than the previous v2017 version and contains DAG information for production batch workloads. Moreover, we use TrainTicket, a benchmark system developed by Professor Xin's team at Fudan University, to validate our scheduling strategy. This microservice-based system reflects a real train ticket ordering system in a production environment and consists of 41 microservices, including 24 business logic services and 17 infrastructure services. The system can be run in a small container cloud environment. In addition, 22 typical industrial microservice failure cases are replicated in the TrainTicket system and can be verified directly without the need for injection in this paper.

Table 4: Contents of cluster-trace-v2018.
Name | Description
machine_meta | Meta and event information for the machines
machine_usage | Resource usage per machine
container_meta | The containers' meta and event information
container_usage | Resource usage per container
batch_instance | Instance information in batch workloads
batch_task | Task information in batch workloads

Operating Environment Configuration.
The corresponding training and simulation experiments were conducted for the designed load prediction and container scheduling techniques. In particular, the hardware device details are listed in Table 5, and the utilized software and version information is presented in Table 6.

CNN-BiGRU-Attention Model Prediction Accuracy Test.
This section aims to validate the prediction accuracy of the CNN-BiGRU-Attention model and compare it with similar models using a cross-validation process. The process includes optimizing the model hyperparameters using a multi-layer iterative search network implementation. The optimized model is then used in comparative experiments against competitor models under the same load prediction conditions. The results are analyzed to draw clear conclusions and answer RQ1.
(1) Analysis of Model Hyperparameter Selection. Several key parameters in neural network-based prediction algorithms can significantly impact prediction accuracy. These parameters generally include the length of each input load sequence, the number of hidden layers, and the number of neurons per hidden layer. In this experiment, the length of each input load sequence and the number of hidden layers are varied, while the number of neurons in each hidden layer is kept fixed to reduce the complexity of the experiment. The maximum number of iterations during training was set to 240, the tensor dimension in the attention mechanism layer was set to 64, and the error tolerance was set appropriately given hardware constraints, which affects the prediction accuracy. We use a cross-validation method to obtain the optimum parameters based on a multi-layer iterative search network implementation. The best parameters are selected based on the model parameters, prediction accuracy, and model training time; in this paper, the best parameter configurations are obtained by ranking parameter setups in descending order of their prediction accuracy. The experimental results are illustrated in Figures 9 and 10, where the horizontal coordinates indicate the parameter combinations. For example, 32-2-50 means that the length of each input load sequence is 32, the number of hidden layers is 2, and the number of neurons in each hidden layer is 50.
From the above experimental results, we observe that the training time of the CNN-BiGRU-Attention model is longer when the data length of each input load is 16, the number of hidden layers is 3, and the number of neurons in each hidden layer is 40; however, in this case, the prediction error is relatively low. We therefore consider this the best-performing experimental configuration. The actual versus predicted load values using this parameter combination are depicted in Figure 11, where the solid red curve indicates the actual load of the node, and the blue dotted curve indicates the predicted load value of the CNN-BiGRU-Attention model. The results indicate that the CNN-BiGRU-Attention model does exhibit hyperparameter sensitivity, answering RQ1. The test results show differences between the predicted and actual load curves, which do not fit perfectly, but the curve trends and magnitudes agree, enabling good load prediction.
(2) Comparison of Experimental Results. This study compares the load prediction error of the CNN-BiGRU-Attention model to that of the ARIMA, DBN, and CNN-LSTM models under the same experimental conditions. Load series of 5, 10, 15, and 20 minutes are used as input data, and the MAE, MAPE, and RMSE metrics are used for evaluation. Figure 12 illustrates the experimental results using MAE, where the horizontal coordinate is the length of the time series, and the vertical coordinate is the mean absolute error of the load prediction values. Figure 12 highlights that the mean absolute prediction error of the container and node loads using the competitor models increases with the length of the load time series. However, the load prediction algorithm based on the CNN-BiGRU-Attention model has smaller mean absolute errors for every length of the load sequence, indicating that the overall difference between the predicted and actual load values is small.
The evaluation results of this experiment using MAPE are shown in Figure 13, where the horizontal coordinate is the length of the time series, and the vertical coordinate is the mean absolute percentage error of the load prediction values. The figure reveals that the mean absolute percentage prediction error of the container and node loads using the competitor models increases as the length of the load time series increases. Nevertheless, the load prediction algorithm based on the CNN-BiGRU-Attention model has smaller mean absolute percentage errors for all lengths of load sequences, suggesting that the error values have a smaller ratio to the load values and the errors are relatively low.
The results of the evaluation of this experiment using RMSE are shown in Figure 14, where the horizontal coordinate is the length of the time series, and the vertical coordinate is the root mean square error of the load predictions. As seen from the figure, the RMSE of the predictions of both container load and node load by the various models tends to increase as the length of the load time series increases. In comparison, the load prediction algorithm based on the CNN-BiGRU-Attention model has smaller root mean square errors for each length of the load sequence, indicating that the overall difference between the predicted and actual load values is small.
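For reference, the three error metrics used in Figures 12-14 can be computed as follows (y and p are arrays of actual and predicted loads; MAPE assumes nonzero actual values):

```python
import numpy as np

def mae(y, p):
    return np.mean(np.abs(y - p))        # mean absolute error

def mape(y, p):
    return np.mean(np.abs((y - p) / y)) * 100  # mean absolute percentage error

def rmse(y, p):
    return np.sqrt(np.mean((y - p) ** 2))      # root mean square error
```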

CSSLPM Container Scheduling Strategy Validation.
In this section, CloudSim, a cloud computing simulation software program widely used in academia to test and evaluate scheduling algorithms based on virtual technologies, is used to answer RQ2, i.e., to validate the effectiveness and scheduling efficiency of the container scheduling strategy (CSSLPM) based on the load prediction model. First, this paper extends various simulation interfaces and builds a container model, enabling the CloudSim cloud data center to support the container-based scheduling simulation process. Second, to simulate the heterogeneous nature of cloud data centers, this paper sets up various server types for the experiment; the start/stop latencies of containers and virtual machines are set at the second and minute levels, respectively. Finally, the scheduling strategies currently common in industry and academia are compared with the CSSLPM scheduling strategy. Five different server types, each with multiple servers, are set up to simulate the heterogeneity of the cloud servers running the container clusters. The configuration information of each server type is listed in Table 7.

Table 7: Server configuration.
Number | Cores | MIPS/core | Memory (GB) | Bandwidth (Gbps) | Storage (TB)
T1 | 44 | 2200 | 128 | 10 | 2
T2 | 16 | 2200 | 64 | 10 | 2
T3 | 36 | 2300 | 24 | 10 | 2
T4 | 36 | 2300 | 64 | 10 | 2
T5 | 16 | 2200 | 24 | 10 | 2
First, based on cluster size, this experiment sets up three different sizes of container clusters for testing, i.e., ten-container, hundred-container, and thousand-container scales, covering small-, medium-, and large-scale application scenarios. The container cluster configuration information is presented in Table 8. Second, four different types of virtual machines (nodes) and containers are set up for this experiment, with the specific configuration information reported in Tables 9 and 10. In the same experimental environment, all three sets of experiments are executed 30 times, and the model uses the optimal combination of parameters. The experimental results include resource usage in the cluster, the average number of containers migrated within 5 minutes, and the average number of nodes created.
(1) Use of Resources. For the same container cluster, the main concern of resource usage is achieving relative load balancing on heterogeneous servers. In this experiment, multiple servers of five different types are set up; therefore, the resources of servers of the same type are aggregated to observe the execution of each scheduling strategy. The results are illustrated in Figures 15(a)-15(c).
The test results reveal that the total server load in the FFHS, Spread, and MOPSO scheduling strategy environments is significantly higher than the total load in the CSSLPM environment for each level of container cluster in this experiment.
(2) Container Scheduling Situation. For the same container cluster, container scheduling focuses on the number of containers scheduled and the number of nodes created within the cluster during the observed time. It further considers the efficiency of the scheduling strategy in achieving relative load balancing of the cluster during that time. The results are shown in Figures 16(a) and 16(b).
From the side-by-side comparison of the above test results, it can be seen that, in the same cluster environment, CSSLPM is 4.6%-25.8% more efficient than FFHS, Spread, and MOPSO in terms of the number of containers scheduled and the number of nodes created. Therefore, compared with the FFHS, Spread, and MOPSO scheduling strategies, the CSSLPM designed in this paper performs better in terms of load resource usage and container scheduling. The results also reveal resource fragmentation and uneven load across servers in the FFHS, Spread, and MOPSO scheduling environments. There are two main reasons for this. First, the three scheduling strategies do not consider the interdependencies between containers, which may cause interdependent containers to be migrated to different servers, resulting in cross-server scheduling of service requests and increased server load. Second, these three strategies do not consider fine-grained scheduling of resources, so fragmented server resources are not utilized effectively. In contrast, CSSLPM combines coarse and fine granularity to achieve container scheduling, as sketched below. On the one hand, container migration-based scheduling uses nodes as the scheduling unit, solving the load imbalance problem from a coarse-grained perspective; during this process, it aims to migrate interdependent containers to the same server, thus reducing the frequency of cross-server service calls. On the other hand, scheduling based on container elastic scaling uses containers as the scheduling unit, solving the server resource fragmentation problem from a fine-grained perspective and further improving the utilization of cluster resources.
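The two granularities can be illustrated with the following sketch; the node and container methods (predicted_load, migration_candidates, can_host, shared_dependencies, migrate, scale_up, scale_down) and the thresholds are hypothetical, since the paper describes the strategy's logic rather than an API:

def coarse_grained_rebalance(nodes, overload_threshold=0.8):
    # Node-level (coarse-grained) step: relieve overloaded nodes by
    # migrating containers, preferring targets that already host the
    # container's dependencies so service calls stay on one server.
    for node in nodes:
        if node.predicted_load() <= overload_threshold:
            continue
        for container in node.migration_candidates():
            target = min(
                (n for n in nodes if n is not node and n.can_host(container)),
                key=lambda n: (-n.shared_dependencies(container), n.predicted_load()),
                default=None,
            )
            if target is not None:
                node.migrate(container, target)

def fine_grained_scale(containers, low=0.3, high=0.7):
    # Container-level (fine-grained) step: elastically scale individual
    # containers in place to fill fragmented server resources.
    for c in containers:
        if c.predicted_load() > high:
            c.scale_up()
        elif c.predicted_load() < low:
            c.scale_down()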

Case System
We first conduct a runtime load monitoring process based on the container cluster supporting the TrainTicket ticketing system and obtain the load data of each node from this process (part of the load data is reported in Table 11); these data are used to validate the load prediction algorithm and the container scheduling strategy.
Based on the above load data, the data are input into the CNN-BiGRU-Attention model and the control-group models for load prediction. The parameters of the CNN-BiGRU-Attention model are set as follows: the length of each input load sequence is 16, the number of hidden layers is 3, and the number of neurons in each hidden layer is 40. Figure 17 shows the prediction results for a segment of the intercepted load data. The solid black line represents the real load data, the red dashed line is the prediction of the method designed in this paper, and the remaining lines are the predictions of the control models.
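Given these hyperparameters, the predictor can be sketched roughly as follows in PyTorch; the exact layer arrangement beyond the stated settings (input length 16, three hidden layers, 40 neurons each) is our assumption, as the paper reports hyperparameters rather than code:

import torch
import torch.nn as nn

class CNNBiGRUAttention(nn.Module):
    # Rough sketch: a 1-D convolution extracts local features from the load
    # sequence, a 3-layer bidirectional GRU (40 units per direction) models
    # it in both directions, and an attention layer weights the hidden
    # states before the final regression head.
    def __init__(self, hidden=40, layers=3):
        super().__init__()
        self.conv = nn.Conv1d(1, hidden, kernel_size=3, padding=1)
        self.bigru = nn.GRU(hidden, hidden, num_layers=layers,
                            batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)
        self.head = nn.Linear(2 * hidden, 1)

    def forward(self, x):                       # x: (batch, 16) load values
        h = self.conv(x.unsqueeze(1))           # (batch, hidden, 16)
        h, _ = self.bigru(h.transpose(1, 2))    # (batch, 16, 2*hidden)
        w = torch.softmax(self.attn(h), dim=1)  # attention weights over time
        context = (w * h).sum(dim=1)            # (batch, 2*hidden)
        return self.head(context).squeeze(-1)   # next-step load prediction

model = CNNBiGRUAttention()
pred = model(torch.randn(8, 16))                # 8 sequences of length 16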
As seen in Figure 17, both the CNN-BiGRU-Attention model and the control models can predict the load in this experimental setup, and the trends of the predicted values are the same. However, the CNN-BiGRU-Attention model affords the best fit, although it does not achieve a perfect fit. To verify the accuracy of each load prediction model more intuitively, we compute the MAE, MAPE, and RMSE prediction error metrics for each prediction model and node load. The corresponding results are reported in Table 12.
In addition, based on the above prediction results, the effectiveness of the CSSLPM policy is further validated. A five-virtual-machine environment is created on the server, and a multi-node container cluster is built in this environment to support the operation of the TrainTicket passenger ticket system. To analyze the experimental results, we adopt the same metrics as in experiment (2) above. In particular, Figure 18 shows the resource usage of each VM, and Figure 19 compares the container scheduling results.
For the same experimental environment, Figure 18 highlights that the resource load profile of each VM under the designed CSSLPM policy is better than under the control group; here the VMs have identical attributes, so there is no resource heterogeneity. As seen in Figure 19, the numbers of scaling operations and migrations are lower for the designed CSSLPM policy than for the control group; therefore, the proposed CSSLPM achieves better load balancing and resource optimization with less scaling and fewer migrations.

Internal Threats.
The internal threat refers to factors within the proposed method that limit its effectiveness. The threat to the effectiveness of the CSSLPM scheduling policy comes from the accuracy of the CNN-BiGRU-Attention load prediction model, which is limited by two main factors. The first factor is that the load model established in this paper does not totally match the load of real scenarios, i.e., there are aspects of container cloud load that cannot be described; this paper therefore adopts the idea of approximation in load modeling, using the average of CPU and memory utilization per unit time to approximate the load value of a container (a minimal sketch of this averaging is given at the end of this section). This effectively reduces the real-time load data processing delay but inevitably degrades prediction accuracy. The second factor is that the range of model hyperparameters does not completely cover the whole parameter domain. Since comparing the training effect of every parameter combination is time-consuming, the hyperparameters of the model are selected from a limited range of combinations, and a globally optimal hyperparameter setting outside this range may exist that would improve the accuracy of the prediction model.

External Threats.
The external threat comes from the experimental environment, which consists of the CloudSim simulation platform and a container cluster supporting the ticketing system TrainTicket. Although our work has a comprehensive experimental design, it lacks consideration of the real industrial environment: the heterogeneity of a real container cloud environment is not composed of only a few server types, and the user access pressure of a real ticketing system is not truly simulated. We must admit that these differences between the experimental environment and real scenarios do threaten the effectiveness of CSSLPM.
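As a concrete illustration of the load approximation discussed under internal threats, here is a minimal sketch, assuming equal weighting of CPU and memory utilization (the paper does not state the weights):

def approx_load(cpu_samples, mem_samples):
    # Approximate a container's load over one time unit as the average of
    # its mean CPU utilization and mean memory utilization in that unit;
    # the equal weighting of the two resources is our assumption.
    cpu = sum(cpu_samples) / len(cpu_samples)
    mem = sum(mem_samples) / len(mem_samples)
    return (cpu + mem) / 2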

Conclusion
This paper proposes a load prediction method and container scheduling strategy for container cloud environments. The proposed method uses a CNN-BiGRU-Attention model to improve the accuracy of load prediction and achieve load balancing and resource optimization. However, there are some limitations, such as inaccurate load modeling and insufficient accuracy in flexible scheduling, which need to be addressed in future research. Additionally, validation in a real industrial environment is needed, since the proposed method has only been validated on simulation platforms.

Data Availability
The data used to support the findings of this study have been deposited in the Alibaba Cluster Data repository (https://github.com/alibaba/clusterdata/blob/master/cluster-trace-v2018/trace_2018.md).

Conflicts of Interest
The authors declare that they have no conflicts of interest.
