Head Node Selection Algorithm in Cloud Computing Data Center

Cloud computing provides multiple services such as computational services, data processing, and resource sharing through multiple nodes. These nodes collaborate on all of the aforementioned services in the data center through a head/leader node. This head node is responsible for reliability, higher performance, latency, and deadlock handling and enables the user to access cost-effective computational services. However, optimal head node selection is a challenging problem because resources such as memory, CPU-MIPS, and bandwidth must be considered. Existing methods are monolithic, as they select head nodes without taking the resources of the nodes into account. Moreover, a candidate node is needed that can be selected as the head node in case of head node failure. Therefore, in this paper, we propose a technique, the Head Node Selection Algorithm (HNSA), for optimal head node selection in the data center, which is based on the genetic algorithm (GA). Our proposed method consists of three modules: initial population generation, head node selection, and candidate node selection. In the first module, we generate the initial population by randomly mapping tasks onto different servers using a scheduling algorithm. After that, we compute the overall cost and the cost of each node based on its resources. In the second module, the best nodes are selected by applying genetic operations such as crossover, mutation, and the fitness function, considering the available resources. Of the selected optimal nodes, one is chosen as the head node and the other is considered the candidate node. In the third module, the candidate node becomes the head node in the case of head node failure.
The proposed method HNSA is compared against state-of-the-art algorithms such as the Bees Life Algorithm (BLA) and Heterogeneous Earliest Finish Time (HEFT). The simulation analysis shows that the proposed HNSA technique performs better in terms of execution time, memory utilization, service level agreement (SLA) violation, and energy consumption.


Introduction
The recent advancements in Internet technologies and cloud computing have transformed the way data is processed. Cloud computing provides an environment including different resources, for example, memory, storage, and processing cores, for hosting applications and processing data according to user requirements. It offers users flexible, pay-per-use, and on-demand scalable services hosted at a remote data center, which can be accessed from anywhere and at any time. The data center comprises a large number of networked servers and nodes that work together in a cluster to share resources. Cloud computing systems rely heavily on node coordination for the parallel execution of tasks. A cluster leader node manages communication and synchronization among the other nodes. Consequently, the selection of an optimal leader node is an essential requirement for preventing the network from becoming unstable. Electing the best node in a data center as leader is a challenging problem because leaders are responsible for managing segregated data, sharing resources among nodes, and overcoming latency. Communication between nodes is impossible without the leader/master node, so leader selection must be considered carefully. As the nodes are directly linked to the leader, the leader ensures that there will never be a deadlock among the network nodes and that tasks will be processed efficiently.
For an effective leader election process in distributed networks and cloud computing environments, the research community has proposed several protocols [1][2][3][4][5][6][7][8] and algorithms (Biswas et al.) [9][10][11][12][13][14][15][16]. A bully algorithm is proposed in [17][18][19][20] for leader election, which selects leader nodes dynamically based on a node-ID criterion. In [9,21,22], the authors proposed a ring algorithm in which each node shares its ID with all other nodes and keeps track of them in a database. Based on priority, the algorithm selects one node from this list as leader. In [11,[23][24][25], node IDs are generated at random, and each node is assigned a priority number. The node with the highest priority number is then selected as the leader node. For leader election, a message-passing strategy has been proposed in [26][27][28]. A comparison of these recent ID-based and randomly generated techniques is shown in Table 1. However, these techniques fail to consider the full resources of the hosts because doing so increases complexity. These approaches introduce latency because of the higher message-passing rate and the slow response of the nodes. Some recent works related to leader election in IoT are presented in [29][30][31]. The major drawback of the above-mentioned methods is that the node's resource profile is never considered. As a result, selecting a weak leader node becomes equally likely. In high-load situations, the weak leader crashes, necessitating a re-election, which slows down overall processing and delays task execution across network nodes. Consequently, node resources must be considered during the leader election phase to ensure that task processing occurs efficiently and the network remains stable.
In this paper, the Head Node Selection Algorithm (HNSA) is proposed to select a leader node in the cloud data center. The HNSA is based on a genetic algorithm (GA). The proposed HNSA selects as head the node that has more resources than the other nodes. The proposed algorithm selects the head node to ensure effective job execution, resource sharing, and communication among the nodes without any delay and in an effective manner. Our proposed method obtains metainformation about jobs, VMs, and available hosts in the data centers and computes the fitness value of different hosts based on available host resources such as CPU-MIPS, throughput, bandwidth, and RAM. Using the metainformation, it generates the initial population of chromosomes for the operation of the GA. A chromosome comprises multiple jobs mapped onto different hosts, and each job-host mapping is taken as a gene. The chromosome length depends on the total number of jobs submitted to the cluster of data centers. To select the head node from each cluster, we locate the host with the highest value of the fitness function. This fitness value is calculated after performing the GA operations, that is, crossover and mutation. In our proposed method, the crossover scheme uses a swapping technique among the jobs on hosts, whereas mutation swaps a job on a host within a single individual. The fitness value depends on resources like CPU-MIPS, RAM, bandwidth, and throughput; these resources are the factors of our fitness function. This proposed approach is significant in that it selects the head node efficiently in a unified way. It successfully selects the head node in the cluster for efficient job execution and reduces delay among the nodes. The proposed technique also considers sudden failure of the head node. When the head node fails for some reason, such as the host becoming overloaded within the time interval, we select the candidate node as the head node for smooth communication and execution of jobs.
So, our approach performs efficiently without suffering from vulnerabilities like communication delay or host failure. To justify and validate the impact of our approach, we perform experiments and compare our approach against some state-of-the-art algorithms. The following are the main contributions of this research:
(i) A novel method is proposed using a genetic algorithm- (GA-) based Head Node Selection Algorithm (HNSA) for efficient data processing in the data center of cloud computing.
(ii) The proposed model provides efficient data processing and resource sharing due to optimal head node and candidate node selection through the HNSA.
(iii) Our proposed technique is efficient in SLA violation, execution time, energy consumption, and memory utilization as compared to state-of-the-art approaches.
The remaining part of the paper is structured as follows: In Section 2, the state of the art in head node selection is presented. In Section 3, the proposed GA-based head node selection methodology is explained. In Section 4, experiments are conducted and the outcomes are presented; finally, we conclude our work in Section 5.

Related Work
In this section, we review the existing techniques related to leader/head node selection. Ktari et al. [40] presented an agent-based selection approach employing dynamic trees. The main concern of the authors was to maintain a forest tree in which the root node, determined by the highest ID value, was selected as the leader. In the process of leader election, the ID value was created randomly without taking into account any other resources. Similarly, in [41], a method for leader selection based on the probabilistic investigation of traffic lights was presented. In [42], an approach, namely, the old ring algorithm, was utilized, which employed a unidirectional interface to join all hubs or nodes.

Table 1: Comparison of recent leader election techniques.

Paper   Random ID   Priority-based   Highest ID   Resources
[32]    No          No               Yes          No
[33]    No          Yes              No           No
[9]     No          No               No           Yes
[10]    No          No               No           Yes
[11]    No          Yes              No           No
[34]    Yes         No               No           No
[35]    No          No               Yes          No
[20]    No          No               Yes          No
[22]    No          No               Yes          No
[23]    No          Yes              Yes          No
[36]    No          Yes              Yes          No
[24]    No          Yes              Yes          No
[37]    No          Yes              Yes          No
[38]    Yes
This technique was concerned with ensuring that a job ran successfully on all nodes. In this work, the system operated by randomly producing a distinctive priority number, and the node with the highest priority was selected as the leader. The failure of the leader hub caused the entire procedure to be performed again to find a new leader. The leader hub selection approach needed about 2(n − 1) messages communicated all over the network: initially, (n − 1) messages were transmitted to initiate the leader selection procedure, and then another (n − 1) messages were passed to nominate the new leader. Later, in [43], EffatParvar et al. tried to enhance the traditional ring algorithm of [42] by considering the requirement relations together with the verification of numerous distributed protocols employed for leader nomination. The approach in [43] utilized the Temporal Ordering Specification Language (TOSL) along with the Analysis of Distributed Processes (ADP).
Several changes were introduced to the existing approach, that is, the bully algorithm [44], which was based on a tree structure and performed a comparison of message complexity. Similarly, in [26], an extended form of the bully leader selection method was presented for the cloud computing environment, in which leader node selection was based on a Super Node (SN) [45]; it improved the leader nomination speed, and the message complexity was reduced from O(n²) to O(n²/k), where n denotes the total number of nodes and k the number of districts. Moreover, the approach in [43] had a leader election complexity of O(k²n²), where n is the total number of nodes and k is an upper bound on the multiplicity of the labels. Similarly, in [46], a methodology for nominating the leader node was presented. The major motivation of this work was the selection of a single leader node; therefore, the failure of the leader caused the entire network to fail, which resulted in restarting the entire procedure. The approach in [47] had an election complexity of O(log n), and the message complexity in [24] is O(k²n²).
To deal with the problem of single leader election, two approaches have been presented in [9], which nominate multiple leaders to avoid the risk of communication delays and minimize the latency rate. Three empirical techniques were employed in a cohesive way for calculating the proper status of leaders in polynomial time. Another model for leader node selection was introduced in [48], employing a probabilistic grounded framework. That work presented improvements in energy consumption and in the consistency problem of channel communication. In the literature, a few approaches have been concerned with leader selection in peer-to-peer and distributed networks [49,50]. The method in [49] proposed an approach for the peer-to-peer network, while for the distributed environment a technique was introduced in [51] employing software mediators to improve the speed of the leader election procedure and minimize energy consumption. Moreover, some protocols for fault-tolerance-based leader election in asynchronous distributed systems were presented in [52]. In [22,53], two new mobility-conscious methods for leader election were presented for ad hoc network systems. These techniques ensured that each connected node in the topology was associated with a single leader. The approaches in [22,[54][55][56] rely heavily on a temporally ordered routing technique called TORA.
After analyzing the existing techniques, we find that most techniques in the literature for selecting a leader depend heavily on a priority number or on a unique identifier that is usually randomly generated by a framework. These approaches do not take into account the network topology and the availability of real-time resources before performing the leader node selection procedure [7,[57][58][59][60]. Moreover, as the complexity of the cloud computing environment increases, the leader election task becomes more challenging. To deal with this problem, several techniques have been presented; however, these frameworks still do not take all resources into account because of the increased computational cost. We present a novel approach for leader/head node selection based on the idea of the GA.

Proposed Research Model
The presented work comprises three main modules: the first is "initial population generation," the second is "head node selection," and the third is "candidate node selection" for efficient network performance. Consider a data center Dc with multiple servers called hosts H = {h1, h2, h3, ..., hi} that are grouped into clusters C = {C1, C2, C3, ..., Cy}, where each host hi has resources Rm = {Rhi.1, Rhi.2, Rhi.3, ..., Rhi.m}, for example, bandwidth, CPU, throughput, and memory. Jobs J = {j0, j1, j2, ..., jk-1, jk} that contain multiple tasks are submitted to each cluster Cy in the data center. These jobs or tasks are randomly mapped onto different hosts hi, forming chromosome sets such as X = {hi j0, hi j1, ..., hi jk}, where each hi jk represents a gene in a chromosome. The genetic operations, that is, crossover and mutation, are applied to the chromosomes generated from the initial population P = {p1, p2, p3, ..., pq}. Afterward, the fitness function is applied to each chromosome for evaluation. Correspondingly, the head nodes are selected from the evaluated chromosomes with the highest values. The responsibility of the head nodes is to allocate the jobs to each node in the cluster to effectively improve the efficiency and load balancing of the network. The structural design of the HNSA is presented in Figure 1.

Initial Population Generation.
In our proposed algorithm, the initial population is generated by mapping the jobs onto different hosts hi using a scheduling algorithm.
There are many ways to schedule the jobs among the intermediate nodes. In the proposed method, the initial population is scheduled as follows: j1 on h2, j2 on h1, j3 on hi, and j4 on h3, as illustrated in Figure 2. There are several chromosomes in each set of the population, represented as Pq = {X1, X2, X3, ..., Xz}. Each chromosome contains multiple genes, and the genes represent jobs mapped onto different hosts. For example, the chromosome X1 is represented as X1 = {j1 h2, j2 h1, j3 hi, j4 h3}.
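The initial population step above can be sketched in Python as follows; the sizes used here are illustrative, not values from the paper:

```python
import random

def generate_population(n_jobs, n_hosts, pop_size):
    """One chromosome per individual; gene g[j] is the host assigned to job j."""
    return [[random.randrange(n_hosts) for _ in range(n_jobs)]
            for _ in range(pop_size)]

# For instance, 4 jobs scheduled across 3 hosts, 10 chromosomes:
population = generate_population(n_jobs=4, n_hosts=3, pop_size=10)
```

As described above, each chromosome's length equals the number of submitted jobs, with each gene recording the host a job is mapped to.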

Head Node Selection.
In this phase, we perform genetic operations, namely crossover and mutation. In the crossover procedure, two individuals, called chromosomes, are selected at random from the population. There are multiple crossover strategies; we use the two-cut-point technique, in which two random cut positions β1 and β2 are selected using a random number generated as rand(2)(k − 1) + 1, where k is the chromosome length. These two cut points are selected in the two chromosomes that have the highest fitness values. After selecting the two random cut points β1 and β2, the crossover operation is performed, producing two new individuals, called offspring, in the population, as shown in Figure 3. In this way, better offspring are generated for the next population, and this process is repeated until the best solution is achieved.
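A minimal sketch of two-cut-point crossover, assuming the cut positions are drawn uniformly at random inside the chromosome (the paper's exact random-number formula is not reproduced here):

```python
import random

def two_point_crossover(parent_a, parent_b):
    """Exchange the gene segment between two random cut points beta1 < beta2."""
    k = len(parent_a)
    beta1, beta2 = sorted(random.sample(range(1, k), 2))  # two distinct cuts inside the chromosome
    child_a = parent_a[:beta1] + parent_b[beta1:beta2] + parent_a[beta2:]
    child_b = parent_b[:beta1] + parent_a[beta1:beta2] + parent_b[beta2:]
    return child_a, child_b
```

Swapping the middle segments means every gene of the two parents survives in exactly one of the two offspring, so no job assignment is lost or duplicated across the pair.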
After the crossover, we perform the mutation process on the offspring produced by the crossover operation. Mutation is a unary operation that applies to a single feature of an offspring. Deviations during the mutation process are very minimal because generally a very small value is selected. In our method, a substitution process is used for mutation: we select a random position in the offspring and substitute its value, as shown in Figure 4. Next, we evaluate the quality of the solutions using the fitness function Fc, which returns the fittest value over all solutions, Fc = max(fh), where fh is the fitness of each host, computed as a weighted sum of the system parameters CPU-MIPS, memory, bandwidth, and throughput. The weights of these parameters are set according to the SLA (Service Level Agreement), which works on the basis of the knapsack algorithm [61], as shown in Table 2. Parameters with higher weight values affect the fitness value more than parameters with lower values. In our proposed method, we use tournament selection: from a randomly generated population, the hosts with the highest fitness values, as calculated by the fitness function, are selected as the head node and candidate node. The objective of our proposed method is to choose the leader/head node with the most (α) resources.
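The mutation and fitness steps can be sketched as follows. The weight values are placeholders standing in for the SLA-driven weights of Table 2, and the weighted-sum form of `host_fitness` is an assumption based on the description above, not the paper's exact formula:

```python
import random

# Placeholder SLA weights; higher-weighted resources influence fitness more.
WEIGHTS = {"cpu_mips": 0.4, "ram": 0.3, "bandwidth": 0.2, "throughput": 0.1}

def host_fitness(host):
    """Weighted sum of a host's normalized resource values (assumed form)."""
    return sum(WEIGHTS[r] * host[r] for r in WEIGHTS)

def mutate(chromosome, n_hosts):
    """Substitution mutation: reassign one randomly chosen job to a random host."""
    child = list(chromosome)
    pos = random.randrange(len(child))
    child[pos] = random.randrange(n_hosts)
    return child
```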

Selection of Candidate Node.
After applying the GA operations, a head node is selected in each cluster. In this phase, a candidate node is also chosen in each cluster C of the data center. The reason for choosing a candidate node is that if the head node fails before the time interval elapses, the candidate then becomes the head node. The candidate node is the backup of the head node and takes over all of the head node's responsibilities. The proposed technique returns two optimal hosts from each cluster; one is chosen as the head node and the other as the candidate node. After each time interval, our system runs again to re-optimize the head node and candidate node (Algorithm 1).
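The head/candidate selection and the failover rule above can be sketched as follows, assuming hosts are simply ranked by their fitness value (the names here are illustrative):

```python
def select_head_and_candidate(hosts, fitness):
    """The fittest host becomes the head node; the runner-up is the candidate."""
    ranked = sorted(hosts, key=fitness, reverse=True)
    return ranked[0], ranked[1]

def active_head(head, candidate, head_failed):
    """On head failure the candidate is promoted immediately, avoiding a re-election."""
    return candidate if head_failed else head

# Example: three hosts with hypothetical fitness scores.
hosts = ["h1", "h2", "h3"]
scores = {"h1": 5.0, "h2": 9.0, "h3": 7.0}
head, candidate = select_head_and_candidate(hosts, scores.get)
```

Keeping the runner-up on standby is what lets the cluster keep executing jobs without rerunning the whole GA when the head node fails mid-interval.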

Results and Discussion
A few experiments are performed to examine the performance and efficiency of the proposed method by comparing it against some state-of-the-art algorithms. The details of the results and discussion are presented in the following subsections.

Experimental Setup.
To evaluate the efficiency and performance of the proposed method, an extensive set of experiments is performed using the CloudSim Plus [20] simulator. CloudSim Plus is an extension of CloudSim [21], a framework for designing and simulating data centers, and it simulates more realistic scenarios. For the experimental simulation, numerous sets of heterogeneous hosts/servers (MIPS range: 1000-4000) are used. An AMD Ryzen 5 2500 processor is used for processing, and a Python library is used to create the graphs. The parameters execution time, energy consumption, SLA violation, and memory utilization are used to evaluate the performance of the proposed method. The time in which the host finishes a job is called the execution time. Energy consumption is the amount of energy utilized during job execution. Memory utilization is the amount of memory allocated to the host during the time interval. When a job is not fully executed within the given time frame, an SLA violation occurs. To evaluate and analyze the efficiency of the proposed method, the graphs are generated using the input metrics from [62].
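As an illustration of the SLA-violation metric defined above, a job counts as a violation when it is not fully executed within its time frame (the record fields here are hypothetical, not CloudSim Plus API names):

```python
def count_sla_violations(jobs):
    """A job violates the SLA if its execution time exceeds its allotted time frame."""
    return sum(1 for job in jobs if job["execution_time"] > job["time_frame"])

jobs = [
    {"execution_time": 4.0, "time_frame": 5.0},  # finished within the frame
    {"execution_time": 7.5, "time_frame": 5.0},  # SLA violation
]
```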

Experimental Analysis.
In this section, we describe the particulars of how the experiments were conducted. The experiments compare the performance efficiency of the proposed method with that of some state-of-the-art algorithms. In all experiments, system parameters like RAM, CPU-MIPS, throughput, and bandwidth were used. The system parameter weightages are assigned according to the SLA, that is, the user's requirements, as shown in Table 3.
The following experiments are performed as explained below. In the first experiment, the proposed method (HNSA) is compared with some state-of-the-art algorithms in terms of effectiveness. The performance evaluation is based on parameters like execution time, energy consumption, memory utilization, and SLA violation. In this experimental setup, the performance is evaluated by increasing the number of hosts/servers while a fixed number of tasks is used throughout. The host range is 20-100 with an increment of 20, and the number of tasks is fixed at 1000. In terms of execution time, the proposed HNSA performs more efficiently than the state-of-the-art algorithms, as shown in Figure 5. During the performance analysis, the proposed HNSA takes the minimum time to completely execute a job compared to state-of-the-art algorithms like BLA and HEFT. Figures 6 and 7 show that the proposed algorithm also performs better in terms of SLA violation and memory utilization. Memory utilization improves because the number of hosts increases while the number of tasks is fixed. The results of this first experiment validate that the proposed HNSA performs better and improves efficiency in terms of execution time, memory utilization, energy consumption, and SLA violation.

Evaluation through Task/Job Increment.
The second experiment measures the performance efficiency of the proposed method with the same parameters. In this experiment, the number of tasks increases while a fixed number of hosts is used. The tasks range from 200 to 20,000 with an increment of 200, and the number of hosts is fixed at 50 during the whole simulation. Figures 8 and 9 show that the performance of the proposed method is efficient in terms of SLA violation and execution time. The experimental results validate the performance efficiency based on parameters like execution time and SLA violation as the number of tasks increases. State-of-the-art algorithms like BLA and HEFT performed less efficiently in terms of execution time and SLA violation compared to HNSA. The results of this second experiment validate that the proposed algorithm takes less time to complete job execution. They also validate that the HNSA reduces SLA violations compared to the BLA and HEFT algorithms during workflow execution.

Evaluation through Different Weightages (Θ).
In this experiment, different weightages are given to parameters like CPU-MIPS, memory, bandwidth, and throughput to analyze the performance efficiency of HNSA against the BLA and HEFT algorithms. During this simulation, we used two different scenarios and gave different weightages to the parameters according to the user requirements. In scenario 1, the number of hosts is fixed at 50 and the number of tasks/jobs increases by 1000. However, the weightages of resources like RAM, throughput, bandwidth, and CPU differ. The parameter weightages in scenario 1 are CPU 10%, RAM 60%, bandwidth 20%, and throughput 10%, as shown in Table 4. Figure 10 validates that the performance of HNSA is better than those of the state-of-the-art algorithms in terms of execution time and SLA violation. In scenario 2, the numbers of jobs and hosts are the same as in scenario 1 during the evaluation of HNSA against the state-of-the-art algorithms using all parameters. However, the weightages of the parameters are different: CPU-MIPS 10%, bandwidth 70%, RAM 10%, and throughput 10%, as shown in Table 4. Figure 11 validates the simulation results in scenario 2, where the performance of the HNSA is more efficient than those of the BLA and HEFT algorithms.

Algorithm 1: Head Node Selection Algorithm (HNSA).
Input: List of Jobs and Servers
Output: Head Node, Candidate Node
(1) Population ← randomly generated;
(2) while i = 1 to total population
(3)     population list ← randomly assign each job to a host
(4) end while
(5) calculate the cost();
(6) if (time == scheduling interval)
(7)     for (!stopping criteria)
(8)         Perform Crossover Function();
(9)         Perform Mutation Function();
(10)        Calculate Fitness Function();
(11)    end for
(12)    leader and candidate node ← the host with the maximal value from equation (1); the first index of the population list is the head node and the second is the candidate
(13)    if (Head Node Failed)
(14)        candidate ← host at the second index of the population list becomes the head node
(15)    end if
(16) end if

Scheduling Comparison of Jobs.
In this experiment, we analyze the performance of the HNSA in comparison to state-of-the-art algorithms such as HEFT and BLA. In this simulation, ten tasks/jobs are randomly generated, with the following lengths: 61000, 62000, 63000, 64000, 65000, 66000, 67000, 68000, 69000, and 70000. Table 5 shows the execution time of each job. Figure 12 shows the execution time with state-of-the-art algorithms, and Figure 13 shows the execution time of tasks on different servers/hosts using the proposed algorithm. The HNSA maps the tasks onto those hosts that efficiently meet the job/task requirements, so the tasks are executed efficiently.
Through HNSA, T9 runs on h2/S2 in less time than with BLA, which maps T9 to S3; thus, HNSA executes the jobs more efficiently than BLA.

Energy Consumption with the Increment of Tasks and Hosts.
In this simulation, energy consumption is evaluated in two different scenarios. In the first scenario, energy consumption is calculated while incrementing the number of hosts, and in the second scenario, it is calculated while incrementing the number of jobs. Figures 14 and 15 show the energy consumption, in kWh per executed job, of HNSA and the state-of-the-art algorithms. The experimental results of the first scenario show that the HNSA reduces energy consumption due to its efficient use of hosts for job execution. The optimized resources of the efficient host are used to execute the job in minimum time, and the other servers/hosts are turned off to save energy. So, the HNSA uses optimized hosts to utilize the maximum resources, turns off unused hosts, and maps the tasks/jobs onto the available optimal hosts.

Conclusion
In our methodology, we focused on the problem of head node selection in the cloud data center. In the data center, the leader node ensures efficient execution of tasks/jobs and delivers well-organized services to the users according to their requirements. So, the selection of a head/leader node is a challenging task. In the literature, many techniques have been proposed to overcome the issues of head node selection; however, these techniques fail to consider the resources of the host/server. So, we proposed an optimized algorithm called the HNSA, inspired by the processes of the genetic algorithm. Our proposed algorithm is based on a random population approach, in which each individual encodes the mapping of jobs onto hosts. Through the HNSA, the head node and candidate node are selected based on available resources. The reason for candidate node selection is that if the head node fails within the time interval, the candidate node becomes the head node without a long delay. To validate the performance efficiency and reliability of the proposed HNSA, we performed simulations and compared the results against those of some state-of-the-art algorithms. The performance evaluation parameters are execution time, SLA violation, memory utilization, and energy consumption. Our results validate that our proposed HNSA is more efficient than the state-of-the-art algorithms. In the future, we will implement new protocols, and this technique will also be implemented in IoT environments, wireless networks, and fog computing.
Figure 12: Through BLA (tasks T0-T9 mapped onto servers S1-S4).

Data Availability
Data sharing is not applicable to this article because the authors have used randomly generated data, the details of which are given in the "Results and Discussion" section of this article.

Conflicts of Interest
The authors declare that they have no conflicts of interest.