Dynamic Q-Learning-Based Optimized Load Balancing Technique in Cloud



Introduction
Nowadays, cloud services are a vital component of smart devices and high-end applications. The use of cloud resources grows every day along with demand. Cloud computing techniques are integrated with a wide range of domains to store data in various forms, and handling such structured and unstructured data adds complexity and overhead to the computing machines. Today's massive systems must operate more efficiently, consuming less power and occupying less space, so modern processor design should prioritise power and energy efficiency. Multicore processors, which pack two or more processor cores into a single chip, make true multitasking possible: users can execute multiple complex functions in parallel and get more done in less time, while innovative features keep systems running at lower temperatures and with greater efficiency. Cloud computing is a revolutionary model for delivering and using Internet-based information technology services. The term "cloud computing" refers to the practice of offering a variety of services over the Internet, the most common of which is renting out virtualized, easily scalable hardware. User expectations and requirements are growing markedly in daily life due to advances in digital components and autonomous AI techniques [1]. With outstanding results in recognition, translation, and prediction tasks, machine learning techniques and deep networks have reached new heights. Processing such complex tasks with neural networks demands high-end GPU support, large bandwidth, and massive storage. Providing these resources at low cost requires a novel approach to resource utilisation and allocation [2].
Cloud computing is a paradigm that provides on-demand, customer-driven, adaptable access to a pool of configurable computational resources that can be rapidly provisioned and released with minimal management effort. In a cloud computing environment, various virtual machines (VMs) share the same physical resources (bandwidth, memory, and CPU) on a single physical host. System virtualization enables a large number of VMs to share the throughput of a host farm. Because the platform's resources are shared by many consumers and applications, it can be difficult to devise a reasonable task schedule that accounts for both resource consumption and infrastructure performance. The efficiency of task scheduling is affected in many ways by system parameters, including memory space, network bandwidth, and processor power. In the cloud, the primary objective of task scheduling algorithms is to keep the load on the processors balanced while taking the network bandwidth into account. This improves the processors' productivity and utilization and reduces the time needed to complete the tasks [3]. A load-balancing job scheduling system for the cloud was developed using a distinctive adaptive genetic algorithm (AGA) that combines the benefits of cloud computing with the algorithm. This approach seeks a task scheduling sequence with a short total task time and makespan while simultaneously satisfying the load-balancing requirements among nodes. It builds a multi-fitness function, uses a greedy algorithm to initialise the population, and uses variance to represent the load imbalance among nodes; the authors also compare the results of AGA and JLGA. This substantiates the validity of the scheduling method as well as the practicality of the improved technique [4].
Considering all these components and providing an intelligent service based on the latest artificial intelligence approaches is a very challenging research problem that requires wider attention. Such problems are sometimes NP-hard, and solving them requires smart approaches. Reinforcement learning combined with deep neural networks has attained a prominent position in handling such highly complex tasks [5].
Load balancing and distribution is an extensively researched topic with a correspondingly large body of literature. In particular, queueing models with different performance indicators, such as weighted mean response time, have been studied to better understand the optimal power supply problem [6].
The performance and efficiency of a machine learning-based solution depend on the machine learning algorithms used as well as on the attributes and nature of the data. The following machine learning (ML) subfields can be used to build data-driven systems efficiently and effectively: reinforcement learning, frequent pattern learning, dimensionality reduction and feature extraction, data clustering, regression, and classification analysis. Deep learning is a relatively recent innovation derived from the family of machine learning techniques known as artificial neural networks (ANNs); its purpose is to analyse data intelligently [7]. Each machine learning algorithm serves a particular purpose; even when applied to the same category of problem, different algorithms produce different results, because each algorithm's performance depends on the characteristics and qualities of the data. Selecting a learning algorithm for a target domain can therefore be difficult, and it requires an understanding of both the suitability and the fundamental principles of ML [8]. Reinforcement learning (RL) is a technique that, in an environment-driven setting, enables machines and application services to learn the optimal behaviour automatically in order to improve their effectiveness in that setting. RL is driven by penalties and rewards: the objective is to act so as to minimise the penalty and maximise the reward, using the insights extracted from the environment. RL can improve the efficiency of complex systems in a variety of contexts, including manufacturing, supply chain logistics, autonomous driving, and robotics, either through operational optimization or by automating processes with the assistance of trained AI models. Traditional load balancing techniques are often static and cannot adapt to changing conditions in real time, which can lead to suboptimal resource allocation, performance degradation, and increased costs. To address these issues, researchers have proposed dynamic load-balancing techniques that use machine learning algorithms to learn an optimal resource allocation policy based on current conditions. In this context, the proposed "Dynamic Q-Learning-Based Optimized Load Balancing Technique in Cloud" is a reinforcement learning-based approach that uses Q-learning to learn such a policy.

Related Works
Cloud users who experience service delays and degraded performance on computing tasks due to high traffic and other factors will reduce their use of cloud services. Yet the day-to-day storage and processing of high volumes of data cannot be carried out on single devices. Reliability and security, on the other hand, play a significant role in handling such sensitive data [9]. Since the incorporation of various mechanisms has brought meaningful improvements in cloud environments, we investigated various research articles; a summary of the literature is given in Table 1. The survey examined the components used in the earlier studies in detail. For a heterogeneous multicloud environment, an effective work scheduling method was analysed: whereas the other algorithms comprised two-stage scheduling, the MCC algorithm used only a single stage.
They put the algorithms through extensive testing using a variety of benchmarks as well as synthetic datasets. Their performance was evaluated in terms of makespan and average cloud utilization, and the results of the trials were compared to show how successful the algorithms are. Task scheduling in the cloud has also been approached with metaheuristic methods, for which a scientific categorisation and a comparative survey of the algorithms were presented. A methodical investigation of task scheduling in cloud and network modelling, based on bio-inspired and swarm intelligence methodologies, has also been presented. Such a study should help users select a suitable methodology for building improved strategies when scheduling client applications, by providing them with more options [10].
Ullah et al. [1] proposed a robust cloud framework for handling failures. The model uses energy efficiently and schedules workloads properly; although it works well, it should be extended to large scale. Ullah et al. also proposed a novel model based on a failure handling mechanism that addresses resource management and energy-efficient approaches. These approaches achieve a high task execution confirmation rate but ignore the delays and failures that arise for various reasons. The author discussed energy and SLA policies, yet failure handling and energy preservation remain unanswered. Although the work considered VM mapping and load balancing, other parameters, such as cost and execution timelines, are not handled properly. Another researcher introduced decentralized agent-based approaches; that work provides optimized resource allocation and investigates complexity and cost factors (Table 1), but because it needs to incorporate further parameters, it fails to deliver the expected performance. Panda et al. [2] studied parameters such as resources and cost using an optimization mechanism, achieving comparable performance in terms of quality, service response time, and robustness. Gawali and Shinde [6] state that the researcher's approach reduces the resource requirements and cost of VMs, but the model requires a large amount of data to reach an acceptable performance threshold. Xu et al. [5] used multiple agents to balance various jobs across heterogeneous server systems; this becomes risky as the number of servers increases, and owing to the various criteria checks, the work fails to deliver the expected performance.
Mobile Information Systems

Reinforcement Learning Techniques in Machine Learning
Reinforcement learning (RL) was introduced in machine learning to achieve prominent results in dynamic, decision-based execution processes. The performance of the proposed model is regularized and optimized by incorporating various parameters and values. The existing works discussed in this paper provide evidence for the use of RL for load balancing and resource allocation in the cloud [13].
Efficient use of resources and services is an important task in load balancing; it requires a dynamic algorithm that makes decisions for the present situation and allocates resources accordingly. The trial-and-error policy followed in RL approaches increases performance and optimizes cloud services. In our RL setup, we used five regions in which six data centres were simulated with 40 hosts, managed by a time-space manager, with a bandwidth of 1000 Mbps (Table 2). The Q-learning methodology follows a reinforcement strategy, performing the best action for the present state in order to maximise reward. The letter Q stands for quality, that is, the value of selecting an action to obtain higher rewards [14]. Q-learning is called "off-policy" because it learns the value of the greedy policy (the action with the highest Q-value in the next state) while the agent may follow a different, exploratory behaviour policy. This technique prefers the policy that yields the maximum reward, providing good solutions to the problem. In a cloud environment, Q-learning efficiently supports load balancing so that the available resources are utilised efficiently. Using multiple VM instances increases reliability and fault tolerance: by distributing the workload across them, the system becomes more resilient and handles fluctuations in demand more effectively, so organizations can meet customer needs and minimize downtime or service disruptions (Figure 2). In the cloud environment, Q-learning is implemented with Q-tables. A Q-table is indexed by states and by the actions that must be taken to achieve the preferred outcome; every entry is initialised to zero and updated each time a decision is made, and the agent selects actions based on the current Q-values [15].
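The Q-table update described above can be sketched as a single tabular Q-learning step. This is a minimal illustration, not the authors' implementation: the state and action labels (for example VM names) and the default learning rate `alpha = 0.1` and discount factor `gamma = 0.9` are assumptions chosen for the example.

```python
from collections import defaultdict


def make_q_table():
    """Q-table mapping (state, action) -> value; unseen entries read as 0.0 (zero-initialised)."""
    return defaultdict(float)


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step:
    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a)).
    The max over next-state actions (a greedy target) is what makes Q-learning off-policy.
    """
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]
```

Starting from an empty table, two identical updates with `alpha=0.5` and reward 1.0 move the entry from 0.0 to 0.5 and then to 0.75, so the value approaches the reward geometrically.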
Energy and load balancing metrics also received increased weighting, with all weights summing to one. The generalised co-optimal control approach can be described mathematically as the weighted sum

$$F(x) = \sum_{l=1}^{n} w_l \, f_l(x), \qquad \sum_{l=1}^{n} w_l = 1 \tag{1}$$

In equation (1), $w_l$ denotes the weight allotted to objective $l$ and $f_l(x)$ the corresponding individual fitness function, for $0 < l \le n$.
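The weighted-sum scalarisation in equation (1) can be sketched as follows. The objective values and weights in the usage line are hypothetical normalised numbers, and the function assumes non-negative weights that sum to one.

```python
import math


def weighted_fitness(objective_values, weights):
    """Combine objective values f_l(x) into F(x) = sum_l w_l * f_l(x).

    Assumes one weight per objective and that the weights sum to 1.
    """
    if len(objective_values) != len(weights):
        raise ValueError("need exactly one weight per objective")
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must sum to 1")
    return sum(w * f for w, f in zip(weights, objective_values))
```

For example, hypothetical normalised (energy, imbalance, makespan) values `[0.2, 0.5, 0.8]` with weights `[0.4, 0.4, 0.2]` give F(x) of about 0.44.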
For a well-organized explanation, the total load on the data centre can be estimated from the load of every VM [16]. In equation (2), n denotes the number of nodes being connected, depending on their global and local abilities, and F is the functional value obtained by summing the task values of the virtual machines with respect to resource utilization and execution time [17]. The energy of task execution on a VM is assessed in terms of resource utilization and execution time; the energy $H_{ij}$ consumed by the $i$th task on the $j$th VM is given by equation (3), where $U_{ij}$ and $CO_{ij}$ represent the relative variation between the current and earlier virtual machines for the $i$th task on the $j$th VM [18]. In the cloud, each virtual machine can be characterized as a tuple VM = {id; mips; bw; pes_number}.

The degree of imbalance ($D_i$) in equation (4) measures how evenly the load is distributed across the virtual machines relative to their performance capabilities; a small value means the load distribution is more stable (balanced). It is determined by [19]

$$D_i = \frac{F_{max} - F_{min}}{F_a} \tag{4}$$

where $F_{max}$ is the maximum execution time attained, $F_{min}$ the minimum execution time attained, and $F_a$ the average execution time attained over all the virtual machines.
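The degree of imbalance can be computed directly from per-VM execution times. This sketch assumes the execution times are already measured and positive; the example values are hypothetical.

```python
def degree_of_imbalance(exec_times):
    """D_i = (F_max - F_min) / F_a over per-VM execution times.

    A value of 0 means every VM finished in the same time (perfect balance).
    """
    if not exec_times:
        raise ValueError("need at least one execution time")
    f_max, f_min = max(exec_times), min(exec_times)
    f_avg = sum(exec_times) / len(exec_times)
    if f_avg == 0:
        return 0.0
    return (f_max - f_min) / f_avg
```

With execution times of 10, 20, and 30 ms the result is (30 - 10) / 20 = 1.0, whereas three VMs at 15 ms each give 0.0.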
Here, j ranges from 1 to n inclusive, and k denotes the number of virtual machines; as the number of jobs increases, n increases.
In equation (5), makespan is the total time required to complete the execution of all tasks. In manufacturing terms, makespan is the time interval between the start and the finish of a sequence of jobs/tasks or of an application. Makespan indicates how efficiently and effectively the scheduler allocates tasks to resources (virtual machines): a high makespan indicates that the scheduler is not allocating tasks to devices effectively during the planning and execution phases [20].
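Since equation (5) is not reproduced in the text, the sketch below uses the common definition of makespan as the latest VM finish time. It assumes that the tasks assigned to a VM run back-to-back from time 0; the task and VM names are hypothetical.

```python
def makespan(assignment, exec_time):
    """Largest finish time over all VMs.

    assignment: dict vm -> list of task ids, run sequentially from t = 0.
    exec_time:  dict (task, vm) -> execution time of that task on that VM.
    """
    finish = {
        vm: sum(exec_time[(task, vm)] for task in tasks)
        for vm, tasks in assignment.items()
    }
    return max(finish.values())
```

For instance, if `vm0` runs tasks of 3 and 4 time units and `vm1` runs one task of 5, `vm0` finishes last at 7, which is the makespan.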
In equation (6), resource utilization ($R_u$) is a performance measure of how much of the devices/resources is consumed. A high utilization value gives cloud providers a higher yield from their resources.
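Equation (6) is not reproduced in the text. A commonly used definition of average resource utilization, assumed here, divides the total busy time of all VMs by the makespan multiplied by the number of VMs; the sketch takes the makespan to be the largest busy time.

```python
def resource_utilization(busy_times):
    """Average utilization = sum of VM busy times / (makespan * number of VMs).

    busy_times: list of per-VM busy times; makespan is assumed to be max(busy_times).
    """
    if not busy_times:
        raise ValueError("need at least one VM")
    span = max(busy_times)
    if span == 0:
        return 0.0
    return sum(busy_times) / (span * len(busy_times))
```

Two VMs busy for 8 and 4 time units give 12 / (8 * 2) = 0.75; perfectly balanced VMs give 1.0.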
In equations (7) and (8), the schedule cost (SC) and the execution cost represent the cost charged by the cloud computing provider to the user for the devices used to accomplish the tasks. The chief objective of a cloud computing user is to reduce this cost while achieving good utilization and the minimum makespan [21].
For the cloud provider, a high utilization value likewise increases the yield.
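Equations (7) and (8) are not reproduced in the text. As an assumption, the execution cost can be sketched as each VM's busy time multiplied by a per-unit-time price; the VM names and prices below are hypothetical.

```python
def execution_cost(busy_times, price_per_unit):
    """Total cost = sum over VMs of busy time * price per unit time.

    busy_times:     dict vm -> time the VM spent executing tasks.
    price_per_unit: dict vm -> price charged per unit of busy time (assumed pricing model).
    """
    return sum(busy_times[vm] * price_per_unit[vm] for vm in busy_times)
```

With `vm0` busy for 7 units at 0.5 per unit and `vm1` busy for 5 units at 1.0 per unit, the cost is 3.5 + 5.0 = 8.5.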
In equations (9) and (10), $ECT_{ab}$ denotes the expected execution time of task $a$ on virtual machine $b$, determined by the task length and the VM's MIPS rating. The proposed method uses a multidivision group model for multiobjective optimization, which divides the global domain into subdomains that can be optimized individually [22].
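A common way to build the expected-execution-time matrix, assumed here because equations (9) and (10) are not reproduced, is to divide each task's length (in million instructions) by each VM's MIPS rating.

```python
def ect_matrix(task_lengths, vm_mips):
    """ECT[a][b] = length of task a (million instructions) / MIPS rating of VM b."""
    if any(m <= 0 for m in vm_mips):
        raise ValueError("MIPS ratings must be positive")
    return [[length / mips for mips in vm_mips] for length in task_lengths]
```

Tasks of 1000 and 2000 MI on VMs rated 500 and 1000 MIPS give `[[2.0, 1.0], [4.0, 2.0]]` seconds; a scheduler can then compare rows to see which VM finishes a task soonest.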
The load-balancing tool is used in two situations: first, when a VM starts, and second, when the load rises above or falls below a threshold. Figure 3 depicts the detailed algorithm for VM start-up [23]. First, in the algorithm, we receive

The proposed model employs multiple agent-based decision-making systems to monitor the different activities in the cloud environment. The agents are autonomous and use sensors to infer the actions to be performed; VMs also act as agents and follow the instructions of the autonomous agents. The proposed model employs a user agent (UA) to regulate resource allocation (RA) and load balancing (LB). The autonomous agents interact with the VMs by sending messages; based on these, they decide further actions and provide real-time tracking information to regulate the RA and LB tasks [24]. The main role of the multiple agents is to govern activities such as energy consumption, load balancing, and fault tolerance, from which the global measures of the cloud environment are estimated. The proposed work incorporates Q-learning and performs updates at each level. The VMs interact with the cloud environment in two different ways. In exploitation, actions are decided by a set of rules defined from earlier decisions and rewards [25].
The other way is exploration, in which decisions are executed randomly in search of higher rewards. The state transition process is monitored continuously, and the execution of actions is decided by the VMs. All decisions aim to secure high rewards using the Q-learning methodology (Figure 4): taking action A in state S yields reward R, and the Q-tables retain the latest updates and actions [26]. The total reward is computed using the following equation:

$$R = \sum_{k=0}^{\infty} \gamma^{k} \, r_{k+1} \tag{11}$$

For {Get the t hours load predictions of the starting VM}
(10)   VMPreload <- Get_LoadPrediction(VMid)
(11)   {Get load prediction of each VM on host}
(12)   HRes <- Get_ResFromLoad(VMs, PreLoads, eachhost)
(13) endFor
(14) For each server PM in datacenter
(15)   if PM.Tcpu > β
(16)     workloadBalance in Data center()
(17) End Function
ALGORITHM 1: Dynamic programming algorithm to load utilization corresponding to bandwidth and network availability.
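The exploitation/exploration split described above is commonly implemented with an ε-greedy rule. The text does not name the rule, so ε-greedy, the `epsilon` parameter, and the dictionary-based Q-table are assumptions in this sketch.

```python
import random


def choose_action(q, state, actions, epsilon, rng=random):
    """epsilon-greedy selection.

    With probability epsilon, explore (pick a random action);
    otherwise exploit (pick the action with the highest Q-value, unseen pairs count as 0.0).
    """
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q.get((state, a), 0.0))
```

Setting `epsilon=0` reduces this to pure exploitation; a seeded `random.Random` instance can be passed as `rng` to make experiments reproducible.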

Here, St = {s_1, s_2, ..., s_n} and Ac = {a_1, a_2, ..., a_n} denote the sets of states and actions of the learning agent, respectively, and $r_{k+1}$ is the reward obtained by performing an action from Ac. The discount factor is γ and the learning rate is α [27]; both lie between 0 and 1.
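For a finite episode, the discounted total reward $R = \sum_k \gamma^k r_{k+1}$ can be sketched as a simple loop; the reward sequence and discount factor in the usage line are hypothetical.

```python
def discounted_return(rewards, gamma):
    """R = sum_k gamma**k * r_{k+1} for a finite reward sequence [r_1, r_2, ...]."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    total = 0.0
    for k, reward in enumerate(rewards):
        total += (gamma ** k) * reward
    return total
```

Three rewards of 1 with γ = 0.5 give 1 + 0.5 + 0.25 = 1.75: a smaller γ makes the agent value immediate rewards more than distant ones.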
According to equation (11), the aim is to obtain high rewards from the set of actions performed in the cloud environment. VM execution is managed using equation (12), in which the load balance and workload of the VMs are recomputed at each step, and decisions are made on that basis [28][29][30][31][32].
In Algorithm 2, the minimum VM configuration is the input variable; the VM with n = 1 is treated as the master node, the total value for the machine is updated each time a new value is obtained, and the output is an optimized VM creation with the same configuration. Because of the maximum values involved, the maximum values of the virtual nodes must be checked. The next step is to reserve resources for VM start-up and migrate VMs for load balancing. At this stage, our method must compute the load-balancing factor and select the appropriate host. As discussed previously, the host selection method has complexity O(n), where n is the number of hosts in the resource pool; with a very large number of hosts, VM allocation and migration will therefore take excessively long [37][38][39][40]. The energy failure is computed using equation (13), where θ_Time denotes the total time consumed by the VM with respect to the energy consumed under the load maintained in the system.
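The two-step host selection (keep the n least-loaded hosts, then pick one host by a load-balancing factor) can be sketched as below. The exact load-balancing factor is not given in the text, so the factor used here, the larger of CPU and memory load after placing the VM, is a hypothetical choice motivated by the method considering only CPU and memory load. Loads are assumed to be fractions in [0, 1].

```python
def select_host(hosts, vm_cpu, vm_mem, n, top_threshold):
    """Pick a host for a starting VM, or return None if no candidate fits.

    hosts: dict host_id -> (cpu_load, mem_load), fractions in [0, 1].
    Step 1: keep the n hosts with the lowest mean load (sorted here for clarity).
    Step 2: among them, choose the host with the smallest hypothetical balance factor
            max(cpu_after, mem_after) that stays within top_threshold.
    """
    by_load = sorted(hosts, key=lambda h: sum(hosts[h]) / 2)
    best, best_factor = None, None
    for h in by_load[:n]:
        cpu, mem = hosts[h]
        factor = max(cpu + vm_cpu, mem + vm_mem)
        if factor <= top_threshold and (best_factor is None or factor < best_factor):
            best, best_factor = h, factor
    return best
```

Step 2 is a single linear pass over the candidates; returning `None` signals that no candidate can take the VM without crossing the top threshold, which is when migration would be considered.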
The energy failure can also be expressed in terms of θ_p, which denotes the energy consumed under the load maintained in the system [41][42][43][44].

Result and Discussion
This section first demonstrates the reasonableness and accuracy of the load prediction models and the relationships between entities, and then presents the results obtained with our technique. The results indicate that the proposed method is effective for determining the virtual machines' resource requirements and then assigning or scheduling load-balancing resources, with a total capacity of 25 VMs and 5 processors initially. Next, we focus on task scheduling for these VMs given the memory capacity and bandwidth of the machines to which they are assigned. Finally, memory management handles time and space with 8 per processor, giving a combined processor capacity of 240. The experimental machine has 4 GB of memory, of which 2 GB is available. In our experiments, we use a cluster of four computer servers of two kinds and one storage array. The workload is produced by a load generator, which generates CPU load by contacting some Internet applications at random times.
It is therefore important to capture accurately the resource demand of the individual virtual machines on a server: doing so gives insight into the impact of virtualization overhead and supports informed decisions that optimize resource allocation, improve performance, and ensure efficient utilization of server resources. The CPU characteristics of the different host types (AMD and Intel), however, differ because of the number of cores on the chip. Because network I/O parameters are typically larger than disk I/O parameters, disk virtualization consumes fewer CPU resources than network virtualization. With 20 virtual machines (VMs) performing a variety of tasks, the dynamic Q-tabling algorithm shows a linear improvement in performance. With more tasks, there is a greater need to balance energy consumption, costs, and workloads. A similar trend appears in the time and resource utilisation measures, both of which increase to reflect the growing complexity of the scheduling procedure (Table 3).
The utilisation of the VMs' resources (CPU and bandwidth) has a large impact on energy consumption. The values show that the proposed DQ theory performed better when there were fewer tasks to complete, and that it also gives better results overall. This is of crucial importance given the increased demands (from 200 to 1000 tasks) (Table 4). As the workflow grows, algorithmic performance degrades for both task scheduling and load balancing. Because the proposed DQ theory maintained its high performance even under heavier loads, it ranks among the top scheduling algorithms (Tables 5-8).
Among the different kinds of virtualization technology, full virtualization consumes more CPU resources than paravirtualization. This is primarily due to the mechanism that virtualization technology uses to achieve network virtualization. Among the compared algorithms, the proposed D-Q theory performed best. This improvement follows directly from the faster convergence of the D-Q theory algorithm, which in turn reduces the waiting time and resource loss caused by queueing. The proposed D-Q theory also maintained its steadiness and attained a better RIN gathering rate than the existing system. In event-based and time-critical applications, the DQ learning algorithm proves to be an effective tool, achieving an even distribution with fewer errors. The time and the number of sets used to assess the performance of the other algorithms, such as GA, DCOS, MSDE, PSO, WOA, and MSA, were inherited; the only differences lie in the statistical measures used to evaluate them.

Conclusion and Future Scope
The measurements obtained were compared with those of existing packet scheduling improvements to determine how efficient the proposed Q-learning approach is. The results show that the Q-learning-based RL task scheduling outperformed the state of the art in all relevant metrics, including energy savings, cost, stability index improvement, task completion time, turnaround time, and total system throughput. In future, the complexity and overhead of the proposed algorithm can be reduced by incorporating more QoS parameters. The ultimate objective of this research is to offer a practical solution to the dynamic load balancing problem in cloud computing, one that could improve resource utilization and performance while reducing costs. The proposed technique has potential applications in a variety of cloud-based services and environments, including cloud-based applications, platforms, and infrastructures. Incorporating such hybrid approaches takes cloud performance to the next level and enables dynamic decision-making. The proposed model achieves 20% better performance than earlier studies. In LB indexing, it yields 15% higher LB values than the other DQL algorithms, with 20% throughput. The task completion time of DQL is minimal, and on average the response time showed a maximum increase of 10% across all the other algorithms used in these experiments. Finally, CPU utilization rises to 35% for the remaining algorithms, compared with 15% for DQ learning. Although the present work showed better results than existing state-of-the-art methods, future work will use a dynamic machine learning load-balancing algorithm with the workload as an additional variable. For real-time applications, it could be more advantageous if the request load changed dynamically. This work provides only generic values for bandwidth and throughput; in addition, network cost and secure data communication must be taken into consideration in further development. In summary, this work proposed a dynamic Q-learning model that reduces energy consumption and makespan and improves resource utilization, so that an overloaded VM shares its resources through load balancing. As future work, we plan to fine-tune the model to achieve higher efficiency in multitasking environments. Our load-balancing method currently considers only memory and CPU load, so the load of network and disk I/O must also be included.

The performance metrics used are:
(a) Completion time: $CT_{ij} = \sum_{i=1}^{N} \left(F_{t_i} - S_{t_i}\right)$
(b) Response time: $RT_{ij} = \sum_{i=1}^{N} \left(Sub_{t_i} - W_{t_i}\right)$
(c) Throughput: $T_{ij} = \dfrac{\sum_{i=1}^{N} \text{Succ tasks}}{\text{Total time}}$

3.1. Processing Time of Multiple VM. If network bandwidth is constant,
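The completion-time and throughput metrics can be sketched as below, reading $F_{t_i}$ and $S_{t_i}$ as the finish and start times of task $i$ and "Succ tasks" as the number of successfully completed tasks; these readings are assumptions. The response-time formula is left out because its symbols are not defined in the text.

```python
def completion_time(starts, finishes):
    """CT = sum over tasks of (finish time - start time)."""
    if len(starts) != len(finishes):
        raise ValueError("need one start and one finish time per task")
    return sum(f - s for s, f in zip(starts, finishes))


def throughput(successful_tasks, total_time):
    """T = number of successfully completed tasks / total elapsed time."""
    if total_time <= 0:
        raise ValueError("total_time must be positive")
    return successful_tasks / total_time
```

Two tasks running from 0 to 3 and from 2 to 7 give a summed completion time of 8; ten successful tasks in 4 seconds give a throughput of 2.5 tasks per second.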

Figure 8: DQL on RIN method towards response time.

Table 1 :
Comparison of techniques across different framework algorithms.
Consider a task set {a_1, a_2, ..., a_n} of n tasks in the job queue and a VM set VM = {b_1, b_2, ..., b_m} of m VMs in the VM pool. On the basis of the processing time and the completed tasks, the objective parameters can then be determined.
the starting VM's predicted load for the next several hours. Then, we choose the n hosts in the VMMC that have the lowest loads, and from these n hosts one suitable host is chosen for the VM to run on. The host's load-balancing factor is computed from the input values and checked against the maximum threshold value.
Figure 2: Workflow of Q-learning methodology.
Data centre <- Load; let VMid <- the VM which will start
(2) for every data in PT <- load in to DC: Capacity in DC
(3) Let Tres_bottom <- b_t, the bottom threshold for the load of a VM
(4) Let Tres_stop <- t_t, the top threshold for the load of the VMMC
(5) Let n <- the number of hosts that the VM might be running on
(6) Input: VMid, b_t, t_t, n

If i == master, then
Else if i == slave or older, then
  Check the total value == max value
Else if i == member, then
End if
For
  Progress the swarm to acquire new solutions
  If Nkill == 0 and Ns < Nsmax then
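The threshold check in the algorithm (a server whose CPU load exceeds the top threshold triggers workload balancing) can be sketched as a simple classification of hosts. Treating hosts below the bottom threshold as candidate targets is an assumption, and the host names and threshold values in the usage line are hypothetical.

```python
def classify_hosts(cpu_loads, bottom, top):
    """Split hosts by CPU load.

    Returns (overloaded, underloaded):
    overloaded  = hosts with load > top    (these trigger workload balancing);
    underloaded = hosts with load < bottom (assumed candidates to receive migrated VMs).
    """
    if bottom > top:
        raise ValueError("bottom threshold must not exceed top threshold")
    overloaded = [h for h, load in cpu_loads.items() if load > top]
    underloaded = [h for h, load in cpu_loads.items() if load < bottom]
    return overloaded, underloaded
```

With thresholds 0.2 and 0.8, loads of 0.9, 0.1, and 0.5 mark the first host as overloaded, the second as underloaded, and leave the third untouched.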

Table 3 :
Configuration values of virtual machine capacity.

Table 4 :
Performance results of dynamic Q-learning theory on the RIN method, with the number of tasks ranging from 200 to 1200.

Table 6 :
DQL on RIN methods towards load balancing task completion time (ms).

Table 7 :
DQL on RIN methods towards load balancing response time (seconds).

Table 8 :
DQL on RIN methods towards load balancing CPU utilization (seconds).
Figure 5: DQL on RIN method towards load balancing.
An evaluation of different optimisation-based task scheduling algorithms using D-Q theory is discussed in this study. This figure demonstrates how the proposed D-Q theory, by virtue of its load-balanced and energy-aware scheduling, outperforms the competing algorithms in terms of throughput. Because of its superior global search ability and convergence rate, D-Q theory is responsible for the model's noticeable performance boost compared with contemporary alternatives. CPU consumption can reach up to 45%, with an average of 35%; at night, however, CPU utilization is typically below 15%. Figures 5-9 demonstrate that the proposed system preserved its steadiness.