Dispatching-Rule Variants Algorithms for Used Spaces of Storage Supports

This paper addresses the fair distribution of several files of different sizes to several storage supports. Given several storage supports and different files, we search for a method that produces an appropriate backup. An appropriate backup guarantees a fair distribution of the big data (files), where fairness concerns the distribution of the used spaces of the storage supports. The problem is to find a fair method that stores all files, each characterized by its size, on the available storage supports. We propose in this paper several fairness methods that seek to minimize the gap between the used spaces of all storage supports. Several algorithms are developed to solve the proposed problem, and an experimental study shows the performance of these algorithms.


Introduction
The manner that ensures the storage process is a primordial issue. Indeed, facing big data and few storage supports, it is important to seek a method that stores all given files in these storage supports. These methods must guarantee an equitable distribution of the files over the available storage supports. This equitable distribution can also be called file size balancing. File size balancing relies on scheduling algorithms to guarantee a minimum gap between the used spaces of the storage supports. In the unbalanced case, some storage supports have high used space while, at the same time, others have low used space. To avoid such cases, appropriate scheduling algorithms can be applied. In this paper, we propose balancing algorithms that yield an equitable distribution of the files over the storage supports; to the best of our knowledge, this problem has not been studied in the literature. Several research works related to the balancing process can be cited. Singh et al. [1] proposed a dynamic load balancing algorithm for strongly connected servers that takes into account the servers' parallel processing capability and request queuing capacity in order to classify overloaded and least loaded servers; once a server gets overloaded, its load is migrated to the least loaded one. In [2], Hung et al. introduced an enhancement of the max-min scheduling algorithm that decreases the completion time of clients' requests; this algorithm uses supervised machine learning to cluster the utilization percentages of virtual machines and the sizes of requests, and the virtual machine with the least utilization percentage is assigned to the cluster of largest requests. Another work introduced a new strategy that increases the utilization of virtual machines in an efficient way; this strategy is a heuristic based on a load balancing algorithm applied to infrastructure-as-a-service clouds [3].
Besides, Ragmani et al. [4] devised another load balancing strategy to increase cloud performance. In contrast, another work integrated a load balancing algorithm into resource scheduling to provide a higher quality of cloud service [5]. In [6], Hung et al. proposed a load balancing algorithm, called the max-min and max algorithm, that computes the average completion time of every task on all nodes; the task with the maximum average completion time is then dispatched to the unassigned node whose completion time is minimal and less than that maximum. The work in [7] maintains information about every virtual machine's efficiency in an allocation table residing in a data center, increasing the allocation count of an efficient virtual machine when it is allocated to a request and decreasing it after that request completes. Some other works apply balancing algorithms to solve real-life problems. Indeed, Jemmali et al. [8,9] treated the problem of gas turbine aircraft engines, and Jemmali [10] focused on the equitable distribution of project revenues, proposing several approximate solutions for that problem. Several other works treated the balancing problem in different applications. Hasan et al. [11] applied a balancing algorithm to small-cell networks to adjust the handover parameters of overloaded cells with adjacent cells, while in [12] a balancing algorithm was applied to the voltage loads of capacitors in a modular multilevel converter.
Xu et al. [13] introduced a technique that rewrites data blocks and defragments the backups of VM images, together with a technique for restoring VM image backups by caching these data blocks. In [14], Xu et al. proposed a method based on enhanced k-means clustering that finds the VM images with duplicated segments to be selected; these VM images can then be loaded into memory.
Jain and You [15] and Koseki and Ogawa [16] proposed a method of load balancing for a set of nodes within a cluster storage system. This method identifies a source node and a target node based on a load threshold value and on the proximity between the source and target nodes. It then chooses the data objects to be moved from the source node to the target node without exceeding the load threshold of the target node after the move.
Besides, Gulati et al. [17] introduced a software system that handles the placement of virtual machines and automatically implements load balancing between several devices by migrating data between them, without relying on the storage arrays.
In [18], Hu et al. introduced a load balancing strategy for VM resources using a genetic algorithm together with the system's historical data and its current state. This strategy selects the best load balancing and alleviates or avoids dynamic migration.
However, Aerts et al. [19] proposed block-based load balancing models for which minimizing the retrieval time is NP-hard.
In this paper, our study focuses on distributed load balancing algorithms, because centralized algorithms limit future scalability and make the system less fault-tolerant. Our algorithms deal with a batch of files that need to be stored in temporary storage, and the load balancer is triggered according to the system's planned backup time T. Therefore, the number of files is not an issue. This paper is organized as follows. In Section 2, we present the studied problem and give some details about it. Section 3 presents six proposed algorithms for the studied problem, and the experimental results are presented in Section 4.

Problem Definition
The problem studied in this paper is the proposition of a fairness method that guarantees the fair distribution of several files to the storage supports. The problem can be presented as follows. Let F be the set of given files that must be stored on a fixed number of storage supports. The number of files is denoted by n_f and the number of storage supports by n_s. The set of storage supports is denoted by ST = {st_1, ..., st_{n_s}}. Each file f_j, with j = 1, ..., n_f, is characterized by its size s_j. When file f_j is stored, the cumulative file size is denoted by cs_j. The total used space of storage support st_i once all files are stored is denoted by Us_i, with i = 1, ..., n_s. The minimum (maximum) used space after the termination of the backup procedure is denoted by Us_min (Us_max). Example 1 illustrates the studied problem.

Example 1. Let n_f = 7 and n_s = 2. Table 1 gives the size of each file f_j.

Table 1: File sizes for Example 1.
f_j:  1   2   3   4   5   6   7
s_j:  8   3  10   5  11   7  13
We seek to store the seven files on the two given storage supports. Applying an algorithm rule, the result is given in Figure 1. The results of the schedule shown in Figure 1 are as follows: the files {2, 6, 3, 7} are stored in storage support 1, and the files {4, 5, 1} are stored in storage support 2. Based on the latter schedule, the used space of storage support 1 is 33, whereas storage support 2 has a used space of 24. The gap between storage support 1 and storage support 2 is Us_1 − Us_2 = 33 − 24 = 9. Reducing the latter gap is the primordial objective of this research work. Thus, we must search for a schedule whose gap is smaller than 9.

Approximate Solutions
Our objective in this paper is to minimize the gap between storage supports. To do that, we must first define the gap in the general case. The gap can be calculated using different methods. We propose the following indicator: for each storage support, we subtract the minimum of all used spaces from the used space of that storage support. Therefore, considering the n_s storage supports, the total capacity gap (TC_g) is given by the following equation:

TC_g = Σ_{i=1}^{n_s} (Us_i − Us_min).  (1)

Our objective is to minimize TC_g given in equation (1). Based on the standard three-field notation in [20], the studied problem is denoted by P‖TC_g.
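As an illustration, the total capacity gap of equation (1) can be computed with a short helper function (a minimal Python sketch; the name total_capacity_gap is ours, not from the paper):

```python
def total_capacity_gap(used_spaces):
    """TC_g = sum over all storage supports of (Us_i - Us_min)."""
    us_min = min(used_spaces)
    return sum(us - us_min for us in used_spaces)

# Example 1: used spaces 33 and 24 give a total capacity gap of 9.
print(total_capacity_gap([33, 24]))  # -> 9
```

For two storage supports, TC_g reduces to the simple difference Us_max − Us_min used in Example 1.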

Proposition 1.
The problem P‖TC_g is NP-hard.
Proof. Since P‖Us_min is an NP-hard problem [21], the studied problem is NP-hard. Indeed, TC_g = Σ_{i=1}^{n_s} (Us_i − Us_min) = Σ_{i=1}^{n_s} Us_i − n_s Us_min, and since the total used space Σ_{i=1}^{n_s} Us_i equals the total file size Σ_{j=1}^{n_f} s_j, which is constant, minimizing TC_g is equivalent to maximizing Us_min.

To achieve the goal of this work, we propose several algorithms that give approximate solutions. The proposed algorithms are based on three methods: the first uses dispatching rules (nonincreasing sizes order algorithm (NISA) and nondecreasing sizes order algorithm (NDSA)); the second is based on a swapping approach (swapping nonincreasing-decreasing sizes order algorithm (SIDA) and swapping nondecreasing-increasing sizes order algorithm (SDIA)); the third is more elaborate and mixes the largest files with the smallest ones (r-swapping nonincreasing-decreasing sizes with order algorithm (SIDA_r) and r-swapping nondecreasing-increasing sizes with order algorithm (SDIA_r)).

Nonincreasing Sizes Order Algorithm (NISA).
This algorithm first orders all files by nonincreasing size. Then, at each step, the file with the greatest remaining size is stored on the storage support with the minimum used space.
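A minimal Python sketch of this dispatching rule (identifiers are ours; the paper specifies the rule only in prose, and ties between equally loaded supports are broken here by index):

```python
def nisa(sizes, n_s):
    """Nonincreasing Sizes order Algorithm: take files in nonincreasing
    size order; each goes to the support with minimum used space."""
    used = [0] * n_s
    assignment = [[] for _ in range(n_s)]
    for j in sorted(range(len(sizes)), key=lambda f: sizes[f], reverse=True):
        i = used.index(min(used))      # most available storage support
        used[i] += sizes[j]
        assignment[i].append(j + 1)    # 1-based file indices as in the paper
    return used, assignment
```

On the sizes of Example 1 ([8, 3, 10, 5, 11, 7, 13] with two supports), this rule yields used spaces of 28 and 29, i.e., a gap of 1.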

Nondecreasing Sizes Order Algorithm (NDSA).
This algorithm first orders all files by nondecreasing size. Then, at each step, the file with the smallest remaining size is stored on the storage support with the minimum used space.
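NDSA differs from NISA only in the sort direction; a sketch under the same assumptions (our own rendering, ties broken by index):

```python
def ndsa(sizes, n_s):
    """Nondecreasing Sizes order Algorithm: same dispatching rule as
    NISA, but files are taken smallest first."""
    used = [0] * n_s
    for j in sorted(range(len(sizes)), key=lambda f: sizes[f]):
        i = used.index(min(used))  # support with minimum used space
        used[i] += sizes[j]
    return used
```

On Example 1 this happens to reproduce the used spaces of the schedule in Figure 1 (33 and 24, a gap of 9), which illustrates why NDSA is dominated by NISA.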

Swapping Nonincreasing-Decreasing Sizes with Order Algorithm (SIDA).
Instead of applying just one order (nonincreasing or nondecreasing), we mix the two, one by one. This means that for the first selection we pick the file with the largest size, for the second selection we take the file with the smallest size, and so on until all the files are stored.
This algorithm in fact swaps between two algorithms. The function that calls the algorithm NISA is denoted by NIS(), and the function that calls the algorithm NDSA is denoted by NDS(). These two functions return the index of the file selected by the applied order. The function Store(j) stores the file f_j on the most available storage support, i.e., the storage support with the minimum used space. The algorithm of SIDA is given in Algorithm 1.
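The alternation of NIS() and NDS() selections can be sketched as follows (our own Python rendering of the rule, not the paper's listing; ties broken by index):

```python
from collections import deque

def sida(sizes, n_s):
    """Swapping nonincreasing-decreasing rule: alternately take the
    largest and the smallest remaining file, storing each on the
    storage support with minimum used space."""
    remaining = deque(sorted(sizes))
    used = [0] * n_s
    take_largest = True
    while remaining:
        # pop() takes the largest remaining size, popleft() the smallest
        s = remaining.pop() if take_largest else remaining.popleft()
        take_largest = not take_largest
        i = used.index(min(used))
        used[i] += s
    return used
```

On the sizes of Example 1 this rule produces used spaces of 25 and 32; the mixed order pays off on other instances, as the case study of Section 4 shows.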

Swapping Nondecreasing-Increasing Sizes with Order Algorithm (SDIA).
This algorithm is based on the same idea as SIDA. The difference is that instead of beginning with the file that has the largest size, we begin with the file that has the smallest size; the second file is the one with the largest size, and so on. The algorithm of SDIA is given in Algorithm 2.

r-Swapping Nonincreasing-Decreasing Sizes with Order Algorithm (SIDA_r).
This algorithm is based on the following idea. In SIDA, we swap file by file, selecting once the largest file and once the smallest file. The question is what the algorithm becomes if we swap by two files, three files, or, in general, r files.
If we choose 2-swapping, we select the two files with the largest sizes and store them; then we select the two files with the smallest sizes, and so on, two by two. The algorithm of 2-swapping, i.e., r = 2, is given in Algorithm 3.
Algorithm 3 gives the solution only for 2-swapping. We can generalize it by searching for the solution when r > 2. The algorithm for a predetermined r = t is given in Algorithm 4. The generalized algorithm iterates Algorithm 4 for several values of t and then selects the best solution obtained.
This generalization is given as Algorithm 5.
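The t-swapping rule and its generalization can be sketched as follows (a hypothetical rendering; function names are ours, and ties between equally loaded supports are broken by index):

```python
from collections import deque

def sida_t(sizes, n_s, t):
    """t-swapping: store t largest files, then t smallest, alternating,
    always on the support with minimum used space (cf. Algorithm 4)."""
    remaining = deque(sorted(sizes))
    used = [0] * n_s
    from_largest = True
    while remaining:
        for _ in range(min(t, len(remaining))):
            s = remaining.pop() if from_largest else remaining.popleft()
            i = used.index(min(used))
            used[i] += s
        from_largest = not from_largest
    return used

def sida_r(sizes, n_s):
    """Run sida_t for every t in 1..n_f-1 and keep the schedule with the
    smallest total capacity gap (cf. Algorithm 5)."""
    def gap(used):
        m = min(used)
        return sum(u - m for u in used)
    return min((sida_t(sizes, n_s, t)
                for t in range(1, max(2, len(sizes)))), key=gap)
```

Note that sida_t(sizes, n_s, 1) coincides with SIDA; on the sizes of Example 1, sida_r finds a schedule with used spaces 28 and 29, i.e., a gap of 1.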

r-Swapping Nondecreasing-Increasing Sizes with Order Algorithm (SDIA_r).
This algorithm has the same idea as the abovementioned algorithm. The difference is that instead of starting with the largest files, we start with the smallest files.
In Algorithm 4, we replace NIS(P) with NDS(P) in instruction 4 and NDS(P) with NIS(P) in instruction 14; this modification gives the new algorithm SDIA_t. Then, in Algorithm 5, we replace SIDA_t(F) with SDIA_t(F) in instruction 2, and the new algorithm SDIA_r is obtained.

Case Study
In this case study, we compare NISA with all other heuristics except NDSA, because NISA is better than NDSA in 100% of the cases. We show instances for which our proposed algorithms are better than NISA.

Comparison of NISA and SIDA_r. Consider an instance with 10 files to be assigned to 2 storage supports. The sizes of the 10 files are given in Table 2.
This case study considers the first execution of the algorithm NISA, i.e., the used space is zero for both storage 1 and storage 2. The first step of NISA orders the files by nonincreasing size, which gives the batch f_3, f_8, f_2, ... The steps of NISA are as follows: the largest file f_3 is placed on the storage support with minimum used space, so storage 1 is selected. Then, the second largest file f_8 is placed on storage 2, which has the minimum used space at this point. After that, the third file f_2 is assigned to storage 2 because it has the minimum used space at this point (storage 1: 280, storage 2: 268); the used space of storage 2 becomes 526, and so on. Therefore, the schedule given by NISA is as follows: on storage 1, NISA assigns the files f_3, f_1, f_10, f_4, f_5, for a total size of 939; on storage 2, it assigns the files f_8, f_2, f_7, f_9, f_6, for a total size of 988. Therefore, the gap between storages is TC_g(NISA) = (939 − 939) + (988 − 939) = 49. On the contrary, applying the algorithm SIDA_r gives the following schedule: on storage 1, SIDA_r assigns the files f_3, f_5, f_6, f_7, f_9, f_10, for a total size of 973; on storage 2, it assigns the files f_1, f_2, f_4, f_8, for a total size of 954. Therefore, the gap between storages is TC_g(SIDA_r) = (973 − 954) + (954 − 954) = 19. Clearly, TC_g(SIDA_r) is less than TC_g(NISA); the difference between the two algorithms is 30. Thus, SIDA_r gives the minimum gap.

Comparison of NISA and SIDA.
Consider an instance with 10 files to be assigned to 2 storage supports. The sizes of the 10 files are given in Table 3. The schedule given by NISA is as follows: on storage 1, NISA assigns the files f_7, f_5, f_2, f_10, f_4, for a total size of 269; on storage 2, it assigns the files f_6, f_3, f_8, f_9, f_1, for a total size of 251. Therefore, the gap between storages is TC_g(NISA) = (269 − 251) + (251 − 251) = 18.

(1) P = F; j = 1
(2) while (P ≠ ∅) do
(3)   if (j is odd) then
(4)     k = NDS(P)
(5)   else
(6)     k = NIS(P)
(7)   end if
(8)   Store(k)
(9)   P = P \ {k}
(10)  j++
(11) end while
ALGORITHM 2: Swapping nondecreasing-increasing algorithm: SDIA.
On the contrary, applying the algorithm SIDA gives the following schedule: on storage 1, SIDA assigns the files f_7, f_4, f_9, f_5, f_2, for a total size of 260; on storage 2, it assigns the files f_1, f_6, f_3, f_10, f_8, for a total size of 260. Therefore, the gap between storages is TC_g(SIDA) = (260 − 260) + (260 − 260) = 0. Clearly, TC_g(SIDA) is less than TC_g(NISA); the difference between the two algorithms is 18. Thus, SIDA gives the minimum and optimal gap because TC_g(SIDA) = 0.

Comparison of NISA and SDIA.
Consider an instance with 10 files to be assigned to 3 storage supports. The sizes of the 10 files are given in Table 4. The schedule given by NISA is as follows: on storage 1, NISA assigns the files f_8, f_9, f_7, for a total size of 132; on storage 2, it assigns the files f_1, f_6, f_3, f_4, for a total size of 135; on storage 3, it assigns the files f_10, f_2, f_5, for a total size of 120. Therefore, the gap between storages is TC_g(NISA) = (132 − 120) + (135 − 120) + (120 − 120) = 27.

On the contrary, applying the algorithm SDIA gives the following schedule: on storage 1, SDIA assigns the files f_7, f_1, f_9, f_6, for a total size of 131; on storage 2, it assigns the files f_8, f_2, for a total size of 123; on storage 3, it assigns the files f_4, f_3, f_10, f_5, for a total size of 131. Therefore, the gap between storages is TC_g(SDIA) = (131 − 123) + (123 − 123) + (131 − 123) = 16. Clearly, TC_g(SDIA) is better than TC_g(NISA); the difference between the two algorithms is 11. Thus, SDIA gives the minimum gap.
(1) P = F; j = 1
(2) while (j ≤ n_f) do
(3)   for (it = 1 to it = t) do
(4)     k = NIS(P)
(5)     Store(k)
(6)     P = P \ {k}
(7)     j++
(8)     if (j > n_f) then
(9)       Break
(10)    end if
(11)  end for
(12)  if (j ≤ n_f) then
(13)    for (it = 1 to it = t) do
(14)      k = NDS(P)
(15)      Store(k)
(16)      P = P \ {k}
(17)      j++
(18)      if (j > n_f) then
(19)        Break
(20)      end if
(21)    end for
(22)  end if
(23) end while
ALGORITHM 4: t-swapping algorithm: SIDA_t.

Comparison of NISA and SDIA_r. Consider an instance with 10 files to be assigned to 2 storage supports. The schedule given by NISA is as follows: on storage 2, NISA assigns the files f_10, f_6, f_7, f_1, f_3, for a total size of 4969, and the remaining files f_2, f_4, f_5, f_8, f_9 are assigned to storage 1, for a total size of 4961. Therefore, the gap between storages is TC_g(NISA) = (4969 − 4961) + (4961 − 4961) = 8. On the contrary, applying the algorithm SDIA_r gives the following schedule: on storage 1, SDIA_r assigns the files f_5, f_9, f_2, f_4, f_6, for a total size of 4965; on storage 2, it assigns the files f_3, f_1, f_7, f_10, f_8, for a total size of 4965. Therefore, the gap between storages is TC_g(SDIA_r) = (4965 − 4965) + (4965 − 4965) = 0. Clearly, TC_g(SDIA_r) is better than TC_g(NISA); the difference between the two algorithms is 8. Thus, SDIA_r gives the minimum and optimal gap because TC_g(SDIA_r) = 0.
Inspired by this case study, we propose to apply our algorithms in the cloud computing domain by adding a new component, called a "scheduler", to the cloud computing architecture. This component is responsible for applying the proposed algorithms and providing the best schedule obtained.

Experimental Results
In this section, we propose different classes of instances to compare the performance of the proposed algorithms. The main comparison is between the developed algorithms and the NISA algorithm; NISA corresponds to the LPT algorithm known from the literature, which has several applications in industry. The proposed algorithms were coded and executed in Microsoft Visual C++ (Version 2013). The computer used to run all programs has the following characteristics: (i) Processor: Intel® Core™ i5-3337U CPU @ 1.8 GHz.
The classes used to discuss the results obtained by the developed algorithms are inspired by the classes proposed in [21]. The file sizes s_j are generated through two kinds of distributions. The unit of size is MB.
Several statistics are presented in this work. We start with the overall Table 7, which shows, for each heuristic, the percentage of instances on which it equals the best one; the corresponding average time of each heuristic is also given in Table 7. The algorithm SIDA_r is the best among all algorithms, with a percentage of 75.2% and an average time of 0.115 s, whereas NISA reaches 45.2% and NDSA 0%. The algorithm that consumes the most running time is SIDA_r. Table 8 presents the behavior of G_H and Time when the number of files changes. For all algorithms, the Time increases when the number of files increases. The worst G_H value, 0.99, is obtained by NDSA when n_f = 3000, whereas the best gap, 0.09, is obtained by SIDA_r when n_f = 2500 (Table 9). Table 10 presents the behavior of G_H and Time when the class changes. The worst gap, 0.98, is obtained when NDSA is applied to class A; on the contrary, SIDA_r achieves the best gap on the same class, which equals 0.02. The table also shows that the best execution time is less than 0.001 s for NISA, NDSA, SIDA, and SDIA, whereas SIDA_r and SDIA_r have the worst execution times, larger than 0.1 s; the highest execution time, 0.125 s, is reached by SIDA_r on class C.
For the algorithm SIDA_r, the gap is 0.31 and 0.25 for classes D and E, respectively. These values are higher than the gaps of classes A, B, and C, which are less than 0.1.

Conclusion
In this paper, we focused on the resolution of the NP-hard problem of assigning several files to different storage supports. We developed six algorithms to solve this problem; these algorithms are essentially based on dispatching rules with variant methods. These methods are categorized into the nonincreasing (nondecreasing) order rule and the mixed method that uses both order rules. In addition, we proposed the r-swapping methods, which store the first r files using the nonincreasing rule, then the next r files using the nondecreasing rule, and so on until all files are stored. In this paper, we chose the number r equal to n_f − 1. The experimental results show that the best algorithm is SIDA_r, which outperforms the existing algorithms from the literature. The proposed algorithms can be enhanced to develop new, better-performing algorithms.

Data Availability
The data used to support the findings of this study are included within the article.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.