Minimizing the Average Waiting Time of Unequal-Size Data Items in a Mobile Computing Environment

In a mobile computing environment, waiting time is an important indicator of customer satisfaction. To achieve better customer satisfaction and shorter waiting times, we need to overcome various constraints and make the best use of limited resources. In this study, we propose a minimization problem that allocates unequal-size data items to broadcast channels with various bandwidths. The main idea is to solve the problem in the continuous space R. First, we map the discrete optimization problem from Z to R. Second, the mapped problem is solved optimally in R. Finally, we map the optimal solution from R back to Z. The theoretical analyses ensure both the solution quality and the execution speed. Computational experiments show that the proposed algorithm performs well. The worst mean relative error can be reduced to 0.353 for data items with a mean size of 100. Moreover, almost all the near-optimal solutions can be obtained within 1 millisecond, even for N = 500, where N is the number of data items, that is, the problem size.


Introduction
Broadcasting is an efficient mechanism for transmitting information in a mobile computing environment. Popular messages (e.g., weather reports) or instant information (e.g., stock quotes) can be widely disseminated via the broadcast mechanism. This success is mainly due to the high bandwidths of downlinks, which are used for the transmission of data items from a broadcast server to an unlimited number of mobile users. Note that the bandwidths of channels and the sizes of data items might be unequal. The simplest strategy is to allocate data items equally to all channels, that is, to balance the load, but this is not the best way to reduce waiting time. Consequently, sophisticated broadcast scheduling or data partition algorithms are called for.
Waiting time, or expected delay, is an important indicator for measuring broadcast performance, for the waiting times of mobile users directly influence their customer satisfaction [1][2][3][4][5]. For example, Chien and Lin [3] found that customers' waiting experiences may negatively affect their attitudes towards a given service. Moreover, improving the service implies the improvement of user experiences. Therefore, a good broadcast mechanism that is able to reduce waiting time can achieve better customer satisfaction.
Let us observe the simplest form of such problems, that is, min ∑_{j=1}^{K} [(∑_{d_i∈D_j} p_i)(∑_{d_i∈D_j} 1)]. These problems feature single-item queries, multiple channels, and skewed data preferences. Several equal-bandwidth broadcast channels and multiple equal-size data items need to be broadcast over multiple channels periodically. In a mobile computing environment, users can download their desired items via their mobile devices, such as smartphones. Imagine that we allocate a few popular items to a channel, giving it a short cycle length, and other ordinary items to another channel, giving it a long cycle length. Most users can then download popular items in a short time, without a long wait. Clearly, access probability and cycle length directly influence broadcast scheduling. Note the unbalanced workloads of these channels (i.e., the different amounts of data) in an optimal allocation. Once the optimal schedule is found, we can achieve shorter waiting times and better customer satisfaction. However, determining optimal schedules requires much execution time. That is, the time complexity excludes the related optimization algorithms from practical use. Now we consider some complicated forms of such problems, that is, min ∑_{j=1}^{K} [(∑_{d_i∈D_j} p_i)(∑_{d_i∈D_j} s_i)]. These problems may have unequal-size data items or various bandwidths. There are still multiple channels and skewed preferences for data items. For example, in [6], the authors assumed that the bandwidths were different. This consideration makes the problem more difficult. In [7], the authors considered another problem, in which the sizes of data items are unequal. This assumption makes the problem more flexible and practical.
In recent years, various other forms have also been studied. In [8], mobile users could download multiple items at a time by sending a simple query. The authors assumed that two queries might have some items in common. To shorten the broadcast cycle length, the duplicate items were centralized and allocated to the same channel. In [9], video was broadcast on a single channel, so the data size was variable. In [10], mobile users could download multiple items in a multichannel environment. However, the wireless links were unreliable, so disconnections occurred frequently. In that study, reducing the waiting time was not the first priority. Instead, the authors aimed to minimize the deadline miss rate. All of the above considerations make the problems more complicated. Obviously, these problems cannot be solved easily or optimally when the problem size is large, so we need more efficient algorithms to deal with them.
Such problems are usually time-consuming or even NP-hard, so several intuitive or metaheuristic algorithms have been proposed. However, traditional algorithms have some shortcomings. For example, although dynamic programming [7,12] and branch-and-bound algorithms [13,14] can provide optimal solutions, they are time-consuming and cannot be applied to large problem instances, for example, N = 500. On the other hand, some metaheuristic algorithms [8] or greedy approaches [12,15,16] generate solutions very quickly. Nevertheless, their solution quality is not stable. The reason is that their searches are done by random walks in their solution spaces. For the same problem instance, multiple executions of an identical algorithm can generate different solutions. Moreover, as in the case of tabu search, a great amount of memory is needed for keeping track of past experiences. Such algorithms are also unsuitable for large problem instances.
In this study, we consider a waiting time minimization problem (abbreviated as WTM). To make this problem more flexible and practical, meaning that unequal bandwidths and various data sizes are allowed, we propose a linearly convergent algorithm based on a steepest descent technique. First, WTM is mapped from Z (the discretized space) to R (the continuous space). Next, the mapped problem WTM′ is solved optimally in R in linear time. Finally, the optimal solution is mapped from R back to Z.
The rest of this study is organized as follows. Section 2 gives the formal definition of the proposed problem. Section 3 establishes the theoretical basis for the problem. Section 4 presents the linearly convergent algorithm. This study is compared with past research in Section 5. In Section 6, computational experiments are conducted to evaluate the performance of the proposed algorithm. Finally, conclusions are drawn in Section 7.

Problem Formulation
The waiting time minimization problem (WTM) is formulated as follows. There is a database D = {d_1, d_2, ..., d_N} to be broadcast in a mobile computing environment, where d_i is the i-th data item for i = 1, 2, ..., N. Let p_i and s_i denote the access probability and the size of d_i, respectively. The total amount of data is Λ = ∑_{i=1}^{N} s_i. Assume that the access pattern σ, that is, the sequence of (p_i, s_i), is given in advance. Assume that a broadcast server is equipped with K channels, numbered from 1 to K. We let the bandwidth of channel j be b_j. Without loss of generality, assume that b_j ≥ b_{j'} for all j < j'. Each channel is divided into time slices of equal size, called buckets. We need to partition D into K parts based on the access pattern σ and assign each part D_j to one of the K channels, one item to several consecutive buckets. Then K broadcast programs are formed, and each of them will be broadcast cyclically. The average waiting time is defined as the average amount of time spent by each user before he/she receives the desired data item. Finally, the objective is to minimize the average waiting time of the K programs under the above assumptions. The objective function can be written as min ∑_{j=1}^{K} [(∑_{d_i∈D_j} p_i)(∑_{d_i∈D_j} s_i)] / b_j, where d_i ∈ D_j means d_i is allocated to channel (or machine) j.
Three properties regarding WTM are discussed as follows. First, for each single channel j, the corresponding broadcast program cycle length is Λ_j = ∑_{d_i∈D_j} s_i. Then the expected waiting time for receiving d_i on the channel is (0.5Λ_j + s_i)/b_j. If Λ_j is much larger than s_i, the expected waiting time can be simplified to 0.5Λ_j/b_j. Namely, the position order of data items makes no difference to the final result, that is, the average waiting time. Second, the average waiting time is determined only by the data partition. That is, the average waiting time is the weighted sum of the average waiting times on the channels, that is, 0.5 ∑_{j=1}^{K} (Λ_j ∑_{d_i∈D_j} p_i)/b_j. Third, such minimization problems are time-consuming or even NP-hard [6,16,17]. Consequently, some efficient minimization algorithms are called for. With the above observations, we can define proper objective functions in the next section.
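As a quick illustration of this weighted-sum form, the sketch below evaluates the average waiting time 0.5 ∑_j (Λ_j ∑_{d_i∈D_j} p_i)/b_j for a given partition. The data structures (lists of (p_i, s_i) pairs per channel) are our own illustrative assumptions, not the paper's implementation.

```python
def average_waiting_time(parts, bandwidths):
    """Evaluate 0.5 * sum_j (Lambda_j * sum of p_i on channel j) / b_j.

    parts[j] is a list of (p_i, s_i) pairs allocated to channel j;
    bandwidths[j] is b_j.
    """
    total = 0.0
    for items, b in zip(parts, bandwidths):
        cycle = sum(s for _, s in items)   # cycle length Lambda_j
        prob = sum(p for p, _ in items)    # total access probability on channel j
        total += 0.5 * cycle * prob / b    # expected wait contributed by channel j
    return total
```

For example, two channels of bandwidth 1, each carrying one item of probability 0.5 and size 2, give an average waiting time of 1.0.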

Theoretical Basis
In this section, a steepest descent technique is employed to solve the problem. First, we map the WTM problem from Z to a new problem WTM′ in R. Second, we employ a steepest descent technique [11,18] to obtain the optimal solution to WTM′ in R. Finally, we map the optimal solution from R back to Z. The main idea behind the transformation is to improve the solution quality and execution speed. Although WTM can be solved optimally by some optimization algorithms (e.g., a branch-and-bound algorithm) in Z, this is too time-consuming when N is large. On the other hand, though some metaheuristic algorithms (e.g., GA) are able to provide instant solutions in Z, their solution quality is not guaranteed. For the same problem instance, they may provide different solutions. Consequently, instead of directly optimizing WTM in Z, we map WTM to R and take advantage of the linear convergence and optimality of the gradient-based technique in R.
The parameters used in this study are summarized in Parameters at the end of the paper. Parameters N, K, and σ are defined earlier. Parameters σ_0 and σ_1 are the access patterns used in the different spaces, that is, Z and R. We also need two cumulative functions, P(·) and S(·), and two interpolating functions, P̃(·) and S̃(·), to define the objective functions F_σ(n) in Z and F̃_σ(x) in R, respectively. Moreover, n ∈ Z and x ∈ R are partition vectors or position vectors.

3.1. Mapping WTM from Z to R. First, two objective functions for both WTM and WTM′ are defined. Then we show how to map WTM from Z to WTM′ in R. Finally, we prove that the geometric properties, such as concavity, of both WTM and WTM′ are similar. These proofs ensure that both solution spaces are close to each other.
The relationship between WTM and WTM′ is similar to that between the 0-1 knapsack problem and the fractional knapsack problem [19]. If we can solve the problem in R optimally, then the rounded solution mapped back to Z is a near-optimal solution to the original problem. To show this relationship, two proper objective functions play an important role. Namely, the objective functions of WTM and WTM′ must resemble each other. Once the two similar objective functions are determined, we can claim that the optimal solution of WTM′ is very close to that of WTM.
Definition 1 helps us to transform the data partition problem into a sequence partition problem. Since the position order of an access pattern will lead to different results, we need the following definition to ensure the optimality of WTM and WTM′.

Definition 1. Given an optimal solution, the K programs can be concatenated into an optimal program. Let σ_0 denote the sequence of (p_i, s_i) that has the same order as the optimal program for WTM. Similarly, let σ_1 denote the sequence of (p_i, s_i) that has the same order as the optimal program for WTM′.
WTM and WTM′ have different preferences for the order of the access pattern. Consider the relationship between the 0-1 knapsack problem and the fractional knapsack problem again. For the fractional knapsack problem, if we take the items of greatest unit value one by one in a preemptive manner, we can easily achieve the maximum objective value [19]. Thus we try to solve the continuous-case WTM′ by sorting the items d_i in descending order of p_i/s_i. That is, the sequence σ_1 is obtained by sorting p_i/s_i in nonincreasing order. On the other hand, for the 0-1 knapsack problem, we may achieve the optimal solution but sacrifice some valuable items because they are too big to fit the knapsack. Clearly, the optimal item partition of the 0-1 knapsack problem cannot be found by using simple sorting rules. That is, when we reduce an item partition problem to a sequence partition problem, it is difficult to determine the optimal sequence for partition. Consequently, we do not aim to find the optimal sequence σ_0 for WTM. Instead, we use σ_1 for WTM′ to simulate σ_0.
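The ordering rule that produces σ_1 can be sketched as follows; the function name and the list-based representation are illustrative assumptions.

```python
def sort_for_continuous_case(probs, sizes):
    """Return item indices ordered by p_i / s_i in nonincreasing order.

    This ordering plays the role of sigma_1: the most 'valuable per unit
    size' items come first, as in the fractional knapsack greedy rule.
    """
    return sorted(range(len(probs)),
                  key=lambda i: probs[i] / sizes[i],
                  reverse=True)
```

For probabilities [0.1, 0.6, 0.3] and sizes [1, 2, 3], the ratios are 0.1, 0.3, and 0.1, so item 1 (0-indexed) comes first.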
Definitions 2 and 3 introduce two cumulative functions regarding the access pattern σ. With the two cumulative functions, we can define a new objective function in R in a more comprehensive way for the later transformation.

Definition 2. Given σ, the function P(·) of cumulative probability is defined by P(m) = ∑_{i=1}^{m} p_i for m = 1, 2, ..., N, with P(0) = 0.

Similarly, we define another function to express the cycle length of a broadcast program. The function S(·) is defined as follows.
Definition 3. Given σ, the function S(·) of the cumulative data size is defined by S(m) = ∑_{i=1}^{m} s_i for m = 1, 2, ..., N, with S(0) = 0.

Now we redefine the objective function for the original problem WTM in Z. The original objective function is defined from the viewpoint of data partition, whereas the new objective function is formulated from the viewpoint of sequence partition. With the cumulative functions P(·) and S(·), we can determine a proper sequence in advance and then perform the partition on the sequence. For simplicity, the leading coefficient 0.5 of the expected waiting time of each channel is omitted in the rest of this study.

Definition 4. Given σ and the two constants n_0 = 0 and n_K = N, for any column vector n = [n_1, n_2, ..., n_{K−1}]^T,

F_σ(n) = ∑_{j=1}^{K} [P(n_j) − P(n_{j−1})][S(n_j) − S(n_{j−1})] / b_j,

where b_j is the bandwidth of channel j.
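Under the sequence-partition reading of Definition 4 above, a minimal sketch of P(·), S(·), and F_σ(n) might look like the following. The cut-point representation n = [n_1, ..., n_{K−1}] and the function names are our assumptions.

```python
import itertools

def cumulative(values):
    """Cumulative sums with a leading 0, i.e. P(m) or S(m) with P(0) = 0."""
    return [0.0] + list(itertools.accumulate(values))

def objective(cuts, P, S, bandwidths):
    """F_sigma(n): channel j receives items n_{j-1}+1 .. n_j of the sequence.

    cuts is [n_1, ..., n_{K-1}]; n_0 = 0 and n_K = N are implicit.
    """
    n = [0] + list(cuts) + [len(P) - 1]
    return sum((P[n[j]] - P[n[j - 1]]) * (S[n[j]] - S[n[j - 1]]) / bandwidths[j - 1]
               for j in range(1, len(n)))
```

With two items of probability 0.5 and size 2 split across two unit-bandwidth channels, the (unscaled) objective is 0.5·2 + 0.5·2 = 2.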
An interpolating function P̃(·) regarding access probability is defined for mapping WTM from Z to R. To preserve the geometric properties, we interpolate the N + 1 points (m, P(m)), m = 0, 1, ..., N, by using N separate line segments. The interpolating function P̃(·) is defined as follows.
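A minimal piecewise-linear interpolant through the points (m, P(m)) can be sketched as follows; the closure-based implementation is an illustrative assumption.

```python
import math

def interp_cumulative(C):
    """Return the piecewise-linear interpolant through (m, C[m]), m = 0..N.

    Within [m, m+1] the value is obtained by linear interpolation, so the
    continuous function agrees with C at every grid point.
    """
    def f(x):
        m = min(int(math.floor(x)), len(C) - 2)  # clamp to the last segment
        return C[m] + (x - m) * (C[m + 1] - C[m])
    return f
```

For example, with C = [0, 1, 4] the interpolant returns 0.5 at x = 0.5 and 2.5 at x = 1.5, while matching C exactly at integer points.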
By Lemma 8, F_σ(n) and F̃_σ(x) have the same function values at grid points. That is, they have similar geometric properties. Therefore, the optimal solution to WTM in Z is close to that of WTM′ in R, and the optimal solutions to WTM and WTM′ are also close to each other.

3.2. Optimal Solution x* to WTM′. In this subsection, we solve WTM′ optimally and obtain the optimal solution x* in R. First, we introduce the concept of the gradient. Then the optimality and convergence speed are discussed.
The steepest descent technique we employ is based on the concept of the gradient [11]. Unlike metaheuristic algorithms, the steepest descent technique converges to the global minimum instead of piecing together several local minima. Moreover, this steepest descent technique converges in the multidimensional space in linear time instead of performing meaningless random walks. The gradient is defined as follows.
Similar algorithms are found in [2,11,20,21]. We can modify them slightly in order to obtain the optimal solution x* in R. The details of the algorithm will be presented in the next section. Here, we show only the basic steps of the steepest descent technique:

(1) Evaluate F̃_σ(x) at an initial position x^(0).
(2) Determine the steepest descent direction from x^(0) that results in a decrease in the value of F̃_σ.

(3) Move an appropriate amount α (i.e., the step size) in this direction; the new position is x^(1).
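The three steps above can be sketched as follows. This is a generic steepest descent loop with a fixed step size and forward-difference gradients, shown only to make the iteration concrete; the paper's GRA instead chooses α by a line search, and the function and parameter names here are our own assumptions.

```python
def steepest_descent(f, x0, step=0.1, tol=1e-6, max_iter=1000, h=1e-6):
    """Minimize f over a list of coordinates by repeated gradient steps."""
    x = list(x0)
    for _ in range(max_iter):
        fx = f(x)
        # Forward-difference estimate of the gradient at x.
        grad = []
        for i in range(len(x)):
            xh = list(x)
            xh[i] += h
            grad.append((f(xh) - fx) / h)
        norm = sum(g * g for g in grad) ** 0.5
        if norm < tol:                      # zero gradient: stop
            break
        # Move a distance 'step' in the steepest descent direction.
        x = [xi - step * gi for xi, gi in zip(x, grad)]
    return x
```

On the convex test function ∑(x_i − 1)², the iterates contract toward the minimizer (1, 1).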
In order to implement the algorithm easily, we reduce the line search to a single-variable function V(α). Note that the value α_0 that minimizes V(α) is also the value needed for (11). Because the root-finding process in (12) requires much execution time, Burden and Faires [11] employed a quadratic polynomial to interpolate V(α) in order to accelerate the root-finding process. The details regarding the quadratic polynomial will be shown in the next section.
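The quadratic-interpolation idea can be sketched as follows: fit a quadratic through three samples of V(α) using divided differences and take its critical point as the approximate minimizer α_0. The sample points and function names are our assumptions; the paper's exact formulation follows Burden and Faires [11].

```python
def quadratic_linesearch(V, a0=0.0, a1=0.5, a2=1.0):
    """Approximate the minimizer of V by the vertex of an interpolating quadratic."""
    v0, v1, v2 = V(a0), V(a1), V(a2)
    d01 = (v1 - v0) / (a1 - a0)        # first divided differences
    d12 = (v2 - v1) / (a2 - a1)
    d012 = (d12 - d01) / (a2 - a0)     # second divided difference
    if d012 == 0:                       # V is (locally) linear: take the better endpoint
        return a0 if v0 <= v2 else a2
    # Newton form: q(a) = v0 + d01 (a - a0) + d012 (a - a0)(a - a1).
    # Setting q'(a) = 0 gives the critical point below.
    return 0.5 * (a0 + a1 - d01 / d012)
```

When V itself is quadratic, the vertex is recovered exactly; for example, V(α) = (α − 0.3)² yields α_0 = 0.3.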
The proposed algorithm converges linearly, and the proofs of convergence are omitted; readers can refer to [11,18]. Even though this algorithm converges rapidly, an accurate initial solution can accelerate the convergence further. The following definition and lemma help us to choose an accurate initial solution.
The following lemma shows how to obtain an accurate initial solution by determining the elements of a position vector or partition vector x.

Lemma 11. If the optimal solution x* to the dummy objective function F_{σ1}(x) is given, then Δ*_j / b_j ≤ Δ*_{j+1} / b_{j+1} for j = 1, 2, ..., K − 1.
Proof. We prove this by contradiction. Suppose Δ*_j / b_j > Δ*_{j+1} / b_{j+1} for some j. Since x* is the optimal solution to F_{σ1}(x), the sum of products could then be further reduced by adjusting the boundary between channels j and j + 1, which contradicts the assumption that x* is the optimal solution. The proof is complete.
Proof. Since n^# minimizes F_{σ1}(n), we have F_{σ1}(n^#) ≤ F_{σ1}(n^†). On the other hand, we need to show F_{σ0}(n*) ≤ F_{σ1}(n^#). Note that σ_0 is the optimal sequence for the discrete-case problem WTM and that it is difficult to determine. We assume that n* is the optimal solution to F_{σ0}(n). Since σ_1 is obtained by a simple sorting rule and dedicated to the continuous-case problem WTM′, the sequence σ_0 is more suitable for the discrete-case objective function F_σ(n) than σ_1. Therefore, we obtain F_{σ0}(n*) ≤ F_{σ1}(n^#).

Proposed Algorithm
Algorithm 1 shows the proposed algorithm GRA. Its inputs are the number of channels K, the database D, the sorted access pattern σ_1, the tolerance TOL, and the maximal number of iterations I_max; its output is the near-optimal solution n^† to WTM. In Step 10, Newton's forward divided-difference formula [11] is used to find a quadratic polynomial whose critical point yields α_0; in Step 13, the position is updated by x = x − αz.

In the first step, we prepare the cumulative functions P(·) and S(·) according to p_i and s_i, respectively. Then we construct the dummy objective function F_{σ1}(x) and set the initial value x = [x_1^(0), x_2^(0), ..., x_{K−1}^(0)]^T. All K − 1 elements of x need to satisfy Δ*_j / b_j = Δ*_{j+1} / b_{j+1} (see Lemma 11). In Steps 3 and 4, we evaluate the gradient of F_{σ1} at x and determine the steepest descent direction. If a zero gradient occurs, the algorithm stops. In Steps 5-8, we find a new position whose value of F_{σ1} is smaller than the current one. Then we move a distance of α in the steepest descent direction, and x is replaced with the new position (Steps 9-13). We check whether any stopping criterion is met in Step 14. In Step 15, we employ I_max to limit the total number of iterations; thus, the algorithm always terminates.

Algorithm 1: The algorithm of GRA.
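The initialization implied by Lemma 11 can be sketched as follows, under our assumption that Δ*_j denotes the probability increment ΔP_j assigned to channel j: the cut points are chosen so that each channel's probability share is proportional to its bandwidth.

```python
import bisect
import itertools

def initial_cuts(probs, bandwidths):
    """Pick initial cut points so Delta P_j / b_j is (roughly) equal across channels.

    probs follows the sorted sequence sigma_1; returns K - 1 cut indices.
    """
    P = [0.0] + list(itertools.accumulate(probs))   # cumulative probability P(m)
    B = list(itertools.accumulate(bandwidths))      # cumulative bandwidth
    total_b, total_p = B[-1], P[-1]
    cuts = []
    for j in range(len(bandwidths) - 1):
        target = total_p * B[j] / total_b           # desired value of P(x_j)
        cuts.append(bisect.bisect_left(P, target))  # nearest grid point at or above
    return cuts
```

With four equally popular items and two equal-bandwidth channels, the initial cut splits the sequence in half.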

Comparison with Other Research
In the real world, such minimization problems usually call for instant and near-optimal solutions, especially for large problem instances.As a result, many metaheuristic algorithms have been proposed for providing instant and near-optimal solutions, for example, [22][23][24].Although these algorithms achieved better execution time, their solution quality cannot be ensured or bounded.
Therefore, we need deterministic algorithms that are able to converge linearly and achieve near-optimality.Jea et al. [20] solved a basic similar problem, DAP, that is, min∑ Here, the number of machines or channels is denoted by , literally meaning  in this study.The terms in the square bracket are in a very simple form.Even so, the partition problem is also time-consuming for obtaining an optimal solution when the problem size is larger than 50.This problem was solved near-optimally by their proposed algorithm [20].Wang and Jea [21] proposed another partition problem, SPP, that is, The terms in the square bracket become slightly complicated.This problem was also solved near-optimally in linear time.Jea and Wang [2] introduced another partition problem, DBAP; that is, The terms in the square bracket look similar to those of SPP.However, the transformation between Z  and R  becomes more difficult.After establishing a complete theoretical basis, this problem was also solved near-optimally.In this study, we propose the WTM problem, that is, As shown in Figure 1, WTM is the most complicated form.WTM relates to not only partition but also permutation.In the three problems, DAP, SPP, and DBAP, the position orders of items in n * and x * are the same.However, for WTM, the position orders of items in n * and x * might be different.This makes transformation between Z  and R  more difficult, so we sacrifice some accuracy of position order and force the transformation to be done.Even so, WTM describes a general form of such problems and still achieves near optimality.

Computational Experiments
The experiments are divided into three parts.First, since the real-world situations are complicated, we need to determine some significant system settings.We develop a basic genetic algorithm (GA) to conduct a pilot experiment for determining these settings.Second, when the problem size is small, both algorithms (i.e., GRA and GA) are compared with an exhaustive search algorithm.Third, when the problem size is large, we compare GA and GRA to evaluate their solution quality and execution speed.
Table 1 summarizes the parameters used in the experiments.Parameters , ,   ,   , and   have already been defined in Section 2. Access probability   follows a Zipf distribution with parameter  [20].Item size   follows a discrete normal distribution with parameters 200 and 50.Bandwidth   follows a discrete uniform distribution DU(1 − 0.25, 1 + 0.25).For GA, the population size, crossover rate, and mutation rate are 100, 0.8, and 0.05.For GRA, the maximum iteration number and tolerance are 10,000 and 0.01, respectively.All the proposed algorithms were implemented in PASCAL and executed in a Windows 7 environment on an Intel Xeon E3 1230 @ 3.20 GHz with 8 GB RAM.For each setting, 30 random trials were conducted and recorded.
In the first part, we develop a basic genetic algorithm (GA) to test the performance.Each chromosome is randomly generated.For example, let  = 4,  = 2 and generate 5(=  +  − 1) random values, 0.42, 0.95, 0.13, 0.21, and 0.36.According to their magnitudes, the largest number 0.95 means a channel separator.That is, item 4 (having the fourth smallest value, 0.42) is allocated to channel 1, and items 1, 2, and 3 are allocated to channel 2. For each population, there are 100 random chromosomes.The fitness is defined as   =  −0.5  / ∑  =1  −0.5

𝑘
A standard roulette wheel selection is employed, and two parent chromosomes are selected for generating two child chromosomes by a two-point crossover. Moreover, a simple single-point mutation is adopted. GA terminates if the run time exceeds 100 milliseconds or no improvement is made during the most recent 100 generations.
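The random-key decoding in the example above can be sketched as follows. The function name and list representation are our own, but the decoding reproduces the worked example: the keys [0.42, 0.95, 0.13, 0.21, 0.36] with N = 4 and K = 2 place item 4 on channel 1 and items 1-3 on channel 2.

```python
def decode_chromosome(keys, n, k):
    """Decode n + k - 1 random keys into a partition of items 1..n over k channels.

    The k - 1 largest keys act as channel separators; every other key
    represents the item whose index equals the key's rank by magnitude.
    """
    order = sorted(range(len(keys)), key=lambda i: keys[i])
    ranks = [0] * len(keys)
    for r, i in enumerate(order):
        ranks[i] = r + 1                 # 1-based rank of each key
    parts = [[] for _ in range(k)]
    channel = 0
    for i in range(len(keys)):
        if ranks[i] > n:                 # one of the k - 1 largest keys: separator
            channel += 1
        else:
            parts[channel].append(ranks[i])
    return parts
```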
In the second part, Figure 2 shows the effect of K on the performance for N = 10. The relative error is defined by (c_GA − c_OPT)/c_OPT or (c_GRA − c_OPT)/c_OPT, where c_GA, c_GRA, and c_OPT are the objective costs obtained by GA, GRA, and an exhaustive search algorithm, respectively. It is seen that GA takes more run time when K increases. Moreover, K does not influence GRA, and most instances can be solved within 1 millisecond. On the other hand, GA can easily become trapped in local minima, since there are many local minima in such an optimization problem. Consequently, the relative error of GA is relatively higher. Since the setting of K = 2 has the least significance, we omit it in later experiments.
Similarly, Figures 3-6 show various effects on the performance for N = 10. As shown in Figures 3 and 4, identical access probabilities (θ = 0) and equal bandwidths make the problem easier, so we only observe larger parameter values in later experiments. In Figures 5 and 6, as the variations of item size and bandwidth increase, the relative errors increase slightly. The worst mean relative error is 0.353 when the size parameter is 0.75, that is, when the data items are large. Therefore, we only test the instances with larger item sizes in later experiments. Figure 7 compares the two algorithms for N = 12 in terms of execution speed and solution quality. Again, GRA takes almost no time to obtain near-optimal solutions, whereas GA needs 3 seconds on average. For the setting of N = 12, there are more local minima than for N = 10, and it becomes more difficult to locate the optimal solution. Therefore, GA takes more run time but still has larger relative errors.
In the third part, Figures 8 and 9 show the performance of the two algorithms when the problem size is large. Since we cannot obtain optimal solutions when N is large, we compute relative deviations instead, defined by (c_GA − c_min)/c_min and (c_GRA − c_min)/c_min, where c_min = min{c_GA, c_GRA}. GRA greatly outperforms GA in terms of solution quality and execution speed. Most instances can be solved within 1 millisecond for K = 5 and N = 500. Even in the worst case, GRA takes only 16 milliseconds. On the other hand, GA cannot escape from local minima, no matter how many trials it runs. Its solution quality depends highly on the initial population. If there are no high-quality solutions at the beginning, it is difficult for GA to locate the optimal solution in the high-dimensional solution space.
Figure 10 shows the effect of K on the execution time. To observe the average execution time of GRA, we set N = 1,000, θ = 0.0, 0.1, 0.5, 1.0, 2.0, the remaining distribution parameters to 0.5, and K = 1 to 50. Since GA cannot compete with GRA in solution quality when N = 500, we do not examine the behavior of GA for such a large problem size. As shown in the figure, when K increases, the run time of GRA increases slightly. That is, the run time of GRA is directly proportional to the number of channels. In fact, the number of channels in a mobile environment is far lower than 50. This implies that GRA is able to schedule 1,000 data items for any real-world broadcast server.
Figure 11 shows the effect of θ on the convergence speed. In this experiment, we set N = 1,000, K = 2, 5, 10, 20, 50, the remaining distribution parameters to 0.5, and θ = 0.0 to 1.0. When θ = 0.0 and K = 50, all the data items are equally popular. However, the sizes of the data items are different, and the bandwidths of the channels are also distinct. These variations make it difficult for GRA to converge. On the other hand, when θ = 1.0, there are only a few popular data items, which should be allocated to channel 1 (i.e., the channel with the highest bandwidth). The other items, with very low access probabilities, can be roughly allocated to the remaining channels. Therefore, GRA converges very rapidly when θ > 0.8.
In Figure 12, we implement an optimization algorithm, namely a branch-and-bound algorithm (B&B), and compare it with GA and GRA. Since B&B is very time-consuming, we only observe the results for N ≤ 15. Both B&B and GRA provide optimal solutions when N ≤ 15. On the other hand, the average run time of B&B for N = 15 is 30 seconds, whereas the run time of GRA is always less than 4 seconds, even for N = 200. It is clear that the time complexity excludes such optimization algorithms from practical use.
In sum, GRA is a practical algorithm, even for N = 500. Compared with the other two algorithms, GRA is more suitable for real-world applications. Moreover, we also guarantee that each solution is generated within linear time and give an error bound.

Conclusion
Minimization problems are usually time-consuming, especially for large problem instances. Consequently, most traditional studies have employed metaheuristic algorithms to solve such problems. However, such algorithms have several shortcomings. First, their solutions are obtained by trial and error, so solution quality is not guaranteed. Second, their approximation algorithms cannot converge linearly. Third, some traditional methods need to keep track of partial results, so they are memory-consuming. Fourth, some traditional methods, such as dynamic programming and branch-and-bound algorithms, are not scalable. Once the problem size increases, it may take several days to generate an optimal solution, and such a delay is impractical.
Mapping a discretized problem from Z to R is an interesting idea. In this study, a gradient-based algorithm is proposed to deal with WTM. We first map it from Z to R. Then the mapped problem is solved optimally in R. Finally, the optimal solution is mapped from R back to Z. Moreover, the theoretical basis ensures that the proposed algorithm can converge linearly, provide high-quality solutions, and require less memory.
In the near future, we will extend the concept to other optimization problems. By mapping a problem from its original domain to another domain, we are likely to find a more time-efficient and cost-effective way to achieve similar results.

Figure 11: The effect of θ on the convergence speed of GRA.

Table 1: System settings in the experiments.

A constrained variant of the problem, whose objective involves absolute values, subject to ∑_{j=1}^{K} [(∑_{d_i∈D_j} p_i)(∑_{d_i∈D_j} s_i)] ≤ u and ∑_{d_i∈D_j} 1 ≤ v for all nonempty D_j, is also of interest. The constraints and absolute values make the problem harder. It is interesting that this discretized problem could also be solved by the same technique in R.