A Memetic Lagrangian Heuristic for the 0-1 Multidimensional Knapsack Problem

We present a new evolutionary algorithm to solve the 0-1 multidimensional knapsack problem. We tackle the problem using the duality concept, differently from traditional approaches. Our method is based on Lagrangian relaxation. Lagrange multipliers transform the problem, keeping the optimality as well as decreasing the complexity. However, it is not easy to find Lagrange multipliers nearest to the capacity constraints of the problem. Through empirical investigation of Lagrangian space, we can see the potential of using a memetic algorithm. So we use a memetic algorithm to find the optimal Lagrange multipliers. We show the efficiency of the proposed method by experiments on well-known benchmark data.


Introduction
Knapsack problems have a number of applications in various fields, for example, cryptography, economics, and networks. The 0-1 multidimensional knapsack problem (0-1MKP) is an NP-hard problem, but not strongly NP-hard [1]. It can be considered an extended version of the well-known 0-1 knapsack problem (0-1KP). In the 0-1KP, we are given a set of objects, and each object that can go into the knapsack has a size and a profit. The knapsack has a certain capacity for size. The objective is to find an assignment that maximizes the total profit without exceeding the given capacity. In the 0-1MKP, the number of capacity constraints is more than one. For example, there can be a weight constraint in addition to a size constraint. Naturally, the 0-1MKP is a generalized version of the 0-1KP.
Let $n$ and $m$ be the numbers of objects and capacity constraints, respectively. Each object $i$ has a profit $v_i$ and, for each constraint $j$, a capacity consumption value $w_{ji}$. Each constraint $j$ has a capacity $b_j$. Then, we formally define the 0-1MKP as follows:

$$\text{maximize } \mathbf{v}^T \mathbf{x} \quad \text{subject to } W\mathbf{x} \le \mathbf{b}, \quad \mathbf{x} \in \{0, 1\}^n, \tag{1}$$

where $\mathbf{v} = (v_i)$ and $\mathbf{x} = (x_i)$ are $n$-dimensional column vectors, $W = (w_{ji})$ is an $m \times n$ matrix, $\mathbf{b} = (b_j)$ is an $m$-dimensional column vector, and $T$ means the transpose of a matrix or a column vector. $W$, $\mathbf{b}$, and $\mathbf{v}$ are given, and each element of them is a nonnegative integer. In brief, the objective of the 0-1MKP is to find a binary vector $\mathbf{x}$ which maximizes the weighted sum $\mathbf{v}^T \mathbf{x}$ while satisfying the $m$ linear constraints $W\mathbf{x} \le \mathbf{b}$.
For the knapsack problem with only one constraint, there has been a great deal of research on efficient approximation algorithms for finding near-optimal solutions. In this paper, we are interested in the problem with more than one constraint, that is, the multidimensional knapsack problem. Exact algorithms for the 0-1MKP have been introduced in [2, 3], among others. Heuristic approaches for the 0-1MKP have also been extensively studied [4-13]. Also, a number of evolutionary algorithms to solve the problem have been proposed [6, 14-19]. A number of methods for the 0-1 biknapsack problem, which is a particular case of the 0-1MKP, have also been proposed. The reader is referred to [20-22] for in-depth surveys of the 0-1MKP.
However, most studies deal directly with the discrete search space. In this paper, we transform the search space of

Lagrangian Optimization
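2.1. Preliminaries. The 0-1MKP is a maximization problem with constraints. It is possible to transform the original optimization problem into the following problem using Lagrange multipliers $\boldsymbol{\lambda} = (\lambda_1, \ldots, \lambda_m)$:

$$\text{maximize } \mathbf{v}^T \mathbf{x} - \langle \boldsymbol{\lambda}, W\mathbf{x} - \mathbf{b} \rangle \quad \text{subject to } \mathbf{x} \in \{0, 1\}^n. \tag{2}$$

It is easy to find the maximum of the transformed problem using the following formula:

$$\mathbf{v}^T \mathbf{x} - \langle \boldsymbol{\lambda}, W\mathbf{x} - \mathbf{b} \rangle = \sum_{i=1}^{n} x_i \left( v_i - \sum_{j=1}^{m} \lambda_j w_{ji} \right) + \langle \boldsymbol{\lambda}, \mathbf{b} \rangle. \tag{3}$$

To maximize the above formula for a fixed $\boldsymbol{\lambda}$, we have to set $x_i$ to 1 only if $v_i > \sum_{j=1}^{m} \lambda_j w_{ji}$, for each $i$. Since the choice for each object does not affect the others, getting the maximum is fairly easy. Since this algorithm computes just $\sum_{j=1}^{m} \lambda_j w_{ji}$ for each $i$, its time complexity becomes $O(mn)$.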
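The maximization rule above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `lmmkp`, the list-of-lists layout `W[j][i]` for $w_{ji}$, and the toy numbers are assumptions made here, not part of the original text.

```python
def lmmkp(v, W, lam):
    """Maximize the Lagrangian over x in {0,1}^n for fixed multipliers.

    v[i] is the profit of object i, W[j][i] is w_ji, lam[j] is lambda_j.
    Returns the binary vector x and the capacity it consumes, W x.
    """
    n, m = len(v), len(lam)
    x = []
    for i in range(n):
        # Reduced profit of object i: v_i - sum_j lambda_j * w_ji.
        reduced = v[i] - sum(lam[j] * W[j][i] for j in range(m))
        # Object i is taken only if its reduced profit is positive.
        x.append(1 if reduced > 0 else 0)
    consumed = [sum(W[j][i] * x[i] for i in range(n)) for j in range(m)]
    return x, consumed


# Toy instance (made-up numbers): 3 objects, 2 constraints.
v = [10, 7, 4]
W = [[2, 3, 4],
     [5, 1, 2]]
x, consumed = lmmkp(v, W, [1.0, 1.0])
print(x, consumed)  # [1, 1, 0] [5, 6]
```

Each object is decided independently with $m$ multiplications, which matches the $O(mn)$ operation count of the rule.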
If we can find a suitable $\boldsymbol{\lambda}$ for the problem, we get the optimal solution of the 0-1MKP in polynomial time. The difficulty is that such a $\boldsymbol{\lambda}$ may not exist, or may be hard to find even when it exists. However, this method is not entirely useless. For arbitrary $\boldsymbol{\lambda}$, let the vector $\mathbf{x}$ which achieves the maximum in the above formula be $\mathbf{x}^*$. Since $\boldsymbol{\lambda}$ is chosen arbitrarily, we do not guarantee that $\mathbf{x}^*$ satisfies the constraints of the original problem. Nevertheless, letting the capacity be $\mathbf{b}^* = W\mathbf{x}^*$ instead of $\mathbf{b}$ makes $\mathbf{x}^*$ the optimal solution by the following proposition [13, 19]. We call this procedure the Lagrangian method for the 0-1MKP (LMMKP).

Proposition 1. The vector $\mathbf{x}^*$ obtained by applying LMMKP with given $\boldsymbol{\lambda}$ is the maximizer of the following problem:

$$\text{maximize } \mathbf{v}^T \mathbf{x} \quad \text{subject to } W\mathbf{x} \le \mathbf{b}^*, \quad \mathbf{x} \in \{0, 1\}^n.$$

In particular, in the case that $\lambda_k$ is 0, the $k$th constraint is ignored. That is, $\mathbf{x}^*$ is the maximizer of the problems whose capacities $\mathbf{c}$ satisfy $c_k \ge b^*_k$ and $c_i = b^*_i$ for all $i \ne k$. In general, the following proposition [19] holds.

Proposition 2. If LMMKP is applied with $\boldsymbol{\lambda}$ such that $\lambda_{k_1} = 0, \lambda_{k_2} = 0, \ldots, \lambda_{k_r} = 0$, replacing the capacity $\mathbf{b}^*$ by $\mathbf{c}$ such that

$$c_i \ge b^*_i \ \text{ if } i \in \{k_1, k_2, \ldots, k_r\}, \qquad c_i = b^*_i \ \text{ otherwise,}$$

in Proposition 1 makes the proposition still hold.
Instead of finding the optimal solution of the original 0-1MKP directly, we consider the problem of finding $\boldsymbol{\lambda}$ corresponding to the given constraints. That is, we transform the problem of dealing with an $n$-dimensional binary vector $\mathbf{x}$ into the one of dealing with an $m$-dimensional real vector $\boldsymbol{\lambda}$. If there are Lagrange multipliers corresponding to the given constraints and we find them, we easily get the optimal solution of the 0-1MKP. Otherwise, we try to get a solution close to the optimum by searching for Lagrange multipliers whose capacities satisfy the given constraints and are nearest to them.
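As a sanity check of Proposition 1, the following sketch compares the profit of $\mathbf{x}^*$ with an exhaustive search under the capacity $W\mathbf{x}^*$. The helper names (`lmmkp`, `brute_force_optimum`), the toy instance, and the chosen multipliers are illustrative assumptions; exhaustive search is only feasible for very small $n$.

```python
from itertools import product


def lmmkp(v, W, lam):
    """Set x_i = 1 exactly when v_i > sum_j lam_j * w_ji; also return W x."""
    n, m = len(v), len(lam)
    x = [1 if v[i] > sum(lam[j] * W[j][i] for j in range(m)) else 0
         for i in range(n)]
    return x, [sum(W[j][i] * x[i] for i in range(n)) for j in range(m)]


def brute_force_optimum(v, W, cap):
    """Best profit over all binary x with W x <= cap (exponential in n)."""
    n, m = len(v), len(cap)
    best = 0  # x = 0 is always feasible because capacities are nonnegative
    for x in product((0, 1), repeat=n):
        if all(sum(W[j][i] * x[i] for i in range(n)) <= cap[j]
               for j in range(m)):
            best = max(best, sum(v[i] * x[i] for i in range(n)))
    return best


# Toy instance (made-up numbers): 4 objects, 2 constraints.
v = [10, 7, 4, 6]
W = [[2, 3, 4, 5],
     [5, 1, 2, 3]]
for lam in ([0.0, 0.0], [1.0, 1.0], [0.5, 2.0], [3.0, 0.2]):
    x_star, b_star = lmmkp(v, W, lam)
    profit = sum(vi * xi for vi, xi in zip(v, x_star))
    # Proposition 1: x* is optimal once the capacity is replaced by W x*.
    assert profit == brute_force_optimum(v, W, b_star)
```

The check passes for any nonnegative multipliers, whether or not $\mathbf{x}^*$ is feasible for the original capacity.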

Prior Work.
In this subsection, we briefly examine existing Lagrangian heuristics for discrete optimization problems.
Coping with the nondifferentiability of the Lagrangian led to the last technical development: the subgradient algorithm. The subgradient algorithm is a fundamentally simple procedure. Typically, it has been used as a technique for generating good upper bounds for branch-and-bound methods, where it is known as Lagrangian relaxation. The reader is referred to [24] for an in-depth survey of Lagrangian relaxation. At each iteration of the subgradient algorithm, one takes a step from the present Lagrange multiplier in the direction opposite to a subgradient, which is the direction of $(\mathbf{b}^* - \mathbf{b})$, where $\mathbf{b}^*$ is the capacity obtained by LMMKP and $\mathbf{b}$ is the original capacity.
The only previous attempt to find lower bounds using the Lagrangian method is CONS [10]; however, CONS without hybridization with other metaheuristics could not show satisfactory results. LM-GA by [23] obtained better results by hybridizing a weight-coded genetic algorithm with CONS. In LM-GA, a candidate solution is represented by a vector $(w_1, w_2, \ldots, w_n)$ of weights, where weight $w_i$ is associated with object $i$. The profit of each object is modified by applying several biasing techniques with these weights; that is, we obtain a modified problem instance which has the same constraints as the original problem instance but a different objective function. Solutions for this modified problem instance are then obtained by applying a decoding heuristic. In particular, LM-GA used CONS as the decoding heuristic. The feasible solutions for the modified problem instance are also feasible for the original problem instance since they satisfy the same constraints. So, weight-coding does not need an explicit repairing algorithm.
The proposed heuristic differs from LM-GA in that it improves CONS itself by using properties of Lagrange multipliers, whereas LM-GA just uses CONS as an evaluation function. Lagrange multipliers in the proposed heuristic can move in more diverse directions than in CONS because of its random factor.
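A single subgradient update of the kind described above might look as follows. This is a minimal sketch, not the procedure of any cited work: the function name, the fixed `step` parameter, and the clipping of multipliers at zero are assumptions made for illustration.

```python
def subgradient_step(lam, b_star, b, step):
    """One subgradient update of all multipliers at once.

    lam: current multipliers; b_star: capacity W x* returned by LMMKP;
    b: original capacity; step: positive step size (a tuning parameter).
    """
    # Every coordinate moves along (b*_j - b_j): a multiplier grows when its
    # constraint is exceeded and shrinks when there is slack. Clipping at 0
    # (an assumption here) keeps the multipliers nonnegative.
    return [max(0.0, l + step * (bs - bj))
            for l, bs, bj in zip(lam, b_star, b)]


new_lam = subgradient_step([1.0, 0.5], b_star=[9, 4], b=[6, 8], step=0.25)
print(new_lam)  # [1.75, 0.0]
```

Note that all $m$ coordinates change in one step, in contrast to heuristics that adjust one multiplier at a time.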

Randomized Constructive Heuristic.
Yoon et al. [13, 19] proposed a randomized constructive heuristic (R-CONS) as an improved variant of CONS. First, $\boldsymbol{\lambda}$ is set to $\mathbf{0}$. Consequently, $x_i$ becomes 1 for each $v_i > 0$. This means that all positive-valued objects are put in the knapsack, so almost all constraints are violated. If $\boldsymbol{\lambda}$ is increased, some objects are taken out. We increase $\boldsymbol{\lambda}$ just enough for only one object to be taken out. We change only one Lagrange multiplier at a time: we randomly choose one index $k$ and change $\lambda_k$.
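Reconsider (3). Making $(v_i - \sum_{j=1}^{m} \lambda_j w_{ji})$ no longer positive by increasing $\lambda_k$ sets $x_i = 0$ by LMMKP. For each object $i$ such that $x_i = 1$, let $\alpha_i$ be the increment of $\lambda_k$ that makes $x_i$ become 0, that is, $\alpha_i = (v_i - \sum_{j=1}^{m} \lambda_j w_{ji}) / w_{ki}$. Then, if we change $\lambda_k$ to $\lambda_k + \min_i \alpha_i$, leave the other multipliers unchanged, and apply LMMKP again, exactly one object is taken out (assuming the minimum is attained by a single object). We take out objects one by one in this way and stop the procedure when every constraint is satisfied.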
Algorithm 1 shows the pseudocode of R-CONS. The number of operations to take out an object is at most $n$, and computing $\alpha_i$ for each object $i$ takes $O(m)$ time. Hence, the total time complexity becomes $O(n^2 m)$.
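The procedure described above can be sketched in Python as follows. This is a hedged reading of the text, not a transcription of Algorithm 1: the increment $\alpha_i$ is taken as the reduced profit divided by $w_{ki}$, the random index $k$ is restricted to constraints that can actually remove an object (an assumption that guarantees progress), and the names `r_cons`, `reduced`, and `usable` are hypothetical.

```python
import random


def r_cons(v, W, b, rng):
    """Randomized constructive heuristic: remove one object per step.

    v[i]: profits, W[j][i]: consumption w_ji, b[j]: capacities,
    rng: a random.Random instance. Returns (lam, x) with W x <= b.
    """
    n, m = len(v), len(b)
    lam = [0.0] * m
    reduced = [float(vi) for vi in v]      # v_i - sum_j lam_j w_ji at lam = 0
    x = [1 if r > 0 else 0 for r in reduced]

    def load(j):
        return sum(W[j][i] * x[i] for i in range(n))

    while any(load(j) > b[j] for j in range(m)):
        # Constraints whose multiplier can actually push an object out.
        usable = [j for j in range(m)
                  if any(x[i] and W[j][i] > 0 for i in range(n))]
        k = rng.choice(usable)
        # alpha_i: increment of lam_k that drives object i's reduced
        # profit to zero; the smallest one removes exactly one object.
        alpha, out = min((reduced[i] / W[k][i], i)
                         for i in range(n) if x[i] and W[k][i] > 0)
        lam[k] += alpha
        for i in range(n):
            reduced[i] -= alpha * W[k][i]
        reduced[out] = 0.0                 # guard against round-off
        x = [1 if r > 0 else 0 for r in reduced]
    return lam, x


# Toy instance (made-up numbers): 4 objects, 2 constraints.
v = [10, 7, 4, 6]
W = [[2, 3, 4, 5],
     [5, 1, 2, 3]]
b = [7, 6]
lam, x = r_cons(v, W, b, random.Random(0))
```

The loop runs at most $n$ times because every pass removes at least one object, and the empty knapsack satisfies all constraints.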

Local Improvement Heuristic.
Our goal is to improve a real vector $\boldsymbol{\lambda}$ obtained by R-CONS so that its corresponding capacity is quite close to the capacity of the given problem instance. To devise a local improvement heuristic, we exploited Proposition 3 [13, 19]. When several constraints are not satisfied, it is also an issue which $\lambda_k$ should be changed. Hence, it is necessary to set efficient rules about which $\lambda_k$ to change and how much to change it. If good rules are made, we can quickly find better Lagrange multipliers than randomly generated ones.
We selected the method that iteratively chooses a random number $k$ ($\le m$) and increases or decreases the value of $\lambda_k$ following Proposition 3. In each iteration, a capacity vector $\mathbf{b}^*$ is obtained by applying LMMKP with $\boldsymbol{\lambda}$. If $\mathbf{b}^* \le \mathbf{b}$, all constraints are satisfied and hence the best solution is updated. Since a better capacity may still be found, the algorithm does not stop here. Instead, it chooses a random number $k$ and decreases the value of $\lambda_k$. If $\mathbf{b}^* \not\le \mathbf{b}$, we focus on satisfying the constraints first. For this, we choose a random number $k$ among the indices of the unsatisfied constraints and increase $\lambda_k$, hoping that the $k$th value of the corresponding capacity decreases and the $k$th constraint becomes satisfied. We set the amount of change of $\lambda_k$ to a fixed value $\delta$.
Most Lagrangian heuristics for discrete problems have focused on obtaining good upper bounds, but this algorithm is distinguished in that it primarily pursues finding feasible solutions. Algorithm 2 shows the pseudocode of this local improvement heuristic. Since each iteration applies LMMKP once, it takes $O(mnN)$ time, where $N$ is the number of iterations.
The direction taken by our local improvement heuristic looks similar to that of the subgradient algorithm described in Section 2.2. The main difference is that, in each iteration, the subgradient algorithm changes all coordinate values along the subgradient direction, whereas our local improvement heuristic changes only one coordinate value. Consequently, this could make our local improvement heuristic find lower bounds more easily than the subgradient algorithm, which usually produces upper bounds.
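The iteration described above can be sketched as follows. This is an illustrative sketch rather than a transcription of Algorithm 2: the function names, the return convention, and the clipping of multipliers at zero are assumptions, and the toy run uses a larger step than the one used in the experiments so that the effect is visible in a few iterations.

```python
import random


def lmmkp(v, W, lam):
    """Set x_i = 1 exactly when v_i > sum_j lam_j * w_ji; also return W x."""
    n, m = len(v), len(lam)
    x = [1 if v[i] > sum(lam[j] * W[j][i] for j in range(m)) else 0
         for i in range(n)]
    return x, [sum(W[j][i] * x[i] for i in range(n)) for j in range(m)]


def local_improve(v, W, b, lam, n_iter, delta, rng):
    """Coordinate-wise search on the multipliers.

    Returns (best_profit, best_lam, best_x), or (None, None, None) if no
    iterate satisfied all constraints.
    """
    lam = list(lam)
    m = len(b)
    best = (None, None, None)
    for _ in range(n_iter):
        x, b_star = lmmkp(v, W, lam)
        violated = [j for j in range(m) if b_star[j] > b[j]]
        if violated:
            # Raise the multiplier of one violated constraint so that the
            # capacity it consumes does not grow (Proposition 3).
            k = rng.choice(violated)
            lam[k] += delta
        else:
            profit = sum(vi * xi for vi, xi in zip(v, x))
            if best[0] is None or profit > best[0]:
                best = (profit, list(lam), x)
            # Feasible: relax one random multiplier to look for more profit.
            k = rng.randrange(m)
            lam[k] = max(0.0, lam[k] - delta)  # clipping is an assumption
    return best


# Toy instance (made-up numbers); start from multipliers that empty the knapsack.
v = [10, 7, 4, 6]
W = [[2, 3, 4, 5],
     [5, 1, 2, 3]]
b = [7, 6]
profit, lam, x = local_improve(v, W, b, [5.0, 5.0],
                               n_iter=2000, delta=0.05, rng=random.Random(0))
```

Only one coordinate of the multiplier vector changes per iteration, which is the main contrast with a subgradient step.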

Investigation of the Lagrangian Space.
The structure of the problem space is an important indicator of problem difficulty, and analyzing this structure helps efficient search in the problem space [25-27]. Recently, Puchinger et al. [28] gave some insight into the solution structure of the 0-1MKP. In this subsection, we conduct some experiments and gain some insight into the global structure of the 0-1MKP space.
In Section 2.1, we showed that there is a correspondence between binary solution vectors and Lagrange multiplier vectors. Strictly speaking, there cannot be a one-to-one correspondence in the technical sense of a bijection. A binary solution vector has only binary components, so there are only finitely many such vectors ($2^n$). But the Lagrange multipliers are real numbers, so there are uncountably many Lagrange multiplier vectors. Several multiplier vectors may correspond to the same binary solution vector. Moreover, some multiplier vectors may have multiple binary solution vectors.
Instead of directly finding an optimal binary solution, we deal with Lagrange multipliers. In this subsection, we empirically investigate the relationship between the binary solution space and the Lagrangian space (i.e., $\{\mathbf{x}_{\boldsymbol{\lambda}}\}$ and $\{\boldsymbol{\lambda}\}$).
We made experiments on nine instances ($m \cdot n$), varying the number of constraints ($m$) from 5 to 30 and the number of objects ($n$) from 100 to 500. We chose a thousand randomly generated Lagrange multiplier vectors and plotted, for each pair of them, the relation between the Hamming distance in the binary solution space and the Euclidean distance in the Lagrangian space. Figure 1 shows the plotting results. The smaller the number of constraints ($m$) is, the larger the Pearson correlation coefficient ($\rho$) is. We also made the same experiments on locally optimal Lagrange multipliers. Figure 2 shows the plotting results.
Locally optimal Lagrange multipliers show much stronger correlation than randomly generated ones. They show strong positive correlation (much greater than 0.5). This means that the binary solution space and the locally optimal Lagrangian space are roughly isometric. The results show that both spaces have similar neighborhood structures. This hints that it is easy to find high-quality Lagrange multipliers satisfying all the capacity constraints by using memetic algorithms on the locally optimal Lagrangian space. That is, memetic algorithms can be a good choice for searching the Lagrangian space directly.
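The distance comparison described above can be sketched as follows. This is only an illustration on a small synthetic instance: how the random multipliers were sampled is not stated in the text, so the uniform range used here, the instance sizes, and all function names are assumptions.

```python
import math
import random


def lmmkp_x(v, W, lam):
    """Binary vector chosen by LMMKP: x_i = 1 iff v_i > sum_j lam_j * w_ji."""
    return [1 if v[i] > sum(l * row[i] for l, row in zip(lam, W)) else 0
            for i in range(len(v))]


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    var_a = sum((p - ma) ** 2 for p in a)
    var_b = sum((q - mb) ** 2 for q in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # correlation undefined; 0.0 is a convention of this sketch
    return cov / math.sqrt(var_a * var_b)


def distance_correlation(v, W, lams):
    """Correlate Euclidean distance in multiplier space with Hamming distance
    between the corresponding binary solutions, over all pairs."""
    xs = [lmmkp_x(v, W, lam) for lam in lams]
    euclid, hamming = [], []
    for p in range(len(lams)):
        for q in range(p + 1, len(lams)):
            euclid.append(math.dist(lams[p], lams[q]))
            hamming.append(sum(s != t for s, t in zip(xs[p], xs[q])))
    return pearson(euclid, hamming)


# Small synthetic instance (made-up sizes and ranges).
rng = random.Random(1)
n, m = 30, 3
v = [rng.randint(10, 60) for _ in range(n)]
W = [[rng.randint(1, 9) for _ in range(n)] for _ in range(m)]
lams = [[rng.uniform(0.0, 5.0) for _ in range(m)] for _ in range(50)]
rho = distance_correlation(v, W, lams)
print(round(rho, 2))
```

Replacing `lams` with locally improved multiplier vectors would give the second kind of measurement mentioned in the text.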

Proposed Memetic Algorithm. A genetic algorithm (GA) is a problem-solving technique motivated by Darwin's theory of natural selection in evolution.
A GA starts with a set of initial solutions, which is called a population. Each solution in the population is called a chromosome, which is typically represented by a linear string. This population then evolves into different populations over a number of iterations (generations). At the end, the algorithm returns the best chromosome of the population as the solution to the problem. In each iteration, the evolution proceeds as follows. Two solutions of the population are chosen based on some probability distribution. These two solutions are then combined through a crossover operator to produce an offspring. With low probability, this offspring is then modified by a mutation operator to introduce unexplored search space into the population, enhancing the diversity of the population. In this way, offspring are generated and they replace part of or the whole population. The evolution process is repeated until a certain condition is satisfied, for example, after a fixed number of iterations. A GA that generates a considerable number of offspring per iteration is called a generational GA, as opposed to a steady-state GA, which generates only one offspring per iteration. If we apply a local improvement heuristic, typically after the mutation step, the GA is called a memetic algorithm (MA). Algorithm 3 shows a typical generational MA.
We propose an MA for optimizing Lagrange multipliers. It conducts the search using an evaluation function with penalties for violated capacity constraints. Our MA provides an alternative search method that finds a good solution by optimizing $m$ Lagrange multipliers instead of directly dealing with binary vectors of length $n$ ($m \ll n$).
The general framework of an MA is used in our study. In the following, we describe each part of the MA.
Encoding. Each solution in the population is represented by a chromosome. Each chromosome consists of $m$ genes corresponding to the Lagrange multipliers. A real encoding is used for representing the chromosome $\boldsymbol{\lambda}$.
Initialization. The MA first creates initial chromosomes using R-CONS described in Section 2.3. We set the population size $P$ to 100.
Mating and Crossover. To select two parents, we use a random mating scheme. A crossover operator creates a new offspring by combining parts of the parents. We use the uniform crossover.
Mutation. After the crossover, a mutation operator is applied to the offspring. We use a gene-wise mutation. After generating a random number $r$ from 1 to $m$, the value of each gene is divided by $r$.
Local Improvement. We use the local improvement heuristic described in Section 2.4. The number of iterations ($N$) is set to 30,000. We set $\delta$ to 0.0002.
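The crossover and mutation operators described above might be sketched as follows. Several details are assumptions of this sketch and are not fixed by the text: each gene is inherited from either parent with probability 1/2, a single real-valued divisor is drawn per offspring, and the upper end of its range is passed in as the hypothetical parameter `r_max`.

```python
import random


def uniform_crossover(parent_a, parent_b, rng):
    """Each gene of the child is copied from either parent with probability 1/2."""
    return [a if rng.random() < 0.5 else b
            for a, b in zip(parent_a, parent_b)]


def mutate(child, r_max, rng):
    """Divide every gene by one random number r drawn from [1, r_max].

    Drawing a single real-valued r per offspring is an assumption of this
    sketch; r_max is a hypothetical parameter name.
    """
    r = rng.uniform(1.0, r_max)
    return [gene / r for gene in child]


# Two made-up parent chromosomes (vectors of m = 5 Lagrange multipliers).
rng = random.Random(0)
mother = [0.8, 0.0, 1.5, 0.3, 2.0]
father = [0.1, 0.6, 1.1, 0.9, 0.4]
child = mutate(uniform_crossover(mother, father, rng), r_max=5, rng=rng)
```

Because the divisor is at least 1, this mutation can only shrink multipliers, which tends to let more objects into the knapsack before local improvement restores feasibility.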
Replacement and Stopping Condition. After generating $P/2$ offspring, our MA chooses the best $P$ individuals among the total $3P/2$ as the population of the next generation. Our MA stops when the number of generations reaches 100.
Evaluation Function. Our evaluation function is designed to find a Lagrange multiplier vector $\boldsymbol{\lambda}$ that has high fitness while satisfying the capacity constraints as much as possible. In our MA, the following is used as the objective function to maximize, which is obtained by subtracting a penalty from the objective function of the 0-1MKP:

$$\mathbf{v}^T \mathbf{x}^* - \gamma \sum_{j:\, b^*_j > b_j} (b^*_j - b_j),$$

where $\mathbf{x}^*$ and $\mathbf{b}^*$ are obtained by applying LMMKP with $\boldsymbol{\lambda}$, and $\gamma$ is a constant which indicates the degree of penalty; we used the fixed value 0.7.
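A penalized evaluation of this kind, together with the replacement step, can be sketched as below. The sketch assumes that the penalty is the total amount by which the capacities are exceeded, weighted by the penalty constant; the function names and the toy instance are illustrative only.

```python
def fitness(v, W, b, lam, gamma=0.7):
    """Profit of the LMMKP solution minus gamma times the total capacity excess."""
    n, m = len(v), len(b)
    x = [1 if v[i] > sum(lam[j] * W[j][i] for j in range(m)) else 0
         for i in range(n)]
    profit = sum(v[i] * x[i] for i in range(n))
    # Only violated constraints (consumption above b_j) add to the penalty.
    excess = sum(max(0, sum(W[j][i] * x[i] for i in range(n)) - b[j])
                 for j in range(m))
    return profit - gamma * excess


def next_generation(population, offspring, score):
    """Keep the best len(population) individuals among parents and offspring."""
    ranked = sorted(population + offspring, key=score, reverse=True)
    return ranked[:len(population)]


# Toy instance (made-up numbers): 3 objects, 2 constraints.
v = [10, 7, 4]
W = [[2, 3, 4],
     [5, 1, 2]]
b = [6, 6]
# All objects taken: profit 21, excess (9 - 6) + (8 - 6) = 5.
infeasible_score = fitness(v, W, b, [0.0, 0.0])
# x = [1, 1, 0] consumes [5, 6], so no penalty applies and the score is 17.
feasible_score = fitness(v, W, b, [1.0, 1.0])

population = [[0.0, 0.0], [1.0, 1.0]]
offspring = [[0.5, 0.5]]
survivors = next_generation(population, offspring,
                            score=lambda lam: fitness(v, W, b, lam))
```

With a penalty weight below 1, a slightly infeasible multiplier vector can still outrank a feasible one, which keeps near-boundary candidates in the population for local improvement.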

Experiments
We made experiments on well-known benchmark data publicly available from the OR-Library [29], which are the same as those used in [6]. They are composed of 270 instances with 5, 10, and 30 constraints. They have different numbers of objects and different tightness ratios. The tightness ratio means $\alpha$ such that $b_j = \alpha \sum_{i=1}^{n} w_{ji}$ for each $j = 1, 2, \ldots, m$. The classes of instances are briefly described below.
The proposed algorithms were implemented with the gcc compiler on a Pentium III PC (997 MHz) running the Linux operating system. As the measure of performance, we used the percentage difference-ratio $100 \times |LP_{\mathrm{optimal}} - \mathit{output}| / \mathit{output}$, which was used in [6], where $LP_{\mathrm{optimal}}$ is the optimal solution of the linear programming relaxation over $\mathbb{R}$ and $\mathit{output}$ is the objective value returned by the algorithm. It has a value in the range of [0, 100]. The smaller the value is, the smaller the difference from the optimum is. First, we compared the constructive heuristics and the local improvement heuristic. Table 1 shows the results of CONS, R-CONS, and R-CONS-L, where R-CONS-L starts with a solution produced by R-CONS and locally improves it by the local improvement heuristic described in Section 2.4. We can see that R-CONS-L largely improves the results of R-CONS.

Table footnotes: (1) Multistart R-CONS-L returns the best result from 5,000 independent runs of R-CONS-L. (2) Average CPU seconds on Pentium III 997 MHz.
Next, to verify the effectiveness of the proposed MA, we compared the results with a multistart method using R-CONS-L. Multistart R-CONS-L returns the best result from 5,000 independent runs of R-CONS-L.

Figure 1: Relationship between distances on solution space and those on Lagrangian space (among randomly generated Lagrangian vectors). $x$-axis: distance between Lagrangian vectors; $y$-axis: distance between binary solution vectors; $\rho$: Pearson correlation coefficient.

Figure 2: Relationship between distances on solution space and those on Lagrangian space (among locally optimal Lagrangian vectors). $x$-axis: distance between Lagrangian vectors; $y$-axis: distance between binary solution vectors; $\rho$: Pearson correlation coefficient.

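Proposition 3. Suppose that $\boldsymbol{\lambda}$ and $\boldsymbol{\lambda}'$ correspond to $\{\mathbf{x}, \mathbf{b}\}$ and $\{\mathbf{x}', \mathbf{b}'\}$ by the LMMKP, respectively. Let $\boldsymbol{\lambda} = (\lambda_1, \lambda_2, \ldots, \lambda_m)$ and $\boldsymbol{\lambda}' = (\lambda'_1, \lambda'_2, \ldots, \lambda'_m)$, where $\lambda_i = \lambda'_i$ for $i \ne k$ and $\lambda_k \ne \lambda'_k$. Then, if $\lambda_k < \lambda'_k$, $b_k \ge b'_k$, and if $\lambda_k > \lambda'_k$, $b_k \le b'_k$.
Let $\mathbf{b}$ be the capacity of the given problem instance and let $\mathbf{b}^*$ be the capacity obtained by LMMKP with $\boldsymbol{\lambda}$. By Proposition 3, if $b^*_k > b_k$, choosing $\boldsymbol{\lambda}' = (\lambda_1, \ldots, \lambda'_k, \ldots, \lambda_m)$ such that $\lambda'_k > \lambda_k$ and applying LMMKP with $\boldsymbol{\lambda}'$ does not increase the value of $b^*_k$. When the value decreases, either the $k$th constraint becomes satisfied or the amount by which the $k$th capacity is exceeded is reduced. Of course, another constraint may become violated by this operation.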

Table 1: Results of local search heuristics on benchmark data.

Table 2: Results of memetic Lagrangian heuristic on benchmark data.