A Scheduling Problem in the Baking Industry

This paper addresses a scheduling problem in an actual industrial environment of the baking industry, where production rates have been growing every year and optimized planning becomes increasingly important in order to address all the features of the problem. The problem contains relevant aspects of production, such as parallel production, setup time, batch production, and delivery date. We also consider several aspects pertaining to transportation, such as transportation capacity with different vehicles, and to sales, with several customers. The problem is atypical compared to those already studied in the literature. To solve it, we suggest two approaches, a greedy heuristic and a genetic algorithm. For small problems, both are compared with the optimal solution obtained by solving the problem as an integer linear programming problem, and for a real example the results are compared with its upper bounds. The work provides a new mathematical formulation of the scheduling problem that is not based on the traveling salesman problem and that considers delivery dates and profit maximization rather than makespan minimization. It also provides an analysis of the runtime of the algorithms.


Introduction
Production planning problems have been studied extensively since the early twentieth century, and they can be found throughout literature. One of the pioneers in the work on these problems was Henry Gantt in his book "Work, Wages and Profit" [1] in which he demonstrates the need for a job schedule in order to increase production efficiency.
Currently many industries are seeking solutions for the job sequencing problem in order to increase productivity, reduce costs and, consequently, increase profit. Due to the burgeoning consumption of foodstuffs worldwide, optimized planning is necessary. For decades in the industry, planning rules were used in order to prioritize products by taking into account only a few production stages, but, given the increase of complexity and modernization of production, planning as a whole has to be optimized; that is, the entire production chain, from production and stock to shipping and sales, must be taken into account.
The purpose of this paper is to study planning problems applied to the baking industry, where, in the current scenario, companies have high product turnover; that is, the goods are highly perishable and companies cannot meet all the demands for their products made by customers. The decisions about which demands to meet are highly correlated to production planning and transportation.
The difficulty of the problem in question is to define a production sequence in each line considering setup and production times, together with jobs that have set dates to be executed, in such a way as to maximize company profit. Thus, the problem has particular production characteristics, such as batch production and a scenario with several parallel production lines. Another relevant point is that the solution should be found within a short computation time for planning purposes: the plan is to be executed over the next 24 hours, so the computation cannot run for longer than minutes, allowing the plan to be implemented in the hours following the decision. There are many variations of scheduling problems, many of them widely studied in the literature, but the problem studied here is not found in the literature with all of the features proposed herein; we can only find methods to solve part of the problem.
Because of the diversity of production planning problems, they can be divided into classes. The classes found for the more general sequencing problem are flow-shop and job-shop. In a flow-shop, the processes are executed in the same order for every job; in a job-shop, the order of the processes may differ from one job to another. The problem presented here is a flow-shop problem.
In studies of flow-shop problems, the first theoretical results were presented to minimize the makespan, that is, total production time, with Johnson [2] and Bellman [3] determining the optimal sequence for the cases with two machines and special cases with three machines. More general cases for sequencing with three machines came from Lomnicki [4] and Ignall and Schrage [5], who applied the branch and bound methods introduced by Little [6]. For the flow-shop problem with more than two machines, Garey et al. [7] proved that it is NP-hard; thus only small problems can be solved exactly with algorithms such as branch and bound. Therefore, heuristic methods are proposed for the problem.
In order to study the flow-shop problem with a family setup, Sridhar and Rajendran [8] propose a heuristic to minimize the total time with an algorithm based on simulated annealing. Ziegler [9] proposes a method to minimize the total time weighted by weights in the process. Schaller [10] presents a new approach to the problem of flow-shop setup with families to minimize the makespan. Schaller [11] proposes a new lower bound for the problem and implements a two-stage heuristic algorithm based on branch and bound.
França et al. [12] work on the same problem as Schaller, but with a genetic algorithm with local search, known as a "memetic algorithm" (MA), and [13] achieves superior results by using hybrid methods with tabu search and the genetic algorithm.
The problems with delivery dates were studied by Croce et al. [14], who showed a genetic algorithm in which each chromosome consists of m subchromosomes, one for each machine, which identifies each transaction made by the machine. According to the authors, the results were better than those found by Adams et al. [15], but at greater computational cost. Sittisathanchai and Dagli [16] also present a genetic algorithm where the chromosome represents the operational sequence.
Flexible scheduling involves problems where we have the option of executing operations on different machines. This kind of problem is more comprehensive than the problem of traditional scheduling and production in parallel, which ensures that a transaction is made only by a single machine. Arthanari and Ramaswamy [17] were pioneers with exact two-stage methods, using two identical parallel machines in the first stage and one machine in the second stage. Later, Brah and Hunsucker [18] worked on the development of more general branch and bound algorithms, but for problems with more than 8 jobs, 5 stages and 2 or 3 machines, processing time becomes impractical.
For multistage jobs, Sawik [19] proposed a constructive heuristic in which the route of a job is determined at each iteration. Kittichartphayak and Ding [20] developed a heuristic similar to Sawik [19], but for larger orders of tasks and stages, which was extended by Guinet and Solomon [21]. Smutnicki and Nowicki [22] propose the use of tabu search in the work with satisfactory results.
The problem studied here is a flow-shop problem with parallel production, setup times, batch production, due dates, and transportation capacity, which at large scale justifies the use of metaheuristics to solve it.

Mathematical Modeling.
The scheduling problem can be modeled as a mixed integer linear programming problem.
The problem requires a short-term study in great detail; that is, the planning horizon runs from two to seven days from the current day. Given the high level of time detail, it is necessary to work with an accuracy of minutes, as setup and production times may be a matter of minutes. The discretization of time therefore involves a trade-off: a large number of periods leads to a problem with a large number of variables, making it intractable, whereas a small number of periods may not address the problem with the required accuracy.
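The trade-off can be made concrete with a short calculation. The sketch below counts one binary production variable per mass, period, and line for several period lengths. The instance sizes (30 masses, 17 lines, a 24-hour horizon) and the function name are illustrative assumptions, not data from the study, and the count is an upper bound because not every mass can run on every line.

```python
def count_binary_variables(horizon_minutes, period_minutes, n_masses, n_lines):
    """Upper bound on the number of binary variables (one per mass, period, line)."""
    if horizon_minutes % period_minutes != 0:
        raise ValueError("the period length must divide the horizon")
    n_periods = horizon_minutes // period_minutes
    return n_masses * n_periods * n_lines


# Hypothetical instance: 30 masses, 17 lines, 24-hour horizon.
for period in (1, 5, 15, 60):
    n_vars = count_binary_variables(24 * 60, period, n_masses=30, n_lines=17)
    print(f"{period:>2}-minute periods: {n_vars} binary variables")
```

Coarser periods shrink the model quickly, but a 15-minute period cannot represent a setup that lasts 5 minutes, which is the loss of accuracy mentioned above.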
Traditional approaches to the scheduling problem are based on the traveling salesman problem, where a variable represents the precedence order of tasks. In this case, however, it is necessary to know at what time each product will be ready for shipment. Accordingly, a new model is proposed to represent job sequencing, in which each variable represents which jobs are executed in each time period (Figure 1). The problem is defined on a set of periods $T$, where $T = \{0, \ldots, t_{\max}\}$ and $t_{\max}$ is the last period. Production is defined in two stages: in the first, a preproduct known as mass is produced, and, in the second, the mass is transformed into a final product. The set of masses is represented by $M$ and the set of products by $P$. Each product $p \in P$ is produced from a single mass $m$, and a mass can produce more than one product, defined by $\mathrm{Prod}(m) \subset P$. Each mass has a production cost $\mathrm{cost}_1(m)$, which defines the cost of producing each product. The production time of each mass is represented by $t_p(m)$, and the setup time to change between the masses $m_1$ and $m_2$ of two products is $t_s(m_1, m_2)$. Production occurs through two processes. The first takes place in a mixer, which mixes all the ingredients of a mass; the cost of the ingredients determines the profit from the products made with this mass. After the mixer comes a second process, in which the mass is baked, sliced, and packaged, and thus transformed into the final product. Products can be grouped into classes according to their production characteristics, such as cooking time, types of cuts, and packaging. Therefore, different products that share the same production characteristics can be produced on the same production line.
The production lines are represented by the set $L$, and the masses produced on each line are determined by $\mathrm{Lin}$, where for each $l \in L$ we have $\mathrm{Lin}(l) \subset M$; a mass can be present in more than one production line. We emphasize that the production of masses occurs in batches, and each mass has a certain number of units produced per batch, represented by $\mathrm{batch}(m)$.
One of the objectives of the problem is to fulfill the demand. Demand is divided by markets such that each market requests a daily mix of products. These requests can be fully addressed, partially addressed, or not addressed at all. If a market is not served, there is no penalty; only the profit from that market is not obtained.
The amount of products to be manufactured is a function of the demand of each market. The customers are represented by the set $C$; the demand of customer $c$ for product $p$ in period $t$ is $d(p, c, t)$, and the profit for each sale is $\mathrm{profit}(p, c)$.
In order to transport products to markets, trucks with different capacities can be used; the truck types are represented by the set $K$, with the cost of each truck given by $\mathrm{cost}_2(c, k)$ and its capacity by $\mathrm{cap}(k)$. The service to the customer should take place at a specific time because the trucks have schedules for loading in factories and unloading at customers' premises; accordingly, the demands of each market should be met at a predefined period. A summary of model features and variables is given in Figure 2.
To describe the model, consider the following model parameters.

Sets
$T$: set of periods.
$M$: set of masses.
$P$: set of products.
$\mathrm{Prod}(m)$: set of products produced by mass $m$.
$L$: set of production lines.
$\mathrm{Lin}(l)$: set of masses produced on production line $l$.
$C$: set of customers.
$K$: set of trucks.

The variables are defined together with the constraints below.

Constraints
(1) Unique constraint on the use of the line at each period: in each line, only one product can be produced per period, because a machine cannot run two products simultaneously. Here $x(m, t, l)$ is a binary variable that represents the production of mass $m$ in period $t$ on line $l$.
(2) Constraints of stock formation: at each period, the stock of a given product is formed by the stock from the previous period plus the sum of what was produced, minus the sales. Here $\mathrm{stk}(p, t)$ is the stock of product $p$ in period $t$, $t_p(m)$ is the production time of the mass $m$ that produces $p$, $y(p, t, l)$ is the amount of product $p$ produced in period $t$ on line $l$, and $V(p, c, t)$ is the sales of product $p$ to customer $c$ in period $t$.
(3) Demand constraints: for each customer, the sale of a product cannot exceed the requested demand.
(4) Constraints of setup time (cleaning and change of mass on the production line): at each change of mass the machine must be stopped for a number of periods defined by $t_s(m, m_1)$. Therefore, if the machine is used for a mass $m$, another mass $m_1$ cannot use the machine for the next $t_s(m, m_1)$ periods.
Here the auxiliary constant is a large enough number compared to the variables and problem constants.
(6) Constraints of transport capacity: there is a limit to the amount of goods that each truck can transport; therefore, if a truck of type $k$ is used, a maximum of $\mathrm{cap}(k)$ units of products will be sent by it, where each truck is filled with baskets of equal dimensions. Here $z(c, t, k)$ is the number of trucks of type $k$ used for customer $c$ in period $t$.

[Figure 3: Representation of the greedy heuristic. Step 1: create the priority list; Step 2: if (List != Empty); Step 3: El = RemoveElement(List); Step 4: if (Feasible(El)); Step 5: AddSolution(El).]
[Figure: priority list (columns ID, Customer, Product, Profit) and scheduling of the production line.]
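As a reading aid, the verbal descriptions of constraints (1), (2), (3), and (6) can be written out as follows. This is only a sketch under assumed notation: the set symbols $T$, $M$, $P$, $L$, $C$, $K$, the variable names $x$, $y$, $z$, the demand $d$, and the production time $t_p$ are labels chosen here for illustration, and the exact index ranges of the original formulation may differ. Constraints (4) and (5) are omitted because their algebraic form cannot be inferred from the description alone.

```latex
\begin{align*}
\sum_{m \in \mathrm{Lin}(l)} x(m,t,l) &\le 1
  && \forall\, l \in L,\ t \in T && \text{[cf. (1)]}\\
\mathrm{stk}(p,t) &= \mathrm{stk}(p,t-1)
  + \sum_{l \in L} y\bigl(p,\, t - t_p(m),\, l\bigr)
  - \sum_{c \in C} V(p,c,t)
  && \forall\, m \in M,\ p \in \mathrm{Prod}(m),\ t \in T && \text{[cf. (2)]}\\
V(p,c,t) &\le d(p,c,t)
  && \forall\, p \in P,\ c \in C,\ t \in T && \text{[cf. (3)]}\\
\sum_{p \in P} V(p,c,t) &\le \sum_{k \in K} \mathrm{cap}(k)\, z(c,t,k)
  && \forall\, c \in C,\ t \in T && \text{[cf. (6)]}
\end{align*}
```

In the stock balance, production started $t_p(m)$ periods earlier becomes available in period $t$, which is how the model tracks when each product is ready for shipment.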

Objective Function.
The objective of the problem is to maximize profit, that is, to maximize the difference between the sale price of each product in each market and the production costs of each product combined with the transportation costs. The problem presented is characterized as a mixed integer linear programming problem, which consists of maximizing this objective subject to the constraints above.
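One possible algebraic form of this objective is sketched below. The notation is assumed for illustration: $V(p,c,t)$ is the sales of product $p$ to customer $c$ in period $t$, $x(m,t,l)$ the binary production variable, and $z(c,t,k)$ the number of trucks of type $k$ sent to customer $c$ in period $t$. Charging $\mathrm{cost}_1(m)$ once for every active production variable is also an assumption; the original may charge it per batch or per unit.

```latex
\max \;
\sum_{p \in P} \sum_{c \in C} \sum_{t \in T} \mathrm{profit}(p,c)\, V(p,c,t)
\;-\; \sum_{l \in L} \sum_{m \in \mathrm{Lin}(l)} \sum_{t \in T} \mathrm{cost}_1(m)\, x(m,t,l)
\;-\; \sum_{c \in C} \sum_{t \in T} \sum_{k \in K} \mathrm{cost}_2(c,k)\, z(c,t,k)
```

The three terms correspond, in order, to sales revenue, production cost, and transportation cost.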

Solution Methods and Implementation Details.
Several methods are proposed to solve the problem. It can be solved by an exact method for integer linear programming problems, such as branch and bound. The exact solution is obtained with the Xpress solver, using the interior point method and branch and bound in the default solver configuration. The other solution methods presented are a greedy heuristic (Figure 4) and a metaheuristic of the genetic algorithm type.

Greedy Heuristic.
The greedy algorithm is described in Introduction to Algorithms by Cormen et al. [23] and in Bendall and Margot [24]. In the context of scheduling problems with delivery dates, the problem is solved exactly for one machine in [25, page 207].
The greedy heuristic can be divided into two phases: (1) start-up and creation of the ordered priority list and (2) production sequencing.
A priority list must be created for use in the algorithm; accordingly, the priority criterion (9) is used, in which $p$ is a product, $m$ is the mass that produces the product $p$, $c$ is a customer, and

$$\text{Average Cost of Shipping}(p) = \frac{\sum_{c \in C \mid d(p,c,t) \neq 0} \; \sum_{k \in K} \mathrm{cost}_2(c,k)}{\sum_{k \in K} \mathrm{cap}(k)}.$$

For each product in each market, the element with the biggest impact on the objective function is given the highest execution priority.
In the production sequencing, for each element of the priority list, we check whether it is possible to run it on a production line without making the solution infeasible; that is, each element, determined by a product, a market, and a delivery time, has to be run before its delivery time. Thus, if space is available in a line that produces this element (idle line time before the delivery time longer than the production and setup time of the product in question), the product is run within this space. If space is not available, the element is discarded. We note that the task should be executed as late as possible, in order to ensure that products with earlier delivery times and lower priorities can occupy earlier positions in the sequence.
At the production adjustment stage, the algorithm places idle time between the execution of two tasks only at the end of the stage, because idle time between two tasks is impractical; the schedule can therefore be adjusted without any loss in the objective function. The heuristic is represented in Figure 3, where El is one element of the priority list, representing a sale of a product to a customer.
[Figure 7: Creation of the initial population for the genetic algorithm.]
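The two phases of the greedy heuristic can be sketched as follows. This is a minimal illustration, not the authors' C implementation: the names `Sale`, `Line`, and `greedy_schedule` are hypothetical, the setup time is folded into a fixed `duration` (so sequence-dependent setups are ignored), the priority values are given rather than computed from criterion (9), and choosing the line that allows the latest start is an assumption. The final adjustment stage that removes idle time is not included.

```python
from dataclasses import dataclass, field


@dataclass
class Sale:
    product: str
    customer: str
    due: int         # delivery time, in minutes from the start of the horizon
    duration: int    # production plus setup time, in minutes (assumed fixed)
    priority: float  # impact on the objective function


@dataclass
class Line:
    name: str
    products: set
    busy: list = field(default_factory=list)  # non-overlapping (start, end, sale)

    def latest_start(self, sale):
        """Latest start that finishes by the due time, or None if nothing fits."""
        end = sale.due
        for b_start, b_end, _ in sorted(self.busy, key=lambda b: b[0], reverse=True):
            if b_start >= end:
                continue                  # block lies after the candidate window
            if b_end <= end - sale.duration:
                break                     # the gap after this block is large enough
            end = b_start                 # otherwise try to finish before this block
        start = end - sale.duration
        return start if start >= 0 else None


def greedy_schedule(sales, lines):
    """Phase 1: order by priority. Phase 2: place each sale as late as possible."""
    accepted = []
    for sale in sorted(sales, key=lambda s: s.priority, reverse=True):
        best = None
        for line in lines:
            if sale.product not in line.products:
                continue
            start = line.latest_start(sale)
            if start is not None and (best is None or start > best[0]):
                best = (start, line)
        if best is None:
            continue                      # no idle space before the due time: discard
        start, line = best
        line.busy.append((start, start + sale.duration, sale))
        accepted.append((line.name, start, sale.customer))
    return accepted


lines = [Line("L1", {"bread", "bun"})]
sales = [
    Sale("bread", "A", due=100, duration=60, priority=5.0),
    Sale("bun", "B", due=100, duration=50, priority=3.0),
    Sale("bread", "C", due=90, duration=40, priority=1.0),
]
print(greedy_schedule(sales, lines))
```

In this toy instance the sale to customer B is discarded because only 40 minutes remain before the block reserved for A, while the lower-priority sale to C still fits in that gap.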

Genetic Algorithm.
The application of genetic algorithms to scheduling problems is surveyed in Allahverdi et al. [26], which shows several authors working with genetic algorithms in different scheduling problems.
The genetic algorithm is based on building an initial population where each individual represents a possible solution, and, through this population, building new populations through an evolutionary process to find better solutions.
We can divide the algorithm into a few steps: initial population, crossover, mutation, and selection.
An element of the population in the genetic algorithm is represented by a matrix; this representation is unusual among known implementations of the genetic algorithm. Each row represents a production line and each column a product that has been produced. The matrix belongs to $\mathbb{Z}^{l \times n}$, where $l$ is the number of lines and $n$ the maximum number of products produced.
For each production line we have a set of genes, where each gene represents a product to be produced. The solution is therefore represented by a set of line schedules, that is, a set of chromosomes, where each chromosome is a set of genes, as shown in Figure 5, which shows what will be produced on each line. The genetic algorithm is represented in Figure 6.
(1) Creation of Initial Population. The initial population is created randomly while respecting the feasibility of the solution. For each element of the population, vectors are created; each vector represents a production line and each entry represents a product that will be produced (Figure 7), such that the sum of the production and setup times of a line does not exceed the maximum time that the line can operate. In this way, all the elements of the population are created.
(2) Crossover. The crossover phase starts the process of building the next population. At this stage, two elements are chosen at a time, and they exchange components of their solutions with each other. This exchange is known as crossover; it is executed by choosing two random points in each parent chromosome, and these components are inherited by the new solution being created. After a certain number of children are formed, the crossover phase is over (Figure 8).
(3) Mutation. The new elements are mutated according to the mutation rate of their genes. This is done by randomly choosing a gene and replacing it with another element so that the solution remains feasible. This new element is also chosen randomly (Figure 9).
(4) Selection. Selection can be executed in various manners. Here, it is done in two different ways: tournament and selection of the best individuals. After the child elements of the population are created in the previous steps, a new population is formed to run a new iteration. With both parent and child elements eligible for the next generation, the tournament randomly selects a predetermined number of elements and, among them, keeps the ones with the best objective function value; this process is repeated until a new generation is built. With the method of selecting the best individuals, the best individuals make up the new population.
Algorithm 1 is proposed to calculate the upper bound of the elements of the genetic algorithm. It works with the amount of each product produced between two consecutive steps, the amount of each product delivered to each customer, and, for each product and range of periods, the vector of customer indices sorted by the profit of the product; its result is the desired upper bound.
We can represent the metaheuristics with Figure 10.
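The four steps (initial population, crossover, mutation, and selection) can be put together in a compact sketch. It is illustrative only and makes several assumptions that are not taken from the paper: an individual is a list of rows rather than a fixed-size matrix, the setup time is a single constant charged at every change of product, an infeasible child row falls back to the parent row, and the fitness function is a toy stand-in for the upper bound of Algorithm 1. All function names and data are hypothetical.

```python
import random


def line_time(row, prod_time, setup_time):
    """Production time of a line plus one setup per change of product."""
    changes = sum(1 for a, b in zip(row, row[1:]) if a != b)
    return sum(prod_time[p] for p in row) + setup_time * changes


def random_row(allowed, fits, rng):
    """Step 1: append random products until the next one would not fit."""
    row = []
    while True:
        candidate = row + [rng.choice(allowed)]
        if not fits(candidate):
            return row
        row = candidate


def crossover(parent_a, parent_b, fits, rng):
    """Step 2: two-point crossover applied line by line."""
    child = []
    for row_a, row_b in zip(parent_a, parent_b):
        n = min(len(row_a), len(row_b))
        i, j = sorted(rng.sample(range(n + 1), 2)) if n >= 1 else (0, 0)
        row = row_a[:i] + row_b[i:j] + row_a[j:]
        child.append(row if fits(row) else list(row_a))  # fallback keeps feasibility
    return child


def mutate(individual, line_products, rate, fits, rng):
    """Step 3: replace genes at random, undoing changes that break feasibility."""
    for row, allowed in zip(individual, line_products):
        for g in range(len(row)):
            if rng.random() < rate:
                old, row[g] = row[g], rng.choice(allowed)
                if not fits(row):
                    row[g] = old


def select(pool, fitness, size, mode, rng, k=3):
    """Step 4: keep the best individuals, or run tournaments of k individuals."""
    if mode == "best":
        return sorted(pool, key=fitness, reverse=True)[:size]
    return [max(rng.sample(pool, k), key=fitness) for _ in range(size)]


def genetic_algorithm(line_products, prod_time, setup_time, max_time, fitness,
                      pop_size=20, generations=30, rate=0.10, mode="best", seed=0):
    rng = random.Random(seed)
    fits = lambda row: line_time(row, prod_time, setup_time) <= max_time
    population = [[random_row(allowed, fits, rng) for allowed in line_products]
                  for _ in range(pop_size)]
    for _ in range(generations):
        children = []
        for _ in range(pop_size):
            parent_a, parent_b = rng.sample(population, 2)
            child = crossover(parent_a, parent_b, fits, rng)
            mutate(child, line_products, rate, fits, rng)
            children.append(child)
        population = select(population + children, fitness, pop_size, mode, rng)
    return max(population, key=fitness)


# Toy data: two lines, three products, times in minutes, demand in batches.
line_products = [["bread", "bun"], ["bun", "cake"]]
prod_time = {"bread": 60, "bun": 45, "cake": 90}
profit = {"bread": 3.0, "bun": 2.0, "cake": 5.0}
demand = {"bread": 4, "bun": 6, "cake": 3}


def toy_fitness(individual):
    """Profit of the batches produced, capped by demand (stand-in objective)."""
    made = {}
    for row in individual:
        for p in row:
            made[p] = made.get(p, 0) + 1
    return sum(profit[p] * min(q, demand[p]) for p, q in made.items())


best = genetic_algorithm(line_products, prod_time, setup_time=15,
                         max_time=480, fitness=toy_fitness)
print(best, toy_fitness(best))
```

Passing `mode="tournament"` switches to tournament selection, which makes it easy to compare the two selection modes on the same instance.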

Efficiency.
To evaluate the efficiency of the greedy heuristic and the genetic algorithm, a new ILP model is presented. It is necessary because the greedy heuristic works with the priority list defined in (9) and the genetic algorithm with the upper bound calculated by Algorithm 1. This model is easy to solve because it has few integer variables; in all tests its runtime was less than 10 seconds. Each method supplies its resulting schedule to this model, and the optimal objective value of the resulting ILP problem is taken as the efficiency of the method.
The indices and parameters, including $\mathrm{cost}_2$ and $\mathrm{cap}$, are those defined in Section 2. In addition, the amount of each product available for shipment in each period, calculated from the schedule supplied, is given as a parameter. The model determines a real variable $V$, the amount of each product sold to each customer, together with integer variables for the quantity of each product shipped to each customer by each truck in each period and for the trucks released to each customer in each period.

Constraints
The objective function is defined by (7). Thus, we have the integer linear programming problem that gives the result of each method for the presented problem:
Max (7) s.t. (3), (11), (12), (13),
where $V \in \mathbb{R}_{+}$ and the shipment and truck variables are in $\mathbb{Z}_{+}$.
[Figure 10: Representation of the genetic algorithm. Step 1: create initial population; Step 2: if (number of iterations ≤ maximum); Step 3: generation of new elements; Step 4: mutation in the new population; Step 5: select the best elements.]

Results and Discussion
The following are the results for a real example from the baking industry and an analysis of the complexity of each method. The tests were carried out on a computer equipped with an Intel Core i7 at 2.93 GHz, 6 GB of memory, and the 64-bit Windows 7 operating system. The integer linear programming models were solved with the solver XPress 7.1. The greedy heuristic and the genetic algorithm were implemented in C.

Algorithm Performance.
In order to analyze the performance of the greedy heuristic, we measure the processing time of the algorithm as its dimensions grow. First we examine the additional time as a function of the growing number of products and markets. Accordingly, we fix the number of production lines at 40 and the number of vehicles at 5.
Greedy Heuristic. The analysis is executed with randomly generated data, calculating the average runtime of the greedy heuristic as shown in Figure 11.
The plotted points suggest fitting a linear curve, whose parameters can be calculated by the method of least squares, giving $R^2 = 0.9966$, where $R^2 = \sum_i (\mathrm{error}_i)^2 / (n - 1)$, $\mathrm{error}_i$ is the difference between the value of the curve and point $i$, and $n$ is the total number of points used. This shows a linear correlation between the data; therefore the complexity as a function of products and markets is approximately (no. of products) × (no. of markets), that is, $O(n_P \cdot n_C)$, with $n_P$ products and $n_C$ markets.
Now, if we fix the number of products and markets, we can analyze the variation in the runtime depending on the number of lines and production options for each product. We set this for 80 products, 20 destinations, and 5 vehicles. Figure 12 presents the result of complexity in terms of lines and options for each product line.
Figure 12 shows that the curve is a second-degree polynomial; applying least squares, we get $R^2 = 0.9746$, showing a correlation between the variables, so the complexity in terms of production lines can be said to be $O(n_L^2)$, where $n_L$ is the number of lines.
Genetic Algorithm. We can repeat the analysis for the genetic algorithm. The set of points (product * destination, time) in Figure 13 suggests a logarithmic function as the fitting curve. Finding the parameters by the method of least squares, we obtain the function shown in Figure 13 with $R^2 = 0.9851$; we can then say that the complexity of the algorithm regarding products and destinations is $O(\log(n_P \cdot n_C))$. The set of points (line option * product-line, time) in Figure 14 also suggests a logarithmic fitting curve. After finding the parameters by least squares, we can say that the complexity is $O(\log(n_L))$.
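The curve-fitting step used in this analysis can be illustrated with a short sketch. It fits a straight line and a logarithmic curve by least squares and compares them with the standard coefficient of determination, $R^2 = 1 - SS_{res}/SS_{tot}$, which is an assumption about the statistic intended here. The timing data are made up for illustration and are not the measurements of the study.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(ys, fitted):
    """Standard coefficient of determination, 1 - SS_res / SS_tot."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Made-up runtimes (seconds) against problem size (products * markets).
sizes = [100, 200, 400, 800, 1600]
times = [0.9, 2.1, 3.9, 8.2, 15.8]

# Linear model: time = a * size + b.
a, b = fit_line(sizes, times)
r2_linear = r_squared(times, [a * x + b for x in sizes])

# Logarithmic model: time = c * log(size) + d, fitted on log-transformed sizes.
c, d = fit_line([math.log(x) for x in sizes], times)
r2_log = r_squared(times, [c * math.log(x) + d for x in sizes])

print(f"linear fit:      R^2 = {r2_linear:.4f}")
print(f"logarithmic fit: R^2 = {r2_log:.4f}")
```

Comparing the two values on the same data is a simple way to decide which growth model the measurements support. A good fit over the tested range is empirical evidence only and does not prove an asymptotic complexity bound.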

Convergence of Genetic Algorithm.
The parameters used for the genetic algorithm are given in Table 1; the best settings with respect to accuracy (approximation of the optimal solution) and runtime were AG6, AG16, and AG17, whose convergence is presented in Figure 15.
Several parameter settings were used for the genetic algorithm, as shown in Table 1; the best parameters were chosen according to runtime and accuracy. The genetic algorithm showed little sensitivity to the variation of the mutation rate, but high sensitivity to the selection mode and the population size. Regarding the selection mode, best selection found better solutions more quickly than the tournament mode. Best selection converges more quickly than tournament because it prioritizes the best solutions to continue the method, while tournament selects the best participants in a random subset of the population, in an attempt to find better solutions in other feasible regions.
Therefore, we conclude that the best solutions can be found around a good solution elected by the selection mode. This is seen when the greedy heuristic and the genetic algorithm (using best selection mode with a high mutation rate (0.10) and a large population size (100 elements)) obtain good solutions; that is, the genetic algorithm obtains its best solutions by finding good points and exploring the region around them, and this is why we use the AG17 set.
Analyzing the sets of parameters that use the best selection mode, we can observe their convergence in Figure 15. AG17 presented the greatest overall benefit in runtime, accuracy, and convergence, while AG16 converged faster but to a worse solution than AG17 and AG6. AG6 reached a solution close to that of AG17, but its convergence was slower because, with its small population size, it did not explore other feasible regions, resulting in a longer runtime. Table 3 presents the efficiency of the algorithms, which represents the quality of the solutions obtained, and Table 2 presents the instances tested. The data from the tests executed can be found in [27].

Result of Solutions.
Comparing the results of the greedy heuristic with those of the genetic algorithm, the former had a shorter runtime but worse accuracy. The runtime difference is explained by the fact that the greedy heuristic executes fewer and simpler iterations than the genetic algorithm; on the other hand, it explores a smaller space to find good solutions. The genetic algorithm obtained better accuracy than the greedy heuristic, as can be seen in Table 3, which is expected because the metaheuristic searches for the best solution in a larger part of the feasible region than the greedy heuristic. Although the genetic algorithm had a longer runtime, it is acceptable because it is within what is expected in practice (1.5 minutes for the biggest instance), and the accuracy was very satisfactory (for small instances, less than 10% from the upper bound obtained by the ILP). The objective of the problem is to maximize the profit through the best operation, including production sequencing and shipments. Figure 16 and Table 4 show the best solution found by the genetic algorithm for the real data (AE8), with the sequencing of each production line over a period of 24 hours.

Table 2: Instances tested.
Instance  Products  Customers  Lines  Trucks
AE1       11        11         1      5
AE2       25        10         5      5
AE3       50        10         10     5
AE4       60        13         10     5
AE5       70        14         10     5
AE6       80        16         17     5
AE7       90        18         17     5
AE8       115       21         17     5
We can ascertain that the genetic algorithm in its AG17 parameterization had the best performance; for instance AE1 it approached the upper bound of the ILP, and it obtained most of the best solutions. The greedy heuristic presented satisfactory solutions for some instances, but for others it was well below the best solution; however, its runtime was low, unlike the genetic algorithm, which has a greater runtime, and the ILP, which has a nonviable runtime for a plan that needs to be put in place within hours. Among the parameterizations of the genetic algorithm, AG17 was efficient and stable; although AG6 was more efficient in two instances, in others it was well below the objective function value of AG17, showing some instability with changes in the dimensions of the problem. AG16 failed to reach the best objective function values compared to the others.
[Table 4: Best production planning (genetic algorithm), real data.]
The genetic algorithm was efficient for instances with real dimensions, with an acceptable runtime and a better objective function value than the greedy heuristic, showing that it is applicable to the problem and stable under variation of the dimensions of the problem.
In the evaluation of all methods with all the parameterizations presented, the one that performed best was the AG17 parameterization of the genetic algorithm, as it came closest to the optimum for small problems and had the best objective function value among the possibilities presented. AG17 also had a low runtime, slower only than the greedy heuristic and, in some tests, than the AG16 parameterization; however, the solution quality of the greedy heuristic is well below that of the genetic algorithm, and the execution time of AG16 was very close to that of AG17.

Conclusions
The aim of this study was to represent a widely regarded scheduling problem in the baking industry, which has conflicting variables, and to propose solution methods for it, namely a genetic algorithm and a greedy heuristic.
In this study it was possible to formalize the problem with a mathematical representation so that it can be solved as an ILP problem; in the literature we cannot find a representation of the problem as a whole, only of parts of it.
One of the contributions of this paper is a new model for the scheduling problem that is not based on the traveling salesman problem (TSP). Because of the complexity of the due dates and because profit is maximized rather than makespan minimized, this problem cannot be modeled as a TSP.
It was demonstrated that the real scheduling problem can be modeled as a mixed integer linear programming problem different from the classical models, and that it is possible to solve it using metaheuristics to find good solutions.
The greedy heuristic presented a very low runtime, and its objective function value was the second best obtained. The algorithm was also shown to have polynomial complexity in practice, which shows that even if the problem grows over time its runtime remains acceptable.
The genetic algorithm, in turn, adapted well to the problem: the solution was easily represented in the form used by the genetic algorithm, making the algorithm easy to implement. The results were very good, with good-quality solutions that for some instances came close to the upper bound obtained by solving the ILP. The algorithm also proved to be efficient for larger instances, obtaining good solutions in an acceptable runtime, that is, within the limits that make a plan viable. In the complexity analysis, the algorithm showed logarithmic growth of the runtime, which indicates that even as the problem grows the running time should not vary greatly, thus enabling the execution of the algorithm for larger problems.