A Review of Piecewise Linearization Methods

1 Department of Information Technology and Management, Shih Chien University, No. 70 Dazhi Street, Taipei 10462, Taiwan
2 Program in Industrial and Systems Engineering, University of Minnesota, 111 Church Street SE, Minneapolis, MN 55455, USA
3 School of Information Management and Engineering, Shanghai University of Finance and Economics, Shanghai 200433, China
4 School of Management, Tokyo University of Science, 500 Shimokiyoku, Kuki, Saitama 346-8512, Japan
5 Department of Business Management, National Taipei University of Technology, Section 3, No. 1 Chung-Hsiao E. Road, Taipei 10608, Taiwan


Introduction
Piecewise linear functions are frequently used to approximate nonlinear programs with nonconvex functions in the objective or constraints, at the cost of adding extra binary variables, continuous variables, and constraints. They naturally appear as cost functions in supply chain problems, where they model quantity discounts for bulk procurement and fixed charges. For example, the transportation cost, inventory cost, and production cost in a supply chain network are often constructed as a sum of nonconvex piecewise linear functions due to economies of scale [1]. Optimization problems with piecewise linear costs arise in many application domains, including transportation, telecommunications, and production planning. Specific applications include variants of the minimum cost network flow problem with nonconvex piecewise linear costs [2][3][4][5][6][7], the network loading problem [8][9][10][11], the facility location problem with staircase costs [12,13], the merge-in-transit problem [14], and the packing problem [15][16][17]. Other applications include production planning [18], optimization of electronic circuits [19], operation planning of gas networks [20], process engineering [21,22], engineering design [23,24], appointment scheduling [25], and other network flow problems with nonconvex piecewise linear objective functions [7].
Various methods for the piecewise linearization of a nonlinear function have been proposed in the literature [26][27][28][29][30][31][32][33][34][35][36][37][38][39]. Two well-known mixed-integer formulations for piecewise linear functions are the incremental cost [40] and the convex combination [41] formulations. Padberg [35] compared the linear programming relaxations of the two mixed-integer programming models for piecewise linear functions in the simplest case when no constraint exists. He showed that the feasible set of the linear programming relaxation of the incremental cost formulation is integral; that is, the binary variables are integers at every vertex of the set. He called such formulations locally ideal. On the other hand, the convex combination formulation is not locally ideal, and the feasible set of its linear programming relaxation strictly contains that of the incremental cost formulation. Then, Sherali [42] proposed a modified convex combination formulation that is locally ideal. Alternatively, Beale and Tomlin [43] suggested a formulation for the piecewise linear function similar to convex combination, except that no binary variable is included in the model and the nonlinearities are enforced algorithmically, directly in the branch-and-bound algorithm, by branching on sets of variables, which they called special ordered sets of type 2 (SOS2). It is also possible to formulate piecewise linear functions similarly to incremental cost but without binary variables, again enforcing the nonlinearities directly in the branch-and-bound algorithm. Two advantages of eliminating binary variables are the substantial reduction in the size of the model and the use of the polyhedral structure of the problem [44,45]. Keha et al. [46] studied formulations of linear programs with piecewise linear objective functions with and without additional binary variables and showed that adding binary variables does not improve the bound of the linear programming relaxation. Keha et al. [47] also presented a branch-and-cut algorithm for solving linear programs with continuous separable piecewise linear cost functions. Instead of introducing auxiliary binary variables and other linear constraints to represent SOS2 constraints, as in the traditional approach, they enforced SOS2 constraints by branching on them without auxiliary binary variables.
Due to the broad applications of piecewise linear functions, many studies have addressed this topic. The main purpose of these studies is to find a better way to represent a piecewise linear function or to tighten the linear programming relaxation. A superior representation of piecewise linear functions can effectively reduce the problem size and enhance the computational efficiency. However, for expressing a piecewise linear function of a single variable $x$ with $m + 1$ break points, most of the methods in the textbooks and literature require adding $m$ extra binary variables and $4m$ constraints, which may cause a heavy computational burden when $m$ is large. Recently, Li et al. [48] developed a representation method for piecewise linear functions with fewer binary variables than the traditional methods. Although their method needs only $\lceil \log_2 m \rceil$ extra binary variables to linearize a nonlinear function with $m + 1$ break points, the approximation process still requires $8 + 8\lceil \log_2 m \rceil$ extra constraints, $2m$ nonnegative continuous variables, and $2\lceil \log_2 m \rceil$ free-signed continuous variables. Vielma et al. [39] presented a note on the paper of Li et al. and showed that the two representations for piecewise linear functions introduced by Li et al. [48] are both theoretically and computationally inferior to standard formulations for piecewise linear functions. Tsai and Lin [49] applied the techniques of Vielma et al. [39] to express a piecewise linear function for solving a posynomial optimization problem. Croxton et al. [31] indicated that most models for expressing piecewise linear functions are equivalent to each other. Additionally, it is well known that the numbers of extra variables and constraints required in the linearization of a nonlinear function affect the computational performance of the converted problem. Therefore, this paper reviews the recent advances in piecewise linearization methods. Section 2 reviews the piecewise linearization methods. Section 3 compares the formulations of the various methods in terms of the numbers of extra binary variables, continuous variables, and constraints. Section 4 discusses error evaluation in piecewise linear approximation. Conclusions are given in Section 5.

Formulations of Piecewise Linearization Functions
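Consider a general nonlinear function $f(x)$ of a single variable $x$; $f(x)$ is a continuous function, and $x$ is within the interval $[a_0, a_m]$. Most commonly used textbooks of nonlinear programming [26][27][28] approximate the nonlinear function by a piecewise linear function as follows.
Denote $a_k$ ($k = 0, 1, \ldots, m$) as the break points of $f(x)$, with $a_0 < a_1 < \cdots < a_m$. $f(x)$ can then be approximately linearized over the interval $[a_0, a_m]$ as
$$L(f(x)) = \sum_{k=0}^{m} f(a_k)\, t_k,$$
where $x = \sum_{k=0}^{m} a_k t_k$, $\sum_{k=0}^{m} t_k = 1$, $t_k \ge 0$, in which only two adjacent $t_k$'s are allowed to be nonzero. A nonlinear function is then converted into the following expressions.
Method 1. Consider
$$L(f(x)) = \sum_{k=0}^{m} f(a_k)\, t_k, \quad x = \sum_{k=0}^{m} a_k t_k, \quad \sum_{k=0}^{m} t_k = 1,$$
$$t_0 \le y_0, \quad t_k \le y_{k-1} + y_k \ (k = 1, \ldots, m-1), \quad t_m \le y_{m-1},$$
$$\sum_{k=0}^{m-1} y_k = 1, \quad y_k \in \{0, 1\}, \quad t_k \ge 0.$$
The above expressions involve $m$ new binary variables $y_0, y_1, \ldots, y_{m-1}$. The number of newly added 0-1 variables for the piecewise linearization of a function $f(x)$ equals the number of breaking intervals (i.e., $m$). If $m$ is large, it may cause a heavy computational burden.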
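The convex combination idea can be illustrated numerically. The following is a minimal sketch, not a solver model: for a given point it computes the weights on the break points and the 0-1 segment indicators that such a formulation would select. The function names `convex_combination` and `approximate`, and the sample break points, are assumptions made for illustration.

```python
from bisect import bisect_right

def convex_combination(breaks, x):
    """Return (t, y): weights on the break points and 0-1 segment indicators for x."""
    m = len(breaks) - 1                      # number of segments
    if not breaks[0] <= x <= breaks[m]:
        raise ValueError("x lies outside the range of the break points")
    # Index k of the segment [a_k, a_{k+1}] containing x (last segment if x == a_m).
    k = min(bisect_right(breaks, x) - 1, m - 1)
    y = [1 if j == k else 0 for j in range(m)]   # exactly one active segment
    t = [0.0] * (m + 1)
    t[k + 1] = (x - breaks[k]) / (breaks[k + 1] - breaks[k])
    t[k] = 1.0 - t[k + 1]                        # only two adjacent weights are nonzero
    return t, y

def approximate(f, breaks, x):
    """Evaluate the piecewise linear approximation sum_k f(a_k) * t_k at x."""
    t, _ = convex_combination(breaks, x)
    return sum(f(a) * tk for a, tk in zip(breaks, t))

breaks = [0.0, 1.0, 2.0, 4.0]                # illustrative break points
t, y = convex_combination(breaks, 3.0)
print(t, y)
print(approximate(lambda v: v * v, breaks, 3.0))
```

In an actual mixed-integer model the weights and indicators are decision variables chosen by the solver; the sketch only shows which values satisfy the adjacency requirement for a fixed point.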
Li and Yu [33] proposed another global optimization method for nonlinear programming problems where the objective function and the constraints might be nonconvex. A univariate function is initially expressed by a piecewise linear function with a summation of absolute terms. Denote $s_k$ ($k = 0, 1, \ldots, m - 1$) as the slopes of the line segments between $a_k$ and $a_{k+1}$, expressed as
$$s_k = \frac{f(a_{k+1}) - f(a_k)}{a_{k+1} - a_k}.$$
$f(x)$ can then be written as follows.
Method 2. Consider
$$f(x) \approx f(a_0) + s_0 (x - a_0) + \sum_{k=1}^{m-1} \frac{s_k - s_{k-1}}{2} \left( |x - a_k| + x - a_k \right),$$
where the absolute terms are subsequently linearized: $x \ge 0$ has a known upper bound, the auxiliary continuous variables are nonnegative, and extra binary variables are used only to linearize $f(x)$ on the intervals where it is nonconvex. Comparing Method 2 with Method 1, Method 1 uses binary variables to linearize $f(x)$ over the whole interval of $x$, whereas the binary variables in Method 2 are applied only to the nonconvex parts of $f(x)$. Method 2 therefore uses fewer 0-1 variables than Method 1. However, when the nonconvex parts of $f(x)$ span several intervals, Method 2 still requires one binary variable for each of those intervals.
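The representation with absolute terms can be checked numerically. The sketch below assumes the standard identity in which a piecewise linear interpolant equals its first segment plus half the slope change at each interior break point times (|x - a_k| + x - a_k); the function names and the test function are illustrative assumptions. It also lists the interior break points where the slope decreases, which are the places where a nonconvex (for minimization) kink would call for a binary variable.

```python
def slopes(f, breaks):
    """Slopes of the chords between consecutive break points."""
    return [(f(b) - f(a)) / (b - a) for a, b in zip(breaks, breaks[1:])]

def abs_form(f, breaks, x):
    """Evaluate the piecewise linear interpolant of f through its absolute-term form."""
    s = slopes(f, breaks)
    value = f(breaks[0]) + s[0] * (x - breaks[0])
    for k in range(1, len(s)):
        # (|x - a_k| + x - a_k) / 2 equals max(x - a_k, 0)
        value += (s[k] - s[k - 1]) / 2.0 * (abs(x - breaks[k]) + x - breaks[k])
    return value

def nonconvex_kinks(f, breaks):
    """Indices k of interior break points where the slope decreases."""
    s = slopes(f, breaks)
    return [k for k in range(1, len(s)) if s[k] < s[k - 1]]

cubic = lambda v: v ** 3 - 3.0 * v           # illustrative nonconvex function
grid = [-2.0, -1.0, 0.0, 1.0, 2.0]
print(abs_form(cubic, grid, 0.5))            # value of the interpolant at x = 0.5
print(nonconvex_kinks(cubic, grid))          # kinks that would need a 0-1 variable
```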
Another general form of representing a piecewise linear function is proposed in the articles of Croxton et al. [31], Li [32], Padberg [35], Topaloglu and Powell [36], and Li and Tsai [38].The expressions are formulated as shown below.

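Method 3. Consider
$$a_k - (a_m - a_0)(1 - \lambda_k) \le x \le a_{k+1} + (a_m - a_0)(1 - \lambda_k), \quad k = 0, 1, \ldots, m - 1,$$
$$f(a_k) + s_k (x - a_k) - M (1 - \lambda_k) \le f(x) \le f(a_k) + s_k (x - a_k) + M (1 - \lambda_k), \quad k = 0, 1, \ldots, m - 1,$$
where
$$\sum_{k=0}^{m-1} \lambda_k = 1, \quad \lambda_k \in \{0, 1\},$$
where $M$ is a large constant and
$$s_k = \frac{f(a_{k+1}) - f(a_k)}{a_{k+1} - a_k}.$$
The above expressions require $m$ extra binary variables and $4m$ constraints, where $m + 1$ break points are used to represent a piecewise linear function.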
From the above discussion, Methods 1, 2, and 3 require numbers of extra binary variables and extra constraints that are linear in $m$ to express a piecewise linear function. When a nonlinear function is approximated by a piecewise linear function, the numbers of extra binary variables and constraints significantly influence the computational efficiency. If fewer binary variables and constraints are used to represent a piecewise linear function, then less CPU time is needed to solve the transformed problem. To decrease the number of extra binary variables involved in the approximation process, Li et al. [48] developed a representation method for piecewise linear functions with the number of binary variables logarithmic in $m$. Consider the same piecewise linear function $f(x)$ discussed above, where $x$ is within the interval $[a_0, a_m]$ and $m + 1$ break points exist within $[a_0, a_m]$. Let $\theta$ be an integer, $0 \le \theta \le m - 1$, expressed as
$$\theta = \sum_{j=1}^{h} 2^{j-1} u_j, \quad h = \lceil \log_2 m \rceil, \quad u_j \in \{0, 1\}.$$
Let $G(\theta) \subseteq \{1, 2, \ldots, h\}$ be a set composed of all indices such that $\sum_{j \in G(\theta)} 2^{j-1} = \theta$. For instance, $G(0) = \emptyset$, $G(3) = \{1, 2\}$.
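The index set built from the binary expansion can be computed directly. The sketch below is only an illustration of that definition; the function names `index_set` and `num_binaries` are assumptions, and the integer identity `(m - 1).bit_length()` is used in place of a floating-point logarithm to obtain the ceiling of log2(m).

```python
def index_set(theta):
    """Return the 1-based bit positions j whose powers 2**(j-1) sum to theta."""
    return {j + 1 for j in range(theta.bit_length()) if (theta >> j) & 1}

def num_binaries(m):
    """Number of bits needed to label m intervals, i.e. ceil(log2(m)) for m >= 1."""
    return (m - 1).bit_length()

print(index_set(0), index_set(3))            # the two examples given in the text
print(num_binaries(8), num_binaries(9))      # bits needed for 8 and 9 intervals
```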
To approximate a univariate nonlinear function by a piecewise linear function, the following expressions, referred to as Method 4, are deduced by the method of Li et al. [48].

Mathematical Problems in Engineering
where the binary variables take values in $\{0, 1\}$, the remaining auxiliary variables are either free or nonnegative continuous variables, and all other variables are the same as defined before.
The expressions of Method 4 for representing a piecewise linear function $f(x)$ with $m + 1$ break points use $\lceil \log_2 m \rceil$ binary variables, $8 + 8\lceil \log_2 m \rceil$ constraints, $2m$ nonnegative variables, and $2\lceil \log_2 m \rceil$ free-signed continuous variables. Compared with Methods 1, 2, and 3, Method 4 reduces the number of binary variables, with the aim of improving the computational efficiency. Although Li et al. [48] expressed a piecewise linear function with fewer binary variables, Vielma et al. [39] showed that this representation is theoretically and computationally inferior to standard formulations for piecewise linear functions. Vielma and Nemhauser [50] recently developed a novel piecewise linear expression requiring fewer variables and constraints than the current piecewise linearization techniques to approximate univariate nonlinear functions. Their method needs a logarithmic number of binary variables and constraints to express a piecewise linear function. The formulation is described below.
The linear approximation of a univariate $f(x)$, $a_0 \le x \le a_m$, by the technique of Vielma and Nemhauser [50], referred to as Method 5, is formulated as follows.
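The core idea of a logarithmic formulation of this kind can be sketched in code. The following is a hedged illustration and not the paper's own notation: it assumes the segments are labelled by a reflected Gray code, and that for each bit two sets of break points are formed whose weights are forced to zero when the bit is 0 or 1, respectively. All function names are hypothetical. The check confirms that fixing the bits to the code of a segment leaves exactly the two end points of that segment free.

```python
def gray_code(m):
    """Reflected Gray code labels for segments 1..m, as tuples of h bits."""
    h = (m - 1).bit_length()                 # ceil(log2(m)) bits
    codes = {}
    for p in range(1, m + 1):
        g = (p - 1) ^ ((p - 1) >> 1)
        codes[p] = tuple((g >> l) & 1 for l in range(h))
    return codes, h

def branching_sets(m):
    """For each bit, the break points forced to zero when the bit is 0 (plus) or 1 (minus)."""
    codes, h = gray_code(m)
    # Break point k (0..m) touches segment k (if k >= 1) and segment k + 1 (if k < m).
    adjacent = {k: [p for p in (k, k + 1) if 1 <= p <= m] for k in range(m + 1)}
    plus = [{k for k in adjacent if all(codes[p][l] == 1 for p in adjacent[k])}
            for l in range(h)]
    minus = [{k for k in adjacent if all(codes[p][l] == 0 for p in adjacent[k])}
             for l in range(h)]
    return codes, plus, minus

def allowed_break_points(m, u):
    """Break points whose weights may be nonzero once the bits u are fixed."""
    _, plus, minus = branching_sets(m)
    allowed = set(range(m + 1))
    for l, bit in enumerate(u):
        allowed -= plus[l] if bit == 0 else minus[l]
    return allowed

codes, _, _ = branching_sets(4)
for p, u in codes.items():
    print(p, u, sorted(allowed_break_points(4, u)))
```

Each bit contributes two inequality constraints, so the number of binary variables and of these constraints grows with the logarithm of the number of segments, which is the property the text attributes to Method 5.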

Formulation Comparisons
The comparison results of the above five methods in terms of the numbers of binary variables, continuous variables, and constraints are listed in Table 1. The number of extra binary variables of Methods 1 and 3 is linear in the number of line segments. Methods 4 and 5 need a number of extra binary variables that is logarithmic in the number of line segments $m$, and the number of extra binary variables of Method 2 is equal to the number of concave piecewise line segments. In deterministic global optimization for a minimization problem, inverse, power, and exponential transformations generate nonconvex expressions that must be linearly approximated in the reformulated problem. Hence Methods 4 and 5 are superior to Methods 1, 2, and 3 in terms of the numbers of extra binary variables and constraints, as shown in Table 1. Moreover, Method 5 has fewer extra continuous variables and constraints than Method 4 in linearizing a nonlinear function.
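The growth rates can be made concrete with the counts stated in the text for Methods 3 and 4. The sketch below tabulates only those stated counts; the function name `extra_counts` and the dictionary keys are illustrative assumptions, and the counts for the other methods are not reproduced because they depend on Table 1.

```python
def extra_counts(m):
    """Extra variables and constraints for m segments, using the counts quoted in the text."""
    h = (m - 1).bit_length()                 # ceil(log2(m)) for m >= 1
    return {
        "Method 3": {"binaries": m, "constraints": 4 * m},
        "Method 4": {"binaries": h, "constraints": 8 + 8 * h,
                     "nonnegative": 2 * m, "free": 2 * h},
    }

for m in (4, 64, 1024):
    print(m, extra_counts(m))
```

For small m the logarithmic formulation can need more constraints than the linear one (for example at m = 4), so the advantage in constraint count appears only as m grows.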
Till et al. [51] reviewed the literature on the complexity of mixed-integer linear programming (MILP) problems and concluded that the computational complexity varies from $O(d \cdot n^2)$ to $O(2^d \cdot n^3)$, where $n$ is the number of constraints and $d$ is the number of binaries. Therefore, reducing constraints and binary variables has a greater impact on the computational efficiency of solving MILP problems than reducing continuous variables. When a global solution of a nonlinear programming problem is sought by a piecewise linearization method, a linearization that generates a large number of additional constraints and binaries decreases the computational efficiency and causes a heavy computational burden. According to the above discussion, Method 5 is more computationally efficient than the other four methods. Experimental results from the literature [39,48,49] also support this statement.
Beale and Tomlin [43] suggested a formulation for piecewise linear functions by using continuous variables in special ordered sets of type 2 (SOS2). Although no binary variables are included in the SOS2 formulation, the nonlinearities are enforced algorithmically and directly in the branch-and-bound algorithm by branching on sets of variables. Whereas the traditional SOS2 branching schemes have too many dichotomies, the piecewise linearization technique in Method 5 induces an independent branching scheme of logarithmic depth and provides a significant computational advantage [50]. The computational results in Vielma and Nemhauser [50] show that Method 5 outperforms the SOS2 model without binary variables.
The factors affecting the computational efficiency in solving nonlinear programming problems include the tightness of the constructed convex underestimator, the efficiency of the piecewise linearization technique, and the number of transformed variables. An appropriate variable transformation constructs a tighter convex underestimator, so that fewer break points are required in the linearization process to satisfy the same optimality tolerance and feasibility tolerance. Vielma and Nemhauser [50] indicated that the formulation of Method 5 is sharp and locally ideal and has favorable tightness properties. They presented experimental results showing that Method 5 significantly outperforms other methods, especially when the number of break points becomes large. Vielma et al. [39] explained that the formulation of Method 4 is not sharp and is theoretically and computationally inferior to standard MILP formulations (convex combination model, logarithmic convex combination model) for piecewise linear functions.

Error Evaluation
For evaluating the error of piecewise linear approximation, Tsai and Lin [49,52] and Lin and Tsai [53] utilized the expression $|f(x) - L(f(x))|$, where $L(f(x))$ denotes the piecewise linear approximation of $f(x)$, to estimate the error indicated in Figure 2. If $f(x)$ is the objective function, $g_i(x) < 0$ is the $i$th constraint, and $x^*$ is the solution derived from the transformed program, then the linearization needs no further refinement once $|f(x^*) - L(f(x^*))| \le \varepsilon_1$ and $\max_i (g_i(x^*)) \le \varepsilon_2$, where $|f(x^*) - L(f(x^*))|$ is the evaluated error in the objective, $\varepsilon_1$ is the optimality tolerance, $g_i(x^*)$ is the error in the $i$th constraint, and $\varepsilon_2$ is the feasibility tolerance.
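The stopping test described above can be sketched as follows. This is a minimal illustration under stated assumptions: the approximation is taken to be the linear interpolant through the break points, and the names `interpolate`, `needs_refinement`, `eps_opt`, and `eps_feas` are hypothetical.

```python
from bisect import bisect_right

def interpolate(f, breaks, x):
    """Piecewise linear interpolant of f through the break points, evaluated at x."""
    if not breaks[0] <= x <= breaks[-1]:
        raise ValueError("x lies outside the range of the break points")
    k = min(bisect_right(breaks, x) - 1, len(breaks) - 2)
    a, b = breaks[k], breaks[k + 1]
    return f(a) + (f(b) - f(a)) * (x - a) / (b - a)

def needs_refinement(f, constraints, breaks, x_star, eps_opt, eps_feas):
    """True if the objective error or the worst constraint value exceeds its tolerance."""
    objective_error = abs(f(x_star) - interpolate(f, breaks, x_star))
    worst_constraint = max((g(x_star) for g in constraints), default=0.0)
    return objective_error > eps_opt or worst_constraint > eps_feas

square = lambda v: v * v
limits = [lambda v: v - 2.0]                 # illustrative constraint g(x) = x - 2
print(needs_refinement(square, limits, [0.0, 1.0, 2.0], 1.5, 0.1, 1e-6))
print(needs_refinement(square, limits, [0.0, 1.0, 1.5, 2.0], 1.5, 0.1, 1e-6))
```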
The accuracy of the linear approximation significantly depends on the selection of break points and more break points can increase the accuracy of the linear approximation.Since adding numerous break points leads to a significant increase in the computational burden, the break point selection strategies can be applied to improve the computational efficiency in solving optimization problems by the deterministic approaches.Existing break point selection strategies are classified into three categories as follows [54]: (i) add a new break point at the midpoint of each interval of existing break points; (ii) add a new break point at the point with largest approximation error of each interval; (iii) add a new break point at the previously obtained solution point.
According to the deterministic optimization methods for solving nonconvex nonlinear problems [29,33,38,39,48,49,53-56], the inverse or logarithmic transformation must be approximated by a piecewise linear function. For example, the function $y = \ln x$ or $y = x^{-1}$ must be piecewise linearized by using an appropriate break point selection strategy. If a new break point is added at the midpoint of each interval of existing break points or at the point with the largest approximation error, the number of line segments doubles in each iteration. If a new break point is added at the previously obtained solution point, only one break point is added in each iteration. How to improve the computational efficiency by a better break point selection strategy still needs more investigation and experiments to obtain concrete results.
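The three refinement strategies can be compared with a short sketch. The function names are hypothetical, and locating the point of largest error by a uniform sample search is an assumption made for illustration rather than a method taken from the cited works.

```python
from math import log

def refine_midpoints(breaks):
    """Strategy (i): add the midpoint of every interval."""
    out = []
    for a, b in zip(breaks, breaks[1:]):
        out += [a, (a + b) / 2.0]
    return out + [breaks[-1]]

def refine_max_error(f, breaks, samples=100):
    """Strategy (ii): add, in every interval, the sampled point with the largest chord error."""
    out = []
    for a, b in zip(breaks, breaks[1:]):
        def error(x, a=a, b=b):
            chord = f(a) + (f(b) - f(a)) * (x - a) / (b - a)
            return abs(f(x) - chord)
        grid = [a + (b - a) * i / samples for i in range(1, samples)]
        out += [a, max(grid, key=error)]
    return out + [breaks[-1]]

def refine_at_solution(breaks, x_star):
    """Strategy (iii): add only the previously obtained solution point."""
    return sorted(set(breaks) | {x_star})

breaks = [1.0, 2.0, 4.0]
print(len(refine_midpoints(breaks)) - 1)             # segments after strategy (i)
print(len(refine_max_error(log, breaks)) - 1)        # segments after strategy (ii)
print(len(refine_at_solution(breaks, 3.0)) - 1)      # segments after strategy (iii)
```

Strategies (i) and (ii) double the number of segments per iteration, while strategy (iii) adds a single segment, which matches the growth described in the text.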

Conclusions
This study provides an overview of some of the most commonly used piecewise linearization methods in deterministic optimization. From the formulation point of view, the numbers of extra binaries, continuous variables, and constraints have decreased in the most recently developed methods, especially the number of extra binaries, which may cause heavy computational burdens. Additionally, a good piecewise linearization method must have good tightness properties, such as being sharp and locally ideal. Since an effective break point selection strategy is important for enhancing the computational efficiency of linear approximation, more work should be done to study the optimal positioning of the break points. Although a logarithmic piecewise linearization method with good tightness properties has been proposed, it is still too time consuming for finding an approximately global optimum of a large-scale nonconvex problem. Developing an efficient polynomial-time algorithm for solving nonconvex problems by piecewise linearization techniques remains a challenging question. This contribution gives only a few preliminary insights and points toward issues deserving additional research.

Figure 2: Error evaluation of the linear approximation.

Table 1: Comparison results of five methods in expressing a piecewise linearization function with $m$ line segments (i.e., $m + 1$ break points).