Optimization of High-Dimensional Functions through Hypercube Evaluation

A novel learning algorithm for solving global numerical optimization problems is proposed. The proposed algorithm is an intense stochastic search method based on the evaluation and optimization of a hypercube and is called the hypercube optimization (HO) algorithm. The HO algorithm comprises the initialization and evaluation process, the displacement-shrink process, and the searching space process. The initialization and evaluation process initializes the initial solution and evaluates the solutions in the given hypercube. The displacement-shrink process determines the displacement and evaluates the objective function at new points, and the searching space process determines the next hypercube using certain rules and evaluates the new solutions. The algorithms for these processes have been designed and are presented in the paper. The designed HO algorithm is tested on specific benchmark functions. Simulations of the HO algorithm have been performed for the optimization of functions of 1000, 5000, or even 10000 dimensions. Comparative simulation results with other approaches demonstrate that the proposed algorithm is a potential candidate for the optimization of both low- and high-dimensional functions.


Introduction
One of the basic problems of numerical optimization is computing globally optimal solutions of high-dimensional functions. The aim of optimization is to find the optimum values of the objective function by learning the parameters of the function within the defined domains. Learning algorithms are basically divided into two categories. Algorithms based on derivatives of the cost functions (or objective functions) are called derivative-based learning algorithms, and algorithms that do not use the derivatives of the cost functions are called derivative-free learning algorithms. Recently, various learning techniques have been applied to different optimization problems. However, derivative-based learning techniques do not fare well at finding globally optimal solutions of nonlinear problems that have many locally optimal solutions. Derivative-free learning techniques and evolutionary computing are effective optimization techniques that can be used to overcome the "local minima" problem and find the global optimum.
Recently, a number of studies have addressed global optimization, but there are still few powerful techniques for the optimization of dense high-dimensional problems. This is because the global optimization of high-dimensional functions is computationally expensive. These problems are characterized by many parameters, and many iterations and arithmetic operations are needed to evaluate these functions. In practical applications, evaluating the function is often very expensive, and a large number of function evaluations might not be feasible [19].
Some learning algorithms have been designed for global optimization of high-dimensional functions. Reference [20] uses new variants of differential evolution (DE), named DECC-I and DECC-II, for high-dimensional optimization (up to 1000 dimensions). The algorithms use several novel strategies that focus on problem decomposition and subcomponent cooperation. An improved differential evolution algorithm [21], a self-adaptive differential evolution algorithm [22], differential ant-stigmergy [23], particle swarm optimization [24,25], modified multiscale particle swarm optimization [26], surrogate-assisted evolutionary programming [27], and the group search optimizer (GSO) inspired by animal behavior [28] have been designed and applied for global optimization of high-dimensional functions. As shown, the designed algorithms are basically modifications, improvements, and adaptations of existing evolutionary algorithms, in particular DE, PSO, and GA. Using these methods, researchers try to obtain reasonable results for optimization functions. In spite of some success, these techniques are still not well suited to high-dimensional global optimization problems [19]. The proposed algorithms are more suitable for low-dimensional problems. The dimension used in the above research papers was at most 100, and in some of them 1000. In this paper, a novel method that solves high-dimensional global optimization problems of sizes 1000, 5000, and 10000 is proposed. The proposed method is called the hypercube optimization (HO) algorithm. The HO algorithm is based on designing a hypercube, selecting the best elements, and applying them to multivariate systems for optimization of the objective function. The algorithm approaches optimal points using the best elements determined during learning.
The paper is organized as follows. Section 2 presents the proposed hypercube optimization algorithm and describes the processes used in it. Section 3 describes the test functions used in the simulations. Section 4 covers the application of the algorithm to the test functions. Section 5 presents comparative results of the HO algorithm with some existing methods. Finally, conclusions are presented in Section 6.

Hypercube Optimization Algorithm
The HO algorithm is an evolutionary algorithm that takes inspiration from the behaviour of a dove discovering new areas for food in nature. In such behaviour, a flying dove searches for new locations of food. The dove flies down in a unique way and marks the area that may have food. The dove then flies up again, chooses among the previously marked areas, and changes and shrinks the size of the search area. In a search process, the dove is not limited to a single area. The dove picks a new search area according to the density of food (the domain of the objective function). The dove stops flying and keeps in mind the area that has food. After eating the food, the dove looks for a new search area and jumps or flies down to another area to find one. The dove does not fly to another area once it reaches the area that has the most food.
In the paper, the hypercube is used to describe the search area. Inside the search area, the value of an objective function is evaluated according to the quantity and density of food. Next, the functional distances between each pair of solutions are determined. This distance helps the algorithm to determine the next search area. This is performed using the displacement-shrink process.
The hypercube optimization algorithm is a derivative-free learning method based on the evaluation of a set of points randomly distributed in an $m$-dimensional hypercube, where $m$ is the dimension of the problem. After evaluation, the hypercube shifts and contracts according to the average of the previous best points in order to determine new best points inside the hypercube. The contraction is greater when the movement is smaller, which accelerates convergence. The result of this operation is reported as the optimal solution at the end of the iterations.
The HO algorithm is an intense stochastic search method based on hypercube (HC) evaluation. The general structure of the hypercube optimization algorithm is illustrated by the flowchart in Figure 1. As the figure shows, the HO algorithm includes three basic processes.
Step A (initialization and evaluation process). The algorithm begins with the generation of a hypercube and the initialization of matrices and variables within the hypercube. Here the hypercube is represented by its center and size (radii). New points with uniform distribution are randomly generated within the hypercube. The algorithm then proceeds to the main loop, by which convergence to the global minimum is sought, and it finishes when any of the termination criteria is fulfilled.
Step B (displacement-shrink process). The displacement-shrink process is deployed to find the new best point. This is implemented by computing the average of the current best point and the previous best one. The average of both values is taken as a conservative measure to avoid excessive fluctuations in the search.
Step C (searching space process). The searching space process controls the movements of solutions according to the defined interval (commonly [0, 0.1]). The searching space process initializes a new hypercube and repeats the whole process.
The initialization and evaluation process, displacement-shrink process, and searching space process are repeated in each learning iteration. The processes continue to execute until the specified termination conditions are satisfied.
At each iteration, the newly generated hypercube changes and shrinks its sizes until the optimum points are located. Unlike other methods, like particle swarm optimization, the points in the hypercube optimization algorithm do not move according to a specific rule nor does the method record them, except for the best points. This permits a rapid selection of a new best zone and an intense search in it. Thus, the hypercube optimization algorithm does not perform any local search but rather it is always global. This behavior allows the algorithm to move rapidly to globally best points, as it does not waste time in local searches.
The following subsections describe each step in detail.

Hypotheses and Representation of Solution.
As in all real-valued single-objective unconstrained optimization algorithms, we try to find the minimum (or equivalently the maximum) of a scalar objective function $f(x)$ and represent the free parameters as a vector or point $x = (x_1, x_2, x_3, \ldots, x_m)$, where $m$ is the dimension of the problem. Therefore, $f$ is a mapping $\mathbb{R}^m \to \mathbb{R}$. We assume the following hypotheses.
(i) $f$ is available only as a black box; that is, we have no knowledge or possibility of control of its interior functions. We access $f$ only via input-output.
(ii) $f$ has a continuous domain inside the bounds; that is, every point inside the bounds has a mapping by $f$.
(iii) $f$ is well-behaved in the domain, at least numerically; that is, it is continuous and presents certain smoothness. This rules out overly noisy functions, where there is no spatial correlation. But implicit is also the assumption of some noisiness, whereby finite differences in the neighborhood of a point are not similar to the derivatives of the noiseless function.
(iv) The number of searching points ($N$) is enough for correctly sampling $f$'s domain (related to the previous point). Therefore, $N$ is directly related to the dimension of the problem ($m$) and $f$'s smoothness.

Initialization and Evaluation Process.
Initialization and evaluation is the first block of the hypercube optimization algorithm. The starting conditions are (1) initial (and global) boundaries for all points: these boundaries are the sides of the hypercube; (2) initialization of solutions inside the hypercube and an initial random choice of a best point $x_0$ (if not available, the central point of the initial hypercube will be taken) in the given set.
Initial points of the hypercube optimization algorithm are presented in Table 1. At the starting stage, the radii and centre of the HC are generated randomly, and these parameters are used to initialize the first HC. Then uniformly distributed searching points are generated inside the hypercube. Using these points, the values of the objective function are determined. The idea here is to gain approximate knowledge of the location of the lowest values of $f$. This initial sampling has to be sufficiently dense to probe all the possible zones of higher and lower values; otherwise, the algorithm can mistake a merely better zone (local optimum) for the zone sought (global optimum). As pointed out above, this density (and hence the number of points $N$) is a function of the dimension and the smoothness of the function. Problems with higher dimension will require higher $N$.
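The sampling and evaluation step described above can be sketched in a few lines. The following Python fragment is only an illustration, not the authors' MATLAB implementation: the names `sample_hypercube` and `best_point`, the point count of 200, the bounds of 5.12 on each side, and the sphere objective are assumptions made for the example.

```python
import random


def sample_hypercube(center, radii, n_points, rng):
    """Draw n_points uniformly from the box [center - radii, center + radii]."""
    return [
        [c + r * rng.uniform(-1.0, 1.0) for c, r in zip(center, radii)]
        for _ in range(n_points)
    ]


def best_point(points, f):
    """Evaluate f at every point and return (lowest value, point that attains it)."""
    values = [f(p) for p in points]
    i = min(range(len(points)), key=values.__getitem__)
    return values[i], points[i]


def sphere(x):
    # Illustrative objective only (assumption for this example).
    return sum(v * v for v in x)


rng = random.Random(0)                  # fixed seed so the run is repeatable
center, radii = [0.0] * 5, [5.12] * 5   # hypothetical first hypercube
X = sample_hypercube(center, radii, 200, rng)
f_best, x_best = best_point(X, sphere)
```

A denser sample (larger `n_points`) makes it less likely that the zone containing the global optimum is missed, at the cost of more function evaluations.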
The hypercube optimization algorithm begins with the initialization of matrices and variables; it proceeds to the main loop, by which convergence to the global minimum is sought, and it finishes when any of the termination criteria is fulfilled. The flowchart of the initialization and evaluation process of the HO algorithm is illustrated in Figure 2. After the start block, the initial point $x_0$ is generated as the centre of the first hypercube (HC). The initial value of the radii of the first HC is determined according to the change interval of the test (objective) functions. Next, using the value of the centre $x_0$, the dimension of the hypercube is derived according to formula (1). After creating the hypercube, the matrix $X$ is generated within this hypercube. The size of $X$ is $(N \times m)$, where $N$ is the number of generated points and $m$ is the dimension of the problem. Note that in subsequent iterations the hypercube is created using the values of the $X$ matrix.

Figure 2: Flowchart of the initialization and evaluation process. Here B is the displacement-shrink process and C is the searching space process.
This process is illustrated below, starting from initial points created with default values.
(1) Dimension of the hypercube, derived from the centre $x_0$.
(2) Row vectors with the lower and upper boundaries of the HC: $LB = \min(\text{bounds})$ for the lower boundary, with the upper boundary defined analogously.
(3) Dimensions of the $m$-dimensional HC.
(4) Central values of the HC.
(5) Vector with the radii of the HC.

According to the $X$ matrix, the row vectors with the lower and upper boundaries of the hypercube (2) are determined. Using these boundaries, obtained from the first hypercube (zone), the centre (4) and the radii (5) of the next hypercube are determined. The $X$ matrix, which holds the searching points, is applied to determine the values of the test function, that is, the $F = f(X)$ matrix, as pointed out above in Table 1. In the next step, new uniformly random points are derived using the HC. The number of points is defined according to the dimension of the HC. These points form the new matrix $X_{\text{new}}$, which is used to evaluate the test functions. As a result of the evaluation, the best (minimum) value of the function, $F_{\text{best}}$, and the corresponding best points $x_{\text{best}}$ are determined. By "best" we mean the vector that corresponds to the best fitness (e.g., the lowest objective function value for a minimization problem) in the entire population at the given iteration. The best point is improved (updated) using local search; that is, the new best point is obtained by adding to $x_{\text{best}}$ an increment $\Delta$ scaled by a coefficient between 0 and 1, where $f$ is the objective function. The improvement continues until $\Delta$ falls below a preset tolerance (tol). The derived best points are used to determine the centre and the radii of the next hypercube. This operation is realized by calculating the mean of the centre of the last HC ($x_{\text{last centre}}$) and the previous best point ($x_{\text{best}}$), that is, $(x_{\text{last centre}} + x_{\text{best}})/2$. This process is called "displacement." As shown, the second HC is derived from the previous HC, and the sizes of the second HC will be less than those of the previous one. In subsequent operations, the second HC will be used to create the third hypercube.
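As a rough illustration of how the boundaries, centre, and radii relate to the matrix of searching points, the sketch below derives them column by column. It assumes that the centre is the midpoint of the lower and upper boundaries and that the radii are the half-widths; these definitions, and the function names, are assumptions made for the example rather than the paper's formulas. The displacement step (mean of the last centre and the best point) follows the text directly.

```python
def bounds_from_points(X):
    """Column-wise lower and upper boundaries of the points in X (one row per point)."""
    columns = list(zip(*X))
    lb = [min(col) for col in columns]
    ub = [max(col) for col in columns]
    return lb, ub


def centre_and_radii(lb, ub):
    """Assumed definitions: centre = midpoint, radius = half-width of each side."""
    centre = [(lo + hi) / 2.0 for lo, hi in zip(lb, ub)]
    radii = [(hi - lo) / 2.0 for lo, hi in zip(lb, ub)]
    return centre, radii


def displaced_centre(last_centre, x_best):
    """Displacement from the text: mean of the last centre and the best point."""
    return [(c + b) / 2.0 for c, b in zip(last_centre, x_best)]


# Three hypothetical searching points in two dimensions.
X = [[0.0, 4.0], [2.0, 8.0], [1.0, 6.0]]
lb, ub = bounds_from_points(X)
centre, radii = centre_and_radii(lb, ub)
new_centre = displaced_centre(centre, [2.0, 8.0])
```

Averaging with the previous centre moves the hypercube only halfway towards the best point, which is the conservative behaviour the text describes.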
In summary, we can unify the evaluation and learning processes as follows. When the new hypercube is initialized, the function is evaluated at new points, randomly (with uniform distribution) chosen from inside of the hypercube. The new minimum is determined and compared with the last minimum. If the new minimum is worse (greater) than the previous one, then a new iteration will be started. If the same value is repeated several consecutive times then the algorithm ends, and the best minimum is considered as the global minimum.
After the initialization and evaluation processes described above, the displacement-shrink process and the searching space process are performed. The whole process is repeated until the specified termination conditions are satisfied.

Displacement and Shrink Process.
The center of the next hypercube is simply the average of the current best point and the previous one, that is, $(x_{\text{last centre}} + x_{\text{best}})/2$. The average of both values is taken as a conservative measure to avoid excessive fluctuations in the search and to prevent moving suddenly to a neighboring zone where a lower value was found, but which perhaps is just a local minimum. The radii of the new hypercube are determined as $R_{\text{new}} = R_{\text{old}} \cdot S$, where $R$ denotes the radii and $S$ is a factor of convergence which is defined in the next section (see (10)).
In addition to moving, the hypercube has to contract in order to refine the search and to converge to a unique minimum, which is assumed to be the global one. This contraction is controlled by the movement of the average of the best values. For large displacements, there is no contraction, as we interpret that the global minimum is still very uncertain. For small or null displacements, the hypercube shrinks, as we interpret this to mean that we are closer to the global minimum: the contraction is greater for smaller movements. This drives the fast convergence of the method, while it prevents getting stuck at undesired (local) minima.
The flowchart of the displacement-shrink process of the hypercube optimization algorithm is illustrated in Figure 3. At first, the minimum value $F_{\text{best}}$ is compared with the new value $F_{\text{mean}}$ corresponding to the point $x_{\text{mean}} = (x_{\text{last centre}} + x_{\text{best}})/2$ determined as pointed out in the previous section. If the $F_{\text{mean}}$ value is less than the $F_{\text{best}}$ value, then, in the given iteration, the displacement (or movement) of $x$ is computed and normalized twice: first each element of $x$ is divided by the corresponding initial range (and thus the displacement is transformed into a unity-sided hypercube), and then that quantity is normalized again, dividing it by the diagonal of the hypercube, $\sqrt{m}$, where $m$ is the dimension of the problem. These operations are as follows:
(1) normalized $x$ (previous $x$ for the minimum);
(2) normalized $x_{\min}$ (current $x$ for the minimum);
(3) normalized distance (should be bounded by 0 and $\sqrt{m}$);
(4) renormalized distance (should be bounded by 0 and 0.1).

Figure 3: Flowchart of the displacement-shrink process. Here A is the initialization and evaluation process and C is the searching space process.

As a result of these operations, the points $x$ are shrunk (moved closer) towards the centre point. These points are used to evaluate the test functions again. In the next blocks, the hypercube continues moving and shrinking until one of the following conditions is met.
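The double normalization of the displacement can be sketched as follows. This is a hedged reading of the description, not the paper's exact formulas: it assumes the distance is the Euclidean length of the range-scaled difference between the previous and current best points, and the name `renormalized_distance` is hypothetical. Note that for points inside the initial bounds this sketch only guarantees a result between 0 and 1; the tighter 0 to 0.1 bound quoted in the text is not enforced here.

```python
import math


def renormalized_distance(x_prev, x_curr, initial_range):
    """Displacement between two best points, normalized twice.

    Step 1: divide each coordinate difference by the initial range of that
            coordinate, which maps the displacement into a unit-sided hypercube.
    Step 2: divide the Euclidean length by sqrt(m), the diagonal of that
            unit hypercube, where m is the number of coordinates.
    """
    m = len(x_prev)
    scaled = [(a - b) / r for a, b, r in zip(x_prev, x_curr, initial_range)]
    d_n = math.sqrt(sum(s * s for s in scaled))  # lies between 0 and sqrt(m)
    return d_n / math.sqrt(m)


# Hypothetical example: two best points in a box of side 10 per coordinate.
no_move = renormalized_distance([1.0, 2.0], [1.0, 2.0], [10.0, 10.0])
full_diagonal = renormalized_distance([-5.0, -5.0], [5.0, 5.0], [10.0, 10.0])
```

Dividing by the initial ranges makes the measure independent of the units of each coordinate, and dividing by the diagonal makes it comparable across problems of different dimension.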
(i) The change in consecutive best values is smaller than a preset value (tol$_f$), for a preset consecutive number of times. This is interpreted as convergence in $f$ space.
(ii) The same or worse value is found consecutively a preset number of times. This is interpreted as nonconvergence in $f$ space.
(iii) The change in the best $x$ value (renormalized distance) is smaller than a preset value (tol$_x$), for a preset consecutive number of times. This is interpreted as convergence in $\mathbb{R}^m$ space.
(iv) The maximum number of iterations is reached: of course, in this case convergence is not guaranteed, as possibly lower values could be found with more iterations.
The whole process is repeated until one of these termination conditions is satisfied. Each condition is tested for thirty consecutive times. If these conditions are not satisfied, then the searching space process will be initialized.
Note that the movement of $x$ is not performed if the $F_{\text{mean}}$ value is larger than the $F_{\text{best}}$ value. In such a case, the searching space process will be initialized.
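The four stopping rules listed above can be tracked with a few counters. The class below is a minimal sketch under stated assumptions: the class name `StopTracker`, the rule labels it returns, and the default tolerances of 1e-12 are hypothetical, while the patience of thirty consecutive checks and the limit of 5000 iterations are taken from the text.

```python
class StopTracker:
    """Counts consecutive events for stopping rules (i) to (iv)."""

    def __init__(self, tol_f=1e-12, tol_x=1e-12, patience=30, max_iter=5000):
        self.tol_f, self.tol_x = tol_f, tol_x
        self.patience, self.max_iter = patience, max_iter
        self.small_f = self.no_gain = self.small_x = self.iteration = 0

    def update(self, f_prev, f_new, distance):
        """Return the label of the first satisfied rule, or None to continue."""
        self.iteration += 1
        # (i) change in consecutive best values below tol_f
        self.small_f = self.small_f + 1 if abs(f_new - f_prev) < self.tol_f else 0
        # (ii) same or worse value found again
        self.no_gain = self.no_gain + 1 if f_new >= f_prev else 0
        # (iii) renormalized distance below tol_x
        self.small_x = self.small_x + 1 if distance < self.tol_x else 0
        if self.small_f >= self.patience:
            return "f-converged"
        if self.no_gain >= self.patience:
            return "no-improvement"
        if self.small_x >= self.patience:
            return "x-converged"
        if self.iteration >= self.max_iter:   # (iv) iteration budget exhausted
            return "max-iterations"
        return None


tracker = StopTracker(patience=3)
results = [tracker.update(1.0, 1.0, 1.0) for _ in range(3)]
```

Each counter resets as soon as its condition fails once, so a rule fires only after an unbroken run of `patience` checks.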

Searching Space Process.
The searching space process initializes a new center and size (radii) in order to create a new hypercube. The objective function is evaluated at new points which are chosen randomly, with uniform distribution, from the hypercube. The searching space process controls the movements of $x$ according to the defined interval, in particular for movements smaller than 0.1. The value of the movement is determined by the renormalized distance. The flowchart of the searching space process of the HO algorithm is illustrated in Figure 4.
If the movement of $x$ satisfies this condition, then a factor of convergence $S$ is calculated and updated at each iteration according to (10), where the distance is computed by (9) and describes the normalized distance moved by the average of the last two best values of $x$. Next, the update of solutions is performed. The size (in all the dimensions) of the hypercube is reduced by multiplying by this factor. Thus, the hypercube reduces or maintains its size for nontrivial movements and shrinks otherwise. The whole process is repeated until specific termination conditions are satisfied.
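To show how the three processes fit together, the following sketch runs a simplified loop of sampling, displacement, and shrinking. It is an illustration under several assumptions and not the authors' implementation: `shrink_factor` is a hypothetical stand-in for the factor of convergence (it is chosen only so that it equals 0.8 for no movement and approaches 1 for large movements, and should not be read as formula (10)), the distance is measured on the hypercube centre, the factor is applied at every iteration, and termination is a fixed iteration count.

```python
import math
import random


def shrink_factor(d, max_shrink=0.2, rate=3.0):
    """Hypothetical factor of convergence: strongest contraction when d is 0."""
    return 1.0 - max_shrink * math.exp(-rate * d)


def ho_sketch(f, lower, upper, n_points=50, iterations=200, seed=0):
    rng = random.Random(seed)
    m = len(lower)
    span = [hi - lo for lo, hi in zip(lower, upper)]        # initial ranges
    centre = [(lo + hi) / 2.0 for lo, hi in zip(lower, upper)]
    radii = [s / 2.0 for s in span]
    x_best, f_best = centre[:], f(centre)
    for _ in range(iterations):
        # Sample uniformly inside the current hypercube, clipped to the global bounds.
        pts = [[min(max(c + r * rng.uniform(-1.0, 1.0), lo), hi)
                for c, r, lo, hi in zip(centre, radii, lower, upper)]
               for _ in range(n_points)]
        vals = [f(p) for p in pts]
        i = min(range(n_points), key=vals.__getitem__)
        if vals[i] < f_best:
            x_best, f_best = pts[i], vals[i]
        # Displacement: the new centre is the mean of the last centre and the best point.
        new_centre = [(c + b) / 2.0 for c, b in zip(centre, x_best)]
        # Doubly normalized distance moved by the centre.
        d = math.sqrt(sum(((a - b) / s) ** 2
                          for a, b, s in zip(new_centre, centre, span)) / m)
        # Shrink: small movements contract the hypercube the most.
        radii = [r * shrink_factor(d) for r in radii]
        centre = new_centre
    return x_best, f_best


def shifted_sphere(x):
    # Illustrative objective with its minimum at (1, ..., 1), not at the centre.
    return sum((v - 1.0) ** 2 for v in x)


x_best, f_best = ho_sketch(shifted_sphere, [-5.12] * 5, [5.12] * 5)
```

The shifted objective is used so that the starting centre is not already the minimizer; with the plain sphere function the first evaluation at the centre would return the optimum immediately.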

Test Functions
The proposed hypercube optimization algorithm is tested on five continuous test functions which are widely used in the literature: the Ackley path function, Rastrigin function, Rosenbrock function, Griewank function, and Sphere function [19][20][21][22][23]. These test functions are well suited to the experimental evaluation of methods used in global optimization problems. The designed algorithm is implemented in MATLAB.

Figure 4: Flowchart of the searching space process. Here A is the initialization and evaluation process and B is the displacement-shrink process.

Rosenbrock Valley Function.
Rosenbrock's valley function is known as the second function of De Jong. This test function is continuous, scalable, naturally nonseparable, nonconvex, and unimodal.

Sphere Function.
The simplest benchmark function is the sphere model, which is also called De Jong's function 1. This test model is continuous, unimodal, and convex.
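For reference, the standard textbook forms of the Sphere and Rosenbrock functions can be written as below. The Python function names are chosen for this example; both functions have a global minimum value of 0, attained at the origin for the Sphere function and at the all-ones point for the Rosenbrock function.

```python
def sphere(x):
    """Sphere model (De Jong's function 1): sum of squared coordinates."""
    return sum(v * v for v in x)


def rosenbrock(x):
    """Rosenbrock's valley (De Jong's function 2), generalized to len(x) dimensions."""
    return sum(
        100.0 * (x[i + 1] - x[i] ** 2) ** 2 + (1.0 - x[i]) ** 2
        for i in range(len(x) - 1)
    )


sphere_at_origin = sphere([0.0] * 1000)
rosenbrock_at_ones = rosenbrock([1.0] * 1000)
```

The coupling between neighbouring coordinates `x[i]` and `x[i + 1]` is what makes the Rosenbrock function nonseparable, in contrast to the Sphere function.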

Simulation Studies
The performance of the hypercube optimization algorithm is tested on the five benchmark functions given above. The five benchmark functions are evaluated for problem dimensions of 1000, 5000, or even 10000. At first the dimension is set to 1000. The population size is set to 100, 1000, or even 10000. We have summarized the best average fitness (e.g., the lowest objective function value) and the average number of test function evaluations over 30 successful runs. For each evaluation, the learning of the algorithm is continued for 5000 iterations. The hypercube optimization algorithm reached the global minimum with good convergence for these test functions.
No optimization algorithm guarantees convergence for every function, but it is good practice to test the HO algorithm on several benchmark functions and tune its parameters. Therefore, we have tested the hypercube optimization algorithm on a set of benchmark functions, and the algorithm has yielded improved results, sometimes reaching a better solution faster than well-established algorithms. The test function results are detailed below.
In the next step, the five test functions are evaluated for problem dimensions of 5000 or even 10000. The population size is set to 100. The convergence graphs have also been obtained and averaged over 30 successful runs. The results for each test function are given as follows.

Ackley Path Function.
The Ackley path function is an extensively used multimodal test function. Figure 5 illustrates the convergence graph of the HO algorithm for 5000 dimensions. The population size of the HO algorithm is almost insensitive to the dimension of the problems. The minimum of the Ackley test function was obtained as 2.76e-07. Figure 6 depicts the convergence graph of the HO algorithm for the Ackley test function having 10000 dimensions. The minimum value of the function was obtained as 1.16e-06.
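For readers who want to reproduce such an experiment, the commonly used form of the Ackley function is sketched below. The parameter values a = 20, b = 0.2, and c = 2π are the conventional defaults and are assumed here; the exact settings used in the simulations are not stated in this section. The global minimum value is 0 at the origin.

```python
import math


def ackley(x, a=20.0, b=0.2, c=2.0 * math.pi):
    """Ackley function with the conventional default parameters."""
    n = len(x)
    mean_sq = sum(v * v for v in x) / n
    mean_cos = sum(math.cos(c * v) for v in x) / n
    return -a * math.exp(-b * math.sqrt(mean_sq)) - math.exp(mean_cos) + a + math.e


ackley_at_origin = ackley([0.0] * 5000)
ackley_off_origin = ackley([1.0] * 5000)
```

Because both sums are averaged over the dimension, the function value at a given per-coordinate offset does not grow with the number of dimensions.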

Rastrigin Function.
The Rastrigin function is a typical nonlinear multimodal function. This test function is a fairly difficult problem for evolutionary algorithms due to the high number of dimensions and the large number of local minima. Figure 7 depicts the convergence graph of the HO algorithm for the Rastrigin test function having 5000 dimensions. The minimum was obtained as 7.13e-10. The HO algorithm can find near-optimal solutions with good convergence in high dimensions for this test function.
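The standard form of the Rastrigin function is given below as a reference sketch; the amplitude of 10 is the conventional value and is assumed here. Its global minimum value is 0 at the origin, and the cosine term creates a regular grid of local minima near the integer points.

```python
import math


def rastrigin(x, a=10.0):
    """Rastrigin function: a quadratic bowl modulated by a cosine term."""
    return a * len(x) + sum(v * v - a * math.cos(2.0 * math.pi * v) for v in x)


rastrigin_at_origin = rastrigin([0.0] * 5000)
rastrigin_at_ones = rastrigin([1.0, 1.0])   # a point close to a local minimum
```

The number of local minima grows exponentially with the dimension, which is why this function is regarded as difficult in the high-dimensional setting.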

Rosenbrock Function.
The Rosenbrock function is a typical naturally nonseparable, nonconvex, and unimodal function. This test function is also a fairly hard problem for evolutionary algorithms. Figure 9 depicts the convergence graph for the Rosenbrock test function having 5000 dimensions. The minimum value of the function was obtained as 1.15e-08. The HO algorithm can find optimal or near-optimal solutions with good convergence. This indicates that the HO algorithm is almost insensitive to the dimension of the problems.

Sphere Function.
The Sphere function is a typical unimodal test function. Figure 11 depicts the convergence graph of the HO algorithm for the Sphere test function having 5000 dimensions. The minimum value of the test function using the HO algorithm was obtained as 4.64e-20.
In Figure 12, the convergence graph of the hypercube optimization algorithm for the Sphere test function having 10000 dimensions is given. The minimum value was obtained as 2.40e-16 with good convergence. For this test function, it is fairly easy to find the global optimum with fast convergence.

Griewank Function.
The Griewank function is also a typical nonlinear multimodal function. This test function has been tested using many multiobjective evolutionary algorithms [23]. Figure 13 depicts the convergence graph for the Griewank test function having 5000 dimensions. The minimum value of the function was obtained as 3.34e-13.
In Figure 14, the minimum value of the test function using the HO algorithm was obtained as 1.11e-16 for 10000 dimensions. The HO algorithm can find optimal or near-optimal solutions with good convergence in high dimensions for this test function.
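The standard form of the Griewank function is sketched below for reference; the function name and the sample points are chosen for this example. Its global minimum value is 0 at the origin.

```python
import math


def griewank(x):
    """Griewank function: quadratic term minus a product of scaled cosines, plus 1."""
    quad = sum(v * v for v in x) / 4000.0
    prod = 1.0
    for i, v in enumerate(x, start=1):
        prod *= math.cos(v / math.sqrt(i))
    return quad - prod + 1.0


griewank_at_origin = griewank([0.0] * 10000)
griewank_at_pi = griewank([math.pi])
```

The product term couples all coordinates, so the function is nonseparable even though each factor depends on a single coordinate.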

Comparison
The hypercube optimization algorithm has in general yielded better results, sometimes reaching a better solution faster than well-established algorithms. The use of multiobjective evolutionary algorithms allows us to find globally optimal solutions and avoid the local optimum problem. The simulation results of the HO algorithm, obtained with test functions of different dimensions and averaged over 30 runs, are given in Table 2. Using the table we can see