Hierarchical Genetic Algorithm for B-Spline Surface Approximation of Smooth Explicit Data

1 Centro de Investigaciones en Óptica, A.C., Loma del Bosque 115, Col. Lomas del Campestre, 37150 León, GT, Mexico
2 Instituto Tecnológico Superior de Irapuato, Carretera Irapuato-Silao Km 12.5, 36821 Irapuato, GT, Mexico
3 División de Ingenierías Campus Irapuato-Salamanca, Universidad de Guanajuato, Carretera Salamanca-Valle de Santiago Km 3.5+1.8 Km, Com. Palo Blanco s/n, 36885 Salamanca, GT, Mexico


Introduction
Surface approximation is a recurrent problem in geometric modeling, data analysis, image processing, and many other engineering applications [1]. In engineering, approximation techniques are typically applied both to the reverse engineering problem and to the design of a surface that matches a set of desired characteristics. In reverse engineering, one is concerned with the automated generation of a CAD model from a set of points digitized from an existing 3D object.
In this regard, the surface approximation problem may be formulated in different ways, yielding results that depend on various choices. B-spline surfaces can be used to approximate an arbitrary set of data analytically; in this case, a system of simultaneous equations is obtained, which is linear with respect to the weights but nonlinear with respect to the knots. Therefore, the choice of the number and positions of the knots of a spline is fundamental, as is the method used to solve the system of equations. Both tasks are critical but also troublesome, leading to a hard continuous, multimodal, multivariate nonlinear optimization problem with many locally optimal solutions.
To tackle this problem, several methods have been proposed in the literature; Shepard's method [2], the finite element method [3,4], and the tensor product spline method [5-8] are among the most widely used and successful. Shepard's method, also known as the original inverse distance weighted interpolation method, addresses the issue through a continuous interpolation function built from a weighted average of the data. The finite element method is a numerical approach for solving differential equations: it assumes a piecewise continuous function for the solution and obtains the parameters of the functions in a manner that reduces the error in the solution. The tensor product spline is another method commonly used to approximate surfaces; it is a generalization of spline approximation that aims to obtain smooth functions from scattered points [9].
Other techniques based on intelligent search schemes have also been proposed. In [10], the authors reconstruct a surface from a given set of data points using a two-step algorithm based on particle swarm optimization. In contrast, in [11] a genetic algorithm is applied iteratively to fit a given set of data points in two steps. In both of these approaches, a least squares problem must be solved to compute the B-spline coefficients [12,13]. To obtain this solution, it is often convenient that the knot distribution follows the rule given by the Schoenberg-Whitney theorem [14].
On the other hand, in [8] the authors use a genetic algorithm to optimize the number and locations of the knots in order to construct a B-spline curve that fits a set of data points. A modification of [8] is presented in [15], where the authors use a real-coded genetic algorithm that can treat data having discontinuous points. Another example is presented in [16], where the authors use a firefly algorithm to compute the approximating explicit B-spline curve to a given set of noisy data points. Although these works address the case of curves, they can be applied to approximate a surface by repeated application of the method to one-dimensional data in the $x$ and $y$ directions separately, but with considerable time consumption.
Since we consider a spline-based approach, we remark that the main issue in surface approximation through splines is finding the best set of knots, where the term "best" implies an adequate choice of both the number and the location of the knots. In [5], the author provides a survey of the main algorithms used to carry out this task, which are based on regression spline methods and their respective optimizations.
Unlike the authors mentioned above, we tackle the B-spline surface approximation problem by using a hierarchical genetic algorithm (HGA). To be more specific, we consider a hierarchical structure that represents both the model structure (number and location of the knots) as a binary encoding and the model parameters (spline coefficients) as a real encoding. Thus, we search for the best B-spline based surface model using a novel fitness function. As a result, our method can simultaneously determine the number and position of the knots as well as the B-spline coefficients. In addition, our approach is able to find solutions with the fewest parameters within the B-spline basis functions.
In this work, we focus on the approximation of a given set of noisy data points, arranged in a rectangular domain, under the assumption that these data represent a smooth surface. These are the conditions under which the method can be applied; however, our method can be extended to approximate nonsmooth functions over scattered data points.
This paper is organised as follows: the fundamentals of HGA and the description of B-spline surfaces are presented in Section 2, followed by the description of our approach in Section 3. In Section 4 we present numerical results and a comparison with a popular method. Finally, Section 5 closes the paper with the main conclusions.

Background
In this section, the fundamentals of the hierarchical genetic algorithms and surface approximation by splines are explained in detail.
2.1. Hierarchical Genetic Algorithms. Genetic algorithms (GA) [17], originally developed by Holland, have been used to optimize a fitness function by mimicking the natural evolution of living organisms. The individuals of this evolution are computational representations of potential solutions to the problem to be solved. Each individual is represented as a computational encoding commonly called a chromosome. The entire set of individuals examined at a given time is called the population; this population is evaluated with a fitness function to determine which individuals represent better solutions to the problem. Fitness values are used by the selection operator to scale the likelihood that a chromosome will be used to produce a new chromosome. The production of new chromosomes from old ones is achieved through the application of the genetic operators, crossover and mutation. The iterated application of selection and the genetic operators is repeated until a desired termination criterion is reached.
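The evaluate-select-recombine-mutate cycle described above can be sketched as a minimal, generic GA loop. This is an illustrative skeleton, not the paper's implementation; the toy fitness, truncation-style selection, and one-point crossover below are placeholder choices:

```python
import random

def genetic_algorithm(population, fitness, select, crossover, mutate,
                      generations=50):
    """Generic GA loop: evaluate, select, recombine, mutate, with elitism."""
    for _ in range(generations):
        scored = sorted(population, key=fitness)   # minimisation: best first
        new_pop = [scored[0]]                      # elitism: keep the best
        while len(new_pop) < len(population):
            p1, p2 = select(scored), select(scored)
            c1, c2 = crossover(p1, p2)
            new_pop.extend([mutate(c1), mutate(c2)])
        population = new_pop[:len(scored)]
    return min(population, key=fitness)

# Toy usage (all names hypothetical): minimise the sum of squares of a vector.
random.seed(0)
pop = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(20)]
fit = lambda ind: sum(x * x for x in ind)
sel = lambda sc: random.choice(sc[:10])            # truncation-selection stand-in
def cx(a, b):
    k = random.randrange(1, 3)                     # one-point crossover
    return a[:k] + b[k:], b[:k] + a[k:]
mut = lambda ind: [x + random.gauss(0, 0.05) for x in ind]

best = genetic_algorithm(pop, fit, sel, cx, mut)
```

Thanks to elitism, the best fitness found never worsens from one generation to the next.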
On the other hand, an HGA has a hierarchical structure of chromosomes. From the biological and medical viewpoints, the genetic structure of a chromosome is formed by a number of gene variations arranged in a hierarchical manner. In light of this, Man et al. [18] proposed a hierarchical chromosome structure that emulates the formulation of a biological DNA structure, in which some genes dominate other genes and there are active and inactive genes. Such a phenomenon is a direct analogy to the topology of many engineering systems [19]; emulating it allows a precise hierarchical genetic structure to be formed for engineering purposes.
The computational chromosome in an HGA consists of two types of genes, called control and parametric genes. Typically, control genes are represented as a binary encoding, and parametric genes are coded as real numbers, although the latter can be coded as any type of data structure. The purpose of the control genes is to represent the parametric selection; this is particularly important for determining the structure or topology of the individuals. The parametric genes, in turn, are usually used to represent the values of the design variables. The main advantages of an HGA are that both the system structure and the parametric variables can be optimized simultaneously and that the standard genetic operators continue to be used. A more detailed discussion about HGA can be found in [18].
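The two-level gene structure can be illustrated with a small sketch, in which binary control genes gate real parametric genes (class and field names are hypothetical, not from the paper):

```python
import random

class HGAChromosome:
    """Hierarchical chromosome: binary control genes gate real parametric genes.
    Only the parameters whose control bit is 1 are active in the decoded model."""

    def __init__(self, n_params, lo=0.0, hi=1.0):
        self.control = [random.randint(0, 1) for _ in range(n_params)]
        self.params = [random.uniform(lo, hi) for _ in range(n_params)]

    def decode(self):
        """Return (active indices, active parameter values)."""
        idx = [i for i, bit in enumerate(self.control) if bit]
        return idx, [self.params[i] for i in idx]

random.seed(1)
c = HGAChromosome(8)
active_idx, active_vals = c.decode()
```

Crossover and mutation can then act on `control` and `params` independently with the standard binary and real-coded operators, which is the simultaneous structure-plus-parameter search the text describes.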

2.2. Surface Approximation by Tensor Product Splines. We can describe the problem of surface approximation as follows: given a set of noisy measurements $z_{i,j}$, $i = 1, \ldots, N_x$, $j = 1, \ldots, N_y$, in a rectangular domain, expressed in the following form:

$$z_{i,j} = f(x_i, y_j) + \epsilon_{i,j},$$

where $f$ is an unknown functional relationship that we wish to estimate, the term $\epsilon_{i,j}$ represents zero-mean random errors, and $z_{i,j}$ is a sample at $(x_i, y_j)$, the goal is to find the best estimate of the function $f$.
In this study, we assume that $f$ is a smooth surface that can be well approximated on the interval $[a, b] \times [c, d]$ by a B-spline surface. B-spline surfaces are constructed as a tensor product of univariate B-spline basis functions. The B-spline surface is modeled under the following considerations: let $\{u_1, \ldots, u_n\}$ be a set of $n$ points placed along the domain of the variable $x$ and let $\{v_1, \ldots, v_m\}$ be a set of $m$ points placed along the domain of the variable $y$, called the interior knots. The knot vectors are then defined as

$$\mathbf{u} = \{\underbrace{a, \ldots, a}_{k},\, u_1, \ldots, u_n,\, \underbrace{b, \ldots, b}_{k}\}, \qquad \mathbf{v} = \{\underbrace{c, \ldots, c}_{l},\, v_1, \ldots, v_m,\, \underbrace{d, \ldots, d}_{l}\}.$$

With these assumptions, the function $f$ can now be written as a tensor product:

$$f(x, y) = \sum_{i=1}^{n+k} \sum_{j=1}^{m+l} c_{i,j}\, N_{i,k}(x)\, M_{j,l}(y),$$

where the $c_{i,j}$ are the B-spline coefficients and $N_{i,k}(x)$, $M_{j,l}(y)$ are the B-spline basis functions of order $k$ and $l$, respectively, defined over the knot vectors $\mathbf{u}$ and $\mathbf{v}$. The B-spline basis functions are given by the recurrence relations

$$N_{i,1}(x) = \begin{cases} 1, & u_i \le x < u_{i+1}, \\ 0, & \text{otherwise}, \end{cases}$$

$$N_{i,k}(x) = \frac{x - u_i}{u_{i+k-1} - u_i}\, N_{i,k-1}(x) + \frac{u_{i+k} - x}{u_{i+k} - u_{i+1}}\, N_{i+1,k-1}(x),$$

and analogously for $M_{j,l}(y)$ over $\mathbf{v}$. If $k$ and $l$ are specified beforehand, $f$ can be completely specified by $\theta = \{\mathbf{u}, \mathbf{v}, \mathbf{C}\}$, where $\mathbf{u}$ and $\mathbf{v}$ are the knot vectors and $\mathbf{C}$ is the coefficient matrix. The problem is then to find the number and location of the interior knots $\{u_1, \ldots, u_n, v_1, \ldots, v_m\}$ and to estimate the coefficients $c_{i,j}$. This problem cannot be solved with simple standard methods, because parameter estimation in B-spline based methods is a high-dimensional nonlinear optimization problem. A more detailed discussion about B-splines can be found in [20,21].
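The Cox-de Boor recurrence and the tensor-product evaluation above can be translated almost line by line into code. This is a direct, unoptimized sketch (function names are ours); for clamped knot vectors the basis functions sum to one, so setting every coefficient to 1 reproduces the constant surface $f = 1$:

```python
def bspline_basis(i, k, t, knots):
    """Cox-de Boor recurrence: i-th B-spline basis function of order k at t."""
    if k == 1:
        return 1.0 if knots[i] <= t < knots[i + 1] else 0.0
    left = right = 0.0
    d1 = knots[i + k - 1] - knots[i]
    if d1 > 0:
        left = (t - knots[i]) / d1 * bspline_basis(i, k - 1, t, knots)
    d2 = knots[i + k] - knots[i + 1]
    if d2 > 0:
        right = (knots[i + k] - t) / d2 * bspline_basis(i + 1, k - 1, t, knots)
    return left + right

def surface(x, y, coeffs, ku, kv, U, V):
    """Tensor-product surface f(x, y) = sum_ij c_ij N_{i,ku}(x) M_{j,kv}(y)."""
    return sum(coeffs[i][j] * bspline_basis(i, ku, x, U)
                            * bspline_basis(j, kv, y, V)
               for i in range(len(coeffs)) for j in range(len(coeffs[0])))

# Clamped order-2 (linear) knot vectors with three interior knots on [0, 1];
# all coefficients equal to 1 reproduce the constant surface f = 1.
U = V = [0.0, 0.0, 0.25, 0.5, 0.75, 1.0, 1.0]
C = [[1.0] * 5 for _ in range(5)]
value = surface(0.3, 0.4, C, 2, 2, U, V)
```

The partition-of-unity check (`value` equal to 1 at interior points) is a convenient sanity test for any B-spline implementation.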

B-Spline Surface Approximation Using HGA
In this paper, we use an HGA to determine simultaneously the number and positions of the knots (model structure) and the B-spline coefficients (model parameters) by minimizing a fitness function. The main elements of this approach are (1) the chromosome encoding of potential solutions, (2) the fitness function used to evaluate the chromosomes, and (3) the operators used to evolve the individuals.
3.1. Chromosome Encoding. We use a fixed-length binary string to represent the number and locations of the interior knots $\{u_1, \ldots, u_n, v_1, \ldots, v_m\}$ and real numbers to represent the B-spline coefficients $c_{i,j}$. Furthermore, we assume that the $u_i$ are a subset of the design points $x_i$ and the $v_j$ are a subset of the design points $y_j$. Thus, the maximum number $n_{\max}$ of interior knots equals the number of points $x_i$ in the interval $(a, b)$, with $n + k$ coefficients in that direction; likewise, the maximum number $m_{\max}$ of interior knots equals the number of points $y_j$ in the interval $(c, d)$, with $m + l$ coefficients.
Under these assumptions, the chromosome of an individual consists of a string of control bits $b_i$ together with a string of real values (the B-spline coefficients). Each control bit simultaneously enables or disables one of the interior knots and one of the coefficients; we establish a one-to-one correspondence between the interior knots and the coefficients so that they are activated together. The general structure of a chromosome is shown graphically in Figure 1. This representation scheme does not allow duplicate knots, because our interest is in smooth functions; however, it can be extended to handle discontinuous functions by introducing an additional type of gene.
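Decoding a binary control string into a clamped knot vector can be sketched as follows (a minimal illustration with hypothetical values; the paper's encoding also gates the matching coefficients):

```python
def decode_knots(control_bits, design_points, order, a, b):
    """Build a clamped knot vector: the interior knots are the design points
    whose control bit is 1, with `order` repeated knots at each boundary."""
    interior = [p for bit, p in zip(control_bits, design_points) if bit]
    return [a] * order + interior + [b] * order

# Hypothetical example: 7 candidate interior knots on (0, 1), cubic (order 4).
bits = [1, 0, 1, 0, 0, 1, 0]
pts = [0.125, 0.25, 0.375, 0.5, 0.625, 0.75, 0.875]
u = decode_knots(bits, pts, 4, 0.0, 1.0)
```

Because the candidate knots are a fixed, sorted subset of the design points, the decoded knot vector is always valid and contains no duplicate interior knots, exactly as the representation scheme requires.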
Finally, the numbers of interior knots $n$ and $m$ are limited by the number of points in each surface dimension. However, in [22,23] the authors recommend placing a candidate knot after every three design points; that is, the number of interior knots is the number of design points divided by three. This is enough to ensure that a knot is at, or near, the positions required to capture the topology of the surface.

3.2. Fitness Function. To evaluate the fitness of each individual $I$, the fitness function $F$ is formulated as a sum of three terms:

$$F(I) = \mathrm{RSS} + \mathrm{SSD} + \mathrm{PKS},$$

where each term is described as follows.
(a) The first term is the residual sum of squares (RSS), a measure of the deviation between the observed data set and the estimated function $\hat{f}$:

$$\mathrm{RSS} = \sum_{i=1}^{N_x} \sum_{j=1}^{N_y} \left( z_{i,j} - \hat{f}(x_i, y_j) \right)^2.$$

(b) The second term is the sum of squared differences (SSD), a discrete approximation of the gradient. This term penalizes large gradients in order to generate smooth solutions:

$$\mathrm{SSD} = \sum_{i,j} \left[ \left( \hat{f}(x_{i+1}, y_j) - \hat{f}(x_i, y_j) \right)^2 + \left( \hat{f}(x_i, y_{j+1}) - \hat{f}(x_i, y_j) \right)^2 \right].$$

(c) The last term is a penalty function on the knot structure (PKS). It favors solutions whose knots are uniformly distributed in each dimension; in other words, it penalizes solutions with very closely spaced knots, which cause overfitting of the function. Therefore, individuals with the fewest knots and the best distribution are favoured.
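The three-term fitness can be sketched as follows. The RSS and SSD terms follow their standard definitions; the PKS term below (variance of interior-knot gaps) is only an assumed, illustrative form, since it stands in for the paper's equation (9), and the weights `lam1`, `lam2` are hypothetical:

```python
def fitness(z, f_hat, knots_u, knots_v, lam1=1.0, lam2=1.0):
    """Illustrative F = RSS + SSD + PKS; z and f_hat are 2-D grids of
    observed and estimated values. PKS form and weights are assumptions."""
    rows, cols = len(z), len(z[0])
    # (a) residual sum of squares
    rss = sum((z[i][j] - f_hat[i][j]) ** 2
              for i in range(rows) for j in range(cols))
    # (b) squared differences of neighbours: discrete gradient penalty
    ssd = sum((f_hat[i + 1][j] - f_hat[i][j]) ** 2
              for i in range(rows - 1) for j in range(cols))
    ssd += sum((f_hat[i][j + 1] - f_hat[i][j]) ** 2
               for i in range(rows) for j in range(cols - 1))
    # (c) knot-structure penalty: variance of interior-knot gaps,
    #     so very closely spaced knots are penalised (assumed form)
    def pks(knots):
        gaps = [b - a for a, b in zip(knots, knots[1:])]
        if not gaps:
            return 0.0
        mean = sum(gaps) / len(gaps)
        return sum((g - mean) ** 2 for g in gaps)
    return rss + lam1 * ssd + lam2 * (pks(knots_u) + pks(knots_v))

# A perfect, constant fit with uniformly spaced knots scores exactly zero.
F = fitness([[1.0, 1.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]],
            [0.25, 0.5, 0.75], [0.25, 0.5, 0.75])
```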

3.3. Selection Operator. The roulette wheel method is used as the selection operator. In this method, each individual is assigned one of the slices of a roulette wheel. This selection strategy favors the best fitted individuals but also gives the less fitted individuals a chance to survive. To prevent premature convergence, the sigma scaling method [24] is used. This method tries to keep the selection pressure relatively constant over the whole evolution process; the scaled fitness is calculated according to

$$f_{\mathrm{new}} = f_{\mathrm{act}} - \left( \bar{f} - c\,\sigma \right),$$

where $f_{\mathrm{new}}$ is the new scaled fitness value, $f_{\mathrm{act}}$ is the current fitness value, $\bar{f}$ is the average fitness, $\sigma$ is the standard deviation of the population, and $c$ is a constant that controls the selection pressure. In addition, elitism is used to keep the elite individuals in the next population, preventing the loss of the best solution found.
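Sigma scaling followed by a roulette-wheel spin can be sketched as follows (the scaling form matches the variables listed above but is our reconstruction, clipped at zero; when the population has converged, i.e. sigma is zero, every individual gets equal weight):

```python
import random

def sigma_scale(fits, c=2.0):
    """Sigma scaling (assumed form): f_new = f - (mean - c*sigma), clipped
    at 0. Keeps selection pressure roughly constant as fitness converges."""
    n = len(fits)
    mean = sum(fits) / n
    sigma = (sum((f - mean) ** 2 for f in fits) / n) ** 0.5
    if sigma == 0:
        return [1.0] * n
    return [max(f - (mean - c * sigma), 0.0) for f in fits]

def roulette_select(population, scaled):
    """Spin the wheel: selection probability proportional to scaled fitness."""
    total = sum(scaled)
    r = random.uniform(0, total)
    acc = 0.0
    for ind, s in zip(population, scaled):
        acc += s
        if acc >= r:
            return ind
    return population[-1]

random.seed(2)
pop = ["a", "b", "c", "d"]
scaled = sigma_scale([4.0, 3.0, 2.0, 1.0])
picked = roulette_select(pop, scaled)
```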

3.4. Crossover Operators. The uniform crossover operator is used for the binary-valued chromosome, and the simulated binary crossover (SBX) operator is used for the real-valued chromosome [25]. Both crossover operators are applied with the same crossover probability. In the uniform crossover method, two parents are chosen to be recombined into a new individual; each bit of the new individual is selected from one of the parents with a fixed probability. In the SBX method, two new individuals $x_1$ and $x_2$ are generated from the parents $p_1$ and $p_2$ using a probability distribution.
The SBX procedure is the following: first, a random number $u$ between 0 and 1 is generated. Then the spread factor $\beta$ is calculated as

$$\beta = \begin{cases} (2u)^{1/(\eta+1)}, & u \le 0.5, \\[4pt] \left( \dfrac{1}{2(1-u)} \right)^{1/(\eta+1)}, & \text{otherwise}, \end{cases} \tag{11}$$

where $\eta$ is a nonnegative real number that controls the distance between the parents and the new individuals generated. After obtaining $\beta$, the new individuals are calculated according to

$$x_1 = \tfrac{1}{2}\left[ (1+\beta)\,p_1 + (1-\beta)\,p_2 \right], \qquad x_2 = \tfrac{1}{2}\left[ (1-\beta)\,p_1 + (1+\beta)\,p_2 \right].$$
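The standard SBX operator for a single pair of real genes can be implemented directly from equation (11):

```python
import random

def sbx(p1, p2, eta=2.0):
    """Simulated binary crossover for one real gene pair (eq. (11)):
    draw u in [0, 1), compute the spread factor beta, and blend the parents."""
    u = random.random()
    if u <= 0.5:
        beta = (2.0 * u) ** (1.0 / (eta + 1.0))
    else:
        beta = (1.0 / (2.0 * (1.0 - u))) ** (1.0 / (eta + 1.0))
    x1 = 0.5 * ((1.0 + beta) * p1 + (1.0 - beta) * p2)
    x2 = 0.5 * ((1.0 - beta) * p1 + (1.0 + beta) * p2)
    return x1, x2

random.seed(3)
a, b = sbx(1.0, 3.0)
```

A useful property to test: whatever beta is drawn, SBX preserves the parents' mean, i.e. $(x_1 + x_2)/2 = (p_1 + p_2)/2$; larger eta keeps the children closer to the parents.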

3.5. Mutation Operators. For the binary-valued chromosome, the bit mutation method is used: each bit is inverted or not according to a mutation probability. For the real-valued chromosome, each numeric value $c$ is changed with the same mutation probability according to

$$c_{\mathrm{new}} = c + \gamma\,(2\,\mathrm{rand} - 1),$$

where $\gamma$ is the maximum increment or decrement of the real value and $\mathrm{rand}$ is a function that generates a random value between 0 and 1. This operator acts on each gene of the chromosome independently, generating a binomial distribution of mutated genes; thus, the mutation probability is applied per locus per replication.
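Both mutation operators, applied per locus, can be sketched together (the perturbation form matches the description above, with an assumed value for gamma):

```python
import random

def mutate(control, params, p_mut=0.008, gamma=0.1):
    """Per-locus mutation: flip each control bit with probability p_mut;
    perturb each real gene by up to +/- gamma with the same probability.
    gamma (the maximum increment/decrement) is an assumed value here."""
    new_control = [1 - b if random.random() < p_mut else b for b in control]
    new_params = [c + gamma * (2.0 * random.random() - 1.0)
                  if random.random() < p_mut else c
                  for c in params]
    return new_control, new_params

random.seed(4)
ctrl, prm = mutate([0, 1, 1, 0], [0.5, 0.2, 0.9, 0.1])
```

Because each locus is tested independently, the number of mutated genes per chromosome follows a binomial distribution, as the text notes.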

3.6. HGA Parameter Tuning. We carried out experiments to tune the HGA parameters by running our algorithm over 100 noisy data sets generated from a test function, as stated in Section 4. To analyze the convergence process, the fitness of the best individual is recorded over 1500 generations for each noisy data set tested.
First, we analyzed the variation of the mutation probability of the HGA while all other parameters were fixed. Figure 2 shows the results of varying the mutation probability from 0.005 to 0.050. As the figure shows, the HGA performs best with a mutation probability of 0.008 (green line), and the performance of the HGA-based method decreases as the mutation probability increases.
Next, we varied the population size and analyzed its effect on the convergence of the HGA. Figure 3 shows the performance of the HGA for several population sizes. The algorithm's performance is quite similar for population sizes of 90, 120, and 200 (red, cyan, and magenta lines, resp.). The average execution time for each population size is shown in Table 1; therefore, the best choice is a population size of 90 individuals.
Finally, Figure 4 shows the performance of the HGA over 3000 generations, with the 50 best runs plotted. As can be seen, the convergence of the HGA is achieved by the 1500th generation.

Numerical Results
We carried out numerical simulations to evaluate the performance of our approach. For these tests, we defined an experimental set of four bivariate functions, whose equations are given in Table 2 and which are shown graphically in Figure 5. These test functions were taken from previous works [5,26-28] as a reference to validate our method. In addition, a comparison with the well-known locally weighted scatterplot smoothing (LOWESS) method [29] is presented. The experimental setup for each test function is as follows: the function is evaluated at 1024 points on a 32 × 32 square grid over the interval [0, 1] × [0, 1]. The data are then translated to make the range nonnegative in order to facilitate comparisons. Zero-mean normal noise with known $\sigma$ is added to the data set; an example of the noisy data sets generated for the four test functions is shown in Figure 7. Furthermore, for each experimental setup, a collection of 100 noisy data sets is generated at three different signal-to-noise ratios (SNR), 2, 3, and 4, where SNR is defined as $\mathrm{SD}(f)/\sigma$. Thus, we tested our method over a total of 1200 data sets for the four test functions. Finally, the proposed approach and the LOWESS method were applied to estimate the test functions.
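The data-generation step above (grid sampling plus zero-mean normal noise scaled to a target SNR = SD(f)/sigma) can be sketched as follows; the function name and the example test surface are hypothetical, not one of the paper's four benchmarks:

```python
import math
import random

def make_noisy_grid(f, n=32, snr=3.0):
    """Sample f on an n x n grid over [0, 1]^2 and add zero-mean normal
    noise with sigma = SD(f) / SNR, matching the SNR definition above."""
    xs = [i / (n - 1) for i in range(n)]
    z = [[f(x, y) for y in xs] for x in xs]
    flat = [v for row in z for v in row]
    mean = sum(flat) / len(flat)
    sd = math.sqrt(sum((v - mean) ** 2 for v in flat) / len(flat))
    sigma = sd / snr
    noisy = [[v + random.gauss(0.0, sigma) for v in row] for row in z]
    return z, noisy, sigma

random.seed(5)
clean, noisy, sigma = make_noisy_grid(
    lambda x, y: math.sin(math.pi * x) * math.sin(math.pi * y))
```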
In the numerical tests, our approach was configured as follows: we used cubic B-spline functions, that is, $k = 4$ and $l = 4$, and the interior knots were defined as a subset of the design points ($n_{\max} = 32$, $m_{\max} = 32$). The population was randomly initialized at the beginning: each control gene $b_i$ is randomly selected from $\{0, 1\}$, and each parametric gene is a random real number drawn from the range $[\min(z), \max(z)]$ of the measurements $z_{i,j}$. The HGA parameters were tuned experimentally and are presented in Table 3. The population was evolved for 3000 generations in all cases.
For comparison purposes, we performed the same numerical experiments with the LOWESS method, using the Curve Fitting Toolbox provided by MATLAB with its default parameters. To compare the results, we used the mean square error (MSE), given by

$$\mathrm{MSE} = \frac{1}{N_x N_y} \sum_{i=1}^{N_x} \sum_{j=1}^{N_y} \left( \hat{f}(x_i, y_j) - f(x_i, y_j) \right)^2,$$

where $\hat{f}$ is the estimated function given by each method and $f$ is the real function. The MSE value is computed and recorded for each of the 100 test sets for each of the four test functions. The box plots of the MSE values are shown in Figure 6; a box plot gives a visual representation of the distribution of MSE values and facilitates comparison between them.
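The MSE computation above is straightforward to implement for two grids of values:

```python
def mse(f_hat, f_true):
    """Mean square error between the estimated and true surfaces on a grid."""
    n = sum(len(row) for row in f_true)
    return sum((fh - ft) ** 2
               for row_h, row_t in zip(f_hat, f_true)
               for fh, ft in zip(row_h, row_t)) / n

# A perfect reconstruction scores 0; a constant offset of 1 scores 1.
err_perfect = mse([[1.0, 2.0], [3.0, 4.0]], [[1.0, 2.0], [3.0, 4.0]])
err_offset = mse([[2.0, 2.0]], [[1.0, 1.0]])
```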
The box plot partitions a data distribution into quartiles, which divide the data into four equal parts, each containing 25% of the data. The central box indicates the positions of the upper (third) and lower (first) quartiles; the interior of this box spans the interquartile range (IQR), the distance between the upper and lower quartiles, which contains 50% of the data. The horizontal line inside the box is the median of the distribution. The vertical lines, sometimes referred to as whiskers, extend from the box out to the smallest and largest observations and show the full spread of the distribution; observations that fall more than 1.5 × IQR above the upper quartile or below the lower quartile are considered suspected outliers and are plotted as individual points marked with a cross.
In Figure 6, there is a panel for each combination of test function and SNR, in which the box plots of the MSE values from the HGA and LOWESS are arranged in pairs; the noise level added to the signal decreases from left to right. As the figure shows, in most cases the MSE values of the HGA are considerably lower than those of the LOWESS algorithm. In some cases the upper quartile of the MSE values for the HGA lies below the first quartile of those from LOWESS, which implies that at least 75% of the HGA's MSE values are lower than 75% of the LOWESS values; in other words, fewer than 25% of the MSE values for the LOWESS algorithm are comparable with the best 75% of those from the HGA. The largest difference is seen in Figure 6 for test function 4 with SNR = 4, where the upper quartile of the HGA is far below the lower quartile of LOWESS. Moreover, the median of the MSE values for the HGA is the lowest in virtually all cases.
On the other hand, the spread of the middle half (the box) of the MSE values appears less variable and more symmetric around the median in some cases for the LOWESS method than for the HGA. In addition, although both algorithms produce unusual MSE values (outliers), the LOWESS method has fewer outliers than the HGA. We consider that this behavior represents cases where the HGA converges towards local optima, or even arbitrary points, rather than the global optimum; nevertheless, such behavior is expected in any heuristic technique.
Additionally, we computed the mean and the standard error of the mean of the MSE values for both methods, summarized in Table 4. The table shows that the HGA achieved the lowest mean MSE in all experiments. The most statistically significant difference, marked in bold, appears for test function 4 with SNR = 4, where the mean MSE of the HGA is far from that of the LOWESS method. As for the standard errors, although the LOWESS method has lower values than the HGA, the differences between the means are significant, so the results of the HGA are still better.
To evaluate the performance of our approach visually, an example of the obtained results for each function is shown in Figure 7. For practical reasons, we present only the results from the HGA, together with an example of the simulated noisy data. From this figure, we can qualitatively evaluate the performance of our algorithm; Figure 7 suggests that the HGA has the ability to adapt the surfaces well.
Finally, the success of our approach can be attributed to the simultaneous global search performed over all parameters of the B-spline model, which shows the potential of the HGA to manage loss of information and high noise levels. On the other hand, its main drawback is the computational time required compared with the LOWESS method: the average computation time over all functions tested was 4 seconds for the LOWESS method versus 201 seconds for the HGA. However, this drawback can be mitigated with a parallel implementation of the HGA. Our algorithm was implemented in C++; both algorithms were executed on a PC with an AMD Phenom II processor running at 1.9 GHz with 4 MB of RAM.

Conclusions
In this paper, we proposed an efficient hierarchical genetic algorithm to tackle the B-spline surface approximation problem. The method introduces a novel hierarchical gene structure for the chromosomal representation, allowing us to find simultaneously the best model with the fewest knots, the optimal knot locations, and the coefficients of the B-spline surface. It is important to highlight that the method does not require subjective parameters, such as a smoothing factor or knot locations, to produce the solution.
To test our method, we performed several tests on benchmark functions, as well as a comparison with the LOWESS method provided with the MATLAB Curve Fitting Toolbox. The numerical results show that our method responds successfully to the problem of surface approximation. Qualitatively, the obtained results are visually comparable to the original surfaces, and the comparative tests demonstrated better performance of our method than of the LOWESS method over all the proposed tests. Given the performance characteristics of the proposed approach, our future work will be to apply this method to an experimental data set. We are also interested in extending our approach to experiment with variable-length chromosomes and different basis functions.

Figure 1: General structure of a chromosome.

Figure 2: Variation of the mutation probability. Each line shows the average over 100 test sets.

Figure 3: Variation of the population size. Each line shows the average over 100 test sets.

Figure 4: Convergence of the HGA over 3000 generations (50 best runs).

Figure 5: Perspective plots of the test functions.

Figure 6: Box plots of the MSE values for both methods. In each panel, the box plots correspond to the algorithms tested. The columns of panels correspond, from left to right, to SNR ratios 2, 3, and 4; the rows correspond, from top to bottom, to test functions 1 to 4.

Figure 7: Results from the HGA. The columns correspond, from left to right, to the noisy surface and cloud data inputs and the surfaces approximated by the HGA; the rows correspond, from top to bottom, to test functions 1 to 4.

Table 1: Average computation time of the HGA over 100 test sets.

Table 2: Experimental set of four bivariate functions.

Table 3: Parameters used for the HGA.

Table 4: Simulation results: mean of the computed MSEs with estimated standard errors, based on 100 test sets.