Epistasis-based Basis Estimation Method for Simplifying the Problem Space of an Evolutionary Search in Binary Representation

An evolutionary search space can be smoothly transformed via a suitable change of basis; however, it can be difficult to determine an appropriate basis. In this paper, a method is proposed to select a basis that can be used to simplify an evolutionary search space in a binary encoding scheme. The basis search method is based on a genetic algorithm, and the fitness evaluation is based on the epistasis, which is an indicator of how complex a problem is for a genetic algorithm. Two tests were conducted to validate the proposed method when applied to two different evolutionary search problems. The first searched for an appropriate basis to apply, while the second searched for a solution to the test problem. The results obtained after the identified basis had been applied were compared to those with the original basis, and it was found that the proposed method provided superior results.


Introduction
Binary encoding typically uses a standard basis, and when a non-standard basis is used, the structure of the problem space may become quite different from that of the original problem. In an evolutionary search, various methods can be used to change a problem space by adjusting the basis, including gene rearrangement, different encoding methods, and the use of an eigen-structure [1][2][3][4][5][6][7][8][9][10][11][12].
An investigation was conducted to elucidate the possibility of changing the basis in binary encoding and the corresponding effects on the genetic algorithm (GA) [13]; however, it was not possible to determine which basis should be applied to smooth the problem search space. In genetics, epistasis means that the phenotypic effect of one gene is masked by another gene; however, in a GA, it refers to any type of gene interaction. In a problem with a large epistasis, the genes are extremely inter-dependent, so the fitness landscape of the problem space is very complex and the problem is difficult [14]. Several studies have been conducted to assess the difficulty of problems from the perspective of epistasis [15][16][17][18][19][20][21]. Epistasis has the advantage that the extent of nonlinearity can be measured using only the fitness function. In this paper, we define the difficulty of the problem or problem search space as the nonlinearity level of gene expression, and we use epistasis as a measure of this difficulty.
There are three main contributions of this paper. First, an epistasis approximation is used to identify a basis that will reduce the complexity of an evolutionary search problem. Second, the basis is expressed by a variable-length encoding scheme using elementary matrices. Finally, a GA is defined that can be used to change the basis of an evolutionary search space. This means that when a basis is given, one can tell how it affects the GA. Our intention in this study is to show that a non-separable problem can be transformed into a separable problem by performing an appropriate basis transformation. Such an altered environment enables the GA to search the space effectively.
This paper is organized as follows: Section 2 describes the principle of reducing the complexity of a problem space in an evolutionary search by changing the basis and presents the motivation for evaluating the basis using the epistasis. In Section 3, a method is introduced for changing a standard basis to another basis for a binary encoding problem. Then, a GA is introduced that can be used to apply a change of basis. Once an appropriate basis has been selected, this algorithm is more efficient at searching for a solution than the conventional GA. In Section 4, a method is proposed for estimating a basis that reduces the complexity of an evolutionary search problem. Section 5 describes a GA that can be used to search for a basis by applying the proposed estimation method. Here, a variable-length encoding scheme that consists of elementary matrices is employed so as to increase the efficiency of the search for an appropriate basis in the problem space. Section 6 presents a description of the tests used to validate the method and then discusses the results. In the tests, an appropriate basis for the target problem is found via the GA, and then the identified basis is applied to the target problem. The conclusions that can be drawn from this study are presented in Section 7.

Motivation
In this section, the concept of the epistasis is introduced as a means of estimating a basis that will reduce the complexity of the problem. First, a principal component analysis (PCA) is used to extract important information by changing the basis in real number encoding. Next, an example of changing the basis in binary encoding is presented to illustrate that a complex problem can be converted to a simple problem by changing the basis. Lastly, the epistases of the original and modified problems are compared. If the epistasis of the problem decreases when the basis is changed, it implies that the complexity of the original problem has decreased. Thus, a suitable basis can be identified using the changes in the epistasis before and after the prospective basis has been applied to the problem of interest.

An Example of Changing a Basis in $\mathbb{R}^n$
A PCA is used to obtain the principal components of the data by transforming the data into a new coordinate system via an orthogonal transformation. When the data are projected onto the new coordinate system, the direction along which the variance is the largest becomes the first principal component. The second principal component is the direction orthogonal to the first along which the variance is the second largest. Consequently, if the eigenvectors and eigenvalues of the covariance matrix are obtained and sorted in descending order of eigenvalue, the principal components can be found. This is identical to changing the basis from the original coordinate system to a coordinate system based on the variance of the data. In general, only the most important principal components are retained, so the data are represented in a reduced form at the cost of some information loss.
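The eigenvector view of PCA described above can be illustrated in two dimensions, where the eigen-decomposition of the covariance matrix has a closed form. The following is only a minimal sketch using the standard library; the function name `first_principal_axis` and the toy data are assumptions introduced for illustration and are not part of the original study.

```python
import math

def first_principal_axis(points):
    """Return (axis, largest eigenvalue, smallest eigenvalue) of the
    2x2 sample covariance matrix of a list of (x, y) points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # Entries of the sample covariance matrix [[cxx, cxy], [cxy, cyy]].
    cxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    cyy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    cxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    # Closed-form orientation of the leading eigenvector of a symmetric 2x2 matrix.
    theta = 0.5 * math.atan2(2 * cxy, cxx - cyy)
    mean = (cxx + cyy) / 2
    radius = math.hypot((cxx - cyy) / 2, cxy)
    return (math.cos(theta), math.sin(theta)), mean + radius, mean - radius

# Toy data lying exactly on the line y = 2x (hypothetical example).
axis, lam1, lam2 = first_principal_axis([(0, 0), (1, 2), (2, 4)])
```

For these points, all of the variance lies along the first principal axis, so keeping only that component loses nothing; with noisy data the second eigenvalue would quantify what is discarded.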

Change of Basis in Binary Representation
Binary encoding typically employs a standard basis; however, it is sometimes easier to manipulate a problem in a non-standard basis. The following example illustrates that the relationship between the basis vectors is dependent on the basis. Here, $\mathbb{Z}_2$ is the field with elements zero and one, in which addition corresponds to the exclusive-or (XOR) operator and multiplication corresponds to the AND operator. The standard basis $B_s$ for the vector space $\mathbb{Z}_2^n$ is $\{e_1, e_2, \ldots, e_n\}$, where $e_i$ is the column vector in which the $i$-th entry is one and the remaining $n-1$ entries are zero.
In the vector space $\mathbb{Z}_2^n$, if the vector $v$ and the evaluation function $F$ are as follows, then the basis vector $e_i$ of $B_s$ has a dependency relationship with the other basis vectors $e_j$ in $F$.
where $\alpha_i \in \mathbb{Z}_2$ and $\oplus$ is the XOR operator.
Let us assume a function $F'$ performs the same operation as $F$ but in a new basis, and suppose $n$ is even. If a set $B$ is composed as follows: then $B$ becomes the basis. One property of a basis is that every vector can be represented as a linear combination of basis vectors. That is, $v = \sum_{i=1}^{n} \alpha'_i e'_i$ with $\alpha'_i \in \mathbb{Z}_2$ and $e'_i \in B$, and $[v]_B = (\alpha'_1, \ldots, \alpha'_n)^T$ is the representation of $v$ with respect to the basis $B$.
Here, $F'$ is a function that evaluates $[v]_B$, has the same operation as $F(v)$, and satisfies the following relationship: It can be seen that the basis vector $e'_i$ of $B$ is independent of the other basis vectors in $F'$. In fact, $F'$ is identical to the onemax problem that counts the number of ones in a bitstring. Therefore, the vector in which all $\alpha'_i$ are set to one has the largest evaluation value, and if this vector is transformed back to the standard basis, an optimum solution is obtained. Figure 1 shows graphs of the relationships between the basis vectors under each basis for $n = 6$.

Epistasis According to the Basis
In a GA, the epistasis indicates the correlation between the genes. If the epistasis for a particular problem is large, then the genes are very inter-dependent, the fitness landscape of the problem space is extremely complex and the problem is difficult. In Section 2.2, it was shown that the complexity of a problem varies depending on the basis. The epistasis numerically expresses the complexity of such a problem. In general, when the genes in a problem are very dependent, the epistasis has a large value. In contrast, when the genes are independent, the value is zero.
The results of calculating the epistasis according to the problem size n of evaluation functions F and F ′ in Section 2.2 are shown in Table 1. In this paper, the method proposed by Davidor [14] is used to compute the epistasis. In F , because the dependency relationship with other basis vectors increases as n increases, the epistasis also increases. However, for F ′ , since the basis vectors are independent, the epistasis is zero. Thus, it is expected that the search space can be simplified via an appropriate change of basis.
The epistasis can be used to check if the search space can be simplified by using a particular basis. If the epistasis of the problem after changing the basis is lower than the epistasis of the original problem, then this indicates that the problem has become easier. However, using the epistasis in this way requires all solutions to be searched. An alternative is to estimate the actual epistasis by calculating the epistasis of a sample set of solutions. Note that the estimated nonlinearity may be misleading because of the approximation error introduced by sampling, which hinders finding a proper basis for the target problem. In that case, the target problem may be transformed into a more complex problem through the basis transformation; that is, the basis transformation may instead prevent a GA from efficiently finding the solution.

Change of Basis
This section presents a GA that performs an effective search through a change of basis. Before presenting the GA, we introduce the related terminology and theory of a change of basis in binary representation. Next, we apply a change of basis to the onemax problem to show how the problem is actually transformed. In addition, a methodology for evaluating solutions in the transformed problem is described. Finally, we propose a GA that effectively searches for solutions by applying the change of basis. Searching for an appropriate basis, on the other hand, is covered in Sections 4 and 5.

2
A basis for an $n$-dimensional vector space is a subset of $n$ vectors such that every element of the space can be uniquely represented as a linear combination of the basis vectors. Since a vector space admits more than one basis, the coordinate representation of a vector with respect to one basis can be converted into the equivalent representation with respect to another basis via an invertible linear transformation. Such a transformation is called a change of basis. The following theorem was derived from the basic theory of linear algebra [22]. A matrix $A$ is defined as binary if $A \in M_{n \times n}(\mathbb{Z}_2)$. In general, if $B$ is a basis, $[v]_B$ is the representation of $v$ with respect to the basis $B$. In Theorem 1, the nonsingular binary matrix $T = [T]_{B_1}^{B_2}$ is a coordinate-change matrix from basis $B_1$ to $B_2$. When a $T$ is given, $T$ can be viewed as a coordinate-change matrix from the standard basis to the basis $B_T$ associated with $T$. For every vector $v \in \mathbb{Z}_2^n$, $Tv = [v]_{B_T}$ holds; equivalently, $v = T^{-1}[v]_{B_T}$, so $B_T$ is $\{T^{-1}e_1, T^{-1}e_2, \ldots, T^{-1}e_n\}$. This study considers a change of basis from the standard basis to another basis. Thus, estimating the basis is equivalent to estimating an appropriate $T$.
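The relation $Tv = [v]_{B_T}$ can be checked numerically on a small case. The sketch below assumes a hypothetical $3 \times 3$ nonsingular binary matrix `T` (not the one used in the paper) with a hand-computed inverse `T_INV`, and takes the basis vectors to be the columns of `T_INV`, which is what $Tv = [v]_{B_T}$ implies. All function names are illustrative.

```python
def mat_vec(M, v):
    """Matrix-vector product over Z_2: AND for multiplication, XOR for addition."""
    return [sum(m & x for m, x in zip(row, v)) % 2 for row in M]

def column(M, i):
    """Return the i-th column of M, i.e. M e_i."""
    return [row[i] for row in M]

def combine(coords, basis):
    """XOR together the basis vectors whose coordinate is one."""
    out = [0] * len(basis[0])
    for c, b in zip(coords, basis):
        if c:
            out = [x ^ y for x, y in zip(out, b)]
    return out

# Hypothetical nonsingular binary matrix and its inverse over Z_2.
T = [[1, 1, 0],
     [0, 1, 1],
     [0, 0, 1]]
T_INV = [[1, 1, 1],
         [0, 1, 1],
         [0, 0, 1]]

v = [0, 1, 1]
coords = mat_vec(T, v)                         # [v]_{B_T} = T v
basis = [column(T_INV, i) for i in range(3)]   # B_T = {T^{-1} e_i}
reconstructed = combine(coords, basis)         # linear combination gives back v
```

Here `coords` is the coordinate vector of `v` in the new basis, and recombining the basis vectors with those coordinates recovers `v`, which is the defining property of a coordinate representation.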

Analysis of Changing a Basis in the Onemax Problem
The onemax problem maximizes the number of ones in a bitstring and has zero epistasis. Here, a onemax problem in which the basis was changed using a selected nonsingular binary matrix T is compared to the original onemax problem. The specific onemax problem of interest has a size of three. The T is defined as follows: Then, it can be shown that Table 2 shows the original vector and that obtained using T v = [v] BT . From this, it can be seen that after the basis change, the problem became more complex.
The evaluation function $F$ of the onemax problem is as follows: $F(v) = \sum_{i=1}^{n} v_i$, where $v_i$ is the $i$-th entry of $v$. On the other hand, from Table 2, it is difficult to identify a rule for the fitness of $[v]_{B_T}$ for the onemax problem. The evaluation function $F'$ of $[v]_{B_T}$ can be obtained by recovering $v$ through a change of basis from $B_T$ to $B_s$ and then evaluating $v$ with $F$. That is, $F'([v]_{B_T}) = F(T^{-1}[v]_{B_T})$, where $T^{-1}$ is the inverse matrix of $T$. The above equation is obtained by multiplying both sides of $Tv = [v]_{B_T}$ on the left by $T^{-1}$ and then applying $F$ to both sides. In this way, the basis can be easily changed in either direction using $T$ and $T^{-1}$.
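As an illustration of evaluating a solution in another basis, the following sketch inverts a binary matrix over $\mathbb{Z}_2$ with Gauss-Jordan elimination and evaluates onemax through $T^{-1}$. The matrix `T` is a hypothetical example rather than the one from Table 2, and the helper names (`gf2_inverse`, `transformed_fitness`) are assumptions made for this example.

```python
import itertools

def gf2_inverse(T):
    """Invert a binary matrix over Z_2 by Gauss-Jordan elimination."""
    n = len(T)
    # Augment T with the identity matrix.
    aug = [list(row) + [int(i == j) for j in range(n)] for i, row in enumerate(T)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col]), None)
        if pivot is None:
            raise ValueError("matrix is singular over Z_2")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col and aug[r][col]:
                # Row addition over Z_2 is a bitwise XOR.
                aug[r] = [a ^ b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def mat_vec(M, v):
    """Matrix-vector product over Z_2."""
    return [sum(m & x for m, x in zip(row, v)) % 2 for row in M]

def onemax(v):
    return sum(v)

def transformed_fitness(T_inv, coords):
    """F'([v]_{B_T}) = F(T^{-1} [v]_{B_T})."""
    return onemax(mat_vec(T_inv, coords))

T = [[1, 1, 0],   # hypothetical nonsingular binary matrix
     [0, 1, 1],
     [0, 0, 1]]
T_inv = gf2_inverse(T)

# Map each original vector to its new coordinates and the fitness seen there.
table = {}
for v in itertools.product((0, 1), repeat=3):
    coords = mat_vec(T, v)
    table[v] = (tuple(coords), transformed_fitness(T_inv, coords))
```

Every entry of `table` carries the same fitness as the original vector, but the all-ones coordinate vector is no longer the optimum, which mirrors the observation that the transformed onemax problem looks more complex.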

Genetic Algorithm with a Change of Basis
In general, a GA is expected to be more efficient when searching for a solution to a simple problem than to a complex problem. As shown in Section 2.2, a complex problem can be changed to a simple problem by changing the basis. With this in mind, if an appropriate change of basis is applied to a problem space to be searched by a GA, the efficiency of the search process is expected to improve greatly. A flowchart of the proposed algorithm is shown in Figure 2 and the corresponding steps are detailed in Algorithm 1.

Algorithm 1 A GA with a change of basis
Step 1: the population $P$ of the GA is initialized and the fitness is evaluated.
Step 2: $P$ is replaced by the population $P'$, which is obtained by changing the standard basis $B_s$ to the basis $B$.
Step 3: by applying the genetic operators of the GA, the offspring population $O'$ is produced from $P'$.
Step 4: the fitness of $O'$ is evaluated using the population $O$, which is obtained from $O'$ by changing the basis from $B$ back to $B_s$.
Step 5: $P'$ and $O'$ are used to create a new generation, and $P'$ is updated to the new generation.
Step 6: the process from Step 3 onward is repeated for the given number of generations. When the number of generations has been reached, $P'$ is returned after its basis has been changed from $B$ back to the standard basis $B_s$.

Figure 2: Flowchart of a GA with a change of basis.
If Steps 2 and 4 are excluded, then Algorithm 1 produces a typical GA. However, if the problem is transformed with an appropriate basis in Step 2, the original problem space is transformed into an easier problem space, which is expected to make it easier for the GA to find an optimum solution. On the other hand, Step 4 shows that the generated offspring vector is evaluated by changing the basis to the standard basis. This is identical to the method in Section 3.2 that evaluates a solution in another basis.
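The steps of Algorithm 1 can be sketched compactly as follows. This is only an illustrative sketch under simplifying assumptions: the genetic operators (tournament of three, one-point crossover, flip mutation, full generational replacement) are reduced versions chosen for brevity, the matrices `T` and `T_INV` are hypothetical, and names such as `ga_with_basis` do not come from the paper.

```python
import random

def mat_vec(M, v):
    """Matrix-vector product over Z_2."""
    return [sum(m & x for m, x in zip(row, v)) % 2 for row in M]

def ga_with_basis(fitness, T, T_inv, n, pop_size=20, generations=30, seed=0):
    rng = random.Random(seed)
    # Step 1: initialise the population in the standard basis.
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    # Step 2: express every individual in the new basis, [v]_B = T v.
    pop = [mat_vec(T, v) for v in pop]

    def evaluate(coords):
        # Step 4: change back to the standard basis before evaluating.
        return fitness(mat_vec(T_inv, coords))

    for _ in range(generations):
        offspring = []
        while len(offspring) < pop_size:
            # Step 3: tournament selection, one-point crossover, flip mutation.
            p1 = max(rng.sample(pop, 3), key=evaluate)
            p2 = max(rng.sample(pop, 3), key=evaluate)
            cut = rng.randrange(1, n)
            child = p1[:cut] + p2[cut:]
            child = [g ^ (rng.random() < 0.05) for g in child]
            offspring.append(child)
        # Step 5: the offspring become the next generation (assumed replacement).
        pop = offspring
    # Step 6: return the final population in the standard basis.
    return [mat_vec(T_inv, c) for c in pop]

T = [[1, 1, 0], [0, 1, 1], [0, 0, 1]]      # hypothetical basis-change matrix
T_INV = [[1, 1, 1], [0, 1, 1], [0, 0, 1]]  # its inverse over Z_2
final_pop = ga_with_basis(sum, T, T_INV, n=3)
```

Passing the identity matrix for both `T` and `T_inv` makes Steps 2 and 4 no-ops, which reduces the sketch to an ordinary GA, in line with the remark above.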

Evaluation of a Basis
The objective is to identify a basis that can be used to change a complex problem into a simple problem. While such a basis was examined in Section 2.2, in the case of Section 3.2, the change of basis converted the onemax problem from a simple problem into a complex one.
When a basis and a target problem are given, a method is proposed that uses the epistasis to evaluate whether the basis is appropriate for the problem space. A meta-genetic algorithm (Meta-GA) is generally used as a method for estimating a hyperparameter of a GA. The two methods are compared to analyze the advantages and disadvantages of the proposed method.

Evaluation with Epistasis
Assume a target problem $P$ and a basis $B$ are given. To determine the smoothing effect of $B$ on $P$, a sampling population $S$ can be obtained from $P$. Then, $S'$ can be obtained by changing the basis for $S$ from the standard basis to $B$. The epistasis of $S'$, which numerically expresses the difficulty of the problem, can then be calculated. The lower the epistasis is, the more appropriate $B$ is as a basis for $P$. The epistasis calculation method proposed by Davidor [14] is shown in Algorithm 2. Suppose the chromosome length is $l$ and the number of samples in $S$ is $s$. Then, the time complexity of evaluating a single basis becomes $O(l^2 s)$. This is because the cost of executing the change of basis is $l^2 s$: the change of basis is performed for a total of $s$ vectors, and the cost of the change of basis is $l^2$ per vector.

Algorithm 2 Basis evaluation based on epistasis
Require: Sampling population S
1: procedure Evaluation(B, S)    ⊲ Evaluate a basis B
2:     S′ ← Change of basis from B_s to B on S    ⊲ B_s is the standard basis
4:     for each ind in S′
5:         ⊲ a is allele value (0 or 1)
8:     for i ← 1 to size(ind) do
14:        for each a in allele values
16:    for each ind in S′
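For orientation, a sample-based epistasis measure in the spirit of Davidor [14] can be sketched as below. This is a hedged reconstruction from the commonly stated definition (average fitness, excess allele values, genic value, and the mean squared difference between fitness and genic value); the exact normalization used in Algorithm 2 is not recoverable from the text, so the function `epistasis_variance` should be read as an assumption rather than the authors' implementation.

```python
import itertools

def epistasis_variance(sample, fitness):
    """Mean squared gap between each fitness and its additive (genic) prediction."""
    s, n = len(sample), len(sample[0])
    values = [fitness(ind) for ind in sample]
    mean = sum(values) / s
    # Excess allele value: mean fitness of individuals carrying allele a
    # at locus i, minus the overall mean fitness.
    excess = []
    for i in range(n):
        per_allele = {}
        for a in (0, 1):
            matching = [v for ind, v in zip(sample, values) if ind[i] == a]
            per_allele[a] = sum(matching) / len(matching) - mean if matching else 0.0
        excess.append(per_allele)
    total = 0.0
    for ind, v in zip(sample, values):
        # Genic value: what the fitness would be if the loci were independent.
        genic = mean + sum(excess[i][ind[i]] for i in range(n))
        total += (v - genic) ** 2
    return total / s

# Full enumeration is used here only because the examples are tiny.
all3 = [list(v) for v in itertools.product((0, 1), repeat=3)]
all2 = [list(v) for v in itertools.product((0, 1), repeat=2)]
eps_onemax = epistasis_variance(all3, sum)                  # separable problem
eps_xor = epistasis_variance(all2, lambda v: v[0] ^ v[1])   # fully interacting
```

Onemax is perfectly additive, so its value is zero, whereas the two-bit XOR function cannot be explained by single-locus effects at all. To evaluate a basis, the same function would be applied to the sample after the change of basis, with the fitness computed through $T^{-1}$.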

Evaluation with a Meta-genetic Algorithm
The use of a meta-GA to optimize the parameters and tune GAs was first proposed by Grefenstette [23]. Here, a meta-GA is used to determine whether the basis is appropriate for the problem space of the GA. A method of evaluating a basis with a meta-GA is shown in Algorithm 3. By applying Algorithm 1 with a given $B$ and an instance of the GA, $k$ populations are searched. Then, using the best fitness in each population, the basis is evaluated. That is, when the $k$ best fitness values are found to be acceptable, it is estimated that $B$ is an appropriate basis for the instance. The reason for searching $k$ populations is that even with a basis that is not appropriate, a good solution may be obtained by chance when the GA is run only once. To calculate the time complexity with respect to the target GA, let the number of generations be $g$, the population size be $p$, and the chromosome length be $l$. The time cost of line 10 in Algorithm 3 is the largest. When $p$ offspring are generated, the time consumed is $pl^2$. Since this is repeated $kg$ times, the worst-case time complexity becomes $O(kgpl^2)$. Note that in the experiments reported in this paper, $k$ is set to 5 and $g$ is set to the chromosome length.

return BestFits
17: end procedure

Finding a Basis Using a Genetic Algorithm
This section describes the components of the GA used to search for a basis for the problem space with the evaluation method outlined in Section 4. The method of applying a basis and the genetic operator for the encoding are discussed, and the fitness of the basis is evaluated using the method of either Algorithm 2 or 3.

Encoding with an Elementary Matrix
A nonsingular binary matrix can be regarded as a change from the standard basis to another basis. That is, a basis corresponds to an appropriate matrix. If a typical 2D type of encoding is used to encode the matrix, a repair mechanism may be required after recombination. In this case, one option is to conduct the repair using the Gauss-Jordan method; however, this requires $O(n^3)$ time.
Every nonsingular matrix can be expressed as a product of elementary matrices [22]. Therefore, in $GL_n(\mathbb{Z}_2)$, if a solution is expressed as a product of elementary matrices, its invertibility is maintained. Such a product can be expressed as a variable-length linear string whose elements are elementary matrices [24], which allows a new encoding to be applied. Note that any recombination method for a variable-length string can be used. In the following, an elementary row operation is defined and then the elementary matrix in $M_{n \times n}(\mathbb{Z}_2)$ is introduced. Elementary row operations are Type 1 or Type 2 depending on whether they were obtained using (i) or (ii) of Definition 1. Let us define $S_n^{ij}$ as an elementary matrix of Type 1 that interchanges the $i$-th row and the $j$-th row for $i \neq j$. Also define $A_n^{ij}$ as an elementary matrix of Type 2 that adds the $i$-th row to the $j$-th row for $i \neq j$.
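The encoding can be made concrete with a short decoding sketch. It assumes that a string is stored as a list of tuples `("S", i, j)` or `("A", i, j)` with 1-based indices, which is a representation chosen here for illustration. It relies on the fact that, over $\mathbb{Z}_2$, each elementary matrix of either type is its own inverse, so the inverse of a product is the same string multiplied in reverse order and no Gauss-Jordan repair is needed.

```python
def identity(n):
    return [[int(r == c) for c in range(n)] for r in range(n)]

def apply_row_op(M, op):
    """Left-multiply M in place by the elementary matrix described by op."""
    kind, i, j = op
    i, j = i - 1, j - 1          # the paper's indices are 1-based
    if kind == "S":              # Type 1: interchange rows i and j
        M[i], M[j] = M[j], M[i]
    elif kind == "A":            # Type 2: add row i to row j (XOR over Z_2)
        M[j] = [a ^ b for a, b in zip(M[j], M[i])]
    else:
        raise ValueError("unknown elementary operation: %r" % (kind,))

def decode(ops, n):
    """Return (T, T_inv) for T = E_1 E_2 ... E_k given ops = [E_1, ..., E_k]."""
    T = identity(n)
    for op in reversed(ops):     # T = E_1 (E_2 (... (E_k I)))
        apply_row_op(T, op)
    T_inv = identity(n)
    for op in ops:               # T^{-1} = E_k (... (E_2 (E_1 I)))
        apply_row_op(T_inv, op)
    return T, T_inv

def mat_mul(A, B):
    """Matrix product over Z_2."""
    cols = list(zip(*B))
    return [[sum(a & b for a, b in zip(row, col)) % 2 for col in cols] for row in A]

# The string S^{12} A^{21} A^{12} for n = 4.
P1 = [("S", 1, 2), ("A", 2, 1), ("A", 1, 2)]
T, T_inv = decode(P1, 4)
```

Because both matrices are built directly from the string, any recombination or mutation of the string still decodes to a nonsingular matrix together with its inverse at a cost of $O(n)$ per elementary matrix.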
When a nonsingular binary matrix is represented as a sequence of elementary matrices, this representation is not unique. Also, it is difficult to determine how many equivalent representations exist for a nonsingular binary matrix. Several equivalences were proposed by Yoon and Kim [24] as Propositions 1 and 2 by way of a simple idea. The newly discovered equivalences proposed in this paper are denoted in Proposition 3. Their proof is provided in the appendix section.
Proposition 1 (Exchange rule). For each $i, j, k$ such that $i \neq j$, $j \neq k$, and $k \neq i$, the following five exchange rules hold.
Proposition 2 (Compaction rules). For each $i, j, k$ such that $i \neq j$, $j \neq k$, and $k \neq i$, the following two rules hold.
Proposition 3. For each $i$ and $j$ such that $i \neq j$, the following three rules hold.
For example, the encodings of matrices $P_1$ and $P_2$ are as follows: let $P_1 = S_4^{12} A_4^{21} A_4^{12}$ and $P_2 = A_4^{21} S_4^{12}$. Then, calculate $d_e(P_1, P_2)$ based on a sequence alignment between $P_1$ and $P_2$, where $d_e$ is the edit distance and the insertion, deletion, and replacement operations have weights of one, one, and two, respectively. First, consider the original form: Then, $d_e(P_1, P_2) = 3$. The propositions allow the parents to be changed into other forms. Note that: From these rules, $d_e(P_1, P_2) = 2$. Thus, the propositions can produce offspring that are more similar to the parents.
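The value $d_e(P_1, P_2) = 3$ for the original forms can be checked with a small dynamic-programming sketch of the weighted edit distance. The tuple representation of elementary matrices and the function name `edit_distance` are assumptions made for this example; the weights follow the text (insertion one, deletion one, replacement two).

```python
def edit_distance(a, b, ins=1, dele=1, rep=2):
    """Weighted edit distance between two sequences (Wagner-Fischer table)."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = i * dele            # delete everything in a[:i]
    for j in range(1, n + 1):
        d[0][j] = j * ins             # insert everything in b[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else rep
            d[i][j] = min(d[i - 1][j] + dele,      # deletion
                          d[i][j - 1] + ins,       # insertion
                          d[i - 1][j - 1] + cost)  # match or replacement
    return d[m][n]

P1 = [("S", 1, 2), ("A", 2, 1), ("A", 1, 2)]   # S^{12} A^{21} A^{12}
P2 = [("A", 2, 1), ("S", 1, 2)]                # A^{21} S^{12}
distance = edit_distance(P1, P2)               # 3 for the original forms
```

With these weights a replacement costs the same as a deletion followed by an insertion, so the distance equals the two lengths added together minus twice the length of the longest common subsequence.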

Crossover
Any recombination for a variable-length string can be used as a recombination operator for the encoding, and the edit distance is typically used as the distance for variable-length strings. This distance is the minimum number of insertions, deletions, and replacements of elementary matrices needed to change one string into another. A geometric crossover that is associated with this distance is called a homologous geometric crossover [25]. Several general string genetic operators can be used. In the case of a string encoding of elementary matrices, a mathematically designed genetic operator was proposed [24]. Specifically, the geometric crossover by sequence alignment is expected to be effective. Here, alignment refers to allowing the strings to stretch in order to provide a better match. Stretching a string involves interleaving the gap symbol '-' anywhere in the string so as to create two stretched strings of the same length with a minimum Hamming distance. The offspring are generated by applying a uniform crossover to the aligned parents and then removing the '-' symbols. Here, two offspring solutions are generated from the two parents.
The optimal alignment of the two strings is computed with the Wagner-Fischer algorithm [26], which is a dynamic programming (DP) algorithm that computes the edit distance between two strings of characters. This algorithm has a time complexity of $O(mn)$ and a space complexity of $O(mn)$ when the full dynamic programming table is constructed, where $m$ and $n$ are the lengths of the two strings.
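A crossover by sequence alignment along these lines might look as follows. This is a sketch, not the operator of [24] or [25]: the gap is represented by `None`, ties in the traceback are broken in favour of matches and replacements, and the function names `align` and `crossover` are introduced here for illustration.

```python
import random

GAP = None  # stands for the gap symbol used to stretch a string

def align(a, b, ins=1, dele=1, rep=2):
    """Return the two stretched strings of an optimal alignment of a and b."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = i * dele
    for j in range(1, n + 1):
        d[0][j] = j * ins
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else rep
            d[i][j] = min(d[i - 1][j] + dele, d[i][j - 1] + ins,
                          d[i - 1][j - 1] + cost)
    # Trace back through the table to recover one optimal alignment.
    sa, sb, i, j = [], [], m, n
    while i > 0 or j > 0:
        diag = rep if i > 0 and j > 0 and a[i - 1] != b[j - 1] else 0
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + diag:
            sa.append(a[i - 1]); sb.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and d[i][j] == d[i - 1][j] + dele:
            sa.append(a[i - 1]); sb.append(GAP); i -= 1
        else:
            sa.append(GAP); sb.append(b[j - 1]); j -= 1
    return sa[::-1], sb[::-1]

def crossover(a, b, rng):
    """Uniform crossover on the aligned parents, then removal of the gaps."""
    sa, sb = align(a, b)
    c1, c2 = [], []
    for x, y in zip(sa, sb):
        if rng.random() < 0.5:
            x, y = y, x
        c1.append(x)
        c2.append(y)
    return ([g for g in c1 if g is not GAP],
            [g for g in c2 if g is not GAP])

parent1 = [("S", 1, 2), ("A", 2, 1), ("A", 1, 2)]
parent2 = [("A", 2, 1), ("S", 1, 2)]
child1, child2 = crossover(parent1, parent2, random.Random(1))
```

Each aligned column contributes its two entries to the two offspring in a random order, so the offspring together contain exactly the elementary matrices of the parents, and both decode to nonsingular matrices by construction.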

Initial Population Generation, Selection, Mutation, and Replacement
An initial population is generated in which each individual consists of a random number of random elementary matrices. The random number is generated from a normal distribution where the mean is $3n$ and the standard deviation is $n$ when the problem size is $n$. If the random number is smaller than one, it is fixed at one. The selection operator applies a tournament selection method by choosing three parents. The mutation operator applies one of three operations, namely insertion, deletion, or replacement, to each string with a 5% probability. Furthermore, the probability that each individual will be mutated is set at 0.2. Lastly, replacement refers to replacing the parent generation with an offspring generation. The details of this process are as follows: the selection operator is used to choose the candidates for the offspring generation. When the population size of the parent generation is $p$, then $p$ parents are extracted by applying the selection operator $p$ times. The parents are paired up, and the crossover is applied to each pair with a probability of 0.5. When the crossover is not applied, the two parents become candidates for members of the next generation, while in the opposite case, the two offspring become candidates for members of the next generation. Each candidate undergoes mutation with a probability of 0.2, and the resulting candidates replace the parent generation as the next generation.
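The initialization and mutation described above could be sketched as follows. Several details are assumptions: elementary matrices are drawn uniformly from both types, the 5% rate is interpreted as a per-element probability applied only to individuals selected for mutation with probability 0.2, and an individual emptied by deletions is refilled with one random element. The helper names are hypothetical.

```python
import random

def random_op(n, rng):
    """Draw a random elementary matrix S^{ij} or A^{ij} with i != j."""
    i, j = rng.sample(range(1, n + 1), 2)
    return (rng.choice("SA"), i, j)

def random_individual(n, rng):
    """String length drawn from N(3n, n), fixed at one if it falls below one."""
    length = max(1, round(rng.gauss(3 * n, n)))
    return [random_op(n, rng) for _ in range(length)]

def mutate(ind, n, rng, p_individual=0.2, rate=0.05):
    """Apply insertion, deletion, or replacement to elements of ind."""
    if rng.random() >= p_individual:
        return list(ind)                       # individual left unchanged
    out = []
    for op in ind:
        if rng.random() < rate:
            kind = rng.choice(("insert", "delete", "replace"))
            if kind == "insert":
                out.extend([op, random_op(n, rng)])
            elif kind == "replace":
                out.append(random_op(n, rng))
            # "delete": the element is simply dropped
        else:
            out.append(op)
    return out or [random_op(n, rng)]          # never return an empty string

rng = random.Random(0)
population = [random_individual(5, rng) for _ in range(20)]
mutated = [mutate(ind, 5, rng) for ind in population]
```

Since every element is a valid elementary matrix, all three mutation operations keep the decoded matrix nonsingular, so no repair step is required.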

Target Problem in Binary Representation
In this section, two problems are described for which better solutions can be obtained with an appropriate basis.
Bs v , an evaluation function of variant-onemax can be generated even when a nonsingular binary matrix is given. As for the optimum solution of variant-onemax, when the problem size is n, the number of ones becomes n through the change of basis, and n becomes the optimal solution.
2. NK-landscape: the NK-landscape model consists of a string of length $N$, and a fitness contribution is attributed to each character depending on $K$ other characters. These fitness contributions are often randomly chosen from a particular probability distribution. In addition, the number of hills and valleys can be adjusted by varying $N$ and $K$. One of the reasons why the NK-landscape model is used in optimization is that it is a simple instance of an NP-hard problem.
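A minimal NK-landscape generator consistent with this description is sketched below. The choices of random neighbours, uniformly distributed contributions in $[0, 1)$, and averaging over the $N$ genes are common conventions assumed here rather than details confirmed by the text, and `make_nk` is a hypothetical name. Contribution tables are filled lazily so that large $K$ does not require $2^{K+1}$ entries per gene up front.

```python
import random

def make_nk(N, K, seed=0):
    """Build an NK-landscape fitness function for bitstrings of length N."""
    rng = random.Random(seed)
    # Each gene depends on itself and on K other randomly chosen genes.
    neighbours = [rng.sample([j for j in range(N) if j != i], K)
                  for i in range(N)]
    tables = [{} for _ in range(N)]

    def fitness(bits):
        total = 0.0
        for i in range(N):
            key = (bits[i],) + tuple(bits[j] for j in neighbours[i])
            if key not in tables[i]:
                # Contribution drawn once and then reused for this pattern.
                tables[i][key] = rng.random()
            total += tables[i][key]
        return total / N          # average contribution, so fitness is in [0, 1)

    return fitness

nk = make_nk(N=10, K=3)
example = [0, 1] * 5
value = nk(example)
```

Because each gene's contribution depends on $K$ other genes, flipping a single bit changes up to $K + 1$ contributions on average, which is the source of the epistasis that the change of basis is meant to reduce.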
In the experiments, the GA is used to search for solutions to the above two problems. The GA consists of tournament selection, one-point crossover, and flip mutation, and the replacement replaces the entire parent generation with the offspring generation. The tournament selection process chooses the best solution among three randomly selected parents, the one-point crossover produces two offspring from two parents, and in flip mutation, each gene is flipped from zero to one or from one to zero with a probability of 0.05. The replacement method is the same as that described in Section 5. In other words, in the composition of the next generation, the number of parents extracted is equal to the population size. The parents are paired up, and the crossover is applied to each pair with a probability of 50%. When the crossover is not applied, the two parents become member candidates of the next generation, while in the opposite case, the two offspring become member candidates of the next generation. Each member candidate undergoes mutation with a probability of 20%, and the candidates then replace the parent generation. When the chromosome length of variant-onemax or NK-landscape is $n$, the size of the population is set to $4n$. Because the fitness of the optimum solution of the variant-onemax problem is known to be $n$, the search is run until an optimum solution has been identified, up to a maximum of 10,000 generations. In the NK-landscape, the fitness of the optimum solution is different for each $N$ and $K$, and all solutions would have to be searched to obtain it. Thus, the search for the NK-landscape problem is run for 300,000 generations.

Results
The evaluation function of variant-onemax requires a nonsingular binary matrix that corresponds to a basis. For the basis of variant-onemax that has a chromosome length of n, a random number of elementary matrices are generated and then are multiplied sequentially. The number of elementary matrices is generated from a normal distribution that has a mean of 3n and a standard deviation of n/2.
In the experiment, instances of variant-onemax where $n$ was 20, 30, and 50 were generated. With the GA described in Section 5, the following bases were searched for each instance: meta-GA-based basis $B_1$, epistasis-based basis $B_2$ where the sampling number was $n^2$, and epistasis-based basis $B_3$ where the sampling number was $n^3$.
A total of 100 independent searches were conducted for each instance, and the number of times that an optimum solution was identified was counted along with the execution time. The results for the variant-onemax experiment are shown in Table 3. In the table, a type of 'Original' indicates that a solution instance was evaluated without a change of basis. Similarly, 'Meta,' 'Epistasis-sq,' and 'Epistasis-cu' refer to evaluating solution instances by applying $B_1$, $B_2$, and $B_3$, respectively, to change the basis. In addition, the box plot in Figure 3 depicts the fitness distribution of the 100 best solutions obtained by performing 100 independent searches for each instance. The fitness is normalized to a value between zero and one by dividing it by the fitness of the optimum solution. That is, a value of one on the y-axis indicates the fitness of an optimum solution, while values approaching zero indicate a lower fitness. In most cases, it can be seen that the search performance of the GA improves with the change of basis. When $n$ is 50, 'Epistasis-cu' does not seem to improve the search performance of the GA. This was likely because the population of the GA was not evenly distributed throughout the sample population.
In Table 3, 'Meta' found optimal solutions more frequently than the other methods. In particular, when $n$ was 30, an optimal solution was obtained in 82 of the 100 searches. This indicates that the corresponding basis was appropriate. However, because the computation time for this approach was very long, it cannot be applied in practice. Note that when $n$ is 50, it was over 2 hours. Furthermore, no difference was observed when compared to the case in which the basis was not changed. The method of evaluating the basis using the epistasis provides a good indication of when changing the basis will provide a better result. In particular, when $n$ is 20, the number of optima found in 'Original' is 30, and the numbers of optima found in 'Epistasis-sq' and 'Epistasis-cu' are 64 and 33, respectively. In summary, these tests confirmed that a sample size of $n^2$ provided good results while requiring less time than a sample size of $n^3$. Therefore, in terms of time and performance, a sample size of $n^2$ was deemed reasonable for estimating the epistasis.

Table 3: Results of each of the best solutions obtained by conducting the GA experiments 100 times on an instance of the variant-onemax problem. ('# of optima' is the number of optima found during 100 experiments, 'Average' is the average of 100 best solutions, and 'SD' is the standard deviation of 100 best solutions. $Q_1$, $Q_2$, and $Q_3$ are the first, second, and third quartiles, respectively. 'Time' is the sum of the time to search for the basis and that for the GA experiments.)

The value of $N$ in the NK-landscape experiment represents the size of the problem. In this experiment, a solution consisted of $N$ characters of zero and one, and the total number of solutions was $2^N$. The evaluation functions were randomly generated according to $K$. In terms of the instance generation, each gene was dependent on $K$ other genes and a value in $[0, 1]$ was assigned. The fitness of the NK-landscape is based on the fitness of each gene.
Therefore, the maximum and minimum fitness values, which are between zero and one, may be different for each instance. In the experiment, 100 independent searches for a solution are conducted for each instance. Table 4 shows the results of the NK-landscape experiment, in which the best solution and the computation time for each of the 100 searches were compared.
In the table, when the type is 'Epistasis,' this indicates that a basis was obtained based on the epistasis using a sample set of size $n^2$, and that 100 independent searches were conducted for that instance. A box plot showing the distribution of the 100 best solutions is shown in Figure 4.
Upon analysis, the method of searching for the solution after changing the basis exhibited better performance than searching the original problem. In particular, in the box plot, it can be seen that the distribution of solutions obtained by changing the basis was more concentrated and had a higher mean. In the NK-landscape, when 'Meta' and 'Epistasis' were compared, neither exhibited clearly better performance. However, it can be seen that the computation time of 'Meta' was about 430 times longer than that of 'Epistasis.' Furthermore, although 'Epistasis' consumed slightly more time than 'Original,' it tended to have a more efficient evolutionary search. For these reasons, the method used to obtain the 'Epistasis' results was found to be the best among the three methods evaluated.

Table 4: Results of each of the best solutions obtained by conducting the GA experiments 100 times on an instance of the NK-landscape problem. ('Best' is the best fitness among solutions found in 100 experiments, 'Average' is the average of 100 best solutions, and 'SD' is the standard deviation of 100 best solutions. $Q_1$, $Q_2$, and $Q_3$ are the first, second, and third quartiles, respectively. 'Time' is the sum of the time to search for the basis and that for the GA experiments.)

Experimental Analysis
The results of the above experiments confirmed that a basis obtained by estimating the epistasis improved the efficiency of searching for a solution using a GA. In this section, an analysis is performed to examine how much the basis found in the experiment reduced the epistasis. The basis was estimated in such a way that the epistasis of the sample population S was reduced. Whether the GA proposed in Section 5 was effective can be confirmed by comparing the epistasis of S with that of S', in which the basis was changed to the one identified by the search. It is expected that the latter epistasis will be smaller. A comparison of the epistases of S and S' in the variant-onemax and NK-landscape experiments can be seen in Tables 5 and 6, respectively. First, in Table 5, n is the chromosome length of the variant-onemax experiment. The sizes of the sample sets were $n^2$ ('square') and $n^3$ ('cubic'); 'Before' and 'After' show the epistases of S and S', respectively. For every n, it was confirmed that a lower epistasis value was obtained when the basis was changed. Moreover, the epistasis was reduced more with the square sample size than with the cubic one. Thus, there was a higher possibility that the GA would conduct a more efficient search and find a better solution. When n was 20, since there were only $2^{20}$ solutions, the epistasis could be computed over all solutions rather than over a sample set. This exact epistasis was 4.50, and since the estimated epistasis was 4.46 and 4.35 when the sampling sizes were square and cubic, respectively, the original epistasis was accurately estimated.
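The before-and-after comparison can be sketched on a toy problem. The sketch rests on assumptions: it uses a Davidor-style epistasis variance (the mean squared gap between each fitness and its additive, allele-wise prediction), which may differ in form and scale from the measure used in this paper. It also builds a small onemax variant by applying a hypothetical invertible matrix B over GF(2). The names epistasis_variance, change_basis, and B are illustrative.

```python
from itertools import product
from typing import List, Sequence, Tuple

Bits = Tuple[int, ...]


def epistasis_variance(sample: Sequence[Bits], fitness: Sequence[float]) -> float:
    """Davidor-style epistasis variance of a sample (assumed measure)."""
    m, n = len(sample), len(sample[0])
    mean_f = sum(fitness) / m
    predicted = [mean_f] * m
    for i in range(n):
        for allele in (0, 1):
            rows = [s for s in range(m) if sample[s][i] == allele]
            if not rows:
                continue  # allele absent from the sample at this locus
            # Excess allele value: allele-wise mean fitness minus overall mean.
            excess = sum(fitness[s] for s in rows) / len(rows) - mean_f
            for s in rows:
                predicted[s] += excess
    return sum((f - p) ** 2 for f, p in zip(fitness, predicted)) / m


def change_basis(x: Bits, basis: Sequence[Sequence[int]]) -> Bits:
    """Return y = B x with arithmetic over GF(2)."""
    return tuple(sum(r * v for r, v in zip(row, x)) % 2 for row in basis)


# Hypothetical invertible matrix over GF(2) (upper triangular, unit diagonal).
B = [(1, 1, 0), (0, 1, 0), (0, 0, 1)]

# Toy variant of onemax: count the ones after applying B.
S: List[Bits] = list(product((0, 1), repeat=3))
f = [float(sum(change_basis(x, B))) for x in S]

# S' holds the same solutions expressed in the changed basis.
S_prime = [change_basis(x, B) for x in S]
before = epistasis_variance(S, f)
after = epistasis_variance(S_prime, f)
```

On this toy instance `before` is positive, because the first coordinate interacts with the second, while `after` vanishes, because the fitness is purely additive in the new coordinates. This mirrors the 'Before' and 'After' columns, with full enumeration standing in for the sample set.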
In Section 5.2, the size of the sample set S in the NK-landscape experiment was $N^2$. Table 6 shows the epistases of S and of S' after the basis was changed, for the values of N and K used in the experiment. The 'Before' and 'After' results indicate the epistases of S and S', respectively. As in the variant-onemax experiment, it was confirmed that for every N and K, a lower epistasis value was obtained when a change of basis was applied. When N was 20, the epistasis was computed over all solutions rather than over a sample. When K was 3, 5, and 10, this epistasis was $3.24 \times 10^{-3}$, $3.38 \times 10^{-3}$, and $4.13 \times 10^{-3}$, respectively. These values are close to the respective epistases of S, $3.17 \times 10^{-3}$, $3.16 \times 10^{-3}$, and $4.28 \times 10^{-3}$.

Conclusions
In this paper, an epistasis-based evolutionary search method was proposed for estimating a basis that would simplify a particular problem. Two test problems were constructed, a basis was identified by estimating the epistasis, and the results before and after the basis change were compared. The epistasis-based basis estimation method was found to be far more time-efficient than a meta-GA. This was also found for the NK-landscape, in which the epistasis-based basis estimation method provided similar results. Thus, it is reasonable to estimate the basis by using the epistasis rather than the meta-GA.
To estimate the epistasis, sample sets of size $n^2$ or $n^3$ were used. A further study is therefore needed to determine an appropriate sample size. In addition, the basis was searched for using a simple GA. In the future, a study should be conducted to identify a better basis. Also, a higher-quality search could be performed by tuning the GA parameters, using other genetic operators, or applying the method shown in the appendix.
Furthermore, the experiment evaluated specific problems that could be simplified with a change of basis. In further research, it will be necessary to identify the characteristics of problems that could benefit from a change of basis. Note that the basis evaluation method is applicable not only to binary encoding, but also to k-ary encoding. In addition, it can be used to evaluate any vector space in which the epistasis can be calculated.

3. $S^{ij}_n = S^{ji}_n$ by the definition of $S^{ij}_n$. Now, consider the equation $S^{ij}_n A^{ji}_n = S^{ji}_n A^{ji}_n$. Note that the left side is $S^{ij}_n A^{ji}_n = A^{ij}_n A^{ji}_n A^{ij}_n A^{ji}_n = (A^{ij}_n A^{ji}_n)^2$, and that the right side is $S^{ji}_n A^{ji}_n = A^{ji}_n A^{ij}_n A^{ji}_n A^{ji}_n = A^{ji}_n A^{ij}_n (A^{ji}_n)^2 = A^{ji}_n A^{ij}_n$.

Data Availability
The data used to support the findings of this study are included within the article.

Conflicts of Interest
The authors declare that they have no conflicts of interest.