A Simple Fitness Function for Minimum Attribute Reduction

The goal of minimum attribute reduction is to find a minimum-cardinality subset R of the condition attribute set C such that R has the same classification quality as C. This problem is well known to be NP-hard. When only one minimum attribute reduction is required, the problem can be transformed into a nonlinearly constrained combinatorial optimization problem over a Boolean space and solved by heuristic search. In this setting, the fitness function is one of the keys to the problem: it must guarantee the equivalence between its optimal solution and a minimum attribute reduction. Unfortunately, the existing fitness functions either do not satisfy this equivalence or are too complicated. In this paper, a simpler fitness function based on the positive region is given. A theoretical proof shows that its optimal solution is equivalent to a minimum attribute reduction. Experimental results show that the proposed fitness function performs better than the existing fitness function for each algorithm tested.


Introduction
For a given dataset, attribute reduction is a fundamental problem in rough set theory as proposed by Pawlak and Slowinski [1]. Formally, minimum attribute reduction is a nonlinearly constrained combinatorial optimization problem whose objective function is the size of a candidate subset of attributes and whose constraints, expressed in terms of the classification quality of attribute subsets, are the conditions that an attribute reduction must meet. This problem has been proven to be NP-hard [2].
To tackle this problem, many strategies have been proposed in the literature over the past two decades. In general, methods for minimum attribute reduction fall into two categories: greedy (or hill-climbing) methods and stochastic methods. Greedy methods usually employ rough set attribute significance as heuristic knowledge. They start with an empty set or the attribute core and then apply forward selection or backward elimination. Hu and Cercone give a reduction algorithm that uses positive region-based attribute significance as the guiding heuristic [3]. Wang et al. develop a conditional information entropy-based reduction algorithm, using conditional entropy-based attribute significance [4]. Hu et al. compute the significance of an attribute using heuristic ideas from discernibility matrices and propose a heuristic reduction algorithm [5]. Susmaga considers both indiscernibility and discernibility relations in attribute reduction [6]. These methods are fast but are not guaranteed to find a minimum reduction.
Other researchers use stochastic methods for rough set attribute reduction. These are optimization methods and have a higher probability of finding a minimum reduct than greedy methods. Wroblewski combines a genetic algorithm with a greedy algorithm to generate short reducts; however, the approach uses highly time-consuming operations and cannot ensure that the resulting subset is really a reduct [7]. Building on Wroblewski's work, Bjorvand and Komorowski apply genetic algorithms to compute approximate reducts [8]; their algorithm introduces several variations and practical improvements in both speed and quality of approximation. Reduct generation algorithms based on genetic algorithms are reported to be quite efficient for rough set attribute reduction [9].
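The greedy forward-selection strategy described above can be sketched as follows. This is a simplified illustration that uses the size of the positive region as the significance measure; it is not the algorithm of [3], and the toy table, `ATTRS`, `pos_size`, and `greedy_forward` are hypothetical names introduced here.

```python
from collections import defaultdict

# Hypothetical toy decision table: columns a, b, c are condition
# attributes, the last column d is the decision (here d = a XOR b).
ATTRS = ["a", "b", "c"]
TABLE = [
    (0, 0, 0, 0),
    (0, 1, 1, 1),
    (1, 0, 1, 1),
    (1, 1, 1, 0),
]

def pos_size(rows, attrs):
    """Number of objects in the positive region of the decision w.r.t. attrs."""
    cols = [ATTRS.index(a) for a in attrs]
    decisions = defaultdict(set)  # condition values -> decisions seen
    counts = defaultdict(int)     # condition values -> number of objects
    for row in rows:
        key = tuple(row[c] for c in cols)
        decisions[key].add(row[-1])
        counts[key] += 1
    # An equivalence class is in the positive region iff it has one decision.
    return sum(counts[k] for k, seen in decisions.items() if len(seen) == 1)

def greedy_forward(rows):
    """Add the attribute with the largest positive region until the selected
    subset classifies as well as the full condition attribute set."""
    target = pos_size(rows, ATTRS)
    selected = []
    while pos_size(rows, selected) < target:
        remaining = [a for a in ATTRS if a not in selected]
        best = max(remaining, key=lambda a: pos_size(rows, selected + [a]))
        selected.append(best)
    return selected

result = greedy_forward(TABLE)
print(result, "vs. consistent subset ['a', 'b'] of size 2")
```

On this table the greedy pass first picks `c`, the only attribute with a nonempty positive region on its own, and ends with all three attributes, although `{a, b}` alone already preserves the positive region. This is the kind of outcome meant by the remark that greedy methods are not guaranteed to find a minimum reduction.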
Rough sets, however, can deal only with discrete attributes. A discretization method based on particle swarm optimization (PSO) is presented in [10]. Building on this work, a PSO-based algorithm for knowledge reduction in rough sets is proposed in [11]; it can solve some problems that the existing heuristic algorithms cannot. To improve efficiency, many researchers have continued to refine these algorithms. Santana-Quintero et al. present a multiobjective evolutionary algorithm that hybridizes particle swarm optimization with concepts from rough set theory [12]. The main idea is to combine the high convergence rate of PSO with a rough set-based local search that spreads the nondominated solutions found. Subsequently, Bello et al. proposed a two-step particle swarm optimization method for the feature selection problem that improves search efficiency [13]. Chi et al. presented a method for continuous attribute discretization based on a quantum PSO algorithm [14]. Hsieh and Horng presented a feature selection method based on an asynchronous discrete PSO search algorithm [15]. Other techniques have also been applied to attribute reduction, for example, ant colony optimization (ACO) [16] and support vector machines (SVM) [17].
When a minimum subset is required, stochastic methods may be used. In this case, the problem is first transformed into that of finding the maximum (or minimum) value of a fitness function, and a stochastic optimization method is then applied to the resulting fitness maximization (or minimization) problem. A common way to transform a constrained optimization problem into an unconstrained fitness optimization problem is to use penalty methods [18]. For such methods, designing a good fitness function is the most important task. To perform well, a fitness function should evaluate candidate solutions appropriately and guarantee optimality equivalence, meaning that an optimal solution of the fitness optimization problem corresponds to a minimum attribute reduction. Unfortunately, the existing fitness functions do not meet these requirements well, which degrades the performance of the related algorithms [19].
In this paper, a new fitness function is proposed. Compared with the existing fitness functions mentioned above, it takes fewer factors into account and overcomes their drawback. The experimental results show that, for each of the two tested algorithms, the proposed fitness function gives a higher probability of finding a minimum reduction than the function proposed in [19].
The rest of the paper is organized as follows. Section 2 presents some concepts of minimum attribute reduction and reviews and analyses a fitness function proposed in [19]. In Section 3, a new fitness function and its properties are presented. In Section 4, the experimental results and a comparative analysis are given. Finally, Section 5 concludes the paper.

Basic Notions and Related Works
In this section, we will review some basic notions in the theory of rough sets which are necessary for the description of the minimum attribute reduction problem.
A decision table can be represented as $T = \{U, A, V, f\}$, where $U = \{x_1, x_2, \ldots, x_m\}$ is a nonempty finite set of objects, $A = C \cup D$, where $C$ is a set of condition attributes and $D$ is a decision attribute set, $V$ is the domain of the attributes belonging to $A$, and $f : U \times A \to V$ is a function assigning attribute values to the objects in $U$.
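With any $P \subseteq A$, there is an associated indiscernibility relation $\mathrm{IND}(P)$:
$$\mathrm{IND}(P) = \{(x, y) \in U \times U : \forall a \in P,\ f(x, a) = f(y, a)\}.$$
Let $X \subseteq U$; the $P$-lower approximation of $X$ is defined as
$$\underline{P}X = \{x \in U : [x]_P \subseteq X\},$$
where $[x]_P$ denotes the equivalence class of $\mathrm{IND}(P)$ determined by object $x$. The notation $\mathrm{POS}_P(D)$ refers to the $P$-positive region, given by $\mathrm{POS}_P(D) = \bigcup_{X \in U/\mathrm{IND}(D)} \underline{P}X$. The $P$-approximation quality with respect to the decision attribute set $D$ is defined as $\gamma_P = |\mathrm{POS}_P(D)| / |U|$.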
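To make the notions of indiscernibility, positive region, and approximation quality concrete, here is a minimal sketch on a small hypothetical decision table. The table and the names `ATTRS`, `partition`, `positive_region`, and `gamma` are illustrative assumptions, not part of the paper.

```python
from collections import defaultdict

# Hypothetical toy decision table: condition attributes a, b, c, e,
# followed by the decision d in the last column.
ATTRS = ["a", "b", "c", "e"]
TABLE = [
    (0, 0, 0, 0, 0),
    (0, 1, 0, 1, 1),
    (1, 0, 1, 2, 1),
    (1, 1, 1, 0, 0),
    (0, 0, 1, 0, 0),
    (1, 1, 0, 0, 0),
]

def partition(rows, attrs):
    """Equivalence classes of the indiscernibility relation on attrs,
    returned as lists of object indices."""
    cols = [ATTRS.index(a) for a in attrs]
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[c] for c in cols)].append(i)
    return list(blocks.values())

def positive_region(rows, attrs):
    """Objects whose equivalence class lies inside a single decision class."""
    pos = set()
    for block in partition(rows, attrs):
        if len({rows[i][-1] for i in block}) == 1:
            pos.update(block)
    return pos

def gamma(rows, attrs):
    """Approximation quality: |positive region| / |U|."""
    return len(positive_region(rows, attrs)) / len(rows)

print(sorted(positive_region(TABLE, ["a", "c"])), gamma(TABLE, ["a", "c"]))
print(gamma(TABLE, ["a", "b"]), gamma(TABLE, ATTRS))
```

In this table the subset `{a, c}` leaves four of the six objects in mixed equivalence classes, whereas `{a, b}` reaches the same approximation quality as the full condition attribute set.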
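A decision table $T = \{U, A, V, f\}$ may have many attribute reductions; the set of reductions is defined as
$$\mathrm{Red} = \{R \subseteq C : \gamma_R = \gamma_C,\ \forall B \subset R,\ \gamma_B \neq \gamma_R\}.$$
Among the attribute reductions, one with minimum cardinality is sought. The minimum attribute reduction problem can be formulated as the following nonlinearly constrained combinatorial optimization problem:
$$\min |R| \quad \text{s.t. } R \subseteq C,\ \gamma_R = \gamma_C,\ \forall B \subset R,\ \gamma_B \neq \gamma_R. \tag{3}$$
Let $\mathrm{Red}_{\min}$ be the set defined as follows:
$$\mathrm{Red}_{\min} = \{R \in \mathrm{Red} : |R| \le |B| \ \text{for all } B \in \mathrm{Red}\}.$$
By the definition of $\mathrm{Red}_{\min}$, the following proposition is apparent.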

Proposition 1. Let $R \subset C$; then $R$ is an optimal solution of the optimization problem (3) if and only if $R$ belongs to $\mathrm{Red}_{\min}$, the set of reductions of minimum cardinality.
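For a very small table, problem (3) can be solved by exhaustive enumeration, which makes the statement of Proposition 1 easy to check by hand. The sketch below assumes a hypothetical toy table; `ATTRS`, `pos_size`, and `is_reduct` are illustrative names. Comparing positive-region sizes is used in place of comparing approximation qualities, since both are divided by the same number of objects.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy decision table: condition attributes a, b, c, e,
# followed by the decision d in the last column.
ATTRS = ["a", "b", "c", "e"]
TABLE = [
    (0, 0, 0, 0, 0),
    (0, 1, 0, 1, 1),
    (1, 0, 1, 2, 1),
    (1, 1, 1, 0, 0),
    (0, 0, 1, 0, 0),
    (1, 1, 0, 0, 0),
]

def pos_size(rows, attrs):
    """Number of objects in the positive region of the decision w.r.t. attrs."""
    cols = [ATTRS.index(a) for a in attrs]
    decisions = defaultdict(set)
    counts = defaultdict(int)
    for row in rows:
        key = tuple(row[c] for c in cols)
        decisions[key].add(row[-1])
        counts[key] += 1
    return sum(counts[k] for k, seen in decisions.items() if len(seen) == 1)

def is_reduct(rows, attrs):
    """attrs preserves the positive region of the full attribute set and
    no attribute can be dropped without losing that property."""
    target = pos_size(rows, ATTRS)
    if pos_size(rows, attrs) != target:
        return False
    # The positive region can only shrink when attributes are removed,
    # so checking single-attribute removals is sufficient.
    return all(
        pos_size(rows, [b for b in attrs if b != a]) != target for a in attrs
    )

# Enumerate all subsets of the condition attributes, smallest first.
reducts = [
    s
    for k in range(len(ATTRS) + 1)
    for s in combinations(ATTRS, k)
    if is_reduct(TABLE, s)
]
shortest = min(len(s) for s in reducts)
minimum_reducts = [s for s in reducts if len(s) == shortest]
print(reducts, minimum_reducts)
```

Here the table has two reductions of different sizes, and only the shorter one solves problem (3). Exhaustive search is exponential in the number of attributes, which is why heuristic and stochastic methods are used on realistic datasets.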
According to Proposition 1, each element of $\mathrm{Red}_{\min}$ corresponds to a minimum reduction. To solve problem (3), the most commonly used approach is to transform it into the following unconstrained maximization problem and to solve it by heuristic algorithms:
$$\max_{R \subseteq C} F(R),$$
where $F(R)$ is a fitness function of the attribute subset $R$. For this optimization problem, the equivalence of optimality between the minimum attribute reduction problem (3) and the maximization of the fitness function $F(R)$ must be guaranteed. Unfortunately, as discussed in [19], most fitness functions in the literature do not satisfy this requirement.
This suggests that the fitness function of [19] cannot distinguish between the two candidate subsets; that is to say, the two are treated as equivalent when the algorithm runs. This drawback reduces algorithmic performance. Moreover, in practical applications, since $F_1$ contains the classification quality $\gamma$, the value of the function $F_1(R)$ is not an integer, and calculation error during the search reduces the probability of finding a minimum reduction. Finally, the consistency of the decision table must be determined before the fitness value is calculated, which increases the computational complexity.

A New Function and Its Properties
In this section, we present a new fitness function. First, the relevant definitions are introduced.
By the definition of reduction, any consistent set $R$ (that is, any $R \subseteq C$ with $\gamma_R = \gamma_C$) either is a reduction itself or contains a reduction. If $R$ is minimal with respect to inclusion, then $R$ is a reduction of $T$. Clearly, every reduction is a consistent set. A nonlinearly constrained combinatorial optimization problem associated with the consistent sets is defined as
$$\min |R| \quad \text{s.t. } R \subseteq C,\ \gamma_R = \gamma_C. \tag{9}$$
Call a consistent set of smallest cardinality a minimum consistent set. By definition, the minimum consistent sets form a subset of the consistent sets, and the following proposition is apparent.

Proposition 4. Let $R \subset C$; then $R$ is a minimum consistent set (a consistent subset of $C$ of smallest cardinality) if and only if $R$ is an optimal solution of the optimization problem (9).
By Proposition 4, the minimum consistent sets form the solution set of the optimization problem (9). According to the previous definition, we present a new fitness function $F(R)$, defined for all $R \subseteq C$, where $n = |C|$. According to Proposition 5, the codomain of the function $F(R)$ can be divided into two disjoint sets $[\min F(R), n]$ and $(n, \max F(R)]$, where $\min F(R)$ and $\max F(R)$ are the minimum and maximum values of $F(R)$, respectively. By (2), the defect of the function proposed in [19] is avoided. By the definition of $F(R)$, a minimization problem is defined as follows:
$$\min_{R \subseteq C} F(R). \tag{12}$$

Theorem 6. Let $R \subset C$; then $R$ is a minimum consistent set if and only if $R$ is a minimum attribute reduction; that is, the minimum consistent sets are exactly the minimum attribute reductions.
Proof. Let $R$ be a minimum consistent set; then $R$ is a reduction by Proposition 3. If $R$ is not a minimum attribute reduction, then there is a minimum attribute reduction $B$ such that $|B| < |R|$.
On the other hand, since $B$ is an attribute reduction, $B$ is a consistent set. Because $R$ is a consistent set of smallest cardinality, $|B| \ge |R|$, a contradiction. The converse is proved in the same way.
By Theorem 6, all minimum consistent sets are minimum attribute reductions; then, according to Proposition 4, all the solutions of the optimization problem (9) are minimum attribute reductions.

Theorem 7. Every minimum consistent set $R \subset C$ is an optimal solution of the optimization problem (12).
Proof. We use proof by contradiction. Assume that there is a minimum consistent set $R$ that is not an optimal solution of the optimization problem (12). Then there exists $B \subseteq C$ such that $F(B) < F(R)$. Two cases for $B$ are then considered, and each leads to a contradiction.
From the above proof, all minimum consistent sets are optimal solutions of the optimization problem (12); that is to say, every minimum attribute reduction is an optimal solution of the optimization problem (12).

Theorem 8. Let $R \subset C$; if $R$ is an optimal solution of the optimization problem (12), then $R$ is a minimum consistent set.
Proof. Assuming the contrary leads to a contradiction, so $R$ is a minimum consistent set.
According to Theorem 8, every optimal solution of the optimization problem (12) is a minimum attribute reduction. Then, by Theorems 7 and 8, we obtain the following theorem.

Theorem 9. Let $R \subset C$; then $R$ is an optimal solution of the optimization problem (12) if and only if $R$ is a minimum consistent set.
By Theorem 9, the optimal solution of the optimization problem (12) is equivalent to the minimum attribute reduction. Together with Theorem 6, this gives the following theorem.

Theorem 10. Let $R \subset C$; then $R$ is a minimum attribute reduction if and only if $R$ is an optimal solution of the optimization problem (12).
By Theorem 10, minimizing the function proposed in this paper is equivalent to finding a minimum attribute reduction. A minimum attribute reduction can therefore be obtained by searching for the minimum value of the function, for which stochastic methods can be used.
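The equivalence stated in Theorem 10 can be checked by brute force on a small example. The sketch below assumes a penalty-style fitness of the form $|R| + (n + 1)\,(|\mathrm{POS}_C(D)| - |\mathrm{POS}_R(D)|)$ with $n = |C|$. This form is an assumption chosen only to match the properties described in this section (integer values, at most $n$ on consistent subsets and greater than $n$ otherwise); it is not necessarily the exact function defined above. The toy table and the names `ATTRS`, `pos_size`, and `fitness` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy decision table: condition attributes a, b, c, e,
# followed by the decision d in the last column.
ATTRS = ["a", "b", "c", "e"]
TABLE = [
    (0, 0, 0, 0, 0),
    (0, 1, 0, 1, 1),
    (1, 0, 1, 2, 1),
    (1, 1, 1, 0, 0),
    (0, 0, 1, 0, 0),
    (1, 1, 0, 0, 0),
]

def pos_size(rows, attrs):
    """Number of objects in the positive region of the decision w.r.t. attrs."""
    cols = [ATTRS.index(a) for a in attrs]
    decisions = defaultdict(set)
    counts = defaultdict(int)
    for row in rows:
        key = tuple(row[c] for c in cols)
        decisions[key].add(row[-1])
        counts[key] += 1
    return sum(counts[k] for k, seen in decisions.items() if len(seen) == 1)

n = len(ATTRS)
target = pos_size(TABLE, ATTRS)

def fitness(rows, attrs):
    """Assumed penalty form: subset size plus a penalty of (n + 1) for
    every object missing from the positive region."""
    return len(attrs) + (n + 1) * (target - pos_size(rows, attrs))

# Evaluate every subset of the condition attributes, smallest first.
scores = {
    s: fitness(TABLE, s)
    for k in range(n + 1)
    for s in combinations(ATTRS, k)
}
best = min(scores.values())
minimisers = [s for s, v in scores.items() if v == best]
print(best, minimisers)
```

On this table the only minimiser is the single-attribute subset `("e",)`, which is also the unique minimum reduction, and every consistent subset scores at most `n` while every inconsistent one scores more. A stochastic search such as PSO or GA would replace the exhaustive loop with sampled candidate subsets.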

Performance Comparison
To analyze and evaluate the effectiveness of the proposed fitness function, an experiment was designed following [19]. Two existing minimum attribute reduction algorithms based on different types of stochastic optimization technique are used in the comparison. The first is the particle swarm optimization-based attribute reduction algorithm proposed in [20], denoted by PSO; the other is a genetic algorithm-based attribute reduction algorithm presented in [7,8,16], denoted by GA. Both algorithms were implemented and tested on a number of datasets using two different fitness functions: the one proposed in [19] ($F_1$ for short) and the new one proposed in this paper ($F$ for short). Seven datasets were chosen from the UCI machine learning repository. Most of these datasets are commonly used for evaluating attribute reduction algorithms in the literature [7,8,11,14,16,17,19].
For each of the two fitness functions, both algorithms were run 50 times on each dataset with the same parameter settings. For each run, three values were recorded: the length of the output, whether the output is a reduction, and the run time. If an output is a reduction, it is called a normal output; otherwise, it is an unsuccessful output. If the length of a normal output is minimal, the output is called a successful output. Let STL denote the length of the successful output, AVL the average output length, and AVT the average run time over the 50 runs. The ratios of successful and normal outputs over the 50 runs are also reported.
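The per-dataset statistics described above can be computed from the run records roughly as follows. This is a sketch under stated assumptions: the `Run` record, the `summarise` helper, the dictionary keys, and the sample values are all hypothetical, the minimum reduction length is assumed to be known in advance, and AVL and AVT are assumed to be averaged over all runs.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class Run:
    length: int        # number of attributes in the output subset
    is_reduct: bool    # whether the output is a reduction (normal output)
    seconds: float     # run time

def summarise(runs, min_len):
    """Summary statistics for one algorithm, fitness function, and dataset.
    min_len is the known minimum reduction length (an assumption here)."""
    normal = [r for r in runs if r.is_reduct]
    successful = [r for r in normal if r.length == min_len]
    return {
        "STL": min_len if successful else None,
        "AVL": mean(r.length for r in runs),
        "AVT": mean(r.seconds for r in runs),
        "success_ratio": len(successful) / len(runs),
        "normal_ratio": len(normal) / len(runs),
    }

# Made-up records for four runs, for illustration only.
runs = [
    Run(2, True, 1.0),
    Run(3, True, 2.0),
    Run(2, True, 1.0),
    Run(3, False, 4.0),
]
summary = summarise(runs, min_len=2)
print(summary)
```

With 50 runs per configuration, as in the experiment, the same function would be applied once per algorithm, fitness function, and dataset.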
A PC running Windows 7 (32-bit) with a 2.1 × 2 GHz CPU and 2 GB of main memory was used to run both algorithms. Both algorithms were implemented in MATLAB. The parameter settings for both algorithms are shown in Table 1: the size of the population (in GA) or particle swarm (in PSO), the maximum allowed number of iterations (or generations), the upper bound on velocity needed in the PSO algorithm, the probabilities of mutation and crossover in GA, and the two learning coefficients in PSO. Tables 2 and 3 present the main performance results of PSO and GA using each of the two fitness functions.
From Table 2, for the PSO algorithm, both fitness functions have the same STL value, which means that both can output a minimum reduction. On AVL, the value for $F$ is no greater than that for $F_1$ except on the last dataset, which indicates that the outputs obtained with $F$ are more concentrated at STL. The same conclusion follows from the ratio of successful outputs: the proposed fitness function gives a higher probability of finding a minimum reduction. For AVT, all the values for $F$ are less than those for $F_1$. Therefore, the experimental data show that, with the PSO algorithm, the proposed fitness function is more efficient than $F_1$. The same conclusion can be drawn from Table 3 for the GA algorithm. These two experiments show that the proposed fitness function is more adequate than the fitness function of [19] on the tested datasets.

Conclusions
In this paper, to overcome the drawback of the existing fitness functions for the minimum attribute reduction problem, a simpler fitness function was proposed. Theoretical analysis and experimental results show that it ensures optimality equivalence and is more adequate than the existing fitness function.