Ant Colony Optimization with Three Stages for Independent Test Cost Attribute Reduction

Minimal test cost attribute reduction is an important problem in cost-sensitive learning. Recently, heuristic algorithms, including an information gain-based algorithm and a genetic algorithm, have been designed for this problem. However, in many cases these algorithms cannot find the optimal solution. In this paper, we develop an ant colony optimization algorithm to tackle this problem. The attribute set is represented as a graph in which each vertex corresponds to an attribute and the weight of each edge corresponds to its pheromone. Our algorithm contains three stages, namely, the addition stage, the deletion stage, and the filtration stage. In the addition stage, each ant starts from the initial position and traverses edges probabilistically until the stopping criterion is satisfied. The pheromone of the traveled path is also updated in this process. In the deletion stage, each ant deletes redundant attributes. Two strategies, called the centralized deletion strategy and the distributed deletion strategy, are proposed. Finally, in the filtration stage, the ant with the minimal test cost is selected to construct the reduct. Experimental results on UCI datasets indicate that the algorithm is significantly better than the information gain-based one. It also outperforms the genetic algorithm on the medium-sized dataset Mushroom.


Introduction
Cost-sensitive attribute reduction has gained much research interest in rough sets. Three types of costs have been considered, namely, test cost [1], misclassification cost [2], and delay cost [3,4]. Test cost is the money and/or time required to obtain attribute values. In real applications, only some of the tests are needed to retain enough information for classification. We would like to choose an attribute reduct [5] that minimizes the total test cost. This issue is called the minimal test cost reduct (MTR) problem [1].
Minimal test cost attribute reduction (MTR) is proposed to save resources, and the problem is meaningful in applications. The MTR problem generalizes the minimal reduct problem, which is NP-hard; hence the MTR problem is also NP-hard. Consequently, heuristic algorithms are needed to deal with it. Heuristic algorithms, including an information gain-based weighted reduction algorithm [1] and a genetic algorithm [6], have been designed for the MTR problem. Unfortunately, they often fail to find the optimal solution on medium-sized datasets. Although the competition approach [1] improves the performance by constructing a population of reducts, the results are still unsatisfactory. There is still room for more sophisticated algorithms based on other techniques.
In this paper, we develop an algorithm based on ant colony optimization for the MTR problem. The attribute set is represented as a complete graph with each vertex corresponding to one attribute. Then batches of ants are generated for attribute subset selection. Our algorithm contains three stages, namely, the addition stage, the deletion stage, and the filtration stage. First, in the addition stage, the core vertices are compressed into one vertex, which serves as the initial position of all ants. From the initial position, each ant selects the next vertex according to the test cost of each adjacent vertex and the pheromone of each adjacent edge. If the attribute vertices an ant has traveled satisfy the positive region condition, the ant stops; otherwise, it continues to add attributes. Second, in the deletion stage, redundant attributes are deleted from the obtained attribute subsets, and the pheromone of each edge is updated. Two strategies, centralized deletion and distributed deletion, are designed for this stage. Third, in the filtration stage, the ant with the least test cost is selected, and the attribute subset corresponding to its path is output as the result.
To evaluate the performance of the algorithms, we adopt three measures, namely, the finding optimal factor (FOF), maximal exceeding factor (MEF), and average exceeding factor (AEF) [1]. Experimental results indicate that our algorithm outperforms the information gain-based algorithm on most datasets with different test cost distributions. It obtains better results than the genetic algorithm except on some small datasets. One possible reason is that the ant colony optimization algorithm produces more diverse solutions than the existing ones. The distributed deletion strategy is superior to the centralized one on the medium-sized dataset Mushroom.
The rest of the paper is organized as follows. Section 2 reviews the basic concepts of rough sets and decision systems. Section 3 proposes the ant colony optimization algorithm to tackle the minimal test cost reduct problem. In Section 4, we present our experimental schemes and results, together with a brief analysis. Finally, Section 5 presents the conclusion.

Preliminaries
This section reviews basic knowledge, including test-cost-independent decision systems, the relative reduct, the minimal test cost reduct problem, the genetic algorithm, and ant colony optimization.

Test-Cost-Independent Decision Systems.
Most supervised learning approaches are based on decision systems. A decision system is often denoted by $S = (U, C, D, \{V_a \mid a \in C \cup D\}, \{I_a \mid a \in C \cup D\})$, where $U$ is a finite set of objects called the universe, $C$ is the set of conditional attributes, also called the set of tests, $D$ is the set of decision attributes, also called the decision, $V_a$ is the set of values for each $a \in C \cup D$, and $I_a: U \to V_a$ is an information function for each $a \in C \cup D$. We often denote $\{V_a \mid a \in C \cup D\}$ and $\{I_a \mid a \in C \cup D\}$ by $V$ and $I$, respectively. A decision system is often represented by a decision table, as shown in Table 1.
We consider the simplest, though most widely used, type of cost-sensitive decision system, defined as follows.
Definition 1 (see [7]). A test-cost-independent decision system (TCI-DS) $S$ is the 6-tuple $S = (U, C, D, V, I, c)$, where $U$, $C$, $D$, $V$, and $I$ have the same meanings as in a decision system and $c: C \to \mathbb{R}^+ \cup \{0\}$ is the test cost function. Test costs are independent of one another; that is, $c(B) = \sum_{a \in B} c(a)$ for any $B \subseteq C$. We usually use a vector $c = [c(a_1), c(a_2), \ldots, c(a_{|C|})]$ to represent the cost function. An exemplary cost vector is shown in Table 2. In fact, cost-sensitive decision systems generalize decision systems: if all elements in $c$ are 0, a TCI-DS coincides with a DS. For simplicity, free tests are not considered in this work. This is reasonable since obtaining data always incurs some cost.
Most existing reduct problems aim at finding a minimal description of the data. Their objectives include finding attribute subsets with the minimal size [5,19], the minimal space [20], or a covering with the minimal number of subsets [16]. Since test cost is the focus of this paper, we are interested in reducts with the minimal test cost. This type of reduct is defined as follows.
The set of all minimal test cost reducts is denoted by $MTR(S)$, and the problem of constructing $MTR(S)$ is called the minimal test cost reduct (MTR) problem. As indicated in [1], computing $MTR(S)$ has the same time complexity as computing $Red(S)$, the set of all reducts.

The Minimal Test Cost Reduct Problem.
Attribute reduction is a key issue in rough set research. The classical [5], covering-based [16,21], decision-theoretic [3], variable-precision [10], dominance-based [22], and neighborhood [23] rough set models address the reduction problem from different perspectives. A number of definitions of relative reducts exist [5,19,23,24] for different rough set models. This paper employs the definition based on the positive region.
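To make the positive-region definition concrete, the following Python sketch computes the positive region of a decision table, checks whether an attribute subset is a relative reduct, and finds minimal test cost reducts by exhaustive search. It assumes the standard Pawlak positive region (objects whose equivalence class under the chosen attributes has a single decision value); the toy table, cost vector, and function names are hypothetical and are not Tables 1 and 2. Exhaustive search is only feasible for a handful of attributes, which is why heuristics are needed for the MTR problem.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Optional, Sequence, Tuple

Row = Tuple[int, ...]


def positive_region(rows: Sequence[Row], decisions: Sequence[int],
                    attrs: FrozenSet[int]) -> FrozenSet[int]:
    """Objects whose equivalence class under `attrs` has a single decision value."""
    blocks: Dict[Row, List[int]] = {}
    order = sorted(attrs)
    for i, row in enumerate(rows):
        blocks.setdefault(tuple(row[a] for a in order), []).append(i)
    pos = set()
    for members in blocks.values():
        if len({decisions[i] for i in members}) == 1:
            pos.update(members)
    return frozenset(pos)


def is_reduct(rows, decisions, attrs: FrozenSet[int]) -> bool:
    """Positive-region reduct: keeps POS_C(D) and no attribute can be dropped."""
    full = positive_region(rows, decisions, frozenset(range(len(rows[0]))))
    if positive_region(rows, decisions, attrs) != full:
        return False
    # POS is monotone in the attribute set, so checking single removals suffices.
    return all(positive_region(rows, decisions, attrs - {a}) != full for a in attrs)


def minimal_test_cost_reducts(rows, decisions, costs: Sequence[int]):
    """Exhaustive search over all 2^|C| subsets; only feasible for tiny tables."""
    best_cost: Optional[int] = None
    best: List[FrozenSet[int]] = []
    for size in range(len(costs) + 1):
        for combo in combinations(range(len(costs)), size):
            subset = frozenset(combo)
            if not is_reduct(rows, decisions, subset):
                continue
            cost = sum(costs[a] for a in subset)  # c(B) = sum of c(a) over a in B
            if best_cost is None or cost < best_cost:
                best_cost, best = cost, [subset]
            elif cost == best_cost:
                best.append(subset)
    return best_cost, best


# Hypothetical toy table (not Table 1): four tests a0..a3 and a binary decision.
rows = [(0, 0, 0, 0), (1, 0, 1, 1), (0, 1, 1, 0), (1, 1, 0, 1)]
decisions = [0, 1, 0, 1]
costs = [30, 10, 15, 40]  # hypothetical test cost vector
print(minimal_test_cost_reducts(rows, decisions, costs))  # (25, [frozenset({1, 2})])
```

On this toy table, {a0} is the reduct with the fewest attributes, but {a1, a2} is cheaper (25 versus 30), which illustrates why minimal test cost reducts can differ from minimal reducts.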

The Genetic Algorithm.
In the computer science field of artificial intelligence, a genetic algorithm (GA) is a search heuristic that mimics the process of natural evolution [25]. This heuristic (sometimes also called a metaheuristic) is routinely used to generate useful solutions to optimization and search problems. Genetic algorithms belong to the larger class of evolutionary algorithms (EA), which generate solutions to optimization problems using techniques inspired by natural evolution, such as inheritance, mutation, selection, and crossover [25]. In [26], a genetic algorithm is employed to evolve cost-sensitive decision trees. Recently, the genetic algorithm has been employed to tackle the minimal test cost reduction problem [6] and attribute reduction with a test cost constraint [27].

The Ant Colony Optimization.
Swarm intelligence is a relatively new approach to problem solving that takes inspiration from the social behaviors of insects and other animals [28]. The ant colony optimization (ACO) algorithm is a probabilistic technique for solving computational problems that can be reduced to finding good paths through graphs. ACO algorithms are state of the art for the sequential ordering problem [29], the vehicle routing problem with time window constraints [30], the quadratic assignment problem [31], the arc-weighted l-cardinality tree problem [32], and the shortest common supersequence problem [33]. In rough sets, the classical attribute reduction problem has been tackled by ACO [34,35].

Evaluation Measures.
To evaluate the experimental results, we adopt the evaluation measures proposed in [1]. The new algorithm is compared with the information gain-based heuristic algorithm on four UCI datasets [36].
We need a measure to evaluate the quality of a particular reduct. Since an algorithm can run on many datasets, or on one dataset with different test cost settings, we adopt three metrics from a statistical viewpoint: the finding optimal factor (FOF), maximal exceeding factor (MEF), and average exceeding factor (AEF) [1].

Finding Optimal Factor.
Let the number of experiments be $K$ and the number of successful searches for an optimal reduct be $k$. The finding optimal factor is defined as
$$op = \frac{k}{K}. \quad (2)$$

Exceeding Factor.
For a dataset with a particular test cost setting, let $R'$ be an optimal reduct. The exceeding factor of a reduct $R$ is
$$ef(R) = \frac{c(R) - c(R')}{c(R')}. \quad (3)$$
The exceeding factor provides a quantitative metric to evaluate the performance of a reduct. It indicates how bad a reduct is when it is not optimal. Naturally, if $R$ is an optimal reduct, the exceeding factor is 0. To demonstrate the performance of an algorithm, statistical metrics are needed. Let the number of experiments be $K$. In the $i$th experiment ($1 \le i \le K$), the reduct computed by the algorithm is denoted by $R_i$. The maximal exceeding factor is defined as
$$mef = \max_{1 \le i \le K} ef(R_i). \quad (4)$$
This shows the worst case of the algorithm on a given dataset. Although it relates to the performance of one particular reduct, it should be viewed as a statistical rather than an individual metric. The average exceeding factor is defined as
$$aef = \frac{1}{K} \sum_{i=1}^{K} ef(R_i). \quad (5)$$
Since it is averaged over $K$ different test-cost-sensitive decision systems, it shows the overall performance of the algorithm from a statistical perspective.
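The three measures can be computed from the test costs of the reducts found in $K$ experiments and the corresponding optimal costs. The sketch below is a minimal illustration, assuming integer test costs and treating a search as successful when its reduct has the optimal test cost; the function names and sample numbers are hypothetical.

```python
from typing import Sequence, Tuple


def exceeding_factor(cost: float, optimal_cost: float) -> float:
    """ef(R) = (c(R) - c(R')) / c(R'), Equation (3)."""
    return (cost - optimal_cost) / optimal_cost


def evaluate(found_costs: Sequence[int],
             optimal_costs: Sequence[int]) -> Tuple[float, float, float]:
    """Return (op, mef, aef) over K experiments, Equations (2), (4), and (5)."""
    k_total = len(found_costs)
    factors = [exceeding_factor(c, o) for c, o in zip(found_costs, optimal_costs)]
    op = sum(1 for f in factors if f == 0) / k_total  # success means ef = 0
    mef = max(factors)                                # worst case
    aef = sum(factors) / k_total                      # average case
    return op, mef, aef


# Hypothetical costs from four experiments: found reducts vs. optimal reducts.
print(evaluate([25, 30, 25, 50], [25, 25, 25, 40]))
```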

The Algorithm
In this section, we first revise the classical ant colony optimization and then present our ant colony optimization with different techniques to tackle the minimal test cost reduct problem.

The Problem Representation and Algorithm Framework.
In order to apply ant colony optimization to tackle the minimal test cost reduction problem, we adopt the following model.
Graph. The decision system is represented as a graph.
Vertex. An attribute is represented as a vertex. Each attribute vertex carries its test cost.
Edge. There is an edge between any two vertices. Each edge carries its pheromone density.
Adjacency Matrix. The entries of the matrix represent the pheromone density of each edge.
Because our objective is attribute reduction, rather than classification that produces rule sets, we do not adopt a tree structure such as the ant colony decision tree. In this section, we employ the first and simplest ant colony optimization algorithm, Ant System (AS). The general algorithm framework is as follows.
Stage 1 (addition stage). Compute the core of the dataset. Several batches of ants are created, each containing the same number of ants, so the total number of ants is the number of batches times the batch size. Each ant takes the core as its starting position. From the initial position, each ant traverses edges probabilistically until the stopping criterion is satisfied.
Stage 2 (deletion stage). Delete redundant attributes and update the pheromone of the traveled path.

Stage 3 (filtration stage).
Gather the attribute subsets obtained by the ants and compute their test costs. Choose the attribute subset with the minimal test cost and output it as the result.
Throughout the paper, the stopping criterion is the positive region condition. The selection probability depends on the test cost of each adjacent attribute vertex and the pheromone of each adjacent edge. The probabilistic transition rule is
$$p_{ij}^{k} = \frac{\tau_{ij}^{\alpha}\,\eta_{ij}^{\beta}}{\sum_{l \in N_i^k} \tau_{il}^{\alpha}\,\eta_{il}^{\beta}}, \quad j \in N_i^k, \quad (6)$$
where $k$ is the index of the ant, $\eta_{ij} = 1000/c(a_j)$ is the heuristic information, $\tau_{ij}$ is the pheromone density of the edge $(i, j)$, $\alpha$ is the exponent of $\tau_{ij}$, $\beta$ is the exponent of $\eta_{ij}$, and $N_i^k$ is the set of all unvisited vertices adjacent to attribute $i$.
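A minimal Python sketch of Equation (6) follows. It assumes that the pheromone matrix is a nested list indexed by vertex and that each candidate's heuristic value is $\eta_{ij} = 1000/c(a_j)$ as above; the costs and the uniform initial pheromone of 1.0 are hypothetical values chosen for illustration, not the costs of Table 2.

```python
from typing import Dict, Sequence, Set


def transition_probabilities(current: int, unvisited: Set[int],
                             pheromone: Sequence[Sequence[float]],
                             costs: Sequence[float],
                             alpha: float, beta: float) -> Dict[int, float]:
    """Equation (6): p_ij proportional to tau_ij^alpha * eta_ij^beta over unvisited j."""
    weights = {}
    for j in sorted(unvisited):
        eta = 1000.0 / costs[j]  # heuristic information: cheaper tests score higher
        weights[j] = pheromone[current][j] ** alpha * eta ** beta
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}


# Hypothetical setting: uniform initial pheromone 1.0 and costs for attributes 1..3.
tau = [[1.0] * 4 for _ in range(4)]
costs = [0.0, 20.0, 40.0, 10.0]  # index 0 is the current vertex; its cost is unused
p = transition_probabilities(0, {1, 2, 3}, tau, costs, alpha=1.0, beta=2.0)
print(p)
```

With equal pheromone, the probabilities are proportional to $c(a_j)^{-\beta}$, so the cheapest test is the most likely choice; here the ratio is 16 : 4 : 1 for attributes 3, 1, and 2.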
The centralized and distributed deletion strategies differ in when redundant attributes are deleted. With the centralized deletion strategy, redundant attributes are deleted after all ants have finished their journeys. With the distributed deletion strategy, the algorithm deletes redundant attributes as soon as each ant has traveled its path, and then adjusts the path. In this case, the pheromone of edges adjoining redundant attribute vertices is not updated. The adjusting method is illustrated in Figure 2. Monte Carlo methods are a broad class of computational algorithms that rely on repeated random sampling to obtain numerical results; the name alludes to the repeated games of chance played in a casino. We employ a Monte Carlo method to simulate how an artificial ant probabilistically selects the next attribute vertex.
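The Monte Carlo selection step can be sketched as a roulette-wheel draw: one uniform random number is compared with the cumulative selection probabilities. This is one common way to sample from the distribution in Equation (6), assumed here rather than taken from the paper; the function name and the fixed seed are hypothetical.

```python
import random
from typing import Dict


def roulette_select(probabilities: Dict[int, float], rng: random.Random) -> int:
    """Draw one vertex according to the given probabilities (one Monte Carlo trial)."""
    r = rng.random()  # uniform in [0, 1)
    cumulative = 0.0
    chosen = None
    for vertex, p in probabilities.items():
        cumulative += p
        chosen = vertex
        if r < cumulative:
            break
    return chosen  # the last vertex also absorbs floating-point round-off


rng = random.Random(42)
draws = [roulette_select({3: 0.75, 1: 0.25}, rng) for _ in range(10_000)]
print(draws.count(3) / len(draws))  # close to 0.75
```

Python's `random.choices` with a `weights` argument performs the same kind of weighted draw and could be used instead.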

The ACO with Distributed Deletion.
We present the main algorithm of the ant colony optimization with the distributed deletion strategy as Algorithm 1. The algorithm with the centralized deletion strategy is similar.
We explain the algorithm in detail as follows. Lines 1 through 8 correspond to Stage 1 in the algorithm framework. In lines 2 through 4, the batches of ants are created, each containing the same number of ants. Each ant starts from the core, as indicated by line 5. Lines 9 through 21 represent Stage 2: ants select vertices until all ants in the batch meet the positive region condition. Whenever an ant stops, it removes the redundant attributes and adjusts the traveled path. In line 18, the ants release pheromone onto the adjusted paths. After all ants finish their journeys, the best reduct is selected, as shown in lines 22 through 27; these lines correspond to Stage 3. If we adopt the centralized deletion strategy, the algorithm deletes redundant attributes after all ants finish their journeys.
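Since the listing of Algorithm 1 is not reproduced here, the following Python sketch shows one way the three stages with distributed deletion could fit together. Several details are assumptions rather than the paper's specification: the compressed core is modeled as an extra start vertex, all pheromone values start at 1.0, redundant attributes are tried in decreasing order of test cost, and edges are treated as undirected. The toy table and cost vector are hypothetical.

```python
import random
from typing import Dict, FrozenSet, List, Optional, Sequence, Tuple


def pos(rows, dec, attrs) -> FrozenSet[int]:
    """Positive region: objects whose block under `attrs` has one decision value."""
    blocks: Dict[tuple, List[int]] = {}
    order = sorted(attrs)
    for i, row in enumerate(rows):
        blocks.setdefault(tuple(row[a] for a in order), []).append(i)
    return frozenset(i for members in blocks.values()
                     if len({dec[j] for j in members}) == 1 for i in members)


def aco_distributed(rows, dec, costs: Sequence[int], batches: int = 3,
                    batch_size: int = 10, alpha: float = 1.0, beta: float = 2.0,
                    seed: int = 0) -> Tuple[Optional[FrozenSet[int]], float]:
    """ACO with distributed deletion: each ant prunes its subset as soon as it stops."""
    rng = random.Random(seed)
    n = len(costs)
    start = n  # extra vertex standing for the compressed core (assumption)
    everything = frozenset(range(n))
    full = pos(rows, dec, everything)
    core = {a for a in range(n) if pos(rows, dec, everything - {a}) != full}
    tau = [[1.0] * (n + 1) for _ in range(n + 1)]  # assumed initial pheromone
    best: Optional[FrozenSet[int]] = None
    best_cost = float("inf")
    for _ in range(batches):
        for _ in range(batch_size):
            visited, path, cur = set(core), [start], start
            # Stage 1 (addition): walk until the positive region condition holds.
            while pos(rows, dec, visited) != full:
                cand = [j for j in range(n) if j not in visited]
                w = [tau[cur][j] ** alpha * (1000.0 / costs[j]) ** beta for j in cand]
                cur = rng.choices(cand, weights=w)[0]  # Monte Carlo selection
                visited.add(cur)
                path.append(cur)
            # Stage 2 (distributed deletion): try the costliest attribute first.
            for a in sorted(visited - core, key=lambda x: -costs[x]):
                if pos(rows, dec, visited - {a}) == full:
                    visited.discard(a)
            path = [v for v in path if v == start or v in visited]  # adjust path
            for u, v in zip(path, path[1:]):  # +1 pheromone per traversed edge
                tau[u][v] += 1.0
                tau[v][u] += 1.0
            # Stage 3 (filtration): keep the cheapest reduct seen so far.
            cost = sum(costs[a] for a in visited)
            if cost < best_cost:
                best, best_cost = frozenset(visited), cost
    return best, best_cost


# Hypothetical toy table and costs (not Tables 1 and 2).
rows = [(0, 0, 0, 0), (1, 0, 1, 1), (0, 1, 1, 0), (1, 1, 0, 1)]
dec = [0, 1, 0, 1]
costs = [30, 10, 15, 40]
print(aco_distributed(rows, dec, costs))
```

Moving the deletion loop and the pheromone update after the loop over all ants of a batch would give, roughly, a centralized-deletion variant of this sketch.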

A Running Example.
The TCI-DS is given by Tables 1 and 2. We illustrate the process of an ant in Figure 1, where the attributes Headache, Muscle pain, Temperature, and Snivel are represented by 0, 1, 2, and 3, respectively.
After an ant travels a path, the redundant attributes are removed. After deletion, the algorithm reconstructs the path. This adjusting process is represented in Figure 2.

Stage 1 (addition stage)
We compute the selection probabilities $p_{01}$, $p_{02}$, and $p_{03}$ of the three candidate attributes using Equation (6). Because $p_{03} > p_{01} > p_{02}$, the ant will select attribute 3 as the next feature with high probability; since the selection is probabilistic, this choice is not guaranteed. In this case, we assume that the ant chooses the feature Snivel and adds it to its visited attribute set. The set of attributes the ant has visited does not yet satisfy the positive region condition, so the ant continues to select the next attribute.
Moreover, $p_{31} > p_{32}$, so the artificial ant[0] will most likely select the attribute Muscle pain.

Stage 1.3 (pheromone updating).
After an artificial ant stops, it updates the pheromone density of the edges it has traveled. In our algorithm, each time an edge is traversed by an ant, its pheromone density increases by one. The pheromone updating rule could be designed in other ways; we adopt this simple rule in this paper.
In this example, ant[0] selects the attributes in the sequence Headache, Snivel, and Muscle pain. The pheromone of edge (0, 3) and edge (3, 1) therefore increases by one.
Each artificial ant goes through the above three steps.
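The pheromone rule of this example, adding one to every traversed edge, can be sketched as follows. The sketch assumes a symmetric pheromone matrix with a hypothetical initial value of 1.0 on every edge; the function name is illustrative.

```python
from typing import List, Sequence


def deposit(pheromone: List[List[float]], path: Sequence[int], amount: float = 1.0) -> None:
    """Add `amount` to every edge traversed by the ant (edges treated as undirected)."""
    for u, v in zip(path, path[1:]):
        pheromone[u][v] += amount
        pheromone[v][u] += amount


tau = [[1.0] * 4 for _ in range(4)]  # hypothetical initial pheromone of 1.0
deposit(tau, [0, 3, 1])  # ant[0]: Headache -> Snivel -> Muscle pain
print(tau[0][3], tau[3][1], tau[0][1])  # 2.0 2.0 1.0
```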
Stage 2 (deletion stage).
After all ants have stopped, the algorithm deletes redundant attributes from the attribute sets selected by the last few ants, using the positive region: if removing an attribute does not change the positive region, the attribute is removed.
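Redundant-attribute deletion can be sketched as a backward elimination over each selected subset. The order in which attributes are tried is not specified in the text; this sketch assumes the costliest attribute is tried first, and it takes the positive region condition as a caller-supplied predicate (here a hypothetical one for a toy table).

```python
from typing import Callable, FrozenSet, Iterable, List, Sequence


def remove_redundant(subset: Iterable[int], costs: Sequence[int],
                     keeps_pos: Callable[[FrozenSet[int]], bool]) -> FrozenSet[int]:
    """Drop attributes whose removal keeps the positive region (costliest tried first)."""
    kept = set(subset)
    for a in sorted(kept, key=lambda x: -costs[x]):  # sorted() copies, so discard is safe
        if keeps_pos(frozenset(kept - {a})):
            kept.discard(a)
    return frozenset(kept)


def centralized_deletion(subsets, costs, keeps_pos) -> List[FrozenSet[int]]:
    """Centralized strategy: prune every ant's subset only after all ants stop."""
    return [remove_redundant(s, costs, keeps_pos) for s in subsets]


# Hypothetical positive region condition for a toy table:
# B keeps POS_C(D) iff it contains {0}, {3}, or {1, 2}.
def toy_keeps_pos(b: FrozenSet[int]) -> bool:
    return 0 in b or 3 in b or {1, 2} <= b


costs = [30, 10, 15, 40]  # hypothetical test costs
print(centralized_deletion([{0, 1, 2}, {1, 2, 3}, {0, 1}], costs, toy_keeps_pos))
```

Because the positive region is monotone in the attribute set, an attribute kept in an earlier step stays necessary after later deletions, so each returned subset is a reduct.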

Stage 3 (filtration stage).
After deletion, we choose the reduct with the minimal test cost among those obtained by the last few ants as the result.
In this example, we adopt the centralized deletion strategy. Figure 2 illustrates the distributed deletion strategy in one iteration. In the figure, the ant travels the path 0, 1, 2, 3. Suppose attribute 1 is redundant; we remove it from the attribute subset and adjust the path. The adjusting process is shown in Figure 2(c).
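The path adjustment of Figure 2 can be sketched as removing the redundant vertices and connecting each remaining vertex directly to its successor. This reconnection rule is an assumption consistent with the description of Figure 2(c); the function name is illustrative.

```python
from typing import List, Sequence, Set, Tuple


def adjust_path(path: Sequence[int],
                redundant: Set[int]) -> Tuple[List[int], List[Tuple[int, int]]]:
    """Remove redundant vertices and reconnect each remaining vertex to its successor."""
    kept = [v for v in path if v not in redundant]
    return kept, list(zip(kept, kept[1:]))


print(adjust_path([0, 1, 2, 3], {1}))  # ([0, 2, 3], [(0, 2), (2, 3)])
```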

Experiments
In this section, we try to answer the following questions by experimentation.
(1) How does the number of ants in the ant colony influence the result?
(2) How does the strategy of deleting redundant attributes influence the quality of the result?
(3) Does our ant colony optimization algorithm outperform the existing ones?
The UCI datasets we use are Zoo, Voting, Tic-tac-toe, and Mushroom. Since these datasets have no test cost settings, we apply three common distributions, namely, the uniform, normal, and Pareto distributions, to generate random test costs for statistical purposes. In this paper, each test cost is a random integer ranging from 1 to 100. The exponent in (6) is set to 2, since this value is a good setting in many circumstances [28].
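A sketch of how integer test costs in [1, 100] might be drawn from the three distributions is given below. The text does not state the distribution parameters, so the normal mean and standard deviation and the Pareto shape and scale are hypothetical, and out-of-range values are simply clipped.

```python
import random
from typing import List


def generate_test_costs(n: int, distribution: str, rng: random.Random,
                        low: int = 1, high: int = 100) -> List[int]:
    """Random integer test costs in [low, high] from one of three distributions."""
    costs = []
    for _ in range(n):
        if distribution == "uniform":
            x = rng.randint(low, high)  # inclusive on both ends
        elif distribution == "normal":
            x = round(rng.gauss(50.0, 15.0))  # hypothetical mean and deviation
        elif distribution == "pareto":
            x = round(10.0 * rng.paretovariate(2.0))  # hypothetical scale and shape
        else:
            raise ValueError(f"unknown distribution: {distribution}")
        costs.append(min(high, max(low, x)))  # clip into the allowed range
    return costs


rng = random.Random(1)
for name in ("uniform", "normal", "pareto"):
    print(name, generate_test_costs(8, name, rng))
```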
We do not design a parameter learning mechanism. Instead, the competition approach we employ provides a mechanism for selecting the parameter: different applications use the same range of values, and users do not need to specify the value of the parameter. This strategy is more straightforward than many other parameter tuning strategies, although other strategies could be designed to save time. The finding optimal factor, maximal exceeding factor, and average exceeding factor are employed to measure the effectiveness of the algorithms. We run each algorithm 1000 times on each dataset, so that the results can be analyzed statistically.

The Influence of Ant Counts on Experiment Results
In order to find the relationship between the number of ants and the quality of the result, we conduct the following experiment. We run our algorithm with 100, 150, 200, ..., 400 ants. In this section, the number of experiments is set to 100, and the exponent in (6) is set to 2. Results are shown in Tables 3 and 4.

Comparison with Existing Heuristic Algorithms.
According to the results of the above experiments, 100 is the best setting for the number of ants. We conduct an empirical study to examine the effectiveness of our algorithm. To improve the performance, we use the competition approach [1] to enhance our algorithm, with the exponent in (6) set to each integer from 1 to 4. Table 5 compares the information gain-based algorithm, GA-1, GA-2, ACO with centralized deletion, and ACO with distributed deletion. For clarity, the results on dataset Mushroom are shown in Figure 5.
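The competition approach used here can be sketched as running the algorithm once for each exponent value and keeping the reduct with the lowest test cost. In the sketch, `run` is a hypothetical stand-in for one ACO run with a given exponent, and the recorded outcomes are made up for illustration.

```python
from typing import Callable, FrozenSet, Iterable, Optional, Tuple

Result = Tuple[FrozenSet[int], int]  # (reduct, test cost)


def competition(run: Callable[[int], Result],
                exponents: Iterable[int] = range(1, 5)) -> Result:
    """Run once per exponent value and keep the cheapest reduct (first one on ties)."""
    best: Optional[Result] = None
    for e in exponents:
        result = run(e)
        if best is None or result[1] < best[1]:
            best = result
    if best is None:
        raise ValueError("at least one exponent value is required")
    return best


# Hypothetical outcomes of four runs, keyed by exponent value.
fake_runs = {1: (frozenset({0}), 30), 2: (frozenset({1, 2}), 25),
             3: (frozenset({3}), 40), 4: (frozenset({1, 2}), 25)}
print(competition(lambda e: fake_runs[e]))  # (frozenset({1, 2}), 25)
```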
Our algorithm is tested 1000 times on each UCI dataset. The new algorithm with different techniques is compared with the information gain-based algorithm [1] and the genetic algorithm [6]. Experimental results are listed in Figures 3, 4, and 5 and Tables 3, 4, and 5.

Experimental Results.
Now we can answer the questions proposed at the beginning of this section.
(1) Experimental results indicate that the performance of the algorithm becomes worse as the number of ants increases. We find that 100 is the best setting for the number of ants among those tested. Therefore, we run our algorithm with 100 ants when comparing it with the existing heuristic algorithms.

Conclusions
In this paper, we have pointed out the shortcomings of the existing heuristic algorithms, including the information gain-based one and the genetic one. We have designed a new algorithm based on ant colony optimization to tackle the MTR problem. For the deletion stage, we have proposed two strategies: centralized deletion and distributed deletion. We have tested the new algorithm with three representative test cost distributions on four UCI datasets. Experimental results show that the new algorithm outperforms the existing ones significantly, especially on medium-sized datasets such as Mushroom. Moreover, when the test cost distribution is normal, the distributed deletion strategy obtains better results than the centralized one on the Mushroom dataset.

Figure 2: (a) The path an ant traveled; (b) deletion of the redundant attribute; (c) reconstruction of the path.

Table 1: An exemplary decision table.

Table 2: The test cost vector.

Table 3: The finding optimal factor of ACO with different numbers of ants, without the competition approach, using centralized deletion.

Table 4: The finding optimal factor of ACO with different numbers of ants, without the competition approach, using distributed deletion.

Table 5: Results of the information gain-based algorithm and the test-cost-based ant colony optimization with centralized deletion and distributed deletion.