Research Article Adjustable Fuzzy Rough Reduction: A Nested Strategy

As a crucial extension of Pawlak's rough set, the fuzzy rough set has been successfully applied in real-valued attribute reduction. Nevertheless, the traditional fuzzy rough set lacks adjustability due to its fixed maximum and minimum operators. It follows that the associated measure for attribute evaluation is not always appropriate. To alleviate these problems, a novel adjustable fuzzy rough set model is presented and further introduced into parameterized attribute reduction. Additionally, the inner relationship between the appointed parameter and the reduct is discovered, and thereby a nested mechanism is adopted to accelerate the search for a reduct. Experiments demonstrate that the proposed heuristic algorithm can offer more stable reducts with higher computational efficiency compared with traditional approaches.


Introduction
Rough set theory [1] is an effective mathematical tool to qualitatively and quantitatively describe uncertain information in data. Due to this characteristic, it has been frequently applied in attribute reduction [2][3][4][5][6][7][8][9], which aims to select a condition attribute subset that retains the discernibility of the original data. It should be pointed out that among existing attribute reduction methods, fuzzy rough set-based ones [10][11][12][13][14][15] are widely used to handle indiscernibility and fuzziness in data with real-valued condition attributes.
Up to now, attribute reduction with the fuzzy rough set has been studied extensively, leading to a large variety of algorithms for computing reducts [16][17][18][19][20]. Actually, the core of these algorithms lies in the fuzzy relation induced from the similarity between every two samples. Generally, a method is regarded as favorable if it empirically performs well on the attribute evaluation used for identifying the discernibility power of candidate attributes. For example, Hu et al. [21] introduced Gaussian kernels to acquire a T-fuzzy equivalence relation and gave a more effective approach for finding reducts; Hu et al. [22] also leveraged the k-nearest neighbor rule to develop a robust attribute reduction method; Ye et al. [23] redefined two types of reflexive fuzzy-neighborhood operators to evaluate attributes appropriately. A careful review of these methods, unfortunately, shows that they lack adjustability. Consequently, no matter how the analytical requirements change, the characterizations of fuzzy rough sets remain fixed. Such a case may result in inflexible attribute evaluation, as with the commonly used measure, i.e., approximate quality, which can further cause undesirable generalization performance of the final reduct.
To fill such a gap, a novel adjustable parameterized fuzzy rough set model is presented first, and then a nested strategy is introduced into the process of searching for reducts. Experimental results demonstrate that our proposal can offer more stable reducts with higher computational efficiency compared with traditional approaches. The main contributions of this study are outlined as follows: (1) An adjustable fuzzy rough set model is constructed.
Two special fuzzy relations, referred to as the strong fuzzy relation and the weak fuzzy relation, are developed and discussed. Based on these two relations, a parameterized binary operator is reported and then introduced into the construction of an adjustable fuzzy relation, which can be tuned by the setting of the parameter. Additionally, we suggest some properties of this fuzzy relation. (2) The nested mechanism is adopted to perform attribute reduction. Our algorithm consists of two aspects. One is attribute evaluation: on the basis of the adjustable fuzzy rough set model, the corresponding approximate quality is presented to measure whether a candidate attribute is qualified or not. The other is the searching strategy: considering the appointed parameter in the designed model, the inner relationship between the parameters and the reducts is discovered, and three types of nested reduction, defined as forward nested reduction, reverse nested reduction, and weakly nested reduction, are discussed to find the optimal or near-optimal reduct.

The rest of this paper is organized as follows. Section 2 provides some background materials on the fuzzy rough set model and its attribute reduction. Section 3 introduces the proposed adjustable fuzzy rough set model and then presents the nested strategy-based attribute reduction method. Section 4 describes the data sets, evaluation metrics, and experimental settings and then analyzes the results of comparative studies on 6 UCI data sets. Finally, Section 5 summarizes the paper and sets up several issues for future work.

Fuzzy Rough Set
Model. Formally, a decision system DS can be considered as the 4-tuple < U, C ∪ D, V, f >, in which U = {x_1, x_2, ..., x_n} is a nonempty finite set of n samples called the universe of discourse; C = {a_1, a_2, ..., a_m} is a nonempty finite set of condition attributes aimed at characterizing the samples; D is the set of decision attributes; ∀a ∈ C ∪ D, V_a is the domain of attribute a; and f: U × (C ∪ D) ⟶ V is an information function.
Let U ≠ ∅ be a universe of discourse. F: U ⟶ [0, 1] is a fuzzy set [24] on U, and F (x) ∈ [0, 1] is the membership of x in F. F (U) denotes the set of all fuzzy sets on U. A fuzzy subset R ∈ F (U × U) is a fuzzy relation, and (U, R) is referred to as a fuzzy approximation space. A fuzzy relation R may own the following properties: (1) R is serial if and only if ∀x ∈ U, ∃y ∈ U, R (x, y) = 1 holds; (2) R is reflexive if and only if ∀x ∈ U, R (x, x) = 1 holds; (3) R is symmetric if and only if ∀x, y ∈ U, R (x, y) = R (y, x) holds; (4) R is transitive if and only if ∀x, z ∈ U, sup_{y ∈ U} min (R (x, y), R (y, z)) ≤ R (x, z) holds. The fuzzy relation R discussed in this paper is at least reflexive.

Definition 1. Let < U, C ∪ D, V, f > be a decision system. ∀a_t ∈ C, the fuzzy relation induced by attribute a_t can be represented by the matrix

M (R_t) = [r_ij]_{n×n},

where n is the number of samples in U and r_ij ∈ [0, 1] is the similarity between x_i and x_j.

By the fuzzy relation in Definition 1, we can have the following.

Definition 2 (see [24]). Let U ≠ ∅ be a universe of discourse and R a fuzzy relation on U. ∀F ∈ F (U), the fuzzy lower and upper approximations of F in the fuzzy approximation space (U, R) are denoted by R(F) and R̄(F), respectively. ∀x ∈ U, the memberships that x belongs to R(F) and R̄(F) are defined as

R(F) (x) = inf_{y ∈ U} max (1 − R (x, y), F (y)),
R̄(F) (x) = sup_{y ∈ U} min (R (x, y), F (y)),

where R (x, y) is the similarity between x and y based on the fuzzy relation R. The pair [R(F), R̄(F)] is referred to as a fuzzy rough set of F.
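The lower and upper approximations of Definition 2 can be sketched directly; the following is an illustrative Python sketch (not the paper's MATLAB implementation), where the similarity matrix R and the fuzzy set F are toy inputs:

```python
def fuzzy_lower(R, F):
    # lower approximation membership: inf over y of max(1 - R(x, y), F(y))
    n = len(F)
    return [min(max(1 - R[x][y], F[y]) for y in range(n)) for x in range(n)]

def fuzzy_upper(R, F):
    # upper approximation membership: sup over y of min(R(x, y), F(y))
    n = len(F)
    return [max(min(R[x][y], F[y]) for y in range(n)) for x in range(n)]
```

For a reflexive R, every sample belongs to the upper approximation of its own class at least as strongly as to the lower one, which matches the usual rough set intuition.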

Attribute Reduction in Fuzzy Rough
Set. Attribute reduction is an effective way to eliminate irrelevant and redundant information in a decision system. It aims to find a minimal subset of condition attributes that keeps a specified measure of discernibility power invariant. The common measures of discernibility power include the dependency function [25][26][27], information entropy [28,29], and monotonic measures [30]. For simplicity, we take approximate quality as an example in this subsection.

Definition 3. Let < U, C ∪ D, V, f > be a decision system, ∀B ⊆ C, R_B is the fuzzy relation generated by the set of attributes B, and U|IND (D) = {d_1, d_2, ..., d_p} is the partition induced by the decision D. The approximate quality of U|IND (D) with respect to B in the fuzzy rough set is defined as

γ (B, D) = |∪_{i=1}^{p} R_B(d_i)| / |U|,

where U is the sample set, B is the attribute set, D is the decision set, R_B is the fuzzy relation generated by B, |·| is the cardinality of a (fuzzy) set, and d_i is a decision class.

By Definition 3, we can see that γ (B, D) reflects the ability of the granulated space generated by the attribute set B to characterize the decision D. The literature [21,31] has proved that approximate quality is monotonic with the increasing or decreasing of condition attributes in a decision system, i.e., γ (B_1, D) ≤ γ (B_2, D) whenever B_1 ⊆ B_2 ⊆ C. In a fuzzy rough set, the similarity of samples decreases as the number of attributes increases, and then the lower approximation of the decision system increases. That is to say, the positive region is enlarged. It is well known that the samples of the positive region are usually regarded as certain. Therefore, the certainty of the decision system improves. This is consistent with our intuition that new attributes (features) bring more information for granulation and classification.
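One common formulation of approximate quality sums, for each sample, its lower-approximation membership in its own (crisp) decision class; the sketch below assumes this form and uses the inf-max operator of Definition 2:

```python
def approximate_quality(R, labels):
    # gamma(B, D): average membership of each sample in the fuzzy lower
    # approximation of its own decision class (one common formulation)
    n = len(labels)
    total = 0.0
    for x in range(n):
        # the crisp decision class of x, viewed as a fuzzy set on U
        F = [1.0 if labels[y] == labels[x] else 0.0 for y in range(n)]
        total += min(max(1 - R[x][y], F[y]) for y in range(n))
    return total / n
```

With a crisp identity relation the quality is 1 (every sample is certainly in its class); with an all-ones relation and mixed labels it drops to 0.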
Definition 4. Let < U, C ∪ D, V, f > be a decision system and ∀a_t ∈ B ⊆ C. The significance of a_t relative to decision D is defined as the coefficient

Sig_in (a_t, B, D) = γ (B, D) − γ (B − {a_t}, D).
Sig_in (a_t, B, D) reflects the change of approximate quality if attribute a_t is eliminated from B. Similarly, we can also define

Sig_out (a_t, B, D) = γ (B ∪ {a_t}, D) − γ (B, D),

where a_t ∈ C − B. Sig_out (a_t, B, D) reflects the change of approximate quality if attribute a_t is introduced into B. From the above perspective, many researchers [9,[32][33][34] iteratively select the most significant attribute with a forward greedy algorithm until no more deterministic rules are generated by adding attributes. Formally, a forward attribute reduction algorithm can be designed as follows.
For Algorithm 1, if approximate quality γ (·, ·) is replaced by some other measure of discernibility power, then the algorithm can also be used to compute reducts under different measurement indices.
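The forward greedy loop just described can be sketched as follows. This is an illustrative Python sketch, not the paper's MATLAB implementation: `relation_from` is one simple assumed similarity (1 minus the largest attribute difference), and `approximate_quality` follows the inf-max form of Definition 2:

```python
def approximate_quality(R, labels):
    # average lower-approximation membership of each sample in its own class
    n = len(labels)
    return sum(min(max(1 - R[x][y], 1.0 if labels[y] == labels[x] else 0.0)
                   for y in range(n)) for x in range(n)) / n

def relation_from(data, attrs):
    # a simple reflexive similarity: 1 minus the largest attribute difference
    n = len(data)
    return [[1.0 - max((abs(data[i][a] - data[j][a]) for a in attrs), default=0.0)
             for j in range(n)] for i in range(n)]

def forward_reduct(data, labels, all_attrs):
    # HACR-style loop: greedily add the attribute with the largest quality
    # gain until the quality of the full attribute set is reached
    target = approximate_quality(relation_from(data, all_attrs), labels)
    B = []
    while approximate_quality(relation_from(data, B), labels) < target:
        best = max((a for a in all_attrs if a not in B),
                   key=lambda a: approximate_quality(relation_from(data, B + [a]), labels))
        B.append(best)
    return B
```

On a toy data set where only the second attribute separates the classes, the loop selects exactly that attribute.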

Strong and Weak Fuzzy Rough Set Model.
In < U, C ∪ D, V, f >, ∀a t ∈ C (1 ≤ t ≤ m), a series of fuzzy relations R t can be induced by each attribute a t . For all the m condition attributes, we can consider the following two special fuzzy relations.
(1) Strong fuzzy relation: R_S = R_1 ∩ R_2 ∩ · · · ∩ R_m. R_S corresponds to the strong fuzzy relation matrix M (R_S). In M (R_S), ∀x_i, x_j ∈ U, the similarity between x_i and x_j, denoted by r^s_ij, is the minimum over all attributes, i.e., r^s_ij = min_{1≤t≤m} r^t_ij. (2) Weak fuzzy relation: R_w = R_1 ∪ R_2 ∪ · · · ∪ R_m. R_w corresponds to the weak fuzzy relation matrix M (R_w). In M (R_w), ∀x_i, x_j ∈ U, the similarity between x_i and x_j, denoted by r^w_ij, is the maximum over all attributes, i.e., r^w_ij = max_{1≤t≤m} r^t_ij.
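The two relations above are just the elementwise minimum and maximum of the per-attribute similarity matrices, which can be sketched as (function name illustrative):

```python
def strong_and_weak(relations):
    # R_S: elementwise minimum (intersection of the per-attribute relations);
    # R_w: elementwise maximum (union of the per-attribute relations)
    n = len(relations[0])
    R_S = [[min(R[i][j] for R in relations) for j in range(n)] for i in range(n)]
    R_w = [[max(R[i][j] for R in relations) for j in range(n)] for i in range(n)]
    return R_S, R_w
```

Note that both constructions preserve reflexivity: every R_t has ones on the diagonal, so the min and max do as well.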
Definition 5. Let U ≠ ∅ be a universe of discourse and R_S a strong fuzzy relation on U. ∀F ∈ F (U), the strong fuzzy lower and upper approximations of F in the fuzzy approximation space (U, R_S) are denoted by R_S(F) and R̄_S(F), respectively. ∀x ∈ U, the memberships that x belongs to R_S(F) and R̄_S(F) are defined as

R_S(F) (x) = inf_{y ∈ U} max (1 − R_S (x, y), F (y)),
R̄_S(F) (x) = sup_{y ∈ U} min (R_S (x, y), F (y)),

where R_S (x, y) is the similarity between x and y based on the strong fuzzy relation R_S. The pair [R_S(F), R̄_S(F)] is referred to as a strong fuzzy rough set of F.

Definition 6. Let U ≠ ∅ be a universe of discourse and R_w a weak fuzzy relation on U. ∀F ∈ F (U), the weak fuzzy lower and upper approximations of F in the fuzzy approximation space (U, R_w) are denoted by R_w(F) and R̄_w(F), respectively. ∀x ∈ U, the memberships that x belongs to R_w(F) and R̄_w(F) are defined as

R_w(F) (x) = inf_{y ∈ U} max (1 − R_w (x, y), F (y)),
R̄_w(F) (x) = sup_{y ∈ U} min (R_w (x, y), F (y)),

where R_w (x, y) is the similarity between x and y based on the weak fuzzy relation R_w. The pair [R_w(F), R̄_w(F)] is referred to as a weak fuzzy rough set of F.
Proof. By formula (11), the result follows by distinguishing three cases. □

In < U, C ∪ D, V, f >, ∀a_t ∈ C (1 ≤ t ≤ m), a series of fuzzy relations R_t can be induced by each attribute a_t. ∀x_i, x_j ∈ U, the similarity between x_i and x_j based on all the m fuzzy relations, denoted by r^λ_ij, can be adjusted by the above binary operator. The fuzzy relation constructed by r^λ_ij is referred to as an adjustable fuzzy relation, denoted by R_λ. R_λ corresponds to the adjustable fuzzy relation matrix M (R_λ).

Theorem 2. Let < U, C ∪ D, V, f > be a decision system. The adjustable fuzzy relation R_λ can be obtained by adjusting the value of λ between the strong fuzzy relation R_S and the weak fuzzy relation R_w, i.e., ∀x_i, x_j ∈ U, r^s_ij ≤ r^λ_ij ≤ r^w_ij.

Proof. Since r^s_ij ≤ r^w_ij, the result follows by distinguishing three cases. □

Definition 7. Let U ≠ ∅ be a universe of discourse and R_λ an adjustable fuzzy relation on U. ∀F ∈ F (U), the adjustable fuzzy lower and upper approximations of F in the fuzzy approximation space (U, R_λ) are denoted by R_λ(F) and R̄_λ(F), respectively. ∀x ∈ U, the memberships that x belongs to R_λ(F) and R̄_λ(F) are defined as

R_λ(F) (x) = inf_{y ∈ U} max (1 − R_λ (x, y), F (y)),
R̄_λ(F) (x) = sup_{y ∈ U} min (R_λ (x, y), F (y)),

where R_λ (x, y) is the similarity between x and y based on the adjustable fuzzy relation R_λ. The pair [R_λ(F), R̄_λ(F)] is referred to as an adjustable fuzzy rough set of F.
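The paper's formulas (10) and (11) for the parameterized binary operator are not reproduced here, so the sketch below uses a convex combination of the strong and weak similarities as one hypothetical operator satisfying the property stated in Theorem 2 (r^s_ij ≤ r^λ_ij ≤ r^w_ij); it is an assumption for illustration, not the paper's definition:

```python
def adjustable_relation(R_S, R_w, lam):
    # hypothetical parameterized operator (an assumed convex combination,
    # not the paper's formula): lam = 0 recovers the strong relation,
    # lam = 1 recovers the weak relation, and every intermediate lam
    # yields similarities between the two bounds
    n = len(R_S)
    return [[(1 - lam) * R_S[i][j] + lam * R_w[i][j]
             for j in range(n)] for i in range(n)]
```

Any other operator that interpolates monotonically between r^s_ij and r^w_ij would equally produce an adjustable fuzzy relation in the sense of Theorem 2.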
ALGORITHM 1: Heuristic approach to compute reduct (HACR).

Theorem 3, which follows from formula (10), tells us that both strong and weak fuzzy rough sets are special cases of the adjustable fuzzy rough set; in other words, the adjustable fuzzy rough set is a generalization of the strong and weak fuzzy rough sets.
Theorem 4 tells us that the adjustable fuzzy lower approximation lies between the weak and strong fuzzy lower approximations, and the adjustable fuzzy upper approximation lies between the strong and weak fuzzy upper approximations. □

Approximate Reduction in Adjustable Fuzzy Rough Set.
As discussed in Subsection 2.2, approximate quality reflects the relevance between the condition attributes and the decision, and it can be used to measure the significance of a candidate attribute. We first give the definition of approximate quality in the strong, weak, and adjustable fuzzy rough sets.
Definition 8. Let < U, C ∪ D, V, f > be a decision system, ∀B ⊆ C, R^s_B, R^w_B, and R^λ_B are the strong, weak, and adjustable fuzzy relations generated by the set of attributes B, respectively, and U|IND (D) = {d_1, d_2, ..., d_p} is the partition induced by the decision D. The approximate qualities of U|IND (D) with respect to B in the strong, weak, and adjustable fuzzy rough sets are defined, respectively, as

γ^s (B, D) = |∪_{i=1}^{p} R^s_B(d_i)| / |U|,
γ^w (B, D) = |∪_{i=1}^{p} R^w_B(d_i)| / |U|,
γ^λ (B, D) = |∪_{i=1}^{p} R^λ_B(d_i)| / |U|,

where U is the sample set, B is the attribute set, D is the decision set, |·| is the cardinality of a (fuzzy) set, and d_i is a decision class.

Theorem 5. Let < U, C ∪ D, V, f > be a decision system. ∀B ⊆ C, γ^w (B, D) ≤ γ^λ (B, D) ≤ γ^s (B, D) holds. Theorem 5 tells us that the approximate quality in the adjustable fuzzy rough set is between that in the weak and strong fuzzy rough sets.
Algorithm 1 is also applicable for attribute reduction in the adjustable fuzzy rough set, similar to the classic fuzzy rough set. Nevertheless, in practice, the criteria of reduction in Algorithm 1 are too harsh. To expand the application scope of attribute reduction (dimension reduction, feature selection), some researchers [9,21,27,28,31,35] have introduced a threshold ε to control the change of discernibility power and thereby relax the criteria of reduction; they consider B a reduct of C if it satisfies the following conditions: (1) γ^λ (B, D) ≥ (1 − ε) · γ^λ (C, D); (2) ∀a ∈ B, γ^λ (B − {a}, D) < (1 − ε) · γ^λ (C, D). These conditions show that ε is aimed at eliminating redundant attributes as much as possible while keeping the change of approximate quality within a small range. In general, ε is recommended to be 0%-10% of the original approximate quality of the decision system.
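The ε-tolerant stopping criterion can be sketched abstractly; in the sketch below, `quality` stands for the approximate quality γ^λ(·, D) on a fixed data set (passed in as a callable), and the attribute indices are illustrative:

```python
def tolerant_forward_reduct(candidates, quality, eps):
    # THACR-style loop: quality(B) is the approximate quality of subset B;
    # the search stops once quality(B) is within eps of the full-set quality
    target = (1 - eps) * quality(list(candidates))
    B = []
    while quality(B) < target:
        best = max((a for a in candidates if a not in B),
                   key=lambda a: quality(B + [a]))
        B.append(best)
    return B
```

A larger ε lowers the target quality, so the loop stops earlier and returns a smaller attribute subset, exactly the trade-off described above.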
Formally, a tolerant forward approximate reduction algorithm in the adjustable fuzzy rough set can be designed as follows.

Nested Strategy-Based Reduction in Adjustable Fuzzy
Rough Set. Algorithm 2 is an enhanced version of Algorithm 1, which introduces a tolerance parameter to improve the efficiency of attribute reduction. However, in real-world applications, we often encounter data with many objects or attributes. In this circumstance, starting the attribute reduction process from an empty set may be time consuming and degrade algorithm performance. To deal with this issue, a nested strategy-based attribute reduction approach is introduced into the adjustable fuzzy rough set model. In this subsection, the notions of nested strategy-based reduction, i.e., Forward Nested Reduction, Reverse Nested Reduction, and Weakly Nested Reduction, are first presented.
Definition 9 (Forward Nested Reduction [36]). In a parameterized model, if the reducts on different approximations satisfy that for any Reduct (β) there exists a Reduct (α) such that Reduct (α) ⊆ Reduct (β), then this type of nested structure is called "Forward Nested Reduction."

Definition 10 (Reverse Nested Reduction [36]). In a parameterized model, if the reducts on different approximations satisfy that for any Reduct (β) there exists a Reduct (α) such that Reduct (α) ⊇ Reduct (β), then this type of nested structure is called "Reverse Nested Reduction."

Definition 11 (Weakly Nested Reduction [36]). In a parameterized model, if the reducts on different approximations satisfy that for any Reduct (β) there exists a P ⊇ Reduct (β) such that P ⊇ Reduct (α), then this type of nested structure is called "Weakly Nested Reduction."

By Definitions 9-11, we know that the forward nested structure ensures that a smaller reduct is included in the known larger reduct (the former is a subset of the latter), and the reverse nested structure ensures that a larger reduct contains the known smaller reduct (the latter is a subset of the former). The weakly nested structure, by contrast, is a much weaker relation than the inclusion relations of the forward and reverse nested structures; it only requires an intersection relation between two reducts.

Theorem 6. Let < U, C ∪ D, V, f > be a decision system, ∀B ⊆ C. If the approximate quality γ^λ (B, D) is monotonic with B in the adjustable fuzzy rough set, then with the parameter λ increasing, the weakly nested reduction holds.

Theorem 6 presents a necessary condition for constructing nested attribute reduction. In light of the nested strategy, an algorithm is designed to quickly search for a different reduct starting from an existing one. The detailed process is presented in Algorithm 3. The most striking difference between Algorithm 3 and Algorithm 2 is that the prereduct is considered when searching for a new reduct in Algorithm 3.
When the parameters are changed, the reduct under the new parameters can be obtained by changing only a few attributes of the reduct found under the original parameters. Since the proposed algorithm NS-THACR does not need to add attributes starting from an empty attribute set, the repeated computation of approximations and approximate quality is greatly reduced, and the efficiency of reduction is improved.
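The seeding idea can be sketched as follows; this is an illustrative Python sketch of the NS-THACR-style search (names and the grow/shrink structure are assumptions, since Algorithm 3's exact steps are not reproduced here), with `quality` standing for γ^λ(·, D) under the new parameter:

```python
def nested_reduct(prereduct, candidates, quality, eps):
    # start from the reduct found under the previous parameter instead of
    # the empty set, then grow and shrink it under the new parameter
    target = (1 - eps) * quality(list(candidates))
    B = [a for a in prereduct]
    # grow until the epsilon-tolerant quality criterion is met
    while quality(B) < target:
        best = max((a for a in candidates if a not in B),
                   key=lambda a: quality(B + [a]))
        B.append(best)
    # drop attributes that became redundant under the new parameter
    for a in list(B):
        rest = [b for b in B if b != a]
        if quality(rest) >= target:
            B = rest
    return B
```

Because the search starts from the prereduct, only the attributes whose significance changed under the new parameter need to be touched, which is the source of the speed-up reported in the experiments.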

Experimental Analysis
To evaluate the performance of our nested strategy-based heuristic attribute reduction approach, 6 real-world UCI data sets have been employed in this paper. Table 1 summarizes some details of these experimental data sets. All the experiments are run on a workstation equipped with a 3.10 GHz processor and 8.00 GB of memory. The programming language is MATLAB R2014b.
In the context of this paper, 10-fold cross-validation (10-CV) is used to evaluate the effectiveness of the proposed algorithm. In the following experiments, all samples are divided into ten groups of the same size; nine groups compose the training set and the remaining group composes the testing set. The attribute reduction process is repeated 10 times in turn, and the mean value and standard deviation of the 10 experimental results are recorded.
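The 10-CV split described above can be sketched as (function name illustrative; the paper's experiments use MATLAB rather than Python):

```python
import random

def ten_fold_indices(n, seed=0):
    # shuffle the sample indices once, then deal them into 10 near-equal
    # folds; fold k serves as the test set while the other nine folds
    # together form the training set
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[k::10] for k in range(10)]
```

Fixing the seed makes the folds reproducible across the two compared algorithms, so their elapsed times are measured on identical splits.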
In this section, we focus on the elapsed time comparison for computing reducts with and without the nested reduction strategy. The comparison is set up as follows: (1) Ten values of the parameter λ are selected on the interval [0, 1] with step length 0.1. (2) Six values of ε are chosen to control the changes of approximate quality, i.e., 0%, 2%, 4%, 6%, 8%, 10%. In fact, ε = 0% means that the approximate quality of the obtained reduct equals that of the original decision system, and similar explanations hold for the other five values. (3) To show the effectiveness of the nested reduction strategy, THACR (Algorithm 2) is used as a benchmark algorithm. The detailed results are shown in Tables 2-7. In these six tables, the first row is the elapsed time of computing THACR, the second row is the elapsed time of computing NS-THACR, and the minimum elapsed time is shown in bold. With a careful study of these tables, we can obtain the following remarks.
(1) From the results in Tables 2-7, it can be seen that for all experiments, the elapsed time of computing NS-THACR is lower than that of computing THACR, and the calculation time is reduced by more than 80 percent. In addition, the values of ε and λ also have a certain impact on the experimental results, and the time consumed by the two methods differs accordingly. The choice of these two parameters also requires different attempts in the experiments to search for an appropriate subset of attributes. (2) The elapsed time of computing reducts by THACR is higher than that by NS-THACR. For example, when searching for a reduct on dataset E. coli (Table 3), the average elapsed time of THACR is 1.7782 when ε = 6%. As far as NS-THACR is concerned, the average elapsed time is 0.2985 when ε = 0%, whereas it is 0.2032 when ε = 6%. This is because ε = 0% is too strict; that is, the approximate quality of the reduct must be equal to that of the original dataset. Therefore, more time is needed to search for an appropriate attribute subset.
In addition, the proposed method is compared with two other popular attribute reduction methods: one is the classical Nested Strategy-based Heuristic Approach to Computing Reduct (NS-HACR), and the other is DMF-FN [36,37], a nested reduction approach designed via the discernibility matrix. The detailed results on the above six data sets are shown in Figure 1.

Inputs: A decision system DS = < U, C ∪ D, V, f >, the adjustable parameter λ, the threshold ε for controlling the change of approximate quality;
Outputs: A reduct B.
ALGORITHM 2: Tolerant heuristic approach to compute reduct (THACR).

Table 2: Computational time of attribute reduction with using nested reduction or not (mean ± std. deviation) on Diabetes.
Table 5: Computational time of attribute reduction with using nested reduction or not (mean ± std. deviation) on Iris.

Table 6: Computational time of attribute reduction with using nested reduction or not (mean ± std. deviation) on Seeds.

Table 7: Computational time of attribute reduction with using nested reduction or not (mean ± std. deviation) on Wine.
In Figure 1, the x-axis denotes the different data sets and the y-axis denotes the average elapsed time of the three algorithms. It is not difficult to note from Figure 1 that the proposed method NS-THACR obtains the minimum elapsed time. With the above discussions, we can conclude that the proposed NS-THACR method is an effective and efficient attribute reduction method.

Conclusions
In this paper, we have discussed some weaknesses of attribute reduction based on the traditional fuzzy rough set, which is constructed with operators lacking adjustability, and a new adjustable fuzzy rough set has thereby been presented. In our approach, a parameterized operator has been applied to the strong and weak fuzzy relations, and these two special relations give the fuzzy rough set model its adjustability. Furthermore, inspired by the inner relationship between the parameter and the reduct, we have employed a nested mechanism to accelerate the search for a parameterized reduct. The following topics deserve further investigation: (1) evolutionary algorithms [2] such as particle swarm optimization and ant colony optimization can be applied to find the optimal parameter of the proposed model; (2) accelerated searching strategies [12] such as the sampling technique and the attribute group technique can be used to further reduce the elapsed time for a single parameter.

Data Availability
The 6 UCI data sets employed in this paper are publicly available from the UCI Machine Learning Repository.