Towards Utilization of Neurofuzzy Systems for Taxonomic Identification Using Psittacines as a Case Study

This paper demonstrates the application of a neurofuzzy system to the task of psittacine (parrot) taxonomic identification. In this work, the NEFCLASS-J neurofuzzy system is used to classify parrot data into 141 and 183 groupings, using 68 feature points or qualities. The reported results show classification accuracies above 95%, which are strongly tied to the settings of certain parameters of the neurofuzzy system. Rule base sizes were in the range of 1,750 to 1,950 rules.


Introduction
The traditional approach for class recognition involves a detailed visual inspection of the creature and comparison of it to reference images or specimens [1]. This can be a slow and tedious process that is not always reliable (even experts sometimes disagree on the proper classification). A computerized system for taxonomic identification could produce more objective results in much less time. Taxonomists are only now becoming aware of artificial intelligence techniques and their vast utility. The first systems to bridge the gap between computer science and taxonomy, or systematics, were presented in 2005 at the symposium Algorithmic Approaches to the Identification Problem in Systematics, held in London [1]. Many of the techniques presented included artificial neural networks or self-organizing maps. They achieved results in the range of 70% to 98% accuracy for classifying organisms into one of 19 (on average) biological groups. The specimens studied included plankton, bees, wasps, spiders, and trees [1]. In these cases, detailed photographic images (some even from electron microscopes) of the subject were provided as input to the system. The systems needed to identify very specific feature points, handle typical image processing problems, generalize well, and give accurate results. Even though those demands were met for the specimens studied, the systems do not appear ready for application to less-willing specimens, such as living, breathing, active, acrobatic, contortionist-like parrots.
It has been said that the illegal trade in wildlife is second only to that in narcotics and is worth US$5-8 billion a year [2]. Of that, US$60 million comes from the trade in cage birds, of which parrots constitute a large percentage. This commercial exploitation, coupled with habitat destruction, has placed many parrot species in a precarious position. The taxonomic order Psittaciformes consists of the parrots, or psittacines. Of the roughly 355 extant species in Psittaciformes, over a quarter (94 species) are currently threatened with extinction (e.g., IUCN Red Listed as Critically Endangered, Endangered, or Vulnerable) [3]. The illegal trade in parrots involves both smuggling and the laundering or falsification of shipping documents. All US customs officials now have a list of the birds that cannot be imported or exported under CITES [4]. However, it is difficult to teach busy customs officials how to recognize the protected species, and traders do not hesitate to deliberately misidentify the species on any required documents [4]. A quick, computerized tool for the taxonomic identification of parrots could greatly help officials in supporting CITES (thereby helping parrot conservation) and reduce the injustices imposed unnecessarily on so many parrots.
The few systems specializing in taxonomic identification are accurate and useful. However, as many of their researchers have noted (or complained), one cannot easily review or trace the logic used to arrive at a classification result [1]. That is because artificial neural networks are black boxes, which leave the user little insight into the logic. Fuzzy systems can reason in a human-like way in the face of uncertainty, and their logic can be easily understood [5]. They are also simple and inexpensive and can incorporate expert knowledge [6]. However, fuzzy systems cannot inherently learn [5]. On the other hand, artificial neural networks may not produce logic that is obvious, but they can learn without human intervention through training and feedback [5,7]. By hybridizing these two soft computing techniques, the best features of both can be realized in a single, reliable neurofuzzy system. Neurofuzzy systems apply learning algorithms from neural network theory to fuzzy systems [6]. Since interpretability is often considered a key element, neurofuzzy systems constrain their learning algorithms to ensure that the semantics of the trained fuzzy systems remain meaningful and accurate [8-10].
The goal of this research is to produce an accurate and interpretable neurofuzzy system for the taxonomic identification of psittacines using supervised learning. A good minimum accuracy would be 85% correct identifications, and interpretability can be judged by the number of rules per class and the number of antecedent variables per fuzzy rule. Here, the system NEFCLASS-J is applied. It automatically determines the size of the rule base and adjusts the membership functions. The user guides the system by choosing the overall shape of the membership functions, the number of fuzzy sets for each linguistic variable, the rule learning procedure, the aggregation function, the learning rate, fuzzy set constraints, and stopping controls. If a system were created that met all of these demands, it would be beneficial to make it web-accessible and free to use for both the public and international governments. Experts in the field of parrot taxonomy could alter the learned rules to reduce the number of antecedent parameters and help increase the accuracy and interpretability of the system. Then, customs officials and/or the general public could use the system to determine the species of any parrot. If a dispute ever arose over a decision made by the system, the logic could be traced by reading the rules for that class.
Section 2 provides an introduction to neurofuzzy systems, followed by a description of NEFCLASS/NEFCLASS-J. Section 3 covers the variables, data, and methods used. Section 4 lists the results achieved and their analysis. Finally, Section 5 concludes the work.

Neurofuzzy Systems and NEFCLASS-J
The blending of neural networks and fuzzy systems, that is, neurofuzzy systems, can be classified into several different groups [11,12]. For the purposes of this work, we have applied NEFCLASS (Neurofuzzy Classification) to the problem at hand. This is a fuzzy classifier represented in a two-layer feedforward neural network structure that is based on the fuzzy perceptron [13]. It should be noted that Nauck and Kruse refer to their network as three-layer; however, this paper follows the convention noted by Bishop and only counts the layers which have weights applied to the output values of the nodes [13,14] (see Figure 1). Constraints on the learning algorithm help NEFCLASS produce interpretable results, which can sometimes be further improved through pruning [13]. The fuzzy classification rules are similar in style to Mamdani fuzzy inference rules and have the following form: IF x₁ is μ₁ AND ... AND xₙ is μₙ THEN X belongs to class C, where X = (x₁, ..., xₙ) is a pattern, μ₁, ..., μₙ are fuzzy sets, and C is a class [13]. For example, a fuzzy rule might be as follows.
IF size is small AND ear length is long AND locomotion is hops THEN creature belongs to Rabbits.
The goal of NEFCLASS is to discover these rules and satisfactory shapes for the membership functions [13]. In order for the resulting system to be interpretable, the following criteria should be met: (a) few meaningful fuzzy rules, (b) few variables in the antecedents, (c) no rule weights, (d) identical linguistic terms represented by identical fuzzy sets, and (e) only normal fuzzy sets used [13].
These restrictions can lower the accuracy of the system, but a fuzzy system without these features can simply become a black-box model. Interpretability and accuracy tend to vary inversely.
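As an illustrative sketch (toy fuzzy sets and invented names, not NEFCLASS-J's actual code), a Mamdani-style classification rule like the rabbit example above can be evaluated with a minimum t-norm over the antecedent and a winner-takes-all decision over the rules:

```python
def triangular(a, b, c):
    """Return a triangular membership function with peak at b."""
    def mu(x):
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)
    return mu

# Hypothetical fuzzy sets over inputs normalized to [0, 1].
small_size = triangular(-0.4, 0.0, 0.4)
long_ears  = triangular(0.6, 1.0, 1.4)

# Each rule: (list of (input_index, membership_fn), class label).
rules = [
    ([(0, small_size), (1, long_ears)], "Rabbits"),
    ([(0, triangular(0.6, 1.0, 1.4)), (1, triangular(-0.4, 0.0, 0.4))], "Capybaras"),
]

def classify(pattern):
    # Rule activation = minimum membership over the antecedent (min t-norm);
    # the class of the most strongly fired rule wins (winner-takes-all).
    best_class, best_activation = None, -1.0
    for antecedent, label in rules:
        activation = min(mu(pattern[i]) for i, mu in antecedent)
        if activation > best_activation:
            best_class, best_activation = label, activation
    return best_class

print(classify([0.1, 0.9]))  # a small, long-eared pattern → Rabbits
```

Because no rule weights are used and each linguistic term maps to a single fuzzy set, the reasoning behind any classification can be traced by reading off the firing rule.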

2.1. Learning in NEFCLASS.
The NEFCLASS learning algorithm has two stages: structure learning and parameter learning [15]. Structure, or rule, learning is done by partitioning the input space with the given initial fuzzy sets and creating the antecedents for the prospective rules [15]. To reduce learning time, NEFCLASS selects rules from a grid instead of searching for hyperellipsoidal or hyperrectangular clusters [13]. In parameter learning, a backpropagation-like procedure is used to adjust the fuzzy sets [15]. The procedure relies on simple heuristics that shift the fuzzy sets and enlarge or reduce their support [15]. To ensure that each linguistic value has only one representation as a fuzzy set, shared weights are used on some network connections (but only those that come from the same input unit) [13].
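The grid-based idea behind structure learning can be sketched as follows (a simplification with invented names, not Nauck and Kruse's exact algorithm, which also scores rule performance): each training pattern selects the grid cell formed by the best-matching fuzzy set per input, and each occupied cell becomes a candidate rule.

```python
from collections import Counter

def best_set(value, fuzzy_sets):
    """Index of the fuzzy set giving the highest membership for this value."""
    return max(range(len(fuzzy_sets)), key=lambda j: fuzzy_sets[j](value))

def learn_rules(patterns, labels, partitions):
    """Grid-based structure learning sketch: each pattern selects one grid
    cell (the best fuzzy set per input); each occupied cell becomes a rule
    whose consequent is the majority class of its supporting patterns."""
    support = {}
    for x, y in zip(patterns, labels):
        antecedent = tuple(best_set(x[i], partitions[i]) for i in range(len(x)))
        support.setdefault(antecedent, Counter())[y] += 1
    return {ant: cnt.most_common(1)[0][0] for ant, cnt in support.items()}

# Two inputs, each partitioned into "low" (index 0) and "high" (index 1).
low, high = (lambda v: 1.0 - v), (lambda v: v)
partitions = [[low, high], [low, high]]
rules = learn_rules([[0.1, 0.2], [0.9, 0.8], [0.2, 0.1]], ["A", "B", "A"], partitions)
print(rules)  # → {(0, 0): 'A', (1, 1): 'B'}
```

Selecting rules from this fixed grid avoids an expensive search for clusters of arbitrary position and shape.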
2.2. NEFCLASS-J. NEFCLASS-J is a Java implementation of NEFCLASS with a GUI [13]. NEFCLASS-J has some features NEFCLASS does not, such as the ability to handle missing values, automatic determination of the number of rules, incorporation of prior knowledge, learning constraints available to the user, and automatic pruning and/or cross-validation options [13]. NEFCLASS-J treats missing values as though any value may be possible [13]. The initial rule base of NEFCLASS-J consists of all rules supported by the training data. The size of the initial rule base is bounded from above by min{|L|, ∏ᵢ₌₁ⁿ qᵢ}, where |L| is the cardinality of the training data and qᵢ is the number of fuzzy sets given for input xᵢ [13]. NEFCLASS-J removes rules from this maximally sized rule base using pruning techniques [13].
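The bound above is easy to compute (a sketch with a made-up function name and toy numbers; in this paper's 68-variable setup the product side is astronomically larger, so the number of training patterns is the binding limit):

```python
from math import prod

def initial_rule_base_bound(num_patterns, fuzzy_sets_per_input):
    # Upper bound on the initial rule base size: min(|L|, q_1 * ... * q_n).
    return min(num_patterns, prod(fuzzy_sets_per_input))

# Toy example: 500 training patterns and three inputs with 5, 5, and 16
# fuzzy sets; the grid side (5 * 5 * 16 = 400) is the binding limit here.
print(initial_rule_base_bound(500, [5, 5, 16]))  # → 400
```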
Prior knowledge can also be incorporated by entering rules manually before training the system. In NEFCLASS-J, this and rule pruning are the only ways to obtain rules that do not include every input parameter in their antecedents.
There are four pruning strategies used by NEFCLASS-J: pruning by correlation, pruning by classification frequency, pruning by redundancy, and pruning by fuzziness [15]. Each pruning strategy is applied until it fails (e.g., the error has increased or the rule base cannot be made consistent). After each pruning step, the membership functions are retrained. When pruning by correlation, χ² or information gain is used to find and delete the variable with the smallest influence on classification. When pruning by classification frequency, the rule that yields the largest degree of fulfillment in the fewest cases is deleted. For pruning by redundancy, the linguistic term that produces the minimal degree of membership in an active rule in the fewest cases is deleted [15]. Finally, when pruning by fuzziness, the fuzzy set with the largest support is identified and all antecedent terms that use it are removed from all other rules. NEFCLASS-J gives the user the option to create a classifier, create a pruned classifier, or prune an already existing classifier. The user can also choose to use k-fold cross-validation, where the user enters the value for k.
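One of these strategies, pruning by classification frequency, can be sketched as follows (invented names; a simplification of NEFCLASS-J's internals, which also retrain the membership functions after each step and roll back when the error increases):

```python
def prune_by_classification_frequency(rules, winners):
    """Sketch of one pruning strategy: delete the rule that yields the
    largest degree of fulfillment for the fewest patterns. `winners` maps
    each pattern to the id of the rule that fired most strongly for it."""
    wins = {rule_id: 0 for rule_id in rules}
    for rule_id in winners:
        wins[rule_id] += 1
    least_used = min(rules, key=lambda rule_id: wins[rule_id])
    return [rule_id for rule_id in rules if rule_id != least_used]

# Three rules; "r3" never wins a pattern, so it is pruned first.
print(prune_by_classification_frequency(["r1", "r2", "r3"], ["r1", "r1", "r2"]))
```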
NEFCLASS-J is a powerful model to use when comprehensibility, low cost, simplicity, and tolerance for vagueness are important. The model works best when the problem has the following attributes: a direct dependency between input variables and classification, no deep knowledge about the distribution or dependency of the variables, low dimensionality ("less than 20 variables"), and a need for a fast solution [13]. Disadvantages of NEFCLASS-J include the inability to alter the number of hidden neurons, the number of hidden layers, or the initial weights of the system, and no built-in ability for bootstrapping the data.

Methodology
3.1. Variables. Selected features include the color of twenty-one regions on the bird (see Figure 2), the degree of existence of a crest, the bird's size (as its weight), and its predominant color, which is the overall impression of the bird's color. All twenty-two color features are represented as three variables each, one for each attribute of the HSI (hue, saturation, and intensity) color model. The HSI color model was chosen over the RGB (red, green, and blue) and CMYK (cyan, magenta, yellow, and black) color models because HSI is a more natural and intuitive tool for humans to describe and interpret the color of objects [16]. The HSI model space can be represented as a triangular, circular, or hexagonal cone or double cone; a cylinder; or a sphere [16,17]. The one implemented here is the circular double cone (Figure 3). Hue describes the color (tint), such as the difference between blue and yellow, and is given by the angle from the red axis to the color point [16,17]. Saturation is the amount of color present (shade), such as the difference between red and pink, and is the horizontal distance from the central vertical axis of the model out to the color point [17]. Intensity (tone) is sometimes called brightness, lightness, or value in similar color models [17]. It is the amount of light in the color, such as the difference between dark green and light green, and is measured as the vertical distance along the central axis of the HSI color model [17].
Conversion between RGB and HSI is straightforward, and the formulas depend on the shape of the HSI color space chosen [7]. In our case, the following formulas can be used.
Given RGB values in decimal form, r = R/255, g = G/255, and b = B/255, if r = g = b, then H = 90° and S = 0; otherwise,

H = arccos{ [(r − g) + (r − b)] / [2 √((r − g)² + (r − b)(g − b))] },  with H = 360° − H when b > g,

where H is in degrees. If I = 0, then S = 0; otherwise,

S = 1 − min(r, g, b)/I,  where I = (r + g + b)/3.

The degree to which the bird has a crest is a subjective value on the interval [0, 1]. Most parrots either definitely do not have a crest, 0, or definitely do have a crest, 1. Those birds whose crest is smaller or less conspicuous received values in the middle of the interval. The birds' size was essentially their weight, as listed in the current text by Forshaw [18]. A small program utilizing Java's Random class was used to generate pseudorandom values for the size of each bird within the ranges outlined by Forshaw. For species without weights listed in Forshaw, other references were consulted, but they too did not provide weights in those cases. The choices for handling these missing values included the following: (1) Leave those values blank; NEFCLASS-J would not make any assumptions regarding them, but the error rate of the system would increase.
(2) Use only cases with complete data, thus entirely excluding some species from the system.
(3) Replace the missing values with a logical approximation as to their likely value [15].
Here, the decision was for the last option. In the majority of cases with missing weight values, the average weight for the rest of the genus could be calculated and substituted. In a small number of cases, the distribution of weights in the remainder of the genus was too scattered to take an average from. So, the average weight of the most similar bird(s) within the genus was used to fill in the missing values.
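As a check on the conversion formulas given earlier in this section, they can be coded directly (a sketch; the function name is ours, and the 90° convention for achromatic colors follows the text above):

```python
from math import acos, degrees, sqrt

def rgb_to_hsi(R, G, B):
    """Convert 8-bit RGB to HSI: H in degrees, S and I normalized to [0, 1]."""
    r, g, b = R / 255.0, G / 255.0, B / 255.0
    i = (r + g + b) / 3.0
    if r == g == b:                      # achromatic: hue is undefined,
        return 90.0, 0.0, i              # so use the 90-degree convention
    num = 0.5 * ((r - g) + (r - b))
    den = sqrt((r - g) ** 2 + (r - b) * (g - b))
    h = degrees(acos(max(-1.0, min(1.0, num / den))))
    if b > g:                            # reflect into the 180-360 degree half
        h = 360.0 - h
    s = 1.0 - min(r, g, b) / i           # i > 0 here since r, g, b differ
    return h, s, i

print(rgb_to_hsi(255, 0, 0))  # pure red: hue 0 degrees, full saturation
```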

3.2. Data.
All data was normalized to the range [0, 1] prior to being given to NEFCLASS-J. The formulas provided for RGB-to-HSI conversion produce normalized HSI values, and the degree of "crestedness" is already on the desired interval. For the sizes, no adult parrot had a listed weight of less than 10 grams, and the largest (the male Kakapo) was around 3,000 grams. Since the male Kakapo's weight is significantly greater than the female's (1,600 grams) and more than double that of the next heaviest parrot species (the Hyacinth Macaw at around 1,450 grams), the upper bound for the size was set to 2,000 grams. Weights of 2,000 grams and above were therefore treated as equal values. The cutoff of 2,000 grams (rather than something closer to 1,500 or 1,600 grams) was selected because some of the macaws around that size also have a tendency to become overweight in captivity if not properly cared for. The size values were normalized by shifting the values left and dividing, as follows: Size′ = (Size − 10)/1,990.
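The clamping and normalization described above can be expressed as a short function (the function name is ours):

```python
def normalize_size(weight_grams):
    """Clamp weights at the 2,000 g ceiling, then map [10, 2000] onto [0, 1]."""
    clamped = min(weight_grams, 2000.0)
    return (clamped - 10.0) / 1990.0

print(normalize_size(10))    # lightest adult parrot considered → 0.0
print(normalize_size(3000))  # male Kakapo, clamped at the ceiling → 1.0
```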
Only complete genera were included in the system. Some genera could not be separated by the features selected. Inclusion of features to distinguish among them would have caused the antecedents of all the rules for all the species to grow much larger than necessary. Since one of the main goals is for the resulting system to be interpretable, those species were better left out for the greater sake of the overall system. The list of included species by scientific name and common name can be found in the Appendix. Also, some species with moderate to extreme sexual dimorphism (the sexes look different) were tested both together as one case for the species and separated as two cases for the same species. Species that were separated by gender are marked in the Appendix. When the genders of a species were separated, the data of each was duplicated to create the same number of patterns for each, but the size values were regenerated. Every species had 20 patterns determined for it individually. I created the patterns for each case by consulting references for the species and finding the colors most closely matching, with the restriction that they had to be web-safe colors. The main reference text used was that of Forshaw (2006); however, multiple sources were consulted. The references used for each species are also noted with a letter in the Appendix.


3.3. Methods.
NEFCLASS-J version 2.0 (2009), which is freely available, was used on both the Windows 7 Pro with SP2 and Ubuntu Linux 12.04 operating systems. Classifiers were created from scratch by choosing the options "New Project" and "Create Classifier" from the main NEFCLASS-J GUI and following the required steps. First, a project name and description are entered and then the data file is submitted. Next, the number and shape of the membership functions for each fuzzy set can be set, along with the type of aggregation function (maximum or weighted sum). Though a drop-down menu is provided, the only option listed for interpretation of the classification result is "winner-takes-all." In the Rule Creation tab, the size of the rule base can be either set to a constant value or determined automatically; the rule learning procedure can be either best or best per class; and the user can choose to have the system relearn the rule base. Then, any of the following learning constraints on the fuzzy sets can be selected: keep their relative order, always overlap, be symmetrical, and/or intersect at 0.5. Rule weights can either be not used, kept within [0, 1], or arbitrary.
Here, the user can also enter the learning rate for the system. In the Training Control tab, the method for validation can be set to none, cross-validation, or single test. For the latter two choices, the user can enter the k value for cross-validation or the percentage of patterns to withhold from training for single testing. Lastly, the maximum number of epochs, minimum number of epochs, number of epochs after the optimum, and number of admissible classification errors are entered. See Figures 4-7.
In all trials, the number of membership functions representing the size, crest, saturation, and intensity variables was set to five, for example, {very small, small, medium, large, very large}; the size of the rule base was set to automatic; rule weights were not used; 5-fold cross-validation was chosen; and the number of admissible classification errors was kept at zero. Four different rounds were completed. The first round of twelve runs used data that had no species separated by gender, and the parameters modified included the shape of the membership functions, the aggregation function, and the rule learning procedure. Only one modification was made per trial run. The second round of twelve runs used data that had some species separated by gender. This round used the same parameters as the previous one. The first, second, and fourth rounds all used sixteen membership functions for the hue variables, for example, {red, red-orange, orange, orange-yellow, yellow, yellow-green, green, green-blue, blue, blue-violet, violet, violet-purple, purple, purple-magenta, magenta, magenta-red}. A third round of five different trials using the gender-separated species data was done. Each of these trials experimented with improving the interpretability of the trials that had produced the most accurate systems or improving the accuracy of the trials that had produced the most interpretable systems. Accuracy was determined by the number of misclassifications, and interpretability was determined by the size of the rule base.
The parameters modified included the number of membership functions for the hue variables, the learning rate, the learning constraints, and the maximum, minimum, and postoptimal numbers of epochs during training. Lastly, in the fourth round of six runs, the maximal size of the rule base was controlled and all four available learning constraints were used. These choices were made to increase the comprehensibility of the resulting system. It should be noted that in none of the runs were the data statistics viewable. This appears to be due to the large size of the data being displayed in a window too small for it (without a scroll bar provided).

Results and Analysis
4.1. Results. Four rounds of trials were performed. The first round consisted of twelve runs and utilized data that did not contain any species separated by gender. The second round was another twelve runs with parameters similar to the first, but the data included certain species separated as two different cases, one for each gender. The third and fourth rounds consisted of five and six runs, respectively, on the same data used in the second round. The third round comprised various experiments at improving either the accuracy or the interpretability of the resulting systems. The fourth round restricted the size of the rule base on five runs and used all the available learning constraints. Results for all the runs are listed in Tables 1-4.

4.2. Analysis.
In the first round of training and testing using 5-fold cross-validation, the highest accuracy (percentage of correct classifications) was found in two runs, which both achieved 97.84%. Their parameters were trapezoidal membership functions, the best learning procedure, and maximum or weighted sum aggregation functions. These two runs also had the largest rule bases (1,931 rules) of their round. The smallest rule base, at 1,776 rules, occurred when the membership functions were triangular, the aggregation function was maximum, and best per class was used. However, another two runs had very similarly sized rule bases (both were 1,777). The common thread among these top three runs (by rule base size) was the best per class learning procedure; two of them used triangular membership functions and two used the maximum aggregation function. The worst performance in the round came from all four runs with bell-shaped membership functions. In those instances, only one rule was generated, which caused the system to perform extremely badly. It is difficult to say the exact cause of this singular rule generation. It is noted online (on the website for downloading NEFCLASS-J) that data files without an INRANGES section are not processed properly and cause NEFCLASS-J to create just one rule or no rule at all. All the data files used in these trials had INRANGES correctly specified, but perhaps a similar problem occurred that limited the rule base size. The source code for NEFCLASS-J is not available, the documentation that comes with the software is incomplete, and the application is no longer supported, so it is hard to determine the root cause of the problem. If it is not an internal error in NEFCLASS-J, then the only available guess is that the bell-shaped membership functions, combined with the large number of membership functions, 16, result in an odd distribution across the fuzzy set domain, thus rendering proper rule generation impossible.
In the second round, over 30 species were separated by gender as different cases for the system to learn. Compared to the first round, the second round resulted in a higher percentage of correct classifications and a larger rule base on all runs (excluding the abnormal results from the bell-shaped membership function systems). These differences were only slight, though. It would appear, then, that accuracy improved when some species had separated genders, but that interpretability may have declined. However, because a number of the species were separated, the number of classes to classify also increased, which likewise affects the size of the rule base. Almost undoubtedly, the number of rules per class decreased in the second round even though the overall size of the rule base grew. The smallest overall rule base in the second round (excluding bell-shaped) was 1,784 rules, which came from the system generated with triangular membership functions, the maximum aggregation function, and the best per class learning procedure. A very close second, though, came from another system with triangular membership functions and the best per class learning procedure (1,785 rules).
The third round involved experimentation with various values for more of the parameters. Run 1 was modeled after the run in the second round that had the highest accuracy and the lower mean error of those top two. Here the number of membership functions representing the hue variables was cut in half, to 8, to see if the rule base size could be lowered and, thus, interpretability improved. The result was a correct classification rate that was not that different (97.9% in round 2 and 97.02% in round 3) and a rule base that was 150 rules smaller. Runs 2 and 4 were an effort to increase the accuracy of the system generated in round 2 that had the smallest rule base. The learning rate and the parameters surrounding the number of epochs were altered. In both cases the rule base size and the percentage of correct classifications were effectively unchanged. Run 3 was another attempt to improve interpretability. It included more learning constraints on the fuzzy sets and was similar to run 1. Compared to run 1, run 3 had exactly the same size rule base and only a negligible decrease in accuracy. Finally, run 5 was an attempt at getting the best accuracy out of the already most accurate system. This was done even after considering the potential for overfitting and/or loss of interpretability. Doubling the values for each of the epoch-related parameters had no effect on the results, though.

Lastly, the fourth round varied the maximum number of fuzzy rules and kept all four of the learning constraint options. All runs in this last round used the best per class rule learning procedure. If best had been chosen and the rule base size limited, then some of the classes (i.e., species) might not have been represented at all in the rule base; hence the use of best per class. The first run was a control run in which the rule base size was left at automatic. It achieved an accuracy of 97.32% correct classifications with a rule base of 1,917 rules. Of the next five runs, only run two was able to keep an accuracy above 80% (81.23%, specifically), with a rule base of 1,499 rules. For run two, the average number of rules per class was 8.191, the median (and highest value) was 9, and the lowest was 1 rule per class. The Appendix lists the number of rules for each species or class for the system generated by run two. The remaining runs all performed much more poorly, though with much smaller rule bases. The last run, with a rule base of 183 rules, effectively generated a system with only one rule per class. It had the worst performance of the round (20.87% correct classifications). Also, it appears that the user-provided maximal size of the rule base is only a soft limit and that NEFCLASS-J simply uses it as a guide when creating the rule base.
Overall, when trying to minimize the size of the rule base, triangular membership functions and the best per class rule learning procedure seemed to produce the top results. Limiting the size of the rule base, but not by too much, also helps. If the goal is to maximize accuracy, then trapezoidal membership functions and the best rule learning procedure seem to be key. Additionally, it seems that the interpretability of systems aiming for accuracy can be increased by reducing the number of membership functions for the hue fuzzy variables and increasing the constraints on all the fuzzy sets, all without reducing the accuracy by much (less than 2%).

Conclusion
A case for applying neurofuzzy techniques to taxonomic identification has been presented in this paper. It is, perhaps, the first ever application of such an approach. This provides a framework for other researchers to further investigate this potentially very fruitful method. Previous efforts in this area predominantly involved taxonomists building identification systems using neural networks. It could be surmised, though, that the best results would probably come from an interdisciplinary team of researchers, composed of a computer scientist specializing in artificial intelligence techniques, a qualified taxonomist or systematics researcher, and an expert in the field of the organisms being studied (e.g., an ornithologist).
Taxonomists desire an easy-to-use, quick, reliable, and accurate system for biological group classification. The taxonomist should be able to analyze and alter the logic followed for each classification. These qualities, combined with the vague, imprecise nature of the data and the ease with which experts can express their knowledge using linguistic terms, make neurofuzzy systems an attractive tool for taxonomic identification. To realize this potential, researchers should come together across departments. The impact it could have on endangered species is enormous.

Figure 1 :
Figure 1: Architecture of the NEFCLASS-J system: x₁ and x₂ are the provided inputs, and connections sharing the same label represent the shared weights in the network.

Figure 3 :
Figure 3: HSI color space represented as a circular double cone. Hue is the angle from the red axis, saturation is the distance from the intensity axis, and intensity is the position on the vertical axis.

Table 1 :
Round 1. Data not separated by gender for any species; 2,820 patterns presented for use in training; NEFCLASS-J defaults for learning constraints and learning rate were kept; the maximum number of epochs was 500, the minimum number of epochs was 10, and the number of epochs after the optimum was 25.

Table 2 :
Round 2. Data separated by gender for some species (see the Appendix); 3,660 patterns presented for use in training; NEFCLASS-J defaults for learning constraints and learning rate were kept; maximum number of epochs was 500, minimum number of epochs was 10, and number of epochs after optimum was 25.

Table 3 :
Round 3. Data separated by gender for some species (see the Appendix); 3,660 patterns presented for use in training; the learning constraints on the fuzzy sets are as follows: 1: keep their relative order, 2: always overlap, 3: be symmetrical, and 4: intersect at 0.5.

Table 4 :
Round 4. Data separated by gender for some species (see the Appendix); 3,660 patterns presented for use in training; the NEFCLASS-J default for the learning rate was kept; the number of MFs for the hue variables was 16; the MF shape was trapezoidal; the aggregation function was maximum; the learning procedure was best per class; all four learning constraints were used; the maximum number of epochs was 500, the minimum number of epochs was 10, and the number of epochs after the optimum was 25.