Application of Artificial Bee Colony Optimization Algorithm for Image Classification Using Color and Texture Feature Similarity Fusion

With advances in image capturing devices, image data is being generated in high volumes. A challenging and important problem in image mining is to reveal useful information by grouping images into meaningful categories. Image retrieval has been in wide demand in recent decades, and Content-Based Image Retrieval (CBIR) is regarded as one of the most effective ways of accessing visual data. Conventionally, collections of digital images are searched by matching keywords with image captions, descriptions, and labels. Keyword-based searching has high computational complexity, and the user has to remember the exact keywords used in the image database. Instead, this paper proposes an image retrieval system that uses the Artificial Bee Colony optimization algorithm to fuse similarity scores based on the color and texture features of an image, thereby achieving very high classification accuracy and minimum retrieval time. In this scheme, color is described by the color histogram method in HSV space and texture is represented by contrast, energy, entropy, correlation, and local stationary over the region in an image. The proposed Comprehensive Image Retrieval scheme fuses the color and texture feature based similarity scores between the query image and all the database images. The experimental results show that the proposed method is superior to keyword-based retrieval and to content-based retrieval schemes that use individual low-level features of the image.


Introduction
Images are rich in content and they facilitate international exchanges without language restrictions. Retrieving images from a vast collection has become an interesting challenge and has drawn the attention of researchers towards the development of retrieval approaches based on low-level features of the image. In recent years, Content-Based Image Retrieval (CBIR) systems have found broad and important application in areas including military affairs, medical science, education, architectural design, and agriculture. The traditional Keywords-Based Image Retrieval (KBIR) [1] retrieves images by matching keywords with image annotations and labels, often returning irrelevant results and consuming more time. In addition, a small amount of text can describe content-rich multimedia information only partially [2]. A CBIR system uses low-level features to describe the content of the image and so breaks through the limitation of traditional text query techniques. The main idea of an image retrieval system is to analyze image information through low-level features of an image [3], which include the color, shape, structure, and texture of objects, and to set up the feature vectors of an image as its index. An image retrieval system based on a single feature describes the content of an image from one specific angle; this might be suitable for simple images, but describing an image with a single feature is incomplete. Representing an image with multiple features from multiple angles is expected to achieve better results, since different features reflect the uniqueness of the image. In the proposed method, the similarity scores of color and texture features are combined in a suitable way to achieve the intended results. The stages of the Comprehensive Image Retrieval (CIR) scheme are depicted in Figure 1. Initially the query image is resized to 150 × 150 in the preprocessing stage, which is followed by the feature extraction stage.
In this paper, the feature extraction stage derives the color and texture features from the image. Color features are extracted using the color histogram method and the texture features are derived from the cooccurrence matrix of the image. Different features reflect different characteristics of the image; if those features are integrated reasonably, the retrieval process will be a complete one. In our proposed system, the similarity between the query image and each of the images from the target database is derived and normalized. The normalized similarity values of color and texture are combined using a fusion algorithm, and the fusion weights are assigned adaptively by the Artificial Bee Colony optimization algorithm [4] to improve the image retrieval performance.
Yue et al. [5] proposed a technique to retrieve images by combining the similarity score between query image and the database image based on color and texture features. The basic experimental idea is extracted from this paper. The local and global color histogram is used as color feature and texture feature vectors are derived from cooccurrence matrix of the image. The fusion weights of color and texture are assigned equal values. As fusion weights were not optimum, the effectiveness of retrieval performance was inconsistent. Prasad et al. [6] proposed a technique to retrieve images by region matching using a combined feature index based on color, shape, and location within the framework of MPEG-7. Simulation results show that the image retrieval was done only based on dominant regions within each image and hence the retrieval performance was incomplete. Chun et al. [7] introduced a CBIR method based on an efficient combination of multiresolution color and texture features. Autocorrelograms of the hue and saturation component of image in HSV space are considered as color feature and value component from Block Difference of Inverse Probabilities (BDIP)-Block Variation of Local Correlation Coefficients (BVLC) extracted in the wavelet transform domain were considered as texture feature. Experimental results for various image resolutions provided considerable retrieval accuracy only with high-dimension feature vectors.
Tai and Wang [8] proposed a novel image feature called the color-texture correlogram, which is an extension of the color correlogram, and the simulation performance shows that the retrieval of images using spatial correlation of color-texture feature vectors gives good retrieval accuracy. Almost all the related work concentrated on extraction methodologies for a range of low-level features of the image and on the combination of those features for an efficient retrieval routine, without considering the optimum fusion weight value. In [9], two-dimensional histograms of the CIELab chromaticity coordinates were chosen as color features and variances extracted by discrete wavelet frames analysis were preferred as texture features. The mean and covariance of RGB values and the energies of Discrete Cosine Transform (DCT) coefficients are used as color and texture features in [10]. A number of approaches have been proposed in the literature to obtain substantial classification accuracy by combining low-level features. Typical examples of CBIR systems include QBIC [11], Virage [12], Photobook [13], VisualSEEK [14], Netra [15], and SIMPLIcity [16].
The proposed ABC algorithm-based similarity score fusion technique is capable of achieving high classification accuracy and a reasonable retrieval rate with less retrieval time by means of optimal fusion weights. Image retrieval results are analyzed for Multifeature Similarity Score Fusion (MSSF) without fusion weight optimization and for the proposed MSSF with the ABC optimization algorithm. The paper is organized as follows: Section 2 deals with the details and the scheme of the comprehensive retrieval model with color and texture features. Section 3 covers the experimental results and performance analysis of the multifeature similarity score fusion-based image retrieval method and its hybrid version with the ABC algorithm. Section 4 presents the overall conclusion and the scope of future work.

Architecture of Comprehensive Image Retrieval Scheme
Image classification and retrieval utilizes the visual content of an image to search for similar images in large-scale image databases according to the user's interest. The CBIR problem is motivated by the need to search the exponentially increasing space of image and video databases efficiently and effectively. In a typical image retrieval system, features related to visual content such as color, shape, structure, or texture are extracted from the query image; the similarity between the individual features, or set of features, of the query image and those of each image in a benchmark database is computed; and the target images most similar to the query image are retrieved. In our proposed system, the content of an image is analyzed in terms of combined low-level features, namely color and texture, which are extracted from the query image and used for retrieval. The proposed image retrieval scheme, with its stages of image preprocessing, feature extraction, similarity measurement with normalization, multifeature similarity score fusion, and fusion weight optimization, is illustrated in Figure 2. Preprocessing is an important task that must often be done when processing digital images before features are derived. The first step in the feature extraction process is to pull out the image features to a distinguishable extent. When the input data to an algorithm is too large to be processed and carries little information relative to its size, the input data is transformed into a reduced representation set of features, also called a feature vector. Transforming the input data into the set of features is called feature extraction. If the extracted features are carefully chosen, the feature set is expected to capture the relevant information from the input data, so that the desired task can be performed using this reduced representation instead of the full-size input.
Color information is the most intensively used feature for image retrieval because of its strong correlation with the underlying image objects. The texture is another widely used feature in image retrieval, which is intended to capture the granularity and repetitive patterns of surfaces within an image [3]. Color and texture feature extraction methods are elaborated in this section.

Color Feature Extraction.
The goal of color feature extraction is to obtain a compact, perceptually relevant representation of the color content of an image. Color is one of the most widely used visual features and is invariant to image size and orientation [17,18]. The image retrieval problem is motivated by the need to search the exponentially increasing space of image and video databases efficiently and effectively. The visual content of an image is analyzed in terms of low-level features extracted from the image. One of the most important features that make possible the recognition of images by humans is color. Color is a property that depends on the reflection of light to the eye and the processing of that information in the brain. Colors are defined in three-dimensional color spaces such as RGB (Red, Green, and Blue), HSV (Hue, Saturation, and Value), or HSB (Hue, Saturation, and Brightness).
In this proposed scheme the color features are extracted using Color Histogram in HSV color space. The steps to find the color feature are as follows.
Step 1. The RGB image is transformed to the HSV model, in which the perceived distance between colors is proportional to the Euclidean distance between the corresponding pixels.
Step 2. Quantization in terms of color histograms refers to the process of reducing the number of bins by taking colors that are very similar to each other and putting them in the same bin. In the HSV space, hue is quantized into 16 bins with values in [0, 15], saturation is quantized into 4 bins with values in [0, 3], and value is quantized into 4 bins with values in [0, 3]. Human cognition of color is based mainly on hue, then saturation, and finally value. Thus the quantized results are coded as in (1):

$$C = 16H + 4S + V \tag{1}$$

where C is an integer between 0 and 255.
Step 3. Obtain the color histogram of an image in HSV color space. Color histogram is a representation of the distribution of colors in an image. For digital images, a color histogram represents the number of pixels that have colors in each of a fixed list of color ranges that span the image's color space, the set of all possible colors.
Step 4. The similarity between the color feature values of the query image and those of the database image is calculated using the Euclidean distance similarity measure given in (2); the closer the distance, the higher the similarity:

$$D(q, s) = \sqrt{\sum_{i=0}^{L-1} (q_i - s_i)^2} \tag{2}$$

where $q = \{q_0, q_1, \ldots, q_{L-1}\}$ is the feature vector of the query image, $s = \{s_0, s_1, \ldots, s_{L-1}\}$ is the feature vector of the database image, and L is the number of dimensions of the image features.
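The four steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the uniform bin edges, the coding C = 16H + 4S + V (chosen to be consistent with the stated bin counts and the 0 to 255 range), the normalization of the histogram to unit sum, and all function names are assumptions.

```python
import colorsys
import math


def quantize_hsv(r, g, b):
    """Map one 8-bit RGB pixel to a code C in [0, 255].

    Hue is quantized into 16 bins, saturation and value into 4 bins each,
    and the bins are combined as C = 16*H + 4*S + V (assumed coding).
    """
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)  # each in [0, 1]
    # Uniform bin edges are an assumption; min() keeps a value of 1.0 in the top bin.
    hq = min(int(h * 16), 15)
    sq = min(int(s * 4), 3)
    vq = min(int(v * 4), 3)
    return 16 * hq + 4 * sq + vq


def color_histogram(pixels):
    """Return a 256-bin histogram, normalized to sum to 1, for (r, g, b) pixels."""
    hist = [0.0] * 256
    for r, g, b in pixels:
        hist[quantize_hsv(r, g, b)] += 1.0
    total = sum(hist)
    return [count / total for count in hist] if total else hist


def euclidean_distance(q, s):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((qi - si) ** 2 for qi, si in zip(q, s)))


if __name__ == "__main__":
    red = [(255, 0, 0)] * 100
    blue = [(0, 0, 255)] * 100
    # Identical images give distance 0; different images give a larger distance.
    print(euclidean_distance(color_histogram(red), color_histogram(red)))
    print(euclidean_distance(color_histogram(red), color_histogram(blue)))
```

Because the histograms are compared with a distance, a smaller value means a more similar pair, which matches the convention used in Step 4.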

Texture Feature Extraction.
Texture is a visual feature that refers to inherent surface properties of an object and their relationship to the surrounding environment. This section describes a texture feature representation scheme based on the image cooccurrence matrix. The cooccurrence matrix is widely used to extract texture features from gray-scale images and has been shown to be very efficient. The color image is converted to a gray-scale image with 256 gray levels using (3), in which Y is the gray-scale value and R, G, B represent the red, green, and blue components, respectively. The cooccurrence probabilities provide a second-order method for generating texture features [18]. These probabilities represent the conditional joint probabilities of all pairwise combinations of gray levels in the spatial window of interest given two parameters: interpixel distance (δ) and orientation (θ). The probability measure can be defined as

$$\Pr(x) = \{C_{ij} \mid (\delta, \theta)\} \tag{4}$$

where $C_{ij}$ (the cooccurrence probability between gray levels i and j) is defined as

$$C_{ij} = \frac{P_{ij}}{\sum_{i,j=1}^{G} P_{ij}} \tag{5}$$

where $P_{ij}$ represents the number of occurrences of gray levels i and j within the given window, given a certain (δ, θ) pair, and G is the quantized number of gray levels. The sum in the denominator thus represents the total number of gray level pairs (i, j) within the window. Statistics applied to the cooccurrence probabilities generate the texture features. Gray-scale quantization is performed and the corresponding cooccurrence matrix of size 256 × 256 is obtained. The statistical properties, namely contrast, energy, entropy, correlation, and local stationary, are calculated using (6)-(10) to describe the image content. The texture features are extracted in the following steps.
Step 1. The color image is converted to a gray-scale image and the image cooccurrence matrix is derived using (4) and (5).
Step 2. The five statistical properties, namely contrast, energy, entropy, correlation, and local stationary, are calculated from the cooccurrence matrix in four orientations using (6)-(10).
Step 3. The mean and variance of the above five parameters are taken. The results are the final texture features and are denoted as in (11).
Step 4. The similarity value between the query image and each of the database images is calculated using the Euclidean distance given in (2); the closer the distance, the higher the similarity.
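The texture pipeline described above can be sketched as follows. This is an illustrative sketch only. It assumes the common BT.601 luma weights for the gray-scale conversion, an interpixel distance of 1, orientations of 0, 45, 90, and 135 degrees, and the usual Haralick-style definitions of the five statistics (with local stationary taken as the inverse difference moment). The function names are hypothetical, and the authors' exact formulas may differ.

```python
import math

# Pixel offsets (row, col) for the four orientations 0, 45, 90 and 135 degrees
# at inter-pixel distance delta = 1 (assumed).
OFFSETS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}


def to_gray(r, g, b):
    """Gray-scale conversion with the common BT.601 weights (an assumption here)."""
    return int(round(0.299 * r + 0.587 * g + 0.114 * b))


def cooccurrence(gray, levels, offset):
    """Normalized co-occurrence probabilities C[i][j] for one (delta, theta) pair."""
    dr, dc = offset
    rows, cols = len(gray), len(gray[0])
    counts = [[0.0] * levels for _ in range(levels)]
    total = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[gray[r][c]][gray[r2][c2]] += 1
                total += 1
    return [[p / total for p in row] for row in counts] if total else counts


def texture_statistics(C):
    """Contrast, energy, entropy, correlation and local stationary of a GLCM."""
    n = len(C)
    cells = [(i, j, C[i][j]) for i in range(n) for j in range(n) if C[i][j] > 0]
    contrast = sum((i - j) ** 2 * p for i, j, p in cells)
    energy = sum(p * p for _, _, p in cells)
    entropy = -sum(p * math.log(p) for _, _, p in cells)
    local_stationary = sum(p / (1 + (i - j) ** 2) for i, j, p in cells)
    mu_i = sum(i * p for i, _, p in cells)
    mu_j = sum(j * p for _, j, p in cells)
    var_i = sum((i - mu_i) ** 2 * p for i, _, p in cells)
    var_j = sum((j - mu_j) ** 2 * p for _, j, p in cells)
    cov = sum((i - mu_i) * (j - mu_j) * p for i, j, p in cells)
    denom = math.sqrt(var_i * var_j)
    correlation = cov / denom if denom > 0 else 0.0  # flat image: define as 0
    return [contrast, energy, entropy, correlation, local_stationary]


def texture_features(gray, levels=256):
    """Mean and variance of each statistic over the four orientations (10 values)."""
    per_angle = [texture_statistics(cooccurrence(gray, levels, off))
                 for off in OFFSETS.values()]
    features = []
    for k in range(5):
        values = [stats[k] for stats in per_angle]
        mean = sum(values) / len(values)
        variance = sum((v - mean) ** 2 for v in values) / len(values)
        features.extend([mean, variance])
    return features


if __name__ == "__main__":
    checker = [[(r + c) % 2 for c in range(4)] for r in range(4)]
    print(texture_features(checker, levels=2))
```

The resulting 10-element vector can be compared between query and database images with the same Euclidean distance used for the color feature.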

CIR Based on Color and Texture Feature Similarity Scores.
The features such as color, shape, structure, and texture are used to describe the image content from different angles. In image retrieval, using additional features provides more information on the image content than using a single feature. Hence, in our paper, the similarity scores based on color and texture features are combined with a weighted fusion function for describing every image in the database. Since the physical meanings of different features are different, the similarity value ranges are totally different. Before the features are combined, the similarity scores must be normalized. Normalization is a scaling-down transformation of the features to appreciably low values. This is important for the classification algorithm to have better performance with less computation time. The min-max normalization method is the most preferred method in the majority of research work because it yields a steady range of normalized values. It performs a linear transformation on the original data scores. Hence the similarity scores of color and texture are normalized by means of min-max normalization using (12). In its standard form, min-max normalization maps the similarity score $S_i$ between the query image Q and the i-th database image to

$$S_i' = \frac{S_i - S_{\min}}{S_{\max} - S_{\min}} \tag{12}$$

where $S_{\min}$ and $S_{\max}$ are the minimum and maximum similarity scores over the database images. The weighted fusion algorithm is applied to find comprehensive similarities, to improve the retrieval accuracy and precision, using

$$S_{Fi} = W_C S_{Ci} + W_T S_{Ti} \tag{13}$$

where $S_{Ci}$ and $S_{Ti}$ are the normalized similarity values of the color and texture features, and $W_C$ and $W_T$ are the weight values for the color and texture similarity scores, respectively. In this CIR scheme, the two weight values are equal, that is, the weights of color and texture are 0.5 and 0.5. The retrieval performance of the combined color and texture feature similarity score method is better than that of the retrieval methods with a single feature.
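A minimal sketch of the normalization and fusion just described, assuming the textbook min-max formula and a plain weighted sum; the function names and the sample scores are made up for illustration and do not come from the paper's tables.

```python
def min_max_normalize(scores):
    """Linearly rescale scores to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def fuse_scores(color_scores, texture_scores, w_color=0.5, w_texture=0.5):
    """Weighted sum of normalized color and texture scores, one per database image."""
    s_c = min_max_normalize(color_scores)
    s_t = min_max_normalize(texture_scores)
    return [w_color * c + w_texture * t for c, t in zip(s_c, s_t)]


def rank_images(fused_scores):
    """Indices of database images, most similar (smallest fused distance) first."""
    return sorted(range(len(fused_scores)), key=lambda i: fused_scores[i])


if __name__ == "__main__":
    color = [0.0, 2.0, 4.0]       # hypothetical color distances to three images
    texture = [10.0, 0.0, 30.0]   # hypothetical texture distances on another scale
    fused = fuse_scores(color, texture)
    print(fused, rank_images(fused))
```

Normalizing first keeps the feature with the larger raw range (texture in this made-up example) from dominating the fused score, which is the reason the text gives for normalizing before fusion.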

Application of Artificial Bee Colony Algorithm for CIR Scheme.
During the course of similarity score fusion, a key problem is how to assign the weights of similarity score. It affects directly the retrieval performance of the system. To resolve the problem of obtaining the optimum weight value for color and texture feature similarity score and to implement a fast and robust CIR scheme, Artificial Bee Colony algorithm is used. In ABC algorithm, the position of a food source represents a possible solution to the optimization problem and the amount of nectar of a food source corresponds to the quality of the associated solution.
The number of employed bees or onlooker bees is equal to the number of solutions in the population. At the first step, the ABC generates a randomly distributed initial population (P) of SN solutions, where SN denotes the number of employed bees or onlooker bees. Each solution $x_i$ is a D-dimensional vector, where D is the number of optimization parameters and i = 1, 2, 3, ..., SN. After initialization, the population of positions is subjected to repeated cycles of the search processes of the employed bees, the onlooker bees, and the scout bees, up to the Maximum Number of Iterations (MNI). An employed bee produces a modification of the position in its memory depending on local information and tests the amount of nectar at the new source. If the amount of nectar of the new source is higher than that of the previous one, the bee memorizes the new position and forgets the old one. Otherwise the bee keeps the position of the previous one in its memory. The ABC algorithm is shown below.
Step 1. Create an initial population of artificial bees $x_{ij}$ within the search space.
Step 2. Evaluate the fitness of the population using the sphere function given in (14).
Step 3. While the stopping criterion is not met, where the stopping condition is the Maximum Number of Iterations (MNI), repeat the following.
(i) Produce new solutions (new positions) $v_{ij}$ for the employed bees from the solutions $x_{ij}$ using (15).
(ii) Evaluate the fitness value using (14) and apply the selection process between $x_{ij}$ and $v_{ij}$ using (16) and (17).
(iii) An artificial onlooker bee chooses a food source depending on the new positions. In order to select the better nectar position found by an onlooker, $O_b$ is defined in terms of $P_i$, the best fitness value of solution i, which is proportional to the nectar amount of the food source in position i, and n, the number of food sources, which is equal to the number of employed bees.
(iv) Produce new solutions (new positions) $v_{ij}$ for the onlookers from the solutions $x_{ij}$ selected depending on $P_i$ and evaluate them.
(v) Determine the abandoned solution (source) $x_{ij}$, if it exists, and replace it with a new randomly produced solution $x_{ij}$ for the scout bee, drawn between $x_{\min}$ and $x_{\max}$, the lower and upper limits, respectively, of the search scope on each dimension.
(vi) Memorize the best food source position (solution) achieved so far.
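For reference, the standard ABC formulation uses expressions of the following form for the quantities named in the steps above. This is a sketch of the textbook equations, assuming the scheme follows them; $\phi_{ij}$ (a random number in $[-1, 1]$) and $k$ (a randomly chosen solution index different from $i$) are symbols from that standard formulation rather than from this text, and the authors' exact equations may differ.

```latex
\begin{align*}
f(x_i) &= \sum_{j=1}^{D} x_{ij}^{2} && \text{(sphere function)} \\
v_{ij} &= x_{ij} + \phi_{ij}\,(x_{ij} - x_{kj}), \quad k \neq i && \text{(candidate position)} \\
O_b &= \frac{P_i}{\sum_{m=1}^{n} P_m} && \text{(onlooker selection probability)} \\
x_{ij} &= x_{\min} + \mathrm{rand}(0,1)\,(x_{\max} - x_{\min}) && \text{(scout replacement)}
\end{align*}
```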
Applying this algorithm to assign the fusion weights in the multifeature similarity score fusion method helps to achieve better image retrieval performance.
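The employed, onlooker, and scout phases can be sketched as a short Python loop. This is a generic, simplified ABC written for illustration, not the authors' MATLAB implementation: the `limit` parameter for abandoning a source, the one-dimension-at-a-time perturbation, the mapping from cost to nectar quality, and all names are assumptions taken from the commonly described form of the algorithm. The demonstration minimizes the sphere function mentioned in Step 2.

```python
import random


def sphere(x):
    """Sphere objective: sum of squared components (minimum 0 at the origin)."""
    return sum(v * v for v in x)


def abc_minimize(objective, dim, lower, upper, n_sources=10, limit=20,
                 max_iter=200, seed=0):
    """Minimize `objective` over [lower, upper]^dim with a basic ABC loop."""
    rng = random.Random(seed)

    def random_source():
        return [lower + rng.random() * (upper - lower) for _ in range(dim)]

    def quality(cost):
        # Map a cost to a positive "nectar amount" (larger is better).
        return 1.0 / (1.0 + cost) if cost >= 0 else 1.0 + abs(cost)

    sources = [random_source() for _ in range(n_sources)]
    costs = [objective(s) for s in sources]
    trials = [0] * n_sources
    best_cost = min(costs)
    best = list(sources[costs.index(best_cost)])

    def try_neighbor(i):
        # Perturb one dimension of source i relative to a random other source k.
        k = rng.choice([m for m in range(n_sources) if m != i])
        j = rng.randrange(dim)
        candidate = list(sources[i])
        step = rng.uniform(-1.0, 1.0) * (sources[i][j] - sources[k][j])
        candidate[j] = min(max(candidate[j] + step, lower), upper)
        cost = objective(candidate)
        if cost < costs[i]:          # greedy selection: keep the better position
            sources[i], costs[i], trials[i] = candidate, cost, 0
        else:
            trials[i] += 1

    for _ in range(max_iter):
        for i in range(n_sources):                   # employed bees
            try_neighbor(i)
        weights = [quality(c) for c in costs]        # onlooker bees
        for _ in range(n_sources):
            try_neighbor(rng.choices(range(n_sources), weights=weights)[0])
        worst = max(range(n_sources), key=lambda i: trials[i])   # scout bee
        if trials[worst] > limit:
            sources[worst] = random_source()
            costs[worst] = objective(sources[worst])
            trials[worst] = 0
        cost_now = min(costs)                        # memorize the best source
        if cost_now < best_cost:
            best_cost = cost_now
            best = list(sources[costs.index(cost_now)])
    return best, best_cost


if __name__ == "__main__":
    solution, cost = abc_minimize(sphere, dim=2, lower=-5.0, upper=5.0)
    print(solution, cost)
```

To tune fusion weights with such a loop, the objective would be replaced by a function that scores retrieval quality for a candidate pair of weights; the exact form of that fitness function is not given in the text, so it is left out of this sketch.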

Experimental Results and Performance Analysis
The experimental analysis and simulation determine the efficiency and capability of the work. In this simulation, the results for the various modules implemented, namely, image preprocessing, feature extraction, similarity score calculation and normalization, score fusion, and the ABC optimization technique, are explained to quantify the benefits of image retrieval based on multiple features. The system is tested on an Intel Core 2 Duo (1.66 GHz clock speed) desktop PC with 2 GB RAM using MATLAB R2000b. To test the validity of the proposed scheme, a set of 800 images from the COREL database library is taken. This dataset has been used in various scientific articles on content-based image retrieval systems.

ISRN Artificial Intelligence
The database contains 10 image classes with 100 images each. The classes are Africa, Beach, Buildings, Buses, Dinosaurs, Flowers, Elephants, Horses, Food, and Mountains. Six images from each class are randomly selected and used as query images. Applying ABC in the CIR scheme yields a significant improvement in classification accuracy and a reduction in computational time, in addition to consistent results.

Image Preprocessing and Feature Extraction.
The images from the COREL image library are resized to 150 × 150 to make the retrieval process efficient. Then the two features, namely, color and texture, are derived for all the images. Feature extraction is the process of deriving meaningful information from images, with descriptions based on properties that are inherent in the images themselves. A feature extracted from an image is generally represented as a vector of finite dimension. The feature vector dimension is one of the most important factors that determine the amount of storage space, the retrieval accuracy, and the retrieval time [19]. All the RGB images in the database are transformed to the HSV space model using a simple nonlinear transformation. The transformed images are used to acquire the color histogram, and then quantization is done to reduce the number of bins by taking colors that are very similar to each other and putting them in the same bin. The quantized results obtained using (1) in Section 2 are stored in a text file for further processing. The coded value obtained is an integer between 0 and 255. Texture is another widely used feature in content-based image retrieval systems. The statistical properties, namely, contrast, energy, entropy, correlation, and local stationary, are calculated in all four orientations from the cooccurrence matrix. Finally the mean and variance of each parameter are taken as components of the texture feature. The mean and variance of the statistical properties in all four orientations are tabulated in Table 1 for sample images.

Similarity Score Calculation and Normalization.
In image retrieval, the sum of squares of the differences between the feature vectors of the query image and the image from the target database is calculated to get the Euclidean distance. The similarity score between the query image and the images in the database is derived using the Euclidean distance with (2) in Section 2.1.1. Sample similarity scores are tabulated in Table 2; the closer the distance, the higher the similarity. If the query and database images are the same, the similarity score is equal to zero. If they are different, the Euclidean distance is the similarity score and is stored in a text file for further processing. Since the physical meanings of different features are different, and their value ranges are totally different, the similarity scores of different features cannot be compared directly. So, before fusing the multifeature similarity scores, they should be normalized. The normalization is done using (12) given in Section 2.2, and the values obtained for a few samples of color-based similarity scores are tabulated in Table 3.

Standard and Evolutionary-Based Fusion of Similarity Scores.
The individual features cannot express the semantic information of an image sufficiently. Hence, the color and texture features are combined to realize comprehensive image retrieval. The global similarity function $S_{Fi}$ is computed as a weighted sum of the similarities, as shown in (13) in Section 2.2. In our experiment, the fusion weights $W_C$ and $W_T$ are obtained from the Artificial Bee Colony algorithm. The performance is analyzed using the confusion matrix method, which contains information about the actual and predicted categories of a retrieval system. The query images are presented to the CIR scheme with fusion weight optimization using the ABC algorithm, and the true-positive, true-negative, false-positive, and false-negative values are tabulated in Table 4. The precision rate, recall rate, classification accuracy, and error rate are calculated using (19) for all six query images of the bus class and are shown in Table 4. The class-wise performances are recorded in Table 5 for 800 images from the COREL image library. The experiment shows that the classification accuracy is 4% higher than that of color-based retrieval and 10% higher than that of texture-based image retrieval. For the horse image, the classification accuracy is 0.72% higher than color-based retrieval, 14% higher than texture-based retrieval, and 1.89% higher than the fusion algorithm with equal weights. In our proposed method, the retrieval time is also 0.6 seconds less than that of the fusion algorithm with equal weights. The variation in classification accuracy is recorded by increasing the number of retrieved images from 10 to 100, where 100 is the maximum number of images in each class. From Figure 3, it is clear that the classification accuracy of the conventional color-based, texture-based, and MSSF with equal weights methods is lower than that of our proposed method.
Table 1: Mean and variance of texture feature derived using cooccurrence matrix for contrast, energy, entropy, correlation, and local stationary in four orientations.
For example, when the number of images retrieved is equal to 100, the classification accuracy of the color-based retrieval method is 10% lower than that of our proposed method for the bus image. Precision and recall rates are computed to evaluate the performance of the proposed method, with the number of returned images ranging from 10 to 100. For comparison, the image retrieval methods based on color feature, texture feature, and two-feature similarity score fusion with equal weights are implemented. The precision and recall rate relationship of these methods is shown in Table 6. From Table 6, it is evident that, for the bus as the query image and almost the same recall rate, the CIR scheme using the ABC algorithm gives precision 10% higher than color-based retrieval, 20% higher than texture-based retrieval, and 5% higher than the CIR scheme with equal weights. Figure 4 shows that image retrieval based on multifeature similarity score fusion using the Artificial Bee Colony algorithm ranked first. Compared with the image retrieval method based on color feature, the performance of image retrieval based on texture feature is poor. The average precision and recall rates for query images of various classes are shown in Table 7. With the proposed system, the average precision is 10% higher than that of color-based retrieval for the bus as query image. Figure 5 shows the overall performance of the image retrieval methods, that is, color based, texture based, CIR with equal weights, and the proposed CIR scheme with the ABC algorithm, for the various image classes considered. Our proposed method gives an average precision rate for the bus, elephant, and horse classes that is higher by 5%, 2%, and 6%, respectively, when compared with the CIR method with equal weights.
However, its average precision rate is lower than that of the CIR scheme with equal weights by 20% and 24.67% for the dinosaur and rose classes, respectively. In this study, color-based image retrieval performs better because the color differences between different images are more obvious in the COREL image library. In addition, the texture features extracted from this image library may be insufficient to reflect the differences between classes, which makes the performance of texture-based image retrieval poor.
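The precision rate, recall rate, classification accuracy, and error rate used throughout this section can be computed from the confusion-matrix counts as sketched below. The formulas are the standard definitions, which the evaluation is assumed to follow; the function name and the counts in the usage example are hypothetical and are not taken from the paper's tables.

```python
def retrieval_metrics(tp, tn, fp, fn):
    """Precision, recall, accuracy and error rate from confusion-matrix counts."""
    total = tp + tn + fp + fn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "accuracy": accuracy,
        "error_rate": 1.0 - accuracy if total else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical case: 10 images returned from a 1000-image database,
    # 8 of them from the query's 100-image class.
    print(retrieval_metrics(tp=8, tn=898, fp=2, fn=92))
```

In this made-up case precision is high while recall is low, because only 10 of the 100 relevant images can be returned at all; this is why the text reports precision at comparable recall rates.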

Conclusion and Future Work
Image data mining has the potential to provide significant information about images and to extract the implicit knowledge stored in them for users. The content of an image can be expressed in terms of different features such as color, shape, and texture. These low-level features are extracted directly from the digital representation of the image and do not necessarily match the human perception of visual semantics. In our proposed system, only color and texture features are considered, and the similarity scores of the individual features are fused. The fusion weights in the system implementation are assigned by a recent optimization technique, the Artificial Bee Colony algorithm, which makes the image retrieval process a unique one. The results demonstrate that the color and texture features are useful in retrieving similar images when a query image is presented. As per our analysis, the scheme produces an average precision rate of 95% for simple images and 82% for complex images. This model can be enhanced in future by including more low-level features such as shape and spatial location features. In this paper, while evaluating the fitness of an individual, we considered only the occurrence frequencies of an image in the retrieval result and not its location. So, this factor should be taken into account when evaluating the fitness of an individual in future work.