Feature Extraction with Ordered Mean Values for Content Based Image Classification

Categorization of images into meaningful classes by efficient extraction of feature vectors from image datasets has been dependent on feature selection techniques. Traditionally, feature vector extraction has been carried out using different methods of image binarization done with selection of global, local, or mean threshold. This paper has proposed a novel technique for feature extraction based on ordered mean values. The proposed technique was combined with feature extraction using discrete sine transform (DST) for better classification results using multitechnique fusion. The novel methodology was compared to the traditional techniques used for feature extraction for content based image classification. Three benchmark datasets, namely, Wang dataset, Oliva and Torralba (OT-Scene) dataset, and Caltech dataset, were used for evaluation purposes. The performance measures after evaluation have evidently revealed the superiority of the proposed fusion technique with ordered mean values and discrete sine transform over the popular single view feature extraction methodologies for classification.


Introduction
Massive expansion of image data has been observed due to the use of digital cameras, the Internet, and other image capturing devices in recent times. Classifying images has been considered a vital research domain for efficient handling of image data, as discussed by Lu and Weng in [1]. Recognition of images based on their content has been dependent on extraction of visual features from the dataset, as suggested by Liu and Bai in [2], Agrawal et al. in [3], and Kekre and Thepade in [4]. Conventional approaches for feature extraction from images have considered binarization as a means to differentiate the image into higher and lower intensity values, as adopted in one of their approaches by Kekre and Thepade in [5] and Shaikh et al. in [6], respectively. Multiple applications of binarization on graphic images and document images have been implemented, some of which were proposed by Ntirogiannis et al. [7], Sezgin and Sankur [8], and Yang and Yan [9]. A novel technique for feature extraction using values of ordered means has been proposed in this work. However, an image encompasses diverse features that can hardly be described with a single technique of feature extraction. Image recognition has been stimulated in the past by feature extraction with partial coefficients in the transform domain, as discussed by Kekre et al. [10]. Hence discrete sine transform and Kekre transform were applied on the images to extract partial coefficients as feature vectors in the transform domain. The two transform domain techniques were compared for classification results, and discrete sine transform (DST) was chosen over Kekre transform for fusion with the ordered mean feature extraction process for better classification results. The fused approach was evaluated for classification performance and was compared to existing widely used techniques for feature extraction.
The results have clearly indicated superior classification performance of the proposed multiview feature extraction technique over the existing techniques.

Related Work
Selection of threshold has been an important criterion for feature extraction with binarization. Threshold selection has been primarily categorized into three different categories, namely, mean threshold selection as adopted in some of the approaches for feature extraction using binarization suggested by Thepade et al. in [11,12] and by Kekre et al. in [13,14], global threshold selection proposed by Otsu [15], and local threshold selection proposed by Niblack [16], Sauvola and Pietikäinen [17], and Bernsen [18]. Binarization with mean threshold selection has been used to compute a mean threshold value for all the gray values present in an image, based on which the gray values were divided into upper intensity groups and lower intensity groups. Shaikh et al. [6] have considered the global threshold method for binarization proposed by Otsu [15] for calculation of a single threshold when two distinct peaks were identified in an image histogram and have portrayed efficiency in pattern recognition for images having different artifacts such as shadow and nonuniform illumination. Image binarization with Otsu's [15] method of threshold selection has also been effective in optimizing the simultaneous classification of documents, photos, and logos, as reported by Lins et al. [19]. The process of threshold selection has largely been affected by a number of parameters, namely, ambient illumination, variance of gray levels within the object and the background, inadequate contrast, and so forth, as discussed by Chang et al. [20] and Gatos et al. [21]. Valizadeh et al. [22] have done image binarization using Niblack's [16] technique and have calculated the local thresholds to binarize by using standard deviation and variance as measures of dispersion for better classification of degraded images.
The uneven contrast and brightness of the image have been considered as important factors by Sauvola and Pietikäinen [17] and Bernsen [18] during threshold calculation and have been efficiently used to categorize stained images, as discussed by Hamza et al. [23] and Yang and Zhang [24]. Classification performance with the proposed methodology of feature extraction, fused with a transform domain technique of feature extraction applying discrete sine transform (DST), was compared with feature extraction using binarization by existing techniques. The efficiency of the proposed method was established by the quantitative evaluation.

Existing Binarization Techniques for Feature Extraction

Technique 1. Traditional Otsu's method in [6,15] of global threshold selection has been widely used for image binarization. A single global threshold t was computed in this method to binarize the image into higher intensity values and lower intensity values for feature extraction. The method searched exhaustively for the threshold that minimized the intraclass variance. The weighted within-class variance was given by

σw²(t) = q1(t)·σ1²(t) + q2(t)·σ2²(t). (1)

The class probabilities for the gray level pixels were given by

q1(t) = Σ_{i=1..t} P(i), q2(t) = Σ_{i=t+1..I} P(i), (2)

where P(i) is the probability of gray level i and I is the number of gray levels. The next step corresponds to calculation of the class means, given by

μ1(t) = Σ_{i=1..t} i·P(i)/q1(t), μ2(t) = Σ_{i=t+1..I} i·P(i)/q2(t). (3)

Thus, the total variance was given by the summation of the within-class variance and the between-class variance, where the between-class variance was

σb²(t) = q1(t)·q2(t)·[μ1(t) − μ2(t)]². (4)

The effect of using Otsu's threshold for binarization has been demonstrated in Figure 1.
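As a concrete sketch of the exhaustive search described above, the following Python/NumPy fragment maximizes the between-class variance of (4), which is equivalent to minimizing the within-class variance of (1). This is an illustrative version, not the paper's code; all names are our own.

```python
import numpy as np

def otsu_threshold(gray):
    """Search the threshold t maximizing the between-class variance
    q1*q2*(mu1 - mu2)^2, equivalent to minimizing within-class variance."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()                      # P(i), i = 0..255
    levels = np.arange(256)
    best_t, best_between = 0, -1.0
    for t in range(1, 256):
        q1, q2 = prob[:t].sum(), prob[t:].sum()   # class probabilities (2)
        if q1 == 0 or q2 == 0:
            continue
        mu1 = (levels[:t] * prob[:t]).sum() / q1  # class means (3)
        mu2 = (levels[t:] * prob[t:]).sum() / q2
        between = q1 * q2 * (mu1 - mu2) ** 2      # between-class variance (4)
        if between > best_between:
            best_between, best_t = between, t
    return best_t

# Binarize: pixels at or above the threshold form the higher intensity group
img = np.array([[10, 12, 11], [200, 205, 198], [12, 201, 199]], dtype=np.uint8)
t = otsu_threshold(img)
binary = (img >= t).astype(np.uint8)
```

For the small bimodal example above, any threshold between the two clusters separates them identically; the search returns the first such value.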

Technique 2. Local threshold selection proposed by Niblack in [16] and by Valizadeh et al. in [22] has given another binarization technique for feature extraction, as in Figure 2. The popular method has selected a threshold for each pixel by sliding a rectangular window over the entire image. The local mean mean(x, y) along with the standard deviation s(x, y) has been adopted for threshold calculation. The window size was considered as w × w. The expression for the threshold has been given by

Thresh(x, y) = mean(x, y) + k · s(x, y).

Here, the constant k has assumed a value between 0 and 1.
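A minimal sketch of Niblack's per-pixel rule follows, assuming a w × w window with reflected borders and an illustrative k = 0.2; window size, border handling, and names are assumptions, not the paper's implementation.

```python
import numpy as np

def niblack_threshold(gray, w=15, k=0.2):
    """Per-pixel threshold Thresh(x, y) = mean(x, y) + k * s(x, y)
    computed over a sliding w x w window."""
    gray = gray.astype(float)
    pad = w // 2
    padded = np.pad(gray, pad, mode='reflect')    # mirror borders
    H, W = gray.shape
    thresh = np.empty_like(gray)
    for y in range(H):
        for x in range(W):
            win = padded[y:y + w, x:x + w]
            thresh[y, x] = win.mean() + k * win.std()  # local mean + k * std
    return thresh

img = np.array([[10, 10, 200], [10, 200, 200], [200, 200, 200]], dtype=np.uint8)
binary = (img > niblack_threshold(img, w=3, k=0.2)).astype(np.uint8)
```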

Technique 3. Sometimes the background surfaces of images were faded, having huge disparity, or tarnished, having uneven illumination. Sauvola's method of local threshold selection in [17,23] was proposed especially for binarization of these types of images. The method was an upgraded version of Niblack's method, and the threshold selection was given by

Thresh(x, y) = mean(x, y) · [1 + k · (s(x, y)/R − 1)].

The standard value considered for k was 0.5 and for R was 128. The effect of binarization with Sauvola's threshold has been shown in Figure 3.
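The formula above differs from Niblack's only in how the standard deviation modulates the local mean. A hedged Python sketch with the standard k = 0.5 and R = 128 (window size and names are assumptions):

```python
import numpy as np

def sauvola_threshold(gray, w=15, k=0.5, R=128.0):
    """Per-pixel threshold Thresh(x, y) = mean(x, y) * (1 + k*(s(x, y)/R - 1))."""
    gray = gray.astype(float)
    pad = w // 2
    padded = np.pad(gray, pad, mode='reflect')
    H, W = gray.shape
    thresh = np.empty_like(gray)
    for y in range(H):
        for x in range(W):
            win = padded[y:y + w, x:x + w]
            thresh[y, x] = win.mean() * (1.0 + k * (win.std() / R - 1.0))
    return thresh

# On a flat region s = 0, so the threshold collapses to mean * (1 - k)
img = np.full((4, 4), 100, dtype=np.uint8)
binary = (img > sauvola_threshold(img, w=3)).astype(np.uint8)
```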

Technique 4. Bernsen's method in [18,24] for local threshold selection for binarization of images was based on contrast. The variation between the maximum and minimum gray values was considered to estimate the contrast. Threshold calculation was done with a local window of size w × w, where w = 31. A pixel inside the window was set to 0 when the local contrast was lower than the contrast threshold within the window and was set to 1 when the local contrast was higher than the threshold in the local window. The effect of binarization with Bernsen's threshold technique has been given in Figure 4.

Technique 5. Mean threshold selection for binarization has been proposed by Thepade et al. in [11,12] and by Kekre et al. in [13]. The techniques for feature extraction with mean threshold have divided the images into two different levels of intensity values (Guo and Wu [25]). The mean of values greater than the mean threshold has been taken for the higher intensity values, and the mean of values smaller than the mean threshold has been taken to estimate the lower intensity values. Thus, the technique helps in efficient extraction of feature vectors with binarization of images. The effect of binarization with mean threshold has been given in Figure 5.
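A sketch of one common formulation of Bernsen's contrast-based rule: windows with enough contrast are binarized against the midrange (z_max + z_min)/2, while low-contrast windows are assigned a single class. The contrast limit and the treatment of low-contrast pixels are assumptions and may differ from the paper's exact variant.

```python
import numpy as np

def bernsen_binarize(gray, w=31, contrast_limit=15):
    """Bernsen: T = (z_max + z_min) / 2 within a w x w window; windows with
    contrast z_max - z_min below contrast_limit are assigned class 0."""
    gray = gray.astype(int)
    pad = w // 2
    padded = np.pad(gray, pad, mode='reflect')
    H, W = gray.shape
    out = np.zeros((H, W), dtype=np.uint8)
    for y in range(H):
        for x in range(W):
            win = padded[y:y + w, x:x + w]
            zmin, zmax = win.min(), win.max()
            if zmax - zmin >= contrast_limit:      # enough local contrast
                out[y, x] = 1 if gray[y, x] >= (zmax + zmin) / 2 else 0
    return out

img = np.array([[0, 255], [255, 0]], dtype=np.uint8)
out = bernsen_binarize(img, w=3)
```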

Proposed Methodology
The proposed approach has followed the fusion of feature extraction by ordered mean values with feature extraction using partial transform coefficients, as described in the following subsections. The ordered values were stored in a one-dimensional array ODA, as shown in Algorithm 1. The means of the intensity values of the subdivisions of the array, arranged in descending order, were considered to form the feature vector of that block. The feature vectors thus generated from the blocks were combined to create the feature vector of the image. A block diagram of the proposed method has been given in Figure 6. For feature extraction in the transform domain, Kekre transform was first applied to the images for feature vector generation (Kekre et al. [26]). The transform matrix for Kekre's transform can be of any size N × N and need not be a power of 2, as is the case for most of the other transforms. All the values on and above the diagonal of the matrix were 1, and the lower diagonal part, excluding the value just below the diagonal, was 0. A generalized Kekre's transform matrix has been given by

K(x, y) = 1 for x ≤ y, K(x, y) = −N + (x − 1) for x = y + 1, K(x, y) = 0 for x > y + 1.
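Algorithm 1 itself is not reproduced in this excerpt, so the following Python sketch only illustrates the ordered-mean idea described above: sort the gray values into a descending one-dimensional array (ODA), split it into subdivisions, and take the mean of each subdivision as a feature. The equal-split scheme and all names are assumptions.

```python
import numpy as np

def ordered_mean_features(gray, n_parts=3):
    """Sketch of the ordered-mean idea: descending ordered array (ODA),
    split into n_parts subdivisions, one mean per subdivision."""
    oda = np.sort(gray.ravel())[::-1]        # descending ordered array
    parts = np.array_split(oda, n_parts)     # (near-)equal subdivisions
    return [float(p.mean()) for p in parts]  # means, naturally descending

img = np.array([[10, 20], [30, 40]], dtype=np.uint8)
feats = ordered_mean_features(img, n_parts=2)
```

Because the array is sorted before splitting, the resulting means are automatically in descending order, matching the description of the feature vector above.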

Applying Discrete Sine Transform.
Further, discrete sine transform (DST) was separately applied to the images for feature vector generation (Kekre et al. [27]). It was defined by an N × N sine transform matrix and was a linear and invertible function. The DST matrix was formed by rowwise arrangement of the sequences given by

s(k, n) = sqrt(2/(N + 1)) · sin(π(k + 1)(n + 1)/(N + 1)) for 0 ≤ k, n ≤ N − 1.
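The row sequences above define the standard DST-I matrix, which can be built directly; since the matrix is orthogonal, the 2D transform of an image F is S · F · Sᵀ. An illustrative construction (names are our own):

```python
import numpy as np

def dst_matrix(N):
    """N x N DST-I matrix: s(k, n) = sqrt(2/(N+1)) * sin(pi*(k+1)*(n+1)/(N+1))."""
    k = np.arange(N).reshape(-1, 1)
    n = np.arange(N).reshape(1, -1)
    return np.sqrt(2.0 / (N + 1)) * np.sin(np.pi * (k + 1) * (n + 1) / (N + 1))

S = dst_matrix(4)
# Orthogonality makes the transform invertible: S @ S.T is the identity.
coeffs = S @ np.ones((4, 4)) @ S.T   # 2D DST of a toy 4x4 "image"
```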

Feature Vector Extraction from Transformed Image Coefficients.
The transformed coefficients obtained from the test images were stored as the complete set of feature vectors. At the beginning, the size of the feature vector was the same as the size of the image. Subsequently, partial coefficients were extracted from the full set of feature vectors to identify the high frequency components at the upper portion of the image, which were crucial for image identification. Extraction of partial coefficients from the images was done in the manner shown in Figure 7.
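Figure 7 is not reproduced here, so the exact shape of the retained region is an assumption; the sketch below simply keeps an upper-left block whose area is a given fraction of the full coefficient matrix, which matches the feature-size percentages reported later (e.g., 12.5% or 0.012% of N × N).

```python
import numpy as np

def partial_coefficients(coeffs, fraction):
    """Keep an upper-left square block whose area is `fraction` of the
    full coefficient matrix, flattened into a feature vector."""
    N = coeffs.shape[0]
    k = max(1, int(round(N * np.sqrt(fraction))))  # side of retained block
    return coeffs[:k, :k].ravel()

c = np.arange(64, dtype=float).reshape(8, 8)       # toy 8x8 coefficient matrix
fv = partial_coefficients(c, 0.25)                 # 25% area -> 4x4 block
```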

Proposed Methodology for Classification.
A fusion based framework was proposed for the classification process. The method has amalgamated the classification decisions obtained from two different feature extraction methodologies and fused the results into a single final decision of class labels. Two different distance measures, namely, Canberra distance and city block distance, were used to measure the classification performances of the two different techniques of feature extraction, as given in

d_CityBlock(Q, T) = Σ_i |Q_i − T_i|, d_Canberra(Q, T) = Σ_i |Q_i − T_i| / (|Q_i| + |T_i|), (7)

where Q is the query image and T is the training set image. A normalization technique, namely, Z-score normalization, was used for the purpose of fusion of the classification decisions obtained from each of the feature extraction techniques.
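The two distance measures of (7) can be written directly on feature vectors; a small sketch (zero-denominator terms in the Canberra sum are skipped, a common convention):

```python
import numpy as np

def city_block(q, t):
    """City block (L1) distance: sum |q_i - t_i|."""
    return float(np.abs(q - t).sum())

def canberra(q, t):
    """Canberra distance: sum |q_i - t_i| / (|q_i| + |t_i|), skipping 0/0 terms."""
    num = np.abs(q - t)
    den = np.abs(q) + np.abs(t)
    mask = den != 0
    return float((num[mask] / den[mask]).sum())

q = np.array([1.0, 2.0, 3.0])
t = np.array([2.0, 2.0, 1.0])
d1 = city_block(q, t)   # 1 + 0 + 2
d2 = canberra(q, t)     # 1/3 + 0 + 2/4
```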
Equation (8) has given the process of calculating the final distance measure for classification by fusion with Z-score normalization, which was conducted over the mean and standard deviation of the fused distance of the first 30 nearest neighbours of the image to be classified (Walia and Pal [28]):

D_fused(Q, T) = (d1(Q, T) − μ1)/σ1 + (d2(Q, T) − μ2)/σ2, (8)

where μj and σj denote the mean and standard deviation of the distances computed by the jth feature extraction technique over the first 30 nearest neighbours.
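A hedged sketch of this fusion step: each technique's distances are standardized by the mean and standard deviation over its own 30 nearest neighbours and then summed. The exact fusion rule of Walia and Pal [28] may differ in detail; the epsilon guard and all names are our own.

```python
import numpy as np

def fused_distance(d1, d2, k=30):
    """Z-score fusion sketch: standardize each technique's distances over
    its k nearest neighbours, then sum the standardized distances."""
    d1, d2 = np.asarray(d1, float), np.asarray(d2, float)
    idx1 = np.argsort(d1)[:k]            # k nearest under technique 1
    idx2 = np.argsort(d2)[:k]            # k nearest under technique 2
    z1 = (d1 - d1[idx1].mean()) / (d1[idx1].std() + 1e-12)
    z2 = (d2 - d2[idx2].mean()) / (d2[idx2].std() + 1e-12)
    return z1 + z2

rng = np.random.default_rng(0)
d = fused_distance(rng.random(100), rng.random(100))  # 100 training images
best = int(np.argmin(d))   # training image closest under the fused measure
```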

Experimental Verification
The proposed technique was tested with the Wang dataset (10 categories with 1000 images) used by Li and Wang [29], the Caltech dataset (20 categories with 2533 images), and the Oliva and Torralba (OT-Scene) dataset (8 categories with 2688 images) used by Walia and Pal [28]. The three datasets are extensively used public datasets. An illustration of the original datasets considered has been shown in Figures 8, 9, and 10. A cross validation scheme has been applied to assess the classification performances of the different feature vector extraction techniques, as given by Sridhar in [30]. The system considered k-fold cross validation, and the value of k was assigned to be 10. One subset out of the ten subsets was considered as the testing set and the rest of the subsets were considered to be the training set. The method was iterated for 10 trials and the final result of classification was inferred by combining the 10 results.
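The 10-fold protocol can be sketched as follows (an illustrative index split, not the paper's code; the shuffling seed is an assumption):

```python
import numpy as np

def k_fold_indices(n_samples, k=10, seed=0):
    """Split sample indices into k folds; each fold serves once as the
    test set while the remaining k-1 folds form the training set."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    folds = np.array_split(idx, k)
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, test

# e.g., the 1000-image Wang dataset: 10 trials of 900 train / 100 test
splits = list(k_fold_indices(1000, k=10))
```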

Classification Methods
The performance measures were computed with two different categories of classifiers, as given below.

K-Nearest Neighbor (KNN) Classifier (Distance Based Classifier).
The principle of the KNN classifier is to find the nearest neighbour in the instance space. It uses Canberra distance and city block distance, as given in (7), to assign the unknown instance the same class as the identified nearest neighbour, as discussed by Han et al. in [31].
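A minimal nearest-neighbour sketch under a pluggable distance function (the toy features and labels are hypothetical, chosen only to illustrate the assignment rule):

```python
import numpy as np

def knn_classify(query, train_feats, train_labels, distance, k=1):
    """Assign the query the majority label among its k nearest training
    neighbours under the given distance function (k = 1: nearest neighbour)."""
    dists = np.array([distance(query, t) for t in train_feats])
    nearest = np.argsort(dists)[:k]
    labels, counts = np.unique(np.array(train_labels)[nearest],
                               return_counts=True)
    return labels[np.argmax(counts)]

feats = np.array([[0.0, 0.0], [10.0, 10.0]])
labels = ["beach", "dinosaur"]
l1 = lambda q, t: float(np.abs(q - t).sum())   # city block distance
pred = knn_classify(np.array([9.0, 9.5]), feats, labels, l1)
```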

RIDOR Classifier (Rule Based Classifier). The RIDOR classifier implements a set of if-then rules like other rule based classifiers. A single rule covered each database record through mutually exclusive and mutually exhaustive rules. Classification has been initiated with an empty rule, which was followed by growing one rule at a time, as discussed by Kotsiantis in [32]. The training records covered by this rule were removed and the previous steps were repeated until the stopping criteria were met. The default rule was first generated by the Ripple-Down Rule (RIDOR) learner. The exceptions having the lowest error rate were generated for the default rule, followed by generation of the "best" exception for each exception. A tree-like expansion of exceptions was thus carried out, with its leaf having the only default rule without exception.

Metrics of Evaluation
Evaluation was carried out primarily by considering the misclassification rate (MR) and F1 score for classification by feature extraction with different numbers of ordered mean values as features, as discussed in the proposed method, to determine the optimal number of ordered mean values required as features for minimum misclassification rate (MR) and maximum F1 score under different classifier environments. Further, the classification performance with the proposed feature extraction technique was compared to the existing feature extraction techniques based on image binarization in terms of precision, recall, and accuracy. The different metrics of evaluation have standard definitions discussed by Sridhar [30] as follows.
7.5. Accuracy. It is considered as the capability of a classifier to categorize instances accurately. It is given by

Accuracy = (TP + TN) / (TP + TN + FP + FN),

where TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives, respectively.

Results and Discussion

Classification with the KNN classifier was possible with up to eight descending ordered mean values as feature vectors, computed from eight descending order subdivisions of the ordered one-dimensional array. Classification with the RIDOR classifier was possible with up to four descending ordered mean values as feature vectors. The misclassification rate (MR) was increasing and the F1 score was degrading for higher numbers of mean values as feature vectors. Feature extraction by calculating three ordered means in descending order as feature vectors from three descending ordered subdivisions has shown the least misclassification rate (MR) and the best F1 score for the proposed method of feature extraction, as observed in Tables 3 and 4, respectively, for the RIDOR classifier. The minimum misclassification rate observed was 8.3% and the maximum F1 score observed was 61.8% with three ordered mean values. Categorywise, the best classification performance for all sets of feature vectors considered for classification has been shown by the Dinosaur category, and the worst classification performance with the RIDOR classifier was found with the gothic structure category for all sets of feature vectors.
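The standard metric definitions referenced from Sridhar [30] can be computed from per-class counts; the counts in this sketch are hypothetical, chosen only to exercise the formulas.

```python
def classification_metrics(tp, tn, fp, fn):
    """Standard metrics: precision, recall, accuracy, F1 score, and
    misclassification rate (MR = 1 - accuracy)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    mr = 1.0 - accuracy
    return precision, recall, accuracy, f1, mr

# Hypothetical counts for one class in a 1000-image evaluation
p, r, a, f1, mr = classification_metrics(tp=80, tn=890, fp=10, fn=20)
```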
The proposed technique of feature extraction with ordered mean values was further tested with the Caltech dataset and the OT-Scene dataset, along with the Wang dataset, for the F1 score value of classification, as given in Table 5. The experiment was carried out using the KNN classifier and the RIDOR classifier. The confusion matrices have been given in Tables 6, 7, 8, 9, 10, and 11. Further, classification with partial coefficients extracted from the two frequency domain techniques, namely, Kekre transform and discrete sine transform, was compared by precision results for classification done by the KNN classifier, as in Table 12.
The illustration in Figure 11 has clearly established that the highest F1 score of 0.683 for classification with partial coefficients extracted by applying discrete sine transform (DST) has exceeded the maximum F1 score of 0.541 for classification with partial coefficients extracted by Kekre transform. The highest F1 score was given for a feature size of 12.5% of the actual size of the image by Kekre transform. On the other hand, DST has given its highest F1 score of 0.68 for a feature size of 0.012% of the actual image size. Hence, the feature size for classification was significantly smaller for discrete sine transform (DST) compared to Kekre transform. The features obtained from partial coefficients of 0.012% of the actual image size by applying DST on the image were further assessed for classification results with two other datasets, namely, the Caltech and OT-Scene datasets, along with the Wang dataset, for F1 score and misclassification rate. The evaluation was done with the KNN and RIDOR classifiers, as shown in Table 13. The confusion matrices have been given in Tables 14, 15, 16, 17, 18, and 19. Hence it was observed that feature extraction with partial coefficients by applying discrete sine transform was efficient and had a much smaller feature size compared to Kekre transform. It was also observed that, for the two techniques in the spatial domain and the frequency domain, respectively, the KNN classifier has performed much better than the RIDOR classifier. Therefore, the KNN classifier was chosen for fusion of the two feature extraction techniques, namely, feature extraction with ordered mean values and feature extraction with partial coefficients of discrete sine transform. The average precision values obtained for the individual feature extraction techniques and the proposed fusion approach have been shown in Table 20.
The proposed fusion technique has the highest precision value compared to the individual techniques, as seen in Table 20. Further, the fusion technique comprising feature extraction with ordered mean values and feature extraction with partial coefficients of discrete sine transform applied on the images was compared for precision, recall, accuracy, misclassification rate, and F1 score values of classification with respect to the existing techniques. The comparison has been given in Figure 12 and the confusion matrices have been shown in Tables 21 and 22.

Figure 11: Graphical representation of the comparison between F1 scores for Kekre transform and discrete sine transform, for feature sizes ranging from the full N × N coefficient set down to 0.006% of (N × N).