A Novel Technique Based on Visual Words Fusion Analysis of Sparse Features for Effective Content-Based Image Retrieval

1 Department of Software Engineering, University of Engineering and Technology, Taxila 47050, Pakistan
2 Department of Computer Science, University of Engineering and Technology, Taxila 47050, Pakistan
3 College of Computer and Information Sciences, Prince Sultan University, Riyadh 11586, Saudi Arabia
4 College of Computer and Information Systems, Al-Yamamah University, Riyadh 11512, Saudi Arabia
5 Department of Computer Engineering, Umm Al-Qura University, Makkah 21421, Saudi Arabia


Introduction
Image retrieval on the basis of image contents has been a vigorous area of research in the last three decades [1]. Many approaches have been introduced regarding image retrieval on the basis of image contents [2,3]. A text-based image retrieval system has two issues. Firstly, the annotation task takes a long time, which makes it unfeasible for huge databases. Secondly, assigning keywords for image annotation is subjective. These two drawbacks led to the development of content-based image retrieval (CBIR) [2]. CBIR aims to develop techniques which can be used for extracting similar images from image archives. Current CBIR methods are further categorized by whether they rely on global or local features [1,4,5].
Low-level features such as color, texture, shape, and spatial layout form the basis of CBIR [3, 6-10]. The main problem with CBIR is the semantic gap [3,11] between high-level image concepts and low-level image features. The bag-of-visual-words (BoVW) model is a standard way to encode local features into a vector of fixed length. It is one of the most widely used image feature representation methods [12]. The BoVW framework was first suggested in the text retrieval domain for the analysis of text documents. It has subsequently been used in applications of computer vision [12-17]. In this model, feature vectors are quantized into visual words to formulate a dictionary or codebook. Visual words are formulated by clustering the local features [18]. Human eyes discriminate images based on their visual contents. When a feature extraction technique is applied to images that have a similar visual appearance, it may produce close feature vector values, which reduces the performance of the CBIR. The images shown in Figure 1 belong to two different semantic categories, yet they are visually as well as semantically similar to each other. When a machine learning technique like the support vector machine (SVM) classifies such images, some of them may be wrongly classified due to their similar semantic or visual appearance, which reduces the performance of the CBIR system.
SIFT performs well under scale changes and rotations. However, SIFT does not perform well when there are low contrast and illumination changes within an image [19]. LIOP performs better in cases of low contrast and illumination changes within an image [20]. SIFT performs well even when there are large rotation and scale changes, while LIOP does not perform well in such cases [20]. The two descriptors are therefore complementary.
In this article, we propose a novel technique based on visual words fusion as well as features fusion of the SIFT and LIOP feature descriptors based on the bag-of-visual-words (BoVW) methodology in order to deal with the aforementioned issues. For each image collection, the images are divided into training and test sets, and SIFT and LIOP features are extracted separately from each image in the sets. After that, the k-means clustering algorithm [21] is applied to the extracted features, which represents image features in the form of clusters. Each cluster is specified as a visual word, and the combination of visual words constitutes a dictionary. For the proposed technique based on visual words fusion of SIFT and LIOP descriptors, clustering is applied individually to the extracted SIFT and LIOP features, which produces two dictionaries. After that, both dictionaries are fused together, which results in the fusion of SIFT and LIOP visual words. For the proposed technique based on features fusion of SIFT and LIOP descriptors, both extracted features are fused together. Subsequently, clustering is applied to the fused features, which constitutes a single dictionary. These visual words are used to formulate a histogram from each image in the training set. Following this, these histograms are used to train the SVM classifier. At the end, images are retrieved from an image collection by applying a similarity measure based on the Euclidean distance between the query image and the images stored in the image collection.
The main contributions of this research article are as follows:
(1) A novel image representation in the form of the visual words fusion of SIFT and LIOP feature descriptors based on the BoVW methodology
(2) A novel image representation in the form of the features fusion of SIFT and LIOP feature descriptors based on the BoVW methodology
(3) Reduction of the semantic gap between low-level features of an image and high-level semantic concepts
The remaining sections of this article are organized as follows: the relevant state-of-the-art CBIR techniques are briefly described in Section 2, entitled "Related Work." The detailed methodology of the proposed technique is discussed in Section 3, entitled "Proposed Methodology." Section 4 presents the details of the experiments and performance analysis on three image collections. Section 5 concludes the article.

Related Work
CBIR has been an active research area for the last three decades due to its wide range of applications in image retrieval techniques [22]. The term "content-based" refers to the fact that the search technique evaluates the actual contents of an image rather than using traditional image annotation techniques for image retrieval. The term "content" in this framework refers to texture, color, shape, or any other information that can be derived from the image itself. There are various types of image retrieval techniques which are based on texture, shape, color, and spatial layout [23,24]. Different interest point based detectors and descriptors have been proposed for feature extraction in image retrieval techniques [25-30].
Liu et al. [7] propose a novel descriptor known as the microstructure descriptor (MSD). MSD is determined by underlying colors and edge orientation, which effectively depicts the image features. To retrieve the images effectively, the method assimilates color, texture, shape, and spatial layout information. However, this approach is inadequate for global properties of the image and is unable to exploit relations among positions of dissimilar entities in the proposed design. Mansoori et al. [2] also propose a CBIR technique based on a SIFT descriptor, a hue descriptor, and soft assignment. SIFT is used for extracting keypoints, while local patches around them are described by applying SIFT and hue descriptors. A distinct vocabulary is created for each descriptor, which is then quantized by applying a k-means clustering algorithm. In this model, soft assignment is used instead of hard assignment in order to overcome the information loss in quantization that can reduce retrieval performance. The proposed technique reveals enhanced performance in comparison with other comparable CBIR techniques. Chang et al. [6] present a novel framework for content-based image retrieval by investigating the particle swarm optimization algorithm (PSOA). The proposed technique extracts three kinds of features from each image, namely, color, texture, and shape features, to find the similarities between the query image and images from the catalog. It employs an appropriate distance measure for each kind of feature utilized. The PSOA is incorporated to improve the proposed technique by finding near-optimal combinations of features and their corresponding similarity measurements. Shen and Wu [4] develop an innovative method for CBIR by merging color, spatial, and texture features of the image. A feature vector is formed by utilizing all three of these features. The CENsus TRansform hISTogram (CENTRIST) feature is used for spatial structure, and principal component analysis (PCA) is applied to CENTRIST for dimension reduction. This algorithm incorporates diverse density (DD) and multiple instance learning (MIL) to achieve objective occurrences. This technique produces better results when compared to the state-of-the-art CBIR techniques. However, a few limitations of this method have been found, leading to the conclusion that more research is needed in some aspects. Talib et al. [5] introduce a framework for CBIR by constructing a weighted dominant color (DC) descriptor. In order to extract semantic features, the descriptor assigns weights to each DC in the image. This technique overcomes the shortcomings of the dominant color descriptor (DCD) and diminishes the effect of the image background during the image matching decision. The technique tends to increase the performance. Pedronette et al. [31] exploit a reranking technique for retrieving images based on their visual contents. The proposed technique improves the effectiveness of CBIR. The reranking method does not require distance information among complete ranked lists or images of a given collection. The proposed technique relies on the ranked lists generated by efficient indexing structures, and it is considered appropriate for large image collections as it scales up very well.
Zheng et al. [32] embed multiple binary features at the indexing level for large scale image retrieval. The multi-IDF scheme models correlation between features. The Hamming embedding method is used as a matching verification method. In order to lessen the effect of incorrect detection and boost the accuracy of visual matching, SIFT visual words are integrated with binary features. Karakasis et al. [33] propose a CBIR technique that uses affine moment invariants in order to describe the local areas of the image for the sake of image retrieval. The produced moments are incorporated into the BoVW model in order to produce detailed feature vectors. A setup of three different design elements is used. Firstly, affine moments are computed. Secondly, invariants are calculated over the results of the real image. In the last phase, normalization is executed in order to increase the range of invariants. The second phase intends to improve the first phase, while the third phase improves the results of the second phase. Rahimi and Moghaddam [34] introduce a CBIR technique based on intraclass and interclass features. The distribution of color tone provides the intraclass features, whereas singular value decomposition (SVD) and the complex wavelet transform produce the interclass features. These features are fed to a self-organizing map (SOM), which is based on the artificial neural network (ANN), in order to improve the performance of the CBIR. Rashno et al. [35] introduce a novel CBIR technique in which feature extraction is done through wavelet transform and color feature selection. In this scheme, each image in the image collection is represented using a feature vector which is comprised of texture features from the wavelet transform and color features from the RGB and HSV domains. For texture features based on the wavelet transform, images are decomposed into four subbands and then the low-frequency subband is used as texture features. For color features, DCD is used for the quantization of the image, while color statistics and histogram features are calculated. The ant colony optimization technique is used for selecting relevant and unique features from the entire feature set, which contains both color and texture features. Mehmood et al. [36] present a CBIR technique that utilizes local and global histograms of visual words from the image. Both histograms contain information regarding the semantics of an image. The global histogram is constructed by utilizing the visual information of the whole image, whereas the local histogram is constructed by extracting visual information from a local rectangular region of the image. The local histogram contains the spatial information of the salient objects within the image. The proposed technique has significantly improved the performance of the CBIR.
Zhao et al. [38] propose a CBIR technique which integrates three image descriptors for identifying visual contents of the image. These features are based on color, texture, and shape. The association in the distribution of color range in an image is captured by color distribution entropy. The color level cooccurrence algorithm makes use of the texture level matrix in order to capture the recurrence of textures as descriptors. Invariant moments are used to describe shape under rotation and rescaling. Euclidean distance is used to compute the similarity measure. de Ves et al. [39] put forward a subjective methodology in order to reduce the semantic gap while incorporating the interests of the concerned users and their relevance responses. The main intention is to reduce the semantic gap using PCA and a regression model. The former approach is responsible for rescaling the feature vectors, thereby reducing their dimensions, whereas the latter is adjusted by the use of groups of nonoverlapped principal components. The local and dynamic nature of the proposed algorithm helps to achieve the intended results semantically. Xia et al. [40] present a CBIR technique to preserve the privacy of images in the cloud. While the cloud has solved the problem of low storage, user privacy is a major concern when outsourcing images. The proposed technique exploits KNN in order to encode the visual features. These features are then utilized to compute the relevance, which in turn is utilized in the reranking procedure. In order to prevent the illegal copying and dissemination of retrieved images, a watermark-based protocol is exploited. Significant improvement has been observed in image search. The drawback of this technique lies in the lack of strength of the watermarking method.

Proposed Methodology
This section describes the detailed procedure of the proposed technique based on visual words fusion as well as features fusion of SIFT and LIOP descriptors based on the BoVW methodology for an effective CBIR. The block diagram of the proposed technique based on visual words fusion of SIFT and LIOP descriptors is shown in Figure 2.
The detailed procedure of the proposed technique is given as follows: (1) For each image in the training and test sets, SIFT and LIOP features are computed.
(2) The SIFT features [48] are computed from each image over a dense grid following the formulation in [48] (equations (1) and (2)), in which $\sigma$ is the scale, $\theta$ is the orientation, $T$ is the center of the detected keypoint of the SIFT descriptor, $m$ is the descriptor magnification factor, $J$ is the gradient, $h$ is the histogram of descriptors, $w_{\mathrm{ang}}$ represents the angular weighting function, and $(x_i, y_j)$ represent the coordinate points of the $(i\text{th}, j\text{th})$ position. The kernels $k_i$ and $k_j$ are defined for a sample coordinate point $(x, y)$, where the side of the flat window is represented by $\sigma_{\mathrm{win}}$.
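(3) The LIOP features [20] are also computed from each image by applying the following mathematical equations, which follow the LIOP formulation of [20]:

$$\text{LIOP descriptor} = (\mathrm{des}_1, \mathrm{des}_2, \ldots, \mathrm{des}_B), \quad (3)$$

$$\mathrm{des}_l = \sum_{x \in \mathrm{bin}_l} w(x)\, \mathrm{LIOP}(x), \qquad \mathrm{LIOP}(x) = \phi(\gamma(P(x))), \qquad P(x) = (I(x_1), I(x_2), \ldots, I(x_N)),$$

$$w(x) = \sum_{i,j} \operatorname{sgn}\left(\lvert I(x_i) - I(x_j) \rvert - T_{lp}\right) + 1,$$

where $\mathrm{bin}_l$ is the $l$th ordinal bin of the local patch and $B$ is the number of bins.
(4) For the proposed technique based on visual words fusion of SIFT and LIOP descriptors, the k-means [21] clustering technique is applied separately to the extracted features of the SIFT and LIOP descriptors, which produces two dictionaries. The resultant SIFT-based dictionary contains visual words of SIFT-based features, while the LIOP-based dictionary contains visual words of LIOP-based features. Both dictionaries are fused together in order to perform visual words fusion of SIFT and LIOP features. The dictionary of each descriptor is formulated by applying the following mathematical equation to the extracted features of each descriptor:

$$D = \arg\min_{\{c_1, \ldots, c_n\}} \sum_{i=1}^{n} \sum_{x \in c_i} \lVert x - \mu_i \rVert^2,$$

where $D$ represents the dictionary, $\mu_i$ is the mean of all the points in the cluster $c_i$, and $c_i$ represents the $i$th cluster or visual word.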
After applying the clustering technique to the extracted features of the SIFT and LIOP descriptors, two dictionaries are produced, which are represented by the following mathematical equations:

$$D_{\mathrm{SIFT}} = \{v_1, v_2, v_3, \ldots, v_n\}, \qquad D_{\mathrm{LIOP}} = \{v_1, v_2, v_3, \ldots, v_n\},$$

where $D_{\mathrm{SIFT}}$ and $D_{\mathrm{LIOP}}$ are the resultant dictionaries that each contain $n$ visual words of SIFT-based and LIOP-based features, respectively. After computing dictionaries for the SIFT and LIOP feature descriptors, both dictionaries are concatenated, which results in visual words fusion of both descriptors, represented mathematically as follows:

$$D_{f} = [D_{\mathrm{SIFT}},\; D_{\mathrm{LIOP}}],$$

where $D_{f}$ is the resultant dictionary that contains SIFT and LIOP features in the form of fused visual words for a more compact representation of image visual contents.
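The visual words fusion step can be illustrated with a minimal sketch. It assumes a plain Lloyd's k-means written with the standard library only; the function names (`kmeans`, `fuse_dictionaries`) and the descriptor tags are hypothetical and are not taken from the authors' MATLAB implementation. Because SIFT and LIOP descriptors generally have different dimensions, the sketch keeps a tag on each visual word rather than stacking the centroids into one matrix.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's k-means; returns k centroids (the visual words)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                dim = len(members[0])
                centroids[i] = [sum(m[d] for m in members) / len(members)
                                for d in range(dim)]
    return centroids


def fuse_dictionaries(sift_descriptors, liop_descriptors, n_words, seed=0):
    """Cluster each descriptor type separately, then concatenate the two
    dictionaries; the result holds 2 * n_words tagged visual words."""
    d_sift = kmeans(sift_descriptors, n_words, seed=seed)
    d_liop = kmeans(liop_descriptors, n_words, seed=seed)
    return [("SIFT", w) for w in d_sift] + [("LIOP", w) for w in d_liop]
```

With a dictionary size of 800 per descriptor, such a fused dictionary would contain 1600 visual words, which matches the doubling of the dictionary described in the text.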
(5) For the proposed technique based on features fusion of SIFT and LIOP descriptors, SIFT and LIOP features are computed from each image and fused together, and at the end, the k-means clustering technique [21] is applied to the fused features, which produces a single dictionary.
The proposed technique based on visual words fusion of SIFT and LIOP descriptors results in better performance compared to the proposed technique based on features fusion of the SIFT and LIOP descriptors and the state-of-the-art CBIR techniques because the size of the dictionary representing visual contents of the images is twice as large compared to features fusion technique, which represents visual contents of the images by formulating a single dictionary.
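For contrast, the features fusion variant of step (5) can be sketched as follows. The text does not state how the two descriptors are combined before clustering, so this sketch assumes that SIFT and LIOP are computed at the same keypoints and concatenated per keypoint; the function name `fuse_features` is hypothetical.

```python
def fuse_features(sift_descriptors, liop_descriptors):
    """Concatenate the SIFT and LIOP descriptor of each keypoint.

    Assumes both lists are aligned (entry i of each list describes the
    same keypoint); this pairing is an assumption of the sketch.
    """
    if len(sift_descriptors) != len(liop_descriptors):
        raise ValueError("descriptor lists must be aligned per keypoint")
    return [list(s) + list(l) for s, l in zip(sift_descriptors, liop_descriptors)]
```

Clustering the fused vectors into n visual words then yields a single dictionary of size n, as opposed to the 2 × n visual words obtained with visual words fusion.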
(6) After applying the k-means [21] clustering technique, the visual contents of each image are in the form of visual words. These visual words are used to build a histogram for each image.
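The histogram construction of step (6) can be sketched as below, assuming hard assignment (each descriptor votes for its nearest visual word) and L1 normalization; neither choice is confirmed by the text, and the function names are hypothetical. For visual words fusion, the sketch simply concatenates the SIFT and LIOP histograms.

```python
def nearest_word(descriptor, dictionary):
    """Index of the visual word closest to the descriptor (Euclidean)."""
    return min(
        range(len(dictionary)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(descriptor, dictionary[i])),
    )


def bovw_histogram(descriptors, dictionary):
    """L1-normalized histogram of visual word occurrences for one image."""
    counts = [0] * len(dictionary)
    for d in descriptors:
        counts[nearest_word(d, dictionary)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(dictionary)


def fused_histogram(sift_desc, d_sift, liop_desc, d_liop):
    """Concatenate per-descriptor histograms: length is len(d_sift) + len(d_liop)."""
    return bovw_histogram(sift_desc, d_sift) + bovw_histogram(liop_desc, d_liop)
```

Note that the concatenated histogram sums to 2 rather than 1, so it would need to be renormalized before being used with a kernel that expects normalized histograms.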
(7) For image classification, the SVM classifier is selected along with the Hellinger kernel [49] instead of the linear kernel. The learning of the SVM classifier is performed using histograms that are formulated from each image in the training set. The Hellinger kernel function is used with the SVM classifier because it explicitly computes the feature map instead of computing the kernel values, while the classifier still remains linear. The mathematical representation of the Hellinger kernel function of the SVM on the normalized histograms is as follows:

$$k(h, h') = \sum_{i} \sqrt{h_i\, h'_i},$$

where $h$ and $h'$ represent the normalized histograms of each image.
(8) After training the proposed CBIR model, the testing of the proposed technique is performed by taking an image from the test set and applying the same aforementioned process to compute the histogram from the test image. The images are retrieved by measuring the similarity between the test image representation and the training images stored in an image collection by applying the Euclidean distance formula.
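Steps (7) and (8) can be sketched in a few lines. The explicit Hellinger feature map is the element-wise square root of an L1-normalized histogram, so a linear classifier on the mapped vectors behaves like a Hellinger-kernel classifier. The ranking function below assumes that each image is represented by a numeric vector (for example, classifier scores); which vectors the authors compare is not fully specified in the text, and all function names are hypothetical.

```python
import math


def hellinger_map(histogram):
    """L1-normalize a histogram, then take element-wise square roots."""
    total = sum(histogram)
    if total == 0:
        return [0.0] * len(histogram)
    return [math.sqrt(h / total) for h in histogram]


def hellinger_kernel(h1, h2):
    """Hellinger kernel between two normalized histograms."""
    return sum(math.sqrt(a * b) for a, b in zip(h1, h2))


def rank_by_euclidean(query, collection, top_k=20):
    """Indices of the top_k collection vectors closest to the query."""
    scored = [(math.dist(query, vec), idx) for idx, vec in enumerate(collection)]
    return [idx for _, idx in sorted(scored)[:top_k]]
```

The dot product of two mapped histograms equals the Hellinger kernel of the originals, which is why the classifier can remain linear.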

Evaluation Metrics, Experimental Results, and Discussions
This section presents the performance measurements of the proposed technique. The performance is evaluated using precision, recall, and precision-recall (PR) curve parameters on the Corel-A/1000 [50,51], Corel-B/1500 [30], and Caltech-256 [52] image collections, and the results are compared with the state-of-the-art CBIR techniques. All reported results are obtained by performing each experiment 10 times. The dictionary size and the features percentage per image are two important parameters that affect the performance of the proposed technique. Increasing the size of the dictionary up to a certain level, for compact representation of the visual contents of the images, increases the performance of the image retrieval, while larger dictionary sizes result in overfitting. Similarly, in order to reduce the computational cost of the proposed technique, which is slightly increased due to visual words fusion as well as features fusion of the SIFT and LIOP feature descriptors, performance analysis is carried out using different features percentages per image, as reported in the subsequent sections.
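The "features percentage per image" parameter can be read as keeping only a fraction of the descriptors extracted from each image before clustering and histogram construction. The sketch below assumes uniform random sampling without replacement; the text does not say how the subset is chosen, and `sample_features` is a hypothetical name.

```python
import random


def sample_features(descriptors, percentage, seed=0):
    """Keep roughly `percentage` percent of an image's descriptors.

    Uniform random sampling is an assumption of this sketch.
    """
    if not 0 < percentage <= 100:
        raise ValueError("percentage must be in (0, 100]")
    if not descriptors:
        return []
    keep = max(1, round(len(descriptors) * percentage / 100))
    return random.Random(seed).sample(descriptors, keep)
```

Using 50% of the features halves the number of descriptors that must be assigned to visual words, which is where the reduction in computational cost would come from.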
The precision measures the specificity or accuracy, while recall measures the sensitivity or robustness of the CBIR techniques. Both are mathematically represented by the following equations:

$$\text{Precision} = \frac{N_c}{N_r}, \qquad \text{Recall} = \frac{N_c}{N_t},$$

where $N_c$ represents the number of correctly retrieved images, $N_r$ represents the total number of retrieved images, and $N_t$ represents the total number of images in a particular semantic category. Figure 3 presents the mean average precision (MAP) for different dictionary sizes of the proposed technique based on features fusion of SIFT and LIOP descriptors, compared with the MAP performance of the standalone SIFT and standalone LIOP techniques based on the BoVW methodology. According to the experimental details shown in Figure 3, the best MAP performance of 82.90% is achieved with a dictionary size of 800 visual words using 75% features per image. The proposed technique based on features fusion of SIFT and LIOP descriptors outperforms the standalone SIFT and standalone LIOP techniques in terms of MAP on all the reported dictionary sizes. Table 1 presents the experimental details of the proposed technique based on visual words fusion of SIFT and LIOP descriptors for the different reported dictionary sizes using different features percentages per image. The best MAP performance of 87.30% is achieved with a dictionary size of 800 visual words using 50% features per image. In order to verify the statistical significance of the experimental results of the proposed technique based on visual words fusion, the results of the statistical analysis are also reported in Table 1. The statistical results of the nonparametric Wilcoxon matched-pairs signed-rank test are reported by comparing the MAP performance obtained with a dictionary size of 800 visual words with the other reported dictionary sizes (20, 50, 100, 200, 400, 600, 800, 1000, and 1200) as well as with [36], using the standard 95% confidence interval. According to the statistical results of the nonparametric Wilcoxon matched-pairs signed-rank test, the proposed technique based on visual words fusion is statistically more effective because the $p$-value is less than the level of significance ($\alpha = 0.05$) for all the reported dictionary sizes.
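The precision and recall definitions above translate directly into code. In this minimal sketch, precision is the number of correctly retrieved images divided by the number of retrieved images, and recall is the number of correctly retrieved images divided by the size of the query's semantic category. The helper that averages per-category values is an assumption about how the reported MAP is aggregated; both function names are hypothetical.

```python
def precision_recall(retrieved_labels, query_label, category_size):
    """Precision and recall for one query, given the labels of the retrieved images."""
    if not retrieved_labels or category_size <= 0:
        raise ValueError("need at least one retrieved image and a positive category size")
    correct = sum(1 for label in retrieved_labels if label == query_label)
    return correct / len(retrieved_labels), correct / category_size


def mean_over_categories(per_category_values):
    """Average a per-category metric (assumed aggregation for the reported MAP)."""
    return sum(per_category_values) / len(per_category_values)
```

For example, if 15 of the top 20 retrieved images belong to the query's category and that category contains 100 images, the precision is 0.75 and the recall is 0.15.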

Analysis of the Evaluation Metrics on the Corel-A Image Collection
In order to demonstrate the robustness of the proposed technique based on visual words fusion of SIFT and LIOP descriptors, its MAP performance is also compared with the MAP performance of the proposed technique based on features fusion as well as with the state-of-the-art CBIR techniques [36, 41-44]; the experimental details are shown in Figure 4 and Table 2. According to the experimental details, the proposed technique based on visual words fusion significantly outperforms its competitor CBIR techniques. The performance analysis in terms of the precision-recall (PR) curve, shown in Figure 5, is also carried out against the state-of-the-art CBIR techniques [36,37], which also demonstrates the robustness of the proposed technique based on visual words fusion of SIFT and LIOP descriptors on the Corel-A image collection.
The image retrieval results of the proposed technique based on visual words fusion of SIFT and LIOP descriptors for the semantic category "Beach" of the Corel-A image collection are shown in Figure 6.
According to the experimental results shown in Table 2, the proposed technique based on the visual words fusion of the SIFT and LIOP descriptors outperforms, in terms of MAP performance, the LGH-BoVW [36] technique as well as the state-of-the-art CBIR techniques [41-44] based on the BoVW methodology. For a dictionary size of $n$ visual words, the proposed technique represents the visual contents of the images by assigning $2 \times n$ visual words, because two feature descriptors (SIFT and LIOP) are applied to each image, each formulating its own dictionary, and the resultant dictionary contains the features of both descriptors due to visual words fusion. In the case of the LGH-BoVW [36] technique, the visual contents of the images are represented by assigning $n$ visual words, because a single feature descriptor is applied to each image and the resultant dictionary contains the features of that single descriptor.

Analysis of the Evaluation Metrics on the Corel-B Image Collection
The Corel-B image collection is a subset of the WANG image collection that contains images of different resolutions (i.e., 256 × 384, 384 × 256, 128 × 192, and 192 × 128). The total number of images in the Corel-B image collection is 1500; these are categorized into 15 semantic categories known as "Women," "Tigers," "Sunsets," "Postcards," "Caves," "Food," "Horses," "Mountains," "Flowers," "Elephants," "Dinosaurs," "Buses," "Buildings," and "Africa." The images are divided into training (50% of the images) and test (50% of the images) sets. The MAP performance on different dictionary sizes is shown in Figures 7 and 8 and Table 3 for the proposed techniques based on visual words fusion and features fusion, and for the standalone SIFT and standalone LIOP features based on the BoVW methodology. In the case of the proposed technique based on visual words fusion of SIFT and LIOP features, the best MAP performance of 85.20% is obtained with a dictionary size of 1000 visual words using 50% features per image. The best MAP performance of the proposed technique based on features fusion of SIFT and LIOP features is 82.96%, obtained with a dictionary size of 1000 visual words using 75% features per image. According to the experimental details shown in Figures 7 and 8 and Table 3, the proposed technique based on visual words fusion outperforms the proposed technique based on features fusion, standalone SIFT, standalone LIOP, and the state-of-the-art CBIR techniques [3,45] on all the reported dictionary sizes.
According to the experimental details shown in Figure 5 (experimental details provided earlier in Section 4.1), the performance measurement using PR-curve also demonstrates the robustness of the proposed technique based on visual words fusion that is compared with PR-curve of the proposed technique based on features fusion of SIFT and LIOP feature descriptors.
The results of image retrieval for the semantic categories "Sunset" and "Postcards" of the Corel-B image collection are shown in Figures 9 and 10.

Analysis of the Evaluation Metrics on the Caltech-256 Image Collection
We have also examined the performance of the proposed technique on the Caltech-256 image collection [52]. The dimensions of each image in this collection are 300 × 200. There are 256 semantic categories, and each semantic category includes a minimum of 80 images. The total number of images in this collection is 30,607.
The MAP performance of the proposed technique based on features fusion and of the standalone SIFT and standalone LIOP techniques on different dictionary sizes is shown in Figure 11. According to the experimental details shown in Figure 11, the proposed technique based on features fusion of SIFT and LIOP descriptors performs better than the standalone SIFT and standalone LIOP techniques based on the BoVW methodology on all the reported dictionary sizes. According to the experimental details shown in Figure 12 and Table 4, the proposed technique based on visual words fusion of SIFT and LIOP descriptors outperforms the features fusion technique and the state-of-the-art CBIR techniques [7,46] in terms of MAP performance on all the reported dictionary sizes. In the case of the proposed technique based on visual words fusion, the best MAP performance of 30.30% is achieved with a dictionary size of 1200 visual words. The best MAP performance of the features fusion technique is 25.82%, which is achieved with a dictionary size of 1500 visual words.
Figure 5 (experimental details provided earlier in Section 4.1) shows a comparison of performance analysis in terms of the PR curve between the proposed techniques based on visual words fusion and features fusion. According to the experimental details shown in Figure 5, the performance analysis using the PR curve also demonstrates the robustness of the proposed technique based on visual words fusion as compared to the proposed technique based on features fusion of SIFT and LIOP descriptors.

Table 5: Performance analysis in terms of the computational complexity (in seconds) of the complete framework.
Retrieved images | Proposed technique based on the visual words fusion of SIFT-LIOP | LGH technique [36] | FBWN technique [47]
Foremost-20 | 0.7761 | 0.7837 | 0.87

Performance Analysis in Terms of the Computational Complexity
All the experiments are performed on a Dell laptop with the following specifications: Intel(R) Pentium CPU B950 @ 2.10 GHz, 2.00 GB RAM, external SSD with a capacity of 120 GB, and Windows 7 64-bit operating system. The proposed technique is implemented in MATLAB R2015b, and the dictionary is formulated offline using all the images of a training set. The performance is tested at runtime by taking a sample image from the test set of the Corel-A image collection. The computational complexity (in seconds) of the complete framework, from features computation to retrieved images, is shown in Table 5, which demonstrates the efficiency of the proposed technique in terms of computational cost as compared to the state-of-the-art CBIR techniques [36,47].

Conclusions
The semantic gap between the low-level features of an image and high-level semantic concepts is an important issue that affects the performance of the CBIR.Increasing the size of the dictionary to represent visual contents of the images at some certain level increases the performance of the image retrieval, while larger sizes of dictionary tend to overfit.In this article, the proposed technique based on visual words fusion of SIFT and LIOP feature descriptors significantly improves the performance of the image retrieval by reducing the semantic gap issue of CBIR and assigning more visual words per image.The performance of the proposed technique based on visual words fusion is significantly improved as compared to the features fusion technique and the state-of-the-art CBIR techniques because the size of the dictionary to represent visual contents of the images is twice as large compared to the feature fusion technique.Additionally, the resultant dictionary contains features of the SIFT and LIOP descriptors in the form of visual words as compared to the state-of-the-art CBIR techniques.In order to reduce the computational cost of the proposed technique, which is slightly increased due to the fusion of SIFT and LIOP feature descriptors, different feature percentages per image are suggested without affecting the performance of the proposed technique.

Figure 1: Images of two different semantic categories with close visual appearance and semantic layout.

Figure 2: Block diagram of the proposed technique based on visual words fusion of SIFT and LIOP descriptors.

Figure 3: Performance comparison in terms of MAP performance between the proposed techniques based on features fusion, standalone SIFT, and standalone LIOP features on different sizes of the dictionary on the Corel-A image collection.

In the image retrieval results shown in Figures 6, 9, and 10, the numeric value shown at the top of each image is the score of the respective image. The image shown at the top of each figure is the query image, while the rest of the images are the retrieved images, which are obtained by applying the Euclidean distance formula between the score of the query image and the scores of the retrieved images. The images whose numeric values are closer to the score of the query image are more similar to the query image, which shows a reduction of the semantic gap between low-level features of the image and high-level image semantic concepts, and vice versa.
Figure legend: features fusion of SIFT-LIOP descriptors based on the BoVW methodology; visual words fusion of SIFT-LIOP descriptors based on the BoVW methodology.

Figure 4: Performance comparison in terms of MAP performance between the proposed technique based on visual words fusion versus features fusion of SIFT and LIOP features techniques on different sizes of the dictionary on the Corel-A image collection.

Figure 6: Semantic category "Beach" of the Corel-A image collection shows a reduction of the semantic gap between retrieved images according to the query image.

Figure 7: Performance comparison in terms of MAP performance between the proposed techniques based on features fusion, standalone SIFT, and standalone LIOP features on different sizes of the dictionary on the Corel-B image collection.

Figure 8: Performance comparison in terms of MAP performance between the proposed techniques based on visual words fusion versus features fusion of SIFT and LIOP features on different sizes of the dictionary on the Corel-B image collection.

Figure 9: Semantic category "Sunset" of the Corel-B image collection shows a reduction of the semantic gap between retrieved images according to the query image.

Figure 10: Semantic category "Postcards" of the Corel-B image collection shows a reduction of the semantic gap between retrieved images according to the query image.

Figure 11: Performance comparison in terms of MAP performance between the proposed techniques based on features fusion, standalone SIFT, and standalone LIOP features on different sizes of the dictionary on the Caltech-256 image collection.
In equation (3), for a sample point $x_n$, $I(x_n)$ represents the intensity of the $n$th neighboring sample, $P(x)$ is the $N$-dimensional feature vector of the intensities which represents the $N$ neighboring sample points of a point $x$ in the local patch, the mapping $\gamma$ sorts the elements of the $N$-dimensional feature vector, the preset threshold is represented by $T_{lp}$, the sign function is represented by $\operatorname{sgn}$, $w(x)$ represents the weighting function of the LIOP descriptor, the feature mapping function is represented by $\phi$, and $i$, $j$ represent the coordinate position of the $n$th sample point $x_n$.

Table 1: Statistical analysis and MAP performance of the proposed technique based on visual words fusion on different dictionary sizes and features percentages per image (bold values indicate the best performance).

Table 2: Performance analysis of the proposed technique based on visual words fusion on the Corel-A image collection, reported using a dictionary size of 800 visual words and a features percentage of 50% per image (bold values indicate the best performance).

Table 3: Performance analysis of the proposed technique based on visual words fusion on the Corel-B image collection, reported using a dictionary size of 1000 visual words and a features percentage of 50% per image (bold values indicate the best performance).

Table 4: Performance analysis of the proposed technique based on visual words fusion on the Caltech-256 image collection, reported using a dictionary size of 1200 visual words and a features percentage of 75% per image.