Content-based image retrieval is currently one of the promising solutions for managing image databases effectively. However, for large image collections, a considerable gap remains between users’ expectations (accuracy and efficiency) and the actual retrieval performance. In this work, new optimization strategies are proposed for vocabulary tree building, retrieval, and matching. More precisely, a new clustering strategy combining classification and conventional
Nowadays, content-based image retrieval (CBIR) has more and more applications and constitutes one of the core problems in computer vision. Its features were thoroughly discussed by Smeulders et al. [
Content-based image retrieval procedure.
In state-of-the-art techniques, a tree structure is usually built to store the image database. A training set is therefore needed to obtain a discriminative and representative tree. The training set is a group of images, which are first transformed into SIFT [
Once a well-organized data structure is built, the image database can be stored. From the image database, the SIFT descriptors must also be extracted [
After all descriptors of a test image are stored in the vocabulary tree, a weight is calculated for each leaf node with the TF-IDF strategy, so that the effectiveness of different leaf nodes is measured accurately, by the following formula:
After organizing the information of test images and database images, two vectors can be obtained for the test image and the database images as follows.
For the test image,
For the database images,
Now, the ranking results can be obtained by the following formula:
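As a minimal sketch of the weighting and ranking steps, the following Python code assumes the widely used vocabulary-tree formulation: an IDF weight ln(N/N_i) per leaf node, L1-normalized weighted count vectors, and ranking by L1 distance. These choices, and all function names, are illustrative assumptions and may differ from the exact formulas intended here.

```python
import math
from collections import Counter

def idf_weights(db_leaves, n_leaves):
    """IDF weight per leaf: ln(N / N_i), where N_i counts the database
    images with at least one descriptor in leaf i (assumed formulation)."""
    n_images = len(db_leaves)
    doc_freq = Counter()
    for leaves in db_leaves:
        doc_freq.update(set(leaves))
    return [math.log(n_images / doc_freq[i]) if doc_freq[i] else 0.0
            for i in range(n_leaves)]

def weighted_vector(leaves, weights):
    """Sparse TF-IDF vector {leaf_id: value}, normalized to unit L1 norm so
    that images with many descriptors do not dominate the comparison."""
    counts = Counter(leaves)
    vec = {i: c * weights[i] for i, c in counts.items() if weights[i] > 0}
    norm = sum(vec.values())
    return {i: v / norm for i, v in vec.items()} if norm else {}

def l1_distance(q, d):
    """L1 distance between two sparse vectors stored as dictionaries."""
    return sum(abs(q.get(k, 0.0) - d.get(k, 0.0)) for k in q.keys() | d.keys())

def rank(query_leaves, db_leaves, n_leaves):
    """Return database image indices sorted from most to least similar."""
    w = idf_weights(db_leaves, n_leaves)
    q = weighted_vector(query_leaves, w)
    scored = [(l1_distance(q, weighted_vector(d, w)), idx)
              for idx, d in enumerate(db_leaves)]
    return [idx for _, idx in sorted(scored)]
```

Each image is represented here by the list of leaf-node ids reached by its descriptors; the normalization step is what keeps the score comparable across images with different numbers of SIFT descriptors.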
The usual approach often suffers from accuracy problems. First, as the number of test images increases, more noise and clutter enter the information database, which inevitably lowers the retrieval accuracy. Second, a larger image database requires more time to search for similar images, which usually cannot satisfy real-time demands. Finally, repeated trial-and-error tests show that the norms used to calculate the match degree do not remove the effect of the differing numbers of SIFT descriptors per image, which causes a further loss of accuracy. These issues motivate the improvements detailed hereafter.
The paper is organized as follows: three improvements are described in Section
From the process described above, it is known that the height and the branch number of the tree are both predefined, namely, a complete tree (see Figure
Traditional vocabulary tree (a) and new vocabulary tree (b).
In practical applications, the quantities of information in different test sets are different, and different trees are therefore needed. When the tree need not even be a complete tree, the conventional method certainly leads to some errors. In order to reduce or even eliminate these errors, the conventional
The proposed technique, called the Hierarchical Classification Method (HCM), uses two thresholds: one on the number of descriptors in a part and the other on the distance inside a part. These two thresholds determine when the clustering operations terminate; consequently, neither the number of levels of the tree nor the number of children nodes of a parent node is known in advance. The structure of two different trees can be shown as follows, respectively (Figure
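The two-threshold termination rule can be illustrated with a short Python sketch. It is not the HCM implementation: it assumes a plain binary 2-means split at every node (so only the depth adapts to the data, not the number of children), and the names `max_count` and `max_radius` are hypothetical stand-ins for the two thresholds.

```python
import math
import random

def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]

def two_means(points, iters=10, seed=0):
    """Split points into two groups with a few k-means (k = 2) iterations."""
    rng = random.Random(seed)
    centers = rng.sample(points, 2)
    groups = [points, []]
    for _ in range(iters):
        groups = [[], []]
        for p in points:
            d0, d1 = math.dist(p, centers[0]), math.dist(p, centers[1])
            groups[0 if d0 <= d1 else 1].append(p)
        if not groups[0] or not groups[1]:
            break  # degenerate split: stop refining
        centers = [centroid(g) for g in groups]
    return groups

def build_tree(points, max_count, max_radius):
    """Recursively split a part until it is small enough (max_count) or
    compact enough (max_radius); the tree depth is not fixed in advance."""
    center = centroid(points)
    radius = max(math.dist(p, center) for p in points)
    node = {"center": center, "size": len(points), "children": []}
    if len(points) <= max_count or radius <= max_radius:
        return node  # one of the two thresholds is met: this is a leaf
    left, right = two_means(points)
    if left and right:
        node["children"] = [build_tree(g, max_count, max_radius)
                            for g in (left, right)]
    return node
```

For example, `build_tree(descriptors, max_count=50, max_radius=0.2)` keeps splitting only the parts that are both large and spread out, so dense regions of descriptor space end up with deeper subtrees than sparse ones.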
In previous works, a classification was often obtained by Euclidean distance of the children nodes of the root node, not the information of root node directly in the left of Figure
Traditional classification (a) and new clustering technique (b).
As the distances between root node and its children nodes are all calculated in advance and reserved in the root node position, the proposed clustering technique consists in finding the next children tree by using only the first term of (
Define
In the traditional method,
Important note: in an implementation, the image resource may contain millions of pictures and the tree may have more than 10^6 leaf nodes, while the vector indexing each database image has thousands of dimensions equal to 0. In order to save memory space, dynamic storage allocation is proposed.
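One way to realize such dynamic storage, given here only as a hedged sketch, is to keep the nonzero entries of each image vector in a dictionary and to maintain an inverted file from leaf nodes to images. The class and method names below are hypothetical and do not come from the original implementation.

```python
from collections import Counter, defaultdict

class SparseIndex:
    """Stores only nonzero leaf counts per image, plus an inverted file."""

    def __init__(self):
        self.image_vectors = {}            # image_id -> {leaf_id: count}
        self.inverted = defaultdict(set)   # leaf_id -> set of image_ids

    def add(self, image_id, leaf_ids):
        """Register an image by the leaf nodes its descriptors reached."""
        counts = dict(Counter(leaf_ids))
        self.image_vectors[image_id] = counts
        for leaf in counts:
            self.inverted[leaf].add(image_id)

    def candidates(self, leaf_ids):
        """Images sharing at least one leaf node with the query."""
        found = set()
        for leaf in set(leaf_ids):
            found |= self.inverted.get(leaf, set())
        return found
```

With this layout, memory grows with the number of nonzero entries rather than with the number of leaf nodes, and a query only needs to be compared against the images returned by `candidates`.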
The Ukbench image database contains 2550 groups of images, and every group includes 4 similar images. More precisely, the 4 images of a group are snapshots of the same content taken under different illumination intensities and orientations. Considering the expectations of users, the following criterion is adopted: when one image of the database is indexed, the retrieval is counted as successful if three of the four similar images appear among the top 10 ranked results. The index frequency is calculated by
The average of this accuracy is finally used to test the effectiveness of different
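The success criterion described above (three of the four similar images among the top 10 results) can be sketched in Python as follows. The sketch assumes that images are numbered consecutively so that `image_id // 4` identifies the group, and that the query image itself counts toward the three hits; both points are assumptions, as are the function names.

```python
def is_success(query_id, ranked_ids, group_size=4, top_k=10, required=3):
    """True if at least `required` images of the query's group appear
    among the first `top_k` ranked results."""
    group = query_id // group_size
    hits = sum(1 for i in ranked_ids[:top_k] if i // group_size == group)
    return hits >= required

def success_rate(rankings):
    """Fraction of successful queries; `rankings` maps each query id
    to its ranked list of database image ids."""
    outcomes = [is_success(q, r) for q, r in rankings.items()]
    return sum(outcomes) / len(outcomes)
```

Averaging `is_success` over all indexed images gives a single accuracy figure that can be compared across methods and database sizes.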
Comparison of three different
From Figure
When building the test image base, applying the discarding strategy yields much better performance, as listed in Table
Improvements on discarding invalid descriptors.
Discard | Quantity: 100 | Quantity: 500 | Quantity: 1000 | Quantity: 10000
---|---|---|---|---
No | 90.6% | 82.3% | 76.2% | 53.9%
Yes | 94.3% | 86.3% | 83.25% | 70.2%
Based on these two improvements, the efficiency of classification with the well-known HKM method and with HCM (the proposed method) is compared. The results on Table
Comprehensive comparison of new and traditional mechanism.
Method | Quantity: 100 | Quantity: 500 | Quantity: 1000 | Quantity: 10000
---|---|---|---|---
HKM | 89.5% | 84.4% | 80.25% | 62.3%
HCM | 94.3% | 86.3% | 83.25% | 70.2%
In this work, three improvements to content-based image retrieval are proposed: a strategy for image classification, a mechanism for calculating the Euclidean distance between the eigenvectors of the source images and the query image, and the development of the inverted file. As a result, the index accuracy is considerably enhanced. Furthermore, the index procedure becomes faster, which suits real-time image retrieval well. From a theoretical point of view, the proposed technique takes only about one-sixth of the time needed by the traditional method. In future work, it is necessary to verify the efficiency of the proposed improvements in a practical situation, as the running time in the above example is very short.
The authors declare that there is no conflict of interests regarding the publication of this paper.
This work has been supported by the Basic Project Foundation of Northwestern Polytechnical University (Grants no. JC20120241) and by the National Natural Science Foundation of China (Grants no. 11302173).